Configuring Analytics Engine

You can configure an IBM Analytics Engine instance to connect to an IBM® watsonx.data instance by setting the watsonx.data and Spark-related configurations as the default configuration for the IBM Analytics Engine instance.

You can configure the Analytics Engine instance with default settings in one of the following ways:

Configure by using the IBM Cloud® console.
Configure by using the Analytics Engine API.
Configure by using the Analytics Engine CLI.

Prerequisites

Ensure you have the following instances ready:

IBM® watsonx.data instance.
IBM Analytics Engine instance.

Fetch the following information from IBM® watsonx.data:

MDS URL from watsonx.data. For more information on getting the MDS URL, see Getting Metadata Service (MDS) Credentials.
MDS Credentials from watsonx.data. For more information on getting the MDS credentials, see Getting Metadata Service (MDS) Credentials.
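
Before you paste the MDS URL into any configuration, it can help to confirm that it has the expected shape. The following Python sketch assumes that the endpoint uses the thrift://<host>:<port> form. The function name check_mds_endpoint and the sample endpoint are hypothetical and are not part of any IBM tooling.

```python
from urllib.parse import urlparse


def check_mds_endpoint(url: str) -> tuple[str, int]:
    """Return (host, port) if url looks like thrift://host:port, else raise ValueError."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "thrift":
        raise ValueError(f"expected a thrift:// URL, got scheme {parsed.scheme!r}")
    # Both parts are needed by the Spark catalog URI setting.
    if not parsed.hostname or parsed.port is None:
        raise ValueError("MDS endpoint must include both a host and a port")
    return parsed.hostname, parsed.port


if __name__ == "__main__":
    # Hypothetical endpoint, used only for illustration.
    host, port = check_mds_endpoint("thrift://mds.example.com:9083")
    print(host, port)
```

A check like this only validates the format of the URL. It does not confirm that the endpoint is reachable or that the credentials are valid.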

Configuring Analytics Engine instance by using IBM Cloud® console

To configure your Analytics Engine instance from the IBM Cloud® Resource list, complete the following steps:

Log in to your IBM Cloud® account.
Access the IBM Cloud® Resource list.
Search for your Analytics Engine instance and click the instance to see the details.
Click Manage > Configuration to view the configuration.
In the Default Spark configuration section, click Edit.

Add the following configuration to the Default Spark configuration section.

spark.sql.catalogImplementation = hive
spark.sql.extensions = org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.iceberg.vectorization.enabled = false
spark.sql.catalog.lakehouse = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.lakehouse.type = hive
spark.sql.catalog.lakehouse.uri = <mds-thrift-endpoint-from-watsonx.data>
spark.hive.metastore.client.auth.mode = PLAIN
spark.hive.metastore.client.plain.username = <mds-user-from-watsonx.data>
spark.hive.metastore.client.plain.password = <mds-password-from-watsonx.data>
spark.hive.metastore.use.SSL = true
spark.hive.metastore.truststore.type = JKS
spark.hive.metastore.truststore.path = file:///opt/ibm/jdk/lib/security/cacerts
spark.hive.metastore.truststore.password = changeit

Parameter value:

mds-thrift-endpoint-from-watsonx.data: The MDS thrift endpoint of the watsonx.data instance. For example, thrift://81823aaf-8a88-4bee-a0a1-6e76a42dc833.cfjag3sf0s5o87astjo0.databases.appdomain.cloud:32683.
mds-user-from-watsonx.data: The watsonx.data username. For example, ibmlhapikey.
mds-password-from-watsonx.data: The watsonx.data password.
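
If you prefer to generate the console entries rather than edit them by hand, the properties listed above can be filled in programmatically. The following Python sketch is one possible way to do that. The function names build_default_spark_config and to_console_lines, and the sample values, are hypothetical. The property names and fixed values are copied from the list above.

```python
def build_default_spark_config(mds_uri: str, mds_user: str, mds_password: str) -> dict[str, str]:
    """Return the default Spark properties with the three placeholders filled in."""
    return {
        "spark.sql.catalogImplementation": "hive",
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        "spark.sql.iceberg.vectorization.enabled": "false",
        "spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.lakehouse.type": "hive",
        "spark.sql.catalog.lakehouse.uri": mds_uri,
        "spark.hive.metastore.client.auth.mode": "PLAIN",
        "spark.hive.metastore.client.plain.username": mds_user,
        "spark.hive.metastore.client.plain.password": mds_password,
        "spark.hive.metastore.use.SSL": "true",
        "spark.hive.metastore.truststore.type": "JKS",
        "spark.hive.metastore.truststore.path": "file:///opt/ibm/jdk/lib/security/cacerts",
        "spark.hive.metastore.truststore.password": "changeit",
    }


def to_console_lines(config: dict[str, str]) -> str:
    """Render the properties as 'key = value' lines, one property per line."""
    return "\n".join(f"{key} = {value}" for key, value in config.items())


if __name__ == "__main__":
    # Hypothetical values; substitute the MDS details of your own instance.
    config = build_default_spark_config(
        "thrift://mds.example.com:9083", "ibmlhapikey", "<mds-password-from-watsonx.data>"
    )
    print(to_console_lines(config))
```

Because the same dictionary can also be serialized to JSON, a helper of this kind keeps the console, API, and CLI settings consistent with each other.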

Configuring Analytics Engine instance by using Analytics Engine API

To configure your IBM Analytics Engine instance by using the Analytics Engine API, complete the following steps:

Generate an IAM token to connect to the IBM Analytics Engine API. For more information about how to generate an IAM token, see IAM token.
Run the following API command to invoke the Analytics Engine API by using the generated IAM token.

curl -X PATCH --location --header "Authorization: Bearer {IAM_TOKEN}" --header "Accept: application/json" --header "Content-Type: application/merge-patch+json" --data '<CONFIGURATION_DETAILS>' "{BASE_URL}/v3/analytics_engines/{INSTANCE_ID}/default_configs"

Parameter value:

IAM_TOKEN: Specify the API token generated for the Analytics Engine API.

CONFIGURATION_DETAILS: Copy and paste the following configuration:

{
"spark.sql.catalogImplementation": "hive",
"spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
"spark.sql.iceberg.vectorization.enabled": "false",
"spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
"spark.sql.catalog.lakehouse.type": "hive",
"spark.sql.catalog.lakehouse.uri": "<mds-thrift-endpoint-from-watsonx.data>",
"spark.hive.metastore.client.auth.mode": "PLAIN",
"spark.hive.metastore.client.plain.username": "<mds-user-from-watsonx.data>",
"spark.hive.metastore.client.plain.password": "<mds-password-from-watsonx.data>",
"spark.hive.metastore.use.SSL": "true",
"spark.hive.metastore.truststore.type": "JKS",
"spark.hive.metastore.truststore.path": "file:///opt/ibm/jdk/lib/security/cacerts",
"spark.hive.metastore.truststore.password": "changeit"
}

BASE_URL: The Analytics Engine URL for the region where you provisioned the instance. For example, api.region.ae.ibmcloud.com.
INSTANCE_ID: The Analytics Engine instance ID. For more information about how to retrieve an instance ID, see Obtaining the service endpoints.
mds-thrift-endpoint-from-watsonx.data: The MDS thrift endpoint of the watsonx.data instance. For example, thrift://81823aaf-8a88-4bee-a0a1-6e76a42dc833.cfjag3sf0s5o87astjo0.databases.appdomain.cloud:32683.
mds-user-from-watsonx.data: The watsonx.data username. For example, ibmlhapikey.
mds-password-from-watsonx.data: The watsonx.data password.
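
The same PATCH request can be prepared from a script instead of curl. The following Python sketch uses only the standard library and mirrors the method, headers, and URL path of the curl command above. The function name build_default_configs_request and the sample values are hypothetical, and the sketch assumes that the base URL is passed with its scheme (for example, https://). It builds the request but does not send it.

```python
import json
import urllib.request


def build_default_configs_request(
    base_url: str, instance_id: str, iam_token: str, config: dict[str, str]
) -> urllib.request.Request:
    """Prepare (but do not send) the PATCH request that sets the default Spark configuration."""
    url = f"{base_url.rstrip('/')}/v3/analytics_engines/{instance_id}/default_configs"
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),  # JSON body with the Spark properties
        method="PATCH",
        headers={
            "Authorization": f"Bearer {iam_token}",
            "Accept": "application/json",
            "Content-Type": "application/merge-patch+json",
        },
    )


if __name__ == "__main__":
    # Hypothetical values for illustration; pass the full set of properties in practice.
    request = build_default_configs_request(
        "https://ae.example.com",
        "my-instance-id",
        "my-iam-token",
        {"spark.sql.catalog.lakehouse.type": "hive"},
    )
    print(request.get_method(), request.full_url)
    # To send the request, pass it to urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before any change is applied to the instance.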

Configuring Analytics Engine instance by using Analytics Engine CLI

To specify the configuration settings for your IBM Analytics Engine instance from the CLI, complete the following steps:

Run the following command:

ibmcloud analytics-engine-v3 instance default-configs-update [--id INSTANCE_ID] --body BODY

Parameter value:

BODY: Copy and paste the following configuration information:

{
"spark.sql.catalogImplementation": "hive",
"spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
"spark.sql.iceberg.vectorization.enabled": "false",
"spark.sql.catalog.lakehouse": "org.apache.iceberg.spark.SparkCatalog",
"spark.sql.catalog.lakehouse.type": "hive",
"spark.sql.catalog.lakehouse.uri": "<mds-thrift-endpoint-from-watsonx.data>",
"spark.hive.metastore.client.auth.mode": "PLAIN",
"spark.hive.metastore.client.plain.username": "<mds-user-from-watsonx.data>",
"spark.hive.metastore.client.plain.password": "<mds-password-from-watsonx.data>",
"spark.hive.metastore.use.SSL": "true",
"spark.hive.metastore.truststore.type": "JKS",
"spark.hive.metastore.truststore.path": "file:///opt/ibm/jdk/lib/security/cacerts",
"spark.hive.metastore.truststore.password": "changeit"
}

INSTANCE_ID: The Analytics Engine instance ID. For more information about how to retrieve an instance ID, see Obtaining the service endpoints.
mds-thrift-endpoint-from-watsonx.data: The MDS thrift endpoint of the watsonx.data instance. For example, thrift://81823aaf-8a88-4bee-a0a1-6e76a42dc833.cfjag3sf0s5o87astjo0.databases.appdomain.cloud:32683.
mds-user-from-watsonx.data: The watsonx.data username. For example, ibmlhapikey.
mds-password-from-watsonx.data: The watsonx.data password.
For more information on getting the MDS endpoint and credentials, see Getting Metadata Service (MDS) Credentials.
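
Quoting a multi-line JSON body on a command line is error-prone, so it can be convenient to assemble the CLI invocation from a script. The following Python sketch builds the argument list for the command shown above, with the configuration serialized as the --body value. The function name build_cli_command and the sample values are hypothetical. The subcommand and flags are the ones shown in this section.

```python
import json
import shlex


def build_cli_command(instance_id: str, config: dict[str, str]) -> list[str]:
    """Return the argument list for the default-configs-update command."""
    return [
        "ibmcloud", "analytics-engine-v3", "instance", "default-configs-update",
        "--id", instance_id,
        "--body", json.dumps(config),  # the whole configuration as one JSON argument
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration; pass the full set of properties in practice.
    argv = build_cli_command("my-instance-id", {"spark.sql.catalog.lakehouse.type": "hive"})
    # Print a shell-quoted form that can be pasted into a terminal.
    print(shlex.join(argv))
```

If you run the command from Python, passing the list to subprocess.run(argv, check=True) avoids shell quoting altogether, because each list element reaches the CLI as a single argument.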

To view the logs of Spark applications that run on IBM Analytics Engine, you must enable logging. For more information, see Configuring and viewing logs.