Provisioning Analytics Engine

For watsonx.data, IBM Analytics Engine Spark is recommended for the following use cases:

  1. Ingesting large volumes of data into watsonx.data tables. You can also cleanse and transform data before ingestion.
  2. Running table maintenance operations to improve the performance of watsonx.data tables.
  3. Running complex analytics workloads that are difficult to express as queries.

You can create an IBM Analytics Engine instance by using the IBM Cloud® console, the IBM Cloud® command-line interface, or the Resource controller REST API.

You must have access to either the IBM Cloud® us-south (Dallas) or the eu-de (Frankfurt) region. When you select a region for provisioning an Analytics Engine instance, choose the one closest to the region where you provisioned watsonx.data to avoid data latency issues.

Creating a service instance from the IBM Cloud® console

You can create an instance by using the IBM Cloud® console.

To create an IBM Analytics Engine instance:

  1. Log in to the IBM Cloud® console.

  2. Click Services and select the category Analytics.

  3. Search for Analytics Engine and then click the tile to open the service instance creation page.

  4. For deploying the service instance, choose the location closest to the region where you provisioned watsonx.data. Currently, us-south and eu-de are the only supported regions.

  5. Select a plan. Currently, Standard Serverless for Apache Spark is the only supported serverless plan.

  6. Configure the instance by entering a name of your choice, selecting a resource group and adding tags.

  7. Select the latest runtime version available (for example, 3.3).

  8. Select the IBM Cloud Object Storage instance from your account that you want to use as the Analytics Engine instance home to store instance-related data.

  9. Click Create to provision the service instance in the background.

    The newly created service is listed in your IBM Cloud® resource list under Services.

Creating a service instance by using the IBM Cloud® command-line interface

To create a service instance by using the IBM Cloud® command-line interface:

  1. Download and configure the IBM Cloud® CLI. Follow the instructions in Getting started with the IBM Cloud® CLI.

  2. Set the API endpoint for your region and log in:

    ibmcloud api https://DOMAIN_NAME
    ibmcloud login
    

    Parameter value:

    • DOMAIN_NAME: The API endpoint for your region. For example, cloud.ibm.com
  3. Get the list of the resource groups for your account and select one of the returned resource groups as the target resource group in which to create the IBM Analytics Engine serverless instance:

    ibmcloud resource groups
    ibmcloud target -g <RESOURCE_GROUP_NAME>
    

    Parameter value:

    • RESOURCE_GROUP_NAME: Use the same resource group that you specified when you provisioned watsonx.data, so that related resources stay organized together.
  4. Create a service instance:

    ibmcloud resource service-instance-create <SERVICE_INSTANCE_NAME> ibmanalyticsengine <PLAN_NAME> <REGION> -p @<PATH_to JSON file with cluster parameters>
    

    Parameter value:

    • SERVICE_INSTANCE_NAME: Specify a name for the instance.
    • PLAN_NAME: Specify the plan name as standard-serverless-spark.
    • REGION: Specify the region where you want to provision the instance.

    Note that currently, standard-serverless-spark is the only supported serverless plan, and us-south and eu-de are the only supported regions. Choose the region closest to the one where you provisioned watsonx.data.

    • PATH_to JSON file: Include the path to the JSON file that contains the provisioning parameters.

    For example, for the Dallas region:

    ibmcloud resource service-instance-create MyServiceInstance ibmanalyticsengine standard-serverless-spark us-south -p @provision.json
    

    You can give the service instance any name you choose.

    The provision.json file contains the provisioning parameters for the instance that you want to create.

    The endpoint to your IBM Cloud® Object Storage instance in the payload JSON file must be the direct endpoint. Direct endpoints provide better performance than public endpoints and do not incur charges for any outgoing or incoming bandwidth.

    Following is a sample provision.json file.

    {
      "default_runtime": {
        "spark_version": "3.3"
      },
      "instance_home": {
        "region": "us-south",
        "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
        "hmac_access_key": "<your-hmac-access-key>",
        "hmac_secret_key": "<your-hmac-secret-key>"
      },
      "default_config": {
        "key1": "value1",
        "key2": "value2"
      }
    }
    

    The IBM Cloud® response to the create instance command:

    Creating service instance MyServiceInstance in resource group Default of account <your account name> as <your user name>...
    OK
    Service instance MyServiceInstance was created.
    
    Name:                MyServiceInstance
    ID:                  crn:v1:staging:public:ibmanalyticsengine:us-south:a/d628eae2cc7e4373bb0c9d2229f2ece5:1e32e***-afd9-483a-b1**-724ba5cf4***::
    GUID:                1e32e***-afd9-483a-b1**-724ba5cf4***
    Location:            us-south
    State:               provisioning
    Type:                service_instance
    Sub Type:
    Service Endpoints:   public
    Allow Cleanup:       false
    Locked:              false
    Created at:          2021-11-29T07:20:40Z
    Updated at:          2021-11-29T07:20:42Z
    Last Operation:
                        Status    create in progress
                        Message   Started create instance operation
    

    Make a note of the instance ID from the output. You will need the instance ID when you call instance management or Spark application management APIs. See Spark application REST API.

  5. Track instance readiness.
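
The provisioning parameters passed with -p @provision.json can also be generated rather than written by hand. The following is a minimal Python sketch, not an official tool. It assumes that the direct Object Storage endpoint for every supported region follows the same pattern as the us-south sample above. The function names build_cli_provision_params and is_direct_endpoint are hypothetical helpers introduced only for this illustration.

```python
import json

# Regions named as supported in this topic.
SUPPORTED_REGIONS = ("us-south", "eu-de")


def is_direct_endpoint(endpoint):
    """Return True if the endpoint host looks like a direct COS endpoint."""
    host = endpoint.split("://", 1)[-1]
    return host.startswith("s3.direct.")


def build_cli_provision_params(region, hmac_access_key, hmac_secret_key,
                               spark_version="3.3", default_config=None):
    """Build a dict shaped like the sample provision.json for the CLI."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region: {region}")
    # Assumption: the endpoint pattern of the us-south sample applies to
    # the other supported region as well. Verify it for your account.
    endpoint = f"https://s3.direct.{region}.cloud-object-storage.appdomain.cloud"
    params = {
        "default_runtime": {"spark_version": spark_version},
        "instance_home": {
            "region": region,
            "endpoint": endpoint,
            "hmac_access_key": hmac_access_key,
            "hmac_secret_key": hmac_secret_key,
        },
    }
    if default_config:
        # Optional Spark configuration defaults, copied as-is.
        params["default_config"] = dict(default_config)
    return params


if __name__ == "__main__":
    params = build_cli_provision_params(
        "us-south", "<your-hmac-access-key>", "<your-hmac-secret-key>")
    # Print the JSON; redirect it to provision.json if it looks right.
    print(json.dumps(params, indent=2))
```

Redirecting the printed JSON to a file gives a payload that you can pass to the service-instance-create command. Checking the endpoint with is_direct_endpoint before provisioning is one way to catch a public endpoint early.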

Creating a service instance by using the Resource controller REST API

An IBM Analytics Engine serverless instance must reside in an IBM Cloud® resource group. As a first step toward creating an IBM Analytics Engine serverless instance through the Resource controller REST API, you must have the resource group ID and serverless plan ID close at hand.

To create a service instance by using the Resource controller REST API:

  1. Get the resource group ID by logging in to the IBM Cloud® CLI and running the following command:

    ibmcloud resource groups
    

    Sample result:

    Retrieving all resource groups under account <Account details..>
    OK
    Name      ID      Default Group   State
    Default   XXXXX   true            ACTIVE
    
  2. Use the following resource plan ID for the Standard Serverless for Apache Spark plan:

    8afde05e-5fd8-4359-a597-946d8432dd45
    
  3. Get the IAM token. For instructions, see steps.

  4. Create an instance by using the Resource controller REST API:

    curl -X POST https://resource-controller.cloud.ibm.com/v2/resource_instances/ \
      -H "Authorization: Bearer $token" -H "Content-Type: application/json" -d @provision.json
    

    The provision.json file contains the provisioning parameters for the instance that you want to create. See Architecture and concepts in serverless instances for a description of the provisioning parameters in the payload.

    Following is a sample of the provision.json file.

    {
      "name": "your-service-instance-name",
      "resource_plan_id": "8afde05e-5fd8-4359-a597-946d8432dd45",
      "resource_group": "resource-group-id",
      "target": "us-south",
      "parameters": {
        "default_runtime": {
          "spark_version": "3.3"
        },
        "instance_home": {
          "region": "us-south",
          "endpoint": "s3.direct.us-south.cloud-object-storage.appdomain.cloud",
          "hmac_access_key": "your-access-key",
          "hmac_secret_key": "your-secret-key"
        }
      }
    }
    
  5. Track instance readiness.

For more information on the Resource controller REST API for creating an instance, see Create (provision) a new resource instance.
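
As an illustration of the REST steps above, the following Python sketch assembles the same POST request as the curl command by using only the standard library. It is a hedged example, not the documented client: build_create_request is a hypothetical helper name, and the function only builds the request object without sending it. The URL, plan ID, and payload fields are taken from the samples in this topic.

```python
import json
import urllib.request

# URL and plan ID as they appear in the steps above.
RESOURCE_CONTROLLER_URL = (
    "https://resource-controller.cloud.ibm.com/v2/resource_instances/")
STANDARD_SERVERLESS_SPARK_PLAN_ID = "8afde05e-5fd8-4359-a597-946d8432dd45"


def build_create_request(name, resource_group_id, region, parameters, token):
    """Build (but do not send) the POST request that creates an instance."""
    payload = {
        "name": name,
        "resource_plan_id": STANDARD_SERVERLESS_SPARK_PLAN_ID,
        "resource_group": resource_group_id,
        "target": region,
        # Same structure as the "parameters" object in provision.json.
        "parameters": parameters,
    }
    return urllib.request.Request(
        RESOURCE_CONTROLLER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_request(
        name="your-service-instance-name",
        resource_group_id="resource-group-id",
        region="us-south",
        parameters={"default_runtime": {"spark_version": "3.3"}},
        token="<your-iam-token>",
    )
    # Inspect the request before sending it.
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

With a valid IAM token, passing the request object to urllib.request.urlopen would submit it. Building the request separately from sending it makes the payload easy to review first.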

Tracking instance readiness

To run applications on a newly created serverless instance, the instance must be in the active state.

To track instance readiness:

  1. Enter the following command:
    curl -X GET https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/{instance_id} -H "Authorization: Bearer $token"
    
    Sample response:
    {
      "id": "dc0e****-eab2-4t9e-9441-56620949****",
      "state": "created",
      "state_change_time": "2021-04-21T04:24:01Z",
      "default_runtime": {
        "spark_version": "3.3",
        "instance_home": {
          "provider": "ibm-cos",
          "type": "objectstore",
          "region": "us-south",
          "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
          "bucket": "ae-bucket-do-not-delete-dc0e****-eab2-4t**-9441-566209499546",
          "hmac_access_key": "eH****g=",
          "hmac_secret_key": "4d********76"
        },
        "default_config": {
          "spark.driver.memory": "4g",
          "spark.driver.cores": 1
        }
      }
    }
    
  2. Check the value of the state attribute. It must be active before you can start running applications in the instance.
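
The readiness check above can be repeated in a loop until the state attribute reports active. The following Python sketch shows one way to do that. It assumes the endpoint shown in the curl command and that other regions follow the same host pattern, which you should verify. The names fetch_instance and wait_until_active are hypothetical, and the sketch does not handle failure states that the service might report.

```python
import json
import time
import urllib.request


def fetch_instance(instance_id, token, region="us-south"):
    """GET the instance details, as in the curl command above."""
    # Assumption: the host pattern of the us-south example applies
    # to other regions as well.
    url = (f"https://api.{region}.ae.cloud.ibm.com"
           f"/v3/analytics_engines/{instance_id}")
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_active(fetch, timeout_s=600, interval_s=10, sleep=time.sleep):
    """Call fetch() repeatedly until its result has state == "active".

    fetch is any zero-argument callable that returns the parsed JSON
    response. Raises TimeoutError after roughly timeout_s seconds.
    """
    waited = 0
    while True:
        state = fetch().get("state")
        if state == "active":
            return state
        if waited >= timeout_s:
            raise TimeoutError(
                f"instance still in state {state!r} after {waited}s")
        sleep(interval_s)
        waited += interval_s
```

A call such as wait_until_active(lambda: fetch_instance(instance_id, token)) would block until the instance is ready. Passing the fetch step in as a callable keeps the polling logic independent of the HTTP call, so it can be exercised without network access.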

Learn more

When provisioning serverless instances, follow the recommended Best practices.