IBM Cloud Docs
Explore creating serverless instances and submitting applications using the CLI

Learn how to use the IBM Analytics Engine CLI to set up the services that you need to create and manage a serverless instance, and to submit and monitor your Spark applications.

You create a serverless instance by selecting the IBM Analytics Engine Standard serverless plan. When a serverless instance is provisioned, an Apache Spark cluster is created, which you can customize with library packages of your choice, and is where you run your Spark applications.

Objectives

You will learn how to install and set up the following services and components that you will need to use the CLI:

  • An IBM Cloud Object Storage instance in which your IBM Analytics Engine instance stores custom application libraries and Spark history events.
  • An Object Storage bucket for application files and data files.
  • An IBM Analytics Engine serverless instance. This instance is allocated compute and memory resources on demand whenever Spark workloads are deployed. When an application is not in running state, no computing resources are allocated to the instance. The price is based on the actual usage of resources consumed by the instance, billed on a per second basis.
  • A logging service that helps you troubleshoot issues that might occur in the IBM Analytics Engine instance and in submitted applications, and view any output generated by your application. When you run applications with logging enabled, logs are forwarded to an IBM Log Analysis service where they are indexed, enabling full-text search through all generated messages and convenient querying based on specific fields.

Before you begin

To start using the Analytics Engine V3 CLI, follow the instructions in steps 1 and 2 to install the required services before you continue with step 3 to upload and submit Spark applications. Step 4 shows you how to create a logging instance and enable logging. Step 5, which is optional, shows you how to delete an Analytics Engine instance.

Create a Cloud Object Storage instance and retrieve credentials

Create an IBM Cloud Object Storage instance and retrieve the Cloud Object Storage credentials (service keys) by using the Analytics Engine Serverless CLI.

The Cloud Object Storage instance that you create is required by the IBM Analytics Engine instance as its instance home storage for Spark history events and any custom libraries or packages that you want to use in your applications. See Instance home.

For more information on how you can create a library set with custom packages that is stored in Cloud Object Storage and referenced from your application, see Using a library set.

  1. Log in to IBM Cloud® using your IBM Cloud® account.

    Action :Enter:

        ibmcloud api <URL>
        ibmcloud login
    

    Example :Enter:

        ibmcloud api https://cloud.ibm.com
        ibmcloud login
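
    If you log in with a federated ID, password-based login fails; in that case, use the CLI's single sign-on option:

        ibmcloud login --sso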
    
  2. Select a resource group. List the resource groups for your account, and then target the one in which to create the IBM Analytics Engine serverless instance:

    Action :Enter:

        ibmcloud target -g RESOURCE_GROUP_NAME
    

    Parameter values:

    • RESOURCE_GROUP_NAME: The name of the resource group in which the serverless instance is to reside

    Example :Enter:

        ibmcloud resource groups
        ibmcloud target -g default
    
  3. Install the IBM Cloud Object Storage service and then the Analytics Engine V3 CLI:

    Action :Enter:

        ibmcloud plugin install cloud-object-storage
    

    Action :Enter:

        ibmcloud plugin install analytics-engine-v3
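
    To verify that both plug-ins are installed, you can list the plug-ins that are registered with your CLI:

        ibmcloud plugin list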
    
  4. Create a Cloud Object Storage instance:

    Action :Enter:

        ibmcloud resource service-instance-create INSTANCE_NAME cloud-object-storage PLAN global
    

    Parameter values:

    • INSTANCE_NAME: Any name of your choice
    • PLAN: The Cloud Object Storage plan to use when creating the instance

    Example :Enter:

        ibmcloud resource service-instance-create test-cos-object cloud-object-storage standard global
    

    Response :The example returns:

        Service instance test-cos-object was created.
        Name:             test-cos-object
        ID:               crn:v1:bluemix:public:cloud-object-storage:global:a/867d444f64594fd68c7ebf4baf8f6c90:ebad3176-8a1a-41f2-a803-217621bf6309::
        GUID:             ebad3176-8a1a-41f2-a803-217621bf6309
        Location:         global
        State:            active
        Type:             service_instance
        Sub Type:
        Allow Cleanup:    false
        Locked:           false
        Created at:       2021-12-27T07:57:56Z
        Updated at:       2021-12-27T07:57:58Z
        Last Operation:
                         Status    create succeeded
                         Message   Completed create instance operation
    
  5. Configure the CRN by copying the value of ID from the response of the Cloud Object Storage creation call in the previous step:

    Action :Enter:

        ibmcloud cos config crn
        Resource Instance ID CRN: ID
    

    Parameter values:

    • ID: The value of ID from the response of the Cloud Object Storage creation call

    Example :Enter:

        ibmcloud cos config crn
        Resource Instance ID CRN: crn:v1:bluemix:public:cloud-object-storage:global:a/867d444f64594fd68c7ebf4baf8f6c90:ebad3176-8a1a-41f2-a803-217621bf6309::
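
    You can confirm the CRN that the cos plug-in now uses by displaying its configuration:

        ibmcloud cos config list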
    
  6. Create a Cloud Object Storage bucket:

    Action :Enter:

        ibmcloud cos bucket-create --bucket BUCKET_NAME [--class CLASS_NAME] [--ibm-service-instance-id ID] [--region REGION] [--output FORMAT]
    

    Parameter values:

    • BUCKET_NAME: Any name of your choice
    • ID: The value of GUID from the response of the Cloud Object Storage creation call
    • REGION: The IBM Cloud region in which the Cloud Object Storage instance was created
    • FORMAT: Output format can be JSON or text.

    Example :Enter:

        ibmcloud cos bucket-create --bucket test-cos-storage-bucket --region us-south --ibm-service-instance-id ebad3176-8a1a-41f2-a803-217621bf6309 --output json
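
    To confirm that the bucket exists, you can list the buckets owned by the configured service instance:

        ibmcloud cos buckets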
    
  7. Create Cloud Object Storage service keys:

    Action :Enter:

        ibmcloud resource service-key-create NAME [ROLE_NAME] ( --instance-id SERVICE_INSTANCE_ID | --instance-name SERVICE_INSTANCE_NAME | --alias-id SERVICE_ALIAS_ID | --alias-name SERVICE_ALIAS_NAME) [--service-id SERVICE_ID] [-p, --parameters @JSON_FILE|JSON_TEXT] [-g RESOURCE_GROUP] [--service-endpoint SERVICE_ENDPOINT_TYPE] [--output FORMAT] [-f, --force] [-q, --quiet]
    
    Parameter values:

    • NAME: Any name of your choice
    • ROLE_NAME: Optional. The access role, for example, `Writer` or `Reader`
    • SERVICE_INSTANCE_ID: The value of GUID from the response of the Cloud Object Storage creation call
    • SERVICE_INSTANCE_NAME: The value of Name from the response of the Cloud Object Storage creation call
    • JSON_TEXT: The authentication to access Cloud Object Storage. Currently, only HMAC keys are supported.

    Example :Enter:

        ibmcloud resource service-key-create test-service-key-cos-bucket Writer --instance-name test-cos-object --parameters '{"HMAC":true}'
    
    Response :The example returns:

        Creating service key of service instance test-cos-object under account Test
        OK

        Service key crn:v1:bluemix:public:cloud-object-storage:global:a/183**93b485e:9ee135f9-4667-4797-8478-b20**ce-key:21a310e1-bbd6-**bf1f4 was created.
        Name:          test-service-key-cos-bucket
        ID:            crn:v1:bluemix:public:cloud-object-**
        Created At:    Mon Dec 27 12:52:49 UTC 2021
        State:         active
        Credentials:
            apikey: 3a4Ncm**o-WJGFaEzwfY
            cos_hmac_keys:
                access_key_id: 21a31**f1f4
                secret_access_key: c5a23**b6792d3e0a6c
            endpoints: https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints
            iam_apikey_description: Auto-generated for key crn:v1:bluemix:public:cloud-object-storage:global:a/1836f778**c93b485e:9ee**8478-b2019a4b4e20:resource-key:21a3**05a9bf1f4
            iam_apikey_name: test-service-key-cos-bucket
            iam_role_crn: crn:v1:bluemix:public:iam::::serviceRole:Writer
            iam_serviceid_crn: crn:v1:bluemix:public:iam-identity::a/1836f77885e521c5ab2523aac93b485e::serviceid:ServiceId-702ca222-3615-464c-92d3-1849c03170cc
            resource_instance_id: crn:v1:bluemix:public:cloud-object-storage:global:a/1836f7**3b485e:9ee135f9-4667-479**4e20::
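
    You do not need to save this response; if you need the credentials again later, you can redisplay an existing service key at any time:

        ibmcloud resource service-key test-service-key-cos-bucket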
    

Create an Analytics Engine serverless instance

Create a serverless Analytics Engine instance by using the CLI.

  1. Create the Analytics Engine service instance:

    Action :Enter:

        ibmcloud resource service-instance-create INSTANCE_NAME ibmanalyticsengine standard-serverless-spark us-south -p @provision.json
    

    Parameter values:

    • INSTANCE_NAME: Any name of your choice
    • @provision.json: Structure the JSON file as shown in the following example. Use the access and secret key from the response of the Cloud Object Storage service key creation call.

    Example of the provision.json file :Sample JSON file:

        {
           "default_runtime": {
              "spark_version": "3.3"
           },
           "instance_home": {
              "region": "us-south",
              "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
              "hmac_access_key": "<your-hmac-access-key>",
              "hmac_secret_key": "<your-hmac-secret-key>"
           }
        }
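
    A malformed provision file is a common cause of failed provisioning requests, so it can be worth validating the JSON before you pass it to the CLI; this sketch assumes the jq tool is installed on your machine:

        jq . provision.json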
    

    Example :Enter:

        ibmcloud resource service-instance-create test-ae-service ibmanalyticsengine standard-serverless-spark us-south -p @ae_provision.json
    

    Response :The example returns:

        Creating service instance test-ae-service in resource group of account as ...

        OK

        Service instance test-ae-service was created.
        Name:                test-ae-service
        ID:                  crn:v1:bluemix:public:ibmanalyticsengine:us-south:a/183**aac93b485e:181ea**be1-70978**1b::
        GUID:                181ea**9ee01b
        Location:            us-south
        State:               provisioning
        Type:                service_instance
        Sub Type:
        Service Endpoints:   public
        Allow Cleanup:       false
        Locked:              false
        Created at:          2022-01-03T08:40:25Z
        Updated at:          2022-01-03T08:40:26Z
        Last Operation:
            Status    create in progress
            Message   Started create instance operation
    
  2. Check the status of the Analytics Engine service:

    Action :Enter:

        ibmcloud ae-v3 instance show --id INSTANCE_ID
    

    Parameter values:

    • INSTANCE_ID: The value of GUID from the response of the Analytics Engine instance creation call

    Example :Enter:

        ibmcloud ae-v3 instance show --id 181ea**9ee01b
    

    Response :The example returns:

        {
           "default_runtime": {
              "spark_version": "3.3" },
           "id": "181ea**9ee01b ",
           "instance_home": {
              "bucket": "do-not-delete-ae-bucket-e96**5d-b7**a82",
              "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
              "hmac_access_key": "**",
              "hmac_secret_key": "**",
              "provider": "ibm-cos",
              "region": "us-south",
              "type": "objectstore" },
           "state": "active",
           "state_change_time": "**"
        }
    

    Only submit your Spark application when the state of the Analytics Engine service is active.
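
    Rather than rerunning the command manually, you can poll until the state becomes active. The following is a minimal sketch, which assumes that the command prints the JSON document shown above and that jq is installed:

        while [ "$(ibmcloud ae-v3 instance show --id 181ea**9ee01b | jq -r .state)" != "active" ]; do
            echo "Waiting for the instance to become active ..."
            sleep 30
        done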

Upload and submit a Spark application

Upload an application file to Cloud Object Storage and submit a Spark application.

This tutorial shows you how to add the Spark application to a bucket in the Cloud Object Storage instance that is used as instance home by the Analytics Engine instance. If you want to separate the instance-related files from the files that you use to run your applications, for example, the application files themselves, data files, and any results of your analysis, you can use a different bucket in the same Cloud Object Storage instance or a different Cloud Object Storage instance.

  1. Upload the Spark application file:

    Action :Enter:

        ibmcloud cos upload --bucket BUCKET_NAME --key KEY --file PATH [--concurrency VALUE] [--max-upload-parts PARTS] [--part-size SIZE] [--leave-parts-on-errors] [--cache-control CACHING_DIRECTIVES] [--content-disposition DIRECTIVES] [--content-encoding CONTENT_ENCODING] [--content-language LANGUAGE] [--content-length SIZE] [--content-md5 MD5] [--content-type MIME] [--metadata STRUCTURE] [--region REGION] [--output FORMAT] [--json]
    

    Parameter values:

    • BUCKET_NAME: The name of the bucket that you created earlier
    • KEY: The application file name
    • PATH: The file name and path to the Spark application file

    Example :Enter:

        ibmcloud cos upload --bucket test-cos-storage-bucket --key test-math.py --file test-math.py
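
    To confirm that the upload succeeded, you can list the objects in the bucket:

        ibmcloud cos objects --bucket test-cos-storage-bucket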
    
    Sample application file :Sample of test-math.py:

        from pyspark.sql import SparkSession
        import time
        import cmath

        # Create a SparkSession and return it together with its SparkContext.
        def init_spark():
            spark = SparkSession.builder.appName("test-math").getOrCreate()
            sc = spark.sparkContext
            return spark, sc

        # Apply a mix of complex-math functions to each element.
        def transformFunc(x):
            return cmath.sqrt(x) + cmath.log(x) + cmath.log10(x)

        # Run the same transformation twice with different partition counts.
        def main():
            spark, sc = init_spark()
            partitions = [10, 5]
            for i in range(0, 2):
                data = range(1, 20000000)
                v0 = sc.parallelize(data, partitions[i])
                v1 = v0.map(transformFunc)
                print(f"v1.count is {v1.count()}. Done")
                time.sleep(60)

        if __name__ == '__main__':
            main()
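
    If you have a local Spark installation, you can smoke-test the application before uploading it. This assumes that spark-submit is on your PATH, for example after installing pyspark:

        spark-submit test-math.py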
    
  2. Check the status of the Analytics Engine service:

    Action :Enter:

        ibmcloud ae-v3 instance show --id INSTANCE_ID
    

    Parameter values:

    • INSTANCE_ID: The value of GUID from the response of the Analytics Engine instance creation call

    Example :Enter:

        ibmcloud ae-v3 instance show --id 181ea**9ee01b
    

    Response :The example returns:

        {
           "default_runtime": {
              "spark_version": "3.3" },
           "id": "181ea**9ee01b ",
           "instance_home": {
              "bucket": "do-not-delete-ae-bucket-e96**5d-b7**a82",
              "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
              "hmac_access_key": "**",
              "hmac_secret_key": "**",
              "provider": "ibm-cos",
              "region": "us-south",
              "type": "objectstore" },
           "state": "active",
           "state_change_time": "**"
        }
    

    Only submit your Spark application when the state of the Analytics Engine service is active.

  3. Submit the Spark application:

    Action :Enter:

        ibmcloud ae-v3 spark-app submit --instance-id INSTANCE_ID --app APPLICATION_PATH
    

    Parameter values:

    • INSTANCE_ID: The value of GUID from the response of the Analytics Engine instance creation call
    • APPLICATION_PATH: The file name and path to the Spark application file

    Example for macOS and Linux :Enter:

        ibmcloud ae-v3 spark-app submit --instance-id 181ea**9ee01b --app "cos://test-cos-storage-bucket.mycos/test-math.py" --conf '{"spark.hadoop.fs.cos.mycos.endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud", "spark.hadoop.fs.cos.mycos.access.key": "21**bf1f4", "spark.hadoop.fs.cos.mycos.secret.key": "c5a**d3e0a6c"}'
    

    Example for Windows (not PowerShell). Note that on Windows, the quotes need to be escaped. :Enter:

        ibmcloud ae-v3 spark-app submit --instance-id myinstanceid --app "cos://matrix.mycos/test-math.py" --conf "{\"spark.hadoop.fs.cos.mycos.endpoint\": \"https://s3.direct.us-south.cloud-object-storage.appdomain.cloud\", \"spark.hadoop.fs.cos.mycos.access.key\": \"mykey\", \"spark.hadoop.fs.cos.mycos.secret.key\": \"mysecret\"}"
    

    Response :The example returns:

        id      7f7096d2-5c44-4d9a-ac01-b904c7611b7b
        state   accepted
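
    If you are scripting the submission, you can capture the application ID from the output. This is a minimal sketch that assumes the two-line id/state format shown above; the --conf options are omitted for brevity:

        APP_ID=$(ibmcloud ae-v3 spark-app submit --instance-id 181ea**9ee01b \
            --app "cos://test-cos-storage-bucket.mycos/test-math.py" | awk '/^id/ {print $2}')
        echo "Submitted application $APP_ID"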
    
  4. Check the details or status of the application that you submitted:

    Action :Enter:

        ibmcloud ae-v3 spark-app show --instance-id INSTANCE_ID --app-id APPLICATION_ID
    

    Parameter values:

    • INSTANCE_ID: The value of GUID from the response of the Analytics Engine creation call
    • APPLICATION_ID: The value of id from the response of the spark-app submit call

    Example :Enter:

        ibmcloud ae-v3 spark-app show --instance-id 181ea**9ee01b --app-id 7f7096d2-5c44-4d9a-ac01-b904c7611b7b
    

    Response :The example returns:

        application_details   <Nested Object>
        id                    7f7096d2-5c44-4d9a-ac01-b904c7611b7b
        state                 finished
        start_time            2022-03-01T12:58:54.000Z
        finish_time           2022-03-01T13:09:14.000Z
    

    The application might take between 2 and 5 minutes to complete.
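
    You can poll the application state in the same way as the instance state. This is a minimal sketch that assumes the key-value output format shown above:

        while true; do
            STATE=$(ibmcloud ae-v3 spark-app show --instance-id 181ea**9ee01b \
                --app-id 7f7096d2-5c44-4d9a-ac01-b904c7611b7b | awk '/^state/ {print $2}')
            echo "Application state: $STATE"
            if [ "$STATE" = "finished" ] || [ "$STATE" = "failed" ]; then break; fi
            sleep 30
        done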

Create a logging service to see logs

You can use the Analytics Engine CLI to enable logging to help you troubleshoot issues in IBM Analytics Engine. Before you can enable logging, you need to create an IBM Log Analysis service instance to which the logs are forwarded.

  1. Create a logging instance:

    Action :Enter:

        ibmcloud resource service-instance-create NAME logdna SERVICE_PLAN_NAME LOCATION
    

    Parameter values:

    • NAME: Any name of your choice for the IBM Log Analysis service instance
    • SERVICE_PLAN_NAME: The name of the service plan. For valid values, see Service plans.
    • LOCATION: Locations where Analytics Engine is enabled to send logs to IBM Log Analysis. For valid locations, see Compute serverless services.

    Example :Enter:

        ibmcloud resource service-instance-create my-log-instance logdna 7-day us-south
    

    Once the logging service is created, you can log in to IBM Cloud®, search for the logging service instance, and click on the monitoring dashboard. There you can view the driver and executor logs, as well as all application logs for your Spark application.

    Search using the application_id or instance_id.

  2. Enable platform logging:

    To view IBM Analytics Engine platform logs, you must use the Observability dashboard in IBM Cloud to configure platform logging. See Configuring platform logs through the Observability dashboard for the steps you need to follow to enable logging through the Observability dashboard.

  3. Enable logging for Analytics Engine:

    Action :Enter:

        ibmcloud analytics-engine-v3 log-config COMMAND [arguments...] [command options]
    

    Parameter values:

    • analytics-engine-v3: The full name of the Analytics Engine V3 plug-in; you can use the alias ae-v3 instead, as in the example
    • COMMAND: Use the update command to enable logging

    Example :Enter:

       ibmcloud ae-v3 log-config update --instance-id 181ea**9ee01b --enable --output json
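
    To check the logging configuration of the instance afterward, the CLI also provides a show subcommand; see the CLI reference linked below for the full syntax:

        ibmcloud ae-v3 log-config show --instance-id 181ea**9ee01b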
    

Delete Analytics Engine instance

You can use the CLI to delete an instance, for example, if you need an instance with a completely different configuration to handle greater workloads.

You can retain an Analytics Engine instance as long as you want and submit your Spark applications against the same instance on an as-needed basis.

If you want to delete an Analytics Engine instance:

Action :Enter:

    ibmcloud resource service-instance-delete NAME|ID [-g RESOURCE_GROUP] -f

Parameter values:

  • NAME|ID: The value of Name or GUID from the response of the Analytics Engine instance creation call
  • RESOURCE_GROUP: Optional parameter. The name of the resource group in which the serverless instance resides

Example :Enter:

    ibmcloud resource service-instance-delete MyServiceInstance -g default -f
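
To confirm that the instance was deleted, you can list the remaining service instances in your account:

    ibmcloud resource service-instances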

Learn more

See the IBM Analytics Engine serverless CLI reference.