IBM Cloud Docs
Explore creating serverless instances and submitting applications using the CLI

Learn how to use the IBM Analytics Engine CLI to set up the services that you need to create and manage a serverless instance, and to submit and monitor your Spark applications.

You create a serverless instance by selecting the IBM Analytics Engine Standard serverless plan. When a serverless instance is provisioned, an Apache Spark cluster is created, which you can customize with library packages of your choice, and is where you run your Spark applications.

Objectives

You will learn how to install and set up the following services and components that you will need to use the CLI:

  • An IBM Cloud Object Storage instance in which your IBM Analytics Engine instance stores custom application libraries and Spark history events.
  • An Object Storage bucket for application files and data files.
  • An IBM Analytics Engine serverless instance. This instance is allocated compute and memory resources on demand whenever Spark workloads are deployed. When an application is not in running state, no computing resources are allocated to the instance. The price is based on the actual usage of resources consumed by the instance, billed on a per second basis.
  • A logging service that helps you troubleshoot issues that might occur in the IBM Analytics Engine instance and in submitted applications, and view any output generated by your application. When you run applications with logging enabled, logs are forwarded to an IBM Log Analysis service where they are indexed, enabling full-text search through all generated messages and convenient querying based on specific fields.

Before you begin

To start using the Analytics Engine V3 CLI, follow the instructions in steps 1 and 2 to install the required services before you continue with step 3 to upload and submit Spark applications. Step 4 shows you how to create a logging instance and enable logging. Step 5, which is optional, shows you how to delete an Analytics Engine instance.

Create a Cloud Object Storage instance and retrieve credentials

Create an IBM Cloud Object Storage instance and retrieve the Cloud Object Storage credentials (service keys) by using the Analytics Engine Serverless CLI.

The Cloud Object Storage instance that you create is required by the IBM Analytics Engine instance as its instance home storage for Spark history events and any custom libraries or packages that you want to use in your applications. See Instance home.

For more information on how you can create a library set with custom packages that is stored in Cloud Object Storage and referenced from your application, see Using a library set.

  1. Log in to IBM Cloud® using your IBM Cloud® account.

    Action :Enter:

        ibmcloud api <URL>
        ibmcloud login
    

    Example :Enter:

        ibmcloud api https://cloud.ibm.com
        ibmcloud login
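
    If you log in with a federated ID, password-based login fails; in that case, use the CLI's single sign-on option:

        ibmcloud login --sso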
    
  2. Select a resource group. List the resource groups for your account, and then target the one in which to create the IBM Analytics Engine serverless instance:

    Action :Enter:

        ibmcloud target -g RESOURCE_GROUP_NAME
    

    Parameter values:

    • RESOURCE_GROUP_NAME: The name of the resource group in which the serverless instance is to reside

    Example :Enter:

        ibmcloud resource groups
        ibmcloud target -g default
    
  3. Install the IBM Cloud Object Storage service and then the Analytics Engine V3 CLI:

    Action :Enter:

        ibmcloud plugin install cloud-object-storage
    

    Action :Enter:

        ibmcloud plugin install analytics-engine-v3
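
    To verify that both plug-ins are installed, you can list the plug-ins that are registered with your CLI:

        ibmcloud plugin list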
    
  4. Create a Cloud Object Storage instance:

    Action :Enter:

        ibmcloud resource service-instance-create INSTANCE_NAME cloud-object-storage PLAN global
    

    Parameter values:

    • INSTANCE_NAME: Any name of your choice
    • PLAN: The Cloud Object Storage plan to use when creating the instance

    Example :Enter:

        ibmcloud resource service-instance-create test-cos-object cloud-object-storage standard global
    

    Response :The example returns:

        Service instance test-cos-object was created.
        Name:             test-cos-object
        ID:               crn:v1:bluemix:public:cloud-object-storage:global:a/867d444f64594fd68c7ebf4baf8f6c90:ebad3176-8a1a-41f2-a803-217621bf6309::
        GUID:             ebad3176-8a1a-41f2-a803-217621bf6309
        Location:         global
        State:            active
        Type:             service_instance
        Sub Type:
        Allow Cleanup:    false
        Locked:           false
        Created at:       2021-12-27T07:57:56Z
        Updated at:       2021-12-27T07:57:58Z
        Last Operation:
                         Status    create succeeded
                         Message   Completed create instance operation
    
  5. Configure the CRN by copying the value of ID from the response of the Cloud Object Storage creation call in the previous step:

    Action :Enter:

        ibmcloud cos config crn
        Resource Instance ID CRN: ID
    

    Parameter values:

    • ID: The value of ID from the response of the Cloud Object Storage creation call

    Example :Enter:

        ibmcloud cos config crn
        Resource Instance ID CRN: crn:v1:bluemix:public:cloud-object-storage:global:a/867d444f64594fd68c7ebf4baf8f6c90:ebad3176-8a1a-41f2-a803-217621bf6309::
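
    You can confirm the CRN that the cos plug-in now uses by displaying its configuration:

        ibmcloud cos config list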
    
  6. Create a Cloud Object Storage bucket:

    Action :Enter:

        ibmcloud cos bucket-create --bucket BUCKET_NAME [--class CLASS_NAME] [--ibm-service-instance-id ID] [--region REGION] [--output FORMAT]
    

    Parameter values:

    • BUCKET_NAME: Any name of your choice
    • ID: The value of GUID from the response of the Cloud Object Storage creation call
    • REGION: The IBM Cloud region in which the Cloud Object Storage instance was created
    • FORMAT: Output format can be JSON or text.

    Example :Enter:

        ibmcloud cos bucket-create --bucket test-cos-storage-bucket --region us-south --ibm-service-instance-id ebad3176-8a1a-41f2-a803-217621bf6309 --output json
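
    To confirm that the bucket exists, you can list the buckets owned by the configured service instance:

        ibmcloud cos buckets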
    
  7. Create Cloud Object Storage service keys:

    Action :Enter:

        ibmcloud resource service-key-create NAME [ROLE_NAME] ( --instance-id SERVICE_INSTANCE_ID | --instance-name SERVICE_INSTANCE_NAME | --alias-id SERVICE_ALIAS_ID | --alias-name SERVICE_ALIAS_NAME) [--service-id SERVICE_ID] [-p, --parameters @JSON_FILE|JSON_TEXT] [-g RESOURCE_GROUP] [--service-endpoint SERVICE_ENDPOINT_TYPE] [--output FORMAT] [-f, --force] [-q, --quiet]
    
    Parameter values:

    • NAME: Any name of your choice
    • ROLE_NAME: Optional. The access role, for example, `Writer` or `Reader`
    • SERVICE_INSTANCE_ID: The value of GUID from the response of the Cloud Object Storage creation call
    • SERVICE_INSTANCE_NAME: The value of Name from the response of the Cloud Object Storage creation call
    • JSON_TEXT: The authentication to access Cloud Object Storage. Currently, only HMAC keys are supported.

    Example :Enter:

        ibmcloud resource service-key-create test-service-key-cos-bucket Writer --instance-name test-cos-object --parameters '{"HMAC":true}'
    
    Response :The example returns:

        Creating service key of service instance test-cos-object under account Test
        OK

        Service key crn:v1:bluemix:public:cloud-object-storage:global:a/183**93b485e:9ee135f9-4667-4797-8478-b20**ce-key:21a310e1-bbd6-**bf1f4 was created.
        Name:          test-service-key-cos-bucket
        ID:            crn:v1:bluemix:public:cloud-object-**
        Created At:    Mon Dec 27 12:52:49 UTC 2021
        State:         active
        Credentials:
            apikey: 3a4Ncm**o-WJGFaEzwfY
            cos_hmac_keys:
                access_key_id: 21a31**f1f4
                secret_access_key: c5a23**b6792d3e0a6c
            endpoints: https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints
            iam_apikey_description: Auto-generated for key crn:v1:bluemix:public:cloud-object-storage:global:a/1836f778**c93b485e:9ee**8478-b2019a4b4e20:resource-key:21a3**05a9bf1f4
            iam_apikey_name: test-service-key-cos-bucket
            iam_role_crn: crn:v1:bluemix:public:iam::::serviceRole:Writer
            iam_serviceid_crn: crn:v1:bluemix:public:iam-identity::a/1836f77885e521c5ab2523aac93b485e::serviceid:ServiceId-702ca222-3615-464c-92d3-1849c03170cc
            resource_instance_id: crn:v1:bluemix:public:cloud-object-storage:global:a/1836f7**3b485e:9ee135f9-4667-479**4e20::
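
    You do not need to save this response; if you need the credentials again later, you can redisplay an existing service key at any time:

        ibmcloud resource service-key test-service-key-cos-bucket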
    

Create an Analytics Engine serverless instance

Create a serverless Analytics Engine instance by using the CLI.

  1. Create the Analytics Engine service instance:

    Action :Enter:

        ibmcloud resource service-instance-create INSTANCE_NAME ibmanalyticsengine standard-serverless-spark us-south -p @provision.json
    

    Parameter values:

    • INSTANCE_NAME: Any name of your choice
    • @provision.json: Structure the JSON file as shown in the following example. Use the access and secret key from the response of the Cloud Object Storage service key creation call.

    Example of the provision.json file :Sample JSON file:

        {
           "default_runtime": {
              "spark_version": "3.3"
           },
           "instance_home": {
              "region": "us-south",
              "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
              "hmac_access_key": "<your-hmac-access-key>",
              "hmac_secret_key": "<your-hmac-secret-key>"
           }
        }
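
    A malformed provision file is a common cause of failed provisioning requests, so it can be worth validating the JSON before you pass it to the CLI; this sketch assumes the jq tool is installed on your machine:

        jq . provision.json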
    

    Example :Enter:

        ibmcloud resource service-instance-create test-ae-service ibmanalyticsengine standard-serverless-spark us-south -p @ae_provision.json
    

    Response :The example returns:

        Creating service instance test-ae-service in resource group of account as ...

        OK

        Service instance test-ae-service was created.
        Name:                test-ae-service
        ID:                  crn:v1:bluemix:public:ibmanalyticsengine:us-south:a/183**aac93b485e:181ea**be1-70978**1b::
        GUID:                181ea**9ee01b
        Location:            us-south
        State:               provisioning
        Type:                service_instance
        Sub Type:
        Service Endpoints:   public
        Allow Cleanup:       false
        Locked:              false
        Created at:          2022-01-03T08:40:25Z
        Updated at:          2022-01-03T08:40:26Z
        Last Operation:
            Status    create in progress
            Message   Started create instance operation
    
  2. Check the status of the Analytics Engine service:

    Action :Enter:

        ibmcloud ae-v3 instance show --id INSTANCE_ID
    

    Parameter values:

    • INSTANCE_ID: The value of GUID from the response of the Analytics Engine instance creation call

    Example :Enter:

        ibmcloud ae-v3 instance show --id 181ea**9ee01b
    

    Response :The example returns:

        {
           "default_runtime": {
              "spark_version": "3.3" },
           "id": "181ea**9ee01b ",
           "instance_home": {
              "bucket": "do-not-delete-ae-bucket-e96**5d-b7**a82",
              "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
              "hmac_access_key": "**",
              "hmac_secret_key": "**",
              "provider": "ibm-cos",
              "region": "us-south",
              "type": "objectstore" },
           "state": "active",
           "state_change_time": "**"
        }
    

    Only submit your Spark application when the state of the Analytics Engine service is active.
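
    Rather than rerunning the command manually, you can poll until the state becomes active. The following is a minimal sketch, which assumes that the command prints the JSON document shown above and that jq is installed:

        while [ "$(ibmcloud ae-v3 instance show --id 181ea**9ee01b | jq -r .state)" != "active" ]; do
            echo "Waiting for the instance to become active ..."
            sleep 30
        done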

Upload and submit a Spark application

Upload an application file to Cloud Object Storage and submit a Spark application.

This tutorial shows you how to add the Spark application to a bucket in the Cloud Object Storage instance that is used as instance home by the Analytics Engine instance. If you want to separate the instance-related files from the files that you use to run your applications, for example, the application files themselves, data files, and any results of your analysis, you can use a different bucket in the same Cloud Object Storage instance or a different Cloud Object Storage instance.

  1. Upload the Spark application file:

    Action :Enter:

        ibmcloud cos upload --bucket BUCKET_NAME --key KEY --file PATH [--concurrency VALUE] [--max-upload-parts PARTS] [--part-size SIZE] [--leave-parts-on-errors] [--cache-control CACHING_DIRECTIVES] [--content-disposition DIRECTIVES] [--content-encoding CONTENT_ENCODING] [--content-language LANGUAGE] [--content-length SIZE] [--content-md5 MD5] [--content-type MIME] [--metadata STRUCTURE] [--region REGION] [--output FORMAT] [--json]
    

    Parameter values:

    • BUCKET_NAME: The name of the bucket that you created earlier
    • KEY: The application file name
    • PATH: The file name and path to the Spark application file

    Example :Enter:

        ibmcloud cos upload --bucket test-cos-storage-bucket --key test-math.py --file test-math.py
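
    To confirm that the upload succeeded, you can list the objects in the bucket:

        ibmcloud cos objects --bucket test-cos-storage-bucket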
    
    Sample application file :Sample of test-math.py:

        from pyspark.sql import SparkSession
        import time
        import cmath

        # Create a SparkSession and return it together with its SparkContext.
        def init_spark():
            spark = SparkSession.builder.appName("test-math").getOrCreate()
            sc = spark.sparkContext
            return spark, sc

        # Apply a mix of complex-math functions to each element.
        def transformFunc(x):
            return cmath.sqrt(x) + cmath.log(x) + cmath.log10(x)

        # Run the same transformation twice with different partition counts.
        def main():
            spark, sc = init_spark()
            partitions = [10, 5]
            for i in range(0, 2):
                data = range(1, 20000000)
                v0 = sc.parallelize(data, partitions[i])
                v1 = v0.map(transformFunc)
                print(f"v1.count is {v1.count()}. Done")
                time.sleep(60)

        if __name__ == '__main__':
            main()
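
    If you have a local Spark installation, you can smoke-test the application before uploading it. This assumes that spark-submit is on your PATH, for example after installing pyspark:

        spark-submit test-math.py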
    
  2. Check the status of the Analytics Engine service:

    Action :Enter:

        ibmcloud ae-v3 instance show --id INSTANCE_ID
    

    Parameter values:

    • INSTANCE_ID: The value of GUID from the response of the Analytics Engine instance creation call

    Example :Enter:

        ibmcloud ae-v3 instance show --id 181ea**9ee01b
    

    Response :The example returns:

        {
           "default_runtime": {
              "spark_version": "3.3" },
           "id": "181ea**9ee01b ",
           "instance_home": {
              "bucket": "do-not-delete-ae-bucket-e96**5d-b7**a82",
              "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
              "hmac_access_key": "**",
              "hmac_secret_key": "**",
              "provider": "ibm-cos",
              "region": "us-south",
              "type": "objectstore" },
           "state": "active",
           "state_change_time": "**"
        }
    

    Only submit your Spark application when the state of the Analytics Engine service is active.

  3. Submit the Spark application:

    Action :Enter:

        ibmcloud ae-v3 spark-app submit --instance-id INSTANCE_ID --app APPLICATION_PATH
    

    Parameter values:

    • INSTANCE_ID: The value of GUID from the response of the Analytics Engine instance creation call
    • APPLICATION_PATH: The file name and path to the Spark application file

    Example for macOS and Linux :Enter:

        ibmcloud ae-v3 spark-app submit --instance-id 181ea**9ee01b --app "cos://test-cos-storage-bucket.mycos/test-math.py" --conf '{"spark.hadoop.fs.cos.mycos.endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud", "spark.hadoop.fs.cos.mycos.access.key": "21**bf1f4", "spark.hadoop.fs.cos.mycos.secret.key": "c5a**d3e0a6c"}'
    

    Example for Windows (not PowerShell). Note that on Windows, the quotes need to be escaped. :Enter:

        ibmcloud ae-v3 spark-app submit --instance-id myinstanceid --app "cos://matrix.mycos/test-math.py" --conf "{\"spark.hadoop.fs.cos.mycos.endpoint\": \"https://s3.direct.us-south.cloud-object-storage.appdomain.cloud\", \"spark.hadoop.fs.cos.mycos.access.key\": \"mykey\", \"spark.hadoop.fs.cos.mycos.secret.key\": \"mysecret\"}"
    

    Response :The example returns:

        id      7f7096d2-5c44-4d9a-ac01-b904c7611b7b
        state   accepted
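
    If you are scripting the submission, you can capture the application ID from the output. This is a minimal sketch that assumes the two-line id/state format shown above; the --conf options are omitted for brevity:

        APP_ID=$(ibmcloud ae-v3 spark-app submit --instance-id 181ea**9ee01b \
            --app "cos://test-cos-storage-bucket.mycos/test-math.py" | awk '/^id/ {print $2}')
        echo "Submitted application $APP_ID"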
    
  4. Check the details or status of the application that you submitted:

    Action :Enter:

        ibmcloud ae-v3 spark-app show --instance-id INSTANCE_ID --app-id APPLICATION_ID
    

    Parameter values:

    • INSTANCE_ID: The value of GUID from the response of the Analytics Engine creation call
    • APPLICATION_ID: The value of id from the response of the spark-app submit call

    Example :Enter:

        ibmcloud ae-v3 spark-app show --instance-id 181ea**9ee01b --app-id 7f7096d2-5c44-4d9a-ac01-b904c7611b7b
    

    Response :The example returns:

        application_details   <Nested Object>
        id                    7f7096d2-5c44-4d9a-ac01-b904c7611b7b
        state                 finished
        start_time            2022-03-01T12:58:54.000Z
        finish_time           2022-03-01T13:09:14.000Z
    

    The application might take between 2 and 5 minutes to complete.
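
    You can poll the application state in the same way as the instance state. This is a minimal sketch that assumes the key-value output format shown above:

        while true; do
            STATE=$(ibmcloud ae-v3 spark-app show --instance-id 181ea**9ee01b \
                --app-id 7f7096d2-5c44-4d9a-ac01-b904c7611b7b | awk '/^state/ {print $2}')
            echo "Application state: $STATE"
            if [ "$STATE" = "finished" ] || [ "$STATE" = "failed" ]; then break; fi
            sleep 30
        done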

Create a logging service to see logs

You can use the Analytics Engine CLI to enable logging to help you troubleshoot issues in IBM Analytics Engine. Before you can enable logging, you need to create an IBM Log Analysis service instance to which the logs are forwarded.

  1. Create a logging instance:

    Action :Enter:

        ibmcloud resource service-instance-create NAME logdna SERVICE_PLAN_NAME LOCATION
    

    Parameter values:

    • NAME: Any name of your choice for the IBM Log Analysis service instance
    • SERVICE_PLAN_NAME: The name of the service plan. For valid values, see Service plans.
    • LOCATION: Locations where Analytics Engine is enabled to send logs to IBM Log Analysis. For valid locations, see Compute serverless services.

    Example :Enter:

        ibmcloud resource service-instance-create my-log-instance logdna 7-day us-south
    

    Once the logging service is created, you can log in to IBM Cloud®, search for the logging service instance, and click on the monitoring dashboard. There you can view the driver and executor logs, as well as all application logs for your Spark application.

    Search using the application_id or instance_id.

  2. Enable platform logging:

    To view IBM Analytics Engine platform logs, you must use the Observability dashboard in IBM Cloud to configure platform logging. See Configuring platform logs through the Observability dashboard for the steps you need to follow to enable logging through the Observability dashboard.

  3. Enable logging for Analytics Engine:

    Action :Enter:

        ibmcloud analytics-engine-v3 log-config COMMAND [arguments...] [command options]
    

    Parameter values:

    • analytics-engine-v3: The full name of the Analytics Engine V3 plug-in; you can use the alias ae-v3 instead, as in the example
    • COMMAND: Use the update command to enable logging

    Example :Enter:

       ibmcloud ae-v3 log-config update --instance-id 181ea**9ee01b --enable --output json
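
    To check the logging configuration of the instance afterward, the CLI also provides a show subcommand; see the CLI reference linked below for the full syntax:

        ibmcloud ae-v3 log-config show --instance-id 181ea**9ee01b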
    

Delete Analytics Engine instance

You can use the CLI to delete an instance, for example, if you need an instance with a completely different configuration to handle greater workloads.

You can retain an Analytics Engine instance as long as you want and submit your Spark applications against the same instance on an as-needed basis.

If you want to delete an Analytics Engine instance:

Action :Enter:

    ibmcloud resource service-instance-delete NAME|ID [-g RESOURCE_GROUP] -f

Parameter values:

  • NAME|ID: The value of Name or GUID from the response of the Analytics Engine instance creation call
  • RESOURCE_GROUP: Optional parameter. The name of the resource group in which the serverless instance resides

Example :Enter:

    ibmcloud resource service-instance-delete MyServiceInstance -g default -f
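
To confirm that the instance was deleted, you can list the remaining service instances in your account:

    ibmcloud resource service-instances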

Learn more

See the IBM Analytics Engine serverless CLI reference.