FAQs

What is IBM Analytics Engine serverless?

The IBM Analytics Engine Standard serverless plan for Apache Spark offers a consumption-based model for running Apache Spark workloads. An Analytics Engine serverless instance does not consume any resources when no workloads are running. When you submit Spark applications, Spark clusters are created in seconds and are spun down as soon as the applications finish running. You can develop and deploy Spark SQL, data transformation, data science, or machine learning jobs by using the Spark application API.
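
As an illustration, a Spark application can be submitted with a single REST call to the instance's Applications API. The following Python sketch uses the requests library; the region host, instance GUID, token, and application path are placeholders, and the endpoint path should be verified against the IBM Analytics Engine v3 API reference.

    import requests

    IAM_TOKEN = "<IAM bearer token>"                    # e.g. from `ibmcloud iam oauth-tokens`
    INSTANCE_GUID = "<instance GUID>"                   # placeholder
    BASE_URL = "https://api.us-south.ae.cloud.ibm.com"  # region-specific host

    # Submit a Spark application stored in the Object Storage bucket that
    # serves as instance home (all names below are placeholders).
    payload = {
        "application_details": {
            "application": "cos://my-bucket.my-cos-service/my_spark_app.py",
            "arguments": ["arg1"],
        }
    }

    resp = requests.post(
        f"{BASE_URL}/v3/analytics_engines/{INSTANCE_GUID}/spark_applications",
        headers={"Authorization": f"Bearer {IAM_TOKEN}"},
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json())  # contains the application ID and its initial state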

What are the advantages of IBM Analytics Engine serverless instances?

With IBM Analytics Engine serverless, compute and memory resources are allocated on demand when Spark workloads are deployed. When no application is running, no compute resources are allocated to the IBM Analytics Engine serverless instance. Pricing is based on the actual amount of resources that the instance consumes, billed per second.

Does the IBM Analytics Engine Standard serverless plan for Apache Spark support Hadoop?

No. Currently, the IBM Analytics Engine Standard serverless plan for Apache Spark supports only Apache Spark.

Can I change the instance home storage of a serverless instance?

No, you can't. After instance home storage is associated with an IBM Analytics Engine serverless instance, it cannot be changed because instance home contains all instance-relevant data, such as the Spark events and custom libraries. Changing instance home would result in the loss of the Spark history data and custom libraries.
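
Because instance home is fixed for the lifetime of the instance, it is specified once, as part of the provisioning parameters. A minimal sketch of such a parameters document follows; the field names mirror the documented serverless provisioning payload, but verify them, and the endpoint value, against the current service documentation before use.

    # Provisioning parameters for a serverless instance (sketch; all values
    # are placeholders). Instance home points at an IBM Cloud Object Storage
    # bucket and cannot be changed after the instance is created.
    instance_parameters = {
        "default_runtime": {"spark_version": "3.4"},
        "instance_home": {
            "region": "us-south",
            "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
            "hmac_access_key": "<HMAC access key>",
            "hmac_secret_key": "<HMAC secret key>",
        },
    }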

How is user management and access control managed in a serverless instance?

User management and access control for an IBM Analytics Engine serverless instance and its APIs are handled through IBM Cloud® Identity and Access Management (IAM). You use IAM access policies to invite users to collaborate on your instance and grant them the necessary privileges. See Granting permissions to users.
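
For example, an access policy that grants a user a role on a specific instance can be created through the IAM Policy Management API. The sketch below is an illustration under assumptions: the account ID, IAM ID, and instance GUID are placeholders, and the service name ibmanalyticsengine should be confirmed in the IAM documentation.

    import requests

    IAM_TOKEN = "<IAM bearer token>"

    # Grant the Viewer platform role on one serverless instance (sketch).
    policy = {
        "type": "access",
        "subjects": [{"attributes": [{"name": "iam_id", "value": "<user's IAM ID>"}]}],
        "roles": [{"role_id": "crn:v1:bluemix:public:iam::::role:Viewer"}],
        "resources": [{"attributes": [
            {"name": "accountId", "value": "<account ID>"},
            {"name": "serviceName", "value": "ibmanalyticsengine"},
            {"name": "serviceInstance", "value": "<instance GUID>"},
        ]}],
    }

    resp = requests.post(
        "https://iam.cloud.ibm.com/v1/policies",
        headers={"Authorization": f"Bearer {IAM_TOKEN}"},
        json=policy,
        timeout=60,
    )
    resp.raise_for_status()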

How do I define the size of the cluster to run my Spark application?

You can specify the size of the cluster either when the instance is created or when you submit Spark applications. You can choose the CPU and memory requirements of your Spark driver and executors, as well as the number of executors, if you know those requirements up front. Alternatively, you can let the IBM Analytics Engine service autoscale the Spark cluster based on the application's demand. To override default Spark configuration settings at instance creation or when submitting an application, see Default Spark configuration. For details on autoscaling, see Enabling application autoscaling.
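
If you know the requirements up front, driver and executor sizing can be passed as standard Spark properties in the submission payload. A sketch with placeholder values follows; the supported CPU and memory combinations are listed in the service documentation.

    # Fragment of the application submission payload with explicit sizing.
    payload = {
        "application_details": {
            "application": "cos://my-bucket.my-cos-service/my_spark_app.py",
            "conf": {
                "spark.driver.cores": "1",        # driver CPU
                "spark.driver.memory": "4G",      # driver memory
                "spark.executor.cores": "1",      # CPU per executor
                "spark.executor.memory": "4G",    # memory per executor
                "spark.executor.instances": "2",  # number of executors
            },
        }
    }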

How do I install custom libraries to my serverless instance?

You can use custom libraries in Python, R, Scala, or Java and make them available to your Spark application by creating a library set and referencing it in your application when you submit the Spark application. See Creating a library set.
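
A library set is referenced by name in the submission payload. In the sketch below, the configuration property name and the library set name are assumptions; see Creating a library set for the documented property.

    # Fragment of the submission payload referencing a custom library set.
    payload = {
        "application_details": {
            "application": "cos://my-bucket.my-cos-service/my_spark_app.py",
            "conf": {
                # Property name is an assumption; verify it against
                # "Creating a library set".
                "ae.spark.librarysets": "my-library-set",
            },
        }
    }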

How can the serverless instance be monitored?

Currently, you can monitor Spark applications in the following ways:

  • By tracking the state of the Spark application, as shown in the sketch after this list. For details, see Getting the state of a submitted application.
  • By viewing the Spark history events that are forwarded to the IBM Cloud Object Storage instance that you specified as instance home. You can download these events and view them in the Spark history server UI installed on your desktop. At a later stage, you will be able to launch the Spark history server UI from the IBM Analytics Engine service instance details page.
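
A minimal state-polling sketch, assuming the state endpoint in the IBM Analytics Engine v3 API reference and the placeholder values from the earlier submission example; the state names should be checked against Getting the state of a submitted application.

    import time
    import requests

    IAM_TOKEN = "<IAM bearer token>"
    INSTANCE_GUID = "<instance GUID>"
    BASE_URL = "https://api.us-south.ae.cloud.ibm.com"
    APP_ID = "<application ID returned at submission>"

    url = (f"{BASE_URL}/v3/analytics_engines/{INSTANCE_GUID}"
           f"/spark_applications/{APP_ID}/state")

    # Poll until the application reaches a terminal state (state names are
    # assumptions; confirm them in the API reference).
    while True:
        resp = requests.get(url, headers={"Authorization": f"Bearer {IAM_TOKEN}"}, timeout=60)
        resp.raise_for_status()
        state = resp.json().get("state")
        print(state)
        if state in ("finished", "failed", "stopped"):
            break
        time.sleep(30)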

How do I set up autoscaling policies for my serverless instance?

You can enable autoscaling for all applications at the instance level when you create an instance of the Analytics Engine Standard serverless plan for Apache Spark, or per application when you submit the application. For details, see Enabling application autoscaling.
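
Per-application autoscaling is requested through the submission payload's configuration. The property name in this sketch is an assumption for illustration only; Enabling application autoscaling has the exact settings.

    # Fragment of the submission payload enabling autoscaling for one
    # application (property name is an assumption; see the service docs).
    payload = {
        "application_details": {
            "application": "cos://my-bucket.my-cos-service/my_spark_app.py",
            "conf": {
                "ae.spark.autoscale.enable": "true",
            },
        }
    }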

Can I connect to a serverless instance with the Apache Livy API?

Yes, the IBM Analytics Engine Standard serverless plan for Apache Spark provides an API interface similar to the Livy batch API. For details, see Livy API.
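
A Livy-style batch submission might look like the following sketch. The endpoint path under the instance is an assumption, while the file, args, and conf fields follow the shape of Livy's batch API; see Livy API for the documented path.

    import requests

    IAM_TOKEN = "<IAM bearer token>"
    INSTANCE_GUID = "<instance GUID>"
    BASE_URL = "https://api.us-south.ae.cloud.ibm.com"

    # Livy-like batch payload (placeholder values).
    batch = {
        "file": "cos://my-bucket.my-cos-service/my_spark_app.py",
        "args": ["arg1"],
        "conf": {"spark.executor.instances": "2"},
    }

    resp = requests.post(
        f"{BASE_URL}/v3/analytics_engines/{INSTANCE_GUID}/livy/batches",  # path is an assumption
        headers={"Authorization": f"Bearer {IAM_TOKEN}"},
        json=batch,
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json())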

Where can I find the logs for my Spark applications?

You can aggregate the logs from your Spark applications to Log Analysis. For details, see Configuring and viewing logs.
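
Log forwarding to Log Analysis is switched on per instance through the service API. The endpoint path and body in this sketch are assumptions; Configuring and viewing logs documents the exact call.

    import requests

    IAM_TOKEN = "<IAM bearer token>"
    INSTANCE_GUID = "<instance GUID>"
    BASE_URL = "https://api.us-south.ae.cloud.ibm.com"

    # Enable log forwarding for the instance (path and body are
    # assumptions; see "Configuring and viewing logs").
    resp = requests.put(
        f"{BASE_URL}/v3/analytics_engines/{INSTANCE_GUID}/log_forwarding_config",
        headers={"Authorization": f"Bearer {IAM_TOKEN}"},
        json={"enabled": True},
        timeout=60,
    )
    resp.raise_for_status()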

How can I track actions performed by users on a serverless Spark instance?

You can use the Activity Tracker service to track how users and applications interact with IBM Analytics Engine in IBM Cloud®. You can use this service to investigate abnormal activity and critical actions and to comply with regulatory audit requirements. In addition, you can be alerted about actions as they happen. The events that are collected comply with the Cloud Auditing Data Federation (CADF) standard. See Auditing events for IBM Analytics Engine serverless instances.