IBM Cloud Docs
Architecture and concepts in serverless instances

This topic shows you the architecture of IBM Analytics Engine serverless instances and describes some key concepts and definitions.

Instance architecture

The IBM Analytics Engine service is managed by using IBM Cloud® Identity and Access Management (IAM). As an IBM Cloud account owner, you are assigned the account administrator role.

With an IBM Cloud account, you can provision and manage your serverless Analytics Engine instance by using the:

  • IBM Cloud console
  • CLI
  • REST API
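For example, when you provision through the CLI or REST API, the instance options are passed as JSON parameters. The following Python sketch builds such a parameters document; the field names (`default_runtime`, `instance_home`, `default_config`) and placeholder values are illustrative and should be verified against the current provisioning API reference.

```python
import json

# Illustrative provisioning parameters for a serverless Analytics Engine
# instance. The placeholders (<...>) must be replaced with real values,
# and the exact field names should be checked against the API reference.
provision_params = {
    # Spark runtime to associate with the instance
    "default_runtime": {"spark_version": "3.4"},
    # IBM Cloud Object Storage instance used as instance home
    "instance_home": {
        "region": "us-south",
        "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
        "hmac_access_key": "<HMAC_ACCESS_KEY>",
        "hmac_secret_key": "<HMAC_SECRET_KEY>",
    },
    # Default Spark configuration applied to all applications
    "default_config": {"spark.driver.memory": "4g"},
}

print(json.dumps(provision_params, indent=2))
```

A document like this could be saved to a file and passed to the create call, for example with the `-p @params.json` option of the IBM Cloud CLI.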

The Analytics Engine microservices in the control plane, accessed through an API gateway, handle instance creation, capacity provisioning, customization, and runtime management, while your Spark applications run in isolated namespaces in the data plane. Each Spark application that you submit runs in its own Spark cluster, which is a combination of Spark master and executor nodes. See Isolation and network access.

Each Analytics Engine instance is associated with an IBM Cloud Object Storage instance for instance-related data that is accessible by all applications that run in the instance. Currently, all Spark events are stored in this instance as well. Spark application logs are aggregated to a Log Analysis log server.

Figure 1. Architecture flow diagram of IBM Analytics Engine serverless instances

Key concepts

With IBM Analytics Engine serverless instances, you can spin up Apache Spark clusters as needed and customize the Spark runtime and default Spark configuration options.

The following sections describe key concepts when provisioning serverless instances.

IBM Analytics Engine service instance

An IBM Cloud® service is a cloud extension that provides ready-to-use functionality, such as database, messaging, and web software for running code, or application management and monitoring capabilities. Services usually do not require installation or maintenance and can be combined to create applications. An instance of a service is an entity that consists of the resources that are reserved for a particular application or service.

When you create an IBM Analytics Engine instance from the catalog, you give the service instance a name of your choice, select the default Spark runtime that you want to associate with the instance, and provide the default Spark configuration to use with the instance. Additionally, you have to specify the instance home, which is the storage attached to the instance for instance-related data only.

Note:

  • When you create an IBM Analytics Engine service instance, no costs are incurred unless you have Spark applications running or the Spark history server is accessed.
  • Costs are incurred if IBM Cloud Object Storage is accessed through public endpoints, and when you enable forwarding of IBM Analytics Engine logs to IBM Log Analysis.
  • There is a default limit on the number of service instances permitted per IBM Cloud® account and on the amount of CPU and memory that can be used in any given IBM Analytics Engine service instance. See Limits and quotas for Analytics Engine instances. If you need to adjust these limits, open an IBM Support ticket.
  • There is no limit on the number of Spark applications that can be run in an IBM Analytics Engine service instance.

Default Spark runtime

At the time of instance provisioning, you can select the Spark version to be used. Currently, you can choose between Spark 3.3 and Spark 3.4. Spark 3.3 is the default version.

The runtime includes the open source Spark binaries and a default configuration that helps you quickly create the instance and run Spark applications in it. In addition to the Spark binaries, the runtime also includes the geospatial, data skipping, and Parquet modular encryption libraries.

Across all Spark runtime versions, you can submit Spark applications written in the following languages:

  • Scala
  • Python
  • R

The following table shows the Spark runtime version and runtime language version.

Table 1. Spark runtime version and runtime language version

| Spark version | Apache Spark release | Status | Supported languages |
|---|---|---|---|
| 3.1 | 3.1.2 | Removed (not supported) | Java 8, Scala 2.12, Python 3.10, and R 4.2 |
| 3.3 | 3.3.2 | Default | Java 11, Scala 2.12, Python 3.10, and R 4.2 |
| 3.4 | 3.4.1 | Latest | Java 11, Scala 2.12, Python 3.10, and R 4.2 |

The language versions are upgraded periodically to keep the runtime free from any security vulnerabilities. You can always override the Spark runtime version when you submit an application. For details on what to add to the payload, see Passing the runtime Spark version when submitting an application.
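As an illustration of overriding the runtime per application, the following Python sketch builds the kind of JSON payload that the Spark applications REST endpoint accepts, with the Spark version selected under a `runtime` field. The application path and field layout are placeholders; confirm them against the REST API reference before use.

```python
import json

# Illustrative application-submission payload; the "spark_version" entry
# under "runtime" overrides the instance's default Spark runtime for this
# one application. The cos:// path is a placeholder.
payload = {
    "application_details": {
        "application": "cos://<bucket>.<cos-service-name>/my_spark_app.py",
        "runtime": {"spark_version": "3.4"},
        "arguments": ["--input", "cos://<bucket>.<cos-service-name>/data"],
    }
}

print(json.dumps(payload, indent=2))
```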

Instance home

Instance home is the storage attached to the instance for instance-related data only, such as custom application libraries and Spark history events. Currently, only IBM Cloud Object Storage is accepted for instance home. The Object Storage instance can be in your own IBM Cloud® account or in a different account.

When you provision an instance by using the IBM Cloud console, the IBM Cloud Object Storage instances in your IBM Cloud® account are automatically discovered and displayed in a list for you to select from. If no IBM Cloud Object Storage instances are found in your account, you can use the REST APIs to update instance home after instance creation.

You can't change instance home after instance creation. You can only edit the access keys.
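Because instance home itself is fixed after creation, only the credentials can be rotated. The following is a minimal sketch of the kind of body such a key-update call might carry; the key names mirror the HMAC credentials of the associated IBM Cloud Object Storage instance and are assumptions to be checked against the REST API reference.

```python
import json

# Hypothetical request body for rotating the instance home access keys.
# The placeholders must be replaced with the new HMAC credentials of the
# IBM Cloud Object Storage instance that serves as instance home.
instance_home_update = {
    "hmac_access_key": "<NEW_HMAC_ACCESS_KEY>",
    "hmac_secret_key": "<NEW_HMAC_SECRET_KEY>",
}

print(json.dumps(instance_home_update))
```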

Default Spark configuration

You can specify default Spark configurations at the time of provisioning an Analytics Engine instance (see Provisioning an IBM Analytics Engine serverless instance). These configurations are automatically applied to the Spark applications that are submitted to the instance. You can also update the configurations after creating the instance, either from the Configuration section of the Analytics Engine instance details page, through the Analytics Engine REST APIs, or with the CLI. Values that are specified as instance-level defaults can be overridden at the time of submitting Spark applications.

To learn more about the various Apache Spark configurations, see Spark Configuration.
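The precedence rule described above can be sketched as a simple merge: per-application settings win over the instance-level defaults. The property names below come from the standard Apache Spark configuration; the specific values are illustrative.

```python
# Instance-level defaults, as set at provisioning time or edited later.
instance_defaults = {
    "spark.driver.memory": "4g",
    "spark.executor.memory": "4g",
}

# Configuration supplied with a particular application at submit time.
application_conf = {"spark.executor.memory": "8g"}

# The effective configuration: application values override the defaults.
effective_conf = {**instance_defaults, **application_conf}

print(effective_conf)
# -> {'spark.driver.memory': '4g', 'spark.executor.memory': '8g'}
```

Here the application keeps the default driver memory but runs its executors with 8 GB instead of the instance-wide 4 GB.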

Serverless instance features and execution methods

The following table shows the supported serverless instance features by access role and execution methods.

Table 2. Supported serverless instance features by access role and execution method

| Operation | Access roles | IBM Console | API | CLI |
|---|---|---|---|---|
| Provision instances | Administrator | Yes | Yes | Yes |
| Delete instances | Administrator | Yes | Yes | Yes |
| Grant users permission | Administrator | Yes | Yes | Yes |
| Manage instance home storage | Administrator | Yes | Yes | Yes |
| Configure logging | Administrator, Developer, DevOps | Yes | Yes | Not available |
| Submit Spark applications | Administrator, Developer | Not available | Yes | Yes |
| View list of submitted Spark applications | Administrator, Developer, DevOps | Not available | Yes | Yes |
| Stop submitted Spark applications | Administrator, Developer, DevOps | Not available | Yes | Yes |
| Customize libraries | Administrator, Developer | Not available | Yes | Not available |
| Access job logs | Administrator, Developer, DevOps | Yes (from the Log Analysis console) | Not applicable | Not applicable |
| View instance details (shown details might vary depending on access role) | Administrator, Developer, DevOps | Yes | Yes | Yes |
| Manage Spark history server | Administrator, Developer | Yes | Yes | Yes |
| Access Spark history | Administrator, Developer, DevOps | Yes | Yes | Yes |