Security model

IBM Analytics Engine serverless instances provide a security architecture that is designed to enable administrators and developers to create secure Spark clusters.

The following sections describe how the security model of IBM Analytics Engine serverless instances manages access to, and control of, the secure instances.

Controlling access to IBM Analytics Engine activities

Access to IBM Analytics Engine serverless instances is controlled by IAM authentication and authorization. IAM is the Identity and Access Management service of IBM Cloud®. User authentication and access control happen through IAM when you log in with your IBMid. See how to retrieve the IAM token.
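
As a minimal sketch of the token retrieval mentioned above, the following Python builds the request that exchanges an IBM Cloud API key for an IAM bearer token. It assumes the public IAM token endpoint https://iam.cloud.ibm.com/identity/token and the API key grant type. The helper names build_iam_token_request and fetch_iam_token are hypothetical, and the API key is a placeholder that you supply.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public IBM Cloud IAM token endpoint.
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_iam_token_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that trades an API key for a token."""
    form = urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    }).encode("utf-8")
    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=form,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch_iam_token(api_key: str) -> str:
    """Send the request and return the access token from the JSON response."""
    with urllib.request.urlopen(build_iam_token_request(api_key)) as resp:
        return json.load(resp)["access_token"]
```

The returned token would typically be sent as an Authorization: Bearer header on later API calls. Building the request separately from sending it keeps the sketch easy to inspect without network access.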

As an administrator or creator of the service instance, you can grant or deny access to other users with whom you want to share the service instance. All life cycle management activities on the service instance, such as modifying the instance configuration, submitting and tracking Spark applications, or customizing the instance with custom library sets, are controlled through IAM authentication and authorization. See Granting permissions to users to understand which operations are supported and the level of access required for each operation.

Encrypting at rest

IBM Cloud Object Storage is the recommended data store for the data required to run Spark jobs on the cluster. IBM Cloud Object Storage comes with built-in encryption by default. See Encrypting your data.

In addition to, or as an alternative to, IBM Cloud Object Storage encryption, you can use Parquet modular encryption in analytic scenarios for large-scale data, especially when fine-grained access control is important. See Working with Parquet modular encryption.

Encrypting endpoints

All service endpoints to the cluster are SSL encrypted (TLS 1.2 enabled). In addition, when you use IBM Analytics Engine with IBM Cloud Object Storage, the link between the Object Storage service instance and IBM Analytics Engine is encrypted.
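
On the client side, you can mirror this requirement by refusing connections that negotiate anything older than TLS 1.2. The following is a minimal sketch that uses only the Python standard library. The helper name make_tls12_context is hypothetical and is not part of any IBM SDK.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Return a client context that verifies certificates and rejects anything below TLS 1.2."""
    # create_default_context() enables certificate and host name verification.
    context = ssl.create_default_context()
    # Refuse to negotiate protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Such a context can be passed to standard library clients, for example through the context argument of urllib.request.urlopen.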

Isolation and network access

Each IBM Analytics Engine serverless instance gets its own isolated sandbox that is disconnected from other instances from a network and security standpoint.

Spark workloads deployed in an instance can:

  • Communicate with other Spark workloads deployed in the same instance
  • Communicate with the public internet
  • Connect with other IBM Cloud® services over private endpoints

Spark workloads in one IBM Analytics Engine instance cannot communicate with Spark workloads in another instance. See Instance architecture for more on instance isolation.

Ensuring code security

Be cautious when applying library or package customizations to your instance. Use secure code from trusted sources only, so that you do not compromise the overall security of the instance.

IBM recommends that you scan any source code, libraries, and packages you use before uploading them to your instance. While the use of untrusted code will not impact others, it might impact you.

Encrypting internal network data for Spark workload

IBM Analytics Engine allows encrypting the internal communication between the Spark application components. To enable encryption in the private network, specify the configuration in either of the following two ways:

  • At the time of provisioning an IBM Analytics Engine instance, specify the configuration under the default_config attribute.

    Example:

    
    "default_config": {
        "spark.ssl.enabled":"true"
    }
    
  • At the time of submitting a job, specify the options in the payload under conf.

    Example:

    
    {
        "conf": {
            "spark.ssl.enabled":"true"
        }
    }
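
Both options above carry the same setting under different keys. As an illustrative sketch, the following Python merges that setting into an existing request document without discarding other configuration. The helper name with_ssl_enabled is hypothetical, and spark.executor.memory is included only as an example of a pre-existing setting. The sketch follows the fragment shapes shown above; it does not build a complete provisioning or submission request.

```python
import copy
import json

# Setting taken from the examples above; note that the value is the string "true".
SSL_SETTING = {"spark.ssl.enabled": "true"}


def with_ssl_enabled(document: dict, key: str) -> dict:
    """Return a copy of `document` with the SSL setting merged under `key`.

    Use key="default_config" for a provisioning request and key="conf"
    for a job submission payload. Existing settings under `key` are kept.
    """
    result = copy.deepcopy(document)
    result.setdefault(key, {}).update(SSL_SETTING)
    return result


# Provisioning-time fragment.
provisioning = with_ssl_enabled({}, "default_config")

# Submission-time fragment that already carries another Spark setting.
submission = with_ssl_enabled({"conf": {"spark.executor.memory": "4G"}}, "conf")
print(json.dumps(submission, indent=2))
```

Working on a deep copy leaves the caller's original document unchanged, which helps when the same base payload is reused for several submissions.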