Best practices
Use the following set of recommended guidelines when provisioning and managing your serverless instances and when running Spark applications.
| Best Practice | Description | Reference Link |
|---|---|---|
| Use separate IBM Analytics Engine service instances for your development and production environments. | This is a general best practice. By creating separate IBM Analytics Engine instances for different environments, you can test any configuration and code changes before applying them to the production instance. | NA |
| Upgrade to the latest Spark version | As open source Spark versions are released, they are made available in IBM Analytics Engine after an interval required for internal testing. Watch for announcements of new Spark versions in the Release Notes section and upgrade the runtime of your instance to move your applications to the latest Spark runtime. Older runtimes are deprecated and eventually removed as newer versions are released. Make sure that you test your applications on the new runtime before making changes on the production instances. | |
| Grant role-based access | Grant role-based access to all users of the IBM Analytics Engine instances based on their requirements. For example, only your automation team should have permissions to submit applications because it has access to secrets, and your DevOps team should only be able to see the list of all applications and their states. | |
| Choose the right IBM Cloud Object Storage configuration | | |
| Use private endpoints for the external Hive metastore | If you are using Spark SQL and want to use an external metastore, such as IBM Cloud Databases for PostgreSQL, as your Hive metastore, use the private endpoint for the database connection for better performance and cost savings. | |
| Running applications with resource overcommitment | There is a quota associated with each Analytics Engine Serverless instance. When applications are submitted on an instance, they are allocated resources from the instance quota. If an application requests resources beyond the available quota, it either does not start or runs with fewer resources than requested, which might make the application run slower than expected or, in some cases, fail. Always monitor the current resource consumption on an instance to ensure that your applications run comfortably within the given limits. You can adjust the limits through a support ticket if required. | |
| Static allocation of resources versus autoscaling | When you submit applications, you can specify the number of executors upfront (static allocation) or use the autoscaling option (dynamic allocation). Before you decide between static allocation and autoscaling, you might want to run a few benchmarking tests with different data sets under both options to find the right configuration. | |
| Enable and fine-tune forward logging | | |
| Customize your service instance | | |
| Apply filters when retrieving the list of applications | When you need to retrieve the list of applications, whether in the UI or by using the API or CLI, apply the appropriate filters and retrieve only the set that you need. | |
| Use other services or tools for supporting functions | Apart from using an IBM Log Analysis and an IBM Cloud Object Storage instance, and depending on your use case, you might want to use other supporting tools and services. For instance, you can use Apache Airflow (managed by you) for orchestrating, scheduling, and automating your applications. You can also use IBM Secrets Manager to store the secrets required for your applications and have your automation scripts read the secrets from Secrets Manager before submitting your applications. You can also get creative with your application arguments, passing a token that your application uses to read the required secrets directly from Secrets Manager. | |
| Use instances in alternate regions for backup and disaster recovery | Currently, IBM Analytics Engine Serverless instances can be created in two regions, namely Dallas (us-south) and Frankfurt (eu-de). Although it is advisable to create your instances in the same region where your data is located, it is always useful to create a backup instance in an alternate region with the same set of configurations as your primary instance, in case the primary instance becomes unavailable or unusable. Your automation should enable switching application submissions between the two regions if required. | NA |
| Use separate buckets and service credentials for application files, data files, and the home instance | Use the "separation of concerns" principle to separate access between the different resources. | |
| Applications must run within 72 hours | There is a limit on the number of hours an application or kernel can run. For security and compliance patching, all runtimes that run for more than 72 hours are stopped. If you have a large application, break it into smaller chunks that run within 72 hours. If you are running Spark streaming applications, make sure that you configure checkpoints and have monitoring in place to restart your applications if they are stopped. | |
| Start and stop the Spark history server only when needed | Always stop the Spark history server when you no longer need it. Keep in mind that the Spark history server consumes CPU and memory resources continuously while it is started. | |
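The static-versus-autoscaling choice above can be sketched as two submission payloads. This is a minimal sketch, assuming the `application_details`/`conf` payload shape of the Analytics Engine Spark application API (verify the exact field names against the current API reference); the Spark properties themselves (`spark.executor.instances`, `spark.dynamicAllocation.*`) are standard Spark configuration.

```python
def spark_submit_payload(app_path, static_executors=None,
                         min_executors=1, max_executors=8):
    """Build a submission payload with either static or dynamic allocation.

    The payload shape is an assumption based on the Analytics Engine
    Spark application API; check it against the current API reference.
    """
    conf = {}
    if static_executors is not None:
        # Static allocation: a fixed executor count for the whole run.
        conf["spark.executor.instances"] = str(static_executors)
    else:
        # Autoscaling: standard Spark dynamic-allocation properties.
        conf["spark.dynamicAllocation.enabled"] = "true"
        conf["spark.dynamicAllocation.minExecutors"] = str(min_executors)
        conf["spark.dynamicAllocation.maxExecutors"] = str(max_executors)
    return {"application_details": {"application": app_path, "conf": conf}}

# Benchmark the same application both ways before settling on one.
static_payload = spark_submit_payload("cos://mybucket.mycos/etl.py",
                                      static_executors=4)
autoscale_payload = spark_submit_payload("cos://mybucket.mycos/etl.py",
                                         max_executors=10)
```

Submitting the same workload with both payloads and comparing runtimes against cost is one way to run the benchmarking tests recommended in the table.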
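The advice to apply filters when listing applications can be sketched as a filtered API request. The endpoint path and the `state` and `limit` query parameters below are assumptions modeled on the Analytics Engine Serverless REST API; verify both, and the region-specific host, against the current API reference before use.

```python
from urllib.parse import urlencode

# Region-specific endpoint; Dallas shown here as an assumption,
# adjust for the region of your instance.
API_BASE = "https://api.us-south.ae.cloud.ibm.com"

def list_applications_url(instance_id, states=None, limit=25):
    """Build the GET URL for the application list, filtered server-side."""
    params = {"limit": limit}
    if states:
        # Ask the service for matching states only, instead of fetching
        # every application and filtering on the client.
        params["state"] = ",".join(states)
    return (f"{API_BASE}/v3/analytics_engines/{instance_id}"
            f"/spark_applications?{urlencode(params)}")

url = list_applications_url("my-instance-guid", states=["running", "failed"])
```

Requesting only the states you care about keeps responses small and avoids paging through the full application history on busy instances.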