Provisioning a Spark engine
IBM® watsonx.data allows you to add Spark engines. You can either provision a native Spark engine or register an external Spark engine. A native Spark engine is a compute engine that resides within IBM® watsonx.data. An external Spark engine is an engine that exists in an environment other than the one where watsonx.data is available.
Support for the Spark 3.3 runtime is deprecated, and the default version will change to the Spark 3.4 runtime. To ensure a seamless experience and to use the latest features and improvements, switch to Spark 3.4.
To add a Spark engine, complete the following steps.
- Log in to the watsonx.data console.
- From the navigation menu, select Infrastructure manager.
- To add a Spark engine, click Add component and click Next.
- In the Add component page, from the Engines section, select IBM Spark.
- In the Add component - IBM Spark page, configure the following details:
a. In the Add component - IBM Spark window, enter the Display name for your Spark engine.
b. Choose the Registration mode. Based on your requirement, you can select one of the following options:
- Create a native Spark engine: To provision a native Spark engine.
- Register an external Spark engine: To register an external Spark engine.
c. If you choose Create a native Spark engine, configure the following details:
Provisioning Spark engine (Field: Description)
- Default Spark version: Select the Spark runtime version that must be considered for processing the applications.
- Engine home bucket: Select the registered Cloud Object Storage bucket from the list to store the Spark events and logs that are generated while running Spark applications.
  Note: Make sure you do not select the IBM-managed bucket as Spark Engine home. If you select an IBM-managed bucket, you cannot access it to view the logs. For more information, see Before you begin.
- Reserve capacity: Select the Node Type, and enter the number of nodes in the No of nodes field.
- Associated catalogs (optional): Select the catalogs that must be associated with the engine.
Note: Provisioning time of the native Spark engine varies depending on the number and type of nodes that you add to the engine.
d. If you choose Register an external Spark engine, configure the following details:
Registering IBM Analytics Engine (Spark) (Field: Description)
- Display name: Enter your compute engine name.
- Instance API endpoint: Enter the IBM Analytics Engine instance endpoint. For more information, see Retrieving service endpoints.
- API key: Enter the API key.
- Click Create. The engine is provisioned and is displayed in the Infrastructure manager page.
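The field requirements in the preceding steps can be summarized as a pre-submission check. The following Python code is a minimal sketch and is not part of watsonx.data: the `SparkEngineForm` class, the `validate` function, and the `"native"` and `"external"` mode labels are hypothetical names introduced for illustration. It also assumes that every field not marked optional is required and that the number of nodes must be at least 1, neither of which is stated explicitly in this topic.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Runtime that this topic describes as deprecated.
DEPRECATED_SPARK_VERSIONS = {"3.3"}


@dataclass
class SparkEngineForm:
    """Hypothetical mirror of the Add component - IBM Spark form."""
    display_name: str
    registration_mode: str  # assumed labels: "native" or "external"
    # Fields for "Create a native Spark engine"
    default_spark_version: Optional[str] = None
    engine_home_bucket: Optional[str] = None
    node_type: Optional[str] = None
    number_of_nodes: Optional[int] = None
    associated_catalogs: List[str] = field(default_factory=list)  # optional
    # Fields for "Register an external Spark engine"
    instance_api_endpoint: Optional[str] = None
    api_key: Optional[str] = None


def validate(form: SparkEngineForm) -> List[str]:
    """Return a list of messages; an empty list means nothing was flagged."""
    messages: List[str] = []
    if not form.display_name.strip():
        messages.append("Display name is required.")

    if form.registration_mode == "native":
        # Assumption: fields not marked optional are treated as required.
        required = {
            "Default Spark version": form.default_spark_version,
            "Engine home bucket": form.engine_home_bucket,
            "Node Type": form.node_type,
        }
        for label, value in required.items():
            if not value:
                messages.append(f"{label} is required.")
        # Assumption: an engine needs at least one node.
        if form.number_of_nodes is None or form.number_of_nodes < 1:
            messages.append("No of nodes must be at least 1.")
        if form.default_spark_version in DEPRECATED_SPARK_VERSIONS:
            messages.append(
                f"Warning: Spark {form.default_spark_version} runtime is deprecated."
            )
    elif form.registration_mode == "external":
        if not form.instance_api_endpoint:
            messages.append("Instance API endpoint is required.")
        if not form.api_key:
            messages.append("API key is required.")
    else:
        messages.append(f"Unknown registration mode: {form.registration_mode!r}")
    return messages
```

For example, calling `validate` on a form with `registration_mode="external"` and only a display name set returns one message for the missing Instance API endpoint and one for the missing API key. The sketch cannot check whether the selected Engine home bucket is IBM-managed, so that condition still needs to be confirmed manually.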
Related API
For information about the related APIs, see Create Spark engine, Pause engine, Resume engine, Scale Spark engine, and List Spark version.