Working with watsonx.ai Notebooks

Applies to: Spark engine, Gluten accelerated Spark engine

IBM® watsonx.data integrates with watsonx.ai to provide a web-based Jupyter Notebook experience. You can use the watsonx.ai interface to write your own code in a Jupyter Notebook and run it with watsonx.data Spark as the runtime environment.

For more information about Jupyter Notebook, see Notebooks.

Prerequisites

Subscription to watsonx.ai on IBM Cloud.

Procedure

To run a Jupyter Notebook on your watsonx.data Spark engine, do the following:

Create a watsonx.ai project. For instructions, see Creating a project.
Create a Spark engine environment. To run a Jupyter Notebook, you must create a runtime environment template.

To do that, open the watsonx.ai project in the UI, go to the Manage tab, and create a template. For more information, see Creating environment templates. When you create the template, set Type to Spark and, from the Spark engine list, select the native Spark engine that you provisioned in the watsonx.data instance.
Create a Jupyter Notebook asset and access it from the Jupyter Notebook editor tool. To create a notebook file in the notebook editor, see Creating a notebook file in the notebook editor.

When you create the notebook, specify the runtime environment that you created for the watsonx.data Spark engine.

The notebook opens in edit mode. You can start working on it. For more information, see Creating a notebook file in the notebook editor.

Accessing the watsonx.data catalog

Add the following code snippet to a notebook cell and run it. The snippet sets the configuration that is required to connect to the associated watsonx.data catalog.

# Capture the current Spark configuration, then stop the session so it can be rebuilt
conf = spark.sparkContext.getConf()
spark.stop()

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_utc_timestamp
import base64
import getpass

# Prompt for the credentials; getpass hides the input as it is typed
wxd_username = getpass.getpass("Please enter your username with hms access:").strip()
wxd_hms_username = "ibmlhapikey_" + wxd_username
wxd_hms_password = getpass.getpass("Please enter your api key with hms access:").strip()

# Encode "<username>:<api key>" as a Basic authorization value
string_to_encode = wxd_hms_username + ":" + wxd_hms_password
wxd_encoded_apikey = "Basic " + base64.b64encode(string_to_encode.encode("utf-8")).decode("utf-8")

# Add the metastore credentials and the encoded API key to the configuration
conf.setAll([
    ("spark.hive.metastore.client.plain.username", wxd_hms_username),
    ("spark.hive.metastore.client.plain.password", wxd_hms_password),
    ("spark.hadoop.wxd.apikey", wxd_encoded_apikey),
])

# Re-create the session with the updated configuration and Hive support
spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()

You can step through the notebook one cell at a time by pressing Shift+Enter, or you can run the entire notebook. The snippet prompts for a username and an API key. The username is the IBM Cloud ID of the user whose API key is used to access the data bucket. The API key is the API key of the user who accesses the object storage. To generate an API key, log in to the watsonx.data console, go to Profile > Profile and Settings > API Keys, and generate a new API key.
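The credential handling in the snippet can be isolated into a small helper, which makes the encoding step easier to inspect without a running Spark session. The following is a minimal sketch, not part of any watsonx.data API: the function name build_hms_conf is hypothetical, while the configuration keys and the ibmlhapikey_ prefix are copied from the snippet.

```python
import base64


def build_hms_conf(username, api_key):
    """Return the (key, value) pairs that the snippet passes to conf.setAll().

    Hypothetical helper for illustration; it only mirrors the string
    handling shown in the snippet.
    """
    hms_username = "ibmlhapikey_" + username.strip()
    hms_password = api_key.strip()
    # Base64-encode "<username>:<api key>" and prefix it with "Basic "
    raw = f"{hms_username}:{hms_password}".encode("utf-8")
    encoded = "Basic " + base64.b64encode(raw).decode("utf-8")
    return [
        ("spark.hive.metastore.client.plain.username", hms_username),
        ("spark.hive.metastore.client.plain.password", hms_password),
        ("spark.hadoop.wxd.apikey", encoded),
    ]
```

In a notebook, the returned list could be passed directly to conf.setAll() in place of the inline list, with the username and API key still collected through getpass.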

You can add more code snippets based on your use case and continue. For more information, see Creating a notebook file in the notebook editor.
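As one illustration of a follow-up cell, the sketch below previews a table through the reconfigured session. It is an assumption-laden example rather than a documented procedure: the helper name preview_table is hypothetical, and the catalog, schema, and table names are placeholders that you would replace with objects that exist in your watsonx.data instance.

```python
def preview_table(spark, catalog, schema, table, limit=5):
    """Run a LIMIT query against catalog.schema.table and return the result.

    Hypothetical helper; `spark` is the SparkSession created earlier
    in the notebook.
    """
    qualified_name = f"{catalog}.{schema}.{table}"
    # int() guards against a non-numeric limit ending up in the SQL text
    query = f"SELECT * FROM {qualified_name} LIMIT {int(limit)}"
    return spark.sql(query)


# Example call in a notebook cell (placeholder names, uncomment to use):
# preview_table(spark, "my_catalog", "my_schema", "my_table").show()
```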