Quick start watsonx.data console
When you log in to the IBM® watsonx.data web console for the first time, you are presented with the quick start wizard. In this tutorial, you learn how to use the quick start wizard to configure the core components and get started with watsonx.data in a few minutes.
The wizard guides you through the initial configuration process for the infrastructure components of watsonx.data.
Selecting your primary goal
- The Welcome to IBM watsonx.data page is the entry point for setting up your watsonx.data instance. It introduces the core value of watsonx.data and helps you choose an initial configuration based on your data management needs.
  Available Goals
  - Optimize performance for cost‑efficient data processing (with Spark)
    Select this option if your primary objective is to process large datasets efficiently while minimizing infrastructure and compute costs. It uses the Spark engine. This configuration is ideal for:
    - Batch data processing
    - ETL and ELT workloads
    - Data preparation and transformation
    - Cost-sensitive analytics pipelines
  - Run scalable analytics and data processing workloads
    Select this option if you need flexible, high-performance analytics across large datasets. It uses the Presto and Spark engines. By combining Presto for fast, interactive SQL queries and Spark for large-scale processing, this option is well suited to analytical and data processing use cases.
- Click Next. The Configure details page opens.
Configuring engine and storage details
To work with your data, you must configure the engine and storage details. Based on the selected goal, specify the details.
- Optimize performance for cost‑efficient data processing (with Spark): You must set up the Spark engine and storage.
- Run scalable analytics and data processing workloads: You must set up the Spark engine, Presto engine, and storage.
Selected goal: Optimize performance for cost‑efficient data processing
a. In the Setup Spark section, select the Spark version, Spark size, and the number of Spark nodes to deploy. Click Next. The Setup storage section expands.
b. In the Setup storage section, select one of the following options and provide details.
- Use existing pair: Reuse an existing catalog–storage pair. This capability is available only in the Tokyo and Sydney SaaS regions. The option automatically lists all catalog–storage pairs in your account that you have access to, so that you can select an existing pair, avoid duplicate resources, and reduce setup time.
- Discover COS instance: Select an existing IBM COS instance and an attached bucket on your IBM Cloud account. If multiple IBM COS instances and buckets are detected, select the IBM COS instance that contains the bucket that you want to register with watsonx.data. In the Catalog name field, specify the name of the catalog to associate with the storage.
If you select the Discover COS instance option and your instance is provisioned in the Tokyo or Sydney SaaS region, you must explicitly enter the catalog name to support this account‑scoped behavior. To learn more about account‑scoped behavior, see Component scoping at account level.
- Register my own: You can use any existing IBM COS bucket from an existing instance or provision a new instance. To provision a new IBM COS instance, provide the following details:
Add bucket

| Field | Description |
| --- | --- |
| Bucket Type | Select from Amazon S3, IBM Storage Ceph, or IBM Cloud Object Storage. |
| Region | The region where the data bucket is available. |
| Bucket Name | Enter your bucket name. |
| Display name | Enter the bucket name to be displayed on-screen. |
| Endpoint | Enter the endpoint URL. |
| Access key | Enter your access key. |
| Secret key | Enter your secret key. |
| Connection status | Click the Test connection link to test whether the bucket connection with watsonx.data is successful or not. The system displays the status message. |
If you select an existing IBM COS bucket, the default size is 10 GB. It is meant for exploratory purposes and cannot be used to store production or sensitive data. The watsonx.data instance administrators can disable this bucket for compliance reasons.
When you register your own bucket, ensure that you provide the correct bucket configuration details. The quick start wizard does not validate the bucket configuration details, and you cannot modify them later.
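Because the wizard does not validate these details and they cannot be modified later, it can help to sanity-check the values before you type them in. The following is a minimal sketch and not part of watsonx.data: `BucketConfig` and `find_problems` are hypothetical names, the checks (non-empty fields, no stray whitespace, a listed bucket type, an http(s) endpoint) are assumptions about common typing mistakes, and all values in the example are placeholders. The sketch does not contact the bucket, so the Test connection link is still the way to check connectivity.

```python
from dataclasses import dataclass, fields
from urllib.parse import urlparse

# Bucket types listed in the Add bucket table.
ALLOWED_BUCKET_TYPES = {"Amazon S3", "IBM Storage Ceph", "IBM Cloud Object Storage"}


@dataclass
class BucketConfig:
    """Hypothetical container mirroring the fields of the Add bucket table."""
    bucket_type: str
    region: str
    bucket_name: str
    display_name: str
    endpoint: str
    access_key: str
    secret_key: str


def find_problems(cfg: BucketConfig) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    # Assumed check: every field is filled in and has no stray whitespace.
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        if not value.strip():
            problems.append(f"{f.name} is empty")
        elif value != value.strip():
            problems.append(f"{f.name} has leading or trailing whitespace")
    # The bucket type must be one of the three types named in the table.
    if cfg.bucket_type.strip() and cfg.bucket_type not in ALLOWED_BUCKET_TYPES:
        problems.append(f"bucket_type {cfg.bucket_type!r} is not one of the listed types")
    # Assumed check: the endpoint is a full URL with a scheme and a host.
    endpoint = cfg.endpoint.strip()
    parsed = urlparse(endpoint)
    if endpoint and (parsed.scheme not in ("http", "https") or not parsed.netloc):
        problems.append("endpoint is not an http(s) URL with a host")
    return problems


# Placeholder values only; the endpoint deliberately lacks a scheme.
example = BucketConfig(
    bucket_type="IBM Cloud Object Storage",
    region="us-south",
    bucket_name="my-bucket",
    display_name="my-bucket",
    endpoint="s3.us-south.example.com",
    access_key="EXAMPLE_ACCESS_KEY",
    secret_key="EXAMPLE_SECRET_KEY",
)
print(find_problems(example))
```

An empty list only means that no obvious typing mistake was found. It does not confirm that the credentials or the endpoint are valid for the bucket.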
Selected goal: Run scalable analytics and data processing workloads
a. In the Setup Presto section, select the Presto engine type. Click Next. The Setup storage section expands.
b. For information about setting up Spark and storage details, see Optimize performance for cost‑efficient data processing (with Spark).
Finishing the setup
- Click Finish and go. The Preparing your journey page opens with a progress bar.
When the setup is complete, the watsonx.data home page appears. For information about the home page, see Getting started with the web console. Resource unit consumption begins soon after the quick start wizard creates the support services. You can view the run rate that is submitted for billing on the Billing and usage tab. For more information, see Billing and usage.
Enabling query monitoring (Optional)
If you selected the Run scalable analytics and data processing workloads goal, the watsonx.data home page displays an information message about enabling query monitoring. Click the Query monitoring link in the information message and do the following:
- Use the toggle switch to enable (or disable) the query monitoring feature.
The associated catalog that appears with the query monitoring bucket is of type Hive.
- If you enable the query monitoring feature, you must configure the storage details for storing Query History Monitoring and Management (QHMM) data. Select one of the following options and provide details.
Starting with version 2.3.1, you can complete the watsonx.data quick start setup by reusing an existing catalog–storage pair. This capability is available only in the Tokyo and Sydney SaaS regions. In these regions, the Use existing pair option automatically lists all catalog–storage pairs in your account that you have access to, so that you can select an existing pair, avoid duplicate resources, and reduce setup time. If you select the Discover COS instance option, you must explicitly enter the catalog name to support this account‑scoped behavior. To learn more about account‑scoped behavior, see Component scoping at account level.
- Discover COS instance: Select an existing IBM COS instance and an attached bucket on your IBM Cloud account. If multiple IBM COS instances and buckets are detected, select the IBM COS instance that contains the bucket that you want to register with watsonx.data. In the Catalog name field, specify the name of the catalog to associate with the storage.
- Register my own: You can register an existing bucket as a QHMM bucket. Only the following bucket types can be registered as a QHMM bucket: Amazon S3, IBM Storage Ceph, or IBM Cloud Object Storage. To register an existing bucket as a QHMM bucket, provide the following details:
Add bucket

| Field | Description |
| --- | --- |
| Bucket Type | Select from Amazon S3, IBM Storage Ceph, or IBM Cloud Object Storage. |
| Region | The region where the data bucket is available. |
| Bucket Name | Enter your bucket name. |
| Display name | Enter the bucket name to be displayed on-screen. |
| Endpoint | Enter the endpoint URL. |
| Access key | Enter your access key. |
| Secret key | Enter your secret key. |
| Connection status | Click the Test connection link to test whether the bucket connection with watsonx.data is successful or not. The system displays the status message. |
You can change the storage (default or bring your own bucket, BYOB) at a later point from the watsonx.data console. See Query monitoring.
- Click Next. The Infrastructure manager opens.
Next steps
You are all set to use watsonx.data, or you can configure it further.