FAQs

This is a collection of frequently asked questions (FAQs) about the IBM® watsonx.data service.

General

What is IBM® watsonx.data?

IBM® watsonx.data is the hybrid, open data lakehouse to power AI and analytics with all your data, anywhere. It is a data management solution for collecting, storing, querying, and analyzing all your enterprise data (structured, semi-structured, and unstructured) with a single unified data platform. It provides a flexible and reliable platform that is optimized to work on open data formats.

What can I do with IBM® watsonx.data?

You can use IBM® watsonx.data to collect, store, query, and analyze all your enterprise data with a single unified data platform. You can connect to data in multiple locations and get started in minutes with built-in governance, security, and automation. You can use multiple query engines to run analytics and AI workloads, reducing your data warehouse costs by up to 50%.

Which data formats are supported in IBM® watsonx.data?

The following data formats are supported in IBM® watsonx.data:

Ingestion: Data ingestion in IBM® watsonx.data supports CSV and Parquet data file formats.
Create table from file: Create table from file in IBM® watsonx.data supports CSV, Parquet, JSON, and TXT data file formats.
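
As a rough illustration of the two lists above, the following Python sketch checks a file name against the formats that each path accepts. The function name, the dictionary keys, and the extension-based detection are assumptions made for illustration only; watsonx.data itself might identify formats differently.

```python
from pathlib import Path

# Formats listed in this FAQ for each load path (keys are illustrative labels).
SUPPORTED_FORMATS = {
    "ingestion": {"csv", "parquet"},
    "create_table_from_file": {"csv", "parquet", "json", "txt"},
}


def is_supported(file_name: str, operation: str) -> bool:
    """Return True if the file extension is listed for the given operation."""
    extension = Path(file_name).suffix.lower().lstrip(".")
    return extension in SUPPORTED_FORMATS[operation]


if __name__ == "__main__":
    # A JSON file is listed for "Create table from file" but not for ingestion.
    print(is_supported("orders.json", "ingestion"))
    print(is_supported("orders.json", "create_table_from_file"))
```

A check like this can be run locally before you upload files, so that an unsupported file is caught early.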

What are the key features of IBM® watsonx.data?

The key features of IBM® watsonx.data are:

An architecture that fully separates compute, metadata, and storage to offer ultimate flexibility.
Multiple engines such as Presto (Java), Presto (C++), and Spark that provide fast, reliable, and efficient processing of big data at scale.
Open formats for analytic data sets, allowing different engines to access and share the data at the same time.
Data sharing between watsonx.data, Db2® Warehouse, and Netezza Performance Server or any other data management solution through common Iceberg table format support, connectors, and a shareable metadata store.
Built-in governance that is compatible with existing solutions, including IBM Knowledge Catalog.
Cost-effective, simple object storage that is available across hybrid-cloud and multicloud environments.
Integration with a robust ecosystem of IBM’s best-in-class solutions and third-party services to enable easy development and deployment of key use cases.

What is the maximum size of the default IBM managed storage?

The default IBM-managed storage has a maximum size of 10 GB.

Presto

What is Presto?

Presto is a distributed SQL query engine, with the capability to query vast data sets located in different data sources, thus solving data problems at scale.
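
To make the idea of querying different data sources concrete, here is a minimal Python sketch that composes one SQL statement joining tables from two catalogs. Presto addresses tables as catalog.schema.table; the catalog, schema, table, and column names below are hypothetical, and the helper functions are illustrative rather than part of any watsonx.data API.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Build a Presto three-part table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_join_query(left: str, right: str, key: str) -> str:
    """Compose a SQL join across two fully qualified tables."""
    return (
        f"SELECT l.*, r.* FROM {left} AS l "
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )


# Hypothetical catalogs: one backed by object storage, one by a database.
orders = qualified("iceberg_data", "sales", "orders")
customers = qualified("postgres_db", "public", "customers")
query = build_join_query(orders, customers, "customer_id")
print(query)
```

The resulting statement can be pasted into a SQL editor once the names are replaced with catalogs that exist in your instance. The sketch does no identifier quoting or validation.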

What are the Presto server types?

A Presto installation includes three server types: coordinator, worker, and resource manager.

What SQL statements are supported in IBM watsonx.data?

For information on supported SQL statements, see Supported SQL statements.

Metastore

What is HMS (Hive Metastore)?

Hive Metastore (HMS) is a service that stores metadata that is related to Presto and other services in a backend Relational Database Management System (RDBMS) or Hadoop Distributed File System (HDFS).

Installation and setup

How can I provision an IBM® watsonx.data service instance?

To provision an instance, see Getting started with watsonx.data.

How can I delete my IBM® watsonx.data instance?

To delete an instance, see Deleting watsonx.data instance.

How can I access the IBM® watsonx.data web console?

To access the IBM® watsonx.data web console, log in to your IBM Cloud account and follow the steps in Open the web console in Getting started with watsonx.data.

How can I provision an engine?

From the IBM® watsonx.data web console, go to Infrastructure manager to provision an engine. For more information, see Provisioning an Engine.

How can I configure a catalog or metastore?

To configure a catalog with an engine, see Associating a catalog with an engine.

How can I configure a storage?

From the IBM® watsonx.data web console, go to Infrastructure manager to configure a storage. For more information, see Adding a storage-catalog pair.

Access

How can I manage IAM access for IBM® watsonx.data?

IBM Cloud® Identity and Access Management (IAM) controls access to IBM® watsonx.data service instances for users in your account. Every user that accesses the IBM® watsonx.data service in your account must be assigned an access policy with an IAM role. For more information, see Managing IAM access for watsonx.data.

How can I add or remove users?

To add or remove users in a component, see Managing user access.

How is the access control for users provided?

To provide access control for users to restrict unauthorized access, see Managing data policy rules.

What is the process to assign access to a user?

To assign access to a user, see Managing roles and privileges.

What is the process to assign access to a group?

To assign access to a group, see Managing roles and privileges.

Presto Engine

How can I create an engine?

To create an engine, see Provisioning an Engine.

How can I pause and resume an engine?

To pause an engine, see Pause an Engine.

To resume a paused engine, see Resume an Engine.

How can I delete an engine?

To delete an engine, see Deleting an engine.

How can I run SQL queries?

You can use the Query workspace interface in IBM® watsonx.data to run SQL queries and scripts against your data. For more information, see Running SQL queries.

Databases and Connectors

How can I add a database?

To add a database, see Adding a database-catalog pair.

How can I remove a database?

To remove a database, see Deleting a database-catalog pair.

What data sources does IBM® watsonx.data currently support?

IBM® watsonx.data currently supports the following data sources:

IBM Db2
IBM Netezza
Apache Kafka
MongoDB
MySQL
PostgreSQL
SQL Server
Custom
Teradata
SAP HANA
Elasticsearch
SingleStore
Snowflake
IBM Data Virtualization Manager for z/OS

How can I load data into IBM® watsonx.data?

There are three ways to load data into IBM® watsonx.data:

Web console: You can use the Ingestion jobs tab from the Data manager page to securely and easily load data into the IBM® watsonx.data console. For more information, see Ingesting data by using Spark.
Command-Line Interface: You can load data into IBM® watsonx.data through CLI. For more information, see Loading or ingesting data through CLI.
Creating tables: You can load or ingest local data files to create tables by using the Create table option. For more information, see Creating tables.

How can I create tables?

You can create a table from the Data manager page in the web console. For more information, see Creating tables.

How can I create a schema?

You can create a schema from the Data manager page in the web console. For more information, see Creating schema.

How can I query the loaded data?

You can use the Query workspace interface in IBM® watsonx.data to run SQL queries and scripts against your data. For more information, see Running SQL queries.

Ingestion

What are the storage options available?

The storage options available are IBM Storage Ceph, IBM Cloud Object Storage (COS), AWS S3, and MinIO object storage.

What type of data files can be ingested?

Only Parquet and CSV data files can be ingested.

Can a folder of multiple files be ingested together?

Yes, a folder of multiple data files can be ingested. An S3 folder must be created with the data files in it for ingestion. The source folder must contain either all Parquet files or all CSV files. For detailed information on S3 folder creation, see Preparing for ingesting data.
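
The single-format rule for a source folder can be checked before an ingestion job is started. The following Python sketch is one way to do that on a list of object keys; the function name and the extension-based detection are assumptions for illustration, not the check that watsonx.data performs.

```python
from pathlib import PurePosixPath
from typing import Iterable

# File formats that this FAQ lists as ingestible.
INGESTIBLE = {"csv", "parquet"}


def folder_format(object_keys: Iterable[str]) -> str:
    """Return the single format shared by all files, or raise ValueError."""
    formats = {
        PurePosixPath(key).suffix.lower().lstrip(".") for key in object_keys
    }
    if len(formats) != 1:
        # Covers both an empty folder and a folder that mixes formats.
        raise ValueError(f"expected one file format, found: {sorted(formats)}")
    (only,) = formats
    if only not in INGESTIBLE:
        raise ValueError(f"unsupported format: {only!r}")
    return only


if __name__ == "__main__":
    print(folder_format(["sales/2023.csv", "sales/2024.csv"]))
```

A folder that mixes CSV and Parquet files, or that contains any other file type, raises ValueError in this sketch.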

What commands are supported in the command-line interface during ingestion?

For commands supported in the command-line interface during ingestion, see Loading or ingesting data through CLI.

Pricing plans

Where can I learn more about each pricing plan?

watsonx.data as a service offers the following pricing plans:

Lite plan: It provides a free usage limit of 500 Resource Units (monitored on the Billing and usage page of IBM Cloud) within a time frame of 30 days. The cap value is displayed on the IBM Cloud catalog provisioning page and is reflected on your billing page within your watsonx.data instance upon provisioning.
Enterprise plan: You pay by the hour for each infrastructure resource that you add. Start with support services, then build the engines and services that you want. The hourly rate is computed in Resource Units and maps to your payment method, whether ‘Pay as You Go’ or ‘Subscription’.

For more information, see Subscription plans.

Lite plan

Is the lite plan credit card free?

Yes, if you use an IBM Cloud trial account, the lite plan is credit card free. To try the product, you have a free usage limit of 500 Resource Units or a time frame of 30 days, whichever ends first. For more information, see Subscription plans.

What's included in the lite plan?

The lite plan lets you try the basic features of watsonx.data and is available to all IBM Cloud account types, such as trial, pay-as-you-go, and subscription. It is not available on AWS and is limited to one watsonx.data instance per IBM Cloud account (cross-regional).

Key supported features:

Ability to pause and resume a Presto engine.
Ability to connect to an IBM Cloud-provided Cloud Object Storage (COS) and provide credentials to your own COS or S3 storage.
Ability to delete Presto, Milvus, and connections to your own storage.

Limitations:

It is limited to provisioning a single instance per resource group.
It is limited to 500 resource units (RUs) before the instance is suspended. The cap value is displayed on the IBM Cloud catalog provisioning page and is reflected on your billing page within your watsonx.data instance upon provisioning. Your license expires when you either reach the cap limit of 500 RUs or exceed the trial period of 30 days.
It is limited to a maximum of one Presto engine, one Milvus service with starter size (1.25 RUs per hour), or both.
It is limited to the smallest node sizes and profiles for each engine and service. You cannot increase the node size.
The lite instances cannot be used for production purposes.
The lite instances might be removed at any time and are unrecoverable (no BCDR).
Engine scaling functions are not available.
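
The two lite plan limits (500 RUs and 30 days) interact, and a quick calculation shows which one is reached first. The Python sketch below assumes that a component consumes RUs only while it runs, at the starter rate of 1.25 RUs per hour, and that a Presto engine and a Milvus service running together each consume that rate. Both assumptions are made for illustration; check your billing page for actual consumption.

```python
LITE_CAP_RU = 500           # resource unit cap stated for the lite plan
TRIAL_DAYS = 30             # trial period stated for the lite plan
STARTER_RU_PER_HOUR = 1.25  # starter size rate stated in the limitations


def hours_until_limit(ru_per_hour: float) -> float:
    """Hours of continuous running before the RU cap or the trial ends."""
    hours_to_cap = LITE_CAP_RU / ru_per_hour
    hours_in_trial = TRIAL_DAYS * 24
    return min(hours_to_cap, hours_in_trial)


# One starter component, then two starter components (assumed 2 x 1.25 RUs).
one_component = hours_until_limit(STARTER_RU_PER_HOUR)
two_components = hours_until_limit(2 * STARTER_RU_PER_HOUR)
print(one_component, two_components)  # prints: 400.0 200.0
```

Under these assumptions, a single starter component that runs continuously reaches the 500 RU cap after 400 hours (about 16.7 days), well before the 30-day period ends. Pausing the engine when it is not in use stretches the RUs further.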

What is the limit for using the lite plan?

A lite plan instance of watsonx.data is typically a trial instance that is free to use, with limits on capacity (500 Resource Units), features, and duration (30 days). You can use it to explore and familiarize yourself with watsonx.data. To access all the features and functions, you need to create a paid IBM Cloud account (either 'Pay as you go' or 'Subscription') and then provision an enterprise plan instance.

I have exhausted all my resource units. How do I delete my lite plan instance?

You can delete the lite plan instance from the resource group. If you do not delete it, IBM Cloud resource collection removes it after a period of 40 days.

The lite plan has ended. How do I upgrade to the enterprise plan?

Either before or after your lite plan has concluded, you can create a paid IBM Cloud account (either 'Subscription' or 'Pay as you go'). With a paid account, you can create a new watsonx.data enterprise plan instance and use a Cloud Object Store bucket that you own to store data. The enterprise plan is available on IBM Cloud and AWS environments. For more information, see How to create instance for watsonx.data enterprise plan and How to use a Cloud Object Store bucket that you own to store data.

How do I save data from a lite plan to an enterprise plan?

You can create an IBM Cloud Object Store (COS) bucket that you own, connect it to your lite plan instance of watsonx.data, and write data to that bucket. Then, after you create a paid IBM Cloud account (either 'Pay as you go' or 'Subscription'), you can create an enterprise instance of watsonx.data and connect it to the same COS bucket to keep working with the same data files.

Enterprise plan

What is included in the enterprise plan?

In addition to the lite plan features, the enterprise plan includes the following:

You pay by the hour for each infrastructure resource that you add. Start with support services, then build the engines and services that you want. The hourly rate is computed in Resource Units and maps to your payment method, whether ‘Pay as You Go’ or ‘Subscription’.
Presto and external Spark engine and Milvus service.
Hive metastore and Iceberg catalog.
Infrastructure manager and query editor.
Db2 Warehouse and Netezza integration.
Ability to scale (increase and decrease) node sizes for Presto engines.
Available on both IBM Cloud and AWS environments.

What are the different payment plans under the enterprise plan?

The payment plans under the enterprise plan are ‘Subscription’ and ‘Pay as you go’.

Is the cost for services like Milvus included in the enterprise plan?

Yes, the Milvus service is included in the enterprise plan.