Release notes for watsonx.data

Use these release notes to learn about the latest updates to IBM® watsonx.data that are grouped by date.

10 April 2025 - Version 2.1.2 Hotfix 1

Engine and service enhancements

This release of watsonx.data introduces the following service enhancement:

Introduced Tiny Milvus, a lightweight, single-node deployment of the Milvus vector database, which is tailored for experimentation and early-stage development.

Tiny Milvus provides the core Milvus experience and is designed specifically for use within the watsonx.ai platform. It serves as an entry point for vector-based AI exploration with minimal resource requirements to help ensure effective data management and analysis. It is distinct from other Milvus configurations available within watsonx.data, which support broader scalability and enterprise-grade features.

Tiny Milvus supports up to 10K vectors, making it suitable for quick trials and early experimentation without heavy infrastructure. It is not intended for production workloads.
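
As a rough illustration of the 10K limit, the following Python sketch checks whether a planned batch of vectors would still fit. The helper names and the idea of checking on the client side are assumptions made for illustration; they are not part of the Milvus or watsonx.data APIs.

```python
# Capacity stated in these release notes for Tiny Milvus.
TINY_MILVUS_MAX_VECTORS = 10_000


def remaining_capacity(current_count: int, limit: int = TINY_MILVUS_MAX_VECTORS) -> int:
    """Return how many more vectors fit under the limit (never negative)."""
    if current_count < 0:
        raise ValueError("current_count must be non-negative")
    return max(limit - current_count, 0)


def fits_in_tiny_milvus(current_count: int, batch_size: int) -> bool:
    """Hypothetical client-side check: does the batch fit in the remaining capacity?"""
    return 0 <= batch_size <= remaining_capacity(current_count)
```

A check like this can run before an insert so that an experiment fails early instead of part-way through loading data.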

For more information about using Tiny Milvus, see Setting up a watsonx.data Milvus vector store.

04 April 2025 - Version 2.1.2

watsonx.data version 2.1.2 is being released to geographic regions in stages and is not yet available in all regions. To find out whether the 2.1.2 release is available in your region, contact IBM Support. If you are currently using watsonx.data version 2.1.1, refer to the watsonx.data 2.1.1 documentation.

Data sources and storage enhancements

This release of watsonx.data includes the following storage enhancement:

You can now connect to IBM Db2 for i data sources. For information about IBM Db2 for i, see IBM Db2 for i.

Connectivity enhancements

This release of watsonx.data includes the following connectivity enhancement:

You can now securely and privately connect to a watsonx.data instance by using virtual private endpoints. For information about configuring network endpoints in watsonx.data, see Setting up virtual private endpoints.

Integration enhancements

This release of watsonx.data introduces the following enhanced integrations with other services:

You can now define IBM Knowledge Catalog governance policies for the Presto (C++) engine when you integrate with watsonx.data. For information about connecting to IBM Knowledge Catalog (IKC), see Connecting to IBM Knowledge Catalog (IKC).
You can now export configuration files for the target Presto engine, based on your ODBC driver selection (Simba or CData), to more easily establish connections with watsonx.data. This enhancement saves you from manually configuring Presto engine details in PowerBI. For more information about connecting to Presto by using the configuration files, see Connecting to Presto by using the Config files.
Integrating with Data Product Hub: You can integrate watsonx.data with DPH to package SQL tables and queries into data products tailored for specific use cases. For details, see Integrating with Data Product Hub.

Ingestion enhancement

This release of watsonx.data includes the following ingestion enhancement:

Ingestion jobs that use an external Spark engine now provide logs within watsonx.data. This enhancement allows users to identify and troubleshoot job execution issues directly within the watsonx.data on cloud platform (SaaS instance). The details of the ingestion procedure are available in Ingesting data by using Spark through the web console.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancement:

You can now use the Azure Data Lake Storage Gen2 with AccessKey Authmode with Spark engine to store your data while submitting Spark applications. For information about Azure Data Lake Storage Gen2, see Azure Data Lake Storage.

Query workspace enhancements

This release of watsonx.data introduces the following query workspace enhancement:

You now have the option to cancel one or multiple running queries. Additionally, you can remove queries from the worksheet after they are canceled or successfully completed, making it easier to keep your workspace organized. For more information, see Running SQL queries.

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

Administrators can now configure access for IBM Db2 and IBM Netezza. They can assign roles for watsonx.data users to view, edit, and administer the IBM Netezza and IBM Db2 engines. For information about the resource-level permissions, see (Db2 and Netezza).
Administrators can now grant or revoke specific permissions to users or roles when creating and viewing their own schemas. For information about data policy rules, see Managing data policy rules.
DAS proxy flow, which was previously deprecated, is now removed and is no longer available in watsonx.data.

Query History Monitoring and Management (QHMM) enhancements

This release of watsonx.data introduces the following QHMM enhancements:

You can now select the Presto engine that is associated with a QHMM catalog when you configure query monitoring in watsonx.data. For information about configuring QHMM, see Configuring query monitoring.
You can now use the migration script to transfer QHMM data from the source bucket to the destination bucket in watsonx.data. For more information about using the migration script, see QHMM Shell Script usage.

CPDCTL CLI enhancements

This release of watsonx.data introduces the following enhancements to IBM Cloud Pak for Data Command Line Interface (IBM cpdctl):

Starting in version 2.1.2, the wx-data command is available by default. It enables you to perform operations in watsonx.data such as ingesting data and managing engines.
You can use the wx-data engine create and wx-data engine delete commands to provision and delete all available engines in watsonx.data.
You can use the sparkjob command to submit, list, and get the details of a Spark application.
INSTANCE_ID, which was used to set the instance environment, is replaced with WX_DATA_INSTANCE_ID.

For more information, see IBM cpdctl.
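
Scripts that wrap the CLI might need to handle the rename from INSTANCE_ID to WX_DATA_INSTANCE_ID. The following Python sketch shows one possible approach. It assumes that both names are read as environment variables, and the helper name and the fallback to the old name are illustrative choices rather than documented cpdctl behavior.

```python
import os
from typing import Mapping, Optional


def resolve_instance_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the instance ID, preferring the new variable name.

    Assumption: both names are environment variables. The fallback to the
    legacy INSTANCE_ID name is only a migration aid for older scripts.
    """
    env = os.environ if env is None else env
    for name in ("WX_DATA_INSTANCE_ID", "INSTANCE_ID"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise KeyError("Set WX_DATA_INSTANCE_ID to your watsonx.data instance ID")
```

Preferring the new name means that a script keeps working while older environments are updated, and the fallback can be removed after every caller sets WX_DATA_INSTANCE_ID.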

28 February 2025 - Version 2.1.1

New region availability

watsonx.data is now available in the Toronto region for Lite and Enterprise plans. To provision, see Provisioning watsonx.data Lite plan and Provisioning watsonx.data Enterprise plan.

Data sources and storage enhancements

This release of watsonx.data includes the following storage enhancements:

Now, you can test connections for the following data sources and storage:
- Apache Phoenix
- IBM Data Virtualization Manager
- BigQuery
- Google Cloud Storage
You can now register and load external pre-existing Hudi and Delta tables on an object storage by using Register table and load table metadata APIs.

Ingestion enhancement

After an ingestion job is completed, you can now access the ingested data directly from the Ingestion History page, which streamlines your workflow and saves time.

Integration enhancements

This release of watsonx.data introduces the following enhanced integrations with other services:

The Connection Information page now includes:
- Presto configuration details for DBT integration. You can copy the Presto configuration details that are required for DBT integration from this page.
- Option to export the TDS file, which includes the Presto engine configuration details that are required for Tableau integration.

For more information, see Getting connection information.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

You can now create a Spark application from the Applications tab of the Spark engine details page. For more information, see Submitting Spark application from Console.
You can now use Spark version 3.5.4 to run applications in watsonx.data. Apache Spark 3.4.4 and Apache Spark 3.5.4 are the supported versions.
Milvus allows the following:
- In Milvus you can now do a hybrid GroupBy search based on multiple vector columns and also customize the group size when you run search queries. For more information, see Connecting watsonx Assistant to watsonx.data Milvus for custom search.
- Milvus now supports custom size with a capacity of 3 billion vectors with a maximum of 1,024 dimensions.
- Milvus now allows scaling up or down between predefined T-shirt sizes (small, medium, and large) or custom sizes. For more information, see Adding Milvus service.
Starting from watsonx.data 2.1.1 version, Milvus 2.5.0 is supported. For more information, see Milvus.
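
To give a sense of scale for the custom size limits above, the following Python sketch estimates the raw size of the vector data. It assumes that each dimension is stored as a 4-byte float32 value and ignores indexes, metadata, and replication, so it is a lower bound for illustration and not sizing guidance.

```python
MAX_VECTORS = 3_000_000_000   # custom-size capacity stated in these release notes
MAX_DIMENSIONS = 1_024        # maximum dimensions stated in these release notes
BYTES_PER_FLOAT32 = 4         # assumption: vectors are stored as float32


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw size of the vector data only, excluding indexes and metadata."""
    if not 0 <= num_vectors <= MAX_VECTORS:
        raise ValueError("num_vectors is outside the stated custom-size capacity")
    if not 1 <= dimensions <= MAX_DIMENSIONS:
        raise ValueError("dimensions is outside the stated maximum")
    return num_vectors * dimensions * BYTES_PER_FLOAT32
```

At the stated maximums, raw_vector_bytes(MAX_VECTORS, MAX_DIMENSIONS) returns 12,288,000,000,000 bytes, which is about 12.3 TB of raw float32 data.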

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

The Access Management Service (AMS) in watsonx.data can now use JSON Web Token (JWT) authentication for incoming requests from Presto, ensuring secure and efficient access control. For more information, see Connecting to Presto engine through Presto CLI (Remote).
You can now assign users and roles to infrastructure components in batches of twenty. For more information, see Managing user access.
You can now use Apache Ranger Hadoop SQL policies to govern data with Spark engines. You can define Ranger policies when the Spark engine accesses data from Hadoop clusters. Enabling Ranger policy ensures robust data security and governance. With the Ranger policy, you can configure table authorization (L3), row-level filtering, and column masking for data. For more information, see Enabling Apache Ranger policy for resources.

CPDCTL CLI enhancements

IBM CPDCTL CLI is now used to configure and manage different operations in watsonx.data. By using the CPDCTL CLI, you can manage configuration settings, run ingestion jobs, and manage engines, data sources, and storage. The following two plugins are currently used to run these operations:

config - To configure watsonx.data service environment and users.
wx-data - To perform other operations in watsonx.data, such as ingesting data and managing engines. For more information, see IBM cpdctl.

watsonx.data developer edition is now enabled in IBM CPDCTL version v1.6.104 and later.

Deprecated features

The following features are deprecated in this release:

The Data Access Service (DAS) proxy feature is now deprecated and will be removed in a future release. You cannot use the Data Access Service (DAS) proxy feature to access object storage (S3, ADLS, and ABS). If you use the DAS proxy flow and face any issues, contact IBM Support. For an overview of the DAS feature, see Data Access Service (DAS).
The IBM Client package is now deprecated and will be removed in a future release. The utilities and commands in the Client package are replaced with IBM CPDCTL CLI. For more information about how to use IBM CPDCTL CLI, see IBM cpdctl.

04 February 2025 - Version 2.1.0 Hotfix 2

Lite plan enhancement: IBM® watsonx.data Lite plan is now available in the Sydney region. For more information about provisioning a Lite plan instance in the Sydney region, see Provisioning Lite plan.

10 January 2025 - Version 2.1.0 Hotfix 1

Enterprise plan enhancement: If you use IBM Cloud CLI to provision an Enterprise plan instance in the Sydney region, you must use the plan name lakehouse-enterprise-mcsp. For more information, see Provision an instance through CLI.

13 December 2024 - Version 2.1.0

Data sources and storage enhancements

This release includes the following new data sources and storage enhancements:

You can now connect to Apache Phoenix data sources. For more information, see Apache Phoenix.
If you work with MySQL data sources, now you can manage drivers in the Driver manager section of the Configurations page. Each of these drivers goes through a series of validation steps. You can no longer test MySQL connections. For more information, see MySQL.

When you upgrade to version 2.1.0, any existing MySQL catalog is no longer linked to the engine. This means that you need to reestablish the connection between the MySQL catalog and the engine.

Test connection feature is now available for the following data sources supported by Arrow Flight service:
- Apache Derby
- Salesforce
- Greenplum
- MariaDB
You can now test connections for Azure Data Lake Storage (ADLS) and IBM Data Virtualization Manager for z/OS data sources.

Integration enhancements

This release of watsonx.data introduces the following new or enhanced integrations with other services:

You can now enable Databand connection from the Configurations page. For more information, see Monitoring Spark application runs by using Databand.
You can now retrieve the Presto connection information from the watsonx.data instance > Configurations > Connection information page for the following integrations:
- BI tools
- DataBuildTool (dbt)
Starting with watsonx.data version 2.1, you can only integrate with one of the following policy engines:
- Apache Ranger
- IBM Knowledge Catalog (IKC)

For more information, see Connection information.

You can now integrate IBM Manta Data Lineage with watsonx.data to capture and publish jobs, runs, and dataset events from Spark through the Manta UI. For more information, see IBM Manta Data Lineage.
You can now use all of the Presto data types with the dbt adapter for Presto. Specify the data type as column_types in the dbt_project.yml. For more information, see Installing and using dbt-watsonx-presto.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

You can now use the Azure Data Lake Storage Gen2 with AccessKey Authmode and Google Cloud Storage with Presto (C++) engine. You can now use Azure Data Lake Storage (ADLS) and Google Cloud Storage to store your data while submitting Spark applications. For more information, see Azure Data Lake Storage and Google Cloud Storage.
You can now use Google Cloud Storage (GCS) with Data Access Service (DAS) to store your data while submitting Spark applications. For more information, see Submitting Spark application by using native Spark engine.
You can now enable the Spark Access Control extension to access and operate on the Hive and Hudi catalogs. For more information, see Enhancing Spark application submission using Spark access control extension for external Spark and Enhancing Spark application submission using Spark access control extension for native Spark.
You can now select a watsonx.data Spark engine as a runtime environment in watsonx.ai notebooks. This allows you to run Jupyter notebooks on your watsonx.data native Spark engine. For more information, see Working with watsonx.ai Notebooks.
Presto administrators can now configure JMX metrics through the API. Currently, only alphanumeric characters are allowed for the key in JMX property names. For more information, see Update presto engine.

Query history information by using ibm-lh utility

You can get the following Query history information by using ibm-lh utility:

Basic query information.
Basic error information of failed queries.
Query stats information.
Query memory information.
Query garbage collection information.
Top time taken query.
Memory usage details of queries.
Information after joining the two tables.
Information containing all the columns of a table.
Information about the errors in the query.
Count of all error codes.
Count of all failure messages.
Count of all failure types.

For more information, see Retrieving QHMM logs by using ibm-lh utility.

Ingestion enhancements

This release of watsonx.data introduces the following ingestion enhancements:

Target table preview: Before submitting an ingestion job, users can now preview the target table schema and edit the column headers and data types. This allows for validation and ensures data is ingested into the correct table structure. For more information, see Ingesting data by using Spark through the web console.
Java/Spark-based ingestion for table creation: The Data Manager now includes an option to create tables using the Java/Spark-based ingestion flow navigating to Local ingestion, providing flexibility and control based on file size and other factors. For more information, see Creating table and Ingesting data by using Spark through the web console.
Enhanced source storage support:
- Azure Data Lake Storage (ADLS): Support for ingesting data directly from ADLS is now available.
- Google Cloud Storage (GCS): Support for ingesting data directly from GCS is now available.
Transient storage: Users can now select the external bucket to use as a staging area for local ingestions. If no storage is specified, watsonx.data can infer and select an appropriate bucket. For more information, see Ingesting data by using Spark through the web console.

Introduction to Metadata Service (MDS)

Starting from the 2.1 release, watsonx.data uses Metadata Service (MDS) instead of Hive Metastore (HMS). MDS is compatible with modern, open catalog APIs, Unity Catalog API, and Apache Iceberg REST Catalog API, enabling wider tool integration and increased flexibility. This new architecture delivers comparable performance while it continues to support Spark and Presto clients through the existing Thrift or HMS interface. For more information, see Metadata Service (MDS) overview.
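
As an illustration of what compatibility with the Apache Iceberg REST Catalog API can mean for client tools, the following Python sketch builds (but does not send) two requests that the Iceberg REST specification defines: GET /v1/config and GET /v1/{prefix}/namespaces. The base URL is a placeholder, and bearer-token authentication is an assumption; neither is taken from the watsonx.data documentation.

```python
from urllib.parse import quote
from urllib.request import Request

# Placeholder only: the real MDS endpoint for an instance is not given in these notes.
BASE_URL = "https://mds.example.com/iceberg"


def config_request(base_url: str, token: str) -> Request:
    """Build (but do not send) the Iceberg REST `GET /v1/config` request."""
    return Request(
        f"{base_url.rstrip('/')}/v1/config",
        headers={"Authorization": f"Bearer {token}"},  # assumption: bearer-token auth
        method="GET",
    )


def list_namespaces_request(base_url: str, prefix: str, token: str) -> Request:
    """Build the Iceberg REST `GET /v1/{prefix}/namespaces` request."""
    return Request(
        f"{base_url.rstrip('/')}/v1/{quote(prefix, safe='')}/namespaces",
        headers={"Authorization": f"Bearer {token}"},  # assumption: bearer-token auth
        method="GET",
    )
```

Because the paths come from the open specification, any client that already speaks the Iceberg REST protocol should need only the endpoint and credentials for the instance.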

Use MDS in your test environments first, and then move to using it in production.

Deprecated features

The following feature is deprecated in this release:

The REST API feature to capture DDL changes in watsonx.data through the event listener will be deprecated from watsonx.data release version 2.1.

13 November 2024 - Version 2.0.4 Hotfix

Lite plan enhancements

This hotfix release includes the following Lite plan enhancements:

Lite plan now includes a dedicated read-only sample IBM COS storage that is associated with the Presto engine to support querying sample and benchmarking data.
You can now work with tpcds sample worksheets for high performance use cases and Gosales sample worksheet for Data engineering and GenAI use cases.
Query Optimizer is now automatically enabled for High Performance BI use cases.

29 October 2024 - Version 2.0.4

Engine and service enhancements

This release includes the following engine and service enhancements:

The default value of the task.max-drivers-per-task property for Presto (Java) and Presto (C++) workers is now set based on the number of vCPUs.
You can enable the file pruning functionality in Query History Monitoring and Management (QHMM) from the Query monitoring page. You can also configure the maximum size and threshold percentage for the QHMM storage bucket. When the threshold is met during file upload or when a cleanup scheduler runs (default every 24 hours), older data is deleted. For more information, see Configuring query monitoring.
Query History Monitoring and Management (QHMM) no longer stores the diagnostic data in the default IBM Managed trial bucket (wxd-system). To store the diagnostic data, you must now use a storage type supported for QHMM. For more information about using your own storage, see Configuring query monitoring.
You can now verify query optimization status by checking the wxdQueryOptimized parameter in the JSON file. For more information, see Running queries from the Presto (C++) CLI or Query workspace.
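
The QHMM file pruning described above deletes older data when a configured threshold is met. The following Python sketch illustrates that general idea: it selects the oldest files until usage is back at or below the threshold. The data structure, function name, and exact trigger and stopping conditions are assumptions for illustration, not the algorithm that watsonx.data implements.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoredFile:
    name: str
    size_bytes: int
    modified_at: float  # seconds since the epoch; smaller means older


def select_files_to_prune(
    files: List[StoredFile], max_size_bytes: int, threshold_percent: float
) -> List[StoredFile]:
    """Pick the oldest files to delete until usage is at or below the threshold."""
    limit = max_size_bytes * threshold_percent / 100
    usage = sum(f.size_bytes for f in files)
    to_delete: List[StoredFile] = []
    for f in sorted(files, key=lambda item: item.modified_at):  # oldest first
        if usage <= limit:
            break
        to_delete.append(f)
        usage -= f.size_bytes
    return to_delete
```

For example, with a maximum size of 100 bytes, a threshold of 80 percent, and three 40-byte files, only the oldest file is selected, because removing it brings usage down to exactly 80 bytes.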

Data sources enhancements

This release includes the following data sources and storage enhancements:

Test connection feature is now available for the following data sources:
- Apache Pinot
- Cassandra
- Prometheus
New data source SAP HANA is now available. You can use Driver manager under the Configurations page to manage drivers for SAP HANA data source. Each of these drivers undergoes a series of validations. For more information on SAP HANA data source and BYOJ process, see SAP HANA.

Lite plan

To enhance usability, the system catalogs (cmx and system) are now hidden for Lite plan users. The Lite plan instance with the Presto (C++) engine includes tpch as the benchmarking catalog, and the instance with the Presto (Java) engine includes tpch and tpcds as the benchmarking catalogs.

Deprecated features

The following features are deprecated in this release:

The REST API feature to capture DDL changes in watsonx.data through event listener is deprecated in this release and will be removed from watsonx.data with version 2.1 release.
Support for Apache Spark 3.3 runtime is deprecated. You must upgrade to Spark 3.4. To update the Apache Spark version, see Editing the Spark engine details.

25 September 2024 - Version 2.0.3

Data sources and storage enhancements

This release includes the following new data sources and storage enhancements:

You can now enable Azure Data Lake Storage Gen1 Blob and Google Cloud Storage for Milvus. For more information, see ADLS Gen1 Blob and Google Cloud Storage.
You can create or add a new data source to the engine without attaching a catalog to it. A catalog can be attached to the data source at a later stage.
You can now use Apache Ozone storage for the Presto (Java) engine. For more information, see Apache Ozone.
You can now configure the Apache Kafka data source to use the Salted Challenge Response Authentication Mechanism (SCRAM) authentication mechanism. You can upload a self-signed certificate. For more information, see Apache Kafka.

Integration enhancements

This release of watsonx.data introduces the following new or enhanced integrations with other services:

You can now integrate watsonx.data with data build tool (dbt) for Spark engine for in-place data transformation within watsonx.data. For more information, see About dbt integration.
You can integrate watsonx.data with Databand. This integration can enhance the monitoring capabilities by providing insights that extend beyond Spark UI and Spark History. For more information, see Monitoring Spark application runs by using Databand.
You can integrate watsonx.data with the following Business Intelligence (BI) visualization tools to access the connected data sources and build compelling and interactive data visualizations:
- Tableau
- Looker
- Domo
- Qlik
- PowerBI
For more information, see About BI visualization tools.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

Iceberg tables are supported by Query Optimizer. For more information, see Query Optimizer.
You can now use the data build tool (dbt-watsonx-presto) adapter to build, test, and document data models for the Presto (Java) engine. For more information, see dbt-watsonx-presto.
A new customization property (file-column-names-read-as-lower-case) is now available for Presto (C++) engine to avoid upper case and lower case mismatch in columns names. For more information, see Catalog properties for Presto (C++).

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

You can now add users and user groups to define data policy rules. For more information, see Data policy.
Administrators can now select TPCDS and TPCH catalogs to create access control policies. ‘Select’ is the only allowed operation to define rules with these catalogs. To define data policies, see Data policy.
Administrators can now edit resource group configuration after creating the resource group. For more information, see Configuring Presto resource groups.

IBM Knowledge Catalog governance policies for data sources

You can now apply IBM Knowledge Catalog governance policies to the following data sources in Presto:

Oracle
PostgreSQL
MySQL
SQL Server
Db2

Ingestion enhancements

This release of watsonx.data includes the following improvements to the ingestion workflow:

You can now submit an ingestion job using the data sources. For more information, see Ingesting data by using Spark through the web console.
You can now ingest data using AVRO and ORC file formats. For more information, see About data ingestion.
You can preview uploaded files and click table headers to edit column names. For more information, see Ingesting data by using Spark through the web console.
You can access and view Spark logs associated with an ingestion job. For more information, see Accessing Spark logs for ingestion jobs.

Lite plan

You can provision your Lite plan instance based on the following three use cases. Select one use case from the list to proceed:

Generative AI: You can explore Generative AI use cases using this option. The provisioned instance includes Presto, Milvus, and Spark.
High Performance BI: You can explore BI visualization functionalities using this option. The provisioned instance includes Presto (C++) and Spark.
Data Engineering Workloads: You can use data engineering workload to explore various workload-driven use cases. The provisioned instance includes Presto (Java) and Spark.

For more information, see Lite plan.

27 August 2024 - Version 2.0.2

Data sources and storage enhancements

This release includes the following new data sources and storage enhancements:

Content Aware Storage (CAS) is now called Data Access Service (DAS).
Apache Hive is upgraded to version 4.0.0.
You can now view the DAS endpoint from the Storage details page. For more information, see Exploring storage objects.

Integration enhancements

This release of watsonx.data introduces the following new or enhanced integrations with other services:

You can now use the governance capabilities of IBM Knowledge Catalog for SQL views within the watsonx.data platform. For more information, see Integrating with IBM Knowledge Catalog (IKC).
IBM watsonx.data now supports Apache Ranger policies to govern data with Presto (C++) engines. For more information, see Apache Ranger policy.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

Instance administrators can now configure resource groups in Presto. For more information, see Resource groups.
You can now use an API to execute queries and retrieve results. For more information, see API.
You can now configure or change the log level of Presto (Java) through API customization. For more information, see API.
You can now generate Number of Distinct Values (NDV) column statistics with the Iceberg Spark Analyze procedure to enhance the Spark Cost-Based Optimizer (CBO) for improved query planning.
You can now use the custom data source option to connect to Black Hole and Local File connectors for the Presto (Java) engine. For more information, see Custom data source.
You can now generate a JSON snippet for the Presto engine and the Milvus service. You can copy and paste it into the watsonx.data Presto and Milvus connector UI in IBM Cloud Pak for Data and watsonx to simplify connection creation. For more information, see Getting connection information.

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

You can now control access to Presto (C++) engines. For more information, see Engine (Presto (Java) or Presto (C++)).
You can now grant component access to users and user groups in batch. For more information, see Managing user access.
You can now have System Access Control (SAC) plug-in logs with DEBUG information in Presto. For more information, see API customization.

Ingestion enhancements

This release of watsonx.data introduces the following ingestion enhancements:

Ingestion workflow in watsonx.data is now simplified to submit an ingestion job, and support local file ingestion. For more information, see Ingesting data by using Spark through the web console.
You can now ingest data using JSON file format. For more information, see About data ingestion.
CSV file properties are now available as parameters supporting ibm-lh data-copy. For more information, see Options and parameters supported in ibm-lh tool.
New environment variables are available for Spark ingestion through ibm-lh tool command line. For more information, see Spark ingestion through ibm-lh tool command line.

01 August 2024 - Version 2.0.1

Data sources

You can now connect to Db2 data sources by using IBM API key as the authentication mechanism. For more information, see IBM Db2.
Presto (C++) engine can now be associated with Arrow Flight service data sources. Read only operations are supported. The following Arrow Flight service data sources are supported:
- Salesforce
- MariaDB
- Greenplum
- Apache Derby

For more information, see Arrow Flight service.

The following new databases are available for Presto (Java) engine:
- Redis
- Apache Druid

For more information, see Redis and Apache Druid.

Integrations

When integrating IBM Knowledge Catalog with IBM watsonx.data, you can configure data protection rules for individual rows in a table, allowing users to access a subset of rows in a table. For more information, see Filtering rows.
You can now apply the following Apache Ranger policies for Presto (Java) engines:
- Row-level filtering: Users can access a subset of rows in a table. For more information, see Adding row-level filtering policy.
- Column masking: Restrict users to seeing masked values instead of displaying sensitive data. For more information, see Adding column masking policy.
You can now integrate IBM watsonx.data with on-premises IBM DataStage. You can use the DataStage service to load data into and read data from IBM watsonx.data. For more information, see Integrating with DataStage.

Authentication and authorization

The Spark access control extension allows additional authorization, enhancing security at the time of application submission. If you enable the extension in the Spark configuration, only authorized users are allowed to access and operate IBM watsonx.data catalogs through Spark jobs. For more information, see Enhancing Spark application submission using Spark access control extension.
IBM watsonx.data now supports object storage proxy and signature for Azure Data Lake Storage and Azure Blob Storage. For more information, see Using DAS proxy to access ADLS and ABS compatible buckets.
Lightweight Directory Access Protocol (LDAP) is now provided for Teradata and Db2 data sources. The user needs to set up this configuration at the server level. For Teradata, explicitly choose the authentication mechanism type as LDAP in the UI. For more information, see Teradata.

The DAS proxy to access ADLS and ABS buckets and the LDAP enhancements are Tech preview features in version 2.0.1.

Milvus now supports partition-level isolation for users. Administrators can authorize specific user actions on partitions. For more information, see Service (Milvus).

Storage

You can now add the following storage to Presto (Java) engine in IBM watsonx.data:
- Azure Data Lake Storage Gen2
- Azure Data Lake Storage Gen1 Blob

For more information, see Azure Data Lake Storage Gen2 and Azure Data Lake Storage Gen1 Blob.

You can modify the access key and secret key of a user-registered bucket for a storage. This feature is not applicable to default buckets, ADLS, or Google Cloud Storage. This feature can only be used if the new credentials successfully pass the test connection.

Engines

You can now use the ALTER TABLE ADD, DROP, and RENAME column statements for MongoDB data source.
You can now configure how Presto handles unsupported data types. For more information, see ignore-unsupported-datatypes.
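
As an illustration of the ALTER TABLE support for MongoDB data sources noted above, the following Python sketch assembles the three statement forms by using standard Presto syntax. The catalog, schema, table, and column names are hypothetical, and the sketch only builds the SQL text; it does not connect to an engine.

```python
from typing import List


def alter_column_statements(table: str) -> List[str]:
    """Return example ALTER TABLE statements for a hypothetical MongoDB-backed table."""
    return [
        f"ALTER TABLE {table} ADD COLUMN loyalty_tier VARCHAR",
        f"ALTER TABLE {table} RENAME COLUMN loyalty_tier TO tier",
        f"ALTER TABLE {table} DROP COLUMN tier",
    ]


# Hypothetical fully qualified name: <catalog>.<schema>.<table>
for statement in alter_column_statements("mongodb_catalog.sales.customers"):
    print(statement + ";")
```

The generated statements can be pasted into the Query workspace or passed to any Presto client after the placeholder names are replaced.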

Catalogs

You can now associate and disassociate catalogs to an engine in bulk through UI under Manage associations in the Infrastructure manager page.

API Customization and properties

The following customization parameters are added for Presto (C++) workers:
- system-mem-limit-gb
- system-mem-shrink-gb
- system-mem-pushback-enabled
For more information, see Configuration properties for Presto (C++) - worker nodes.
The configuration property optimizer.size-based-join-flipping-enabled is added for Presto (C++) coordinator nodes. For more information, see Configuration properties for Presto (C++) - coordinator nodes.
API customization is enhanced to support data cache and fragment result cache for performance improvement. For more information, see Configuration properties for Presto (Java) - coordinator and worker nodes and Catalog properties for Presto (Java).

Infrastructure manager

You can use the search feature for the following values on the Infrastructure manager page:
- database name
- registered hostname
- created by username
You can now use the ‘Do Not Disturb’ toggle switch in the Notifications section under the bell icon to enable or disable pop-up notifications.
You can find the connectivity information under the Connect information tile on the Configurations page. You can copy this information or download it as a JSON snippet.

Query Workspace

You can run queries on all tables under a schema through the SQL query workspace without specifying the path <catalog>.<schema> by selecting the required catalogs and schemas from the new drop-down list. For more information, see Running SQL queries.

watsonx.data pricing plans

You can now delete the existing Lite plan instance before reaching the account cap limit of 2000 RUs, and create a new instance and consume the remaining resource units available in the account. For more information, see watsonx.data Lite plan.
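
As a rough illustration of the arithmetic, the following sketch computes how many resource units would remain for a new instance. The function and constant names are hypothetical and chosen only for this example; the only value taken from the note is the 2000 RU account cap.

```python
# Account cap for the Lite plan, as stated in the release note.
LITE_PLAN_ACCOUNT_CAP_RU = 2000


def remaining_resource_units(consumed_ru):
    """Return the resource units still available in the account.

    `consumed_ru` is the total already consumed by earlier Lite plan
    instances (a hypothetical input for this sketch).
    """
    if consumed_ru < 0:
        raise ValueError("consumed_ru must not be negative")
    # Never report a negative balance once the cap is reached.
    return max(0, LITE_PLAN_ACCOUNT_CAP_RU - consumed_ru)


print(remaining_resource_units(750))  # 1250
```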

03 July 2024 - Version 2.0.0

New data types for data sources

The following new data types are now available for some data sources. You can access these data types on the Data manager page under the Add column option.

BLOB
- Db2
- Teradata
- Oracle
- MySQL
- SingleStore
CLOB
- Db2
- Teradata
- Oracle
BINARY
- SQL Server
- MySQL

Because the numeric data type is not supported in watsonx.data, you can use the decimal data type as an equivalent alternative to the numeric data type for Netezza data source.

You can now use the BLOB and CLOB data types with the SELECT statement in the Query workspace to build and run queries against your data for Oracle and SingleStore data sources.

You can now use the BLOB and CLOB data types for MySQL and PostgreSQL data sources as equivalents to LONGTEXT, BYTEA, and TEXT because these data types are not compatible with Presto (Java). These data types are mapped to CLOB and BLOB in Presto (Java) if data sources have existing tables with LONGTEXT, TEXT, and BYTEA data types.

MySQL (CLOB as equivalent to LONGTEXT)
PostgreSQL (CLOB as equivalent to TEXT)
PostgreSQL (BLOB as equivalent to BYTEA)
Netezza (decimal as equivalent to numeric)
Oracle (BLOB and CLOB with the SELECT statement)
SingleStore (BLOB and CLOB with the SELECT statement)
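
The equivalences in the list above can be captured as a small lookup table. This is only an illustrative sketch: the dictionary layout and the `equivalent_type` helper are assumptions made for this example and are not part of watsonx.data.

```python
# Equivalent data types taken from the list above.
# Keys are (data source, native type); values are the type to use instead.
EQUIVALENT_TYPES = {
    ("MySQL", "LONGTEXT"): "CLOB",
    ("PostgreSQL", "TEXT"): "CLOB",
    ("PostgreSQL", "BYTEA"): "BLOB",
    ("Netezza", "NUMERIC"): "DECIMAL",
}


def equivalent_type(data_source, native_type):
    """Return the equivalent type, or the native type if none is listed."""
    normalized = native_type.upper()
    return EQUIVALENT_TYPES.get((data_source, normalized), normalized)


print(equivalent_type("PostgreSQL", "bytea"))  # BLOB
```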

New operations for Db2 data source

You can perform the following operations for BLOB and CLOB data types for Db2 data source:

INSERT
CREATE
CTAS
ALTER
DROP

New Arrow Flight service based data sources

You can now use the following data sources with Arrow Flight service:

Greenplum
Salesforce
MariaDB
Apache Derby

For more information, see Arrow Flight service.

New data sources

You can now use the following data sources:

Cassandra
BigQuery
ClickHouse
Apache Pinot

For more information, see Adding a database-catalog pair.

Command to retrieve ingestion history

You can now retrieve the status of all submitted ingestion jobs by using the ibm-lh get-status --all-jobs CLI command. The command returns the history records that you have access to. For more information, see Options and parameters supported in ibm-lh tool.

Additional roles for IBM Knowledge Catalog (IKC) S2S authorization

Besides data access, IBM Knowledge Catalog S2S authorization needs metadata access and Console API access to integrate with watsonx.data. The following new roles are created for IKC service access configuration:

Viewer
Metastore viewer

Apache Ranger policies

IBM watsonx.data now supports Apache Ranger policies to allow integration with Presto engines. For more information, see Apache Ranger policy.

Version upgrade

Presto (Java) engine is now upgraded to version 0.286.
Milvus service is now upgraded to version 2.4.0. Important features include:
- Better Performance (Low Memory Utilisation)
- Support Sparse Data
- Inbuilt SPLADE Engine for Sparse Vector Embedding
- BGE M3 Hybrid (Dense+Sparse) Search

Hive Metastore (HMS) access in watsonx.data

You can now fetch metadata information for Hive Metastore by using REST APIs instead of getting the information from the engine details. HMS details are used by external entities to integrate with watsonx.data. You must have an Admin, Metastore Admin, or Metastore Viewer role to run the API.

Semantic automation for data enrichment

Semantic automation for data enrichment leverages generative AI with IBM Knowledge Catalog to understand your data on a deeper level and enhance data with automated enrichment to make it valuable for analysis. Semantic layer integration is available for Lite plan users only as a 30-day trial version. For more information, see Semantic automation for data enrichment in watsonx.data.

Query Optimizer to improve query performance

You can now use Query Optimizer to improve the performance of queries that are processed by the Presto (C++) engine. If Query Optimizer determines that optimization is feasible, the query is rewritten; otherwise, the native engine optimization takes precedence. For more information, see Query Optimizer overview.

New name for Presto engine in watsonx.data

Presto is renamed to Presto (Java).

New engine (Presto C++) in watsonx.data

You can provision a Presto (C++) engine (version 0.286) in watsonx.data to run SQL queries on your data source and fetch the queried data. For more information, see Presto (C++) overview.

Using proxy to access S3 and S3 compatible buckets

External applications and query engines can access the S3 and S3 compatible buckets managed by watsonx.data through an S3 proxy. For more information, see Using S3 proxy to access S3 and S3 compatible buckets.

Mixed case feature flag for Presto (Java) engine

The mixed-case feature flag, which lets you switch between case-sensitive and case-insensitive behavior in Presto (Java), is now available. The flag is set to OFF by default and can be set to ON during the deployment of watsonx.data. For more information, see Presto (Java) mixed-case support overview.

New storage type Google Cloud Storage

You can now use the new storage type, Google Cloud Storage. For more information, see Adding storage-catalog pair.

31 May 2024 - Version 1.1.5

Provision Spark engine in watsonx.data Lite plan

You can now add a small-sized Spark engine (single node) in the watsonx.data Lite plan instance. For more information, see watsonx.data Lite plan.

Updates related to Spark labs

Working with Jupyter Notebooks from Spark labs

You can now install the Jupyter extension from the VS Code Marketplace inside your Spark lab and work with Jupyter Notebooks. For more information, see Create Jupyter Notebooks.

Accessing Spark UI from Spark labs

You can now access the Spark user interface (UI) from Spark labs to monitor various aspects of running a Spark application. For more information, see Accessing Spark UI from Spark labs.

New region to provision for IBM Cloud instance

You can now provision your IBM Cloud instance in the Sydney region.

30 Apr 2024 - Version 1.1.4

A new version of watsonx.data was released in April 2024.

This release includes the following features and updates:

Kerberos authentication for HDFS connections

You can now enable Kerberos authentication for secure Apache Hadoop Distributed File System (HDFS) connections. For more information, see HDFS.

New data sources

The following new data sources are now available:

Oracle
Amazon Redshift
Informix
Prometheus

For more information, see Data sources.

Test SSL connections

You can now test SSL connections for the MongoDB and SingleStore data sources.

Uploading description files for Apache Kafka data source

The Apache Kafka data source stores data as byte messages that producers and consumers must interpret. To query this data, consumers must first map it into columns. Now, you can upload topic description files that convert raw data into a table format. Each file must be a JSON file that contains a definition for a table. To upload these JSON files from the UI, go to the overview page of the Apache Kafka database that you registered and select the Add topic option. For more information, see Apache Kafka.
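
As an illustration, the following sketch generates one such topic description file. The field names (`tableName`, `schemaName`, `topicName`, `message`, `dataFormat`, `fields`, `mapping`) follow the table definition layout of the open-source Presto Kafka connector; it is an assumption that watsonx.data expects the same layout, and the table, topic, and column names are hypothetical.

```python
import json


def build_topic_description(table, schema, topic, columns):
    """Build a topic description that maps JSON message fields to columns.

    `columns` is a list of (column name, JSON field path, Presto type).
    The layout follows the open-source Presto Kafka connector (assumed).
    """
    return {
        "tableName": table,
        "schemaName": schema,
        "topicName": topic,
        "message": {
            "dataFormat": "json",
            "fields": [
                {"name": name, "mapping": mapping, "type": data_type}
                for name, mapping, data_type in columns
            ],
        },
    }


# Hypothetical topic and columns, for illustration only.
description = build_topic_description(
    table="orders",
    schema="sales",
    topic="sales.orders",
    columns=[
        ("order_id", "orderId", "BIGINT"),
        ("customer_name", "customer/name", "VARCHAR"),
    ],
)

# Serialize to the JSON text that would be saved and uploaded.
topic_json = json.dumps(description, indent=2)
print(topic_json)
```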

License plans for watsonx.data

IBM® watsonx.data now offers the following license plans.

Lite plan
Enterprise plan

For more information about the different license plans, see IBM® watsonx.data pricing plans.

Presto (Java) engine version upgrade

The Presto (Java) engine is now upgraded to version 0.285.1.

Pause or resume Milvus

You can now pause or resume the Milvus service. Pausing the service helps you avoid incurring charges.

Spark is now available as a native engine

In addition to registering external Spark engines, you can now provision a native Spark engine on your IBM watsonx.data instance. With a native Spark engine, you can fully manage the Spark engine configuration, manage access to Spark engines, and view applications by using the watsonx.data UI and REST API endpoints. For more information, see Provisioning Native Spark engine.

Ingest data using native Spark Engines

You can now submit ingestion jobs using native Spark Engines. For more information, see Working with different table formats.

27 Mar 2024 - Version 1.1.3

A new version of watsonx.data was released in March 2024.

This release includes the following features and updates:

New data type for some data sources

You can now use the BINARY data type with the SELECT statement in the Query workspace to build and run queries against your data for the following data sources:

Elasticsearch
SQL Server
MySQL

The new data types BLOB and CLOB are available for MySQL, PostgreSQL, Snowflake, SQL Server, and Db2 data sources. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.

Delete data by using the DELETE FROM feature for Iceberg data sources

You can now delete data from tables in Iceberg data sources by using the DELETE FROM feature.

You can specify the table property delete mode for new tables by using either copy-on-write mode or merge-on-read mode (default). For more information, see SQL statements.

ALTER VIEW statement for Iceberg data source

You can now use the following ALTER VIEW statement in the Query workspace:

ALTER VIEW name RENAME TO new_name

Upload SSL certificates for Netezza Performance Server data sources

You can now browse and upload the SSL certificate for SSL connections in Netezza Performance Server data sources. The valid file formats for SSL certificate are .pem, .crt, and .cer. You can upload SSL certificates by using the Adding a database-catalog pair option in the Infrastructure manager.

Query data from Db2 and Watson Query

You can now query nicknames that are created in Db2 and virtualized tables from Watson Query instances.

SSL connection for IBM Data Virtualization Manager for z/OS data source

You can now enable SSL connection for the IBM Data Virtualization Manager for z/OS data source by using the Add database user interface to secure and encrypt the database connection. Select Validate certificate to validate whether the SSL certificate that is returned by the host is trusted. You can choose to provide the hostname in the SSL certificate.

Use data from Apache Hudi catalog

You can now connect to and use data from Apache Hudi catalog.

Add Milvus as a service in watsonx.data

You can now provision Milvus as a service in watsonx.data with the following features:

Provision different storage variants such as starter, medium, and large nodes.
Assign Admin or User roles for Milvus users: User access policy is now available for Milvus users. Using the Access Control UI, you can assign Admin or User roles for Milvus users and also grant, revoke, or update the privilege.
Configure the Object storage for Milvus to store data. You can add or configure a custom bucket and specify the username, password, region, and bucket URL.

For more information, see Milvus.

Load data in batch by using the ibm-lh ingestion tool

You can now use the ibm-lh ingestion tool to run batch ingestion procedures in non-interactive mode (from outside the ibm-lh-tools container), by using the ibm-lh-client package. For more information, see ibm-lh commands and usage.

Creating schema by using bulk ingestion in web console

You can now create a schema by using the bulk ingestion process in the web console, if the schema is not previously created.

Use time-travel queries in Apache Iceberg tables

You can now run the following time-travel queries by using branches and tags in Apache Iceberg table snapshots:

- SELECT * FROM <table name> FOR VERSION AS OF 'historical-tag'

- SELECT * FROM <table name> FOR VERSION AS OF 'test-branch'
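
If you generate these statements from a script, a small helper such as the following can assemble them. This is a minimal sketch: the helper name and the example table name are hypothetical, and the table name is assumed to come from a trusted source because it is inserted into the statement as-is.

```python
def time_travel_query(table, ref):
    """Build a time-travel SELECT for a branch or tag name.

    `table` must be a trusted identifier; `ref` is the branch or tag name.
    """
    # Double any single quotes so the name stays a valid SQL string literal.
    safe_ref = ref.replace("'", "''")
    return f"SELECT * FROM {table} FOR VERSION AS OF '{safe_ref}'"


# Hypothetical table name, for illustration only.
print(time_travel_query("iceberg_data.sales.orders", "historical-tag"))
```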

Access Cloud Object Storage without credentials

You can now access your Cloud Object Storage bucket without credentials, by using the Data Access Service (DAS) endpoint. For more information about getting DAS endpoint, see Getting DAS endpoint.

28 Feb 2024 - Version 1.1.2

A new version of watsonx.data was released in February 2024.

This release includes the following features and updates:

SSL connection for data sources

You can now enable SSL connection for the following data sources by using the Add database user interface to secure and encrypt the database connection:

Db2
PostgreSQL

For more information, see Adding a database.

Secure ingestion job history

Now, users can view only their own ingestion job history. Administrators can view the ingestion job history for all users.
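
The visibility rule can be pictured with the following sketch. It only models the behavior that the note describes; the `IngestionJob` record and the `visible_jobs` function are hypothetical and do not correspond to a watsonx.data API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestionJob:
    """Hypothetical record of a submitted ingestion job."""
    job_id: str
    submitted_by: str
    status: str


def visible_jobs(jobs, user, is_admin):
    """Return the job history that `user` is allowed to see."""
    if is_admin:
        # Administrators see the history for all users.
        return list(jobs)
    # Other users see only the jobs that they submitted themselves.
    return [job for job in jobs if job.submitted_by == user]


history = [
    IngestionJob("job-1", "alice", "finished"),
    IngestionJob("job-2", "bob", "running"),
]
print([job.job_id for job in visible_jobs(history, "alice", is_admin=False)])
```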

SQL enhancements

You can now use the following SQL statements in the Query workspace to build and run queries against your data:

Apache Iceberg data sources
- CREATE VIEW
- DROP VIEW
MongoDB data sources
- DELETE

New data types BLOB and CLOB for Teradata data source

New data types BLOB and CLOB are available for Teradata data source. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.

Create a new table during data ingestion

Previously, you had to have a target table in watsonx.data for ingesting data. Now, you can create a new table directly from the source data file (available in parquet or CSV format) by using data ingestion from the Data Manager. You can create the table by using the following methods of ingestion:

Ingesting data by using Iceberg copy loader.
Ingesting data by using Spark.

Perform ALTER TABLE operations on a column

With an Iceberg data source, you can now perform ALTER TABLE operations on a column for the following data type conversions:

int to bigint
float to double
decimal (num1, dec_digits) to decimal (num2, dec_digits), where num2>num1.
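
The three conversions above can be expressed as a simple check. The following is a minimal sketch that assumes type names are written as shown in the list; the function name is hypothetical, and the check only mirrors the rules stated here rather than the engine's own validation.

```python
import re

# Matches "decimal(p, s)" with optional spaces, for example "decimal (10, 2)".
_DECIMAL = re.compile(r"^decimal\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)$")


def is_supported_conversion(old_type, new_type):
    """Return True if the column type change is one of the listed conversions."""
    old_type = old_type.strip().lower()
    new_type = new_type.strip().lower()
    if (old_type, new_type) in {("int", "bigint"), ("float", "double")}:
        return True
    old_match = _DECIMAL.match(old_type)
    new_match = _DECIMAL.match(new_type)
    if old_match and new_match:
        old_precision, old_scale = map(int, old_match.groups())
        new_precision, new_scale = map(int, new_match.groups())
        # The number of decimal digits must stay the same and the
        # total number of digits must grow (num2 > num1).
        return old_scale == new_scale and new_precision > old_precision
    return False


print(is_supported_conversion("decimal(10, 2)", "decimal(12, 2)"))  # True
```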

Better query performance by using sorted files

With an Apache Iceberg data source, you can generate sorted files, which reduce the query result latency and improve the performance of Presto (Java). Data in the Iceberg table is sorted during the writing process within each file.

You can configure the order to sort the data by using the sorted_by table property. When you create the table, specify an array of one or more columns involved in sorting. To disable the feature, set the session property sorted_writing_enabled to false.
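
The following sketch shows what the resulting statements might look like when assembled from a script. The property names sorted_by and sorted_writing_enabled come from the description above; the catalog-qualified form of the session property, the helper names, and the table and column names are assumptions made for illustration.

```python
def create_sorted_table_sql(table, columns, sort_columns):
    """Build a CREATE TABLE statement that sets the sorted_by table property.

    `columns` is a list of (name, type) pairs; `sort_columns` lists the
    column names that are involved in sorting.
    """
    column_list = ", ".join(f"{name} {data_type}" for name, data_type in columns)
    sort_list = ", ".join(f"'{name}'" for name in sort_columns)
    return f"CREATE TABLE {table} ({column_list}) WITH (sorted_by = ARRAY[{sort_list}])"


def disable_sorted_writing_sql(catalog):
    """Build the statement that turns sorted writing off for a session.

    Qualifying the property with the catalog name is an assumption.
    """
    return f"SET SESSION {catalog}.sorted_writing_enabled = false"


# Hypothetical catalog, schema, table, and columns.
print(create_sorted_table_sql(
    "iceberg_data.sales.orders",
    [("order_id", "bigint"), ("order_date", "date")],
    ["order_date"],
))
print(disable_sorted_writing_sql("iceberg_data"))
```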

31 Jan 2024 - Version 1.1.1

A new version of watsonx.data was released in January 2024.

This release includes the following features and updates:

IBM Data Virtualization Manager for z/OS® connector

You can now use the new IBM Data Virtualization Manager for z/OS® connector to read and write IBM Z® data without moving, replicating, or transforming the data. For more information, see Connecting to an IBM Data Virtualization Manager (DVM) data source.

Teradata connector is enabled for multiple ALTER TABLE statements

Teradata connector now supports the ALTER TABLE RENAME TO, ALTER TABLE DROP COLUMN, and ALTER TABLE RENAME COLUMN column_name TO new_column_name statements.

Support for time travel queries

Iceberg connector for Presto (Java) now supports time travel queries.

The property format_version now shows the current version

The property format_version now shows the correct value (current version) when you create an Iceberg table.

29 Nov 2023 - Version 1.1.0

A new version of watsonx.data was released in November 2023.

This release includes the following features and updates:

Presto (Java) case-sensitive behavior

The Presto (Java) behavior is changed from case-insensitive to case-sensitive. Now you can provide the object names in the original case format as in the database. For more information, see Case-sensitive search configuration with Presto (Java).

Roll-back feature

You can use the Rollback feature to roll back or roll forward to any snapshot of an Iceberg table.

Capture Data Definition Language (DDL) changes

You can now capture and track the DDL changes in watsonx.data by using an event listener. For more information, see Capturing DDL changes.

Ingest data by using Spark

You can now use the IBM Analytics Engine that is powered by Apache Spark to run ingestion jobs in watsonx.data.

For more information, see Ingesting data by using Spark.

Integration with Db2 and Netezza Performance Server

You can now register Db2 or Netezza Performance Server engines in watsonx.data console.

For more information, see Registering an engine.

New connectors

You can now use connectors in watsonx.data to establish connections to the following types of databases:

Teradata
Delta Lake
Elasticsearch
SingleStoreDB
Snowflake

For more information, see Adding a database.

AWS EMR for Spark

You can now run Spark applications from Amazon Web Services Elastic MapReduce (AWS EMR) to achieve the watsonx.data Spark use cases:

Data ingestion
Data querying
Table maintenance

For more information, see Using AWS EMR for Spark use case.

7 July 2023 - Version 1.0.0

watsonx.data is a new open architecture that combines the elements of the data warehouse and data lake models. The best-in-class features and optimizations available in watsonx.data make it an optimal choice for next-generation data analytics and automation. In the first release (watsonx.data 1.0.0), the following features are supported:

Creating, scaling, pausing, resuming, and deleting the Presto (Java) query engine
Associating and dissociating a catalog with an engine
Exploring catalog objects
Adding and deleting a database-catalog pair
Updating database credentials
Adding and deleting a bucket-catalog pair
Exploring bucket objects
Loading data
Exploring data
Querying data
Query history