Release notes for watsonx.data

Use these release notes to learn about the latest updates to IBM® watsonx.data. The updates are grouped by date.

27 August 2024 - Version 2.0.2

Data sources and storage enhancements

This release includes the following new data sources and storage enhancements:

  • Content Aware Storage (CAS) is now called Data Access Service (DAS).

  • Apache Hive is upgraded to version 4.0.0.

  • You can now view the DAS endpoint from the Storage details page. For more information, see Exploring storage objects.

Integration enhancements

This release of watsonx.data introduces the following new or enhanced integrations with other services:

  • You can now use the governance capabilities of IBM Knowledge Catalog (IKC) for SQL views within the watsonx.data platform. For more information, see Integrating with IBM Knowledge Catalog (IKC).

  • IBM watsonx.data now supports Apache Ranger policies to govern data with Presto (C++) engines. For more information, see Apache Ranger policy.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

  • Instance administrators can now configure resource groups in Presto.

  • You can now use an API to execute queries and retrieve results. For more information, see API.

  • You can now configure or change the log level of Presto (Java) through API customization. For more information, see API.

  • You can now generate Number of Distinct Values (NDV) column statistics with the Iceberg Spark Analyze procedure to enhance the Spark Cost-Based Optimizer (CBO) for improved query planning.

  • You can now use the custom data source option to connect to Black Hole and Local File connectors for the Presto (Java) engine. For more information, see Custom data source.

  • You can now generate a JSON snippet for the Presto engine and the Milvus service. You can copy and paste the snippet into the watsonx.data Presto and Milvus connector UI in IBM Cloud Pak for Data and watsonx to simplify connection creation. For more information, see Getting connection information.

  • The default value of the task.max-drivers-per-task property for Presto (Java) and Presto (C++) workers is now set based on the number of vCPUs.

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

Ingestion enhancements

This release of watsonx.data introduces the following ingestion enhancements:

01 August 2024 - Version 2.0.1

Data sources

  • You can now connect to Db2 data sources by using an IBM API key as the authentication mechanism. For more information, see IBM Db2.
  • The Presto (C++) engine can now be associated with Arrow Flight service data sources. Only read operations are supported. The following Arrow Flight service data sources are supported:
    • Salesforce
    • MariaDB
    • Greenplum
    • Apache Derby

For more information, see Arrow Flight service.

  • The following new databases are available for Presto (Java) engine:

Integrations

  • When integrating IBM Knowledge Catalog with IBM watsonx.data, you can configure data protection rules for individual rows in a table, allowing users to access a subset of rows in a table. For more information, see Filtering rows.

  • You can now apply the following Apache Ranger policies for Presto (Java) engines:

  • You can now integrate IBM watsonx.data with on-premises IBM DataStage. You can use the DataStage service to load data into and read data from IBM watsonx.data. For more information, see Integrating with DataStage.

Authentication and authorization

  • The Spark access control extension adds an authorization step that enhances security when an application is submitted. If you enable the extension in the Spark configuration, only authorized users can access and operate IBM watsonx.data catalogs through Spark jobs. For more information, see Enhancing Spark application submission using Spark access control extension.

  • IBM watsonx.data now supports object storage proxy and signature for Azure Data Lake Storage and Azure Blob Storage. For more information, see Using DAS proxy to access ADLS and ABS compatible buckets.

  • Lightweight Directory Access Protocol (LDAP) authentication is now provided for Teradata and Db2 data sources. The user must set up this configuration at the server level. For Teradata, explicitly choose LDAP as the authentication mechanism type in the UI. For more information, see Teradata.

The DAS proxy to access ADLS and ABS buckets and the LDAP enhancements are available as a Tech preview in version 2.0.1.

  • Milvus now supports partition-level isolation for users. Administrators can authorize specific user actions on partitions. For more information, see Service (Milvus).

Storage

  • You can now add the following storage to Presto (Java) engine in IBM watsonx.data:
    • Azure Data Lake Storage Gen2
    • Azure Data Lake Storage Gen1 Blob

For more information, see Azure Data Lake Storage Gen2 and Azure Data Lake Storage Gen1 Blob.

  • You can modify the access key and secret key of a user-registered bucket for a storage. This feature is not applicable to default buckets, ADLS, or Google Cloud Storage. This feature can only be used if the new credentials successfully pass the test connection.

Engines

  • You can now use the ALTER TABLE ADD, DROP, and RENAME column statements for MongoDB data source.
  • You can now configure how Presto handles unsupported data types. For more information, see ignore-unsupported-datatypes.

Catalogs

  • You can now associate catalogs with an engine and disassociate catalogs from an engine in bulk through the UI, under Manage associations on the Infrastructure manager page.

API Customization and properties

Infrastructure manager

  • You can use the search feature for the following values on the Infrastructure manager page:
    • database name
    • registered hostname
    • created by username
  • You can now use the ‘Do Not Disturb’ toggle switch in the Notifications section under the bell icon to enable or disable pop-up notifications.
  • You can find the connectivity information under the Connect information tile on the Configurations page. You can copy this information or download it as a JSON snippet.

Query Workspace

  • You can run queries on all tables under a schema through the SQL query workspace without specifying the path <catalog>.<schema> by selecting the required catalogs and schemas from the new drop-down list. For more information, see Running SQL queries.

watsonx.data pricing plans

  • You can now delete the existing Lite plan instance before reaching the account cap limit of 2000 RUs, and create a new instance and consume the remaining resource units available in the account. For more information, see watsonx.data Lite plan.

03 July 2024 - Version 2.0.0

New data types for data sources

The following new data types are now available for some data sources. You can access these data types on the Data manager page under the Add column option.

  • BLOB

    • Db2
    • Teradata
    • Oracle
    • MySQL
    • SingleStore
  • CLOB

    • Db2
    • Teradata
    • Oracle
  • BINARY

    • SQL Server
    • MySQL

Because the numeric data type is not supported in watsonx.data, you can use the decimal data type as an equivalent alternative to the numeric data type for Netezza data source.

You can now use the BLOB and CLOB data types with the SELECT statement in the Query workspace to build and run queries against your data for Oracle and SingleStore data sources.

You can now use the BLOB and CLOB data types for MySQL and PostgreSQL data sources as equivalents to LONGTEXT, BYTEA, and TEXT because these data types are not compatible with Presto (Java). These data types are mapped to CLOB and BLOB in Presto (Java) if data sources have existing tables with LONGTEXT, TEXT, and BYTEA data types.

  • MySQL (CLOB as equivalent to LONGTEXT)
  • PostgreSQL (CLOB as equivalent to TEXT)
  • PostgreSQL (BLOB as equivalent to BYTEA)
  • Netezza (decimal as equivalent to numeric)
  • Oracle (BLOB and CLOB with the SELECT statement)
  • SingleStore (BLOB and CLOB with the SELECT statement)
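The equivalents listed above can be expressed as a small lookup table. This is a minimal sketch for illustration only: the names TYPE_EQUIVALENTS and equivalent_type are hypothetical and are not part of watsonx.data, and the table covers only the mappings that these release notes state.

```python
from typing import Optional

# Hypothetical lookup of the equivalents stated in these release notes.
# Keys are (data source, source data type); values are the type to use
# in watsonx.data.
TYPE_EQUIVALENTS = {
    ("MySQL", "LONGTEXT"): "CLOB",
    ("PostgreSQL", "TEXT"): "CLOB",
    ("PostgreSQL", "BYTEA"): "BLOB",
    ("Netezza", "NUMERIC"): "DECIMAL",
}


def equivalent_type(data_source: str, source_type: str) -> Optional[str]:
    """Return the equivalent type, or None if the notes list no mapping."""
    return TYPE_EQUIVALENTS.get((data_source, source_type.upper()))
```

For example, equivalent_type("PostgreSQL", "bytea") returns "BLOB", and a pair that is not listed returns None.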

New operations for Db2 data source

You can perform the following operations for BLOB and CLOB data types for Db2 data source:

  • INSERT
  • CREATE
  • CTAS
  • ALTER
  • DROP

New Arrow Flight service based data sources

You can now use the following data sources with Arrow Flight service:

  • Greenplum
  • Salesforce
  • MariaDB
  • Apache Derby

For more information, see Arrow Flight service.

New data sources

You can now use the following data sources:

  • Cassandra
  • BigQuery
  • ClickHouse
  • Apache Pinot

For more information, see Adding a database-catalog pair.

Command to retrieve ingestion history

You can now retrieve the status of all submitted ingestion jobs by using the ibm-lh get-status --all-jobs CLI command. The command returns the history records that you have access to. For more information, see Options and parameters supported in ibm-lh tool.

Additional roles for IBM Knowledge Catalog (IKC) S2S authorization

Besides data access, IBM Knowledge Catalog S2S authorization needs metadata access and Console API access to integrate with watsonx.data. The following new roles are created for IKC service access configuration:

  • Viewer
  • Metastore viewer

Apache Ranger policies

IBM watsonx.data now supports Apache Ranger policies to allow integration with Presto engines. For more information, see Apache Ranger policy.

Version upgrade

  • Presto (Java) engine is now upgraded to version 0.286.
  • Milvus service is now upgraded to version 2.4.0. Important features include:
    • Better Performance (Low Memory Utilisation)
    • Support Sparse Data
    • Inbuilt SPLADE Engine for Sparse Vector Embedding
    • BGE M3 Hybrid (Dense+Sparse) Search

Hive Metastore (HMS) access in watsonx.data

You can now fetch metadata information for Hive Metastore by using REST APIs instead of getting the information from the engine details. HMS details are used by external entities to integrate with watsonx.data. You must have an Admin, Metastore Admin, or Metastore Viewer role to run the API.

Semantic automation for data enrichment

Semantic automation for data enrichment leverages generative AI with IBM Knowledge Catalog to understand your data on a deeper level and enhance data with automated enrichment to make it valuable for analysis. Semantic layer integration is available for Lite plan users only as a 30 days trial version. For more information, see Semantic automation for data enrichment in watsonx.data.

Query Optimizer to improve query performance

You can now use Query Optimizer to improve the performance of queries that are processed by the Presto (C++) engine. If Query Optimizer determines that optimization is feasible, it rewrites the query; otherwise, the native engine optimization takes precedence. For more information, see Query Optimizer overview.

New name for Presto engine in watsonx.data

Presto is renamed to Presto (Java).

New engine (Presto C++) in watsonx.data

You can provision a Presto (C++) engine (version 0.286) in watsonx.data to run SQL queries on your data source and fetch the queried data. For more information, see Presto (C++) overview.

Using proxy to access S3 and S3 compatible buckets

External applications and query engines can access the S3 and S3 compatible buckets managed by watsonx.data through an S3 proxy. For more information, see Using S3 proxy to access S3 and S3 compatible buckets.

Mixed case feature flag for Presto (Java) engine

The mixed-case feature flag, which lets you switch between case-sensitive and case-insensitive behavior in Presto (Java), is now available. The flag is set to OFF by default and can be set to ON during the deployment of watsonx.data. For more information, see Presto (Java) mixed-case support overview.

New storage type Google Cloud Storage

You can now use Google Cloud Storage as a storage type. For more information, see Adding storage-catalog pair.

31 May 2024 - Version 1.1.5

Provision Spark engine in watsonx.data Lite plan

You can now add a small-sized Spark engine (single node) in the watsonx.data Lite plan instance. For more information, see watsonx.data Lite plan.

Updates related to Spark labs

  • Working with Jupyter Notebooks from Spark labs: You can now install the Jupyter extension from the VS Code Marketplace inside your Spark lab and work with Jupyter Notebooks. For more information, see Create Jupyter Notebooks.

  • Accessing Spark UI from Spark labs: You can now access the Spark user interface (UI) from Spark labs to monitor various aspects of running a Spark application. For more information, see Accessing Spark UI from Spark labs.

New region to provision for IBM Cloud instance

You can now provision your IBM Cloud instance in the Sydney region.

30 Apr 2024 - Version 1.1.4

A new version of watsonx.data was released in April 2024.

This release includes the following features and updates:

Kerberos authentication for HDFS connections

You can now enable Kerberos authentication for secure Apache Hadoop Distributed File System (HDFS) connections. For more information, see HDFS.

New data sources

The following new data sources are now available:

  • Oracle
  • Amazon Redshift
  • Informix
  • Prometheus

For more information, see Data sources.

Test SSL connections

You can now test SSL connections for the MongoDB and SingleStore data sources.

Uploading description files for Apache Kafka data source

The Apache Kafka data source stores data as byte messages that producers and consumers must interpret. To query this data, consumers must first map it into columns. Now, you can upload topic description files that convert raw data into a table format. Each file must be a JSON file that contains a definition for a table. To upload these JSON files from the UI, go to the overview page of the Apache Kafka database that you registered and select the Add topic option. For more information, see Apache Kafka.
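As an illustration of what a topic description file might contain, the following sketch builds one table definition and serializes it to JSON. The key names (tableName, schemaName, topicName, message, dataFormat, fields, name, mapping, type) follow the table definition layout of the open-source Presto Kafka connector; that watsonx.data accepts the same layout is an assumption here, and the table, schema, topic, and field names are hypothetical.

```python
import json


def build_topic_description(table_name, schema_name, topic_name, fields):
    """Build a table definition for one Kafka topic.

    `fields` is a list of (column name, JSON field mapping, SQL type) tuples.
    The layout mirrors the open-source Presto Kafka connector (assumption).
    """
    return {
        "tableName": table_name,
        "schemaName": schema_name,
        "topicName": topic_name,
        "message": {
            "dataFormat": "json",
            "fields": [
                {"name": name, "mapping": mapping, "type": sql_type}
                for name, mapping, sql_type in fields
            ],
        },
    }


# Hypothetical topic with two columns mapped from JSON message fields.
description = build_topic_description(
    "orders",
    "sales",
    "sales.orders",
    [("order_id", "order_id", "BIGINT"), ("customer", "customer/name", "VARCHAR")],
)
topic_json = json.dumps(description, indent=2)
```

Writing topic_json to a .json file gives one file per table, which matches the one-definition-per-file requirement described above.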

License plans for watsonx.data

IBM® watsonx.data now offers the following license plans:

  • Lite plan
  • Enterprise plan

For more information about the different license plans, see IBM® watsonx.data pricing plans.

Presto (Java) engine version upgrade

The Presto (Java) engine is now upgraded to version 0.285.1.

Pause or resume Milvus

You can now pause or resume the Milvus service. Pausing the service helps you avoid incurring charges.

Spark is now available as a native engine

In addition to registering external Spark engines, you can now provision a native Spark engine on your IBM watsonx.data instance. With a native Spark engine, you can fully manage the Spark engine configuration, manage access to Spark engines, and view applications by using the watsonx.data UI and REST API endpoints. For more information, see Provisioning Native Spark engine.

Ingest data using native Spark Engines

You can now submit ingestion jobs using native Spark Engines. For more information, see Working with Apache Hudi catalog and Working with Delta Lake catalog.

27 Mar 2024 - Version 1.1.3

A new version of watsonx.data was released in March 2024.

This release includes the following features and updates:

New data type for some data sources

You can now use the BINARY data type with the SELECT statement in the Query workspace to build and run queries against your data for the following data sources:

  • Elasticsearch
  • SQL Server
  • MySQL

The new BLOB and CLOB data types are available for MySQL, PostgreSQL, Snowflake, SQL Server, and Db2 data sources. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.

Delete data by using the DELETE FROM feature for Iceberg data sources

You can now delete data from tables in Iceberg data sources by using the DELETE FROM feature.

You can specify the table property delete mode for new tables by using either copy-on-write mode or merge-on-read mode (default). For more information, see SQL statements.

ALTER VIEW statement for Iceberg data source

You can now use the following ALTER VIEW statement in the Query workspace to rename a view:

ALTER VIEW name RENAME TO new_name

Upload SSL certificates for Netezza Performance Server data sources

You can now browse and upload the SSL certificate for SSL connections in Netezza Performance Server data sources. The valid file formats for SSL certificate are .pem, .crt, and .cer. You can upload SSL certificates by using the Adding a database-catalog pair option in the Infrastructure manager.

Query data from Db2 and Watson Query

You can now query nicknames that are created in Db2 and virtualized tables from Watson Query instances.

SSL connection for IBM Data Virtualization Manager for z/OS data source

You can now enable SSL connection for the IBM Data Virtualization Manager for z/OS data source by using the Add database user interface to secure and encrypt the database connection. Select Validate certificate to validate whether the SSL certificate that is returned by the host is trusted. You can choose to provide the hostname in the SSL certificate.

Use data from Apache Hudi catalog

You can now connect to and use data from Apache Hudi catalog.

Add Milvus as a service in watsonx.data

You can now provision Milvus as a service in watsonx.data with the following features:

  • Provision different storage variants such as starter, medium, and large nodes.

  • Assign Admin or User roles for Milvus users: User access policy is now available for Milvus users. Using the Access Control UI, you can assign Admin or User roles for Milvus users and also grant, revoke, or update the privilege.

  • Configure the Object storage for Milvus to store data. You can add or configure a custom bucket and specify the username, password, region, and bucket URL.

For more information, see Milvus.

Load data in batch by using the ibm-lh ingestion tool

You can now use the ibm-lh ingestion tool to run batch ingestion procedures in non-interactive mode (from outside the ibm-lh-tools container), by using the ibm-lh-client package. For more information, see ibm-lh commands and usage.

Creating schema by using bulk ingestion in web console

You can now create a schema by using the bulk ingestion process in the web console, if the schema is not previously created.

Use time-travel queries in Apache Iceberg tables

You can now run the following time-travel queries by using branches and tags in Apache Iceberg table snapshots:

  • SELECT * FROM <table name> FOR VERSION AS OF 'historical-tag'

  • SELECT * FROM <table name> FOR VERSION AS OF 'test-branch'
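A time-travel query of this form can be assembled programmatically. The following is a minimal sketch: the helper name time_travel_query and the table name in the usage note are hypothetical, and the helper only builds the statement text, which you would then run in the Query workspace or through a client of your choice.

```python
def time_travel_query(table: str, ref: str) -> str:
    """Build a SELECT that reads an Iceberg table as of a tag or branch.

    `ref` is a tag or branch name; single quotes in it are doubled so that
    the resulting SQL string literal stays well formed.
    """
    escaped = ref.replace("'", "''")
    return f"SELECT * FROM {table} FOR VERSION AS OF '{escaped}'"
```

For example, time_travel_query("iceberg_data.sales.orders", "historical-tag") produces a statement that reads the hypothetical orders table as of the historical-tag tag.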

Access Cloud Object Storage without credentials

You can now access your Cloud Object Storage bucket without credentials, by using the Data Access Service (DAS) endpoint. For more information about getting DAS endpoint, see Getting DAS endpoint.

28 Feb 2024 - Version 1.1.2

A new version of watsonx.data was released in February 2024.

This release includes the following features and updates:

SSL connection for data sources

You can now enable SSL connection for the following data sources by using the Add database user interface to secure and encrypt the database connection:

  • Db2

  • PostgreSQL

For more information, see Adding a database.

Secure ingestion job history

Now, users can view only their own ingestion job history. Administrators can view the ingestion job history for all users.

SQL enhancements

You can now use the following SQL statements in the Query workspace to build and run queries against your data:

  • Apache Iceberg data sources
    • CREATE VIEW
    • DROP VIEW
  • MongoDB data sources
    • DELETE

New data types BLOB and CLOB for Teradata data source

New data types BLOB and CLOB are available for Teradata data source. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.

Create a new table during data ingestion

Previously, a target table had to exist in watsonx.data before you could ingest data. Now, you can create a new table directly from the source data file (available in Parquet or CSV format) by using data ingestion from the Data Manager. You can create the table by using the following methods of ingestion:

  • Ingesting data by using Iceberg copy loader.

  • Ingesting data by using Spark.

Perform ALTER TABLE operations on a column

With an Iceberg data source, you can now perform ALTER TABLE operations on a column for the following data type conversions:

  • int to bigint

  • float to double

  • decimal (num1, dec_digits) to decimal (num2, dec_digits), where num2>num1.
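The three conversion rules above can be captured in a short check. This is a sketch under stated assumptions: the function name is_supported_conversion is hypothetical, type names are compared as plain strings, and the decimal rule is read as "same number of decimal digits, strictly larger first argument", as the list states.

```python
import re

# Matches decimal(p, s) with optional whitespace, for example "decimal (10, 2)".
_DECIMAL = re.compile(r"decimal\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def is_supported_conversion(old_type: str, new_type: str) -> bool:
    """Return True if old_type -> new_type is one of the listed conversions."""
    old, new = old_type.strip().lower(), new_type.strip().lower()
    if (old, new) in {("int", "bigint"), ("float", "double")}:
        return True
    m_old, m_new = _DECIMAL.fullmatch(old), _DECIMAL.fullmatch(new)
    if m_old and m_new:
        num1, digits1 = map(int, m_old.groups())
        num2, digits2 = map(int, m_new.groups())
        # dec_digits must stay the same and num2 must exceed num1.
        return digits1 == digits2 and num2 > num1
    return False
```

For example, widening decimal(10, 2) to decimal(12, 2) passes the check, while changing the number of decimal digits or narrowing bigint to int does not.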

Better query performance by using sorted files

With an Apache Iceberg data source, you can generate sorted files, which reduce the query result latency and improve the performance of Presto (Java). Data in the Iceberg table is sorted during the writing process within each file.

You can configure the order to sort the data by using the sorted_by table property. When you create the table, specify an array of one or more columns involved in sorting. To disable the feature, set the session property sorted_writing_enabled to false.
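The following sketch shows how the sorted_by table property and the sorted_writing_enabled session property might appear in SQL text. It only builds statement strings. The WITH (sorted_by = ARRAY[...]) form follows the open-source Presto Iceberg connector, and prefixing the session property with the catalog name is an assumption about how the property is addressed; the helper, table, and catalog names are hypothetical.

```python
def create_sorted_table_sql(table, columns, sorted_by):
    """Build a CREATE TABLE statement with the sorted_by table property.

    `columns` maps column names to SQL types; `sorted_by` lists the
    columns involved in sorting, in order.
    """
    unknown = [c for c in sorted_by if c not in columns]
    if not sorted_by or unknown:
        raise ValueError(f"invalid sort columns: {unknown or 'none given'}")
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    sort_list = ", ".join(f"'{c}'" for c in sorted_by)
    return f"CREATE TABLE {table} ({column_list}) WITH (sorted_by = ARRAY[{sort_list}])"


def disable_sorted_writing_sql(catalog):
    """Build the statement that turns sorted writing off for the session.

    The catalog prefix is an assumption, not confirmed by these notes.
    """
    return f"SET SESSION {catalog}.sorted_writing_enabled = false"
```

For example, create_sorted_table_sql("iceberg_data.sales.orders", {"order_id": "bigint", "region": "varchar"}, ["region"]) yields a statement that sorts each written file by region.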

31 Jan 2024 - Version 1.1.1

A new version of watsonx.data was released in January 2024.

This release includes the following features and updates:

IBM Data Virtualization Manager for z/OS® connector

You can now use the new IBM Data Virtualization Manager for z/OS® connector to read and write IBM Z® data without moving, replicating, or transforming the data. For more information, see Connecting to an IBM Data Virtualization Manager (DVM) data source.

Teradata connector is enabled for multiple ALTER TABLE statements

Teradata connector now supports the ALTER TABLE RENAME TO, ALTER TABLE DROP COLUMN, and ALTER TABLE RENAME COLUMN column_name TO new_column_name statements.

Support for time travel queries

Iceberg connector for Presto (Java) now supports time travel queries.

The property format_version now shows the current version

The property format_version now shows the correct value (current version) when you create an Iceberg table.

29 Nov 2023 - Version 1.1.0

A new version of watsonx.data was released in November 2023.

This release includes the following features and updates:

Presto (Java) case-sensitive behavior

The Presto (Java) behavior is changed from case-insensitive to case-sensitive. Now you can provide the object names in the original case format as in the database. For more information, see Case-sensitive search configuration with Presto (Java).

Roll-back feature

You can use the Rollback feature to roll back or roll forward to any snapshot of an Iceberg table.

Capture Data Definition Language (DDL) changes

You can now capture and track the DDL changes in watsonx.data by using an event listener. For more information, see Capturing DDL changes.

Ingest data by using Spark

You can now use the IBM Analytics Engine that is powered by Apache Spark to run ingestion jobs in watsonx.data.

For more information, see Ingesting data by using Spark.

Integration with Db2 and Netezza Performance Server

You can now register Db2 or Netezza Performance Server engines in watsonx.data console.

For more information, see Registering an engine.

New connectors

You can now use connectors in watsonx.data to establish connections to the following types of databases:

  • Teradata
  • Delta Lake
  • Elasticsearch
  • SingleStoreDB
  • Snowflake

For more information, see Adding a database.

AWS EMR for Spark

You can now run Spark applications from Amazon Web Services Elastic MapReduce (AWS EMR) to achieve the watsonx.data Spark use cases:

  • Data ingestion
  • Data querying
  • Table maintenance

For more information, see Using AWS EMR for Spark use case.

7 July 2023 - Version 1.0.0

watsonx.data is a new open architecture that combines the elements of the data warehouse and data lake models. The best-in-class features and optimizations available in watsonx.data make it an optimal choice for next-generation data analytics and automation. In the first release (watsonx.data 1.0.0), the following features are supported:

  • Creating, scaling, pausing, resuming, and deleting the Presto (Java) query engine
  • Associating and dissociating a catalog with an engine
  • Exploring catalog objects
  • Adding and deleting a database-catalog pair
  • Updating database credentials
  • Adding and deleting a bucket-catalog pair
  • Exploring bucket objects
  • Loading data
  • Exploring data
  • Querying data
  • Query history