Release notes for watsonx.data

Use these release notes to learn about the latest updates to IBM® watsonx.data. The updates are grouped by date.

13 December 2024 - Version 2.1.0

watsonx.data version 2.1.0 is being released to geographic regions in stages and is not yet available in all regions. To find out whether the 2.1.0 release is available in your region, contact IBM Support. If you are currently using watsonx.data version 2.0.0, refer to the watsonx.data 2.0.0 documentation.

Data sources and storage enhancements

This release includes the following new data sources and storage enhancements:

  • You can now connect to Apache Phoenix data sources. For more information, see Apache Phoenix.

  • If you work with MySQL data sources, now you can manage drivers in the Driver manager section of the Configurations page. Each of these drivers goes through a series of validation steps. You can no longer test MySQL connections. For more information, see MySQL.

When you upgrade to version 2.1.0, any existing MySQL catalog is no longer linked to the engine. This means that you need to reestablish the connection between the MySQL catalog and the engine.

  • The test connection feature is now available for the following data sources that are supported by the Arrow Flight service:

    • Apache Derby
    • Salesforce
    • Greenplum
    • MariaDB
  • You can now test connections for Azure Data Lake Storage (ADLS) and IBM Data Virtualization Manager for z/OS data sources.

Integration enhancements

This release of watsonx.data introduces the following new or enhanced integrations with other services:

  • You can now enable Databand connection from the Configurations page. For more information, see Monitoring Spark application runs by using Databand.

  • You can now retrieve the Presto connection information from the watsonx.data instance > Configurations > Connection information page for the following integrations:

    • BI tools
    • DataBuildTool (dbt)

For more information, see Data visualization in watsonx.data with BI tools.

  • You can now integrate IBM Manta Data Lineage with watsonx.data to capture and publish jobs, runs, and dataset events from Spark through the Manta UI. For more information, see IBM Manta Data Lineage.

  • You can now use all of the Presto data types with the dbt adapter for Presto. Specify the data type as column_types in the dbt_project.yml. For more information, see Installing and using dbt-watsonx-presto.

  • You can now use the Birdwatcher debugging tool to check the state of the Milvus system. For more information, see Birdwatcher debugging tool.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

Query history information by using the ibm-lh utility

You can get the following query history information by using the ibm-lh utility:

  • Basic query information.
  • Basic error information of failed queries.
  • Query stats information.
  • Query memory information.
  • Query garbage collection information.
  • Top time taken query.
  • Memory usage details of queries.
  • Information after joining the two tables.
  • Information containing all the columns of a table.
  • Information about the errors in the query.
  • Count of all error codes.
  • Count of all failure messages.
  • Count of all failure types.

For more information, see Retrieving QHMM logs by using ibm-lh utility.

Ingestion enhancements

This release of watsonx.data introduces the following ingestion enhancements:

  • Target table preview: Before submitting an ingestion job, users can now preview the target table schema and edit the column headers and data types. This allows for validation and ensures data is ingested into the correct table structure. For more information, see Ingesting data by using Spark through the web console.

  • Java/Spark-based ingestion for table creation: The Data Manager now includes an option to create tables by using the Java/Spark-based ingestion flow, which you reach by navigating to Local ingestion. This option provides flexibility and control based on file size and other factors. For more information, see Creating table and Ingesting data by using Spark through the web console.

  • Enhanced source storage support:

    • Azure Data Lake Storage (ADLS): Support for ingesting data directly from ADLS is now available.
    • Google Cloud Storage (GCS): Support for ingesting data directly from GCS is now available.
  • Transient storage: Users can now select the external bucket to use as a staging area for local ingestions. If no storage is specified, watsonx.data can infer and select an appropriate bucket. For more information, see Ingesting data by using Spark through the web console.

Introduction to Metadata Service (MDS)

Starting with the 2.1 release, watsonx.data uses Metadata Service (MDS) instead of Hive Metastore (HMS). MDS is compatible with modern open catalog APIs, such as the Unity Catalog API and the Apache Iceberg REST Catalog API, which enables wider tool integration and increased flexibility. The new architecture delivers comparable performance and continues to support Spark and Presto clients through the existing Thrift or HMS interface. For more information, see Metadata Service (MDS) overview.

It is recommended to use MDS in your test environments and then move to using it in production.

Deprecated features

The following feature is deprecated in this release:

  • The REST API feature to capture DDL changes in watsonx.data through the event listener is deprecated as of watsonx.data version 2.1.

13 November 2024 - Version 2.0.4 Hotfix

Lite plan enhancements

This hotfix release includes the following Lite plan enhancements:

  • The Lite plan now includes a dedicated read-only sample IBM COS storage that is associated with the Presto engine to support querying sample and benchmarking data.

  • You can now work with tpcds sample worksheets for high performance use cases and Gosales sample worksheet for Data engineering and GenAI use cases.

  • Query Optimizer is now automatically enabled for High Performance BI use cases.

29 October 2024 - Version 2.0.4

Engine and service enhancements

This release includes the following engine and service enhancements:

  • The default value of the task.max-drivers-per-task property for Presto (Java) and Presto (C++) workers is now set based on the number of vCPUs.

  • You can enable the file pruning functionality in Query History Monitoring and Management (QHMM) from the Query monitoring page. You can also configure the maximum size and threshold percentage for the QHMM storage bucket. When the threshold is met during file upload or when a cleanup scheduler runs (default every 24 hours), older data is deleted. For more information, see Configuring query monitoring.

  • Query History Monitoring and Management (QHMM) no longer stores the diagnostic data in the default IBM Managed trial bucket (wxd-system). To store the diagnostic data, you must now use a storage type supported for QHMM. For more information about using your own storage, see Configuring query monitoring.

  • You can now verify query optimization status by checking the wxdQueryOptimized parameter in the JSON file. For more information, see Running queries from the Presto (C++) CLI or Query workspace.
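The QHMM file pruning described in this list (a maximum bucket size, a threshold percentage, and deletion of older data once the threshold is met) can be pictured with a small Python sketch. This is an illustration only, not the watsonx.data implementation: the `StoredFile` type, the `prune_oldest` function, and the rule "delete the oldest files until usage drops below the threshold" are all assumptions made for the example. The release note does not say how much data is removed in one pass.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    """Hypothetical record for one file in the QHMM storage bucket."""
    name: str
    size_bytes: int
    modified_at: float  # seconds since epoch; a smaller value means an older file


def prune_oldest(files, max_size_bytes, threshold_percent):
    """Return (kept, deleted) after removing the oldest files.

    Assumption: pruning starts when usage reaches the threshold and
    continues, oldest file first, until usage is below the threshold.
    """
    limit = max_size_bytes * threshold_percent / 100
    kept = sorted(files, key=lambda f: f.modified_at)  # oldest first
    deleted = []
    used = sum(f.size_bytes for f in kept)
    while kept and used >= limit:
        oldest = kept.pop(0)
        used -= oldest.size_bytes
        deleted.append(oldest)
    return kept, deleted
```

In this sketch the same function would be called from both triggers that the release note mentions: after a file upload and from the scheduled cleanup run.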

Data sources enhancements

This release includes the following data sources and storage enhancements:

  • Test connection feature is now available for the following data sources:

    • Apache Pinot
    • Cassandra
    • Prometheus
  • New data source SAP HANA is now available. You can use Driver manager under the Configurations page to manage drivers for SAP HANA data source. Each of these drivers undergoes a series of validations. For more information on SAP HANA data source and BYOJ process, see SAP HANA.

Lite plan

To enhance usability, the system catalogs (cmx and system) are now hidden for Lite plan users. The Lite plan instance with the Presto (C++) engine includes tpch as the benchmarking catalog, and the instance with the Presto (Java) engine includes tpch and tpcds as the benchmarking catalogs.

Deprecated features

The following features are deprecated in this release:

  • The REST API feature to capture DDL changes in watsonx.data through event listener is deprecated in this release and will be removed from watsonx.data with version 2.1 release.

  • Support for Apache Spark 3.3 runtime is deprecated. You must upgrade to Spark 3.4. To update the Apache Spark version, see Editing the Spark engine details.

25 September 2024 - Version 2.0.3

Data sources and storage enhancements

This release includes the following new data sources and storage enhancements:

  • You can now enable Azure Data Lake Storage Gen1 Blob and Google Cloud Storage for Milvus. For more information, see ADLS Gen1 Blob and Google Cloud Storage.

  • You can create or add a new data source to the engine without attaching a catalog to it. A catalog can be attached to the data source at a later stage.

  • You can now use Apache Ozone storage for the Presto (Java) engine. For more information, see Apache Ozone.

  • You can now configure the Apache Kafka data source to use Salted Challenge Response Authentication Mechanism (SCRAM) authentication. You can upload a self-signed certificate. For more information, see Apache Kafka.

Integration enhancements

This release of watsonx.data introduces the following new or enhanced integrations with other services:

  • You can now integrate watsonx.data with data build tool (dbt) for Spark engine for in-place data transformation within watsonx.data. For more information, see About dbt integration.

  • You can integrate watsonx.data with Databand. This integration can enhance the monitoring capabilities by providing insights that extend beyond Spark UI and Spark History. For more information, see Monitoring Spark application runs by using Databand.

  • You can integrate watsonx.data with the following Business Intelligence (BI) visualization tools to access the connected data sources and build compelling and interactive data visualizations:

    • Tableau
    • Looker
    • Domo
    • Qlik
    • PowerBI

    For more information, see About BI visualization tools.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

  • Iceberg tables are supported by Query Optimizer. For more information, see Query Optimizer.

  • You can now use the data build tool (dbt-watsonx-presto) adapter to build, test, and document data models for the Presto (Java) engine. For more information, see dbt-watsonx-presto.

  • A new customization property (file-column-names-read-as-lower-case) is now available for the Presto (C++) engine to avoid uppercase and lowercase mismatches in column names. For more information, see Catalog properties for Presto (C++).

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

  • You can now add users and user groups to define data policy rules. For more information, see Data policy.

  • Administrators can now select TPCDS and TPCH catalogs to create access control policies. ‘Select’ is the only allowed operation to define rules with these catalogs. To define data policies, see Data policy.

  • Administrators can now edit resource group configuration after creating the resource group. For more information, see Configuring Presto resource groups.

IBM Knowledge Catalog governance policies for data sources

You can now apply IBM Knowledge Catalog governance policies to the following data sources in Presto:

  • Oracle
  • PostgreSQL
  • MySQL
  • SQL Server
  • Db2

Ingestion enhancements

This release of watsonx.data includes the following improvements to the ingestion workflow:

Lite plan

You can provision your Lite plan instance based on the following three use cases. Select one use case from the list to proceed:

  • Generative AI: Explore generative AI use cases. The provisioned instance includes Presto, Milvus, and Spark.
  • High Performance BI: Explore BI visualization functionality. The provisioned instance includes Presto (C++) and Spark.
  • Data Engineering Workloads: Explore workload-driven data engineering use cases. The provisioned instance includes Presto (Java) and Spark.

For more information, see Lite plan.

27 August 2024 - Version 2.0.2

Data sources and storage enhancements

This release includes the following new data sources and storage enhancements:

  • Content Aware Storage (CAS) is now called Data Access Service (DAS).

  • Apache Hive is upgraded to version 4.0.0.

  • You can now view the DAS endpoint from the Storage details page. For more information, see Exploring storage objects.

Integration enhancements

This release of watsonx.data introduces the following new or enhanced integrations with other services:

  • You can now use the governance capabilities of IBM Knowledge Catalog for SQL views within the watsonx.data platform. For more information, see Integrating with IBM Knowledge Catalog (IKC).

  • IBM watsonx.data now supports Apache Ranger policies to govern data with Presto (C++) engines. For more information, see Apache Ranger policy.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

  • Instance administrators can now configure resource groups in Presto. For more information, see Resource groups.

  • You can now use an API to execute queries and retrieve results. For more information, see API.

  • You can now configure or change the log level of Presto (Java) through API customization. For more information, see API.

  • You can now generate Number of Distinct Values (NDV) column statistics with the Iceberg Spark Analyze procedure to enhance the Spark Cost-Based Optimizer (CBO) for improved query planning.

  • You can now use the custom data source option to connect to Black Hole and Local File connectors for the Presto (Java) engine. For more information, see Custom data source.

  • You can now generate a JSON snippet for the Presto engine and the Milvus service. You can copy and paste it into the watsonx.data Presto and Milvus connector UI in IBM Cloud Pak for Data and watsonx to simplify connection creation. For more information, see Getting connection information.

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

Ingestion enhancements

This release of watsonx.data introduces the following ingestion enhancements:

01 August 2024 - Version 2.0.1

Data sources

  • You can now connect to Db2 data sources by using IBM API key as the authentication mechanism. For more information, see IBM Db2.
  • Presto (C++) engine can now be associated with Arrow Flight service data sources. Read only operations are supported. The following Arrow Flight service data sources are supported:
    • Salesforce
    • MariaDB
    • Greenplum
    • Apache Derby

For more information, see Arrow Flight service.

  • The following new databases are available for Presto (Java) engine:

Integrations

  • When integrating IBM Knowledge Catalog with IBM watsonx.data, you can configure data protection rules for individual rows in a table, allowing users to access a subset of rows in a table. For more information, see Filtering rows.

  • You can now apply the following Apache Ranger policies for Presto (Java) engines:

  • You can now integrate IBM watsonx.data with on-premises IBM DataStage. You can use the DataStage service to load data into and read data from IBM watsonx.data. For more information, see Integrating with DataStage.

Authentication and authorization

  • The Spark access control extension allows additional authorization, enhancing security at the time of application submission. If you enable the extension in the Spark configuration, only authorized users are allowed to access and operate IBM watsonx.data catalogs through Spark jobs. For more information, see Enhancing Spark application submission using Spark access control extension.

  • IBM watsonx.data now supports object storage proxy and signature for Azure Data Lake Storage and Azure Blob Storage. For more information, see Using DAS proxy to access ADLS and ABS compatible buckets.

  • Lightweight Directory Access Protocol (LDAP) authentication is now provided for Teradata and Db2 data sources. The user needs to set up this configuration at the server level. For Teradata, explicitly choose LDAP as the authentication mechanism type in the UI. For more information, see Teradata.

The DAS proxy to access ADLS and ABS buckets and the LDAP enhancements are available as a Tech preview in version 2.0.1.

  • Milvus now supports partition-level isolation for users. Administrators can authorize specific user actions on partitions. For more information, see Service (Milvus).

Storage

  • You can now add the following storage to Presto (Java) engine in IBM watsonx.data:
    • Azure Data Lake Storage Gen2
    • Azure Data Lake Storage Gen1 Blob

For more information, see Azure Data Lake Storage Gen2 and Azure Data Lake Storage Gen1 Blob.

  • You can modify the access key and secret key of a user-registered bucket for a storage. This feature is not applicable to default buckets, ADLS, or Google Cloud Storage. This feature can only be used if the new credentials successfully pass the test connection.

Engines

  • You can now use the ALTER TABLE ADD, DROP, and RENAME column statements for MongoDB data source.
  • You can now configure how Presto handles unsupported data types. For more information, see ignore-unsupported-datatypes.

Catalogs

  • You can now associate catalogs with an engine, or disassociate them from it, in bulk through the UI under Manage associations on the Infrastructure manager page.

API Customization and properties

Infrastructure manager

  • You can use the search feature for the following values on the Infrastructure manager page:
    • database name
    • registered hostname
    • created by username
  • You can now use the ‘Do Not Disturb’ toggle switch in the Notifications section under the bell icon to enable or disable pop-up notifications.
  • You can find the connectivity information under the Connect information tile on the Configurations page. You can copy this information or download it as a JSON snippet.

Query Workspace

  • You can run queries on all tables under a schema through the SQL query workspace without specifying the path <catalog>.<schema> by selecting the required catalogs and schemas from the new drop-down list. For more information, see Running SQL queries.

watsonx.data pricing plans

  • You can now delete the existing Lite plan instance before reaching the account cap limit of 2000 RUs, and create a new instance and consume the remaining resource units available in the account. For more information, see watsonx.data Lite plan.

03 July 2024 - Version 2.0.0

New data types for data sources

The following new data types are now available for some data sources. You can access these data types on the Data manager page under the Add column option.

  • BLOB

    • Db2
    • Teradata
    • Oracle
    • MySQL
    • SingleStore
  • CLOB

    • Db2
    • Teradata
    • Oracle
  • BINARY

    • SQL Server
    • MySQL

Because the numeric data type is not supported in watsonx.data, you can use the decimal data type as an equivalent alternative to the numeric data type for Netezza data source.

You can now use the BLOB and CLOB data types with the SELECT statement in the Query workspace to build and run queries against your data for Oracle and SingleStore data sources.

You can now use the BLOB and CLOB data types for MySQL and PostgreSQL data sources as equivalents to LONGTEXT, BYTEA, and TEXT because these data types are not compatible with Presto (Java). These data types are mapped to CLOB and BLOB in Presto (Java) if data sources have existing tables with LONGTEXT, TEXT, and BYTEA data types.

  • MySQL (CLOB as equivalent to LONGTEXT)
  • PostgreSQL (CLOB as equivalent to TEXT)
  • PostgreSQL (BLOB as equivalent to BYTEA)
  • Netezza (decimal as equivalent to numeric)
  • Oracle (BLOB and CLOB with the SELECT statement)
  • SingleStore (BLOB and CLOB with the SELECT statement)
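The equivalences in the list above can be written as a small lookup table. The following Python sketch only restates that list; the dictionary name `TYPE_EQUIVALENTS` and the function `equivalent_type` are hypothetical names chosen for the example, and returning `None` for an unlisted pair is an assumption (the release note says nothing about other types).

```python
# (data source, native type) -> equivalent type to use in watsonx.data,
# as listed in the release note for version 2.0.0.
TYPE_EQUIVALENTS = {
    ("mysql", "LONGTEXT"): "CLOB",
    ("postgresql", "TEXT"): "CLOB",
    ("postgresql", "BYTEA"): "BLOB",
    ("netezza", "NUMERIC"): "DECIMAL",
}


def equivalent_type(data_source, native_type):
    """Return the listed equivalent type, or None if the pair is not listed."""
    key = (data_source.strip().lower(), native_type.strip().upper())
    return TYPE_EQUIVALENTS.get(key)


print(equivalent_type("PostgreSQL", "bytea"))
print(equivalent_type("Netezza", "numeric"))
```

A lookup like this can be handy when you review existing table definitions before querying them through Presto (Java).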

New operations for Db2 data source

You can perform the following operations for BLOB and CLOB data types for Db2 data source:

  • INSERT
  • CREATE
  • CTAS
  • ALTER
  • DROP

New Arrow Flight service based data sources

You can now use the following data sources with Arrow Flight service:

  • Greenplum
  • Salesforce
  • MariaDB
  • Apache Derby

For more information, see Arrow Flight service.

New data sources

You can now use the following data sources:

  • Cassandra
  • BigQuery
  • ClickHouse
  • Apache Pinot

For more information, see Adding a database-catalog pair.

Command to retrieve ingestion history

You can now retrieve the status of all submitted ingestion jobs by using the ibm-lh get-status --all-jobs CLI command. The command returns the history records that you have access to. For more information, see Options and parameters supported in ibm-lh tool.

Additional roles for IBM Knowledge Catalog (IKC) S2S authorization

Besides data access, IBM Knowledge Catalog S2S authorization needs metadata access and Console API access to integrate with watsonx.data. The following new roles are created for IKC service access configuration:

  • Viewer
  • Metastore viewer

Apache Ranger policies

IBM watsonx.data now supports Apache Ranger policies to allow integration with Presto engines. For more information, see Apache Ranger policy.

Version upgrade

  • Presto (Java) engine is now upgraded to version 0.286.
  • Milvus service is now upgraded to version 2.4.0. Important features include:
    • Better Performance (Low Memory Utilisation)
    • Support Sparse Data
    • Inbuilt SPLADE Engine for Sparse Vector Embedding
    • BGE M3 Hybrid (Dense+Sparse) Search

Hive Metastore (HMS) access in watsonx.data

You can now fetch metadata information for Hive Metastore by using REST APIs instead of getting the information from the engine details. HMS details are used by external entities to integrate with watsonx.data. You must have an Admin, Metastore Admin, or Metastore Viewer role to run the API.

Semantic automation for data enrichment

Semantic automation for data enrichment leverages generative AI with IBM Knowledge Catalog to understand your data on a deeper level and enhance data with automated enrichment to make it valuable for analysis. Semantic layer integration is available for Lite plan users only as a 30-day trial version. For more information, see Semantic automation for data enrichment in watsonx.data.

Query Optimizer to improve query performance

You can now use Query Optimizer to improve the performance of queries that are processed by the Presto (C++) engine. If Query Optimizer determines that optimization is feasible, the query is rewritten; otherwise, the native engine optimization takes precedence. For more information, see Query Optimizer overview.

New name for Presto engine in watsonx.data

Presto is renamed to Presto (Java).

New engine (Presto C++) in watsonx.data

You can provision a Presto (C++) engine (version 0.286) in watsonx.data to run SQL queries on your data source and fetch the queried data. For more information, see Presto (C++) overview.

Using proxy to access S3 and S3 compatible buckets

External applications and query engines can access the S3 and S3 compatible buckets managed by watsonx.data through an S3 proxy. For more information, see Using S3 proxy to access S3 and S3 compatible buckets.

Mixed case feature flag for Presto (Java) engine

The mixed case feature flag, which lets you switch between case-sensitive and case-insensitive behavior in Presto (Java), is available. The flag is set to OFF by default and can be set to ON during the deployment of watsonx.data. For more information, see Presto (Java) mixed-case support overview.

New storage type Google Cloud Storage

You can now use the new storage type, Google Cloud Storage. For more information, see Adding storage-catalog pair.

31 May 2024 - Version 1.1.5

Provision Spark engine in watsonx.data Lite plan

You can now add a small-sized Spark engine (single node) in the watsonx.data Lite plan instance. For more information, see watsonx.data Lite plan.

Updates related to Spark labs

  • Working with Jupyter Notebooks from Spark labs: You can now install the Jupyter extension from the VS Code Marketplace inside your Spark lab and work with Jupyter Notebooks. For more information, see Create Jupyter Notebooks.

  • Accessing Spark UI from Spark labs: You can now access the Spark user interface (UI) from Spark labs to monitor various aspects of running a Spark application. For more information, see Accessing Spark UI from Spark labs.

New region to provision for IBM Cloud instance

You can now provision your IBM Cloud instance in the Sydney region.

30 Apr 2024 - Version 1.1.4

A new version of watsonx.data was released in April 2024.

This release includes the following features and updates:

Kerberos authentication for HDFS connections

You can now enable Kerberos authentication for secure Apache Hadoop Distributed File System (HDFS) connections. For more information, see HDFS.

New data sources

The following new data sources are now available:

  • Oracle
  • Amazon Redshift
  • Informix
  • Prometheus

For more information, see Data sources.

Test SSL connections

You can now test SSL connections for the MongoDB and SingleStore data sources.

Uploading description files for Apache Kafka data source

The Apache Kafka data source stores data as byte messages that producers and consumers must interpret. To query this data, consumers must first map it into columns. Now, you can upload topic description files that convert raw data into a table format. Each file must be a JSON file that contains a definition for a table. To upload these JSON files from the UI, go to the overview page of the Apache Kafka database that you registered and select the Add topic option. For more information, see Apache Kafka.
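As an illustration of what such a topic description file might contain, the following Python sketch builds one as a dictionary and serializes it to JSON. The key names (`tableName`, `schemaName`, `topicName`, `message`, `dataFormat`, `fields`, `name`, `mapping`, `type`) follow the topic description format of the open-source Presto Kafka connector; that watsonx.data accepts exactly this layout is an assumption, and the table, topic, and column names are made up for the example.

```python
import json


def build_topic_description(table_name, schema_name, topic_name, columns):
    """Build a table definition for one Kafka topic.

    columns is a list of (column name, field in the JSON message, SQL type).
    The layout mirrors the open-source Presto Kafka connector format, which
    is assumed (not confirmed by the release note) to apply here.
    """
    return {
        "tableName": table_name,
        "schemaName": schema_name,
        "topicName": topic_name,
        "message": {
            "dataFormat": "json",
            "fields": [
                {"name": name, "mapping": mapping, "type": sql_type}
                for name, mapping, sql_type in columns
            ],
        },
    }


# Hypothetical topic with two columns mapped from JSON message fields.
description = build_topic_description(
    "orders",
    "default",
    "orders_topic",
    [("order_id", "order_id", "BIGINT"), ("status", "status", "VARCHAR")],
)
topic_json = json.dumps(description, indent=2)
print(topic_json)
```

You could save the printed JSON to a file and upload it with the Add topic option that is described above.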

License plans for watsonx.data

IBM® watsonx.data now offers the following license plans.

  • Lite plan
  • Enterprise plan

For more information about the different license plans, see IBM® watsonx.data pricing plans.

Presto (Java) engine version upgrade

The Presto (Java) engine is now upgraded to version 0.285.1.

Pause or resume Milvus

You can now pause or resume Milvus service. Pausing your service can avoid incurring charges.

Spark is now available as a native engine

In addition to registering external Spark engines, you can now provision a native Spark engine on your IBM watsonx.data instance. With a native Spark engine, you can fully manage the Spark engine configuration, manage access to Spark engines, and view applications by using the watsonx.data UI and REST API endpoints. For more information, see Provisioning Native Spark engine.

Ingest data using native Spark Engines

You can now submit ingestion jobs using native Spark Engines. For more information, see Working with Apache Hudi catalog and Working with Delta Lake catalog.

27 Mar 2024 - Version 1.1.3

A new version of watsonx.data was released in March 2024.

This release includes the following features and updates:

New data type for some data sources

You can now use the BINARY data type with the SELECT statement in the Query workspace to build and run queries against your data for the following data sources:

  • Elasticsearch
  • SQL Server
  • MySQL

New data types BLOB and CLOB are available for MySQL, PostgreSQL, Snowflake, SQL Server, and Db2 data sources. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.

Delete data by using the DELETE FROM feature for Iceberg data sources

You can now delete data from tables in Iceberg data sources by using the DELETE FROM feature.

You can specify the table property delete mode for new tables by using either copy-on-write mode or merge-on-read mode (default). For more information, see SQL statements.
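A minimal sketch of how the two pieces fit together, written as Python helpers that only build SQL text. The property name `delete_mode` is taken from the open-source Presto Iceberg connector and is an assumption here, because the release note does not spell out the property name. The table and column names are hypothetical, and the helpers do plain string formatting, so they are not safe for untrusted input.

```python
VALID_DELETE_MODES = ("merge-on-read", "copy-on-write")


def create_table_with_delete_mode(table, columns_sql, delete_mode="merge-on-read"):
    """Build a CREATE TABLE statement that sets the delete mode.

    'delete_mode' is the property name used by the open-source Presto
    Iceberg connector (assumed to apply to watsonx.data as well).
    """
    if delete_mode not in VALID_DELETE_MODES:
        raise ValueError(f"unsupported delete mode: {delete_mode}")
    return (
        f"CREATE TABLE {table} ({columns_sql}) "
        f"WITH (delete_mode = '{delete_mode}')"
    )


def delete_from(table, predicate):
    """Build a DELETE FROM statement for the given table and predicate."""
    return f"DELETE FROM {table} WHERE {predicate}"


# Hypothetical catalog, schema, and table names.
print(create_table_with_delete_mode(
    "iceberg_data.sales.orders", "id BIGINT, status VARCHAR", "copy-on-write"))
print(delete_from("iceberg_data.sales.orders", "status = 'cancelled'"))
```

As a general property of the Iceberg format, copy-on-write rewrites the affected data files at delete time, while merge-on-read records the deletions separately and applies them when the table is read.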

ALTER VIEW statement for Iceberg data source

You can now use the following SQL statement in the Query workspace to build and run queries against your data for ALTER VIEW:

ALTER VIEW name RENAME TO new_name

Upload SSL certificates for Netezza Performance Server data sources

You can now browse and upload the SSL certificate for SSL connections in Netezza Performance Server data sources. The valid file formats for SSL certificate are .pem, .crt, and .cer. You can upload SSL certificates by using the Adding a database-catalog pair option in the Infrastructure manager.

Query data from Db2 and Watson Query

You can now query nicknames that are created in Db2 and virtualized tables from Watson Query instances.

SSL connection for IBM Data Virtualization Manager for z/OS data source

You can now enable SSL connection for the IBM Data Virtualization Manager for z/OS data source by using the Add database user interface to secure and encrypt the database connection. Select Validate certificate to validate whether the SSL certificate that is returned by the host is trusted. You can choose to provide the hostname in the SSL certificate.

Use data from Apache Hudi catalog

You can now connect to and use data from Apache Hudi catalog.

Add Milvus as a service in watsonx.data

You can now provision Milvus as a service in watsonx.data with the following features:

  • Provision different storage variants such as starter, medium, and large nodes.

  • Assign Admin or User roles for Milvus users: User access policy is now available for Milvus users. Using the Access Control UI, you can assign Admin or User roles for Milvus users and also grant, revoke, or update the privilege.

  • Configure the Object storage for Milvus to store data. You can add or configure a custom bucket and specify the username, password, region, and bucket URL.

For more information, see Milvus.

Load data in batch by using the ibm-lh ingestion tool

You can now use the ibm-lh ingestion tool to run batch ingestion procedures in non-interactive mode (from outside the ibm-lh-tools container), by using the ibm-lh-client package. For more information, see ibm-lh commands and usage.

Creating schema by using bulk ingestion in web console

You can now create a schema by using the bulk ingestion process in the web console, if the schema is not previously created.

Use time-travel queries in Apache Iceberg tables

You can now run the following time-travel queries by using branches and tags in Apache Iceberg table snapshots:

  • SELECT * FROM <table name> FOR VERSION AS OF 'historical-tag'

  • SELECT * FROM <table name> FOR VERSION AS OF 'test-branch'
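If you generate such queries from a script, a small helper keeps the quoting consistent. The following Python sketch only builds the query text shown above; the function name `time_travel_query` and the example table name are hypothetical, and the table name is inserted as-is, so pass only trusted identifiers.

```python
def time_travel_query(table, ref_name):
    """Build a time-travel SELECT against an Iceberg branch or tag.

    ref_name is the branch or tag name; single quotes inside it are
    doubled so that the SQL string literal stays well formed.
    """
    escaped = ref_name.replace("'", "''")
    return f"SELECT * FROM {table} FOR VERSION AS OF '{escaped}'"


# Hypothetical table name; the tag and branch names come from the examples above.
print(time_travel_query("iceberg_data.sales.orders", "historical-tag"))
print(time_travel_query("iceberg_data.sales.orders", "test-branch"))
```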

Access Cloud Object Storage without credentials

You can now access your Cloud Object Storage bucket without credentials by using the Data Access Service (DAS) endpoint. For more information about getting the DAS endpoint, see Getting DAS endpoint.

28 Feb 2024 - Version 1.1.2

A new version of watsonx.data was released in February 2024.

This release includes the following features and updates:

SSL connection for data sources

You can now enable SSL connection for the following data sources by using the Add database user interface to secure and encrypt the database connection:

  • Db2

  • PostgreSQL

For more information, see Adding a database.

Secure ingestion job history

Now, users can view only their own ingestion job history. Administrators can view the ingestion job history for all users.

SQL enhancements

You can now use the following SQL statements in the Query workspace to build and run queries against your data:

  • Apache Iceberg data sources
    • CREATE VIEW
    • DROP VIEW
  • MongoDB data sources
    • DELETE

New data types BLOB and CLOB for Teradata data source

New data types BLOB and CLOB are available for the Teradata data source. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.

Create a new table during data ingestion

Previously, a target table had to exist in watsonx.data before you could ingest data. Now, you can create a new table directly from the source data file (available in Parquet or CSV format) by using data ingestion from the Data Manager. You can create the table by using the following methods of ingestion:

  • Ingesting data by using Iceberg copy loader.

  • Ingesting data by using Spark.

Perform ALTER TABLE operations on a column

With an Iceberg data source, you can now perform ALTER TABLE operations on a column for the following data type conversions:

  • int to bigint

  • float to double

  • decimal (num1, dec_digits) to decimal (num2, dec_digits), where num2>num1.
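
The three conversions above can be checked before you issue an ALTER TABLE statement. The following is a minimal sketch of such a check; the function name `is_supported_conversion` is hypothetical, and the rules it encodes are only the three listed above (the decimal case requires the same number of decimal digits and a strictly larger precision).

```python
import re

# Matches "decimal(p, s)" with optional spaces, for example "decimal (10, 2)".
_DECIMAL = re.compile(r"^decimal\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)$")


def is_supported_conversion(old_type: str, new_type: str) -> bool:
    """Return True if old_type -> new_type is one of the conversions listed above."""
    old, new = old_type.strip().lower(), new_type.strip().lower()

    # int to bigint, float to double
    if (old, new) in {("int", "bigint"), ("float", "double")}:
        return True

    # decimal(num1, dec_digits) to decimal(num2, dec_digits), where num2 > num1
    m_old, m_new = _DECIMAL.match(old), _DECIMAL.match(new)
    if m_old and m_new:
        p_old, s_old = map(int, m_old.groups())
        p_new, s_new = map(int, m_new.groups())
        return s_old == s_new and p_new > p_old

    return False


widen_ok = is_supported_conversion("decimal (10, 2)", "decimal (12, 2)")
narrow_ok = is_supported_conversion("bigint", "int")
```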

Better query performance by using sorted files

With an Apache Iceberg data source, you can generate sorted files, which reduce the query result latency and improve the performance of Presto (Java). Data in the Iceberg table is sorted during the writing process within each file.

You can configure the order to sort the data by using the sorted_by table property. When you create the table, specify an array of one or more columns involved in sorting. To disable the feature, set the session property sorted_writing_enabled to false.
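
As an illustration of the two settings, the sketch below builds a CREATE TABLE statement that sets sorted_by and a SET SESSION statement that turns sorted writing off. The helper names, the table and column names, and the catalog name `iceberg_data` are hypothetical. Qualifying the session property with the catalog name follows the usual Presto convention for connector session properties and is an assumption here; check the syntax against your engine version.

```python
from typing import Dict, List


def create_sorted_table(table: str, columns: Dict[str, str], sort_columns: List[str]) -> str:
    """Build a CREATE TABLE statement whose sorted_by property lists sort_columns."""
    unknown = [c for c in sort_columns if c not in columns]
    if not sort_columns or unknown:
        raise ValueError(f"sort columns must be a non-empty subset of the table columns: {unknown}")
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    sort_array = ", ".join(f"'{c}'" for c in sort_columns)
    return f"CREATE TABLE {table} ({column_defs}) WITH (sorted_by = ARRAY[{sort_array}])"


def disable_sorted_writing(catalog: str) -> str:
    """Build the statement that disables sorted writing for the current session."""
    # Assumption: the session property is addressed as <catalog>.<property>.
    return f"SET SESSION {catalog}.sorted_writing_enabled = false"


create_stmt = create_sorted_table(
    "iceberg_data.sales.orders",
    {"order_id": "bigint", "order_date": "date"},
    ["order_date"],
)
disable_stmt = disable_sorted_writing("iceberg_data")

print(create_stmt)
print(disable_stmt)
```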

31 Jan 2024 - Version 1.1.1

A new version of watsonx.data was released in January 2024.

This release includes the following features and updates:

IBM Data Virtualization Manager for z/OS® connector

You can now use the new IBM Data Virtualization Manager for z/OS® connector to read and write IBM Z® data without moving, replicating, or transforming it. For more information, see Connecting to an IBM Data Virtualization Manager (DVM) data source.

Teradata connector is enabled for multiple ALTER TABLE statements

Teradata connector now supports the ALTER TABLE RENAME TO, ALTER TABLE DROP COLUMN, and ALTER TABLE RENAME COLUMN column_name TO new_column_name statements.
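
The three statement forms named above can be written out as follows. This is a minimal sketch: the helper functions and the names `teradata_cat.sales.orders`, `orders_archive`, `legacy_flag`, `cust`, and `customer_id` are hypothetical placeholders, and identifiers are inserted as given, without quoting.

```python
def rename_table(table: str, new_name: str) -> str:
    """ALTER TABLE ... RENAME TO ..."""
    return f"ALTER TABLE {table} RENAME TO {new_name}"


def drop_column(table: str, column_name: str) -> str:
    """ALTER TABLE ... DROP COLUMN ..."""
    return f"ALTER TABLE {table} DROP COLUMN {column_name}"


def rename_column(table: str, column_name: str, new_column_name: str) -> str:
    """ALTER TABLE ... RENAME COLUMN column_name TO new_column_name"""
    return f"ALTER TABLE {table} RENAME COLUMN {column_name} TO {new_column_name}"


# Hypothetical catalog, schema, table, and column names.
statements = [
    rename_table("teradata_cat.sales.orders", "teradata_cat.sales.orders_archive"),
    drop_column("teradata_cat.sales.orders", "legacy_flag"),
    rename_column("teradata_cat.sales.orders", "cust", "customer_id"),
]
for statement in statements:
    print(statement)
```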

Support for time travel queries

Iceberg connector for Presto (Java) now supports time travel queries.

The property format_version now shows the current version

The property format_version now shows the correct value (current version) when you create an Iceberg table.

29 Nov 2023 - Version 1.1.0

A new version of watsonx.data was released in November 2023.

This release includes the following features and updates:

Presto (Java) case-sensitive behavior

The Presto (Java) behavior is changed from case-insensitive to case-sensitive. Now you can provide the object names in the original case format as in the database. For more information, see Case-sensitive search configuration with Presto (Java).

Roll-back feature

You can use the Rollback feature to roll back or roll forward to any snapshot of an Iceberg table.

Capture Data Definition Language (DDL) changes

You can now capture and track the DDL changes in watsonx.data by using an event listener. For more information, see Capturing DDL changes.

Ingest data by using Spark

You can now use the IBM Analytics Engine that is powered by Apache Spark to run ingestion jobs in watsonx.data.

For more information, see Ingesting data by using Spark.

Integration with Db2 and Netezza Performance Server

You can now register Db2 or Netezza Performance Server engines in the watsonx.data console.

For more information, see Registering an engine.

New connectors

You can now use connectors in watsonx.data to establish connections to the following types of databases:

  • Teradata
  • Delta Lake
  • Elasticsearch
  • SingleStoreDB
  • Snowflake

For more information, see Adding a database.

AWS EMR for Spark

You can now run Spark applications from Amazon Web Services Elastic MapReduce (AWS EMR) for the following watsonx.data Spark use cases:

  • Data ingestion
  • Data querying
  • Table maintenance

For more information, see Using AWS EMR for Spark use case.

7 July 2023 - Version 1.0.0

watsonx.data is a new open architecture that combines elements of the data warehouse and data lake models. The features and optimizations available in watsonx.data make it well suited to next-generation data analytics and automation. The first release (watsonx.data 1.0.0) supports the following features:

  • Creating, scaling, pausing, resuming, and deleting the Presto (Java) query engine
  • Associating and dissociating a catalog with an engine
  • Exploring catalog objects
  • Adding and deleting a database-catalog pair
  • Updating database credentials
  • Adding and deleting a bucket-catalog pair
  • Exploring bucket objects
  • Loading data
  • Exploring data
  • Querying data
  • Query history