Release notes for watsonx.data

Use these release notes to learn about the latest updates to IBM® watsonx.data that are grouped by date.

For what's new in watsonx.data as a Service on IBM Cloud with gen AI experience, see Release notes for watsonx.data as a Service on IBM Cloud with gen AI experience.

For what's new in watsonx.data on-prem, see Release notes for watsonx.data.

For what's new in watsonx.data Premium Edition on-prem, see Release notes for on-prem Premium.

Technology preview features: We also offer a Technology preview section that includes features currently in preview. These features are not generally available and may change before release. To view the release notes for technology preview items, see Technology preview.

10 December 2025 - Version 2.3

Instance provisioning enhancements

This release of watsonx.data introduces the following enhancements:

You can now provision watsonx.data instances with Virtual Private Endpoint (VPE) enabled in the following new regions: Dallas (us-south), Washington DC (us-east), and Frankfurt (eu-de). To enable VPE during provisioning, add the parameter "vpe_required":"true" to the CLI command. For information on how to provision a VPE-enabled instance, see Provisioning Virtual Private Endpoint (VPE) enabled instance.
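The following Python sketch only assembles the provisioning command line; it does not run it. It assumes that the CLI command referred to above is the standard ibmcloud resource service-instance-create command, whose -p option takes a JSON parameter string. The instance name, service name, and plan shown are placeholders, not confirmed catalog values. Note that the VPE value is the string "true", as in the parameter above.

```python
import json
import shlex


def build_provision_command(instance_name, service_name, plan, region, extra_params=None):
    """Assemble an `ibmcloud resource service-instance-create` argument list.

    service_name and plan are placeholders: look up the real watsonx.data
    values in the IBM Cloud catalog before running the command.
    """
    params = {"vpe_required": "true"}  # string "true", as written in the release note
    if extra_params:
        params.update(extra_params)
    return [
        "ibmcloud", "resource", "service-instance-create",
        instance_name, service_name, plan, region,
        "-p", json.dumps(params),
    ]


argv = build_provision_command("my-wxd", "<service-name>", "<plan-name>", "us-south")
# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(argv))
```

Building the JSON with json.dumps avoids hand-escaping quotes inside the -p argument.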

Data sources and storage enhancements

This release of watsonx.data introduces the following data sources and storage enhancements:

  • You can now apply IBM Knowledge Catalog governance policies to the MongoDB data source. For more information, see Connecting to IBM Knowledge Catalog (IKC).

  • You can now associate Azure Data Lake Storage Gen2 with Presto (C++) using ServicePrincipal authentication.

  • You can now associate multiple Iceberg-type catalogs with a single object storage bucket or container. Each catalog must be configured with a unique, non-overlapping base path on the storage to ensure proper data isolation.

    For example:

    • Catalog1 can be associated with s3a://mybucket/foo/bar
    • Catalog2 can be associated with s3a://mybucket/lorem/ipsum

    This enhancement makes it easier to logically separate data within the same storage and reuse it across multiple catalogs, improving flexibility and organization. This behavior applies to all new Lite plan instances, which are now account-scoped. For more information, see Adding multiple Apache Iceberg catalogs to a single storage.

    This feature is available only for watsonx.data Lite instances. Previously, each object storage bucket or container could only be linked to a single catalog.
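The non-overlapping base path rule can be checked before you create catalogs. The following Python sketch is a hypothetical pre-flight check, not a watsonx.data tool. It treats two base paths as overlapping when they are in the same bucket and one path is a prefix of the other on whole path segments, so s3a://mybucket/foo and s3a://mybucket/foobar do not conflict. The URI scheme (s3a, s3, and so on) is ignored.

```python
from itertools import combinations
from urllib.parse import urlparse


def _segments(uri):
    """Split an object storage URI into (bucket, list of path segments)."""
    parsed = urlparse(uri)
    return parsed.netloc, [p for p in parsed.path.split("/") if p]


def paths_overlap(a, b):
    """True if both paths are in the same bucket and one contains the other."""
    bucket_a, seg_a = _segments(a)
    bucket_b, seg_b = _segments(b)
    if bucket_a != bucket_b:
        return False
    shorter = min(len(seg_a), len(seg_b))
    return seg_a[:shorter] == seg_b[:shorter]


def find_overlaps(catalog_paths):
    """Return (catalog, catalog) pairs whose base paths overlap."""
    return [
        (c1, c2)
        for (c1, p1), (c2, p2) in combinations(catalog_paths.items(), 2)
        if paths_overlap(p1, p2)
    ]


catalogs = {
    "catalog1": "s3a://mybucket/foo/bar",
    "catalog2": "s3a://mybucket/lorem/ipsum",
    "catalog3": "s3a://mybucket/foo",  # hypothetical: contains catalog1's path
}
print(find_overlaps(catalogs))
```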

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

  • Prestissimo (the Presto (C++) engine) now writes Apache Iceberg tables with better performance than the Java implementation. The following capabilities are available:

    • Partitioned table support - Prestissimo writes to partitioned Iceberg tables and applies partition transforms efficiently using batch evaluation. It supports identity, temporal transforms (year, month, day, hour), bucket, and truncate, and generates Iceberg-compliant partition directory paths.
    • Data file statistics collection - During write operations, Prestissimo collects and reports essential data file statistics to Iceberg manifest files, including record count, file size, and partition details.
    • Sorted table write support - Prestissimo supports writing sorted Iceberg tables to enable optimized query performance for workloads that benefit from sorted data.
  • Serverless Spark with flexible capacity for Enterprise plan

    On-Demand Capacity

    • In the watsonx.data Enterprise plan, the Spark engine supports a serverless model while still offering the flexibility to allocate dedicated capacity when needed.
    • Running Spark jobs on a serverless platform eliminates the need for dedicated nodes for each Spark engine.
    • The serverless Spark environment provides a shared pool of nodes with a maximum resource quota of 8 vCPUs and 32 GB memory.

    This behavior applies to all new watsonx.data instances, which are now account-scoped.

    Dedicated Capacity

    • For workloads that require higher capacity, you can provision dedicated nodes with customizable memory configurations.
    • The Spark engine creation process is now simplified: it asks only for essential details (engine name, Spark version, home bucket, and associated catalogs) and moves capacity reservation tasks to a new Capacity Management tab on the engine details page. Removing capacity configuration from the creation flow makes engine setup faster and less complex.
    • After creating an engine, you can manage VM flavors, configure node pools, and set on-demand fallback thresholds under the Capacity tab.

    This behavior applies to all new watsonx.data instances, which are now account-scoped.

    The maximum resource quota for the Enterprise plan is 256 vCPUs and 1024 GB of memory. To increase this limit, contact IBM Support.

    For more details on serverless and on-demand capacities, see Managing Spark Capacity.
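To make the partition transforms listed above concrete, the following Python sketch implements the identity, temporal (year, month, day, hour), and truncate transforms as the Apache Iceberg table specification defines them. It is not Prestissimo code, which is C++ and evaluates whole batches of rows at once. The sketch treats naive datetime values as UTC, returns the integer partition values rather than the human-readable strings Iceberg uses in partition directory paths, and omits the bucket transform because that requires a 32-bit Murmur3 hash.

```python
from datetime import date, datetime, timedelta

EPOCH_DATE = date(1970, 1, 1)
EPOCH_TS = datetime(1970, 1, 1)  # naive datetime, treated as UTC


def identity(v):
    return v


def year(v):
    """Years since 1970 (date or datetime)."""
    return v.year - 1970


def month(v):
    """Months since 1970-01."""
    return (v.year - 1970) * 12 + (v.month - 1)


def day(v):
    """Days since 1970-01-01, floored for timestamps before the epoch."""
    if isinstance(v, datetime):  # check datetime first: it subclasses date
        return (v - EPOCH_TS) // timedelta(days=1)
    return (v - EPOCH_DATE).days


def hour(v):
    """Hours since 1970-01-01T00:00, floored (timestamps only)."""
    return (v - EPOCH_TS) // timedelta(hours=1)


def truncate(width, v):
    """truncate[W]: floor integers to a multiple of W; cut strings to W characters."""
    if isinstance(v, str):
        return v[:width]
    return v - (v % width)  # Python's % is non-negative for width > 0


# Applying a transform to a column of values, row by row.
print([day(d) for d in (date(1970, 1, 2), date(2025, 12, 10))])
```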

Account-level component persistence for Lite plan instances

You can now retain account-level components such as catalogs, databases, buckets, and their metadata properties independently of individual instances. When an instance is deleted, these components remain accessible from any other instance within the same account and region. This behavior applies to all new Lite plan instances, which are now account-scoped.

Schema name reuse across Iceberg catalogs for Lite plan instances

Previously, when referencing a table using a three-part name (<catalog>.<schema>.<table>), schema names had to be unique across all catalogs within a watsonx.data instance. This restriction prevented the creation of schemas with the same name in different catalogs. This limitation is lifted for Iceberg catalogs. You can now reuse schema names across multiple Iceberg catalogs. For example:

  • myiceberg_catalog1.abcschema.mytable
  • myiceberg_catalog2.abcschema.mytable

This behavior applies to all new Lite plan instances, which are now account-scoped.

Schema names must still be unique across other catalog types such as Hive, Delta, and Hudi.
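The following Python sketch expresses the naming rule as a validation function. It is an illustration under one stated interpretation, not watsonx.data code: a schema name may be shared freely among Iceberg catalogs, but a name that appears in any non-Iceberg catalog (Hive, Delta, Hudi, and so on) must not appear in any other catalog. The catalog and schema names are hypothetical.

```python
from collections import defaultdict


def find_schema_conflicts(schemas):
    """Return schema names that break the uniqueness rule, sorted.

    schemas: iterable of (catalog_name, catalog_type, schema_name) tuples.
    """
    catalogs_by_schema = defaultdict(set)
    in_non_iceberg = set()
    for catalog, catalog_type, schema in schemas:
        catalogs_by_schema[schema].add(catalog)
        if catalog_type.lower() != "iceberg":
            in_non_iceberg.add(schema)
    # Only names used by a non-Iceberg catalog must stay unique across catalogs.
    return sorted(name for name in in_non_iceberg if len(catalogs_by_schema[name]) > 1)


schemas = [
    ("myiceberg_catalog1", "iceberg", "abcschema"),
    ("myiceberg_catalog2", "iceberg", "abcschema"),  # allowed: both are Iceberg
    ("hive_catalog", "hive", "sales"),
    ("delta_catalog", "delta", "sales"),             # conflict: non-Iceberg reuse
    ("hive_catalog", "hive", "hr"),
]
print(find_schema_conflicts(schemas))
```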

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

  • Administrators can now create context-based restriction policies that define trusted IP addresses, so that designated users can access the watsonx.data UI and API securely. This adds an extra layer of protection: only traffic from approved IP addresses can reach the user interface and API, and access attempts from IP addresses outside the defined ranges are blocked. For more information, see Securing UI Access with IP-Based Controls.

  • A new lightweight Common Policy Gateway (CPG) is now available as a downloadable plugin that integrates with any policy engine (for example, IBM Knowledge Catalog, Apache Ranger, or Collibra). For more information, see Common Policy Gateway (CPG) connector.
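Enforcement of trusted IP addresses is done by the IBM Cloud context-based restriction service, not by client code. The following Python sketch only illustrates the membership semantics: an address is allowed if it matches a single address, a CIDR subnet, or a start-end range in the trusted list. The list format and the "start-end" notation are assumptions for illustration, and the addresses come from documentation-only ranges.

```python
import ipaddress


def is_trusted(client_ip, trusted_entries):
    """Return True if client_ip matches any trusted address, subnet, or range."""
    addr = ipaddress.ip_address(client_ip)
    for entry in trusted_entries:
        if "-" in entry:  # hypothetical "start-end" range notation
            start, end = (ipaddress.ip_address(p.strip()) for p in entry.split("-"))
            # Compare only same-family addresses; mixing IPv4 and IPv6 raises TypeError.
            if start.version == addr.version and start <= addr <= end:
                return True
        else:
            network = ipaddress.ip_network(entry, strict=False)  # a bare IP becomes /32 or /128
            if network.version == addr.version and addr in network:
                return True
    return False


trusted = ["203.0.113.0/24", "198.51.100.10-198.51.100.20", "192.0.2.5"]
print(is_trusted("203.0.113.42", trusted), is_trusted("198.51.100.21", trusted))
```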

Billing enhancements

This release of watsonx.data introduces the following enhancements to the billing feature:

  • Metering of watsonx.data components now operates at the runtime level, capturing start, stop, and pause events for each runtime tied to an engine, which gives clear visibility into engine consumption and resource usage. For engines like Presto, each engine still maps to a single runtime, while Spark introduces multiple runtime subtypes (for example, Kernel, HistoryServer, and Application), each tracked individually for active and inactive hours. The user interface reflects these changes by displaying runtime-level activity bars and event history links scoped to each runtime. For more information, see Metering and usage experience. This behavior applies to all new watsonx.data instances, which are now account-scoped.
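The following Python sketch shows how active hours per runtime could be derived from start, pause, and stop events. It is a hypothetical model for understanding the event history, not the watsonx.data metering implementation. It assumes a runtime is active from a start event until the next pause or stop, and that a runtime with no closing event is counted up to a cut-off time. The runtime IDs are made up.

```python
from collections import defaultdict
from datetime import datetime


def active_hours(events, until):
    """Sum active hours per runtime from (runtime_id, event, timestamp) tuples."""
    totals = defaultdict(float)
    active_since = {}
    for runtime, event, ts in sorted(events, key=lambda e: e[2]):
        if event == "start":
            active_since.setdefault(runtime, ts)  # a repeated start does not reset the clock
        elif event in ("pause", "stop") and runtime in active_since:
            started = active_since.pop(runtime)
            totals[runtime] += (ts - started).total_seconds() / 3600
    # Runtimes that are still active are counted up to the cut-off time.
    for runtime, started in active_since.items():
        totals[runtime] += (until - started).total_seconds() / 3600
    return dict(totals)


# Hypothetical events for two runtimes of one Spark engine.
events = [
    ("kernel-1", "start", datetime(2025, 12, 10, 8, 0)),
    ("kernel-1", "pause", datetime(2025, 12, 10, 9, 30)),
    ("kernel-1", "start", datetime(2025, 12, 10, 10, 0)),
    ("kernel-1", "stop", datetime(2025, 12, 10, 11, 0)),
    ("history-server", "start", datetime(2025, 12, 10, 8, 0)),
]
print(active_hours(events, until=datetime(2025, 12, 10, 12, 0)))
```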

Lite plan enhancements

This release of watsonx.data introduces the following enhancements to the Lite plan:

Lite plan provisioning in watsonx.data is now simplified by removing support for multiple use cases. All new instances are provisioned with the default Generative AI use case. The Data Engineering and Power BI use cases are deprecated and no longer available, and the CLI provisioning method now accepts only the default Generative AI use case, ensuring a consistent and streamlined experience.

OpenTelemetry enhancement

This release of watsonx.data introduces the following observability and monitoring enhancements for the Presto (Java) engine:

  • You can now integrate OpenTelemetry with the Presto (Java) engine to monitor query execution and system performance. OpenTelemetry captures telemetry data such as traces and metrics, which you can visualize and analyze with tools like Instana, Prometheus, and Grafana. For more information, see OpenTelemetry.

  • New Instana and Grafana dashboards - You can now use Instana and Grafana dashboards for a more comprehensive view of system health and performance. For more details, see Supporting dashboards.

Query Optimizer enhancement

This release of watsonx.data introduces the following enhancements to Query Optimizer:

  • The default query rewrite timeout for Query Optimizer is now configurable. Starting with version 2.3, you can change this timeout value using the PATCH API by updating the property optplus.query-timeout-seconds. For more information, see Updating query rewrite timeout for Query Optimizer.

  • Support for Hive and Iceberg metastore registration in Query Optimizer for Lite instances of watsonx.data.

    The Query Optimizer supports distinct metastore types for Hive and Iceberg catalogs.

    Users can now register:

    • Hive catalogs using the watsonx-data-hive metastore type.
    • Iceberg catalogs using the iceberg-rest metastore type.

    This enhancement allows more granular control and compatibility with evolving metastore architectures. Registration is done using the REGISTER_EXT_METASTORE procedure with updated syntax and properties.

    Starting with this release, the legacy unified watsonx-data metastore type remains supported in the Enterprise plan but is no longer available for Lite instances. For more information, see Manually syncing Query Optimizer with metastore.
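The following Python sketch shows the general shape of a PATCH call that updates optplus.query-timeout-seconds. It builds the request with the standard library but does not send it. The endpoint URL, the bearer-token header, and the {"properties": {...}} body shape are placeholders, not the documented API; take the real values from Updating query rewrite timeout for Query Optimizer.

```python
import json
import urllib.request

PROPERTY = "optplus.query-timeout-seconds"


def build_timeout_patch(endpoint_url, bearer_token, timeout_seconds):
    """Build, but do not send, a PATCH request that sets the rewrite timeout.

    endpoint_url and the {"properties": {...}} body shape are placeholders;
    copy the real endpoint and payload from the Query Optimizer documentation.
    """
    if not isinstance(timeout_seconds, int) or timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be a positive integer")
    body = json.dumps({"properties": {PROPERTY: timeout_seconds}}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )


req = build_timeout_patch("https://example.invalid/optimizer/properties", "<token>", 120)
# urllib.request.urlopen(req) would send the request; it is not called here.
```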

Thrift over HTTP protocol support in watsonx.data Lite plan

The Metadata Service (MDS) in watsonx.data now runs the Thrift service over the HTTP protocol instead of the previous binary protocol. This change affects service endpoints and connection configurations.

Key changes:

  • The MDS endpoint changes from the Thrift protocol (thrift://) to Thrift over HTTP (https://).
  • The account_id is mandatory for all Thrift API calls made to the MDS Thrift Service over HTTP.
  • The catalog query parameter is required when invoking APIs involving the Iceberg catalog.

For Spark and Presto engines within watsonx.data, these updates are applied automatically for both new and migrated catalogs. For external engines such as Spark, Db2, and Netezza, users must manually update the connection settings to reflect the new protocol, port, and query parameter.
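For external engines that still hold a thrift:// metastore URI, the following Python sketch rewrites it to the https:// form and optionally adds the catalog query parameter. The new port is passed in because it depends on your instance, and the host name and catalog name are hypothetical. The sketch does not model how account_id is supplied, because that is not described here; check the MDS documentation for the exact mechanism.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def thrift_to_http_uri(thrift_uri, https_port, catalog=None):
    """Rewrite a legacy thrift://host:port MDS URI to the Thrift-over-HTTP form.

    https_port is a parameter because the new port depends on your instance;
    the optional catalog query parameter is added for Iceberg catalog calls.
    """
    parts = urlsplit(thrift_uri)
    if parts.scheme != "thrift":
        raise ValueError(f"expected a thrift:// URI, got {thrift_uri!r}")
    netloc = f"{parts.hostname}:{https_port}"
    query = urlencode({"catalog": catalog}) if catalog else ""
    return urlunsplit(("https", netloc, parts.path, query, ""))


print(thrift_to_http_uri("thrift://mds.example.com:9083", 443, catalog="iceberg_data"))
```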

Multi-tenant Metadata Service (MDS) enhancements

This release of watsonx.data introduces the following enhancements to MDS:

  • The AccountId header is now required for all direct calls to the MDS REST Service (Iceberg Catalog and Unity Catalog). Requests that do not include this header fail.
  • The endpoint for Iceberg operations has changed from /mds/iceberg to /api/v1/iceberg.

For more information, see API documentation.
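Client code that calls the MDS REST Service directly needs both changes: the new path prefix and the AccountId header. The following Python sketch applies them to an existing request path and header set. It assumes that the part of the path after the prefix is unchanged; the sample path suffix and account ID are hypothetical.

```python
LEGACY_PREFIX = "/mds/iceberg"
NEW_PREFIX = "/api/v1/iceberg"


def migrate_mds_request(path, headers, account_id):
    """Return (path, headers) updated for the multi-tenant MDS REST service.

    - Rewrites the legacy /mds/iceberg prefix to /api/v1/iceberg.
    - Adds the AccountId header that direct calls now require.
    The input headers dict is not modified.
    """
    if path == LEGACY_PREFIX or path.startswith(LEGACY_PREFIX + "/"):
        path = NEW_PREFIX + path[len(LEGACY_PREFIX):]
    new_headers = dict(headers)
    new_headers["AccountId"] = account_id
    return path, new_headers


path, headers = migrate_mds_request(
    "/mds/iceberg/v1/config",  # hypothetical path suffix
    {"Authorization": "Bearer <token>"},
    "<your-account-id>",
)
print(path, headers)
```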

CPDCTL CLI enhancements

This release of watsonx.data introduces the following enhancements to IBM Cloud Pak for Data Command Line Interface (IBM cpdctl):

  • Backward compatibility is now enabled for the bucket, engine, ingestion, component, and service commands in CPDCTL.

    Starting with CPDCTL version 1.8.85, these commands can connect to watsonx.data releases earlier than version 2.2.1, ensuring smoother integration and compatibility across environments.

  • A new option of the bucket command, wx-data bucket list-objects, lists the objects in a bucket that is added to watsonx.data. For details about bucket command operations in watsonx.data, see bucket.

  • A hidden flag, --en-apikey, is now available as a workaround for edge cases where the --api-key flag fails validation in the sparkjob create and tablemaint commands. For more information, see Additional information about cpdctl wx-data command usage and examples.

Gen AI-powered chat interface in watsonx.data

You can now use watsonx.data Assistant, a gen AI-powered chat interface, to ask questions about IBM® watsonx.data. The assistant answers your questions based on IBM product documentation, helping you explore and learn about the product faster. To enable the feature and start using it, see watsonx.data Assistant - genAI powered chat interface.

Retrieval Service

This release of watsonx.data introduces the following Retrieval Service enhancements:

  • You can now configure the gpt-oss-120b AI model for Retrieval Service at the instance level in the watsonx.data console. For more information, see Configure AI model for Retrieval Service.

Ingestion enhancement

This release of watsonx.data introduces the following ingestion enhancement:

  • A new toggle in the target panel of the ingestion screen controls the delete mode for ingested Iceberg tables. Copy-on-Write (COW) is the default mode; switching to Merge-on-Read (MOR) mode enables row-level deletion during ingestion.

Deprecated features

The following features are deprecated in this release:

  • The usernames ibmlhapikey and ibmlhtoken for user authentication were deprecated in version 2.2.0 and marked for removal. Support for them is completely removed in version 2.3.0.

To authenticate, you must use the new format:

ibmlhapikey_<username>

ibmlhtoken_<username>

For more information, see Access management and governance in watsonx.data.
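The following Python sketch builds an HTTP Basic Authorization header with the new username format. It assumes a client that sends these credentials by using HTTP Basic authentication, as Presto clients do for password authentication; presumably, the ibmlhtoken_<username> form pairs with a token in the same way that ibmlhapikey_<username> pairs with an API key. The username and secrets are placeholders.

```python
import base64

USER_PREFIXES = {"apikey": "ibmlhapikey", "token": "ibmlhtoken"}


def basic_auth_header(username, secret, kind="apikey"):
    """Return an HTTP Basic Authorization header using the new username format.

    kind="apikey" pairs ibmlhapikey_<username> with an API key;
    kind="token" pairs ibmlhtoken_<username> with a token.
    """
    if not username:
        raise ValueError("a username is required: bare ibmlhapikey/ibmlhtoken users are no longer supported")
    user = f"{USER_PREFIXES[kind]}_{username}"
    credentials = base64.b64encode(f"{user}:{secret}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {credentials}"}


print(basic_auth_header("alice", "<api-key>"))
```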

Technology preview features

For this release, additional updates and enhancements are available under Technology Preview features. To review the Technology Preview updates for this release, see Technology preview 2.3.

13 November 2025 - Version 2.2.2 New Feature 1 (NF1)

The watsonx.data 2.2.2 NF1 release is rolling out to geographic regions in stages and is not yet available in all regions. To find out whether the 2.2.2 NF1 release is available in your region, contact IBM Support.

Technology preview features

For this release, additional updates and enhancements are available under Technology Preview features. To review the Technology Preview updates for this release, see Technology preview 2.2.2 NF1.

Deprecated features

The High Performance BI and Data Engineering use cases are deprecated when creating a watsonx.data Lite instance through the UI. You can still create a watsonx.data Lite instance with these use cases by using the CLI. However, these use cases will be removed from the CLI in version 2.3.1.

31 October 2025 - Version 2.2.2

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

  • Milvus in watsonx.data now supports the following external storage types for storing vector data, index files, and binary logs: Google Cloud Storage (GCS), Azure Data Lake Storage (ADLS) Gen1, and S3-compatible storage types.
  • Milvus scaling functionality is now disabled for the Starter T-shirt size. You can no longer scale from the Milvus Starter T-shirt size to any other size. Scaling back to Starter from a larger configuration is also not allowed.

Data sources and storage enhancements

This release of watsonx.data introduces the following data sources and storage enhancements:

  • You can now apply IBM Knowledge Catalog governance policies to the Teradata data source. For more information, see Connecting to IBM Knowledge Catalog (IKC).
  • You can now create a storage in an active state without associating it with a catalog, which removes the need for manual activation.
  • You can now enable and disable ACL features on an ACL-enabled storage in the topology view. For more information, see Disabling or enabling ACL on an ACL-enabled storage.
  • You can now use GlusterFS, a scalable distributed file system, as a supported storage backend for MinIO. For more information, see Setting up GlusterFS replicated storage with MinIO.
  • You can now configure any S3-compatible object storage in watsonx.data by using the Custom S3 Storage option. For more information, see Custom S3 Storage.
  • You can now update credentials for Azure Data Lake Storage (ADLS) and Google Cloud Storage.
  • You can now choose to save connection details either in the instance console database or in the default catalog within the data platform for the following data sources:
    • IBM Db2
    • IBM Netezza
    • MySQL
    • Oracle
    • PostgreSQL
    • Snowflake
    • SQL Server

Delta Lake catalogs now available with Spark access control extension

You can now use Delta Lake catalogs with the Spark access control extension, enabling enhanced security during Spark application submissions. The extension adds authorization checks so that only authorized users can access and operate on watsonx.data catalogs through Spark jobs. For more information, see Enhancing Spark application submission using Spark access control extension.

Customize your Spark application payload

When you submit a Spark application in watsonx.data, you can customize the application payload to include the following features:

  • Idempotency keys: Ensures that application submissions are processed only once, even in cases of client-server communication failures.
  • Maximum runtime controls: Defines a maximum execution time for Spark applications. If the timeout is not specified, jobs continue to run until completion, regardless of how long they take.

For more information, see Customizing parameters for Spark application submission.
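The value of an idempotency key comes from reusing the same key when a submission is retried. The following Python sketch simulates that pattern with an in-memory stand-in for the submission endpoint: a retry with the same key returns the original application ID instead of starting a second application. The class, the payload field names (including max_runtime_seconds), and the application path are hypothetical; the real parameter names are in Customizing parameters for Spark application submission.

```python
import uuid


class FakeSparkSubmitter:
    """Hypothetical in-memory stand-in for an endpoint that honours idempotency keys."""

    def __init__(self):
        self._by_key = {}

    def submit(self, payload, idempotency_key):
        # A retried request with a known key returns the original application ID.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        app_id = f"app-{uuid.uuid4()}"
        self._by_key[idempotency_key] = app_id
        return app_id


payload = {
    "application": "s3a://my-bucket/jobs/etl.py",  # hypothetical application path
    "max_runtime_seconds": 3600,  # hypothetical name for the maximum runtime control
}
key = str(uuid.uuid4())  # generate once per logical submission
submitter = FakeSparkSubmitter()
first = submitter.submit(payload, key)
retry = submitter.submit(payload, key)  # for example, after a network timeout
print(first == retry)
```

The key is generated once, before the first attempt, so every retry of that submission carries the same key.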

Common Policy Gateway (CPG)

Common Policy Gateway (CPG) provisioning is now optional and reversible. You can create a watsonx.data instance without automatically provisioning CPG, unless a policy engine is explicitly required. If you later need a policy engine such as Ranger or IKC, you can provision CPG at that time. For more information, see Enabling or disabling common policy gateway engines.

CPDCTL CLI enhancements

This release of watsonx.data introduces the following enhancements to IBM Cloud Pak for Data Command Line Interface (IBM cpdctl):

  • Use the new access-control command group to manage access policies for resources in your watsonx.data instance, including viewing, updating, and revoking access for users and groups. For more information, see access-control.

Data manager enhancements

Users can now create schemas with custom paths to view and sync data at a more granular level. With this new feature, users can synchronize only a specific directory (for example, /test1 or /test1/schema1) to retrieve tables under that path, instead of syncing the entire catalog. This targeted sync capability improves performance and precision in data management.

Integration enhancements

IBM watsonx.data now supports column-level lineage tracking for Presto by integrating with Manta. With this enhancement, users can now explore detailed column dependencies, relationships, and metadata changes, enabling deeper insights into data flows and improving traceability across pipelines.

Deprecated features

The IBM Client package (ibm-lh-client) is deprecated, and its installation and support will not be available starting with the 2.3.0 release of watsonx.data. The utilities and commands in the Client package are replaced by IBM CPDCTL CLI, and you are encouraged to migrate to CPDCTL. For more information about how to use IBM CPDCTL CLI, see IBM cpdctl.

Use the following available tools for equivalent functionalities of the Client package:

  • python-run / dev-sandbox – Use the standard Python environment to develop and run Spark scripts.
  • presto-run / Presto CLI – Use the official Presto CLI to run SQL queries against watsonx.data.
  • cert-mgmt – Use the JVM keytool to manage HTTPS certificates.

23 September 2025 - Version 2.2.1 New Functionalities Introduced (NFI)

For the release notes for version 2.2.1 NFI of watsonx.data as a Service on IBM Cloud with the generative AI experience, see IBM watsonx.data as a Service version 2.2.1 New Functionalities Introduced (NFI).

Metadata Service enhancement

The Metadata Service (MDS) in watsonx.data now supports issuing vended credentials through the Iceberg and Unity REST APIs. By requesting temporary credentials, external metadata consumers can securely access data in object storage without managing long-lived access keys.

Vended credentials support in watsonx.data is available only for Amazon S3, Google Cloud Storage (GCS), and Azure Data Lake Storage (ADLS) storage. To enable vended credentials support for Amazon S3 storage, specify the Role ARN (Amazon Resource Name) field when you register the S3 component in watsonx.data. For more details, see Adding Amazon S3 storages.
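In the Apache Iceberg REST catalog specification, a client asks for vended credentials by sending the X-Iceberg-Access-Delegation header with the value vended-credentials when it loads a table. The following Python sketch builds the URL and headers for such a loadTable call. The catalog base URL, namespace, table, and token are placeholders, and the sketch ignores multi-level namespaces and the optional catalog prefix path segment that the specification allows.

```python
from urllib.parse import quote


def load_table_request(catalog_base_url, namespace, table, bearer_token):
    """Build the URL and headers for an Iceberg REST loadTable call that asks
    the catalog to vend temporary storage credentials.

    catalog_base_url is a placeholder for your MDS Iceberg REST endpoint.
    """
    url = (
        f"{catalog_base_url.rstrip('/')}/v1/namespaces/"
        f"{quote(namespace, safe='')}/tables/{quote(table, safe='')}"
    )
    headers = {
        "Authorization": f"Bearer {bearer_token}",
        # Iceberg REST header that requests vended credentials in the response.
        "X-Iceberg-Access-Delegation": "vended-credentials",
    }
    return url, headers


url, headers = load_table_request("https://example.invalid/api/v1/iceberg/", "sales", "orders", "<token>")
print(url)
```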

11 September 2025 - Version 2.2.1

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

  • Introduced version v3 of the watsonx.data API. You can continue to use version v2 until watsonx.data version 2.3. See API documentation (v3).
  • You can now provision watsonx.data Spark engine with the Spark runtime set to Spark 4.0, which enables you to run Spark applications on Spark 4.0. For details about supported Spark versions, see Supported Spark version.
  • The Milvus service in watsonx.data is now upgraded to version 2.5.12.
  • You can now use the open-source Milvus backup tool to back up and restore data from Milvus within watsonx.data.
  • The Gluten accelerated Spark engine in watsonx.data is now able to run applications using Spark version 3.5. For details about supported Spark versions, see Supported Spark version.
  • You can now use the Vector Transport Service (VTS) with Milvus in watsonx.data to migrate or manage vector data across systems. For more details, see Using the Vector Transport Service.

Query Optimizer enhancement

You can now monitor query performance improvements through the optimizer dashboard. The optimizer is actively managing query plans for the associated catalogs and improving performance for Presto (C++) engines. For more details, see Managing statistical updates from Optimizer dashboard.

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

  • Privilege management for a Milvus service in watsonx.data now includes the following global privileges:

    • DescribeDatabase – Provides detailed information about the specified database.

    • AlterDatabase – Modifies the properties of an existing database.

For more details about managing user access in Milvus, see Predefined roles and permissions in watsonx.data.

CPDCTL CLI enhancements

This release of watsonx.data introduces the following enhancements to IBM Cloud Pak for Data Command Line Interface (IBM cpdctl):

  • Starting from CPDCTL version 1.8.25, compatibility is limited to watsonx.data version 2.2.1 and above. This change is due to the deprecation of v2 API support as part of the major upgrade to v3 APIs. For users on older CPDCTL versions, refer to the CPDCTL release archive.

    Some commands might have changed due to updates in the API specification. Use the --help option to review and adapt to the latest command syntax.

  • Starting with watsonx.data version 2.2.1, you can use HashiCorp Vault through cpdctl for secure secrets management and streamlined automation workflows.

  • A new option of the service command, wx-data service generate-engine-dump, generates dumps for Presto worker and coordinator nodes in watsonx.data. For details about the service command for serviceability-related operations in watsonx.data, see service.

  • Use the new component command to retrieve configuration details and the status of various components in watsonx.data. For details, see wx-data commands and usage.

  • Starting from CPDCTL version 1.8.5, you no longer need to set the instance ID as an environment variable. That method is deprecated and will be removed in a future release; instead, set the instance ID directly by using the profile command. For details about setting the instance ID, see config commands and usage.

Data sources and storage enhancements

You can now import catalogs and projects from the data platform for the following data sources:

  • IBM Db2
  • IBM Netezza
  • MySQL
  • Oracle
  • PostgreSQL
  • Snowflake
  • SQL Server

Semantic automation for data enrichment

watsonx.data now supports semantic search capabilities that allow users to query data using natural language, making data exploration more intuitive and efficient. For details about semantic search capabilities, see Performing semantic searches in watsonx.data.

Public preview enhancements

You can now access and manage watsonx.data public preview features from the Configurations UI. Features in public preview are marked with a Preview tag, and you can enable or disable each one to explore its functionality. Each public preview feature includes a link to its detailed documentation. For more details about public preview features, see What's new in watsonx.data (Public preview).

Deprecated features

The following features are deprecated in this release:

  • watsonx.data API version v2 is now deprecated.

    watsonx.data API version v2 is completely removed from the watsonx.data developer edition starting with version 2.2.1 and will be completely removed from the watsonx.data software edition in version 2.3. You must migrate to the latest supported API version (v3) to ensure continued compatibility and access to new features.

  • The option to register external Spark engines in watsonx.data is deprecated in this release and will be removed in version 2.3. watsonx.data already includes built-in Spark engines that you can provision and use directly, including the Gluten-accelerated Spark engine (Provisioning Gluten accelerated Spark engine) and the native watsonx.data Spark engine (Provisioning a Spark engine).

05 August 2025 - Version 2.2.0 New Feature 1 (NF1)

Support for BLOB and CLOB data types

Support for the BLOB and CLOB data types in watsonx.data is now updated to align with the SQL standard, which Presto follows as a federated query engine.

Read support: BLOBs and CLOBs can be read from JDBC-based federated systems. When read, they are mapped as follows:

  • BLOB to VARBINARY
  • CLOB to VARCHAR

Write support: Writing BLOB and CLOB data is also supported. The data is written as follows:

  • VARBINARY for binary data
  • VARCHAR for character data

Create table support: You cannot use BLOB or CLOB as column types when creating new tables. Only VARBINARY and VARCHAR are supported for such use cases.
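If you generate DDL from a source schema that contains BLOB or CLOB columns, the type mapping above can be applied before issuing CREATE TABLE. The following Python sketch is a hypothetical helper, not part of watsonx.data. It matches only the bare type names BLOB and CLOB (not forms with a length such as BLOB(1M)), and the table and column names are made up.

```python
# Presto types that BLOB and CLOB values are read and written as.
LOB_TO_PRESTO = {"BLOB": "VARBINARY", "CLOB": "VARCHAR"}


def create_table_columns(columns):
    """Replace bare BLOB/CLOB types in (name, type) pairs with VARBINARY/VARCHAR."""
    return [
        (name, LOB_TO_PRESTO.get(col_type.strip().upper(), col_type))
        for name, col_type in columns
    ]


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement that uses only Presto-supported column types."""
    cols = ", ".join(f"{name} {col_type}" for name, col_type in create_table_columns(columns))
    return f"CREATE TABLE {table} ({cols})"


print(create_table_sql("iceberg_data.docs.files", [("id", "BIGINT"), ("body", "clob"), ("payload", "BLOB")]))
```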

Engine version upgrade

The Presto (Java) and Presto (C++) engines are now upgraded to version 0.294.

Connecting to watsonx BI

You can now connect watsonx.data with IBM watsonx BI to directly access data available in different data sources, making it easier for data scientists and data analysts to use the data. For information about connecting to watsonx BI, see Integrating with watsonx BI.

Lite plan enhancement

This release of watsonx.data introduces the following Lite plan enhancements:

  • Serverless Spark engine for Lite plan: The Spark engine in a watsonx.data Lite plan instance operates in a serverless model. You can now run Spark jobs on a serverless platform, which eliminates the need for dedicated nodes for each Spark engine. Serverless Spark allows a maximum resource quota of 8 vCPUs and 32 GB of memory, drawn from a shared pool of nodes. The Spark runtimes are scheduled on any available node in the data plane rather than on a dedicated node. For information about how to provision a Lite plan instance and to create a Spark engine in it, see Provisioning a serverless Spark engine for Lite plan.

  • A new Lite size configuration is introduced for the Presto (Java) engine, offering a single-node deployment setup for experimentation and early-stage development purposes. The Lite Presto (Java) engine is available only in watsonx.data Lite plan instances. For more information, see Provisioning a Presto (Java) engine.
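Before submitting a job to serverless Spark, you can estimate whether its driver and executors fit within the 8 vCPU and 32 GB quota. The following Python sketch adds up the standard Spark properties spark.driver.cores, spark.driver.memory, spark.executor.instances, spark.executor.cores, and spark.executor.memory. It is an approximation: it ignores memory overhead and any platform-side sizing, and it requires all five properties to be set explicitly rather than guessing platform defaults. Other limits can be checked by passing different quota values.

```python
import re


def memory_gb(value):
    """Convert a Spark memory string such as '4g', '512m', or '2gb' to GB.

    A bare number is read as MiB, which is how Spark interprets
    spark.driver.memory and spark.executor.memory without a unit.
    """
    match = re.fullmatch(r"(\d+)([kmgt]b?)?", value.strip().lower())
    if not match:
        raise ValueError(f"unrecognised memory value: {value!r}")
    number = int(match.group(1))
    unit = (match.group(2) or "m")[0]
    factor = {"k": 1 / 1024**2, "m": 1 / 1024, "g": 1, "t": 1024}[unit]
    return number * factor


def fits_quota(conf, vcpu_quota=8, memory_gb_quota=32):
    """Return (fits, total_vcpus, total_memory_gb) for driver plus executors."""
    executors = int(conf["spark.executor.instances"])
    vcpus = int(conf["spark.driver.cores"]) + executors * int(conf["spark.executor.cores"])
    memory = memory_gb(conf["spark.driver.memory"]) + executors * memory_gb(conf["spark.executor.memory"])
    return vcpus <= vcpu_quota and memory <= memory_gb_quota, vcpus, memory


conf = {
    "spark.driver.cores": "1",
    "spark.driver.memory": "4g",
    "spark.executor.instances": "2",
    "spark.executor.cores": "2",
    "spark.executor.memory": "8g",
}
print(fits_quota(conf))
```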

Ingestion enhancement

This release of watsonx.data includes the following ingestion enhancement:

The .txt file format is now accepted for data ingestion, so you can upload plain text files alongside the existing supported formats.

Service enhancements

You can now configure query timeout by using two settings: Maximum query execution time and Query client timeout. For more information, see Managing user settings in watsonx.data: Session timeout, Query timeout, and Login message settings.

11 July 2025

A new version of watsonx.data was released on 11 July, 2025 with the following change:

New region availability

watsonx.data on AWS is now available in the Mumbai region.

07 July 2025 - Version 2.2.0 Hotfix 1

Version 2.2 hotfix of watsonx.data was released on 07 July, 2025. This release includes security updates and fixes.

11 June 2025 - Version 2.2.0

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

  • Introduced new API versions for connecting to a Milvus service by using a proxy host route. For more information, see Connecting to Milvus service.
  • For the Presto (C++) engines, the Hive and Iceberg catalogs are now enabled with region configuration. For more information, see Provisioning a Presto (C++) engine.
  • New Gluten accelerated Spark engine: You can now provision a Gluten accelerated Spark engine and use it to run complex analytical workloads, combining the scalability of the Spark SQL framework with the performance of native libraries. For information about working with the new Gluten accelerated Spark engine, see Working with Gluten accelerated Spark engine.
  • Run faster workspace queries by using a Spark job to transform Iceberg table data: To speed up reading Iceberg tables, you can now use a Spark job to transform Iceberg table data from Merge-on-Read (MOR) format to Copy-on-Write (COW) format. For more information, see Submitting Spark jobs for MoR to CoW conversion.
  • You can use the Spark API to configure the maximum number of applications that are listed and the criteria for filtering Spark applications.

CPDCTL CLI enhancements

This release of watsonx.data introduces the following enhancements to IBM Cloud Pak for Data Command Line Interface (IBM cpdctl):

  • You can use the tablemaint command to execute different Iceberg table maintenance operations in watsonx.data.

  • You can use the wx-data service command to perform various serviceability related operations, such as listing tables, retrieving the list of QHMM enabled buckets, and monitoring QHMM related statistics and queries.

For more information, see IBM cpdctl.

Integration enhancements

This release of watsonx.data introduces the following enhanced integration with other services:

  • New delivery method: Deliver as a table in watsonx.data

Data products that use supported data sources can now be delivered as tables in your watsonx.data instance by using the Deliver as a table in watsonx.data method. With this method, users who have the appropriate permissions can create new tables or append to existing ones. For more information, see Integrating with Data Product Hub.

  • New delivery method: Access in watsonx.data

You can now subscribe to a data product that was created from a watsonx.data instance by using the Access in watsonx.data delivery method. This method lets consumers access watsonx.data resources directly through Data Product Hub. After delivery, consumers see details on how to access the watsonx.data instance and the specific resources that they can access. For more information, see Integrating with Data Product Hub.

Billing enhancements

This release of watsonx.data introduces the following enhancements to the billing feature:

  • Billing granularity: You can now view your billing statements in an itemized format, which offers greater granularity and transparency.
  • Billing accuracy: Billing usage is now tracked per minute, replacing the previous high-watermark method.
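The difference between the two metering methods can be illustrated with a small calculation. The following sketch assumes a hypothetical usage trace, sampled once per minute in resource units (RUs) per hour, and a simplified model in which high-watermark billing charges the peak rate for the whole period while per-minute billing charges each minute at the rate actually in use. It does not reproduce the actual watsonx.data pricing formula.

```python
def high_watermark_cost(ru_per_hour_by_minute):
    """Simplified high-watermark model: charge the peak rate for every minute."""
    minutes = len(ru_per_hour_by_minute)
    return max(ru_per_hour_by_minute) * minutes / 60


def per_minute_cost(ru_per_hour_by_minute):
    """Simplified per-minute model: charge each minute at its own rate."""
    return sum(ru_per_hour_by_minute) / 60


# Hypothetical one-hour trace: 10 minutes at 24 RU/h during a burst,
# then 50 minutes at 6 RU/h.
trace = [24] * 10 + [6] * 50

print(high_watermark_cost(trace))  # 24.0
print(per_minute_cost(trace))      # 9.0
```

In this model, a short burst no longer sets the charge for the entire period.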
Query History Monitoring and Management (QHMM) enhancement

This release of watsonx.data introduces the following QHMM enhancement:

The Query monitoring page is removed from the Quick start wizard setup and is consolidated with the Configure a bucket page. You can now enable or disable QHMM and configure the QHMM storage details directly from the updated Configure a bucket page in the Quick start wizard. For information about the updated Quick start wizard setup, see Quick start.

Data sources and storage enhancements

This release of watsonx.data includes the following storage enhancements:

You can now connect to SQL Server by using New Technology LAN Manager (NTLM) authentication and Microsoft Entra authentication. NTLM is a Windows-based challenge-response authentication method. For more information, see SQL Server.

The following storage types are now created in an active state by default:

  • IBM Cloud Object Storage
  • Amazon S3
  • IBM Storage Ceph
  • MinIO
  • Google Cloud Storage
  • Azure Data Lake Storage
  • Apache Ozone
Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

  • You can use the export functionality to download existing resource policies and import them into another environment. This keeps policies consistent across environments and simplifies migration. For information about how to use the import and export functionality, see Managing user access.
  • A catalog administrator or a user who belongs to a group with an admin role can now remove their own access to the catalog. For more information about how to remove a user from a component, see Managing user access.
  • Non-admin users now have read-only access to the Driver manager page in the Configurations section. They can view the list of active drivers and their details without consulting an administrator. For more information, see Driver manager.
Auditing and tracking enhancements

This release of watsonx.data introduces the following auditing and tracking enhancement:

The list of trackable events now includes detailed activities related to the MDS Thrift server and the MDS REST server, providing insight into how applications and users interact with these critical components. For more information, see MDS Thrift server events and MDS Rest server events.

Deprecated features

The following features are deprecated in this release:

  • The Milvus APIs that use the REST host (APIs with the /api/v1 prefix) are deprecated as of watsonx.data v2.2.

  • Azure Data Lake Storage (ADLS) Gen1 is now deprecated and will be removed in an upcoming release. You must transition to ADLS Gen2 because ADLS Gen1 is no longer available.

  • The user authentication method of using ibmlhapikey and ibmlhtoken as the username is now deprecated and will be removed in a future release. Use ibmlhapikey_<username> and ibmlhtoken_<username> instead. For more information, see Access management and governance in watsonx.data.
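For clients that still send the deprecated username, the following sketch shows how the replacement username is formed and encoded as an HTTP Basic Authorization header value (RFC 7617). It assumes that your client sends the API key as the password over HTTP Basic authentication, which is how password-based Presto clients typically authenticate; check your client's documentation. The `basic_auth_header` function and the sample values are hypothetical.

```python
import base64


def basic_auth_header(username: str, api_key: str) -> str:
    """Build a Basic Authorization header value.

    Assumption: the API key is sent as the password, and the user name
    uses the non-deprecated ibmlhapikey_<username> form.
    """
    if not username:
        raise ValueError("username must not be empty")
    user = f"ibmlhapikey_{username}"  # replaces the bare "ibmlhapikey" user
    token = base64.b64encode(f"{user}:{api_key}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Hypothetical values for illustration only.
header = basic_auth_header("jane.doe@example.com", "MY_API_KEY")
print(header.split()[0])  # Basic
```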

10 April 2025 - Version 2.1.2 Hotfix 1

Engine and service enhancements

This release of watsonx.data introduces the following service enhancement:

Introduced Tiny Milvus, a lightweight, single-node deployment of the Milvus vector database, which is tailored for experimentation and early-stage development.

Tiny Milvus provides the core Milvus experience and is designed specifically for use within the watsonx.ai platform. It serves as an entry point for vector-based AI exploration with minimal resource requirements to help ensure effective data management and analysis. It is distinct from other Milvus configurations available within watsonx.data, which support broader scalability and enterprise-grade features.

Tiny Milvus supports up to 10K vectors, making it suitable for quick trials and early experimentation without heavy infrastructure. It is not intended for production workloads.

For more information about using Tiny Milvus, see Setting up a watsonx.data Milvus vector store.

04 April 2025 - Version 2.1.2

Data sources and storage enhancements

This release of watsonx.data includes the following storage enhancement:

You can now connect to the IBM Db2 for i data source. For information about IBM Db2 for i, see IBM Db2 for i.

Connectivity enhancements

This release of watsonx.data includes the following connectivity enhancement:

You can now securely and privately connect to a watsonx.data instance by using virtual private endpoints. For information about configuring network endpoints in watsonx.data, see Setting up virtual private endpoints.

Integration enhancements

This release of watsonx.data introduces the following enhanced integrations with other services:

  • You can now define IBM Knowledge Catalog governance policies for the Presto (C++) engine when you integrate IBM Knowledge Catalog with watsonx.data. For information about connecting to IBM Knowledge Catalog (IKC), see Connecting to IBM Knowledge Catalog (IKC).
  • You can now export configuration files for a target Presto engine, based on the selected ODBC driver (Simba or CData), to establish connections with watsonx.data more easily. This saves you from manually configuring Presto engine details in PowerBI. For more information about connecting to Presto by using the Config files, see Connecting to Presto by using the Config files.
  • Integrating with Data Product Hub: You can integrate watsonx.data with Data Product Hub (DPH) to package SQL tables and queries into data products tailored for specific use cases. For details, see Integrating with Data Product Hub.
Ingestion enhancement

This release of watsonx.data includes the following Ingestion enhancement:

Ingestion jobs that use an external Spark engine now provide logs within watsonx.data, so you can identify and troubleshoot job execution issues directly in your watsonx.data on cloud (SaaS) instance. For details about the ingestion procedure, see Ingesting data by using Spark through the web console.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancement:

You can now use Azure Data Lake Storage Gen2 with the AccessKey authentication mode in the Spark engine to store your data when you submit Spark applications. For information about Azure Data Lake Storage Gen2, see Azure Data Lake Storage.

Query workspace enhancements

This release of watsonx.data introduces the following query workspace enhancement:

You now have the option to cancel one or multiple running queries. Additionally, you can remove queries from the worksheet after they are canceled or successfully completed, making it easier to keep your workspace organized. For more information, see Running SQL queries.

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

  • Administrators can now configure access for IBM Db2 and IBM Netezza. They can assign roles that allow watsonx.data users to view, edit, and administer the IBM Netezza and IBM Db2 engines. For information about the resource-level permissions, see Db2 and Netezza.
  • Administrators can now grant or revoke specific permissions to users or roles when creating and viewing their own schemas. For information about data policy rules, see Managing data policy rules.
  • DAS proxy flow, which was previously deprecated, is now removed and is no longer available in watsonx.data.
Query History Monitoring and Management (QHMM) enhancement

This release of watsonx.data introduces the following QHMM enhancements:

  • You can now select the Presto engine that is associated with a QHMM catalog when you configure query monitoring in watsonx.data. For information about configuring QHMM, see Configuring query monitoring.
  • You can now use the migration script to transfer QHMM data from the source bucket to the destination bucket in watsonx.data. For more information about using the migration script, see QHMM Shell Script usage.
CPDCTL CLI enhancements

This release of watsonx.data introduces the following enhancements to IBM Cloud Pak for Data Command Line Interface (IBM cpdctl):

  • Starting in version 2.1.2, the wx-data command is available by default. You can use it to perform operations in watsonx.data such as ingesting data and managing engines.
  • You can use the wx-data engine create and wx-data engine delete commands to provision and delete any of the engine types available in watsonx.data.
  • You can use the sparkjob command to submit, list, and get the details of a Spark application.
  • The INSTANCE_ID variable that is used to set the instance environment is replaced with WX_DATA_INSTANCE_ID.

For more information, see IBM cpdctl.

28 February 2025 - Version 2.1.1

New region availability

watsonx.data is now available in the Toronto region for Lite and Enterprise plans. To provision an instance, see Provisioning watsonx.data Lite plan and Provisioning watsonx.data Enterprise plan.

Data sources and storage enhancements

This release of watsonx.data includes the following storage enhancements:

  • You can now test connections for the following data sources and storage:

    • Apache Phoenix
    • IBM Data Virtualization Manager
    • BigQuery
    • Google Cloud Storage
  • You can now register and load external pre-existing Hudi and Delta tables on an object storage by using Register table and load table metadata APIs.

Ingestion enhancement

After an ingestion job is completed, you can now access the ingested data directly from the Ingestion History page, which streamlines your workflow and saves time.

Integration enhancements

This release of watsonx.data introduces the following enhanced integrations with other services:

  • The Connection Information page now includes:
    • Presto configuration details for dbt integration. You can copy the Presto configuration details that are required for dbt integration from this page.
    • An option to export the TDS file, which includes the Presto engine configuration details that are required for Tableau integration.

For more information, see Getting connection information.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

  • You can now create a Spark application from the Applications tab of the Spark engine details page. For more information, see Submitting Spark application from Console.
  • You can now use Spark version 3.5.4 to run applications in watsonx.data. Apache Spark 3.4.4 and Apache Spark 3.5.4 are the supported versions in watsonx.data.
  • Milvus allows the following:
    • You can now run a hybrid GroupBy search based on multiple vector columns and customize the group size when you run search queries. For more information, see Connecting watsonx Assistant to watsonx.data Milvus for custom search.
    • Milvus now supports a custom size with a capacity of 3 billion vectors of up to 1,024 dimensions.
    • Milvus now allows scaling up or down between predefined T-shirt sizes (small, medium, and large) or custom sizes. For more information, see Adding Milvus service.
  • Starting with watsonx.data version 2.1.1, Milvus 2.5.0 is supported. For more information, see Milvus.
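To put the new custom-size capacity in perspective, the following back-of-the-envelope calculation estimates the raw storage that 3 billion 1,024-dimensional vectors occupy. It assumes 32-bit floating-point vectors (4 bytes per dimension) and ignores index structures, scalar fields, and replicas, so actual Milvus resource usage is higher. The `raw_vector_bytes` helper is a hypothetical name.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of dense float vectors, excluding index and metadata overhead."""
    return num_vectors * dim * bytes_per_value


size = raw_vector_bytes(3_000_000_000, 1024)
print(size)                 # 12288000000000
print(size / 10**12, "TB")  # 12.288 TB
```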
Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

  • The Access Management Service (AMS) in watsonx.data can now use JSON Web Token (JWT) authentication for incoming requests from Presto, ensuring secure and efficient access control. For more information, see Connecting to Presto engine through Presto CLI (Remote).
  • You can now assign users and roles to infrastructure components in batches of 20. For more information, see Managing user access.
  • You can now use Apache Ranger Hadoop SQL policies to govern data with Spark engines. You can define Ranger policies when the Spark engine accesses data from Hadoop clusters. Enabling Ranger policy ensures robust data security and governance. With the Ranger policy, you can configure table authorization (L3), row-level filtering, and column masking for data. For more information, see Enabling Apache Ranger policy for resources.
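If you script bulk access assignments, you might split a long list of users into groups that respect the batch limit. The following sketch shows only the chunking logic; the `assign_batch` callback is a hypothetical placeholder for whatever UI action or API call you use, and the batch size of 20 comes from the limit described above.

```python
from typing import Callable, Iterator, List, Sequence

MAX_BATCH_SIZE = 20  # batch limit described in the release note


def batches(items: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def assign_all(users: Sequence[str], assign_batch: Callable[[List[str]], None]) -> int:
    """Call the hypothetical assign_batch callback once per batch; return the call count."""
    count = 0
    for batch in batches(users):
        assign_batch(batch)
        count += 1
    return count


users = [f"user{i}@example.com" for i in range(45)]
print(assign_all(users, lambda batch: None))  # 3
```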
CPDCTL CLI enhancements

IBM CPDCTL CLI is now used to configure and manage different operations in watsonx.data. Using the CPDCTL CLI, you can manage configuration settings, run ingestion jobs, manage engines, data sources, and storages. The following two plugins are currently used to execute these operations:

  • config - To configure watsonx.data service environment and users.

  • wx-data - To perform other operations, such as ingesting data and managing engines, in watsonx.data. For more information, see IBM cpdctl.

    watsonx.data developer edition is now enabled in IBM CPDCTL version 1.6.104 and later.

Deprecated features

The following features are deprecated in this release:

  • The Data Access Service (DAS) proxy feature is now deprecated and will be removed in a future release. You cannot use the Data Access Service (DAS) proxy feature to access object storage (S3, ADLS and ABS). If you use DAS proxy flow and face any issues, contact IBM support. For an overview of the DAS feature, see Data Access Service (DAS).

  • IBM Client package is now deprecated and will be removed in a future release. The utilities and commands in the Client package are replaced by IBM CPDCTL CLI. For more information about how to use IBM CPDCTL CLI, see IBM cpdctl.

04 February 2025 - Version 2.1.0 Hotfix 2

Lite plan enhancement
IBM® watsonx.data Lite plan is now available in the Sydney region. For more information about provisioning a Lite plan instance in the Sydney region, see Provisioning Lite plan.

10 January 2025 - Version 2.1.0 Hotfix 1

Enterprise plan enhancement
If you use IBM Cloud CLI to provision an Enterprise plan instance in the Sydney region, you must use the plan name lakehouse-enterprise-mcsp. For more information, see Provision an instance through CLI.

13 December 2024 - Version 2.1.0

Data sources and storage enhancements

This release includes the following new data sources and storage enhancements:

  • You can now connect to Apache Phoenix data sources. For more information, see Apache Phoenix.

  • If you work with MySQL data sources, you can now manage drivers in the Driver manager section of the Configurations page. Each of these drivers goes through a series of validation steps. You can no longer test MySQL connections. For more information, see MySQL.

When you upgrade to version 2.1.0, any existing MySQL catalog is no longer linked to the engine. This means that you need to reestablish the connection between the MySQL catalog and the engine.

  • Test connection feature is now available for the following data sources supported by Arrow Flight service:

    • Apache Derby
    • Salesforce
    • Greenplum
    • MariaDB
  • You can now test connections for Azure Data Lake Storage (ADLS) and the IBM Data Virtualization Manager for z/OS data source.

Integration enhancements

This release of watsonx.data introduces the following new or enhanced integrations with other services:

  • You can now enable Databand connection from the Configurations page. For more information, see Monitoring Spark application runs by using Databand.

  • You can now retrieve the Presto connection information from the watsonx.data instance > Configurations > Connection information page for the following integrations:

    • BI tools
    • DataBuildTool (dbt)
  • Starting with watsonx.data version 2.1, you can only integrate with one of the following policy engines:

    • Apache Ranger
    • IBM Knowledge Catalog (IKC)

For more information, see Connection information.

  • You can now integrate IBM Manta Data Lineage with watsonx.data to capture and publish jobs, runs, and dataset events from Spark through the Manta UI. For more information, see IBM Manta Data Lineage.

  • You can now use all of the Presto data types with the dbt adapter for Presto. Specify the data type as column_types in the dbt_project.yml. For more information, see Installing and using dbt-watsonx-presto.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

Query history information by using ibm-lh utility

You can get the following Query history information by using ibm-lh utility:

  • Basic query information.
  • Basic error information of failed queries.
  • Query stats information.
  • Query memory information.
  • Query garbage collection information.
  • Queries that took the most time.
  • Memory usage details of queries.
  • Information after joining the two tables.
  • Information containing all the columns of a table.
  • Information about the errors in the query.
  • Count of all error codes.
  • Count of all failure messages.
  • Count of all failure types.

For more information, see Retrieving QHMM logs by using ibm-lh utility.

Ingestion enhancements

This release of watsonx.data introduces the following ingestion enhancements:

  • Target table preview: Before submitting an ingestion job, users can now preview the target table schema and edit the column headers and data types. This allows for validation and ensures data is ingested into the correct table structure. For more information, see Ingesting data by using Spark through the web console.

  • Java/Spark-based ingestion for table creation: The Data Manager now includes an option to create tables with the Java/Spark-based ingestion flow by navigating to Local ingestion, which gives you flexibility and control based on file size and other factors. For more information, see Creating table and Ingesting data by using Spark through the web console.

  • Enhanced source storage support:

    • Azure Data Lake Storage (ADLS): Support for ingesting data directly from ADLS is now available.
    • Google Cloud Storage (GCS): Support for ingesting data directly from GCS is now available.
  • Transient storage: Users can now select the external bucket to use as a staging area for local ingestions. If no storage is specified, watsonx.data can infer and select an appropriate bucket. For more information, see Ingesting data by using Spark through the web console.

Introduction to Metadata Service (MDS)

Starting from the 2.1 release, watsonx.data uses Metadata Service (MDS) instead of Hive Metastore (HMS). MDS is compatible with modern, open catalog APIs, such as the Unity Catalog API and the Apache Iceberg REST Catalog API, which enables wider tool integration and increased flexibility. This new architecture delivers comparable performance and continues to support Spark and Presto clients through the existing Thrift or HMS interface. For more information, see Metadata Service (MDS) overview.

We recommend that you use MDS in your test environments first and then move to using it in production.
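Because MDS is compatible with the Apache Iceberg REST Catalog API, clients can address tables through the standard REST paths. The following sketch builds the LoadTable URL pattern from the Iceberg REST Catalog specification (`/v1/{prefix}/namespaces/{namespace}/tables/{table}`, with multi-level namespaces joined by the 0x1F unit separator). The MDS base URL and catalog prefix shown are hypothetical; see the MDS documentation for the actual endpoint of your instance.

```python
from typing import Optional, Sequence
from urllib.parse import quote


def iceberg_table_url(base_url: str, namespace: Sequence[str], table: str,
                      prefix: Optional[str] = None) -> str:
    """Build an Iceberg REST Catalog LoadTable URL.

    Namespace levels are joined with the 0x1F unit separator and
    percent-encoded, as the Iceberg REST Catalog specification describes.
    """
    ns = quote("\x1f".join(namespace), safe="")
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", ns, "tables", quote(table, safe="")]
    return "/".join(parts)


# Hypothetical MDS endpoint and catalog prefix, for illustration only.
url = iceberg_table_url("https://mds.example.com/mds/iceberg", ["sales"], "orders",
                        prefix="iceberg_data")
print(url)
```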

Deprecated features

The following feature is deprecated in this release:

  • The REST API feature to capture DDL changes in watsonx.data through the event listener is deprecated as of watsonx.data version 2.1.

13 November 2024 - Version 2.0.4 Hotfix

Lite plan enhancements

This hotfix release includes the following Lite plan enhancements:

  • The Lite plan now includes a dedicated, read-only sample IBM COS storage associated with the Presto engine to support querying sample and benchmarking data.

  • You can now work with tpcds sample worksheets for high-performance use cases and the Gosales sample worksheet for data engineering and generative AI use cases.

  • Query Optimizer is now automatically enabled for High Performance BI use cases.

29 October 2024 - Version 2.0.4

Engine and service enhancements

This release includes the following engine and service enhancements:

  • The default value of the task.max-drivers-per-task property for Presto (Java) and Presto (C++) workers is now set based on the number of vCPUs.

  • You can enable the file pruning functionality in Query History Monitoring and Management (QHMM) from the Query monitoring page. You can also configure the maximum size and threshold percentage for the QHMM storage bucket. When the threshold is met during a file upload or when the cleanup scheduler runs (by default, every 24 hours), older data is deleted. For more information, see Configuring query monitoring.

  • Query History Monitoring and Management (QHMM) no longer stores diagnostic data in the default IBM Managed trial bucket (wxd-system). To store diagnostic data, you must now use a storage type that QHMM supports. For more information about using your own storage, see Configuring query monitoring.

  • You can now verify query optimization status by checking the wxdQueryOptimized parameter in the JSON file. For more information, see Running queries from the Presto (C++) CLI or Query workspace.
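The QHMM file pruning rule described above can be sketched as follows. This is a simplified model, not the QHMM implementation: it assumes that each stored file has a size and an upload time, that the threshold is a percentage of the configured maximum size, and that the oldest files are deleted until usage drops below the threshold. The `StoredFile` type and `prune` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredFile:
    name: str
    size_bytes: int
    uploaded_at: float  # for example, a Unix timestamp


def prune(files: List[StoredFile], max_size_bytes: int, threshold_pct: float) -> List[StoredFile]:
    """Delete the oldest files while usage is at or above the threshold.

    Returns the files that are kept. Assumption: pruning stops as soon as
    usage falls below threshold_pct percent of max_size_bytes.
    """
    limit = max_size_bytes * threshold_pct / 100
    kept = sorted(files, key=lambda f: f.uploaded_at)  # oldest first
    used = sum(f.size_bytes for f in kept)
    while kept and used >= limit:
        oldest = kept.pop(0)
        used -= oldest.size_bytes
    return kept


files = [
    StoredFile("q1.json", 400, uploaded_at=1.0),
    StoredFile("q2.json", 300, uploaded_at=2.0),
    StoredFile("q3.json", 200, uploaded_at=3.0),
]
# Maximum size 1,000 bytes with an 80% threshold: 900 bytes used >= 800.
print([f.name for f in prune(files, 1000, 80)])  # ['q2.json', 'q3.json']
```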

Data sources enhancements

This release includes the following data sources and storage enhancements:

  • Test connection feature is now available for the following data sources:

    • Apache Pinot
    • Cassandra
    • Prometheus
  • The new SAP HANA data source is now available. You can use the Driver manager on the Configurations page to manage drivers for the SAP HANA data source. Each of these drivers undergoes a series of validations.

Lite plan

To enhance usability, the system catalogs (cmx and system) are now hidden for Lite plan users. A Lite plan instance with the Presto (C++) engine includes tpch as the benchmarking catalog, and an instance with the Presto (Java) engine includes tpch and tpcds as the benchmarking catalogs.

Deprecated features

The following features are deprecated in this release:

  • The REST API feature to capture DDL changes in watsonx.data through event listener is deprecated in this release and will be removed from watsonx.data with version 2.1 release.

  • Support for Apache Spark 3.3 runtime is deprecated. You must upgrade to Spark 3.4. To update the Apache Spark version, see Editing the Spark engine details.

25 September 2024 - Version 2.0.3

Data sources and storage enhancements

This release includes the following new data sources and storage enhancements:

  • You can now enable Azure Data Lake Storage Gen1 Blob and Google Cloud Storage for Milvus. For more information, see ADLS Gen1 Blob and Google Cloud Storage.

  • You can create or add a new data source to the engine without attaching a catalog to it. A catalog can be attached to the data source at a later stage.

  • You can now use Apache Ozone storage for the Presto (Java) engine. For more information, see Apache Ozone.

  • You can now configure the Apache Kafka data source to use Salted Challenge Response Authentication Mechanism (SCRAM) authentication, and you can upload a self-signed certificate. For more information, see Apache Kafka.

Integration enhancements

This release of watsonx.data introduces the following new or enhanced integrations with other services:

  • You can now integrate watsonx.data with data build tool (dbt) for Spark engine for in-place data transformation within watsonx.data. For more information, see About dbt integration.

  • You can integrate watsonx.data with Databand. This integration can enhance the monitoring capabilities by providing insights that extend beyond Spark UI and Spark History. For more information, see Monitoring Spark application runs by using Databand.

  • You can integrate watsonx.data with the following Business Intelligence (BI) visualization tools to access the connected data sources and build compelling and interactive data visualizations:

    • Tableau
    • Looker
    • Domo
    • Qlik
    • PowerBI

    For more information, see About BI visualization tools.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

  • Query Optimizer now supports Iceberg tables. For more information, see Query Optimizer.

  • You can now use the data build tool (dbt-watsonx-presto) adapter to build, test, and document data models for the Presto (Java) engine. For more information, see dbt-watsonx-presto.

  • A new customization property (file-column-names-read-as-lower-case) is now available for the Presto (C++) engine to avoid case mismatches in column names. For more information, see Catalog properties for Presto (C++).

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

  • You can now add users and user groups to define data policy rules. For more information, see Data policy.

  • Administrators can now select TPCDS and TPCH catalogs to create access control policies. ‘Select’ is the only allowed operation to define rules with these catalogs. To define data policies, see Data policy.

  • Administrators can now edit resource group configuration after creating the resource group. For more information, see Configuring Presto resource groups.

IBM Knowledge Catalog governance policies for data sources

You can now apply IBM Knowledge Catalog governance policies to the following data sources in Presto:

  • Oracle
  • PostgreSQL
  • MySQL
  • SQL Server
  • Db2
Ingestion enhancements

This release of watsonx.data includes the following improvements to the ingestion workflow:

Lite plan

You can provision your Lite plan instance based on the following three use cases. Select one use case from the list to proceed:

  • Generative AI: Explore generative AI use cases. The provisioned instance includes Presto, Milvus, and Spark.
  • High Performance BI: Explore BI visualization functionalities. The provisioned instance includes Presto (C++) and Spark.
  • Data Engineering Workloads: Explore various workload-driven use cases. The provisioned instance includes Presto (Java) and Spark.

For more information, see Lite plan.

27 August 2024 - Version 2.0.2

Data sources and storage enhancements

This release includes the following new data sources and storage enhancements:

  • Content Aware Storage (CAS) is now called Data Access Service (DAS).

  • Apache Hive is upgraded to version 4.0.0.

  • You can now view the DAS endpoint from the Storage details page. For more information, see Exploring storage objects.

Integration enhancements

This release of watsonx.data introduces the following new or enhanced integrations with other services:

  • You can now use the governance capabilities of IBM Knowledge Catalog for SQL views within the watsonx.data platform. For more information, see Integrating with IBM Knowledge Catalog (IKC).

  • IBM watsonx.data now supports Apache Ranger policies to govern data with Presto (C++) engines. For more information, see Apache Ranger policy.

Engine and service enhancements

This release of watsonx.data introduces the following engine and service enhancements:

  • Instance administrators can now configure resource groups in Presto. For more information, see Resource groups.

  • You can now use an API to execute queries and retrieve results. For more information, see API.

  • You can now configure or change the log level of Presto (Java) through API customization. For more information, see API.

  • You can now generate Number of Distinct Values (NDV) column statistics with the Iceberg Spark Analyze procedure to enhance the Spark Cost-Based Optimizer (CBO) for improved query planning.

  • You can now use the custom data source option to connect to Black Hole and Local File connectors for the Presto (Java) engine. For more information, see Custom data source.

  • You can now generate a JSON snippet for the Presto engine and Milvus service. You can copy and paste it into the watsonx.data Presto and Milvus connector UI in IBM Cloud Pak for Data and watsonx to simplify connection creation. For more information, see Getting connection information.

Access management enhancements

This release of watsonx.data introduces the following access management enhancements:

Ingestion enhancements

This release of watsonx.data introduces the following ingestion enhancements:

01 August 2024 - Version 2.0.1

Data sources

  • You can now connect to Db2 data sources by using IBM API key as the authentication mechanism. For more information, see IBM Db2.
  • The Presto (C++) engine can now be associated with Arrow Flight service data sources. Read-only operations are supported. The following Arrow Flight service data sources are supported:
    • Salesforce
    • MariaDB
    • Greenplum
    • Apache Derby

For more information, see Arrow Flight service.

  • The following new databases are available for Presto (Java) engine:

Integrations

  • When integrating IBM Knowledge Catalog with IBM watsonx.data, you can configure data protection rules for individual rows in a table, allowing users to access a subset of rows in a table. For more information, see Filtering rows.

  • You can now apply the following Apache Ranger policies for Presto (Java) engines:

  • You can now integrate IBM watsonx.data with on-premises IBM DataStage. You can use the DataStage service to load data into and read data from IBM watsonx.data. For more information, see Integrating with DataStage.

Authentication and authorization

  • The Spark access control extension adds an authorization check when applications are submitted, which enhances security. If you enable the extension in the Spark configuration, only authorized users can access and operate IBM watsonx.data catalogs through Spark jobs. For more information, see Enhancing Spark application submission using Spark access control extension.

  • IBM watsonx.data now supports object storage proxy and signature for Azure Data Lake Storage and Azure Blob Storage. For more information, see Using DAS proxy to access ADLS and ABS compatible buckets.

  • Lightweight Directory Access Protocol (LDAP) authentication is now supported for Teradata and Db2 data sources. You must set up this configuration at the server level. For Teradata, explicitly select LDAP as the authentication mechanism type in the UI. For more information, see Teradata.

DAS proxy to access ADLS and ABS buckets and LDAP enhancements are technology preview features in version 2.0.1.

  • Milvus now supports partition-level isolation for users. Administrators can authorize specific user actions on partitions. For more information, see Service (Milvus).

Storage

  • You can now add the following storage to Presto (Java) engine in IBM watsonx.data:
    • Azure Data Lake Storage Gen2
    • Azure Data Lake Storage Gen1 Blob

For more information, see Azure Data Lake Storage Gen2 and Azure Data Lake Storage Gen1 Blob.

  • You can now modify the access key and secret key of a user-registered storage bucket. This feature does not apply to default buckets, ADLS, or Google Cloud Storage. The update succeeds only if the new credentials pass the test connection.

Engines

  • You can now use the ALTER TABLE ADD, DROP, and RENAME column statements for the MongoDB data source.
  • You can now configure how Presto handles unsupported data types. For more information, see ignore-unsupported-datatypes.

Catalogs

  • You can now associate catalogs with an engine, or disassociate them, in bulk through the UI by using Manage associations on the Infrastructure manager page.

API Customization and properties

Infrastructure manager

  • You can now search for the following values on the Infrastructure manager page:
    • database name
    • registered hostname
    • created by username
  • You can now use the ‘Do Not Disturb’ toggle switch in the Notifications section under the bell icon to enable or disable pop-up notifications.
  • You can find the connectivity information in the Connect information tile on the Configurations page. You can copy this information or download it as a JSON snippet.

Query Workspace

  • You can now run queries on the tables in a schema from the SQL query workspace without specifying the <catalog>.<schema> path. Select the required catalog and schema from the new drop-down list. For more information, see Running SQL queries.

watsonx.data pricing plans

  • You can now delete an existing Lite plan instance before the account reaches its cap of 2000 resource units (RUs), then create a new instance that consumes the remaining RUs in the account. For more information, see watsonx.data Lite plan.
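Because the 2000 RU cap applies to the account rather than to a single instance, a replacement Lite instance can use only what earlier instances left over. The following minimal sketch shows that arithmetic; the function name is hypothetical, and it assumes that RUs consumed by a deleted instance still count toward the account cap, which is how the note reads but is not a documented formula.

```python
LITE_ACCOUNT_CAP_RU = 2000  # account-level cap stated in the release note


def remaining_lite_rus(consumed_per_instance: list[float]) -> float:
    """Return the RUs still available to a new Lite instance in the account.

    Assumption: RUs consumed by deleted instances still count toward the cap.
    """
    used = sum(consumed_per_instance)
    return max(LITE_ACCOUNT_CAP_RU - used, 0)


# An earlier instance consumed 1500 RUs before it was deleted.
print(remaining_lite_rus([1500]))  # 500
```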

03 July 2024 - Version 2.0.0

New data types for data sources

The following new data types are now available for some data sources. You can access these data types on the Data manager page under the Add column option.

  • BLOB

    • Db2
    • Teradata
    • Oracle
    • MySQL
    • SingleStore
  • CLOB

    • Db2
    • Teradata
    • Oracle
  • BINARY

    • SQL Server
    • MySQL

For the Netezza data source, you can use the decimal data type as an equivalent to the numeric data type, which watsonx.data does not support.

You can now use the BLOB and CLOB data types with the SELECT statement in the Query workspace to build and run queries against your data for Oracle and SingleStore data sources.

For MySQL and PostgreSQL data sources, you can now use the BLOB and CLOB data types as equivalents to LONGTEXT, TEXT, and BYTEA, which are not compatible with Presto (Java). If a data source has existing tables with LONGTEXT, TEXT, or BYTEA columns, Presto (Java) maps those columns to CLOB or BLOB. The following list summarizes the data type support described in this section:

  • MySQL (CLOB as equivalent to LONGTEXT)
  • PostgreSQL (CLOB as equivalent to TEXT)
  • PostgreSQL (BLOB as equivalent to BYTEA)
  • Netezza (decimal as equivalent to numeric)
  • Oracle (BLOB and CLOB with the SELECT statement)
  • SingleStore (BLOB and CLOB with the SELECT statement)
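If you migrate table definitions programmatically, the equivalents listed above can be kept in a small lookup table. The following sketch is illustrative only: the dictionary and the equivalent_type helper are hypothetical names, and passing unlisted types through unchanged is an assumption of the sketch rather than documented behavior.

```python
# Hypothetical lookup table that summarizes the equivalents listed above.
# Keys are (data source, source column type); values are the type to use instead.
TYPE_EQUIVALENTS: dict[tuple[str, str], str] = {
    ("mysql", "LONGTEXT"): "CLOB",
    ("postgresql", "TEXT"): "CLOB",
    ("postgresql", "BYTEA"): "BLOB",
    ("netezza", "NUMERIC"): "DECIMAL",
}


def equivalent_type(source: str, source_type: str) -> str:
    """Return the equivalent type for a source column type.

    Types without a listed equivalent are returned unchanged (upper-cased);
    that fallback is an assumption of this sketch, not documented behavior.
    """
    key = (source.lower(), source_type.upper())
    return TYPE_EQUIVALENTS.get(key, source_type.upper())


print(equivalent_type("PostgreSQL", "bytea"))  # BLOB
```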

New operations for Db2 data source

You can perform the following operations on the BLOB and CLOB data types for the Db2 data source:

  • INSERT
  • CREATE
  • CTAS
  • ALTER
  • DROP

New Arrow Flight service based data sources

You can now use the following data sources with Arrow Flight service:

  • Greenplum
  • Salesforce
  • MariaDB
  • Apache Derby

For more information, see Arrow Flight service.

New data sources

You can now use the following data sources:

  • Cassandra
  • BigQuery
  • ClickHouse
  • Apache Pinot

For more information, see Adding a database-catalog pair.

Command to retrieve ingestion history

You can now retrieve the status of all submitted ingestion jobs by using the ibm-lh get-status --all-jobs CLI command. The command returns only the history records that you have access to. For more information, see Options and parameters supported in ibm-lh tool.

Additional roles for IBM Knowledge Catalog (IKC) S2S authorization

Besides data access, IBM Knowledge Catalog S2S authorization needs metadata access and Console API access to integrate with watsonx.data. The following new roles are created for IKC service access configuration:

  • Viewer
  • Metastore viewer

Apache Ranger policies

IBM watsonx.data now supports Apache Ranger policies to allow integration with Presto engines. For more information, see Apache Ranger policy.

Version upgrade

  • Presto (Java) engine is now upgraded to version 0.286.
  • Milvus service is now upgraded to version 2.4.0. Important features include:
    • Better performance (lower memory utilization)
    • Support for sparse data
    • Built-in SPLADE engine for sparse vector embeddings
    • BGE M3 hybrid (dense and sparse) search

Hive Metastore (HMS) access in watsonx.data

You can now fetch metadata information for Hive Metastore by using REST APIs instead of getting the information from the engine details. HMS details are used by external entities to integrate with watsonx.data. You must have an Admin, Metastore Admin, or Metastore Viewer role to run the API.

Semantic automation for data enrichment

Semantic automation for data enrichment uses generative AI with IBM Knowledge Catalog to understand your data at a deeper level and to enrich it automatically, making it more valuable for analysis. Semantic layer integration is available to Lite plan users only, as a 30-day trial. For more information, see Semantic automation for data enrichment in watsonx.data.

Query Optimizer to improve query performance

You can now use Query Optimizer to improve the performance of queries that are processed by the Presto (C++) engine. If Query Optimizer determines that optimization is feasible, it rewrites the query; otherwise, the native engine optimization takes precedence. For more information, see Query Optimizer overview.

New name for Presto engine in watsonx.data

Presto is renamed to Presto (Java).

New engine (Presto C++) in watsonx.data

You can provision a Presto (C++) engine (version 0.286) in watsonx.data to run SQL queries on your data source and fetch the queried data. For more information, see Presto (C++) overview.

Using proxy to access S3 and S3 compatible buckets

External applications and query engines can access the S3 and S3 compatible buckets managed by watsonx.data through an S3 proxy. For more information, see Using S3 proxy to access S3 and S3 compatible buckets.

Mixed case feature flag for Presto (Java) engine

The mixed case feature flag, which lets you switch between case-sensitive and case-insensitive behavior in Presto (Java), is now available. The flag is set to OFF by default and can be set to ON during the deployment of watsonx.data. For more information, see Presto (Java) mixed-case support overview.

New storage type Google Cloud Storage

You can now use Google Cloud Storage as a storage type. For more information, see Adding storage-catalog pair.

31 May 2024 - Version 1.1.5

Provision Spark engine in watsonx.data Lite plan

You can now add a small-sized Spark engine (single node) in the watsonx.data Lite plan instance. For more information, see watsonx.data Lite plan.

Updates related to Spark labs

  • Working with Jupyter Notebooks from Spark labs

You can now install the Jupyter extension from the VS Code Marketplace inside your Spark lab and work with Jupyter Notebooks. For more information, see Create Jupyter Notebooks.

  • Accessing Spark UI from Spark labs

You can now access the Spark user interface (UI) from Spark labs to monitor various aspects of running a Spark application. For more information, see Accessing Spark UI from Spark labs.

New region to provision for IBM Cloud instance

You can now provision your IBM Cloud instance in the Sydney region.

30 Apr 2024 - Version 1.1.4

A new version of watsonx.data was released in April 2024.

This release includes the following features and updates:

Kerberos authentication for HDFS connections

You can now enable Kerberos authentication for secure Apache Hadoop Distributed File System (HDFS) connections. For more information, see HDFS.

New data sources

The following new data sources are now available:

  • Oracle
  • Amazon Redshift
  • Informix
  • Prometheus

For more information, see Data sources.

Test SSL connections

You can now test SSL connections for the MongoDB and SingleStore data sources.

Uploading description files for Apache Kafka data source

The Apache Kafka data source stores data as byte messages that producers and consumers must interpret. To query this data, consumers must first map it into columns. Now, you can upload topic description files that convert raw data into a table format. Each file must be a JSON file that contains a definition for a table. To upload these JSON files from the UI, go to the overview page of the Apache Kafka database that you registered and select the Add topic option. For more information, see Apache Kafka.
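The following sketch generates a topic description file for a JSON-encoded topic. It assumes that watsonx.data accepts the table definition layout of the open-source Presto Kafka connector (tableName, schemaName, topicName, and a message section whose fields carry name, type, and a slash-separated JSON mapping); verify the layout against the Apache Kafka documentation for watsonx.data before uploading. The function name and the example topic are hypothetical.

```python
import json


def topic_description(schema: str, table: str, topic: str,
                      fields: list[tuple[str, str, str]]) -> str:
    """Build a topic description JSON for a JSON-encoded Kafka topic.

    Each field is (column name, SQL type, JSON path). The layout follows the
    open-source Presto Kafka connector and is an assumption for watsonx.data.
    """
    description = {
        "tableName": table,
        "schemaName": schema,
        "topicName": topic,
        "message": {
            "dataFormat": "json",
            "fields": [
                {"name": name, "type": sql_type, "mapping": path}
                for name, sql_type, path in fields
            ],
        },
    }
    return json.dumps(description, indent=2)


if __name__ == "__main__":
    # Hypothetical topic "orders" whose messages look like
    # {"id": 42, "customer": {"name": "Ada"}}
    print(topic_description(
        "default", "orders", "orders",
        [("id", "BIGINT", "id"), ("customer_name", "VARCHAR", "customer/name")],
    ))
```

Each uploaded file defines one table, so a topic that needs several table views would need one generated file per table.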

License plans for watsonx.data

IBM® watsonx.data now offers the following license plans.

  • Lite plan
  • Enterprise plan

For more information about the different license plans, see IBM® watsonx.data pricing plans.

Presto (Java) engine version upgrade

The Presto (Java) engine is now upgraded to version 0.285.1.

Pause or resume Milvus

You can now pause or resume the Milvus service. Pausing the service helps you avoid incurring charges.

Spark is now available as a native engine

In addition to registering external Spark engines, you can now provision a native Spark engine on your IBM watsonx.data instance. With a native Spark engine, you can fully manage the Spark engine configuration, manage access to Spark engines, and view applications by using the watsonx.data UI and REST API endpoints. For more information, see Provisioning Native Spark engine.

Ingest data using native Spark Engines

You can now submit ingestion jobs by using native Spark engines. For more information, see Working with different table formats.

27 Mar 2024 - Version 1.1.3

A new version of watsonx.data was released in March 2024.

This release includes the following features and updates:

New data type for some data sources

You can now use the BINARY data type with the SELECT statement in the Query workspace to build and run queries against your data for the following data sources:

  • Elasticsearch
  • SQL Server
  • MySQL

The new BLOB and CLOB data types are available for MySQL, PostgreSQL, Snowflake, SQL Server, and Db2 data sources. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.

Delete data by using the DELETE FROM feature for Iceberg data sources

You can now delete data from tables in Iceberg data sources by using the DELETE FROM feature.

For new tables, you can set the delete mode table property to either copy-on-write or merge-on-read (the default).
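The two delete modes trade write cost against read cost. The following toy model, which is not how Iceberg stores files, illustrates the difference: copy-on-write rewrites every data file that contains a deleted row, while merge-on-read leaves the data files alone and records deletions in a separate delete file that readers apply at query time. ToyTable and the helper functions are hypothetical names, and rows are modeled as unique integer IDs for simplicity.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy model: each data file is a list of unique row IDs; each delete
    file is a set of row IDs that readers must skip."""
    data_files: list[list[int]]
    delete_files: list[set[int]] = field(default_factory=list)


def delete_copy_on_write(table: ToyTable, predicate: Callable[[int], bool]) -> None:
    # Copy-on-write: rewrite only the data files that contain matching rows,
    # dropping those rows. Readers later see plain data files.
    table.data_files = [
        [row for row in f if not predicate(row)] if any(predicate(r) for r in f) else f
        for f in table.data_files
    ]


def delete_merge_on_read(table: ToyTable, predicate: Callable[[int], bool]) -> None:
    # Merge-on-read: leave the data files untouched and record the deleted
    # rows in a new delete file. The write is cheap; readers do the merging.
    deleted = {row for f in table.data_files for row in f if predicate(row)}
    table.delete_files.append(deleted)


def scan(table: ToyTable) -> list[int]:
    # A reader merges data files with every delete file before returning rows.
    deleted = set().union(*table.delete_files)
    return [row for f in table.data_files for row in f if row not in deleted]


cow = ToyTable([[1, 2, 3], [4, 5, 6]])
mor = ToyTable([[1, 2, 3], [4, 5, 6]])
delete_copy_on_write(cow, lambda r: r % 2 == 0)
delete_merge_on_read(mor, lambda r: r % 2 == 0)
print(cow.data_files, cow.delete_files)  # [[1, 3], [5]] []
print(mor.data_files, mor.delete_files)  # [[1, 2, 3], [4, 5, 6]] [{2, 4, 6}]
print(scan(cow) == scan(mor))            # True
```

Both modes return the same rows; they differ in whether the cost is paid when the DELETE runs or each time the table is read.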

ALTER VIEW statement for Iceberg data source

You can now use the following ALTER VIEW statement in the Query workspace to rename a view:

ALTER VIEW name RENAME TO new_name

Upload SSL certificates for Netezza Performance Server data sources

You can now browse and upload the SSL certificate for SSL connections in Netezza Performance Server data sources. The valid file formats for SSL certificate are .pem, .crt, and .cer. You can upload SSL certificates by using the Adding a database-catalog pair option in the Infrastructure manager.

Query data from Db2 and Watson Query

You can now query nicknames that are created in Db2 and virtualized tables from Watson Query instances.

SSL connection for IBM Data Virtualization Manager for z/OS data source

You can now enable SSL connection for the IBM Data Virtualization Manager for z/OS data source by using the Add database user interface to secure and encrypt the database connection. Select Validate certificate to validate whether the SSL certificate that is returned by the host is trusted. You can choose to provide the hostname in the SSL certificate.

Use data from Apache Hudi catalog

You can now connect to and use data from Apache Hudi catalog.

Add Milvus as a service in watsonx.data

You can now provision Milvus as a service in watsonx.data with the following features:

  • Provision different storage variants such as starter, medium, and large nodes.

  • Assign Admin or User roles for Milvus users: User access policy is now available for Milvus users. Using the Access Control UI, you can assign Admin or User roles for Milvus users and also grant, revoke, or update their privileges.

  • Configure the Object storage for Milvus to store data. You can add or configure a custom bucket and specify the username, password, region, and bucket URL.

For more information, see Milvus.

Load data in batch by using the ibm-lh ingestion tool

You can now use the ibm-lh ingestion tool to run batch ingestion procedures in non-interactive mode (from outside the ibm-lh-tools container), by using the ibm-lh-client package. For more information, see ibm-lh commands and usage.

Creating schema by using bulk ingestion in web console

You can now create a schema by using the bulk ingestion process in the web console if the schema does not already exist.

Use time-travel queries in Apache Iceberg tables

You can now run the following time-travel queries by using branches and tags in Apache Iceberg table snapshots:

  • SELECT * FROM <table name> FOR VERSION AS OF 'historical-tag'
  • SELECT * FROM <table name> FOR VERSION AS OF 'test-branch'
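If you generate these queries from a script, the branch or tag name must be a valid SQL string literal. The following sketch builds the query shown above; the function name and the example table name are hypothetical, the table name is assumed to come from a trusted source and is inserted as-is, and only the reference name is escaped by doubling any single quotes.

```python
def time_travel_query(table: str, ref: str) -> str:
    """Return a time-travel query that reads an Iceberg branch or tag.

    `ref` is a branch or tag name; single quotes are doubled so that the name
    is a valid SQL string literal. `table` is trusted and inserted as-is.
    """
    literal = ref.replace("'", "''")
    return f"SELECT * FROM {table} FOR VERSION AS OF '{literal}'"


print(time_travel_query("iceberg_data.sales.orders", "historical-tag"))
# SELECT * FROM iceberg_data.sales.orders FOR VERSION AS OF 'historical-tag'
```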

Access Cloud Object Storage without credentials

You can now access your Cloud Object Storage bucket without credentials by using the Data Access Service (DAS) endpoint.

28 Feb 2024 - Version 1.1.2

A new version of watsonx.data was released in February 2024.

This release includes the following features and updates:

SSL connection for data sources

You can now enable SSL connections for the following data sources by using the Add database user interface to secure and encrypt the database connection:

  • Db2

  • PostgreSQL

For more information, see Adding a database.

Secure ingestion job history

Now, users can view only their own ingestion job history. Administrators can view the ingestion job history for all users.

SQL enhancements

You can now use the following SQL statements in the Query workspace to build and run queries against your data:

  • Apache Iceberg data sources
    • CREATE VIEW
    • DROP VIEW
  • MongoDB data sources
    • DELETE

New data types BLOB and CLOB for Teradata data source

New data types BLOB and CLOB are available for Teradata data source. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.

Create a new table during data ingestion

Previously, you needed an existing target table in watsonx.data to ingest data. Now, you can create a new table directly from the source data file (in Parquet or CSV format) by using data ingestion from the Data Manager. You can create the table by using the following methods of ingestion:

  • Ingesting data by using Iceberg copy loader.

  • Ingesting data by using Spark.

Perform ALTER TABLE operations on a column

With an Iceberg data source, you can now perform ALTER TABLE operations on a column for the following data type conversions:

  • int to bigint

  • float to double

  • decimal (num1, dec_digits) to decimal (num2, dec_digits), where num2>num1.
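The three conversions above are all widenings: the new type can represent every value of the old one. The following sketch checks a proposed column change against exactly these rules before you run ALTER TABLE. The function name is hypothetical, it accepts type names only in the lowercase forms used in this note (for example int, not integer), and any conversion not in the list is treated as unsupported.

```python
import re

_DECIMAL = re.compile(r"decimal\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def is_supported_widening(old: str, new: str) -> bool:
    """Check a column type change against the conversions listed for Iceberg."""
    old, new = old.strip().lower(), new.strip().lower()
    if (old, new) in {("int", "bigint"), ("float", "double")}:
        return True
    old_dec, new_dec = _DECIMAL.fullmatch(old), _DECIMAL.fullmatch(new)
    if old_dec and new_dec:
        p1, s1 = map(int, old_dec.groups())
        p2, s2 = map(int, new_dec.groups())
        # Precision must grow; the number of decimal digits (scale) must stay the same.
        return s1 == s2 and p2 > p1
    return False


print(is_supported_widening("decimal(10, 2)", "decimal(12, 2)"))  # True
print(is_supported_widening("bigint", "int"))                     # False
```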

Better query performance by using sorted files

With an Apache Iceberg data source, you can generate sorted files, which reduce the query result latency and improve the performance of Presto (Java). Data in the Iceberg table is sorted during the writing process within each file.

You can configure the order to sort the data by using the sorted_by table property. When you create the table, specify an array of one or more columns involved in sorting. To disable the feature, set the session property sorted_writing_enabled to false.
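One common reason that sorting within a file helps is that columnar files such as Parquet keep min/max statistics per row group, so a reader can skip row groups whose range cannot match a filter. The release note does not state the mechanism, so treat this as an illustrative model rather than a description of Presto (Java) internals. The sketch counts how many row groups a range filter must read when the same rows are written unsorted and sorted; the function name, row counts, and group size are hypothetical.

```python
import random


def row_groups_scanned(values: list[int], group_size: int, low: int, high: int) -> int:
    """Count row groups whose [min, max] range overlaps the filter low..high.

    Models a reader that skips a row group when its min/max statistics show
    that no row in the group can match the filter.
    """
    scanned = 0
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        if min(group) <= high and max(group) >= low:
            scanned += 1
    return scanned


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [rng.randrange(1_000_000) for _ in range(100_000)]
    # The same rows written unsorted and sorted on the filter column.
    print("unsorted:", row_groups_scanned(rows, 10_000, 500_000, 510_000))
    print("sorted:  ", row_groups_scanned(sorted(rows), 10_000, 500_000, 510_000))
```

With unsorted data, almost every row group spans nearly the full value range and must be read; after sorting, only the one or two groups that cover the filtered range are read.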

31 Jan 2024 - Version 1.1.1

A new version of watsonx.data was released in January 2024.

This release includes the following features and updates:

IBM Data Virtualization Manager for z/OS® connector

You can now use the new IBM Data Virtualization Manager for z/OS® connector to read and write IBM Z® data without moving, replicating, or transforming it. For more information, see Connecting to an IBM Data Virtualization Manager (DVM) data source.

Teradata connector is enabled for multiple ALTER TABLE statements

Teradata connector now supports the ALTER TABLE RENAME TO, ALTER TABLE DROP COLUMN, and ALTER TABLE RENAME COLUMN column_name TO new_column_name statements.

Support for time travel queries

Iceberg connector for Presto (Java) now supports time travel queries.

The property format_version now shows the current version

The property format_version now shows the correct value (current version) when you create an Iceberg table.

29 Nov 2023 - Version 1.1.0

A new version of watsonx.data was released in November 2023.

This release includes the following features and updates:

Presto (Java) case-sensitive behavior

The Presto (Java) behavior is changed from case-insensitive to case-sensitive. Now you can provide the object names in the original case format as in the database. For more information, see Case-sensitive search configuration with Presto (Java).

Roll-back feature

You can use the Rollback feature to roll back or roll forward to any snapshot of an Iceberg table.

Capture Data Definition Language (DDL) changes

You can now capture and track the DDL changes in watsonx.data by using an event listener.

Ingest data by using Spark

You can now use the IBM Analytics Engine that is powered by Apache Spark to run ingestion jobs in watsonx.data.

For more information, see Ingesting data by using Spark.

Integration with Db2 and Netezza Performance Server

You can now register Db2 or Netezza Performance Server engines in the watsonx.data console.

For more information, see Registering an engine.

New connectors

You can now use connectors in watsonx.data to establish connections to the following types of databases:

  • Teradata
  • Delta Lake
  • Elasticsearch
  • SingleStoreDB
  • Snowflake

For more information, see Adding a database.

AWS EMR for Spark

You can now run Spark applications from Amazon Web Services Elastic MapReduce (AWS EMR) for the following watsonx.data Spark use cases:

  • Data ingestion
  • Data querying
  • Table maintenance

For more information, see Using AWS EMR for Spark use case.

7 July 2023 - Version 1.0.0

watsonx.data is a new open architecture that combines the elements of the data warehouse and data lake models. The best-in-class features and optimizations available in watsonx.data make it an optimal choice for next-generation data analytics and automation. The first release (watsonx.data 1.0.0) supports the following features:

  • Creating, scaling, pausing, resuming, and deleting the Presto (Java) query engine
  • Associating and dissociating a catalog with an engine
  • Exploring catalog objects
  • Adding and deleting a database-catalog pair
  • Updating database credentials
  • Adding and deleting bucket-catalog pair
  • Exploring bucket objects
  • Loading data
  • Exploring data
  • Querying data
  • Query history