IBM® watsonx.data overview

IBM® watsonx.data is a hybrid, open data lakehouse that powers AI and analytics with all your data, wherever it resides. It combines elements of data warehouses and data lakes and brings together the best-in-class features and optimizations of both, making watsonx.data a strong choice for next-generation AI, data analytics, and automation.

It helps your organization break down data silos and unlock value without business disruption, thereby unifying all your data for AI and analytics. It also augments your current data investments with an open, modern data stack.

It allows open-source technologies and proprietary products to coexist, and it offers a single platform where you can store data or attach your current data sources to manage and analyze your enterprise data. Attaching your data sources helps reduce data duplication and the cost of storing data in multiple places.

You can use watsonx.data to store any type of data (structured, semi-structured, and unstructured) and make that data directly accessible for artificial intelligence (AI) and business intelligence (BI). It uses open data formats with APIs and machine learning libraries, making it easier for data scientists and data engineers to use the data. The watsonx.data architecture enforces schema and data integrity, making it easier to implement robust data security and governance mechanisms.

Key features

An architecture that fully separates compute, metadata, and storage to offer ultimate flexibility.
Multiple engines, such as Presto (Java), Presto (C++), Spark, and Milvus, that serve different use cases and provide fast, reliable, and efficient processing of big data at scale.
Open formats for analytic data sets, allowing different engines to access and share the data at the same time.
Data sharing among watsonx.data, Db2® Warehouse, Netezza Performance Server, or any other data management solution through common Iceberg table format support, connectors, and a shareable metadata store.
Built-in governance that is compatible with existing solutions, including IBM Knowledge Catalog and Apache Ranger.
Cost-effective, simple object storage available across hybrid-cloud and multi-cloud environments.
Integration with a robust ecosystem of IBM’s best-in-class solutions and third-party services to enable easy development and deployment of key use cases.
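
The separation of compute, metadata, and storage described in the features above can be illustrated with a small conceptual sketch. This is a toy model only, not the watsonx.data API: the class names (`ObjectStore`, `MetadataStore`, `Engine`), the engine labels, the table name, and the object path are all hypothetical and chosen for illustration. It shows two independent engines reading the same table through one shared metadata store, without either engine holding a copy of the data.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Storage layer (hypothetical): maps object paths to rows."""
    objects: dict = field(default_factory=dict)

    def put(self, path, rows):
        self.objects[path] = list(rows)

    def get(self, path):
        return self.objects[path]


@dataclass
class MetadataStore:
    """Metadata layer (hypothetical): maps table names to schema and location."""
    tables: dict = field(default_factory=dict)

    def register(self, name, columns, path):
        self.tables[name] = {"columns": columns, "path": path}

    def lookup(self, name):
        return self.tables[name]


class Engine:
    """Compute layer (hypothetical): holds no data, only references to the shared layers."""

    def __init__(self, name, metadata, storage):
        self.name = name
        self.metadata = metadata
        self.storage = storage

    def scan(self, table):
        # Resolve schema and location from the shared metadata store,
        # then read the rows from shared storage.
        entry = self.metadata.lookup(table)
        rows = self.storage.get(entry["path"])
        return [dict(zip(entry["columns"], row)) for row in rows]


# One copy of the data and one metadata entry.
storage = ObjectStore()
metadata = MetadataStore()
storage.put("bucket/sales/part-0", [(1, "EMEA", 120.0), (2, "APAC", 80.5)])
metadata.register("sales", ["id", "region", "amount"], "bucket/sales/part-0")

# Two engines attached to the same metadata and storage layers.
sql_engine = Engine("sql-engine", metadata, storage)
batch_engine = Engine("batch-engine", metadata, storage)

sql_rows = sql_engine.scan("sales")
batch_total = sum(row["amount"] for row in batch_engine.scan("sales"))
```

Because each engine keeps only references to the shared layers, adding or removing an engine in this sketch does not require moving or duplicating data, which is the property the feature list attributes to the separated architecture.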

Deployment options

watsonx.data is available with the following deployment options:

Software – The on-premises version of watsonx.data can be deployed on IBM Software Hub. For more details, see IBM watsonx.data on IBM Software Hub.
SaaS – The SaaS version of watsonx.data can be deployed on IBM Cloud or AWS. For more details, see:
- IBM Cloud
- AWS
Developer – The Developer version is an entry-level edition of watsonx.data for students, developers, and the partner community. For more details, see Setting up watsonx.data developer edition.