Working with watsonx.data
About watsonx.data
IBM® watsonx.data is a data management solution for collecting, storing, querying, and analyzing all your enterprise data on a single unified data platform. It provides a flexible and reliable platform that is optimized to work on open data formats. The key features of IBM® watsonx.data include:
- An architecture that fully separates compute, metadata, and storage to offer ultimate flexibility.
- A highly elastic and scalable distributed query engine, based on Presto, that is designed to handle modern data formats.
- Data sharing between watsonx.data, IBM Db2 Warehouse SaaS, NPSaaS, or any other data management solution through common Iceberg table format support, connectors, and a shareable metadata store.
To provision a watsonx.data instance, see Getting started with watsonx.data.
Use cases for Analytics Engine
You need an IBM Analytics Engine Spark instance to work with watsonx.data for the following use cases:
- Ingesting large volumes of data into watsonx.data tables (S3, COS, or compatible storage). You can also cleanse and transform the data by using Spark procedural code before ingestion. You can then query the ingested data by using the engines available in watsonx.data.
- Running table maintenance operations to improve the performance of watsonx.data tables. With the Iceberg table format, you can use Spark to perform operations such as file compaction, snapshot cleanup, removal of orphan files, and schema evolution.
- Running complex analytics that are difficult to express as queries. Spark procedural programming is a suitable solution for such data transformations.
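The table maintenance operations listed above can be sketched with the stored procedures that Apache Iceberg exposes to Spark SQL. The following is a minimal illustration, not the procedure documented for watsonx.data: the catalog name `lakehouse`, the table `sales.orders`, the added column `note`, and the helper function names are all hypothetical, and the sketch assumes a Spark session that is already configured with an Iceberg catalog and the Iceberg SQL extensions.

```python
from datetime import datetime


def iceberg_maintenance_statements(catalog, table, older_than):
    """Build Spark SQL statements for common Iceberg maintenance tasks.

    catalog    -- name of the Iceberg catalog configured in Spark (hypothetical here)
    table      -- table identifier in the form 'schema.table' (hypothetical here)
    older_than -- datetime; snapshots older than this are expired
    """
    cutoff = older_than.strftime("%Y-%m-%d %H:%M:%S")
    return [
        # File compaction: rewrite many small data files into fewer larger ones.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Snapshot cleanup: expire snapshots older than the cutoff timestamp.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        # Orphan file removal: delete files no longer referenced by table metadata.
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
        # Schema evolution: add a column (the column name 'note' is an assumption).
        f"ALTER TABLE {catalog}.{table} ADD COLUMNS (note string)",
    ]


def run_maintenance(spark, statements):
    """Run each statement on an existing SparkSession and show its result."""
    for statement in statements:
        spark.sql(statement).show(truncate=False)


if __name__ == "__main__":
    # Print the statements only; pass them to run_maintenance() with a real
    # SparkSession to execute them against a table.
    for stmt in iceberg_maintenance_statements(
        "lakehouse", "sales.orders", datetime(2024, 1, 1)
    ):
        print(stmt)
```

Building the statements separately from running them makes it possible to review what will be executed before submitting it as a Spark application. In a real application, replace the hypothetical catalog and table names with the ones registered in your watsonx.data instance.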
To get started with watsonx.data and IBM Analytics Engine Serverless Spark, see Provisioning an Analytics Engine instance.