IBM Cloud Docs
Integrating Confluent Tableflow in watsonx.data

You can integrate Confluent Tableflow with IBM® watsonx.data to enable zero-copy querying of remote data. Confluent provides a data streaming platform that connects, stores, and manages real-time data across cloud and on-premises environments.

Confluent Tableflow automatically converts Apache Kafka topics into ready-to-query Apache Iceberg tables, enabling zero-copy, real-time analytics through data federation. It eliminates complex data pipelines by materializing data in user-owned or managed storage with automated maintenance.

How it works

  1. Create a Kafka cluster in Confluent Cloud.
  2. Create topics to stream your data.
  3. Enable Tableflow for topics to convert them into Iceberg tables.
  4. Query the remote tables using watsonx.data Spark or Presto engines without copying data.
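The final step above can be sketched in PySpark. This is a minimal sketch, not the exact watsonx.data setup: the catalog name, REST endpoint, token, and the `kafka.orders` namespace/table are placeholder assumptions, and the real values come from your Confluent Cloud cluster's Tableflow settings. The catalog properties follow Apache Iceberg's Spark REST catalog configuration.

```python
def tableflow_catalog_conf(catalog: str, rest_uri: str, token: str) -> dict:
    """Spark settings that register a Tableflow Iceberg REST catalog.

    Property names follow Apache Iceberg's Spark catalog configuration;
    the endpoint and token are placeholders for values from Confluent Cloud.
    """
    return {
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "rest",
        f"spark.sql.catalog.{catalog}.uri": rest_uri,
        f"spark.sql.catalog.{catalog}.token": token,
    }


def query_orders(catalog: str, rest_uri: str, token: str):
    """Read a Tableflow-materialized table without copying data.

    Requires pyspark with the Iceberg Spark runtime on the classpath;
    the namespace and table names ("kafka", "orders") are illustrative.
    """
    from pyspark.sql import SparkSession  # deferred import: heavy optional dependency

    builder = SparkSession.builder.appName("tableflow-read")
    for key, value in tableflow_catalog_conf(catalog, rest_uri, token).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    # Read-only query; Tableflow tables reject writes from external engines.
    return spark.sql(f"SELECT * FROM {catalog}.kafka.orders LIMIT 10")
```

Because the table is served through the Iceberg catalog, the engine reads the underlying Parquet/Iceberg files directly from storage; no copy of the Kafka data is made.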

Storage options

  • Confluent managed storage: Confluent automatically provisions and manages AWS S3 storage. No additional setup required.
  • Customer integration: Use your own cloud storage (AWS S3, Azure Blob, or Google Cloud Storage) with full control over data location and access.

Key features

  • Zero-copy data access
  • Real-time data availability in Iceberg format
  • Automatic schema evolution
  • Query federation through watsonx.data
  • Integration with watsonx.data compute engines

Important limitations

  • Tableflow tables are read-only from external compute engines.
  • Write operations (INSERT, CREATE TABLE, UPDATE, DELETE) are not supported when querying through watsonx.data.
  • Data can be modified only by publishing messages to the source Kafka topic.
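Because the tables are read-only downstream, all changes flow in through the source topic. The following is a minimal producer sketch, assuming the confluent-kafka Python package, placeholder Confluent Cloud credentials, and a hypothetical "orders" topic:

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event as canonical JSON bytes for the Kafka topic."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def publish_update(bootstrap: str, api_key: str, api_secret: str) -> None:
    """Apply a data change by producing to the source Kafka topic.

    Tableflow then materializes the message into the Iceberg table;
    there is no direct INSERT/UPDATE path from watsonx.data.
    """
    from confluent_kafka import Producer  # deferred import: optional dependency

    producer = Producer({
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,
        "sasl.password": api_secret,
    })
    # Keyed message so compacted topics keep the latest state per order.
    producer.produce(
        "orders",
        key=b"42",
        value=encode_event({"order_id": 42, "status": "shipped"}),
    )
    producer.flush()
```

Once the message lands on the topic, Tableflow updates the corresponding Iceberg table automatically, and the change becomes visible to queries from watsonx.data.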

For more information, see the Confluent Tableflow documentation.

Next steps