Integrating Confluent Tableflow in watsonx.data
You can integrate Confluent Tableflow with IBM® watsonx.data to enable zero-copy querying of remote data. Confluent provides a data streaming platform that lets businesses connect, store, and manage real-time data across cloud and on-premises environments.
Confluent Tableflow automatically converts Apache Kafka topics into ready-to-query Apache Iceberg tables, enabling zero-copy, real-time analytics through data federation. It eliminates complex data pipelines by materializing data in user-owned or managed storage with automated maintenance.
How it works
- Create a Kafka cluster in Confluent Cloud.
- Create topics to stream your data.
- Enable Tableflow for topics to convert them into Iceberg tables.
- Query the remote tables using watsonx.data Spark or Presto engines without copying data.
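The final step above can be sketched as follows. This is a minimal illustration, not the exact watsonx.data API: the catalog name `tableflow` and the table `demo.orders` are hypothetical placeholders for whatever names are registered when the Tableflow catalog is attached in watsonx.data. Because Tableflow tables are read-only, only a SELECT is composed here; in practice the resulting statement would be passed to `spark.sql(...)` or a Presto session.

```python
# Sketch: composing a federated, read-only query against a
# Tableflow-backed Iceberg table. Catalog/schema/table names below are
# hypothetical examples, not fixed identifiers.

def qualified_table(catalog: str, schema: str, table: str) -> str:
    """Return the fully qualified table identifier (catalog.schema.table)."""
    return f"{catalog}.{schema}.{table}"

def read_only_query(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Build a SELECT statement suitable for Spark SQL or Presto.

    No data is copied: the engine reads the Iceberg table in place.
    """
    return f"SELECT * FROM {qualified_table(catalog, schema, table)} LIMIT {limit}"

# In watsonx.data this string would be submitted to a Spark or Presto engine.
print(read_only_query("tableflow", "demo", "orders"))
```

The query string is engine-agnostic, which is the point of federation: the same statement can be run by either Spark or Presto without moving the underlying data.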
Storage options
- Confluent managed storage: Confluent automatically provisions and manages AWS S3 storage. No additional setup required.
- Customer integration: Use your own cloud storage (AWS S3, Azure Blob, or Google Cloud Storage) with full control over data location and access.
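With either storage option, an external engine typically reaches the Iceberg tables through a catalog. As a sketch only, the helper below assembles the standard Apache Iceberg Spark properties for a REST catalog; the catalog name `tableflow` and the endpoint URL are hypothetical, and the actual endpoint and credentials must come from your Confluent Cloud cluster settings.

```python
# Sketch: standard Iceberg-on-Spark properties for a REST catalog.
# The name "tableflow" and the URI are placeholder assumptions; real
# values come from the Confluent Cloud cluster configuration.

def iceberg_rest_catalog_conf(name: str, uri: str) -> dict:
    """Build Spark session properties that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",  # Iceberg's Spark catalog class
        f"{prefix}.type": "rest",                          # REST catalog implementation
        f"{prefix}.uri": uri,                              # catalog endpoint (placeholder)
    }

conf = iceberg_rest_catalog_conf("tableflow", "https://example.confluent.cloud/iceberg")
```

These properties would be supplied when building the Spark session (or the equivalent catalog registration in watsonx.data), after which tables resolve as `tableflow.<schema>.<table>`.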
Key features
- Zero-copy data access
- Real-time data availability in Iceberg format
- Automatic schema evolution
- Query federation through watsonx.data
- Integration with watsonx.data compute engines
Important limitations
- Tableflow tables are read-only from external compute engines.
- Write operations (INSERT, CREATE TABLE, UPDATE, DELETE) are not supported when querying through watsonx.data.
- Data can be modified only by publishing messages to the source Kafka topic.
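Because writes are rejected only at query time, one option is a client-side guard that screens statements before submission. The sketch below is an illustrative convention, not part of watsonx.data or Confluent: it rejects common write keywords and points the caller back to the Kafka topic as the write path.

```python
import re

# Sketch: client-side guard for Tableflow tables, which are read-only
# from external compute engines. The keyword list is a simple heuristic,
# not a full SQL parser.
_WRITE_KEYWORDS = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP|MERGE)\b",
    re.IGNORECASE,
)

def ensure_read_only(statement: str) -> str:
    """Return the statement unchanged, or raise if it would attempt a write."""
    if _WRITE_KEYWORDS.match(statement):
        raise ValueError(
            "Tableflow tables are read-only; "
            "modify data by publishing to the source Kafka topic instead."
        )
    return statement
```

A guard like this fails fast with an actionable message instead of surfacing an engine-level error after the statement has been submitted.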
For more information, see the Confluent Tableflow documentation.