IBM Cloud Docs
Integrating Confluent Tableflow in watsonx.data

You can integrate Confluent Tableflow with IBM® watsonx.data to enable zero-copy querying of remote data. Confluent provides a data streaming platform that connects, stores, and manages real-time data across cloud and on-premises environments.

Confluent Tableflow automatically converts Apache Kafka topics into ready-to-query Apache Iceberg tables, enabling zero-copy, real-time analytics through data federation. It eliminates complex data pipelines by materializing data in user-owned or managed storage with automated maintenance.

How it works

  1. Create a Kafka cluster in Confluent Cloud.
  2. Create topics to stream your data.
  3. Enable Tableflow for topics to convert them into Iceberg tables.
  4. Query the remote tables using watsonx.data Spark or Presto engines without copying data.
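The final step above can be sketched in PySpark. This is a minimal sketch, not the exact watsonx.data setup: the catalog name, REST endpoint, token, and the `kafka.orders` namespace/table are placeholder assumptions, and the real values come from your Confluent Cloud cluster's Tableflow settings. The catalog properties follow Apache Iceberg's Spark REST catalog configuration.

```python
def tableflow_catalog_conf(catalog: str, rest_uri: str, token: str) -> dict:
    """Spark settings that register a Tableflow Iceberg REST catalog.

    Property names follow Apache Iceberg's Spark catalog configuration;
    the endpoint and token are placeholders for values from Confluent Cloud.
    """
    return {
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "rest",
        f"spark.sql.catalog.{catalog}.uri": rest_uri,
        f"spark.sql.catalog.{catalog}.token": token,
    }


def query_orders(catalog: str, rest_uri: str, token: str):
    """Read a Tableflow-materialized table without copying data.

    Requires pyspark with the Iceberg Spark runtime on the classpath;
    the namespace and table names ("kafka", "orders") are illustrative.
    """
    from pyspark.sql import SparkSession  # deferred import: heavy optional dependency

    builder = SparkSession.builder.appName("tableflow-read")
    for key, value in tableflow_catalog_conf(catalog, rest_uri, token).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    # Read-only query; Tableflow tables reject writes from external engines.
    return spark.sql(f"SELECT * FROM {catalog}.kafka.orders LIMIT 10")
```

Because the table is served through the Iceberg catalog, the engine reads the underlying Parquet/Iceberg files directly from storage; no copy of the Kafka data is made.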

Storage options

  • Confluent managed storage: Confluent automatically provisions and manages AWS S3 storage. No additional setup required.
  • Customer integration: Use your own cloud storage (AWS S3, Azure Blob, or Google Cloud Storage) with full control over data location and access.

Key features

  • Zero-copy data access
  • Real-time data availability in Iceberg format
  • Automatic schema evolution
  • Query federation through watsonx.data
  • Integration with watsonx.data compute engines

Important limitations

  • Tableflow tables are read-only from external compute engines.
  • Write operations (INSERT, CREATE TABLE, UPDATE, DELETE) are not supported when querying through watsonx.data.
  • Data can be modified only by publishing messages to the source Kafka topic.
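Because the tables are read-only downstream, all changes flow in through the source topic. The following is a minimal producer sketch, assuming the confluent-kafka Python package, placeholder Confluent Cloud credentials, and a hypothetical "orders" topic:

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event as canonical JSON bytes for the Kafka topic."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def publish_update(bootstrap: str, api_key: str, api_secret: str) -> None:
    """Apply a data change by producing to the source Kafka topic.

    Tableflow then materializes the message into the Iceberg table;
    there is no direct INSERT/UPDATE path from watsonx.data.
    """
    from confluent_kafka import Producer  # deferred import: optional dependency

    producer = Producer({
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,
        "sasl.password": api_secret,
    })
    # Keyed message so compacted topics keep the latest state per order.
    producer.produce(
        "orders",
        key=b"42",
        value=encode_event({"order_id": 42, "status": "shipped"}),
    )
    producer.flush()
```

Once the message lands on the topic, Tableflow updates the corresponding Iceberg table automatically, and the change becomes visible to queries from watsonx.data.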

For more information, see the Confluent Tableflow documentation.

Next steps