Querying Confluent Tableflow using the Presto engine
About this task
You can query remote Confluent Tableflow tables using the IBM® watsonx.data Presto engine by registering Tableflow as a custom data source for zero-copy data federation.
Presto does not support vended credentials. You must use provider-integrated storage (AWS S3, Azure Blob Storage, or Google Cloud Storage) with explicit credentials. Confluent managed storage is not supported with Presto.
Before you begin
- Confluent requirements:
  - Active Confluent Cloud account
  - Kafka cluster with Tableflow-enabled topics that use provider-integrated storage
  - Tableflow API key and secret
  - REST Catalog endpoint
- Table information requirements:
  - List of Kafka topic names with Tableflow enabled
  - Kafka cluster ID (namespace) where the topics are located
- watsonx.data requirements:
  - Provisioned Presto engine
  - Network connectivity to Confluent Cloud endpoints
- Storage requirements:
  - AWS S3, Azure Blob Storage, or Google Cloud Storage configured as Tableflow storage
  - Storage access credentials (for example, access key and secret key for AWS S3)
Procedure
1. Register Tableflow as a custom data source for remote lakehouse access.
   1. In the watsonx.data console, click Infrastructure manager.
   2. Click Add component > Add data source.
   3. Select Custom as the data source type.
   4. Enter a display name (for example, `confluent_tableflow`).
   5. In the Properties section, add the following properties:

      ```
      connector.name=iceberg
      iceberg.catalog.type=REST
      iceberg.rest.uri=https://tableflow.{CLOUD_REGION}.aws.confluent.cloud/iceberg/catalog/organizations/{ORG_ID}/environments/{ENV_ID}
      iceberg.rest.auth.type=OAUTH2
      iceberg.rest.auth.oauth2.credential={APIKEY}:{SECRET}
      hive.s3.aws-access-key={S3_ACCESS_KEY}
      hive.s3.aws-secret-key={S3_SECRET_KEY}
      ```

      Replace the placeholders:
      - `{CLOUD_REGION}`: Your Confluent cluster region (for example, `us-east-1`)
      - `{ORG_ID}`: Your Confluent organization ID
      - `{ENV_ID}`: Your Confluent environment ID
      - `{APIKEY}:{SECRET}`: Your Tableflow API credentials
      - `{S3_ACCESS_KEY}`, `{S3_SECRET_KEY}`: Your S3 access credentials
   6. Click Create.
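The `iceberg.rest.uri` value above is assembled from your region, organization ID, and environment ID. As a minimal sketch of how the pieces fit together (the IDs below are hypothetical examples, not real Confluent identifiers):

```python
# Assemble the iceberg.rest.uri value for a Tableflow REST catalog
# from its three placeholder parts. Example IDs are hypothetical.
def tableflow_rest_uri(cloud_region: str, org_id: str, env_id: str) -> str:
    """Return the REST catalog URI in the format shown in the properties above."""
    return (
        f"https://tableflow.{cloud_region}.aws.confluent.cloud"
        f"/iceberg/catalog/organizations/{org_id}/environments/{env_id}"
    )

print(tableflow_rest_uri("us-east-1", "org-abc123", "env-xyz789"))
```

This yields the URI string you paste into the `iceberg.rest.uri` property.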
2. Create a catalog for the data source.
   1. Click Data manager > Catalogs.
   2. Click Create catalog.
   3. Select Iceberg as the catalog type.
   4. Enter a catalog name (for example, `confluent_catalog`).
   5. In the Data source field, select the custom data source that you created (`confluent_tableflow`).
   6. Click Create.
3. Associate the catalog with the Presto engine.
   1. Click Infrastructure manager.
   2. Select your Presto engine from the list.
   3. Click Associate catalog.
   4. Select the catalog that you created (`confluent_catalog`).
   5. Click Associate.

   The catalog is now available for querying remote data through the Presto engine.
4. Query Tableflow tables.
   1. Click Query workspace.
   2. Select your Presto engine from the engine dropdown.
   3. Run queries against your remote Tableflow tables using fully qualified table names:

      ```sql
      -- List available schemas (Kafka cluster IDs)
      SHOW SCHEMAS IN confluent_catalog;

      -- Describe table structure
      DESCRIBE confluent_catalog."{namespace}".{table_name};

      -- Query data
      SELECT * FROM confluent_catalog."{namespace}".{table_name} LIMIT 10;

      -- Get row count
      SELECT COUNT(*) FROM confluent_catalog."{namespace}".{table_name};
      ```

      Namespace names (Kafka cluster IDs) often contain hyphens and must be enclosed in double quotes.
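Because the namespace must be double-quoted whenever the Kafka cluster ID contains hyphens, it can help to build the fully qualified name programmatically. A small sketch (the catalog, cluster ID, and topic names are illustrative):

```python
def qualified_table(catalog: str, namespace: str, table: str) -> str:
    """Build a fully qualified Presto table reference, double-quoting the
    namespace because Kafka cluster IDs usually contain hyphens."""
    # Escape any embedded double quotes per SQL identifier rules.
    quoted_ns = '"' + namespace.replace('"', '""') + '"'
    return f"{catalog}.{quoted_ns}.{table}"

sql = f"SELECT * FROM {qualified_table('confluent_catalog', 'lkc-5g8orq', 'topic_0')} LIMIT 10"
print(sql)
```

The helper produces references such as `confluent_catalog."lkc-5g8orq".topic_0`, which Presto parses correctly despite the hyphens.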
Results
You can now query real-time data from Confluent Tableflow using Presto. The tables automatically reflect new messages published to Kafka topics.
Example queries
```sql
-- List schemas
SHOW SCHEMAS IN confluent_catalog;
-- Result:
-- lkc-5g8orq

-- Query a table
SELECT
  my_field1,
  my_field2,
  my_field3,
  "$$timestamp"
FROM confluent_catalog."lkc-5g8orq".topic_0
LIMIT 5;
```
Limitations
- Presto cannot automatically discover Tableflow tables.
- Tables do not appear in Data manager or with the `SHOW TABLES` command. You must know the exact Kafka topic names and cluster ID to query them.
- Presto requires customer-managed storage (AWS S3, Azure Blob Storage, or Google Cloud Storage) with explicit credentials. Platform-managed storage is not supported.
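Because Presto cannot enumerate Tableflow tables, a practical workaround is to generate statements from the topic list you already maintain. A minimal sketch, assuming you track the cluster ID and topic names yourself (the names below are illustrative):

```python
def describe_statements(catalog: str, cluster_id: str, topics: list[str]) -> list[str]:
    """Generate DESCRIBE statements for known Tableflow topics, since
    Presto cannot discover them via SHOW TABLES."""
    quoted_ns = f'"{cluster_id}"'  # cluster IDs contain hyphens, so quote them
    return [f"DESCRIBE {catalog}.{quoted_ns}.{topic};" for topic in topics]

for stmt in describe_statements("confluent_catalog", "lkc-5g8orq", ["topic_0", "orders"]):
    print(stmt)
```

Run the generated statements in the Query workspace to confirm each table's structure before writing queries against it.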