Querying Cloudera tables using Presto engine
About this task
After setting up the Cloudera integration, you can query Hive tables stored in Cloudera HDFS from IBM® watsonx.data using the Presto engine. This guide covers query operations and examples.
For information about setting up the integration, see Setting up Cloudera integration with Presto engine.
Before you begin
Ensure that you have completed the setup process:
- HDFS storage component is configured in watsonx.data
- Catalog is created and associated with Presto engine
- Tables are created in Cloudera Hue editor
Procedure
After creating tables in Cloudera, they will be automatically available in watsonx.data once the catalog is configured.
-
Navigate to Query Workspace in watsonx.data.
-
Select your configured catalog.
-
Expand the schema to see available tables.
-
Query all schemas from the catalog:
SHOW SCHEMAS FROM <catalog_name>;Example output:
Schema -------------------------- default employee_db sales_db information_schema -
Query tables in a specific schema:
SHOW TABLES FROM <catalog_name>.<schema_name>;Example output:
Table ----------------- employee department salary_history -
View the structure of a table:
DESCRIBE <catalog_name>.<schema_name>.<table_name>;Example output:
Column | Type | Extra | Comment -------------|-----------------------|-------|-------- id | integer | | name | varchar | | department | varchar | | salary | decimal(10,2) | | -
Query data from Hive tables:
SELECT * FROM <catalog_name>.<schema_name>.<table_name> LIMIT 10;Example output:
+---------------+--------------+------------+-----------------+ | id | name | department | salary | +---------------+--------------+------------+-----------------+ | 1 | John Doe | IT | 75000.00 | | 2 | Jane Smith | HR | 65000.00 | | 3 | Bob Johnson | Finance | 80000.00 | +---------------+--------------+------------+-----------------+ -
Count rows in a table:
SELECT COUNT(*) FROM <catalog_name>.<schema_name>.<table_name>; -
Perform complex queries with filtering and aggregation:
SELECT department, COUNT(*) as employee_count, AVG(salary) as avg_salary FROM <catalog_name>.<schema_name>.<table_name> WHERE salary > 60000 GROUP BY department ORDER BY avg_salary DESC;
Results
You can now query Hive tables from Cloudera HDFS using Presto. The queries execute directly on the data in HDFS without copying data into watsonx.data.