Getting started with your query engine
After successfully provisioning the Presto engine, kickstart your watsonx.data experience by exploring the capabilities of Presto by running test queries, creating your first schema, and tables.
In this tutorial, you learn to execute some test queries to understand the capabilities of Presto and create your first schema to establish a foundational structure for organizing your data. When your schema is in place, you can proceed to create your table. Tables are essential components for storing and organizing data.
Sample Scenario : To understand:
- Querying tables by using Presto in watsonx.data
- About the data available for kick start
- To learn how to create schemas and tables to store, organize and query data.
Objective
- Running test queries
- Creating your first schema and table
Before you begin
This tutorial requires:
- Subscribe to watsonx.data on IBM Cloud.
- Provision your watsonx.data instance. For more information, see Provision an instance.
Running test queries
This section of the tutorial describes how to use the Query workspace, and run test queries to familiarize with the working of the Presto engine.
To run the SQL query, do the following steps:
- From the navigation menu, select SQL. The Query workspace page opens.
- Select the Presto engine from the Engine list.
- Go to System and benchmarking data. watsonx.data provides default catalogs tpcds, and tpch. These catalogs provide auto-generated data for benchmarking. You can also use it for running sample queries and data.
- Select a schema from the tpch catalog (for example,
Tiny
) and then select a table, for example,Customer
. Click the overflow menu to select Generate Path, or Generate SELECT. - Select Generate Select and select a limit, for example 100.
- Click Run on to run the query.
- Select the Result set or Details tab to view the results. If required, you can save the query.
Saving data
- From the navigation menu, select Data manager. The Data manager page opens.
- Select the iceberg_data catalog. The catalog is available by default.
- Click Create. Select Create schema to create a schema under the iceberg_data catalog. For more information, see Creating schema.
- Give your schema a name, for example
new_schema
. - To store data, you must create tables inside the schema. Use one of the following options to create a table:
-
Option 1: You can create a table (with the default data that is available in the tpcds catalog) by running an SQL query.
a. Go to the Query workspace page.
b. Run Create table query. A sample query is shown:
CREATE TABLE IF NOT EXISTS "iceberg_data"."new_schema"."new_table" AS SELECT * FROM "tpcds"."sf1"."catalog_returns" LIMIT 100;
The SQL query creates a table named,
new_table
by usingnew_schema
, within theiceberg_data catalog
. It also loads data from thecatalog_returns
table, which uses thesf1
schema in thetpcds
catalog.You can go to the iceberg_data catalog and run a select query statement to verify if
new_table
has data in it. -
Option 2: You can create a table by uploading your own data (.csv, parquet, .json, .txt format).
a. Go to the Data manager page.
b. You can create table with your data in .csv, parquet, .json, .txt formats. To do that, see Creating table.
You can go to the Query workspace page. Select the newly created table from the iceberg_data catalog and run a select query statement to verify if the table has data in it.
Querying data from the table created
You can run a Generate SELECT query from the new table that is created within the iceberg_data catalog. To do that:
- From the navigation menu, go to the Query workspace page.
- Select the table,
new_table
within the schema,new_schema
inside the catalogiceberg_data
. - Select Generate SELECT.
- Edit the limit to 100 as shown in the example:
SELECT * FROM "iceberg_data"."new_schema"."new_table" LIMIT 100;
- Click the Run on to run the query.
- Select the Result set or Details tab to view the results. You can also save the query.