IBM Cloud Docs
Merging and querying data from watsonx.data

Merging and querying data from watsonx.data

Learn how to query and merge data from a NPSaaS and a data lake table.

Before you begin

In the examples, the publicly available New York taxi trip record data for yellow taxis in January 2021 and 2022 is used. To follow this example, make sure that the data is in an accessible S3 bucket and the table was loaded into watsonx.data into an Apache Iceberg table in the Hive Metastore server (HMS).

Run a cross database query with the table in the data lake database.

  • To identify which year had the most passengers, run:

    Example:

    LOCALDB.ADMIN(ADMIN)=> select ( select sum(PASSENGER_COUNT) from YELLOW_TAXI_JANUARY_2022_LOADED) as "passengers 2022",( select sum(PASSENGER_COUNT) from MYLAKE.TAXIDATA.YELLOW_TAXI_JANUARY_2021) as "passengers 2021";
    

    Output:

    passengers 2022 | passengers 2021
    
    -----------------+-----------------
    
            3324167 |         1794615
    
    (1 row)
    
  • To compare how many passengers travelled between 1:00 AM and 6:00 PM in 2021 and 2022, run:

    Example:

    LOCALDB.ADMIN(ADMIN)=> select(select sum(PASSENGER_COUNT) from YELLOW_TAXI_JANUARY_2022_LOADED where TPEP_PICKUP_DATETIME::time > '1:00am' and TPEP_PICKUP_DATETIME::time < '6:00am') as "overnight passengers 2022", (select sum(PASSENGER_COUNT) from MYLAKE.TAXIDATA.YELLOW_TAXI_JANUARY_2021 where TPEP_PICKUP_DATETIME::time > '1:00am' and TPEP_PICKUP_DATETIME::time < '6:00am') as "overnight passengers 2021";
    

    Output:

    overnight passengers 2022 | overnight passengers 2021
    
    ---------------------------+---------------------------
    
                       163058 |                     33469
    
    (1 row)