IBM Cloud Docs
Registering external data into watsonx.data

If you have pre-existing data (such as Iceberg, Delta, or Hudi tables) in an object store bucket, you can register it into IBM® watsonx.data and use it for running queries. To enable this feature, you must attach the appropriate catalog to the storage.

You can register tables in all three formats. For Iceberg tables, you can register pre-existing data at the bucket level. For Delta and Hudi tables, registration is currently supported only at the table level.

If other systems change Iceberg tables externally, you might need to sync the metadata on the watsonx.data side. To do this, use the sync feature.

For Hudi and Delta tables, explicit sync is unnecessary because the metadata pointer refers to the metadata folder, not to an individual metadata file. Iceberg, by contrast, requires a reference to the latest metadata.json file, so the pointer becomes stale whenever an external writer commits a new one.
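The difference between a folder pointer and a file pointer can be illustrated with a minimal Python sketch. It is not part of watsonx.data; the helper name latest_metadata_file is hypothetical, and the file-name pattern is an assumption based on the common Iceberg conventions of "<version>-<uuid>.metadata.json" and "v<version>.metadata.json". It shows why an Iceberg pointer must be refreshed: the latest file changes each time a new version is committed.

```python
import re

# Assumed naming conventions for Iceberg metadata files:
#   "00002-<uuid>.metadata.json"  or  "v2.metadata.json"
_METADATA_RE = re.compile(r"^v?(\d+)(?:-[0-9a-fA-F-]+)?\.metadata\.json$")


def latest_metadata_file(file_names):
    """Return the metadata file with the highest version number, or None."""
    best_name, best_version = None, -1
    for name in file_names:
        match = _METADATA_RE.match(name)
        if match is None:
            continue  # skip manifests, snapshots, and unrelated files
        version = int(match.group(1))
        if version > best_version:
            best_name, best_version = name, version
    return best_name


if __name__ == "__main__":
    # Hypothetical listing of a table's metadata folder.
    listing = [
        "00000-1a2b3c4d.metadata.json",
        "00001-5e6f7a8b.metadata.json",
        "snap-123456.avro",
    ]
    print(latest_metadata_file(listing))
```

A catalog that stores the folder only (as with Hudi and Delta) never needs this lookup, which is why those formats do not require an explicit sync.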

Registering and syncing external Iceberg data

To register and sync external Iceberg data into watsonx.data, complete the following steps:

  1. Add a storage and associate it with the Apache Iceberg catalog. See Adding storage.
  2. To pull changes from the storage bucket into watsonx.data, go to the Infrastructure manager page, hover over the Apache Iceberg catalog, and click Sync metadata. Select one of three values for Mode; each is shown with its corresponding possibility of metadata loss. The three sync options are as follows:
  • Register new objects only: Schemas, tables, and metadata that external applications created since the last sync operation are added to this catalog. Existing schemas and tables in this catalog are not modified.
  • Update existing objects only: Schemas, tables, and metadata already present in this catalog are updated or deleted to match the current state found in the associated bucket. Any other schemas, tables, and metadata in the associated bucket are ignored.
  • Sync all objects: The catalog is brought to the exact state of the associated bucket. All new objects are added, and all existing objects are updated or removed.
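The three modes can be compared with a small Python model. This is only a sketch of the behavior described above, not the watsonx.data implementation: the function sync_catalog, the mode labels, and the use of a dictionary that maps a table name to a metadata version are all hypothetical.

```python
VALID_MODES = ("register_new", "update_existing", "sync_all")


def sync_catalog(catalog, bucket, mode):
    """Return the catalog contents after a sync in the given mode.

    catalog, bucket: dicts mapping a table name to a metadata version
    (a hypothetical model of catalog and bucket state).
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown sync mode: {mode}")

    result = dict(catalog)  # leave the caller's catalog untouched

    if mode in ("register_new", "sync_all"):
        # Add objects that exist in the bucket but not yet in the catalog.
        for name, meta in bucket.items():
            if name not in catalog:
                result[name] = meta

    if mode in ("update_existing", "sync_all"):
        # Refresh objects already in the catalog; drop those gone from the bucket.
        for name in catalog:
            if name in bucket:
                result[name] = bucket[name]
            else:
                del result[name]

    return result


if __name__ == "__main__":
    catalog = {"orders": 1, "stale": 1}
    bucket = {"orders": 2, "events": 1}
    for mode in VALID_MODES:
        print(mode, sync_catalog(catalog, bucket, mode))
```

In this model only the two modes that update existing objects can remove entries from the catalog, which corresponds to the possibility of metadata loss shown next to each mode.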

For information about the related API, see External Iceberg table registration.

Registering external Hudi and Delta Lake data

To register external Hudi and Delta Lake data into watsonx.data, complete the following steps:

  1. Add a storage and, based on the table format, select one of the following catalog types. See Adding storage.

    • Apache Hudi
    • Delta Lake
  2. Register and load the tables by using the Register table and Load table metadata APIs.

    To register the tables, you must provide the exact location of the metadata folder. The schema is inferred from the path in the location URL.
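Because the location must point at the metadata folder itself, it can help to check the URL before calling the API. The following Python sketch is an illustration only. The helper check_metadata_location is hypothetical, and it assumes that the metadata folder carries the name the table format itself uses (_delta_log for Delta Lake, .hoodie for Hudi); whether watsonx.data expects exactly these paths is not confirmed here. The sketch returns the bucket and the path segments that precede the metadata folder, and does not claim which segment is used as the schema.

```python
from urllib.parse import urlparse

# Metadata folder names defined by the table formats themselves.
# Assumption: the location to register ends with one of these folders.
METADATA_FOLDERS = {"delta": "_delta_log", "hudi": ".hoodie"}


def check_metadata_location(location, table_format):
    """Validate a location URL and return (bucket, path segments before the metadata folder)."""
    if table_format not in METADATA_FOLDERS:
        raise ValueError(f"unsupported table format: {table_format}")

    parsed = urlparse(location)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"not a full bucket URL: {location}")

    # Ignore empty segments so a trailing slash is tolerated.
    segments = [part for part in parsed.path.split("/") if part]
    expected = METADATA_FOLDERS[table_format]
    if not segments or segments[-1] != expected:
        raise ValueError(f"location must end with the {expected} folder: {location}")

    return parsed.netloc, segments[:-1]


if __name__ == "__main__":
    # Hypothetical bucket and path, used for illustration only.
    print(check_metadata_location("s3a://my-bucket/sales/orders/_delta_log", "delta"))
```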