About data ingestion

Data ingestion is the process of importing and loading data into IBM® watsonx.data. To create tables from local or external data files, use the Create table option on the Data manager page.

When you ingest a data file into watsonx.data, the table schema is generated and inferred when a query is run. Data ingestion in watsonx.data supports the CSV and Parquet formats. All files in an ingestion must have the same format and the same schema. watsonx.data auto-discovers the schema from the source file being ingested.
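Because every file in an ingestion must share one format and one schema, it can help to check a batch locally before loading it. The following is a minimal sketch, not part of watsonx.data: the function name `check_batch` is hypothetical, file format is judged by extension only, and schema comparison is limited to the header row of CSV files (comparing Parquet schemas would need a Parquet reader).

```python
import csv
from pathlib import Path

# Formats named in this document; detection by file extension is an assumption.
SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def check_batch(paths, delimiter=","):
    """Return a list of problems; an empty list means the batch looks consistent."""
    paths = [Path(p) for p in paths]
    problems = []

    suffixes = {p.suffix.lower() for p in paths}
    unsupported = suffixes - SUPPORTED_SUFFIXES
    if unsupported:
        problems.append(f"unsupported file types: {sorted(unsupported)}")
    if len(suffixes) > 1:
        problems.append(f"mixed file formats: {sorted(suffixes)}")

    # Compare CSV header rows only; Parquet files are not opened here.
    headers = {}
    for p in paths:
        if p.suffix.lower() == ".csv":
            with p.open(newline="", encoding="utf-8") as f:
                headers[p.name] = next(csv.reader(f, delimiter=delimiter), [])
    if len({tuple(h) for h in headers.values()}) > 1:
        problems.append(f"CSV headers differ: {headers}")

    return problems
```

Calling `check_batch(["a.csv", "b.csv"])` returns an empty list when both files have the same header row, and a list of human-readable messages otherwise.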

The ibm-lh tool has the following requirements and limitations:

  • Schema evolution is not supported.
  • The target table must be an Iceberg format table.
  • Partitioning is not supported.
  • IBM Storage Ceph, IBM Cloud Object Storage (COS), AWS S3, and MinIO object storage are supported.
  • The pathStyleAccess property for object storage is not supported.
  • Only Parquet and CSV file formats are supported as source data files.

Loading or ingesting data through CLI

An ingestion job in watsonx.data can be run with the ibm-lh tool. To run the ingestion job through the CLI, pull the tool from the ibm-lh-client package and install it on the local system. For details and instructions on installing the ibm-lh-client package and using the ibm-lh tool for ingestion, see Installing ibm-lh-client and Setting up the ibm-lh command-line utility.

The ibm-lh tool supports the following features:

  • Auto-discovery of schema based on the source file or target table.

  • Advanced table configuration options for the CSV files:

    • Delimiter
    • Header
    • File encoding
    • Line delimiter
    • Escape characters
  • Ingestion of a single file, multiple files, or a single folder (no subfolders) of Parquet files from S3 or local storage.

  • Ingestion of a single file, multiple files, or a single folder (no subfolders) of CSV files from S3 or local storage.
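To illustrate what the folder rule and the CSV options above mean in practice, here is a minimal local sketch using only the Python standard library. It does not call the ibm-lh tool, and the function names and parameter names (`list_source_files`, `read_csv`, `line_delimiter`, `escape_char`) are hypothetical, chosen to mirror the option names in the list. The folder listing looks only at files directly inside the folder, and the parser assumes quoted fields never contain the line delimiter.

```python
import csv
from pathlib import Path


def list_source_files(folder, suffix):
    """List files directly inside `folder` with the given suffix; subfolders are skipped."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


def read_csv(path, delimiter=",", header=True, encoding="utf-8",
             line_delimiter="\n", escape_char="\\"):
    """Parse one CSV file with explicit options; returns (column_names, rows)."""
    # Read bytes and decode manually so the line delimiter is not translated.
    text = Path(path).read_bytes().decode(encoding)
    # Split on the line delimiter first and drop empty lines.
    lines = [ln for ln in text.split(line_delimiter) if ln]
    rows = list(csv.reader(lines, delimiter=delimiter, escapechar=escape_char))
    if header and rows:
        return rows[0], rows[1:]
    return None, rows
```

For example, a file that uses `;` as the delimiter and `\r\n` as the line delimiter would be read with `read_csv(path, delimiter=";", line_delimiter="\r\n")`; an escaped delimiter such as `a\;b` is then kept inside a single field.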