Ingesting data from a local system

You can upload files from your local system and ingest them into IBM® watsonx.data by using the Spark ingestion UI.

Before you begin

  • Review the prerequisites for using the Spark ingestion UI.
  • A transient storage bucket must be configured in your watsonx.data instance for temporary file storage.
  • The maximum cumulative file size for local uploads is 2 GB, and each individual file must not exceed 200 MB. For larger files, use the remote storage ingestion flow.
  • Lite ingestion of files smaller than 2 MB does not require a Spark engine. For files 2 MB or larger, you must provision a Spark engine before ingestion.
  • Only one file type is supported per ingestion job.
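Before you start an upload, you can check a batch of files against these limits on your own machine. The following Python sketch is a hypothetical helper, not part of watsonx.data. It assumes that 1 MB is 1,048,576 bytes, and because the limits above do not say whether the 2 MB lite-ingestion threshold applies per file or to the whole upload, it conservatively applies the threshold to the combined size.

```python
from pathlib import PurePath

# Assumption: sizes use binary units (1 MB = 1,048,576 bytes).
MB = 1024 * 1024
GB = 1024 * MB

MAX_FILE_BYTES = 200 * MB      # per-file upload limit
MAX_TOTAL_BYTES = 2 * GB       # cumulative upload limit
LITE_THRESHOLD_BYTES = 2 * MB  # below this, no Spark engine is needed


def check_local_upload(files):
    """Check (file_name, size_in_bytes) pairs against the local upload limits.

    Returns a list of problem messages and a flag that says whether a
    Spark engine should be provisioned before ingestion.
    """
    problems = []
    if not files:
        problems.append("no files selected")

    for name, size in files:
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: exceeds the 200 MB per-file limit")

    total = sum(size for _, size in files)
    if total > MAX_TOTAL_BYTES:
        problems.append("total size exceeds 2 GB; use the remote storage ingestion flow")

    # Only one file type is supported per ingestion job.
    suffixes = sorted({PurePath(name).suffix.lower() for name, _ in files})
    if len(suffixes) > 1:
        problems.append(f"mixed file types {suffixes}; use one file type per job")

    # Assumption: the 2 MB lite-ingestion threshold applies to the total size.
    needs_spark = total >= LITE_THRESHOLD_BYTES
    return problems, needs_spark


problems, needs_spark = check_local_upload(
    [("orders.csv", 150 * MB), ("returns.csv", 40 * MB)]
)
print(problems, needs_spark)  # [] True
```

Here both files are under 200 MB and share one type, so no problems are reported, but together they exceed 2 MB, so a Spark engine is needed.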

Supported file formats

  • CSV (Comma-Separated Values)
  • TXT
  • Parquet
  • JSON (JavaScript Object Notation)
  • Avro
  • ORC (Optimized Row Columnar)
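Because each ingestion job accepts only one file type, a mixed set of local files has to be split into separate jobs. The following Python sketch groups file names by format using a hypothetical extension-to-format mapping; the console detects the format with its own rules, and you can override the detected format during the procedure. Extensions that are not listed here (for example, `.jsonl`) are reported as unsupported by this sketch.

```python
from pathlib import PurePath

# Hypothetical extension-to-format mapping for the supported formats.
# The console's own format detection may use different rules.
FORMAT_BY_EXTENSION = {
    ".csv": "CSV",
    ".txt": "TXT",
    ".parquet": "Parquet",
    ".json": "JSON",
    ".avro": "Avro",
    ".orc": "ORC",
}


def plan_ingestion_jobs(file_names):
    """Group files by format (one format per ingestion job) and list unsupported files."""
    jobs = {}
    unsupported = []
    for name in file_names:
        fmt = FORMAT_BY_EXTENSION.get(PurePath(name).suffix.lower())
        if fmt is None:
            unsupported.append(name)
        else:
            jobs.setdefault(fmt, []).append(name)
    return jobs, unsupported


jobs, unsupported = plan_ingestion_jobs(["q1.csv", "q2.CSV", "events.parquet", "notes.xlsx"])
print(jobs)         # {'CSV': ['q1.csv', 'q2.CSV'], 'Parquet': ['events.parquet']}
print(unsupported)  # ['notes.xlsx']
```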

Procedure

  1. Log in to the watsonx.data console.
  2. From the navigation menu, select Data manager.
  3. Click Ingest data.
  4. Select Local system as the ingestion flow.
  5. In the upper-right corner of the page, select a transient storage bucket from the Select transient storage bucket list. This bucket temporarily stores your uploaded files during the ingestion process.
  6. Click Browse, or drag your file into the upload area. After the upload completes, the file name and size are displayed.
  7. To upload additional files of the same type, click Upload another file.
  8. Click Next to proceed to the file details configuration.
  9. Review the detected file format. If it is incorrect, select the correct format from the File format list.
  10. Configure the format-specific options:
      • For CSV and TXT files:

        • Delimiter: Specify the delimiter character (default: comma)
        • Header: Select whether the first row contains column headers
        • Infer schema: Enable to automatically detect column data types
        • Quote character: Specify the character used for quoting values (default: double quote)
        • Escape character: Specify the character used for escaping special characters (default: backslash)
      • For JSON files:

        • Multi-line: Enable if each JSON record spans multiple lines
        • Infer schema: Enable to automatically detect the schema from the JSON structure
      • For Parquet, Avro, and ORC files:

        • The schema is detected automatically from the file metadata
  11. Click Preview data to view a sample of the data with the current configuration.
  12. Verify that the data is parsed correctly. If it is not, adjust the configuration options.
  13. Click Next to proceed to the target table configuration.
  14. Configure the target table settings as described in Configuring target table settings in the parent topic.
  15. Configure the job details as described in Configuring job details in the parent topic.
  16. Review the ingestion configuration summary.
  17. Click Submit to start the ingestion job.
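If the preview does not look right, it can help to test the CSV options on a small sample locally. The following Python sketch uses the standard `csv` module to apply the Delimiter, Header, Quote character, and Escape character options and a deliberately simple type inference (integer, double, or string only), which is much narrower than what Spark's schema inference detects. The function names are hypothetical, the `_c0`, `_c1`, ... names for header-less columns follow Spark's convention and are assumed rather than confirmed for the console, and the ingestion job itself may parse edge cases differently.

```python
import csv
import io
import itertools


def infer_type(values):
    """Guess a column type from sample values: integer, double, or string."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for type_name, cast in (("integer", int), ("double", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name  # every sample value converted successfully
        except ValueError:
            continue  # try the next, more general type
    return "string"


def preview_csv(text, delimiter=",", header=True, quotechar='"',
                escapechar="\\", max_rows=5):
    """Parse a sample of CSV text with the given options and infer column types."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quotechar=quotechar, escapechar=escapechar)
    columns = next(reader, []) if header else None
    rows = list(itertools.islice(reader, max_rows))
    if columns is None:
        # Assumption: header-less columns are named like Spark's _c0, _c1, ...
        width = max((len(r) for r in rows), default=0)
        columns = [f"_c{i}" for i in range(width)]
    types = {
        col: infer_type([r[i] for r in rows if i < len(r)])
        for i, col in enumerate(columns)
    }
    return {"columns": columns, "rows": rows, "types": types}


sample = 'id,name,score\n1,"Smith, Ann",9.5\n2,Bob,7\n'
result = preview_csv(sample)
print(result["rows"][0])  # ['1', 'Smith, Ann', '9.5']
print(result["types"])    # {'id': 'integer', 'name': 'string', 'score': 'double'}
```

The quoted value `"Smith, Ann"` stays in one column because the comma appears inside the quote characters; with the wrong quote character, that row would split into four columns, which is the kind of problem the preview step is meant to catch.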
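The Multi-line option resembles Spark's `multiLine` JSON option; by default, Spark expects one complete JSON record per line (JSON Lines). The following Python sketch, a hypothetical helper built on the standard `json` module, suggests a setting for a sample of your file: if every non-blank line is a complete JSON object, Multi-line can stay disabled; if the whole text is a single JSON document, such as a pretty-printed object or an array of records, Multi-line should probably be enabled. Other layouts are reported as unrecognized.

```python
import json


def classify_json(text):
    """Suggest a JSON layout: 'json_lines', 'multi_line', or 'unrecognized'."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unrecognized"
    try:
        # One complete JSON object per line: Multi-line can stay disabled.
        if all(isinstance(json.loads(ln), dict) for ln in lines):
            return "json_lines"
    except json.JSONDecodeError:
        pass
    try:
        # The whole text is a single JSON document (for example, a
        # pretty-printed object or an array of records): enable Multi-line.
        json.loads(text)
        return "multi_line"
    except json.JSONDecodeError:
        return "unrecognized"


print(classify_json('{"id": 1}\n{"id": 2}\n'))  # json_lines
print(classify_json('{\n  "id": 1\n}\n'))        # multi_line
```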

Results

After the ingestion job completes successfully, the data from your local files is loaded into the target table. The uploaded files are stored in the transient storage bucket only temporarily and are deleted automatically when the ingestion job completes.

Related information