IBM Cloud API Docs

Introduction

IBM Cloud Data Engine is a cloud-native service that provides stream ingestion, data preparation, ETL, and data query from IBM Cloud Object Storage and Kafka. It also manages tables and views in a catalog that is compatible with Hive metastore and other big data engines and services can connect to it. Data Engine supports full standard ANSI SQL to submit work as serverless jobs. It is hosted on IBM Cloud and includes a publicly accessible REST API. The V3 API documentation intends to help you get going with the Data Engine API, and it offers resources on how to operationalize it.

API endpoint

https://api.dataengine.cloud.ibm.com/

Running an SQL statement

You can use the Data Engine service REST V3 API to run queries and retrieve information about their status. This is especially helpful when writing code that automatically queries data.

Data Engine provides the following REST APIs:

  1. POST request to submit a new SQL job.
  2. GET request to receive details about a specific SQL job.
  3. GET request to receive a list of submitted SQL jobs.
  4. GET request to receive a list of available tables and views.
  5. GET request to receive details about a specific table or view.
  6. GET request to receive a list of partitions belonging to a specific table.

Before you can call any of the above REST APIs, create an IAM bearer token and have your Data Engine instance Cloud Resource Name (CRN) available. You find the CRN in the Data Engine instance dashboard that provides a copy button to get it into your clipboard.

Creating an IAM bearer token

The recommended method to retrieve an IAM token programmatically is to create an API key for your IBM Cloud identity and then use the IAM token API to exchange that key for a token. Each token is valid only for one hour, and after a token expires you have to create a new one if you want to continue using the API.

You can create a token in IBM Cloud or by using the IBM Cloud command line interface (CLI).

To create a token in the IBM Cloud:

  1. Log in to IBM Cloud and select Manage > Security > Platform API Keys.
  2. Create an API key for your own personal identity, copy the key value, and save it in a secure place. After you leave the page, you will no longer be able to access this value.
  3. With your API key, set up Postman or another REST API tool and run the command to the right.
  4. Use the value of the access_token property for your Data Engine API calls. Set the access_token value as the authorization header parameter for requests to the Watson Data APIs. The format is Authorization: Bearer <access_token_value_here>. For example:
    Authorization: Bearer eyJraWQiOiIyMDE3MDgwOS0wMDowMDowMCIsImFsZyI6IlJTMjU2In0...

To create a token by using the IBM Cloud CLI:

Follow the steps to install the CLI, log in to IBM Cloud, and get the token described here.

Curl command with API key to retrieve token

        curl "https://iam.cloud.ibm.com/identity/token"         -d "apikey=YOUR_API_KEY_HERE&grant_type=urn%3Aibm%3Aparams%3Aoauth%3Agrant-type%3Aapikey"         -H "Content-Type: application/x-www-form-urlencoded"         -H "Authorization: Basic Yng6Yng="

Response

        {
        "access_token": "eyJraWQiOiIyMDE3MDgwOS0wMDowMDowMCIsImFsZyI6...",
        "refresh_token": "zmRTQFKhASUdF76Av6IUzi9dtB7ip8F2XV5fNgoRQ0mbQgD5XCeWkQhjlJ1dZi8K...",
        "token_type": "Bearer",
        "expires_in": 3600,
        "expiration": 1505865282
        }

Error handling

This API uses standard HTTP response codes to indicate if a method completed successfully:

  • A 200 or 201 type response indicates success.
  • A 400 type response indicates an error in the specified parameters.
  • A 401 type response indicates an authorization error.
  • A 404 type response indicates that a resource related to this request was not found.

Rate limiting

Rate limits for API POST requests are enforced for each instance. If the number of POST requests for an instance reaches the request limit, no further requests are accepted until one of the Data Engine jobs that are running for that instance finishes.

An HTTP status code of 429 indicates that the rate limit has been exceeded.

The number of requests that are allowed depends on your plan.

Methods

Run an SQL job

Runs a batch or streaming SQL job and stores the result either in IBM Cloud Object Storage or IBM® Db2® on Cloud. The FROM clause references rectangular data that is stored in Parquet, CSV, ORC, AVRO or JSON format in IBM Cloud Object Storage or a topic of a streaming source. For more information, see the Data Engine overview documentation.

POST /sql_jobs

Request

Query Parameters

  • The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

SQL job specification

Examples:
View
  • curl -XPOST   --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs?instance_crn=YOUR_DATAENGINE_CRN"  -H "Accept: application/json"  -H "Authorization: Bearer YOUR_BEARER_TOKEN"  -H "Content-Type: application/json"  -d '{"statement":"SELECT firstname FROM cos://us-geo/sql/employees.parquet STORED AS PARQUET WHERE EMPLOYEEID=5 INTO cos://us-geo/target-bucket/q1-results" }'

Response

Abridged information about an SQL job, including its identifier and processing status.

Status Code

  • Successful submission. Returns information about the SQL job, including its status and an identifier for future reference.

  • The request provided an invalid job specification or other invalid data. Returns details about the validation problem.

  • The request did not specify a valid bearer token for authentication.

  • You are not authorized to perform this action for the specified service instance.

  • The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.

  • This instance is currently running its maximum number of SQL jobs. A new job can be accepted only after at least one of the currently running jobs completes.

  • An internal error occurred while processing the request.

Example responses
  • {
      "job_id": "7ebed7f7-00dc-44a2-acfa-5bdb53889648",
      "status": "queued"
    }

Get information about recent SQL jobs

Returns information about recently submitted SQL jobs.

GET /sql_jobs

Request

Query Parameters

  • The type of jobs that should be listed, can be 'batch' or 'stream' jobs

    Allowable values: [batch,stream]

  • The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

  • curl -XGET   --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs?type=stream&instance_crn=YOUR_DATAENGINE_CRN"  -H "Accept: application/json"  -H "Authorization: Bearer YOUR_BEARER_TOKEN" 

Response

List of information about SQL jobs.

Status Code

  • Information about recently submitted SQL jobs of the requested type. The list might be empty.

  • The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.

  • The request did not specify a valid bearer token for authentication.

  • You are not authorized to perform this action for the specified service instance.

  • The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.

  • An internal error occurred while processing the request.

Example responses
  • {
      "jobs": [
        {
          "job_id": "7ebed7f7-00dc-44a2-acfa-5bdb53889648",
          "status": "completed",
          "submit_time": "2018-08-14T08:45:54.012Z",
          "user_id": "user1@ibm.com"
        },
        {
          "job_id": "ffde4c5a-1cc2-448b-b377-43573818e5d8",
          "status": "completed",
          "submit_time": "2018-08-14T08:47:33.350Z",
          "user_id": "user1@ibm.com"
        }
      ]
    }

Get information about a specific SQL job

Returns information about the specified SQL job.

GET /sql_jobs/{job_id}

Request

Path Parameters

  • ID of the SQL job for which information is to be retrieved. This ID is returned when an SQL job is submitted, and when information about recently submitted SQL jobs is requested.

Query Parameters

  • The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

  • curl -XGET   --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs/YOUR_JOB_ID?instance_crn=YOUR_DATAENGINE_CRN"  -H "Accept: application/json"  -H "Authorization: Bearer YOUR_BEARER_TOKEN" 

Response

Full information about an SQL job, including output or error information.

Status Code

  • Status of the specified SQL job.

  • The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.

  • The request did not specify a valid bearer token for authentication.

  • You are not authorized to perform this action for the specified service instance.

  • No information was found for the specified job. Note that the system periodically deletes information about completed or failed jobs.

  • The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.

  • An internal error occurred while processing the request.

Example responses
  • {
      "job_id": "7ebed7f7-00dc-44a2-acfa-5bdb53889648",
      "status": "completed",
      "statement": "SELECT e.firstname employee, e.city FROM cos://us-geo/sql/employees.parquet STORED AS PARQUET e",
      "plan_id": "e03a38d0-5ec1-41c5-b3b3-5e081dc19c8c",
      "submit_time": "2018-08-14T08:45:54.012Z",
      "resultset_location": "cos://s3.us.cloud-object-storage.appdomain.cloud/result/test/jobid=7ebed7f7-00dc-44a2-acfa-5bdb53889648",
      "resultset_format": "parquet",
      "rows_returned": 9,
      "rows_read": 9,
      "bytes_read": 4928,
      "end_time": "2018-08-14T08:46:01.516Z",
      "user_id": "user1@ibm.com"
    }

Stops a specific SQL streaming job

Stops the execution of a specific SQL streaming job.

PUT /sql_jobs/{job_id}/stop

Request

Path Parameters

  • ID of the SQL streaming job that is to be stopped. This ID is returned when an SQL job is submitted, and when information about recently submitted SQL jobs is requested.

Query Parameters

  • The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

  • curl -XPUT   --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs/YOUR_JOB_ID/stop?instance_crn=YOUR_DATAENGINE_CRN"  -H "Accept: application/json"  -H "Authorization: Bearer YOUR_BEARER_TOKEN" 

Response

Status Code

  • The specified job could be stopped successfully.

  • The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.

  • The request did not specify a valid bearer token for authentication.

  • You are not authorized to perform this action for the specified service instance.

  • No information was found for the specified job. Note that the system periodically deletes information about completed or failed jobs.

  • The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.

  • An internal error occurred while processing the request.

No Sample Response

This method does not specify any sample responses.

List catalog tables

Retrieve a list of the first 100 tables that are defined for the given instance in the catalog.

GET /tables

Request

Query Parameters

  • The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

  • A table name pattern for filtering the tables that should be listed. The pattern follows Hive syntax conventions and can include asterisks as wildcards and vertical bars to separate alternatives.

  • A table type for filtering the tables that should be listed, can be "table" or "view".

  • curl -XGET   --url "https://api.dataengine.cloud.ibm.com/v3/tables?instance_crn=YOUR_DATAENGINE_CRN"  -H "Accept: application/json"  -H "Authorization: Bearer YOUR_BEARER_TOKEN" 

Response

List of catalog tables.

Status Code

  • Names of defined catalog tables. The list might be empty.

  • The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.

  • The request did not specify a valid bearer token for authentication.

  • You are not authorized to perform this action for the specified service instance.

  • The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.

  • The client has submitted too many requests for catalog information. Retry the request after a short wait time.

  • An internal error occurred while processing the request.

Example responses
  • {
      "tables": [
        "employees",
        "customers",
        "products",
        "orders"
      ]
    }

Get information about a specific catalog table

Returns information about the specified catalog table.

GET /tables/{table_name}

Request

Path Parameters

  • Name of the catalog table for which information is to be retrieved. Table names are case-insensitive and must only contain alphabetic and numeral characters, and underscore (_).

Query Parameters

  • The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

  • curl -XGET   --url "https://api.dataengine.cloud.ibm.com/v3/tables/YOUR_TABLE_NAME?instance_crn=YOUR_DATAENGINE_CRN"  -H "Accept: application/json"  -H "Authorization: Bearer YOUR_BEARER_TOKEN" 

Response

Detailed information about a catalog table.

Status Code

  • Information about the specified table.

  • The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.

  • The request did not specify a valid bearer token for authentication.

  • You are not authorized to perform this action for the specified service instance.

  • The specified table is not in the catalog

  • The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.

  • The client has submitted too many requests for catalog information. Retry the request after a short wait time.

  • An internal error occurred while processing the request.

Example responses
  • {
      "name": "employees",
      "type\"": "TABLE",
      "creation_time": "2022-03-15T14:44:29.000Z",
      "data_format": "CSV",
      "location": "cos://sql-0fd3de82-d91f-42a7-b460-d2e2c319ee88.us-geo/employees.csv",
      "partitioning_columns\"": [
        "city"
      ],
      "columns": [
        {
          "name": "employeeID",
          "type": "integer",
          "nullable": true
        },
        {
          "name": "lastName",
          "type": "string",
          "nullable": true
        },
        {
          "name": "firstName",
          "type": "string",
          "nullable": true
        },
        {
          "name": "birthDate",
          "type": "timestamp",
          "nullable": true
        },
        {
          "name": "hireDate",
          "type": "timestamp",
          "nullable": true
        },
        {
          "name": "city",
          "type": "string",
          "nullable": true
        }
      ]
    }

List partitions of the catalog table

Retrieve the list of partitions of the specified catalog table.

GET /tables/{table_name}/partitions

Request

Path Parameters

  • Name of the catalog table for which information is to be retrieved. Table names are case-insensitive and must only contain alphabetic and numeral characters, and underscore (_).

Query Parameters

  • The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

  • curl -XGET   --url "https://api.dataengine.cloud.ibm.com/v3/tables/YOUR_TABLE_NAME/partitions?instance_crn=YOUR_DATAENGINE_CRN"  -H "Accept: application/json"  -H "Authorization: Bearer YOUR_BEARER_TOKEN" 

Response

List of table partitions.

Status Code

  • List of the partitions for the specified table. The list might be empty.

  • The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.

  • The request did not specify a valid bearer token for authentication.

  • You are not authorized to perform this action for the specified service instance.

  • The specified table is not in the catalog

  • The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.

  • The client has submitted too many requests for catalog information. Retry the request after a short wait time.

  • An internal error occurred while processing the request.

Example responses
  • {
      "partitions": [
        "country=America/customerID=1",
        "country=America/customerID=2",
        "country=Spain/customerID=1",
        "country=Spain/customerID=2"
      ]
    }