Data Engine | IBM Cloud API Docs

Introduction

Last updated: 2021-06-08

IBM Cloud Data Engine is a cloud-native service that provides stream ingestion, data preparation, ETL, and data query from IBM Cloud Object Storage and Kafka. It also manages tables and views in a catalog that is compatible with Hive metastore, so other big data engines and services can connect to it. Data Engine accepts full standard ANSI SQL and runs the submitted work as serverless jobs. It is hosted on IBM Cloud and includes a publicly accessible REST API. This V3 API documentation is intended to help you get started with the Data Engine API, and it offers resources on how to operationalize it.

API endpoint

https://api.dataengine.cloud.ibm.com/

Running an SQL statement

You can use the Data Engine service REST V3 API to run queries and retrieve information about their status. This is especially helpful when writing code that automatically queries data.

Data Engine provides the following REST APIs:

POST request to submit a new SQL job.
GET request to receive details about a specific SQL job.
GET request to receive a list of submitted SQL jobs.
PUT request to stop the execution of a specific SQL streaming job.
GET request to receive a list of available tables and views.
GET request to receive details about a specific table or view.
GET request to receive a list of partitions belonging to a specific table.

Before you can call any of these REST APIs, create an IAM bearer token and have the Cloud Resource Name (CRN) of your Data Engine instance available. You can find the CRN in the Data Engine instance dashboard, which provides a copy button that puts it on your clipboard.

Creating an IAM bearer token

The recommended method to retrieve an IAM token programmatically is to create an API key for your IBM Cloud identity and then use the IAM token API to exchange that key for a token. Each token is valid for only one hour. After a token expires, you must create a new one to continue using the API.

You can create a token in IBM Cloud or by using the IBM Cloud command line interface (CLI).

To create a token in IBM Cloud:

Log in to IBM Cloud and select Manage > Security > Platform API Keys.
Create an API key for your own personal identity, copy the key value, and save it in a secure place. After you leave the page, you will no longer be able to access this value.
With your API key, set up Postman or another REST API tool and run the curl command shown under "Curl command with API key to retrieve token".
Use the value of the access_token property for your Data Engine API calls. Set the access_token value as the authorization header parameter for requests to the Data Engine API. The format is Authorization: Bearer <access_token_value_here>. For example:
Authorization: Bearer eyJraWQiOiIyMDE3MDgwOS0wMDowMDowMCIsImFsZyI6IlJTMjU2In0...

To create a token by using the IBM Cloud CLI:

Follow the steps in the IBM Cloud CLI documentation to install the CLI, log in to IBM Cloud, and get the token.

Curl command with API key to retrieve token

        curl "https://iam.cloud.ibm.com/identity/token" \
          -d "apikey=YOUR_API_KEY_HERE&grant_type=urn%3Aibm%3Aparams%3Aoauth%3Agrant-type%3Aapikey" \
          -H "Content-Type: application/x-www-form-urlencoded" \
          -H "Authorization: Basic Yng6Yng="

Response

        {
          "access_token": "eyJraWQiOiIyMDE3MDgwOS0wMDowMDowMCIsImFsZyI6...",
          "refresh_token": "zmRTQFKhASUdF76Av6IUzi9dtB7ip8F2XV5fNgoRQ0mbQgD5XCeWkQhjlJ1dZi8K...",
          "token_type": "Bearer",
          "expires_in": 3600,
          "expiration": 1505865282
        }
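
The same exchange can be scripted. The following is a minimal Python sketch, using only the standard library, that builds the request from the curl command above and reads the access_token property from the response. The function names build_token_request and fetch_token are illustrative and not part of any IBM SDK; the Basic authorization header value is copied unchanged from the curl example.

```python
import json
import urllib.parse
import urllib.request

IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_token_request(api_key: str) -> urllib.request.Request:
    """Build the form-encoded POST that exchanges an API key for a token."""
    form = urllib.parse.urlencode({
        "apikey": api_key,
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
    }).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        # Header value copied from the curl example in this section.
        "Authorization": "Basic Yng6Yng=",
    }
    return urllib.request.Request(
        IAM_TOKEN_URL, data=form, headers=headers, method="POST"
    )


def fetch_token(api_key: str) -> str:
    """Send the request and return the access_token value (network call)."""
    with urllib.request.urlopen(build_token_request(api_key)) as response:
        payload = json.load(response)
    return payload["access_token"]
```

A caller would then pass the result as the header value "Bearer " + fetch_token("YOUR_API_KEY_HERE"), and request a fresh token once the one-hour validity has passed.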

Error handling

This API uses standard HTTP response codes to indicate if a method completed successfully:

A 200 or 201 type response indicates success.
A 400 type response indicates an error in the specified parameters.
A 401 type response indicates an authorization error.
A 404 type response indicates that a resource related to this request was not found.
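
As an illustration of how a client might branch on these codes, the following Python sketch maps a status code to the categories listed above. The function name and the returned labels are illustrative assumptions, not part of the API.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to the categories described in this section."""
    if status_code in (200, 201):
        return "success"
    if status_code == 400:
        return "invalid parameters"
    if status_code == 401:
        return "authorization error"
    if status_code == 404:
        return "resource not found"
    # Other codes, such as 403, 406, 429, and 500, are described per method.
    return "other"
```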

Rate limiting

Rate limits for API POST requests are enforced for each instance. If the number of POST requests for an instance reaches the request limit, no further requests are accepted until one of the Data Engine jobs that are running for that instance finishes.

An HTTP status code of 429 indicates that the rate limit has been exceeded.

The number of requests that are allowed depends on your plan.
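
One way a client could react to a 429 response is to wait and resubmit. The sketch below is a hedged example of that pattern in Python: retry_on_429 and its parameters are hypothetical names, the doubling delay is a design choice rather than something the service prescribes, and send stands for any callable that performs the POST and raises urllib.error.HTTPError on failure.

```python
import time
import urllib.error


def retry_on_429(send, max_attempts=5, initial_delay=2.0, sleep=time.sleep):
    """Call send() and retry with a doubling delay while it fails with 429."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate-limit error, and give up
            # after the last allowed attempt.
            if err.code != 429 or attempt == max_attempts:
                raise
            sleep(delay)
            delay *= 2
```

Because a new job is accepted only after a running job finishes, a suitable delay depends on how long your jobs typically run.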

Resources

Runs a batch or streaming SQL job and stores the result either in IBM Cloud Object Storage or IBM® Db2® on Cloud. The FROM clause references rectangular data that is stored in Parquet, CSV, ORC, AVRO or JSON format in IBM Cloud Object Storage or a topic of a streaming source. For more information, see the Data Engine overview documentation.

POST /sql_jobs

Request

Query Parameters

instance_crn
Required*
string
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

Request Body

Required*

sql_job_specification

SQL job specification

Examples:

Example request

curl -X POST \
  --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs?instance_crn=YOUR_DATAENGINE_CRN" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer YOUR_BEARER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"statement":"SELECT firstname FROM cos://us-geo/sql/employees.parquet STORED AS PARQUET WHERE EMPLOYEEID=5 INTO cos://us-geo/target-bucket/q1-results"}'

Response

Response Body

sql_job_info_short

Abridged information about an SQL job, including its identifier and processing status.

Status Code

201
Successful submission. Returns information about the SQL job, including its status and an identifier for future reference.
400
The request provided an invalid job specification or other invalid data. Returns details about the validation problem.
401
The request did not specify a valid bearer token for authentication.
403
You are not authorized to perform this action for the specified service instance.
406
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
429
This instance is currently running its maximum number of SQL jobs. A new job can be accepted only after at least one of the currently running jobs completes.
500
An internal error occurred while processing the request.

Example responses

Status 201

{
  "job_id": "7ebed7f7-00dc-44a2-acfa-5bdb53889648",
  "status": "queued"
}
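
For code that submits jobs automatically, the request above could be built as in the following Python sketch. It uses only the standard library; build_submit_request and submit_job are illustrative names, and percent-encoding the CRN (which contains ":" and "/") is an assumption about good practice rather than a documented requirement.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.dataengine.cloud.ibm.com/v3"


def build_submit_request(instance_crn: str, token: str, statement: str) -> urllib.request.Request:
    """Build the POST /sql_jobs request for one SQL statement."""
    query = urllib.parse.urlencode({"instance_crn": instance_crn})
    body = json.dumps({"statement": statement}).encode("utf-8")
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{API_BASE}/sql_jobs?{query}", data=body, headers=headers, method="POST"
    )


def submit_job(instance_crn: str, token: str, statement: str) -> dict:
    """Send the request and return the parsed response body (network call)."""
    request = build_submit_request(instance_crn, token, statement)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

On a 201 response, the returned dictionary carries the job_id and status properties shown in the example response.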

Returns information about recently submitted SQL jobs.

GET /sql_jobs

Request

Query Parameters

type
Required*
string
The type of jobs to list: 'batch' or 'stream'.

Allowable values: [batch,stream]
instance_crn
Required*
string
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

Example request

curl -X GET \
  --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs?type=stream&instance_crn=YOUR_DATAENGINE_CRN" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer YOUR_BEARER_TOKEN"

Response

Response Body

sql_job_info_list

List of information about SQL jobs.

Status Code

200
Information about recently submitted SQL jobs of the requested type. The list might be empty.
400
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
401
The request did not specify a valid bearer token for authentication.
403
You are not authorized to perform this action for the specified service instance.
406
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
500
An internal error occurred while processing the request.

Example responses

Status 200

{
  "jobs": [
    {
      "job_id": "7ebed7f7-00dc-44a2-acfa-5bdb53889648",
      "status": "completed",
      "submit_time": "2018-08-14T08:45:54.012Z",
      "user_id": "user1@ibm.com"
    },
    {
      "job_id": "ffde4c5a-1cc2-448b-b377-43573818e5d8",
      "status": "completed",
      "submit_time": "2018-08-14T08:47:33.350Z",
      "user_id": "user1@ibm.com"
    }
  ]
}
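
A client might build the list URL and then filter the returned jobs by status. The following Python sketch shows one way to do that; both function names are hypothetical, and the response shape is taken from the example response above.

```python
import urllib.parse

API_BASE = "https://api.dataengine.cloud.ibm.com/v3"


def build_list_jobs_url(instance_crn: str, job_type: str) -> str:
    """Build the GET /sql_jobs URL for the given job type."""
    if job_type not in ("batch", "stream"):
        raise ValueError("type must be 'batch' or 'stream'")
    query = urllib.parse.urlencode({"type": job_type, "instance_crn": instance_crn})
    return f"{API_BASE}/sql_jobs?{query}"


def job_ids_with_status(job_list: dict, status: str) -> list:
    """Return the IDs of all jobs in a parsed response that have the given status."""
    return [
        job["job_id"]
        for job in job_list.get("jobs", [])
        if job.get("status") == status
    ]
```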

Returns information about the specified SQL job.

GET /sql_jobs/{job_id}

Request

Path Parameters

job_id
Required*
string
ID of the SQL job for which information is to be retrieved. This ID is returned when an SQL job is submitted, and when information about recently submitted SQL jobs is requested.

Query Parameters

instance_crn
Required*
string
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

Example request

curl -X GET \
  --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs/YOUR_JOB_ID?instance_crn=YOUR_DATAENGINE_CRN" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer YOUR_BEARER_TOKEN"

Response

Response Body

sql_job_info_full

Full information about an SQL job, including output or error information.

Status Code

200
Status of the specified SQL job.
400
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
401
The request did not specify a valid bearer token for authentication.
403
You are not authorized to perform this action for the specified service instance.
404
No information was found for the specified job. Note that the system periodically deletes information about completed or failed jobs.
406
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
500
An internal error occurred while processing the request.

Example responses

Status 200

{
  "job_id": "7ebed7f7-00dc-44a2-acfa-5bdb53889648",
  "status": "completed",
  "statement": "SELECT e.firstname employee, e.city FROM cos://us-geo/sql/employees.parquet STORED AS PARQUET e",
  "plan_id": "e03a38d0-5ec1-41c5-b3b3-5e081dc19c8c",
  "submit_time": "2018-08-14T08:45:54.012Z",
  "resultset_location": "cos://s3.us.cloud-object-storage.appdomain.cloud/result/test/jobid=7ebed7f7-00dc-44a2-acfa-5bdb53889648",
  "resultset_format": "parquet",
  "rows_returned": 9,
  "rows_read": 9,
  "bytes_read": 4928,
  "end_time": "2018-08-14T08:46:01.516Z",
  "user_id": "user1@ibm.com"
}
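
Because a submitted job starts in a status such as "queued", callers usually poll this method until the job finishes. The Python sketch below illustrates such a loop under stated assumptions: wait_for_job is a hypothetical helper, fetch_job stands for any callable that sends GET /sql_jobs/{job_id} and returns the parsed JSON body, and the set of terminal statuses ("completed" and "failed") is inferred from this page rather than taken from a full status list.

```python
import time

# Assumption: these are the statuses after which a job no longer changes.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_job, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll fetch_job() until the job reaches a terminal status."""
    for _ in range(max_polls):
        info = fetch_job()
        if info.get("status") in TERMINAL_STATUSES:
            return info
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal status in time")
```

The returned dictionary then holds properties such as resultset_location and rows_returned, as in the example response.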

Stops the execution of a specific SQL streaming job.

PUT /sql_jobs/{job_id}/stop

Request

Path Parameters

job_id
Required*
string
ID of the SQL streaming job that is to be stopped. This ID is returned when an SQL job is submitted, and when information about recently submitted SQL jobs is requested.

Query Parameters

instance_crn
Required*
string
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

Example request

curl -X PUT \
  --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs/YOUR_JOB_ID/stop?instance_crn=YOUR_DATAENGINE_CRN" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer YOUR_BEARER_TOKEN"

Response

Status Code

200
The specified job was stopped successfully.
400
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
401
The request did not specify a valid bearer token for authentication.
403
You are not authorized to perform this action for the specified service instance.
404
No information was found for the specified job. Note that the system periodically deletes information about completed or failed jobs.
406
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
500
An internal error occurred while processing the request.

No Sample Response

This method does not specify any sample responses.
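
As a rough illustration, the stop request could be assembled in Python as follows. The function name build_stop_request is hypothetical, and the request carries no body, matching the curl example above.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.dataengine.cloud.ibm.com/v3"


def build_stop_request(instance_crn: str, token: str, job_id: str) -> urllib.request.Request:
    """Build the PUT /sql_jobs/{job_id}/stop request for a streaming job."""
    job_path = urllib.parse.quote(job_id, safe="")
    query = urllib.parse.urlencode({"instance_crn": instance_crn})
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(
        f"{API_BASE}/sql_jobs/{job_path}/stop?{query}", headers=headers, method="PUT"
    )
```

Passing the result to urllib.request.urlopen sends the request; a 200 status indicates that the job was stopped.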

Retrieves a list of the first 100 tables that are defined in the catalog for the given instance.

GET /tables

Request

Query Parameters

instance_crn
Required*
string
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".
name_pattern
string
A table name pattern for filtering the tables that should be listed. The pattern follows Hive syntax conventions and can include asterisks as wildcards and vertical bars to separate alternatives.
type
string
A table type for filtering the tables to list: "table" or "view".

Example request

curl -X GET \
  --url "https://api.dataengine.cloud.ibm.com/v3/tables?instance_crn=YOUR_DATAENGINE_CRN" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer YOUR_BEARER_TOKEN"

Response

Response Body

table_list

List of catalog tables.

Status Code

200
Names of defined catalog tables. The list might be empty.
400
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
401
The request did not specify a valid bearer token for authentication.
403
You are not authorized to perform this action for the specified service instance.
406
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
429
The client has submitted too many requests for catalog information. Retry the request after a short wait time.
500
An internal error occurred while processing the request.

Example responses

Status 200

{
  "tables": [
    "employees",
    "customers",
    "products",
    "orders"
  ]
}
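
The optional name_pattern and type filters contain characters such as "*" and "|" that need percent-encoding in a URL. The following Python sketch shows one way to build the query string; build_list_tables_url and its parameter names are illustrative assumptions.

```python
import urllib.parse

API_BASE = "https://api.dataengine.cloud.ibm.com/v3"


def build_list_tables_url(instance_crn: str, name_pattern=None, table_type=None) -> str:
    """Build the GET /tables URL, adding the optional filters when given."""
    params = {"instance_crn": instance_crn}
    if name_pattern is not None:
        params["name_pattern"] = name_pattern
    if table_type is not None:
        if table_type not in ("table", "view"):
            raise ValueError('type must be "table" or "view"')
        params["type"] = table_type
    return f"{API_BASE}/tables?{urllib.parse.urlencode(params)}"
```

For example, the pattern "emp*|cust*" would select tables whose names start with "emp" or "cust", following the wildcard and alternative syntax described for name_pattern.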

Returns information about the specified catalog table.

GET /tables/{table_name}

Request

Path Parameters

table_name
Required*
string
Name of the catalog table for which information is to be retrieved. Table names are case-insensitive and can contain only alphabetic characters, numerals, and underscores (_).

Query Parameters

instance_crn
Required*
string
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

Example request

curl -X GET \
  --url "https://api.dataengine.cloud.ibm.com/v3/tables/YOUR_TABLE_NAME?instance_crn=YOUR_DATAENGINE_CRN" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer YOUR_BEARER_TOKEN"

Response

Response Body

table_information

Detailed information about a catalog table.

Status Code

200
Information about the specified table.
400
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
401
The request did not specify a valid bearer token for authentication.
403
You are not authorized to perform this action for the specified service instance.
404
The specified table is not in the catalog.
406
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
429
The client has submitted too many requests for catalog information. Retry the request after a short wait time.
500
An internal error occurred while processing the request.

Example responses

Status 200

{
  "name": "employees",
  "type": "TABLE",
  "creation_time": "2022-03-15T14:44:29.000Z",
  "data_format": "CSV",
  "location": "cos://sql-0fd3de82-d91f-42a7-b460-d2e2c319ee88.us-geo/employees.csv",
  "partitioning_columns": [
    "city"
  ],
  "columns": [
    {
      "name": "employeeID",
      "type": "integer",
      "nullable": true
    },
    {
      "name": "lastName",
      "type": "string",
      "nullable": true
    },
    {
      "name": "firstName",
      "type": "string",
      "nullable": true
    },
    {
      "name": "birthDate",
      "type": "timestamp",
      "nullable": true
    },
    {
      "name": "hireDate",
      "type": "timestamp",
      "nullable": true
    },
    {
      "name": "city",
      "type": "string",
      "nullable": true
    }
  ]
}
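
A response of this shape can be condensed into a per-column summary. The Python sketch below is one possible way to do so; summarize_table is a hypothetical helper, and it assumes the property names columns and partitioning_columns without any stray quote characters.

```python
def summarize_table(info: dict) -> list:
    """Return (name, type, is_partitioning_column) for each column."""
    partition_columns = set(info.get("partitioning_columns", []))
    return [
        (column["name"], column["type"], column["name"] in partition_columns)
        for column in info.get("columns", [])
    ]
```

For the example response, the entry for the city column would be flagged as a partitioning column and all others would not.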

Retrieves the list of partitions of the specified catalog table.

GET /tables/{table_name}/partitions

Request

Path Parameters

table_name
Required*
string
Name of the catalog table for which partitions are to be retrieved. Table names are case-insensitive and can contain only alphabetic characters, numerals, and underscores (_).

Query Parameters

instance_crn
Required*
string
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".

Example request

curl -X GET \
  --url "https://api.dataengine.cloud.ibm.com/v3/tables/YOUR_TABLE_NAME/partitions?instance_crn=YOUR_DATAENGINE_CRN" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer YOUR_BEARER_TOKEN"

Response

Response Body

partition_list

List of table partitions.

Status Code

200
List of the partitions for the specified table. The list might be empty.
400
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
401
The request did not specify a valid bearer token for authentication.
403
You are not authorized to perform this action for the specified service instance.
404
The specified table is not in the catalog.
406
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
429
The client has submitted too many requests for catalog information. Retry the request after a short wait time.
500
An internal error occurred while processing the request.

Example responses

Status 200

{
  "partitions": [
    "country=America/customerID=1",
    "country=America/customerID=2",
    "country=Spain/customerID=1",
    "country=Spain/customerID=2"
  ]
}
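
Each partition is returned as a path of key=value segments. The following Python sketch splits such a string into a dictionary; parse_partition is a hypothetical helper, and it assumes that values contain no "/" characters and need no unescaping, which the example response suggests but does not guarantee.

```python
def parse_partition(spec: str) -> dict:
    """Split a partition string such as 'a=1/b=2' into {'a': '1', 'b': '2'}."""
    result = {}
    for segment in spec.split("/"):
        key, _, value = segment.partition("=")
        result[key] = value
    return result
```

Values stay strings because the partition list alone does not carry column types.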
