Introduction
IBM Cloud Data Engine is a cloud-native service that provides stream ingestion, data preparation, ETL, and data query from IBM Cloud Object Storage and Kafka. It also manages tables and views in a catalog that is compatible with Hive metastore and other big data engines and services can connect to it. Data Engine supports full standard ANSI SQL to submit work as serverless jobs. It is hosted on IBM Cloud and includes a publicly accessible REST API. The V3 API documentation intends to help you get going with the Data Engine API, and it offers resources on how to operationalize it.
API endpoint
https://api.dataengine.cloud.ibm.com/
Running an SQL statement
You can use the Data Engine service REST V3 API to run queries and retrieve information about their status. This is especially helpful when writing code that automatically queries data.
Data Engine provides the following REST APIs:
- POST request to submit a new SQL job.
- GET request to receive details about a specific SQL job.
- GET request to receive a list of submitted SQL jobs.
- GET request to receive a list of available tables and views.
- GET request to receive details about a specific table or view.
- GET request to receive a list of partitions belonging to a specific table.
Before you can call any of the above REST APIs, create an IAM bearer token and have your Data Engine instance Cloud Resource Name (CRN) available. You find the CRN in the Data Engine instance dashboard that provides a copy button to get it into your clipboard.
Creating an IAM bearer token
The recommended method to retrieve an IAM token programmatically is to create an API key for your IBM Cloud identity and then use the IAM token API to exchange that key for a token. Each token is valid only for one hour, and after a token expires you have to create a new one if you want to continue using the API.
You can create a token in IBM Cloud or by using the IBM Cloud command line interface (CLI).
To create a token in the IBM Cloud:
- Log in to IBM Cloud and select Manage > Security > Platform API Keys.
- Create an API key for your own personal identity, copy the key value, and save it in a secure place. After you leave the page, you will no longer be able to access this value.
- With your API key, set up Postman or another REST API tool and run the command to the right.
- Use the value of the
access_token
property for your Data Engine API calls. Set theaccess_token
value as the authorization header parameter for requests to the Watson Data APIs. The format isAuthorization: Bearer <access_token_value_here>
. For example:
Authorization: Bearer eyJraWQiOiIyMDE3MDgwOS0wMDowMDowMCIsImFsZyI6IlJTMjU2In0...
To create a token by using the IBM Cloud CLI:
Follow the steps to install the CLI, log in to IBM Cloud, and get the token described here.
Curl command with API key to retrieve token
curl "https://iam.cloud.ibm.com/identity/token" -d "apikey=YOUR_API_KEY_HERE&grant_type=urn%3Aibm%3Aparams%3Aoauth%3Agrant-type%3Aapikey" -H "Content-Type: application/x-www-form-urlencoded" -H "Authorization: Basic Yng6Yng="
Response
{
"access_token": "eyJraWQiOiIyMDE3MDgwOS0wMDowMDowMCIsImFsZyI6...",
"refresh_token": "zmRTQFKhASUdF76Av6IUzi9dtB7ip8F2XV5fNgoRQ0mbQgD5XCeWkQhjlJ1dZi8K...",
"token_type": "Bearer",
"expires_in": 3600,
"expiration": 1505865282
}
Error handling
This API uses standard HTTP response codes to indicate if a method completed successfully:
- A 200 or 201 type response indicates success.
- A 400 type response indicates an error in the specified parameters.
- A 401 type response indicates an authorization error.
- A 404 type response indicates that a resource related to this request was not found.
Rate limiting
Rate limits for API POST requests are enforced for each instance. If the number of POST requests for an instance reaches the request limit, no further requests are accepted until one of the Data Engine jobs that are running for that instance finishes.
An HTTP status code of 429 indicates that the rate limit has been exceeded.
The number of requests that are allowed depends on your plan.
Resources
- Getting started tutorial
- How to run basic queries
- How to use the Data Engine REST API
- How to operationalize SQL code and call it from an application
- How to connect to IBM Cloud Object Storage from the command Line
- Sample Python client for interacting with the Data Engine APIs
- Example using cloud functions with Data Engine APIs
Methods
Run an SQL job
Runs a batch or streaming SQL job and stores the result either in IBM Cloud Object Storage or IBM® Db2® on Cloud. The FROM
clause references rectangular data that is stored in Parquet, CSV, ORC, AVRO or JSON format in IBM Cloud Object Storage or a topic of a streaming source. For more information, see the Data Engine overview documentation.
POST /sql_jobs
Request
Query Parameters
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".
SQL job specification
This is the SQL statement to be submitted. The table names specified in the FROM clause must correspond to objects in one or more IBM Cloud Object Storage instances or to table definitions specified in the database catalog. The INTO clause of the query indicates the endpoint, bucket, and (optionally) subfolder in IBM Cloud Object Storage or a Db2 table URI, in which the query result is to be stored. Alternatively, a Database Definition Language statement for table or index management can be specified. Within the specified target, each result is stored in a separate subfolder with a name that indicates the job ID. The job ID of a query is returned by the GET endpoint.
Possible values: length ≥ 1
curl -XPOST --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs?instance_crn=YOUR_DATAENGINE_CRN" -H "Accept: application/json" -H "Authorization: Bearer YOUR_BEARER_TOKEN" -H "Content-Type: application/json" -d '{"statement":"SELECT firstname FROM cos://us-geo/sql/employees.parquet STORED AS PARQUET WHERE EMPLOYEEID=5 INTO cos://us-geo/target-bucket/q1-results" }'
Response
Abridged information about an SQL job, including its identifier and processing status.
Identifier for an SQL job.
Example:
637a0b7d-069d-453f-a418-a35d4db3ea64
Execution status of an SQL job.
Possible values: [
queued
,running
,stopping
,stopped
,completed
,failed
]ID of the user who submitted an SQL job.
Example:
test_user@my.org
Timestamp indicating when an SQL job was accepted by the service.
Timestamp indicating when the status of an SQL job changed to its current value.
Boolean indicating when an SQL job has an improvement hint.
Status Code
Successful submission. Returns information about the SQL job, including its status and an identifier for future reference.
The request provided an invalid job specification or other invalid data. Returns details about the validation problem.
The request did not specify a valid bearer token for authentication.
You are not authorized to perform this action for the specified service instance.
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
This instance is currently running its maximum number of SQL jobs. A new job can be accepted only after at least one of the currently running jobs completes.
An internal error occurred while processing the request.
{ "job_id": "7ebed7f7-00dc-44a2-acfa-5bdb53889648", "status": "queued" }
Get information about recent SQL jobs
Returns information about recently submitted SQL jobs.
GET /sql_jobs
Request
Query Parameters
The type of jobs that should be listed, can be 'batch' or 'stream' jobs
Allowable values: [
batch
,stream
]The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".
curl -XGET --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs?type=stream&instance_crn=YOUR_DATAENGINE_CRN" -H "Accept: application/json" -H "Authorization: Bearer YOUR_BEARER_TOKEN"
Response
List of information about SQL jobs.
The SQL jobs.
Status Code
Information about recently submitted SQL jobs of the requested type. The list might be empty.
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
The request did not specify a valid bearer token for authentication.
You are not authorized to perform this action for the specified service instance.
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
An internal error occurred while processing the request.
{ "jobs": [ { "job_id": "7ebed7f7-00dc-44a2-acfa-5bdb53889648", "status": "completed", "submit_time": "2018-08-14T08:45:54.012Z", "user_id": "user1@ibm.com" }, { "job_id": "ffde4c5a-1cc2-448b-b377-43573818e5d8", "status": "completed", "submit_time": "2018-08-14T08:47:33.350Z", "user_id": "user1@ibm.com" } ] }
Get information about a specific SQL job
Returns information about the specified SQL job.
GET /sql_jobs/{job_id}
Request
Path Parameters
ID of the SQL job for which information is to be retrieved. This ID is returned when an SQL job is submitted, and when information about recently submitted SQL jobs is requested.
Query Parameters
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".
curl -XGET --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs/YOUR_JOB_ID?instance_crn=YOUR_DATAENGINE_CRN" -H "Accept: application/json" -H "Authorization: Bearer YOUR_BEARER_TOKEN"
Response
Full information about an SQL job, including output or error information.
Identifier for an SQL job.
Example:
637a0b7d-069d-453f-a418-a35d4db3ea64
Execution status of an SQL job.
Possible values: [
queued
,running
,stopping
,stopped
,completed
,failed
]ID of the user who submitted an SQL job.
Example:
test_user@my.org
Timestamp indicating when an SQL job was accepted by the service.
The SQL query that the job processes.
Timestamp indicating when the status of an SQL job changed to its current value.
The service plan id of the instance.
Format of the query result.
Example:
csv
A URI that indicates where the query result is stored. This URI can be used as input for another SQL query. The result comprises all objects that have a name with this URI as its prefix.
Example:
cos://s3.dal.us.cloud-object-storage.appdomain.cloud/target-bucket/q1-results/jobid=411323a4-04de-440a-9e41-011d31052f54
Timestamp indicating when a batch job finished processing.
Number of rows returned by a batch job.
Number of rows read by a batch job.
Number of bytes read by a batch job.
Number of objects skipped using index management.
Number of objects qualified using index management.
Number of rows (messages) that a streaming job is currently processing.
Timestamp indicating when a streaming job last tried to process messages from an event source. This timestamp increases even if no new data was available and no output data was written.
Currently always false.
An error that was encountered while processing the job.
Detailed information about the error.
Suggests possible optimizations for a query.
Status Code
Status of the specified SQL job.
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
The request did not specify a valid bearer token for authentication.
You are not authorized to perform this action for the specified service instance.
No information was found for the specified job. Note that the system periodically deletes information about completed or failed jobs.
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
An internal error occurred while processing the request.
{ "job_id": "7ebed7f7-00dc-44a2-acfa-5bdb53889648", "status": "completed", "statement": "SELECT e.firstname employee, e.city FROM cos://us-geo/sql/employees.parquet STORED AS PARQUET e", "plan_id": "e03a38d0-5ec1-41c5-b3b3-5e081dc19c8c", "submit_time": "2018-08-14T08:45:54.012Z", "resultset_location": "cos://s3.us.cloud-object-storage.appdomain.cloud/result/test/jobid=7ebed7f7-00dc-44a2-acfa-5bdb53889648", "resultset_format": "parquet", "rows_returned": 9, "rows_read": 9, "bytes_read": 4928, "end_time": "2018-08-14T08:46:01.516Z", "user_id": "user1@ibm.com" }
Stops a specific SQL streaming job
Stops the execution of a specific SQL streaming job.
PUT /sql_jobs/{job_id}/stop
Request
Path Parameters
ID of the SQL streaming job that is to be stopped. This ID is returned when an SQL job is submitted, and when information about recently submitted SQL jobs is requested.
Query Parameters
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".
curl -XPUT --url "https://api.dataengine.cloud.ibm.com/v3/sql_jobs/YOUR_JOB_ID/stop?instance_crn=YOUR_DATAENGINE_CRN" -H "Accept: application/json" -H "Authorization: Bearer YOUR_BEARER_TOKEN"
Response
Status Code
The specified job could be stopped successfully.
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
The request did not specify a valid bearer token for authentication.
You are not authorized to perform this action for the specified service instance.
No information was found for the specified job. Note that the system periodically deletes information about completed or failed jobs.
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
An internal error occurred while processing the request.
No Sample Response
List catalog tables
Retrieve a list of the first 100 tables that are defined for the given instance in the catalog.
GET /tables
Request
Query Parameters
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".
A table name pattern for filtering the tables that should be listed. The pattern follows Hive syntax conventions and can include asterisks as wildcards and vertical bars to separate alternatives.
A table type for filtering the tables that should be listed, can be "table" or "view".
curl -XGET --url "https://api.dataengine.cloud.ibm.com/v3/tables?instance_crn=YOUR_DATAENGINE_CRN" -H "Accept: application/json" -H "Authorization: Bearer YOUR_BEARER_TOKEN"
Response
List of catalog tables.
The table names.
Metadata about the returned tables.
The name of a catalog table.
Example:
customer_address
The type of a catalog table (for example, "TABLE" or "VIEW").
Example:
table
tables_metadata
Status Code
Names of defined catalog tables. The list might be empty.
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
The request did not specify a valid bearer token for authentication.
You are not authorized to perform this action for the specified service instance.
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
The client has submitted too many requests for catalog information. Retry the request after a short wait time.
An internal error occurred while processing the request.
{ "tables": [ "employees", "customers", "products", "orders" ] }
Get information about a specific catalog table
Returns information about the specified catalog table.
GET /tables/{table_name}
Request
Path Parameters
Name of the catalog table for which information is to be retrieved. Table names are case-insensitive and must only contain alphabetic and numeral characters, and underscore (_).
Query Parameters
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".
curl -XGET --url "https://api.dataengine.cloud.ibm.com/v3/tables/YOUR_TABLE_NAME?instance_crn=YOUR_DATAENGINE_CRN" -H "Accept: application/json" -H "Authorization: Bearer YOUR_BEARER_TOKEN"
Response
Detailed information about a catalog table.
The name of a catalog table.
Example:
customer_address
The type of a catalog table (for example, "TABLE" or "VIEW").
Example:
table
The date when the table got created.
Example:
2022-03-14T15:51:04.000Z
The format of the sources.
Example:
CSV
The location of the source for the table.
Example:
cos://sql-0fd3de82-d91f-42a7-b460-d2e2c319ee88.us-geo/customers.csv
The column names used for partitioning.
Example:
[ 'country' ]
The columns of the table.
The name of the column.
The data type of the column.
Example:
string
Whether the column may contain NULL values.
columns
Status Code
Information about the specified table.
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
The request did not specify a valid bearer token for authentication.
You are not authorized to perform this action for the specified service instance.
The specified table is not in the catalog
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
The client has submitted too many requests for catalog information. Retry the request after a short wait time.
An internal error occurred while processing the request.
{ "name": "employees", "type\"": "TABLE", "creation_time": "2022-03-15T14:44:29.000Z", "data_format": "CSV", "location": "cos://sql-0fd3de82-d91f-42a7-b460-d2e2c319ee88.us-geo/employees.csv", "partitioning_columns\"": [ "city" ], "columns": [ { "name": "employeeID", "type": "integer", "nullable": true }, { "name": "lastName", "type": "string", "nullable": true }, { "name": "firstName", "type": "string", "nullable": true }, { "name": "birthDate", "type": "timestamp", "nullable": true }, { "name": "hireDate", "type": "timestamp", "nullable": true }, { "name": "city", "type": "string", "nullable": true } ] }
List partitions of the catalog table
Retrieve the list of partitions of the specified catalog table.
GET /tables/{table_name}/partitions
Request
Path Parameters
Name of the catalog table for which information is to be retrieved. Table names are case-insensitive and must only contain alphabetic and numeral characters, and underscore (_).
Query Parameters
The cloud resource name (CRN) of the Data Engine service instance. See the following example of a CRN: "crn:v1:bluemix:public:sql-query:us-south:a/33e58e0da6e6926e09fd68480e66078e:d30102ec-3444-4512-80bd-51ab7e7f8388::".
curl -XGET --url "https://api.dataengine.cloud.ibm.com/v3/tables/YOUR_TABLE_NAME/partitions?instance_crn=YOUR_DATAENGINE_CRN" -H "Accept: application/json" -H "Authorization: Bearer YOUR_BEARER_TOKEN"
Response
List of table partitions.
The partitions of a the table.
Status Code
List of the partitions for the specified table. The list might be empty.
The request specified invalid data, for example, incorrect headers. Returns details about the validation problem.
The request did not specify a valid bearer token for authentication.
You are not authorized to perform this action for the specified service instance.
The specified table is not in the catalog
The request requires an unsupported output format. Data Engine produces JSON output with UTF-8 encoding. A request cannot be processed if its header specifies that this format is not accepted.
The client has submitted too many requests for catalog information. Retry the request after a short wait time.
An internal error occurred while processing the request.
{ "partitions": [ "country=America/customerID=1", "country=America/customerID=2", "country=Spain/customerID=1", "country=Spain/customerID=2" ] }