Release notes for Data Engine

Use these release notes to learn about the latest updates to IBM Cloud® Data Engine, grouped by date. Release notes are available for a minimum of three years.

January 2024

Deprecation: IBM Cloud® Data Engine is deprecated. As of 18 February 2024 you can't create new instances, and access to free instances will be removed. Existing Standard plan instances are supported until 18 January 2025. Any instances that still exist on that date will be deleted. For more information, see Deprecation of Data Engine.

July 2023

New IBM Cloud Object Storage browser: You can now select the Object Storage result location and the sources of a query by using the new Object Storage browser.

June 2023

Store passwords and API keys in IBM Cloud® Secrets Manager: You can now store passwords and API keys as secrets in Secrets Manager.

March 2023

Dark mode available in the Data Engine UI: The Data Engine UI follows your system theme. You can also choose your theme directly in the profile menu.

January 2023

Create a new instance with Terraform: Detailed documentation is now available on how to create new Data Engine instances with Terraform.

November 2022

Read as text: Files with an unstable schema, for example logs, can be read faster by using STORED AS TEXT. No schema is inferred, so values must be extracted in a further step, as explained in Query data with an unstable schema.

August 2022

Support for BYOK encryption for table metadata: Metadata for tables that are created after 23 August 2022 in an instance that uses BYOK is encrypted by using IBM® Key Protect for IBM Cloud®. See Securing your data in Data Engine.
Chennai deprecation: You can no longer create new instances in the Chennai region. Existing instances still work but will be fully deprecated on 31 October.

May 2022

Rebranding: IBM Cloud SQL Query was rebranded to IBM Cloud Data Engine.
Hive: Data Engine provides an external Hive metastore (HMS) service.

November 2021

Add columns to Catalog tables: You can add columns to existing Catalog tables with the newly supported ALTER TABLE ... ADD COLUMNS statement.

July 2021

Stream landing tutorial: A detailed getting started tutorial for stream landing with Data Engine is now available.
New region for stream landing: The stream landing capability is now also available in Frankfurt, in addition to Dallas.

June 2021

Stream landing support: Data Engine now supports stream landing that enables you to stream your data in real time from a topic to a bucket of your choice. This capability enables efficient analytics on the new objects created.
Connect to data lakes with Cloud Pak for Data: IBM Cloud Pak® for Data now comes with an integrated connector to Data Engine that lets you connect to cloud data lakes and import data assets into projects and catalogs in Cloud Pak for Data. For more information, see Connecting to a Cloud Data Lake with IBM Cloud Pak for Data.

December 2020

Supported regions: Data Engine is available in Chennai, India. When you provision a new instance, you can select whether it is provisioned in Dallas, Frankfurt, or Chennai.
IBM Cloud Object Storage: The IBM Cloud® Object Storage web console discovers SQL-queryable objects and folders and directly starts the Data Engine web console with a prefilled SQL statement for seamless interactive data exploration.

November 2020

Modify location of Hive partitions: The location of Hive partitions can be modified by using the ALTER TABLE SET LOCATION feature.

October 2020

Index management: Data Engine index management, also referred to as data skipping, is generally available with full production support.
New samples category Reference data statements: IBM Cloud® Data Engine comes with open data out of the box, including geolocation and demographic data, that can be used as reference data to combine with your own data sets. It is based on open data from US Census, Eurostat Census, UNdata, OpenStreetMap, and Natural Earth. Explore it by using the new category Reference data statements in SAMPLES.
Time series functions: The anchor functions of the Data Engine time series functions are deprecated and replaced by the new and more powerful expression creation functions.
Python SDK: The ibmcloudsql Python SDK was significantly expanded in functionality for even more powerful Python analytics with SQL. Take a tour of the functions in the Data Engine Starter Notebook. The Python SDK also comes with dedicated online documentation.
Usage of legacy SoftLayer endpoints discontinued: The usage of the legacy SoftLayer endpoints of IBM Cloud Object Storage is discontinued. Check out the Cloud Object Storage announcement for more details.

September 2020

Use JDBC to connect to business intelligence tools: You can use our JDBC driver to connect Data Engine to business intelligence tools and other applications. To download and configure the driver, see the JDBC documentation.
Monitoring with Sysdig: Data Engine supports monitoring metrics for submitted jobs by using IBM Cloud® Monitoring. You can view completed and failed jobs, the number of bytes processed, and the jobs in progress. A default Data Engine dashboard exists, and you can define custom dashboards and alerts.

May 2020

Index management: Data Engine supports index management, also referred to as data skipping. Index management can significantly boost performance and reduce the cost of your SQL queries by skipping over irrelevant data. The service access role Manager is required to run catalog management or index management commands.
Catalog management: Data Engine database catalog support is extended to support views. Data Engine catalog management is out of the Beta stage and can be used with the Standard plan.
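
The idea behind data skipping, mentioned under Index management, can be illustrated with a minimal Python sketch. It assumes a min/max index that stores the value range of one column per object. The class, function, object names, and values are hypothetical and do not reflect how Data Engine stores or evaluates its indexes.

```python
from dataclasses import dataclass


@dataclass
class MinMaxEntry:
    """Hypothetical index entry: value range of one column in one object."""
    object_name: str
    min_value: float
    max_value: float


def objects_to_scan(index, lower, upper):
    """Return names of objects whose [min, max] range overlaps [lower, upper].

    Objects whose range cannot contain a matching row are skipped.
    """
    return [
        entry.object_name
        for entry in index
        if entry.max_value >= lower and entry.min_value <= upper
    ]


# Illustrative index over three objects (made-up names and values).
index = [
    MinMaxEntry("part-0.parquet", 0.0, 9.9),
    MinMaxEntry("part-1.parquet", 10.0, 19.9),
    MinMaxEntry("part-2.parquet", 20.0, 29.9),
]

# A range predicate from 12 to 15 on the indexed column only overlaps part-1.
selected = objects_to_scan(index, 12, 15)
```

In this sketch only one of the three objects has to be read, which is where the reduction in I/O and cost comes from.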

April 2020

Database catalog: Data Engine support for database catalog is generally available with full production support. The database catalog is based on Hive Metastore and significantly speeds up query execution and decouples data management from SQL users and applications.

January 2020

Support for all endpoints: Data Engine fully supports all current public and private IBM Cloud Object Storage endpoints (ending with appdomain.cloud, for example, s3.us.cloud-object-storage.appdomain.cloud) and all new single data center endpoints (for example, sng01).

December 2019

Key Protect: You can use Key Protect as a secure credential broker to pass credentials to data resources referenced by your queries, thus ensuring safe handling of your secrets. For more information, see the authentication documentation.

November 2019

MULTILINE option: You can specify a MULTILINE option for JSON input data if individual JSON records are stored across multiple lines.
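
To illustrate why such an option is needed, the following Python sketch contrasts input with one JSON record per line with input whose records span several lines. It uses only the Python standard library. The function names and sample data are hypothetical, and the sketch is not a description of how Data Engine parses JSON.

```python
import json


def parse_json_lines(text):
    """Parse input where every record sits on exactly one line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_multiline_records(text):
    """Parse consecutive JSON records that may each span several lines."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while pos < len(text):
        # Skip whitespace between records; raw_decode does not do this itself.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        record, pos = decoder.raw_decode(text, pos)
        records.append(record)
    return records


# The same two records, once line-delimited and once pretty-printed.
single_line_input = '{"id": 1, "msg": "a"}\n{"id": 2, "msg": "b"}\n'
multiline_input = '{\n  "id": 1,\n  "msg": "a"\n}\n{\n  "id": 2,\n  "msg": "b"\n}\n'

records_a = parse_json_lines(single_line_input)
records_b = parse_multiline_records(multiline_input)
```

A line-by-line parser fails on the pretty-printed input because a single line such as `{` is not a complete JSON document, so the reader must be told that records cross line boundaries.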

October 2019

New open source script for uploading large volumes: The new open source cos-upload script can be used to efficiently upload large volumes of data to IBM Cloud Object Storage buckets with Aspera by merely providing an IAM API Key.
JSON parser: The Data Engine JSON parser processes and extracts all JSON keys in lowercase, so it works correctly on LogDNA data.
Run automatic SQL-based post processing: Storing new objects in IBM Cloud Object Storage can trigger Data Engine executions. This capability is enabled by the IBM Cloud Object Storage event provider for IBM Cloud® Functions. By combining it with the SQL Cloud Functions, you can automatically run SQL-based post processing for new objects.
Query hints: Data Engine has query hints for SQL queries that have potential for faster execution by using certain features of Data Engine. These hints are flagged with a light bulb icon in the job list and the specific hint is available inside the Details pane.

September 2019

Support for ETL to IBM® Db2® on Cloud: You can specify Db2 target tables in your SQL queries to process data from IBM Cloud Object Storage and save the Data Engine result into Db2 on Cloud.

August 2019

Support for DESCRIBE table transformation function: Support for the DESCRIBE table transformation function that enables easier exploration of the schema of data by returning the schema definition instead of data as the table content. Check out the new Starter Query sample in the UI.

July 2019

JSON preview: You can directly preview query results in JSON format in the SQL console. Add INTO <COS URI> STORED AS JSON to your SQL statement to produce JSON output and preview it in the web console.
Support for Parquet schema evolution: Support for Parquet schema evolution through the MERGE SCHEMA sub clause for STORED AS PARQUET input data. Check out the new samples in the UI.
New table transformation functions: Support for the CLEANCOLS table transformation function that generically cleanses all input column names from characters that are not supported by the Parquet target format. Check out the new samples. Support for the FLATTEN table transformation function that generically flattens all nested input columns into a flat hierarchy, so that you can easily work with, for example, JSON input data and write the results out to flat CSV files. Check out the new samples in the UI.
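
As a rough illustration of what flattening nested columns into a flat hierarchy means, here is a minimal Python sketch that operates on a single nested record. The function name, the sample record, and the underscore used to join nested names are assumptions for illustration only. The sketch does not reproduce the FLATTEN implementation or its exact naming rules.

```python
def flatten_record(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining key paths with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested structures and merge their flattened keys.
            flat.update(flatten_record(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical nested record, as it might come from JSON input.
nested = {"id": 7, "user": {"name": "ada", "address": {"city": "Dallas"}}}
flat = flatten_record(nested)
```

The flattened record has only scalar columns, so it can be written to a flat format such as CSV.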

June 2019

Data Engine now available in Frankfurt: Data Engine is available in Frankfurt, Germany. When you provision a new instance, you can select whether it is provisioned in Dallas or in Frankfurt.
Support for time series SQL functions: Support for time series SQL functions to process time series data, for example, to identify trends and to predict future values based on these trends.

May 2019

Updates to the SQL reference: With JOBPREFIX JOBID/NONE, you can specify whether the job ID is appended to the target prefix. The SORT BY clause for SQL targets is new. You can use it to sort SQL result sets in many ways before you write the results to IBM Cloud Object Storage. It can be used in combination with PARTITIONED BY, PARTITIONED INTO (to cluster the results), or without the PARTITIONED clause. PARTITIONED INTO BUCKETS and PARTITIONED INTO OBJECTS are both supported and can be used interchangeably.

April 2019

Support for encryption with IBM® Key Protect for IBM Cloud®: Support for encrypting SQL queries with IBM Key Protect. IBM Key Protect is a centralized key management system (KMS) for generating, managing, and destroying encryption keys used by IBM Cloud® services. If you are processing sensitive data in your queries, you can use customer-managed keys to encrypt SQL query texts and error messages that are stored in the job information. IBM Cloud Data Engine with IBM Key Protect for managing encryption keys meets the required IBM controls that are commensurate with the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Security and Privacy Rule requirements.

February 2019

Beta support for JDBC driver: Beta support for JDBC driver of Data Engine. Request to participate by sending an email to Joshua.Mintz@ibm.com.
Beta support for data skipping: Beta support for data skipping indexes. You can create custom indexes on any column for minimum and maximum values, list of values, and geospatial bounding box for any object queried. This significantly reduces I/O and query cost and lowers the query execution time.
Beta support for time series: Beta support for SQL-native time series in Data Engine. This includes functions for time series segmentation, prediction, alignment, temporal joins, and subsequence mining. Request to participate by sending an email to Joshua.Mintz@ibm.com.

December 2018

New SQL reference guide: Release of a complete SQL Reference Guide, an SQL introduction for Cloud SQL/Spark SQL. The new reference guide includes examples that can be copied and directly pasted into the web UI to be run.

November 2018

Support for hive-style partitioning: Support for controlling the layout of SQL results, including support for creating hive-style partitioning and paginated result data.
Support for Python SDK extensions: Support for extensions in Python SDK for result data partitioning, pagination, and exporting SQL job history to IBM Cloud Object Storage.
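
Hive-style partitioning, mentioned above, lays out result objects under prefixes of the form column=value. The following Python sketch shows that naming convention under stated assumptions: the base prefix, column names, rows, and helper functions are hypothetical, and the sketch does not reproduce the exact object names that Data Engine writes.

```python
def hive_partition_prefix(base_prefix, partition_values):
    """Build a Hive-style object prefix: base/col1=val1/col2=val2/."""
    parts = [f"{column}={value}" for column, value in partition_values.items()]
    return "/".join([base_prefix.rstrip("/")] + parts) + "/"


def group_by_partition(rows, partition_columns):
    """Group rows by their partition-column values, as a partitioned write would."""
    groups = {}
    for row in rows:
        key = tuple((col, row[col]) for col in partition_columns)
        groups.setdefault(key, []).append(row)
    return groups


# Made-up result rows, partitioned by country and year.
rows = [
    {"country": "DE", "year": 2018, "amount": 10},
    {"country": "US", "year": 2018, "amount": 20},
    {"country": "DE", "year": 2018, "amount": 30},
]
groups = group_by_partition(rows, ["country", "year"])
prefixes = sorted(hive_partition_prefix("results", dict(key)) for key in groups)
```

Because the partition values are encoded in the prefix, a later query that filters on those columns can restrict itself to the matching prefixes.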

October 2018

Control your result's format: Support for SELECT INTO to control the format the SQL result is written in.

August 2018

General availability: IBM Cloud Data Engine is generally available. The open beta phase has ended.
New built-in functions: Support for new built-in SQL functions released with Apache Spark 2.3. Set of SQL optimizer enhancements and ANSI SQL and Hive SQL compliance enhancements that were introduced with Apache Spark 2.3.

June 2018

ORC: Support for ORC data (STORED AS ORC).
Geospatial functions: Support for geospatial SQL functions for calculations, aggregations, and joins on location data.
ibmcloudsql Node.js client SDK: Release of ibmcloudsql Node.js client SDK.

April 2018

Introducing IBM Cloud Data Engine: Beta release of IBM Cloud Data Engine.