Release notes for Data Engine
Use these release notes to learn about the latest changes to IBM Cloud® Data Engine, grouped by date. Release notes are available for a minimum of three years.
January 2024
- Deprecation
- IBM Cloud® Data Engine is deprecated. As of 18 February 2024, you can't create new instances, and access to free instances will be removed. Existing Standard plan instances are supported until 18 January 2025. Any instances that still exist on that date will be deleted. For more information, see Deprecation of Data Engine.
July 2023
- New IBM Cloud Object Storage browser
- You can now select the Object Storage result location and the sources of a query by using the new Object Storage browser.
June 2023
- Store passwords and API keys in IBM Cloud® Secrets Manager
- You can now store passwords and API keys as secrets in Secrets Manager.
March 2023
- Dark mode available in the Data Engine UI
- The Data Engine UI follows your system theme. You can also choose your theme directly in the profile menu.
January 2023
- Create a new instance with Terraform
- Detailed documentation is now available on how to create new Data Engine instances with Terraform.
November 2022
- Read as text
- Files with an unstable schema, for example logs, can be read faster by using STORED AS TEXT. There is no schema inference, and values must be extracted in a further step, as explained in Query data with an unstable schema.
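The following Python sketch illustrates the idea behind reading as text, run locally: each line is kept as one string value with no schema inference, and fields are extracted from it afterward. The log format, the `LOG_PATTERN` regular expression, and the helper functions are hypothetical; in Data Engine itself, you extract values in SQL as described in Query data with an unstable schema.

```python
import re
from typing import Optional

# Hypothetical log format for this sketch: "<ISO timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


def read_as_text(raw: str) -> list[str]:
    """Mimic a text read: one string value per non-empty line, no schema inference."""
    return [line for line in raw.splitlines() if line.strip()]


def extract_fields(line: str) -> Optional[dict]:
    """Extract fields from one line; lines that do not match the pattern yield None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


raw_logs = """2022-11-02T10:00:00Z INFO service started
2022-11-02T10:00:05Z ERROR connection refused
not a structured line
"""

lines = read_as_text(raw_logs)
records = [extract_fields(line) for line in lines]
print(records)
```

Because nothing is inferred up front, lines that do not fit the expected shape simply produce no fields instead of breaking the read.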
August 2022
- Support for BYOK encryption for table metadata
- Table metadata is encrypted by using IBM® Key Protect for IBM Cloud® for tables that are created after 23 August 2022 and that are associated with an instance that uses BYOK. For more information, see Securing your data in Data Engine.
- Chennai deprecation
- You cannot create new instances in the Chennai region anymore. Existing instances still work but will be fully deprecated on 31 October.
May 2022
- Rebranding
- IBM Cloud SQL Query was rebranded to IBM Cloud Data Engine.
- Hive
- Data Engine provides an external Hive metastore (HMS) service.
November 2021
- Add columns to Catalog tables
- You can add columns to existing Catalog tables with the newly supported ALTER TABLE ... ADD COLUMNS statement.
July 2021
- Stream landing tutorial
- A detailed getting started tutorial for stream landing with Data Engine is now available.
- New region for stream landing
- The stream landing capability is now also available in Frankfurt, in addition to Dallas.
June 2021
- Stream landing support
- Data Engine now supports stream landing, which enables you to stream your data in real time from a topic to a bucket of your choice. This capability enables efficient analytics on the newly created objects.
- Connect to data lakes with Cloud Pak for Data
- IBM Cloud Pak® for Data now comes with an integrated connector to Data Engine that allows you to connect to cloud data lakes and import data assets into projects and catalogs in Cloud Pak for Data. For more information, see Connecting to a Cloud Data Lake with IBM Cloud Pak for Data.
December 2020
- Supported regions
- Data Engine is available in Chennai, India. When you provision a new instance, you can select whether to provision it in Dallas, Frankfurt, or Chennai.
- IBM Cloud Object Storage
- The IBM Cloud® Object Storage web console discovers SQL-queryable objects and folders and directly opens the Data Engine web console with a prefilled SQL statement for seamless interactive data exploration.
November 2020
- Modify location of Hive partitions
- The location of Hive partitions can be modified by using the ALTER TABLE SET LOCATION feature.
October 2020
- Index management
- Data Engine index management, also referred to as data skipping, is generally available with full production support.
- New samples category Reference data statement
- IBM Cloud® Data Engine comes with open data out of the box, including geolocation and demographic data, that can be used as reference data to combine with your own data sets. It is based on open data from US Census, Eurostat Census, UNdata, OpenStreetMap, and Natural Earth. Explore it by using the new Reference data statements category in SAMPLES.
- Time series functions
- Data Engine time series functions: The anchor functions are deprecated and replaced by the new and more powerful expression creation functions.
- Python SDK
- The functionality of the ibmcloudsql Python SDK has been significantly expanded for even more powerful Python analytics with SQL. Take a tour of the functions in the Data Engine Starter Notebook. The Python SDK also comes with dedicated online documentation.
- Usage of legacy SoftLayer endpoints discontinued
- The usage of the legacy SoftLayer endpoints of IBM Cloud Object Storage is discontinued. Check out the Cloud Object Storage announcement for more details.
September 2020
- Use JDBC to connect to business intelligence tools
- You can use our JDBC driver to connect Data Engine to business intelligence tools and other applications. To download and configure the driver, see the JDBC documentation.
- Monitoring with Sysdig
- Data Engine supports monitoring metrics for submitted jobs by using IBM Cloud® Monitoring. You can view completed and failed jobs, the number of bytes processed, and the jobs in progress. A default Data Engine dashboard exists, and you can define custom dashboards and alerts.
May 2020
- Index management
- Data Engine supports index management, also referred to as data skipping. Index management can significantly boost performance and reduce cost of your SQL queries by skipping over irrelevant data.
- Service access role Manager is required to run catalog management or index management commands.
- Catalog management
- Data Engine database catalog support is extended to support views.
- Data Engine catalog management is out of the Beta stage and can be used with the Standard plan.
April 2020
- Database catalog
- Data Engine support for database catalog is generally available with full production support. The database catalog is based on Hive Metastore and significantly speeds up query execution and decouples data management from SQL users and applications.
January 2020
- Support for all endpoints
- Data Engine fully supports all current public and private IBM Cloud Object Storage endpoints (ending with appdomain.cloud, for example, s3.us.cloud-object-storage.appdomain.cloud) and all new single data center endpoints (for example, sng01).
December 2019
- Key Protect
- You can use Key Protect as a secure credential broker to pass credentials to data resources referenced by your queries, thus ensuring safe handling of your secrets. For more information, see the authentication documentation.
November 2019
- MULTILINE option
- You can specify a MULTILINE option for JSON input data if individual JSON records are stored across multiple lines.
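The difference between reading one JSON record per line and reading records that span several lines can be sketched locally in Python. This illustrates the concept only, under the assumption that input without the MULTILINE option is expected to hold one complete record per line; it is not Data Engine code, and the helper names are hypothetical.

```python
import json

# Hypothetical helpers that contrast line-by-line JSON reading with multiline records.


def parse_one_record_per_line(text: str) -> list:
    """Assume every non-empty line holds one complete JSON record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_multiline_records(text: str) -> list:
    """Decode consecutive JSON records that can each span several lines."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while True:
        # Skip whitespace, including newlines, between records.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return records
        # raw_decode returns the parsed object and the index where it ended.
        record, pos = decoder.raw_decode(text, pos)
        records.append(record)


data = '{\n  "id": 1,\n  "city": "Dallas"\n}\n{\n  "id": 2,\n  "city": "Frankfurt"\n}\n'

try:
    parse_one_record_per_line(data)
except json.JSONDecodeError:
    print("line-by-line parsing fails on multiline records")

print(parse_multiline_records(data))
```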
October 2019
- New open source script for uploading large volumes
- The new open source cos-upload script can be used to efficiently upload large volumes of data to IBM Cloud Object Storage buckets with Aspera by merely providing an IAM API Key.
- JSON parser
- The Data Engine JSON parser processes and extracts all JSON keys in lowercase, so that it works correctly on LogDNA data.
- Run automatic SQL-based post processing
- Storing new objects in IBM Cloud Object Storage can trigger Data Engine executions. It is enabled by the IBM Cloud Object Storage event provider for IBM Cloud® Functions. By combining it with the SQL Cloud Functions, you can automatically run SQL-based post processing for new objects.
- Query hints
- Data Engine has query hints for SQL queries that have potential for faster execution by using certain features of Data Engine. These hints are flagged with a light bulb icon in the job list, and the specific hint is available inside the Details pane.
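To show what extracting JSON keys in lowercase means for a nested record, here is a minimal Python sketch that lowercases keys at every nesting level while leaving values untouched. It only illustrates the parser behavior described above; the function name and the sample record are hypothetical.

```python
def lowercase_keys(value):
    """Return a copy of `value` with every dict key lowercased at all nesting levels."""
    if isinstance(value, dict):
        return {str(key).lower(): lowercase_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [lowercase_keys(item) for item in value]
    # Scalars (strings, numbers, booleans, None) are returned unchanged.
    return value


record = {"App": "web", "Meta": {"Level": "INFO"}, "Tags": [{"Name": "prod"}]}
print(lowercase_keys(record))
```

In this sketch, keys that differ only in case, such as `Level` and `level`, collapse into a single key, and the last one wins.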
September 2019
- Support for ETL to IBM® Db2® on Cloud
- You can specify Db2 target tables in your SQL queries to process data from IBM Cloud Object Storage and save the Data Engine result into Db2 on Cloud.
August 2019
- Support for DESCRIBE table transformation function
- Support for the DESCRIBE table transformation function, which enables easier exploration of the schema of data by returning the schema definition instead of data as the table content. Check out the new Starter Query sample in the UI.
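The idea of returning a schema definition instead of the data can be sketched in Python: the function below walks a list of records and returns one (column, type) row per column. The type names follow a Spark-like convention purely as an assumption for this sketch, the widening rules are simplified, and `describe` is a hypothetical helper, not the DESCRIBE implementation.

```python
# Assumed Spark-like type names for this sketch; anything else maps to "string".
TYPE_NAMES = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def describe(rows: list[dict]) -> list[tuple[str, str]]:
    """Return one (column_name, data_type) row per column instead of the data rows."""
    schema: dict[str, str] = {}
    for row in rows:
        for name, value in row.items():
            inferred = TYPE_NAMES.get(type(value), "string")
            current = schema.setdefault(name, inferred)
            if current != inferred:
                # Widen bigint and double to double; any other conflict becomes string.
                schema[name] = "double" if {current, inferred} == {"bigint", "double"} else "string"
    return list(schema.items())


rows = [
    {"id": 1, "city": "Dallas", "temp": 21},
    {"id": 2, "city": "Frankfurt", "temp": 18.5},
]
print(describe(rows))
```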
July 2019
- JSON preview
- You can directly preview query results in JSON format in the SQL console. Add INTO <COS URI> STORED AS JSON to your SQL statement to produce JSON output and preview it in the web console.
- Support for Parquet schema evolution
- Support for Parquet schema evolution through the MERGE SCHEMA subclause for STORED AS PARQUET input data. Check out the new samples in the UI.
- New table transformation functions
- Support for the CLEANCOLS table transformation function, which generically cleanses all input column names of characters that are not supported by the Parquet target format. Check out the new samples.
- Support for the FLATTEN table transformation function, which generically flattens all nested input columns into a flat hierarchy. This allows you to easily work with, for example, JSON input data and write the results out to flat CSV files. Check out the new samples in the UI.
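The combined effect of FLATTEN and CLEANCOLS on a single nested record can be sketched in Python. Two assumptions are labeled here and in the code: nested names are joined with an underscore, and the set of rejected characters is the one Apache Spark checks for Parquet column names. Whether Data Engine replaces or removes those characters is not stated in these notes; this sketch replaces them with underscores, and all names are hypothetical.

```python
# Assumption: characters that Spark's Parquet writer rejects in column names.
INVALID_PARQUET_CHARS = set(" ,;{}()\n\t=")


def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Flatten nested dicts into one level; the separator "_" is an assumption."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def clean_column_name(name: str, replacement: str = "_") -> str:
    """Replace every character that the Parquet target would reject."""
    return "".join(replacement if ch in INVALID_PARQUET_CHARS else ch for ch in name)


def cleancols(record: dict) -> dict:
    """Apply clean_column_name to every top-level column name."""
    return {clean_column_name(key): value for key, value in record.items()}


record = {"user": {"first name": "Ada", "address": {"city": "Dallas"}}, "score (pts)": 7}
print(cleancols(flatten(record)))
```

Flattening first and cleaning second yields a single level of Parquet-safe column names, which is the shape a flat CSV or Parquet target expects.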
June 2019
- Data Engine now available in Frankfurt
- Data Engine is available in Frankfurt, Germany. When you provision a new instance, you can select whether to provision it in Dallas or in Frankfurt.
- Support for time series SQL functions
- Support for time series SQL functions to process time series data, for example, to identify trends and to predict future values based on these trends.
May 2019
- Updates to the SQL reference
- With JOBPREFIX JOBID/NONE, you can specify whether you want the job ID to be appended to the target prefix or not.
- The SORT BY clause for SQL targets is new. You can use it to sort SQL result sets in many ways before you write the results to IBM Cloud Object Storage. It can be used in combination with PARTITIONED BY, PARTITIONED INTO (to cluster the results), or without the PARTITIONED clause.
- PARTITIONED INTO BUCKETS and PARTITIONED INTO OBJECTS are both supported, thus you can use them synonymously.
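The layout that PARTITIONED BY combined with SORT BY produces can be sketched in Python: rows are grouped under Hive-style prefixes such as `year=2019/month=5/`, and each group is sorted before it is written. Dropping the partition columns from the stored rows follows common Hive-style practice and is an assumption here; `partition_and_sort` is a hypothetical helper that models object prefixes as dictionary keys rather than writing to IBM Cloud Object Storage.

```python
from collections import defaultdict


def partition_and_sort(rows: list[dict], partition_cols: list[str], sort_key: str) -> dict:
    """Group rows under Hive-style prefixes (col=value/...) and sort each group by sort_key.

    sort_key must not be a partition column, because partition columns are dropped.
    """
    groups = defaultdict(list)
    for row in rows:
        prefix = "".join(f"{col}={row[col]}/" for col in partition_cols)
        # Assumption: partition values live in the prefix, so they are dropped from the row.
        groups[prefix].append({k: v for k, v in row.items() if k not in partition_cols})
    return {
        prefix: sorted(group, key=lambda r: r[sort_key])
        for prefix, group in sorted(groups.items())
    }


rows = [
    {"year": 2019, "month": 5, "id": 3},
    {"year": 2019, "month": 5, "id": 1},
    {"year": 2019, "month": 4, "id": 2},
]
for prefix, objects in partition_and_sort(rows, ["year", "month"], "id").items():
    print(prefix, objects)
```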
April 2019
- Support for encryption with IBM® Key Protect for IBM Cloud®
- Support for encrypting SQL queries with IBM Key Protect. IBM Key Protect is a centralized key management system (KMS) for generating, managing, and destroying encryption keys used by IBM Cloud® services. If you are processing sensitive data in your queries, you can use customer-managed keys to encrypt SQL query texts and error messages that are stored in the job information.
- IBM Cloud Data Engine with IBM Key Protect for managing encryption keys meets the required IBM controls that are commensurate with the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Security and Privacy Rule requirements.
February 2019
- Beta support for JDBC driver
- Beta support for JDBC driver of Data Engine. Request to participate by sending an email to Joshua.Mintz@ibm.com.
- Beta support for data skipping
- Beta support for data skipping indexes. You can create custom indexes on any column for minimum and maximum values, list of values, and geospatial bounding box for any object queried. This significantly reduces I/O and query cost and lowers the query execution time.
- Beta support for time series
- Beta support for SQL-native time series in Data Engine. This includes functions for time series segmentation, prediction, alignment, temporal joins, and subsequence mining. Request to participate by sending an email to Joshua.Mintz@ibm.com.
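The minimum and maximum index mentioned above can be illustrated with a small Python sketch: each object is summarized by the smallest and largest value of one column, and a range predicate then skips every object whose summary cannot overlap it. Only the min/max case is shown; the value-list and geospatial bounding box indexes are not covered, and `ObjectStats` and the helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    """Hypothetical per-object min/max summary for one column."""
    name: str
    min_value: float
    max_value: float


def build_minmax_index(objects: dict[str, list[float]]) -> list[ObjectStats]:
    """Summarize each object's column values by their minimum and maximum."""
    return [ObjectStats(name, min(values), max(values)) for name, values in objects.items()]


def objects_to_scan(index: list[ObjectStats], low: float, high: float) -> list[str]:
    """Keep only objects whose [min, max] range can overlap the predicate low <= x <= high."""
    return [s.name for s in index if s.max_value >= low and s.min_value <= high]


objects = {"part-0": [1, 5, 9], "part-1": [10, 14, 20], "part-2": [21, 30]}
index = build_minmax_index(objects)
print(objects_to_scan(index, 12, 18))
```

Only the summaries are read to decide which objects to open, which is where the I/O saving comes from.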
December 2018
- New SQL reference guide
- Release of a complete SQL Reference Guide, an SQL introduction for Cloud SQL/Spark SQL. The new reference guide includes examples that can be copied and directly pasted into the web UI to be run.
November 2018
- Support for Hive-style partitioning
- Support for controlling the layout of SQL results, including support for creating Hive-style partitioning and paginated result data.
- Support for Python SDK extensions
- Support for extensions in the Python SDK for result data partitioning, pagination, and exporting SQL job history to IBM Cloud Object Storage.
October 2018
- Control your result's format
- Support for SELECT INTO to control the format the SQL result is written in.
August 2018
- General availability
- IBM Cloud Data Engine is generally available. Its open beta phase ended.
- New built-in functions
- Support for new built-in SQL functions released with Apache Spark 2.3.
- A set of SQL optimizer enhancements and ANSI SQL and Hive SQL compliance enhancements that were introduced with Apache Spark 2.3.
June 2018
- ORC
- Support for ORC data (STORED AS ORC).
- Geospatial functions
- Support for geospatial SQL functions for calculations, aggregations, and joins on location data.
- ibmcloudsql Node.js client SDK
- Release of ibmcloudsql Node.js client SDK.
April 2018
- Introducing IBM Cloud Data Engine
- Beta release of IBM Cloud Data Engine.