Streaming to Cloud Object Storage by using Data Engine

IBM Cloud® Data Engine is deprecated. As of 18 February 2024 you can't create new instances, and access to free instances will be removed. Existing Standard plan instances are supported until 18 January 2025. Any instances that still exist on that date will be deleted. For more information, see Deprecation of Data Engine.

Extend your data pipeline to IBM Cloud® Object Storage to archive data for long-term storage or to gain insight through interactive queries or big data analytics. From the Event Streams UI, you can select topics and link them to Cloud Object Storage buckets. The fully managed Data Engine service then streams the data automatically and securely. All data is stored in Parquet format, making it easy to manage and process.

Figure 1. Diagram showing streaming to Cloud Object Storage by using Data Engine

The following task walks you through:

  • Creating the required services.
  • Setting up Cloud Object Storage landing by using Data Engine.
  • Verifying that the events are stored in Object Storage.

Data Engine consumes batches of events from Kafka and stores the data as Parquet objects in the Cloud Object Storage service. The process is triggered in the background by Event Streams submitting an SQL landing statement to Data Engine.
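This topic does not show the landing statement that Event Streams submits. As a rough illustration only, the following Python sketch assembles a statement of the general shape that the Data Engine stream landing syntax is understood to use: read JSON events from a topic, emit Parquet objects to a Cloud Object Storage location, and run with an API key that is held in Key Protect. The clause layout is an assumption, and the function name and all placeholder values are hypothetical.

```python
def build_landing_statement(topic_crn: str, cos_url: str, key_crn: str) -> str:
    """Assemble an illustrative stream landing statement.

    Assumption: the three-clause layout below (FROM ... STORED AS JSON,
    EMIT ... STORED AS PARQUET, EXECUTE AS ...) is recalled from the
    Data Engine stream landing syntax and is not confirmed by this topic.
    """
    return (
        f"SELECT * FROM {topic_crn} STORED AS JSON\n"
        f"EMIT {cos_url} STORED AS PARQUET\n"
        f"EXECUTE AS {key_crn}"
    )


# Hypothetical placeholders; substitute the identifiers of your own resources.
statement = build_landing_statement(
    topic_crn="<event-streams-instance-crn>/<topic-name>",
    cos_url="cos://<region>/<bucket>/<prefix>",
    key_crn="<key-protect-key-crn>",
)
print(statement)
```

You do not normally write this statement yourself. The wizard that is described in the following steps collects the same three pieces of information (topic, bucket and prefix, and the key that holds the API key) and submits the statement for you.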

Complete the following steps to start the streams landing.

Step 1. Prerequisites

Ensure that you configured the following services:

  • An Event Streams instance - Standard or Enterprise plan. You must create credentials.
  • A Cloud Object Storage instance with at least one bucket.
  • A Data Engine instance - Standard plan.
  • An IBM® Key Protect instance.

These services can also be created after you start configuring your stream landing job in the setup wizard.

Ensure that you have the following permissions:

  • Permission to create service-to-service authentication.
  • Permission to create service IDs and API keys.
  • Permission to write to IBM® Key Protect (to store the API key).
  • Reader access role for the cluster, topic and group resources within the Event Streams service instance (or a Reader access role for the service instance as a whole).
  • Writer role for the Cloud Object Storage bucket.

Step 2. Set up the Cloud Object Storage stream landing

  1. In the Event Streams UI, click the Overflow menu (three vertical dots) next to the topic, then select the option for streaming topic data. The streams landing overview page opens.

  2. Click Start to start the wizard.

  3. Select the required Cloud Object Storage instance, then click Next.

  4. Within the Cloud Object Storage instance, select the bucket where you want the events to be stored, then click Next.

  5. Select the Data Engine instance. Only instances with a Standard plan are listed.

  6. Configure the streams landing by completing the following steps:

    • Define the prefix of the Object Storage objects.
    • Specify the event format (JSON).
    • Create or select a service ID with the correct IAM access policies. This service ID is used to create an API key.
    • Select an IBM® Key Protect instance to store the new API key. Data Engine later uses this key to run the landing until you stop it.
    • Click Start streaming data to enable a stream landing job.

Step 3. Validate that your stream landing job is working

To validate that streams landing is working, use the following checks:

  • Verify that the specified prefix in Object Storage is filled with Parquet objects.
  • Check the status of all streaming jobs in the Data Engine UI.
  • Alternatively, use the REST API of Data Engine to get the list and the details of running stream landing jobs.
  • In the Event Streams UI, you also get information about the active stream landing jobs per topic. Using Event Streams, you can view and stop the landing configuration.
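The first check can be scripted once you have a listing of object keys from the bucket, for example from any S3-compatible listing call. The following is a minimal sketch: the helper name and the sample keys are hypothetical, and it assumes that landed data objects have names that end in `.parquet`.

```python
from typing import Iterable, List


def landed_parquet_objects(keys: Iterable[str], prefix: str) -> List[str]:
    """Return the keys under `prefix` that look like Parquet data objects.

    Assumption: landed objects carry a ".parquet" suffix. Adjust the test
    if the objects in your bucket are named differently.
    """
    return [k for k in keys if k.startswith(prefix) and k.endswith(".parquet")]


# Hypothetical listing; in practice, obtain the keys from your bucket.
sample_keys = [
    "landing/events/part-00000.snappy.parquet",
    "landing/events/part-00001.snappy.parquet",
    "landing/events/_checkpoint/offsets",
    "other/readme.txt",
]

found = landed_parquet_objects(sample_keys, "landing/events/")
print(f"{len(found)} Parquet object(s) found under the prefix")
```

An empty result shortly after you start the job does not necessarily indicate a failure, because events are written in batches. Repeat the check after events are produced to the topic.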

Estimating cost

You can evaluate the cost of your own planned usage with the IBM Cloud cost estimator.

Limitations

  • With Data Engine, you can process up to 1 MB of event data per second. The throughput that you actually achieve depends on parameters such as the number of topic partitions and the size and format of the events.
  • Each Data Engine instance is limited to five concurrent stream landing jobs. To raise the limit, open a support ticket.
  • The Event Streams feature is available only for Data Engine instances that are created in the US-South region and in Frankfurt.
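To see how a planned workload compares with the first two limits, you can do a quick back-of-the-envelope calculation. The following sketch multiplies an expected event rate by an average event size and compares the result with the 1 MB per second figure, and it also checks the job count. The function names and sample numbers are hypothetical, 1 MB is taken as 1,000,000 bytes, and the rate is evaluated for a single topic because this topic does not state whether the throughput limit applies per job or per instance.

```python
from typing import List

MAX_BYTES_PER_SECOND = 1_000_000  # "1 MB per second", assuming decimal megabytes
MAX_CONCURRENT_JOBS = 5           # default limit per Data Engine instance


def estimated_bytes_per_second(events_per_second: float, avg_event_bytes: float) -> float:
    """Estimate the event data rate for one topic."""
    return events_per_second * avg_event_bytes


def check_plan(events_per_second: float, avg_event_bytes: float, concurrent_jobs: int) -> List[str]:
    """Return a list of limits that the planned workload would exceed."""
    problems = []
    rate = estimated_bytes_per_second(events_per_second, avg_event_bytes)
    if rate > MAX_BYTES_PER_SECOND:
        problems.append(f"rate of {rate:,.0f} B/s exceeds {MAX_BYTES_PER_SECOND:,} B/s")
    if concurrent_jobs > MAX_CONCURRENT_JOBS:
        problems.append(f"{concurrent_jobs} jobs exceed the limit of {MAX_CONCURRENT_JOBS}")
    return problems


# Hypothetical workloads.
within_limits = check_plan(events_per_second=2000, avg_event_bytes=400, concurrent_jobs=3)
over_limits = check_plan(events_per_second=5000, avg_event_bytes=400, concurrent_jobs=6)
print(within_limits)
print(over_limits)
```

The first workload produces about 800,000 bytes per second across three jobs and reports no problems. The second produces about 2,000,000 bytes per second across six jobs and reports both limits. Treat the result as a planning aid only, because the achieved throughput also depends on partitioning and event format.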

For more information, see Streaming to Cloud Object Storage by using Data Engine in the Data Engine documentation.