Spark engine

The watsonx.data platform includes a built-in native Spark engine for big data analytics. watsonx.data also supports external Spark engines, so you can use Spark clusters that run outside the watsonx.data environment. You can use the watsonx.data Spark engine for the following use cases:

Ingesting large volumes of data into watsonx.data tables. You can also cleanse and transform data before ingestion.
Table maintenance operations that improve the performance of watsonx.data tables.
Complex analytics workloads that are difficult to express as queries.
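The ingestion and table maintenance use cases can be sketched as Spark SQL statements that a PySpark application runs on the engine. This is a minimal sketch, not an official watsonx.data procedure. It assumes that the target tables are Apache Iceberg tables, and every catalog, table, view, and column name in it (iceberg_data, sales.orders, orders_raw, customer_id, amount, order_date) is hypothetical. The maintenance statements use the standard Apache Iceberg Spark procedures rewrite_data_files, expire_snapshots, and remove_orphan_files.

```python
# Minimal sketch, not an official watsonx.data procedure. ASSUMPTIONS: the
# target tables are Apache Iceberg tables, and every catalog, table, view and
# column name used here is hypothetical. Names are interpolated directly into
# SQL, so pass only trusted identifiers.
from typing import List


def ingest_statement(source_view: str, target_table: str) -> str:
    """Cleanse and transform rows from a staging view, then append them.

    Assumes the target columns are (customer_id, amount, order_date) in that order.
    """
    return (
        f"INSERT INTO {target_table} "
        "SELECT DISTINCT trim(customer_id) AS customer_id, "  # strip whitespace, drop duplicate rows
        "CAST(amount AS DECIMAL(12, 2)) AS amount, "  # normalize the numeric type
        "to_date(order_date) AS order_date "  # parse string dates
        f"FROM {source_view} "
        "WHERE customer_id IS NOT NULL"  # discard rows without a key
    )


def maintenance_statements(
    catalog: str, table: str, older_than: str, retain_last: int = 5
) -> List[str]:
    """Standard Iceberg Spark procedures: compact small data files, expire
    snapshots older than `older_than` (keeping at least `retain_last`), and
    delete files that no snapshot references."""
    return [
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        f"CALL {catalog}.system.expire_snapshots(table => '{table}', "
        f"older_than => TIMESTAMP '{older_than}', retain_last => {retain_last})",
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
    ]


def run_all(spark, statements: List[str]) -> None:
    """Execute each statement, in order, on an existing SparkSession."""
    for statement in statements:
        spark.sql(statement)


# Example use inside a PySpark application (not executed here):
# spark.read.option("header", True).csv("s3a://<bucket>/raw/orders/").createOrReplaceTempView("orders_raw")
# run_all(spark, [ingest_statement("orders_raw", "iceberg_data.sales.orders")])
# run_all(spark, maintenance_statements("iceberg_data", "sales.orders", "2024-01-01 00:00:00"))
```

Keeping the statements as plain strings separates what is run from how it is run, so the same statements can be reviewed or logged before a job submits them.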

For more information about provisioning the engine, see Provisioning a Spark engine.

IBM® watsonx.data lets you integrate with the following types of Spark engines:

Native Spark engine

The native Spark engine is a compute engine that is available within a watsonx.data instance. With the native Spark engine, you can fully manage the engine configuration, manage access to Spark engines, and run applications by using the watsonx.data UI and REST API endpoints.

For more information, see the Working with native Spark engine section.
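Submitting an application through the REST API can be sketched with the Python standard library. The endpoint path, the AuthInstanceId header, and the application_details payload shape below are assumptions, not a confirmed contract; confirm them against the watsonx.data REST API reference before use. The base URL, engine ID, instance CRN, token, and application path are placeholders. The function only builds the request, so you can inspect it before you send it.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_submit_request(
    base_url: str,
    engine_id: str,
    instance_crn: str,
    token: str,
    application: str,
    arguments: Optional[List[str]] = None,
    conf: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits a Spark application.

    ASSUMPTION: the path, the AuthInstanceId header, and the
    "application_details" payload are modeled on the watsonx.data v2 API;
    verify them against the official API reference.
    """
    details: Dict[str, object] = {"application": application}
    if arguments:
        details["arguments"] = list(arguments)
    if conf:
        details["conf"] = dict(conf)  # Spark properties for this run
    body = json.dumps({"application_details": details}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/lakehouse/api/v2/spark_engines/{engine_id}/applications"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "AuthInstanceId": instance_crn,
            "Content-Type": "application/json",
        },
    )


# To send the request (not executed here):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```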

Gluten accelerated Spark engine

The Gluten accelerated Spark engine is a performance-optimized data processing engine for Spark applications. It uses Gluten, which relies on Velox, a generic C++ database acceleration library, to optimize queries. It is an effective way to speed up and simplify processing when you work with very large data sets. For more information, see Gluten accelerated Spark engine.


External Spark engine

External Spark engines are engines that run in a different cluster from the one where watsonx.data is provisioned. You can deploy them in the following environments:

Spark instance on Cloud
Spark on Amazon EMR

For more information, see the Working with external Spark engine section.
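For an external Spark engine to read and write watsonx.data tables, its Spark session usually needs a catalog that points at the watsonx.data metastore. The sketch below builds such properties. The keys are standard Apache Iceberg and Hadoop S3A settings, but the choice of a Hive-metastore-backed Iceberg catalog, the catalog name, and the URIs are assumptions. watsonx.data-specific metastore authentication and object storage credentials are deliberately omitted. The cluster also needs the Iceberg Spark runtime and hadoop-aws JARs on its classpath.

```python
from typing import Dict


def iceberg_catalog_conf(catalog: str, metastore_uri: str, s3_endpoint: str) -> Dict[str, str]:
    """Spark properties that register a Hive-metastore-backed Iceberg catalog.

    The keys are standard Apache Iceberg / Hadoop S3A properties. ASSUMPTION:
    the catalog name and URIs are placeholders, and watsonx.data-specific
    metastore authentication and storage credentials are not included.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg SQL extensions such as CALL procedures.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hive",  # table metadata lives in a Hive metastore
        f"{prefix}.uri": metastore_uri,  # e.g. a thrift:// metastore endpoint
        "spark.hadoop.fs.s3a.endpoint": s3_endpoint,
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


# Applying the properties in PySpark (not executed here):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("external-spark-to-watsonx-data")
# for key, value in iceberg_catalog_conf("lakehouse", "thrift://<host>:<port>", "https://<s3-endpoint>").items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

Returning a plain dictionary keeps the properties reusable: the same values can be passed to SparkSession.builder.config, written to spark-defaults.conf, or supplied as --conf flags to spark-submit on either deployment environment.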