Spark engine
The watsonx.data platform includes a built-in native Spark engine for big data analytics. It also supports external Spark engines, so you can use Spark clusters that run outside the watsonx.data environment. You can use the watsonx.data Spark engine for the following use cases:
- Ingesting large volumes of data into watsonx.data tables. You can also cleanse and transform data before ingestion.
- Running table maintenance operations to improve the performance of watsonx.data tables.
- Running complex analytics workloads that are difficult to express as queries.
For more information about provisioning the engine, see Provisioning a Spark engine.
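The first two use cases can be illustrated with a minimal PySpark sketch. It assumes the target table is an Apache Iceberg table in a catalog already configured for the Spark session, which this page does not state. The catalog name, table name, source path, and both function names are hypothetical placeholders. The cleansing rule (drop duplicates and rows with nulls) is only one example of a transformation applied before ingestion.

```python
from typing import List


def maintenance_statements(catalog: str, table: str) -> List[str]:
    """Build Iceberg maintenance calls for one table.

    Assumption: the table uses the Iceberg format, so the Iceberg Spark
    procedures rewrite_data_files and expire_snapshots are available.
    """
    return [
        # Compact many small data files into fewer, larger ones.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Remove old snapshots that are no longer needed.
        f"CALL {catalog}.system.expire_snapshots(table => '{table}')",
    ]


def ingest_and_maintain(source_path: str, catalog: str, table: str) -> None:
    """Cleanse a CSV source, append it to an existing table, then run maintenance."""
    # Imported lazily so that the helper above works without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

    # Cleanse before ingestion: drop exact duplicates and rows that contain nulls.
    raw = spark.read.option("header", "true").csv(source_path)
    cleaned = raw.dropDuplicates().na.drop()

    # Append the cleansed rows to the existing target table.
    cleaned.writeTo(f"{catalog}.{table}").append()

    # Run the table maintenance operations.
    for statement in maintenance_statements(catalog, table):
        spark.sql(statement)

    spark.stop()
```

A script like this would be submitted to the Spark engine as an application, for example by calling `ingest_and_maintain("s3a://my-bucket/raw/orders.csv", "my_catalog", "sales.orders")` with values that match your own environment.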
IBM® watsonx.data allows you to integrate with the following types of Spark engines:
Native Spark engine
The native Spark engine is a compute engine that is available within a watsonx.data instance. With the native Spark engine, you can fully manage the Spark engine configuration, manage access to Spark engines, and run applications by using the watsonx.data UI and REST API endpoints.
For more information, see the Working with native Spark engine section.
Gluten accelerated Spark engine
The Gluten accelerated Spark engine is a performance-optimized data processing engine for running Spark applications. It uses Gluten, which relies on Velox, a generic C++ database acceleration library, to optimize queries. It is an effective way to speed up and simplify processing when you work with very large data sets. For more information, see Gluten accelerated Spark engine.
External Spark engine
External Spark engines are engines that run in a different cluster from the one where watsonx.data is provisioned. You can deploy them in the following environments:
- Spark instance on Cloud
- Spark on EMR
For more information, see the Working with external Spark engine section.