IBM Cloud Docs
Spark history server

The Spark applications that are submitted on an IBM Analytics Engine instance forward their Spark events to the Object Storage bucket that was defined as the instance home. The Spark history server provides a Web UI to view these Spark events. The Web UI helps you analyze how your Spark applications ran by displaying useful information such as:

  • A list of the stages that the application goes through when it is run
  • The number of tasks in each stage
  • The configuration details such as the running executors and memory usage

See the Spark History server documentation for more details.

You can disable forwarding Spark events from a Spark application by setting the property spark.eventLog.enabled to false in the Spark application configuration.
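For example, if you submit applications through the Analytics Engine v3 applications API, the property goes in the conf section of the submission payload. This is an illustrative sketch; the application path and bucket names are placeholders:

```json
{
  "application_details": {
    "application": "cos://<bucket>.<cos_service_name>/my_spark_app.py",
    "conf": {
      "spark.eventLog.enabled": "false"
    }
  }
}
```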

Starting and stopping the Spark history server

Before accessing the Spark history server, you need to start the server. When you no longer need it, you should stop the server. You will be charged for the CPU cores and memory consumed by the Spark history server while it is running.

You can start and stop the Spark history server by using either the Analytics Engine REST API or the Analytics Engine instance UI:

Analytics Engine REST API

You can use the Analytics Engine REST API:

  1. To view the status of the Spark history server

    curl "https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance_id>/spark_history_server" --header "Authorization: bearer <iam token>"
    
  2. To start the Spark history server

    curl --location --request POST "https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance_id>/spark_history_server" --header "Authorization: bearer <iam token>"
    
  3. To stop the Spark history server

    curl --location --request DELETE "https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance_id>/spark_history_server" --header "Authorization: bearer <iam token>"
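The three calls above differ only in the HTTP method, so they are easy to wrap in a small helper. The following is an illustrative sketch using only the Python standard library, not an official client; the region, instance ID, and IAM token handling are placeholders that you supply yourself:

```python
import json
import urllib.request

API_BASE = "https://api.{region}.ae.cloud.ibm.com/v3/analytics_engines"

def history_server_url(instance_id, region="us-south"):
    """Build the Spark history server endpoint used by the curl examples."""
    return f"{API_BASE.format(region=region)}/{instance_id}/spark_history_server"

def call_history_server(instance_id, iam_token, method="GET", region="us-south"):
    """Invoke the endpoint: GET views the status, POST starts, DELETE stops."""
    req = urllib.request.Request(
        history_server_url(instance_id, region),
        method=method,
        headers={"Authorization": f"Bearer {iam_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        return json.loads(body) if body else {}
```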
    

Analytics Engine instance UI

You can use the Analytics Engine instance UI:

  1. To view the Spark history server status:

    1. Open your resource list on IBM Cloud.
    2. Click Services and software and select your instance to open the details page.
    3. Select the Spark history tab. The current status of the server is shown on this page.

    If the status of the Spark history server is set to Started, you can also click View Spark history to launch the Web UI of the Spark history server in a new browser tab.

  2. To start the Spark history server:

    1. On the Spark history page, click Start history server.
    2. Choose the server configuration and click Start to start the Spark history server.
  3. To stop the Spark history server:

    1. On the Spark history page, click Stop history server.

Opening the Spark history server Web UI

You can open the Spark history server Web UI by opening the instance details page of your Analytics Engine service instance, switching to the Spark history tab, and clicking View Spark history.

Alternatively, the Spark history server Web UI URL can be obtained through a service endpoint that is made available to you as a service key (also known as a service credential). See Retrieving service endpoints.

Ensure that the Spark history server is running before you open the Web UI.

Log links under the Stages and Executors tabs of the Spark history server UI do not work because logs are not preserved together with the Spark events. To review the task and executor logs, enable platform logging. For details, see Configuring and viewing logs.

To view older applications in the Spark history server UI, copy the Spark events from the old path to the new path by using the following command:

    ibmcloud cos object-copy --bucket <destination_bucket> --key /spark-events/<eventlog_app-1> --copy-source /spark-events/<eventlog_app-1>

Accessing the Spark history server REST API

In addition to the Web UI, the Spark history server provides a REST API that can be queried to view the Spark events generated by your Spark applications. The Spark history server REST API is available as a service endpoint in a service key (also known as a service credential). See Retrieving service endpoints.

When you invoke the Spark history server REST API, you must pass your IAM token as a bearer token in the Authorization header. For example:

curl --location --request GET 'https://spark-console.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance id>/spark_history_api/v1/applications?status=completed' \
--header 'Authorization: Bearer <iam token>'
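The response follows the standard Spark history server REST API format: a JSON array of application records, each with an id, a name, and a list of attempts. A minimal sketch for pulling out the IDs of completed applications; the sample record is illustrative, not real output:

```python
import json

# Illustrative response body following the standard Spark history server
# /applications format: a JSON array of application records.
sample_body = """
[
  {"id": "app-20240101120000-0001",
   "name": "my-spark-app",
   "attempts": [{"completed": true, "sparkUser": "spark"}]}
]
"""

def completed_app_ids(body):
    """Return the IDs of applications whose most recent attempt completed."""
    return [
        app["id"]
        for app in json.loads(body)
        if app.get("attempts") and app["attempts"][-1].get("completed")
    ]

print(completed_app_ids(sample_body))  # ['app-20240101120000-0001']
```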

See the Spark history server REST API documentation for more details.

Customizing the Spark history server

By default, the Spark history server consumes 1 CPU core and 4 GiB of memory while it is running. If you want your Spark history server to use more resources, you can set the following properties in your Analytics Engine instance default configurations:

  • ae.spark.history-server.cores for the number of CPU cores
  • ae.spark.history-server.memory for the amount of memory

Updating the CPU cores and memory settings using the REST API

You can update the CPU cores and memory settings by using the Analytics Engine REST API as follows:

curl --location --request PATCH "https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance_id>/default_configs" \
--header "Authorization: bearer <iam_token>" \
--header 'Content-Type: application/json' \
--data-raw '{
        "ae.spark.history-server.cores": "2",
        "ae.spark.history-server.memory": "8G"
}'

Only a predefined set of Spark driver and executor vCPU and memory size combinations is supported. See Supported Spark driver and executor vCPU and memory combinations.

Additional customizations

You can customize the Spark history server further by adding properties to the default Spark configuration of your Analytics Engine instance. See standard Spark history configuration options.

As this is a managed offering, you can't customize all of the standard Spark configuration options.

For a list of the supported configurations and default values for settings, see Default Spark configuration.
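For example, assuming the standard Spark event log cleaner options are among the supported configurations (verify this against the Default Spark configuration list first), you could include them in the same default_configs PATCH request body shown earlier:

```json
{
  "spark.history.fs.cleaner.enabled": "true",
  "spark.history.fs.cleaner.maxAge": "7d"
}
```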

Best practices

Always stop the Spark history server when you no longer need it. Bear in mind that the Spark history server consumes CPU and memory resources continuously while its state is Started.