Configuring Spark log level information
Review the applications that run and identify issues by using the logs that the IBM Analytics Engine Spark application generates. The standard logging levels available are ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL, and OFF. By default, the Analytics Engine Spark application logs at the WARN level for Spark drivers and at the OFF level for Spark executors. You can configure the logging level to display messages that are relevant and less verbose.
The IBM Analytics Engine logging configuration controls the log level of the Spark framework only. It does not affect logs that are written by user code through commands such as 'logger.info()', 'logger.warn()', 'print()', or 'show()' in the Spark application.
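For example, consider the following minimal PySpark sketch of such user code; the application name, logger name, and sample data are illustrative only. Its logger, print(), and show() output is produced by the application itself and is not controlled by the Analytics Engine log level options:

import logging
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MySparkApp").getOrCreate()

# User logs written through Python's logging module are not affected by
# ae.spark.driver.log.level or ae.spark.executor.log.level.
logger = logging.getLogger("my_spark_application")
logger.warning("This user log line appears regardless of the Analytics Engine log level")

# print() and DataFrame.show() produce user output, not Spark framework logging.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
print("Row count:", df.count())
df.show()

spark.stop()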
Configuring options
Configure the following IBM Analytics Engine options to set the Spark application log level:
- Spark driver logs (by using ae.spark.driver.log.level)
- Spark executor logs (by using ae.spark.executor.log.level)
Specify these options in the Spark configurations section when you provision an IBM Analytics Engine instance or when you submit a Spark application. You can specify the following standard log level values:
- ALL
- TRACE
- DEBUG
- INFO
- WARN
- ERROR
- FATAL
- OFF
The default value for the driver log level is WARN and for the executor log level is OFF.
You can apply the configuration in the following two ways:
- Instance level configuration
- Application level configuration
Configuring Spark log level information at the instance level
At the time of provisioning an IBM Analytics Engine instance, specify the log level configurations under the default_config attribute. For more information, see Default Spark configuration.
Example:
"default_config": {
"ae.spark.driver.log.level": "WARN",
"ae.spark.executor.log.level": "ERROR"
}
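The following is a minimal Python sketch of passing this default_config section when you provision an instance. It assumes that the instance is created through the IBM Cloud Resource Controller API and that Analytics Engine provisioning options, including default_config, are passed under the parameters attribute; the IAM token, resource group ID, and plan ID are placeholders:

import requests

# Placeholders: supply your own IAM token, resource group ID, and plan ID.
IAM_TOKEN = "<iam_bearer_token>"

payload = {
    "name": "my-analytics-engine-instance",
    "target": "us-south",
    "resource_group": "<resource_group_id>",
    "resource_plan_id": "<analytics_engine_plan_id>",
    # Assumption: Analytics Engine provisioning options, including
    # default_config, are passed under "parameters".
    "parameters": {
        "default_config": {
            "ae.spark.driver.log.level": "WARN",
            "ae.spark.executor.log.level": "ERROR"
        }
    }
}

response = requests.post(
    "https://resource-controller.cloud.ibm.com/v2/resource_instances",
    headers={
        "Authorization": f"Bearer {IAM_TOKEN}",
        "Content-Type": "application/json"
    },
    json=payload,
)
print(response.status_code, response.json())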
Configuring Spark log level information at the application level
At the time of submitting a job, specify the options in the payload under conf. For more information, see Spark application REST API.
Example:
{
  "conf": {
    "ae.spark.driver.log.level": "WARN",
    "ae.spark.executor.log.level": "WARN"
  }
}
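The following is a minimal Python sketch of submitting such a payload through the Spark application REST API. The endpoint pattern, instance ID, and IAM token are assumptions and placeholders; see the Spark application REST API documentation for the exact endpoint for your region and instance:

import requests

# Placeholders and an assumed endpoint pattern; verify against the
# Spark application REST API reference for your instance.
IAM_TOKEN = "<iam_bearer_token>"
INSTANCE_ID = "<analytics_engine_instance_id>"
BASE_URL = f"https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/{INSTANCE_ID}"

payload = {
    "application_details": {
        "application": "cos://<application-bucket-name>.<cos-reference-name>/my_spark_application.py",
        "conf": {
            "ae.spark.driver.log.level": "WARN",
            "ae.spark.executor.log.level": "WARN"
        }
    }
}

response = requests.post(
    f"{BASE_URL}/spark_applications",
    headers={
        "Authorization": f"Bearer {IAM_TOKEN}",
        "Content-Type": "application/json"
    },
    json=payload,
)
print(response.status_code, response.json())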
Sample use case
Setting the log-level Spark configuration at the instance level: This sample use case considers a scenario where you provision an IBM Analytics Engine instance and configure the log level so that all applications in the instance log at the ERROR level.
Set the following configurations as default Spark configurations:
- ae.spark.driver.log.level = ERROR
- ae.spark.executor.log.level = ERROR
After setting the default Spark configuration, the log level for all applications that are submitted to the instance is set to ERROR (provided the application payload does not specify the Spark configuration during submission).
Setting the log-level Spark configuration at the job level: This sample use case considers a scenario where you have an application and you configure the log level so that messages are logged at the INFO level. You can specify the Spark configuration in the payload. Consider the following sample payload:
{
  "application_details": {
    "application": "cos://<application-bucket-name>.<cos-reference-name>/my_spark_application.py",
    "arguments": ["arg1", "arg2"],
    "conf": {
      "spark.hadoop.fs.cos.<cos-reference-name>.endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
      "spark.hadoop.fs.cos.<cos-reference-name>.access.key": "<access_key>",
      "spark.hadoop.fs.cos.<cos-reference-name>.secret.key": "<secret_key>",
      "spark.app.name": "MySparkApp",
      "ae.spark.driver.log.level": "INFO",
      "ae.spark.executor.log.level": "INFO"
    }
  }
}
In this sample use case, the Spark application overrides the log-level Spark configuration that is set at the instance level, changing it from ERROR to INFO.