IBM Cloud Docs
Monitoring cluster health

Set up monitoring in IBM Cloud® Kubernetes Service to help you troubleshoot issues and improve the health and performance of your Kubernetes clusters and apps.

Continuous monitoring and logging are key to detecting attacks on your cluster and troubleshooting issues as they arise. By continuously monitoring your cluster, you can better understand your cluster capacity and the resources that are available to your app. With this insight, you can prepare to protect your apps against downtime.

Choosing a monitoring solution

Metrics help you monitor the health and performance of your clusters. You can use the standard Kubernetes and container runtime features to monitor the health of your clusters and apps.

Every Kubernetes master is continuously monitored by IBM. IBM Cloud Kubernetes Service automatically scans every node where the Kubernetes master is deployed for vulnerabilities that are found in Kubernetes and for OS-specific security fixes. If vulnerabilities are found, IBM Cloud Kubernetes Service automatically applies fixes on your behalf to protect the master nodes. You are responsible for monitoring and analyzing the logs for the rest of your cluster components.

To avoid conflicts when using metrics services, be sure that clusters across resource groups and regions have unique names.

IBM Cloud® Monitoring
Gain operational visibility into the performance and health of your apps and your cluster by deploying a Monitoring agent to your worker nodes. The agent collects pod and cluster metrics, and sends these metrics to IBM Cloud Monitoring. For more information about IBM Cloud Monitoring, see the service documentation. To set up the Monitoring agent in your cluster, see Viewing cluster and app metrics with IBM Cloud Monitoring.
Kubernetes dashboard
The Kubernetes dashboard is an administrative web interface where you can review the health of your worker nodes, find Kubernetes resources, deploy containerized apps, and troubleshoot apps with logging and monitoring information. For more information about how to access your Kubernetes dashboard, see Launching the Kubernetes dashboard for IBM Cloud Kubernetes Service.

Forwarding cluster and app metrics to IBM Cloud Monitoring

Use the IBM Cloud Kubernetes Service observability plug-in to create a monitoring configuration for IBM Cloud Monitoring in your cluster, and use this monitoring configuration to automatically collect and forward metrics to IBM Cloud Monitoring.

With IBM Cloud Monitoring, you can collect cluster and pod metrics, such as the CPU and memory usage of your worker nodes, incoming and outgoing HTTP traffic for your pods, and data about several infrastructure components. In addition, the agent can collect custom application metrics by using either a Prometheus-compatible scraper or a statsd facade.

Considerations for using the IBM Cloud Kubernetes Service observability plug-in:

  • You can have only one monitoring configuration for IBM Cloud Monitoring in your cluster at a time. If you want to use a different IBM Cloud Monitoring service instance to send metrics to, use the ibmcloud ob monitoring config replace command.
  • If you created a Monitoring configuration in your cluster without using the IBM Cloud Kubernetes Service observability plug-in, you can use the ibmcloud ob monitoring agent discover command to make the configuration visible to the plug-in. Then, you can use the observability plug-in commands and functionality in the IBM Cloud console to manage the configuration.

Before you begin:

To set up a monitoring configuration for your cluster:

  1. Create an IBM Cloud Monitoring service instance and note the name of the instance. The service instance must belong to the same IBM Cloud account where you created your cluster, but can be in a different resource group and IBM Cloud region than your cluster.

  2. Set up a monitoring configuration for your cluster. When you create the monitoring configuration, a Kubernetes namespace ibm-observe is created and a Monitoring agent is deployed as a Kubernetes daemon set to all worker nodes in your cluster. This agent collects cluster and pod metrics, such as worker node CPU and memory usage or the amount of incoming and outgoing network traffic to your pods.

    In the console:

    1. From the Kubernetes clusters console, select the cluster for which you want to create a Monitoring configuration.
    2. On the cluster Overview page, click Connect.
    3. Select the region and the IBM Cloud Monitoring service instance that you created earlier, and click Connect.

    In the CLI:

    1. Create the Monitoring configuration. When you create the Monitoring configuration, the access key that was last added is retrieved automatically. If you want to use a different access key, add the --sysdig-access-key <access_key> option to the command.

      To use a different service access key after you created the monitoring configuration, use the ibmcloud ob monitoring config replace command.

      Version 1.30 and later: If your cluster has outbound traffic protection enabled, you must set up monitoring by using the private endpoint. To do this, specify the --private-endpoint option.

      ibmcloud ob monitoring config create --cluster <cluster_name_or_ID> --instance <Monitoring_instance_name_or_ID> [--private-endpoint]
      

      Example output

      Creating configuration...
      OK
      
    2. Verify that the monitoring configuration was added to your cluster.

      ibmcloud ob monitoring config list --cluster <cluster_name_or_ID>
      

      Example output

      Listing configurations...
      
      OK
      Instance Name                Instance ID                            CRN   
      IBM Cloud Monitoring-aaa     1a111a1a-1111-11a1-a1aa-aaa11111a11a   crn:v1:prod:public:sysdig:us-south:a/a11111a1aaaaa11a111aa11a1aa1111a:1a111a1a-1111-11a1-a1aa-aaa11111a11a::  
      
  3. Optional: Verify that the Monitoring agent was set up successfully.

    1. If you used the console to create the Monitoring configuration, log in to your cluster.

    2. Verify that the daemon set for the Monitoring agent was created and all instances are listed as AVAILABLE.

      kubectl get daemonsets -n ibm-observe
      

      Example output

      NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
      sysdig-agent   9         9         9       9            9           <none>          14m
      

      The number of daemon set instances that are deployed equals the number of worker nodes in your cluster.

    3. Review the ConfigMap that was created for your Monitoring agent.

      kubectl describe configmap -n ibm-observe
      
  4. Access the metrics for your pods and cluster from the Monitoring dashboard.

    1. From the Kubernetes clusters console, select the cluster that you configured.
    2. On the cluster Overview page, click Launch. The Monitoring dashboard opens.
    3. Review the pod and cluster metrics that the Monitoring agent collected from your cluster. It might take a few minutes for your first metrics to show.
  5. Review how you can work with the Monitoring dashboard to further analyze your metrics.
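
If you want to script the check in step 3, compare the DESIRED and AVAILABLE columns with your worker node count. The following Python sketch parses text in the format of the example output in that step; the function and class names are hypothetical, and it assumes kubectl's default column order. For anything beyond a quick check, the structured output of kubectl get daemonset sysdig-agent -n ibm-observe -o json is more robust than parsing text columns.

```python
from dataclasses import dataclass


@dataclass
class DaemonSetStatus:
    name: str
    desired: int
    available: int


def parse_daemonsets(output: str) -> list[DaemonSetStatus]:
    """Parse the plain-text output of `kubectl get daemonsets`.

    Assumes the default column order NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE ...
    Only the first six columns are read, so the two-word NODE SELECTOR header
    does not shift the values that matter here.
    """
    statuses = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or truncated rows
        statuses.append(DaemonSetStatus(fields[0], int(fields[1]), int(fields[5])))
    return statuses


def agent_fully_available(output: str, worker_count: int, name: str = "sysdig-agent") -> bool:
    """True if the named daemon set runs one available pod on every worker node."""
    for ds in parse_daemonsets(output):
        if ds.name == name:
            return ds.desired == worker_count and ds.available == ds.desired
    return False  # the daemon set was not found in the output


EXAMPLE = """\
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
sysdig-agent   9         9         9       9            9           <none>          14m
"""

print(agent_fully_available(EXAMPLE, worker_count=9))  # True
```

Reading only the leading columns keeps the parser independent of the multi-word NODE SELECTOR header.
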

Viewing cluster states

Review the state of a Kubernetes cluster to get information about the availability and capacity of the cluster, and potential problems that might occur.

To view information about a specific cluster, such as its zones, service endpoint URLs, Ingress subdomain, version, and owner, use the ibmcloud ks cluster get --cluster <cluster_name_or_ID> command. Include the --show-resources option to view more cluster resources such as add-ons for storage pods or subnet VLANs for public and private IPs.

You can review information about the overall cluster, the IBM-managed master, and your worker nodes. To troubleshoot your cluster and worker nodes, see Troubleshooting clusters.

Master states

Your IBM Cloud Kubernetes Service cluster includes an IBM-managed master with highly available replicas, automatic security patch updates applied for you, and automation in place to recover in case of an incident. You can check the health, status, and state of the cluster master by running ibmcloud ks cluster get --cluster <cluster_name_or_ID>.

The Master Health reflects the state of master components and notifies you if something needs your attention. The health might be one of the following states.

  • error: The master is not operational. IBM is automatically notified and takes action to resolve this issue. You can continue monitoring the health until the master is normal. You can also open an IBM Cloud support case.
  • normal: The master is operational and healthy. No action is required.
  • unavailable: The master might not be accessible, which means some actions such as resizing a worker pool are temporarily unavailable. IBM is automatically notified and takes action to resolve this issue. You can continue monitoring the health until the master is normal.
  • unsupported: The master runs an unsupported version of Kubernetes. You must update your cluster to return the master to normal health.

The Master Status describes which operation from the master state is in progress. The status includes a timestamp that shows how long the master has been in the same state, such as Ready (1 month ago). The Master State reflects the lifecycle of possible operations that can be performed on the master, such as deploying, updating, and deleting. Each state is described in the following table.

Master states
Master state | Description
deployed | The master is successfully deployed. Check the status to verify that the master is Ready or to see if an update is available.
deploying | The master is currently deploying. Wait for the state to become deployed before working with your cluster, such as adding worker nodes.
deploy_failed | The master failed to deploy. IBM Support is notified and works to resolve the issue. Check the Master Status field for more information, or wait for the state to become deployed.
deleting | The master is currently deleting because you deleted the cluster. You can't undo a deletion. After the cluster is deleted, you can no longer check the master state because the cluster is completely removed.
delete_failed | The master failed to delete. IBM Support is notified and works to resolve the issue. You can't resolve the issue by trying to delete the cluster again. Instead, check the Master Status field for more information, or wait for the cluster to delete. You can also open an IBM Cloud support case.
scaled_down | The master resources have been scaled down to zero replicas. This is a temporary state that occurs while etcd is being restored after a backup. You cannot interact with your cluster while it is in this state. Wait for the etcd restoration to complete and the master state to return to deployed.
updating | The master is updating its Kubernetes version. The update might be a patch update that is automatically applied, or a minor or major version that you applied by updating the cluster. During the update, your highly available master can continue processing requests, and your app workloads and worker nodes continue to run. After the master update is complete, you can update your worker nodes. If the update is unsuccessful, the master returns to a deployed state and continues running the previous version. IBM Support is notified and works to resolve the issue. You can check if the update failed in the Master Status field.
update_cancelled | The master update is canceled because the cluster was not in a healthy state at the time of the update. Your master remains in this state until your cluster is healthy and you manually update the master. To update the master, use the ibmcloud ks cluster master update command. If you don't want to update the master to the default major.minor version during the update, include the --version option and specify the latest patch version that is available for the major.minor version that you want, such as 1.30. To list available versions, run ibmcloud ks versions.
update_failed | The master update failed. IBM Support is notified and works to resolve the issue. You can continue to monitor the health of the master until the master reaches a normal state. If the master remains in this state for more than 1 day, open an IBM Cloud support case. IBM Support might identify other issues in your cluster that you must fix before the master can be updated.
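
The health list and the state table amount to a lookup from the values that ibmcloud ks cluster get --cluster <cluster_name_or_ID> reports to a next step. The following Python sketch encodes that lookup. The dictionary text paraphrases this page, while the function name and the hours_in_state input (which you would derive from the Master Status timestamp, such as Ready (1 month ago)) are hypothetical. The sketch doesn't call the CLI.

```python
# Suggested follow-up actions, paraphrased from the Master Health list and the Master states table.
HEALTH_ACTIONS = {
    "normal": "No action is required.",
    "error": "IBM is notified automatically. Keep monitoring, or open an IBM Cloud support case.",
    "unavailable": "IBM is notified automatically. Keep monitoring until the health is normal.",
    "unsupported": "Update your cluster to a supported Kubernetes version.",
}

STATE_ACTIONS = {
    "deployed": "Check the Master Status for Ready or for an available update.",
    "deploying": "Wait for the deployed state before you work with the cluster.",
    "deploy_failed": "IBM Support is notified. Check the Master Status field.",
    "deleting": "The cluster is being deleted, which can't be undone.",
    "delete_failed": "IBM Support is notified. Check the Master Status field, or open a support case.",
    "scaled_down": "Wait for the etcd restore to complete and the state to return to deployed.",
    "updating": "Wait for the master update to complete, then update your worker nodes.",
    "update_cancelled": "Make the cluster healthy, then run ibmcloud ks cluster master update.",
    "update_failed": "IBM Support is notified. Keep monitoring the master health.",
}

SUPPORT_CASE_AFTER_HOURS = 24  # "more than 1 day" in the update_failed row


def next_steps(health: str, state: str, hours_in_state: float) -> list[str]:
    """Return suggested actions for a master's health and state values."""
    steps = [
        HEALTH_ACTIONS.get(health, f"Unrecognized health {health!r}: review the CLI output."),
        STATE_ACTIONS.get(state, f"Unrecognized state {state!r}: review the CLI output."),
    ]
    if state == "update_failed" and hours_in_state > SUPPORT_CASE_AFTER_HOURS:
        steps.append("The update has failed for more than 1 day: open an IBM Cloud support case.")
    return steps


for step in next_steps("error", "update_failed", hours_in_state=30):
    print(step)
```
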

Setting up IBM Cloud® Monitoring alerts

When you set up alerts, allow your cluster enough time to self-heal. Because Kubernetes has self-healing capabilities, configure alerts only for issues that persist over time. By observing your cluster over time, you can learn which issues Kubernetes resolves by itself and which issues require alerts to avoid downtime.

On 15 June 2022, the naming convention for IBM Cloud® Monitoring alerts changed to a Prometheus-compatible format. For more information, see the Sysdig release notes, Mapping Legacy Sysdig Kubernetes Metrics with Prometheus Metrics, and Mapping Classic Metrics with PromQL.

Depending on the size of your cluster, consider setting up alerts at the app, worker node, zone, cluster, and account levels.

Set up autorecovery on your worker nodes to enable your cluster to automatically resolve issues.

App alerts

Review the following app level metrics and alert thresholds for help setting up app monitoring in your cluster.

Common app-level conditions to monitor include the following:

  • Multiple app pods or containers are restarted within 10 minutes.
  • More than one replica of an app is not running.
  • More than ten 5XX HTTP response codes received within 10 minutes.
  • More than one pod in a namespace is in an unknown state.
  • More than five pods can't be scheduled on a worker node (pending state).

The underlying causes of these symptoms include the following:

  • One or more worker nodes are in an unhealthy state.
  • Worker nodes ran out of CPU, memory, or disk space.
  • Maximum pod limit per cluster reached.
  • App itself has an issue.

To set up monitoring for these conditions, configure alerts based on the following IBM Cloud Monitoring metrics. Note that your alert thresholds might change depending on your cluster configuration.

App level metrics
Metric | IBM Cloud Monitoring metric | Alert threshold
Multiple restarts of a pod in a short amount of time. | kubernetes_pod_restart_count | Greater than 4 for the last 10 minutes.
No running replicas in a ReplicaSet. | kube_replicaset_status_replicas in kubernetes_deployment_name | Less than 1.
More than 5 pods pending in cluster. | kube_pod_container_status_waiting | Status equals pending, greater than 5.
No replicas in a deployment available. | kubernetes_deployment_replicas_available | Less than 1.
Number of pods per node reaching the threshold of 110. | count by (kube_cluster_name, kube_node_name) (kube_pod_container_info) | Greater than or equal to 100. This metric is a PromQL query.
Workloads that are in an unknown state. | kube_workload_status_unavailable | Greater than or equal to 1. This metric is a PromQL query.
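
As a worked illustration of two of these thresholds, the following Python sketch evaluates them against hypothetical sample data. It reads "greater than 4 for the last 10 minutes" as more than four restarts within a sliding 10-minute window, and it groups series per cluster and node the same way the count by (kube_cluster_name, kube_node_name) query does. IBM Cloud Monitoring evaluates the real alerts for you; the input shapes here are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

RESTART_LIMIT = 4                       # alert when restarts are greater than 4 ...
RESTART_WINDOW = timedelta(minutes=10)  # ... within the last 10 minutes
POD_WARNING = 100                       # warn at 100, before the 110-pods-per-node limit


def too_many_restarts(restart_times: list[datetime], now: datetime) -> bool:
    """True if more than RESTART_LIMIT restarts happened within RESTART_WINDOW before `now`."""
    recent = [t for t in restart_times if timedelta(0) <= now - t <= RESTART_WINDOW]
    return len(recent) > RESTART_LIMIT


def nodes_near_pod_limit(series: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Count series per (cluster, node), like count by (kube_cluster_name, kube_node_name).

    Each input tuple is a hypothetical (cluster_name, node_name) pair, one per series.
    Returns only the groups at or above POD_WARNING.
    """
    counts = Counter(series)
    return {group: n for group, n in counts.items() if n >= POD_WARNING}


now = datetime(2024, 1, 1, 12, 0)
restarts = [now - timedelta(minutes=m) for m in (1, 2, 4, 6, 9, 15)]
print(too_many_restarts(restarts, now))  # True: 5 restarts in the last 10 minutes

series = [("prod", "node-a")] * 105 + [("prod", "node-b")] * 40
print(nodes_near_pod_limit(series))  # {('prod', 'node-a'): 105}
```
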

Worker node alerts

Review the following thresholds and alerts for worker nodes.

Worker node metrics
Metric | IBM Cloud Monitoring metric | Alert threshold
CPU utilization of the worker node over threshold. | cpu_used_percent | Greater than 80% for 1 hour.
CPU utilization of the worker node over threshold. | cpu_used_percent | Greater than 65% for 24 hours.
Memory utilization of the worker node over threshold. | memory_used_percent | Greater than 80% for 1 hour.
Memory utilization of the worker node over threshold. | memory_used_percent | Greater than 65% for 24 hours.
Amount of memory used over threshold. | memory_bytes_used | Greater than NUMBER_OF_BYTES.
Nodes with disk pressure exist. | kube_node_status_condition | Greater than or equal to 1 for 10 minutes.
Kubernetes nodes not ready exist. | kube_node_status_ready | Greater than or equal to 1.
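
Thresholds such as "greater than 80% for 1 hour" are meant to fire only when the condition holds for the whole duration, not on a brief spike, which fits the advice to give Kubernetes time to self-heal. The following Python sketch shows that reading of the rule over hypothetical per-minute cpu_used_percent samples. It illustrates the threshold semantics only and is not how IBM Cloud Monitoring evaluates alerts internally.

```python
def sustained_above(samples: list[float], threshold: float, window: int) -> bool:
    """True if each of the last `window` samples is above `threshold`.

    Assumes evenly spaced samples, oldest first. With one sample per minute,
    window=60 models "for 1 hour" and window=24 * 60 models "for 24 hours".
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    if len(samples) < window:
        return False  # not enough history yet to cover the full duration
    return all(value > threshold for value in samples[-window:])


busy_hour = [85.0] * 60
one_spike = [50.0] * 59 + [95.0]
print(sustained_above(busy_hour, 80, 60))  # True
print(sustained_above(one_spike, 80, 60))  # False: a single spike doesn't fire the alert

# Both CPU rows from the table together: > 80% for 1 hour, or > 65% for 24 hours.
steady_day = [70.0] * (24 * 60)
print(sustained_above(steady_day, 80, 60) or sustained_above(steady_day, 65, 24 * 60))  # True
```
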

Resolving worker node alerts

Reloading or rebooting the worker node can resolve the issue. However, you might need to add more worker nodes to increase capacity.

  1. Get your worker nodes and review the state.

    kubectl get nodes
    
  2. If all the worker nodes are in the Ready state, your cluster might need more capacity. Add worker nodes to your cluster.

  3. If a worker node is not in the Ready state, reload or reboot that worker node.

    1. Describe your worker node and review the Events section for common error messages.

      kubectl describe node <node>
      
    2. Cordon the node that isn't Ready by running kubectl cordon <node> so that no new pods are scheduled on it while you investigate.

    3. Drain the worker node. Review the Kubernetes documentation to safely drain pods from your worker node.

      kubectl drain <node> --ignore-daemonsets
      
    4. Reload or reboot your worker node.
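
To find the nodes that need attention, filter the STATUS column of kubectl get nodes. The following Python sketch parses sample text output (the node names and versions are made up) and prints the cordon and drain commands for every node that isn't Ready. The drain command includes --ignore-daemonsets because kubectl drain refuses to continue while daemon set pods, such as the Monitoring agent, run on the node. For automation, kubectl get nodes -o json is safer to parse than text columns.

```python
SAMPLE = """\
NAME         STATUS                     ROLES    AGE   VERSION
10.240.0.4   Ready                      <none>   12d   v1.30.2+IKS
10.240.0.5   NotReady                   <none>   12d   v1.30.2+IKS
10.240.0.6   Ready,SchedulingDisabled   <none>   12d   v1.30.2+IKS
"""


def not_ready_nodes(output: str) -> list[str]:
    """Return the names of nodes whose STATUS column doesn't start with Ready."""
    names = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        name, status = fields[0], fields[1]
        # "Ready,SchedulingDisabled" is a cordoned node that is still Ready.
        if status.split(",")[0] != "Ready":
            names.append(name)
    return names


for node in not_ready_nodes(SAMPLE):
    print(f"kubectl cordon {node}")
    # --ignore-daemonsets lets the drain proceed past daemon set pods such as the monitoring agent.
    print(f"kubectl drain {node} --ignore-daemonsets")
```
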

Zone alerts

To set up zone level alerts, edit the sysdig-agent ConfigMap to include the required label filters.

  1. Edit the ConfigMap by running the following command.
    kubectl edit configmap sysdig-agent -n ibm-observe
    
  2. Add the following YAML block after k8s_cluster_name: <cluster_name>. Replace <cluster_name> with the name of the cluster that you want to monitor.
    k8s_labels_filter:
      - include: "kubernetes.node.label.kubernetes.io/hostname"
      - include: "kubernetes.node.label.kubernetes.io/role"
      - include: "kubernetes.node.label.ibm-cloud.kubernetes.io/zone"
      - exclude: "*.kubernetes.io/*"
      - exclude: "*.pod-template-hash"
      - exclude: "*.pod-template-generation"
      - exclude: "*.controller-revision-hash"
      - include: "*"
    
  3. Restart the IBM Cloud Monitoring agent pods by deleting them and waiting for them to restart. First, get the list of pods.
    kubectl get pods -n ibm-observe
    
  4. Delete the pods to restart them. Replace the example pod names with the names that the previous command returned.
    kubectl delete pods sysdig-agent-1111 sysdig-agent-2222 sysdig-agent-3333 -n ibm-observe
    
  5. Wait 5 minutes for the pods to restart. After the pods restart, the labels that you added earlier are available in IBM Cloud Monitoring.
  6. Verify that the labels now show by opening the IBM Cloud Monitoring dashboard > Explore > PromQL query.
  7. Enter kube_node_labels in the query field and click Run Query.
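
The order of the k8s_labels_filter rules in step 2 matters, because the zone label also matches the broader *.kubernetes.io/* exclude pattern. Assuming that the agent applies the first rule that matches a label (the ordering of this example suggests first-match semantics, but confirm that in the agent documentation), the following Python sketch shows which labels are kept. It uses fnmatch glob matching as an approximation of the agent's wildcard syntax, which might differ in edge cases.

```python
from fnmatch import fnmatchcase

# The k8s_labels_filter rules from step 2, in order: (action, pattern).
RULES = (
    ("include", "kubernetes.node.label.kubernetes.io/hostname"),
    ("include", "kubernetes.node.label.kubernetes.io/role"),
    ("include", "kubernetes.node.label.ibm-cloud.kubernetes.io/zone"),
    ("exclude", "*.kubernetes.io/*"),
    ("exclude", "*.pod-template-hash"),
    ("exclude", "*.pod-template-generation"),
    ("exclude", "*.controller-revision-hash"),
    ("include", "*"),
)


def is_kept(label: str, rules: tuple[tuple[str, str], ...] = RULES) -> bool:
    """Assumed first-match semantics: the first rule whose pattern matches decides."""
    for action, pattern in rules:
        if fnmatchcase(label, pattern):
            return action == "include"
    return True  # assumption: a label that matches no rule is kept


for label in (
    "kubernetes.node.label.ibm-cloud.kubernetes.io/zone",
    "kubernetes.node.label.ibm-cloud.kubernetes.io/worker-pool-name",
    "kubernetes.pod.label.pod-template-hash",
    "kubernetes.deployment.label.app",
):
    print(f"{label}: {'kept' if is_kept(label) else 'dropped'}")
```

Under this model, moving the *.kubernetes.io/* exclude above the zone include would drop the zone label, and the zone queries would have nothing to group by.
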

Zone level alerts
Metric | PromQL query
CPU usage per zone over threshold | sum(sysdig_container_cpu_used_percent{agent_tag_cluster="<cluster_name>"}) by (kube_node_label_ibm_cloud_kubernetes_io_zone) / sum(kube_node_info) by (kube_node_label_ibm_cloud_kubernetes_io_zone) > 80
Memory usage per zone over threshold | sum(sysdig_container_memory_used_percent{agent_tag_cluster="<cluster_name>"}) by (kube_node_label_ibm_cloud_kubernetes_io_zone) / sum(kube_node_info) by (kube_node_label_ibm_cloud_kubernetes_io_zone) > 80
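
The zone queries divide a per-zone sum of the metric by the number of nodes in that zone (kube_node_info has one series with the value 1 per node) and compare the result to 80. The following Python sketch reproduces that arithmetic on made-up samples so that you can sanity-check a threshold before you create the alert; the node names, zones, and values are hypothetical.

```python
from collections import defaultdict


def zones_over_threshold(samples: list[tuple[str, float]], node_zone: dict[str, str],
                         threshold: float = 80.0) -> dict[str, float]:
    """Mimic sum(metric) by (zone) / sum(kube_node_info) by (zone) > threshold.

    samples: (node_name, value) pairs, one per container series of the metric.
    node_zone: node_name -> zone. Each node counts once in the denominator.
    """
    totals: dict[str, float] = defaultdict(float)
    for node, value in samples:
        totals[node_zone[node]] += value
    nodes_per_zone: dict[str, int] = defaultdict(int)
    for zone in node_zone.values():
        nodes_per_zone[zone] += 1
    ratios = {zone: totals[zone] / count for zone, count in nodes_per_zone.items()}
    return {zone: ratio for zone, ratio in ratios.items() if ratio > threshold}


node_zone = {"node-1": "dal10", "node-2": "dal10", "node-3": "dal12"}
samples = [("node-1", 60.0), ("node-1", 50.0), ("node-2", 70.0), ("node-3", 40.0), ("node-3", 30.0)]
print(zones_over_threshold(samples, node_zone))  # {'dal10': 90.0}
```
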

Cluster alerts

Review the following example thresholds for creating alerts at the cluster level.

  • All worker nodes in a region are reaching capacity threshold of 80%.
  • More than 50% of all worker nodes are in an unhealthy state.
  • Reaching maximum number of file and block storage volumes per account (250).
  • Reaching maximum number of worker nodes per cluster (500).
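
The cluster-level examples compare counts against ratios and fixed limits. The following Python sketch shows the arithmetic with hypothetical inputs. The 50% unhealthy ratio, the 80% capacity level, and the 500-worker and 250-volume limits come from the list above; the 90% early-warning level for the two limits is an assumption that you can tune.

```python
MAX_WORKERS_PER_CLUSTER = 500
MAX_VOLUMES_PER_ACCOUNT = 250
UNHEALTHY_RATIO = 0.5      # more than 50% of worker nodes unhealthy
CAPACITY_RATIO = 0.8       # every worker node at or above 80% utilization
LIMIT_WARNING_RATIO = 0.9  # assumption: warn at 90% of a hard limit


def cluster_alerts(worker_healthy: dict[str, bool], worker_utilization: dict[str, float],
                   volume_count: int) -> list[str]:
    """Return alert messages for hypothetical cluster-level inputs.

    worker_healthy maps worker node names to a health flag, and worker_utilization
    maps them to a utilization fraction between 0 and 1.
    """
    alerts = []
    total = len(worker_healthy)
    unhealthy = sum(1 for healthy in worker_healthy.values() if not healthy)
    if total and unhealthy / total > UNHEALTHY_RATIO:
        alerts.append(f"{unhealthy} of {total} worker nodes are unhealthy")
    if worker_utilization and all(u >= CAPACITY_RATIO for u in worker_utilization.values()):
        alerts.append("all worker nodes are at or above 80% capacity")
    if total >= LIMIT_WARNING_RATIO * MAX_WORKERS_PER_CLUSTER:
        alerts.append(f"{total} worker nodes; the limit is {MAX_WORKERS_PER_CLUSTER}")
    if volume_count >= LIMIT_WARNING_RATIO * MAX_VOLUMES_PER_ACCOUNT:
        alerts.append(f"{volume_count} storage volumes; the limit is {MAX_VOLUMES_PER_ACCOUNT}")
    return alerts


healthy = {"w1": False, "w2": False, "w3": True}
utilization = {"w1": 0.85, "w2": 0.90, "w3": 0.80}
for alert in cluster_alerts(healthy, utilization, volume_count=230):
    print(alert)
```
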

Account alerts

You might set up an alert for when the number of clusters in your account approaches the limit, for example, 100 clusters per region and infrastructure provider.
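
Because the example limit applies per region and infrastructure provider, count clusters per combination rather than across the whole account. The following Python sketch uses a made-up inventory, which you might build from the output of ibmcloud ks cluster ls. The 100-cluster limit is this section's example, and the 80% warning level is an assumption.

```python
from collections import Counter

CLUSTER_LIMIT = 100  # example limit per region and infrastructure provider
WARNING_RATIO = 0.8  # assumption: warn at 80% of the limit


def groups_near_limit(clusters: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Count clusters per (region, provider) pair and return the groups at or above the warning level."""
    counts = Counter(clusters)
    return {group: n for group, n in counts.items() if n >= WARNING_RATIO * CLUSTER_LIMIT}


# Made-up inventory: one (region, infrastructure provider) pair per cluster.
inventory = [("us-south", "vpc-gen2")] * 85 + [("us-south", "classic")] * 20 + [("eu-de", "vpc-gen2")] * 5
print(groups_near_limit(inventory))  # {('us-south', 'vpc-gen2'): 85}
```
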

Block Storage for VPC alerts

The following metrics are available for Block Storage for VPC alerts:

  • kubelet_volume_stats_available_bytes
  • kubelet_volume_stats_capacity_bytes
  • kubelet_volume_stats_inodes
  • kubelet_volume_stats_inodes_free
  • kubelet_volume_stats_inodes_used
  1. Create a monitoring instance for Block Storage for VPC alerts. See instructions in Forwarding cluster and app metrics to IBM Cloud Monitoring.

  2. Install the Monitoring agent.

    1. In the IBM Cloud console, select Observability from the menu.
    2. Select Monitoring.
    3. In the row of the instance for Block Storage for VPC alerts, select Open dashboard.
    4. From the menu, select Get started.
    5. Under the Install the Agent section, select Add Sources.
    6. Follow the instructions to install the agent.
    7. Make sure that the agent is running by using the kubectl get pods -n <namespace> | grep sysdig command, where <namespace> is the namespace that you installed the agent in.
  3. Configure the notification channels.

    1. In the IBM Cloud Monitoring dashboard, select Monitoring Operations > Settings.
    2. Select Notification Channels > Add Notification Channel and pick one of the available notification methods.
    3. Complete the settings for the new channel and click Save.
    4. Optional: Repeat the previous steps to add more channels.
  4. Create an alert.

    1. In the IBM Cloud Monitoring dashboard, select Alerts > Library.
    2. Choose one of the templates and select Enable Alert. For example, you can search for PVC storage and enable the PVC Storage Usage Is Reaching The Limit alert.
    3. Customize the alert settings on the template and select Enable Alert to apply your settings.

If your Block Storage for VPC volumes are reaching capacity, you can set up volume expansion.
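
The kubelet volume metrics listed at the start of this section are enough to compute how full a volume is, both in bytes and in inodes. The following Python sketch shows the arithmetic with made-up sample values; the 80% warning level is an assumption rather than a value from the alert template. In PromQL, the byte-usage ratio is commonly written as 1 - kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes.

```python
GIB = 1024 ** 3
USAGE_WARNING = 0.8  # assumption: warn when a volume is 80% full


def volume_usage(available_bytes: float, capacity_bytes: float,
                 inodes_used: float, inodes_total: float) -> tuple[float, float]:
    """Return (space_used_ratio, inodes_used_ratio) for one volume.

    Maps to kubelet_volume_stats_available_bytes, kubelet_volume_stats_capacity_bytes,
    kubelet_volume_stats_inodes_used, and kubelet_volume_stats_inodes.
    """
    if capacity_bytes <= 0 or inodes_total <= 0:
        raise ValueError("capacity and inode totals must be positive")
    return 1 - available_bytes / capacity_bytes, inodes_used / inodes_total


space, inodes = volume_usage(available_bytes=15 * GIB, capacity_bytes=100 * GIB,
                             inodes_used=6_000_000, inodes_total=6_553_600)
print(f"space used: {space:.0%}, inodes used: {inodes:.0%}")
if max(space, inodes) >= USAGE_WARNING:
    print("Volume is nearly full: consider volume expansion.")
```

Checking inodes as well as bytes matters because a volume with many small files can run out of inodes while free space remains.
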

Monitoring worker node health with Autorecovery

The Autorecovery system uses various checks to query worker node health status. If Autorecovery detects an unhealthy worker node based on the configured checks, Autorecovery triggers a corrective action like rebooting a VPC worker node or reloading the operating system in a classic worker node. Only one worker node undergoes a corrective action at a time. The worker node must complete the corrective action before any other worker node undergoes a corrective action.

Autorecovery requires at least one healthy worker node to function properly. Configure Autorecovery with active checks only in clusters with two or more worker nodes.
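
The rules in these paragraphs (a corrective action only after FailureThreshold consecutive failed checks, only one worker node repaired at a time, and no new action during a node's cool-off period) can be modeled as a small state machine. The following Python sketch illustrates those rules as described on this page. It is not the Autorecovery implementation: the class and method names are hypothetical, and it models the end of a repair as an explicit repair_finished() call.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecoveryScheduler:
    """Hypothetical model of the Autorecovery rules described on this page."""
    failure_threshold: int = 3   # FailureThreshold
    cooloff_seconds: int = 1800  # CooloffSeconds
    failures: dict[str, int] = field(default_factory=dict)
    last_action: dict[str, float] = field(default_factory=dict)
    repairing: Optional[str] = None

    def record_check(self, node: str, healthy: bool, now: float) -> bool:
        """Record one check result; return True if a corrective action starts for `node`."""
        if healthy:
            self.failures[node] = 0  # only consecutive failures count
            return False
        self.failures[node] = self.failures.get(node, 0) + 1
        if self.failures[node] < self.failure_threshold:
            return False
        if self.repairing is not None:
            return False  # another worker node is being repaired
        last = self.last_action.get(node)
        if last is not None and now - last < self.cooloff_seconds:
            return False  # still in the cool-off period, which starts when an action is issued
        self.repairing = node
        self.last_action[node] = now
        self.failures[node] = 0
        return True

    def repair_finished(self, node: str) -> None:
        """Mark the corrective action on `node` as complete."""
        if self.repairing == node:
            self.repairing = None


scheduler = RecoveryScheduler()
started = []
for t in (0, 180, 360):  # IntervalSeconds = 180
    for node in ("w1", "w2"):
        if scheduler.record_check(node, healthy=False, now=t):
            started.append((node, t))
scheduler.repair_finished("w1")
if scheduler.record_check("w2", healthy=False, now=540):
    started.append(("w2", 540))
print(started)  # [('w1', 360), ('w2', 540)]
```

In the run above, both nodes reach three failures at the same time, but w2 waits until the repair of w1 finishes.
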

Before you begin:

To configure Autorecovery:

  1. Follow the instructions to install the Helm version 3 client on your local machine.

  2. Create a ConfigMap file named ibm-worker-recovery-checks.yaml that defines your checks in JSON format. For example, the following YAML file defines three checks: an HTTP check and two Kubernetes API server checks. For information about the three kinds of checks and the individual components of each check, see the component descriptions and the health check components table.

    Define each check as a unique key in the data section of the configuration map.

    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: ibm-worker-recovery-checks
      namespace: kube-system # The `kube-system` namespace is a constant and can't be changed.
    data:
      checknode.json: |
        {
          "Check":"KUBEAPI",
          "Resource":"NODE",
          "FailureThreshold":3,
          "CorrectiveAction":"RELOAD",
          "CooloffSeconds":1800,
          "IntervalSeconds":180,
          "TimeoutSeconds":10,
          "Enabled":true
        }
      checkpod.json: |
        {
          "Check":"KUBEAPI",
          "Resource":"POD",
          "PodFailureThresholdPercent":50,
          "FailureThreshold":3,
          "CorrectiveAction":"RELOAD",
          "CooloffSeconds":1800,
          "IntervalSeconds":180,
          "TimeoutSeconds":10,
          "Enabled":true
        }
      checkhttp.json: |
        {
          "Check":"HTTP",
          "FailureThreshold":3,
          "CorrectiveAction":"REBOOT",
          "CooloffSeconds":1800,
          "IntervalSeconds":180,
          "TimeoutSeconds":10,
          "Port":80,
          "ExpectedStatus":200,
          "Route":"/myhealth",
          "Enabled":false
        }
    
  3. Create the configuration map in your cluster.

    kubectl apply -f ibm-worker-recovery-checks.yaml
    
  4. Verify that you created the configuration map with the name ibm-worker-recovery-checks in the kube-system namespace with the proper checks.

    kubectl -n kube-system get cm ibm-worker-recovery-checks -o yaml
    
  5. Deploy Autorecovery into your cluster by installing the ibm-worker-recovery Helm chart.

    helm install ibm-worker-recovery iks-charts/ibm-worker-recovery --namespace kube-system
    
  6. After a few minutes, you can check the Events section in the output of the following command to see activity on the Autorecovery deployment.

    kubectl -n kube-system describe deployment ibm-worker-recovery
    
  7. If you don't see activity on the Autorecovery deployment, you can check the Helm deployment by running the tests that are in the Autorecovery chart definition.

    helm test ibm-worker-recovery -n kube-system
    

Understanding the ConfigMap components

Review the following information on the individual components of health checks.

  • name: The configuration name ibm-worker-recovery-checks is a constant and can't be changed.
  • namespace: The kube-system namespace is a constant and can't be changed.
  • checknode.json: Defines a Kubernetes API node check that checks whether each worker node is in the Ready state. The check for a specific worker node counts as a failure if the worker node is not in the Ready state. The check in the example YAML runs every 3 minutes. If it fails three consecutive times, the worker node is reloaded. This action is equivalent to running ibmcloud ks worker reload. The node check is enabled until you set the Enabled field to false or remove the check. Note that reloading is supported only for worker nodes on classic infrastructure.
  • checkpod.json: Defines a Kubernetes API pod check that checks the total percentage of NotReady pods on a worker node based on the total pods that are assigned to that worker node. The check for a specific worker node counts as a failure if the total percentage of NotReady pods is greater than the defined PodFailureThresholdPercent. The check in the example YAML runs every 3 minutes. If it fails three consecutive times, the worker node is reloaded. This action is equivalent to running ibmcloud ks worker reload. For example, the default PodFailureThresholdPercent is 50%. If the percentage of NotReady pods is greater than 50% three consecutive times, the worker node is reloaded. The check runs on all namespaces by default. To restrict the check to only pods in a specified namespace, add the Namespace field to the check. The pod check is enabled until you set the Enabled field to false or remove the check. Note that reloading is supported only for worker nodes on classic infrastructure.
  • checkhttp.json: Defines an HTTP check that checks if an HTTP server that runs on your worker node is healthy. To use this check, you must deploy an HTTP server on every worker node in your cluster by using a daemon set. You must implement a health check that is available at the /myhealth path and that can verify whether your HTTP server is healthy. You can define other paths by changing the Route parameter. If the HTTP server is healthy, you must return the HTTP response code that is defined in ExpectedStatus. The HTTP server must be configured to listen on the private IP address of the worker node. You can find the private IP address by running kubectl get nodes. For example, consider two nodes in a cluster that have the private IP addresses 10.10.10.1 and 10.10.10.2. In this example, two routes are checked for a 200 HTTP response: http://10.10.10.1:80/myhealth and http://10.10.10.2:80/myhealth. The check in the example YAML runs every 3 minutes. If it fails three consecutive times, the worker node is rebooted. This action is equivalent to running ibmcloud ks worker reboot. The HTTP check is disabled until you set the Enabled field to true.
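
The HTTP check needs a server on every worker node that answers on Route with ExpectedStatus. The following Python sketch shows the shape of such a health endpoint by using only the standard library; is_healthy() is a hypothetical placeholder for your own logic. In a cluster, you would package the server in a container image, deploy it as a daemon set, and make it listen on each worker node's private IP at the configured Port (for example, by using host networking). Those deployment details are not shown; the demo binds to 127.0.0.1 on a free port so that it can run locally. The check_urls() helper builds the same URLs as the 10.10.10.1 and 10.10.10.2 example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

ROUTE = "/myhealth"    # must match "Route" in checkhttp.json
EXPECTED_STATUS = 200  # must match "ExpectedStatus" in checkhttp.json


def is_healthy() -> bool:
    """Hypothetical placeholder: replace with real checks of your node-level service."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != ROUTE:
            status = 404
        elif is_healthy():
            status = EXPECTED_STATUS
        else:
            status = 503  # anything other than ExpectedStatus signals unhealthy
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt: str, *args) -> None:
        pass  # silence per-request logging for the demo


def serve_in_background(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the health server on a daemon thread. Port 0 asks the OS for a free port."""
    server = HTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(host: str, port: int, path: str = ROUTE) -> int:
    """Request one health URL and return the HTTP status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def check_urls(node_ips: list[str], port: int, route: str = ROUTE) -> list[str]:
    """The URLs that the HTTP check requests: one per worker node private IP."""
    return [f"http://{ip}:{port}{route}" for ip in node_ips]


if __name__ == "__main__":
    server = serve_in_background()
    print(probe("127.0.0.1", server.server_address[1]))  # 200
    server.shutdown()
    server.server_close()
    print(check_urls(["10.10.10.1", "10.10.10.2"], 80))
```
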

Understanding the individual components of health checks

Review the following table for information on the individual components of health checks.

Health check components
Component | Description
Check | Enter the type of check that you want Autorecovery to use.
    HTTP: Autorecovery calls HTTP servers that run on each node to determine whether the nodes are running properly.
    KUBEAPI: Autorecovery calls the Kubernetes API server and reads the health status data reported by the worker nodes.
Resource | When the check type is KUBEAPI, enter the type of resource that you want Autorecovery to check. Accepted values are NODE or POD.
FailureThreshold | Enter the threshold for the number of consecutive failed checks. When this threshold is met, Autorecovery triggers the specified corrective action. For example, if the value is 3 and a configured check fails three consecutive times, Autorecovery triggers the corrective action that is associated with the check.
PodFailureThresholdPercent | When the resource type is POD, enter the threshold for the percentage of pods on a worker node that can be in a NotReady state. This percentage is based on the total number of pods that are scheduled to a worker node. When a check determines that the percentage of unhealthy pods is greater than the threshold, the check counts as one failure.
CorrectiveAction | Enter the action to run when the failure threshold is met. A corrective action runs only while no other worker nodes are being repaired and when this worker node is not in a cool-off period from a previous action.
    REBOOT: Reboots the worker node.
    RELOAD: Reloads all the necessary configurations for the worker node from a clean OS.
CooloffSeconds | Enter the number of seconds that Autorecovery must wait before it issues another corrective action for a node that already received one. The cool-off period starts when a corrective action is issued.
IntervalSeconds | Enter the number of seconds between consecutive checks. For example, if the value is 180, Autorecovery runs the check on each node every 3 minutes.
TimeoutSeconds | Enter the maximum number of seconds that a check call to the database can take before Autorecovery terminates the call. The value for TimeoutSeconds must be less than the value for IntervalSeconds.
Port | When the check type is HTTP, enter the port that the HTTP server must bind to on the worker nodes. This port must be exposed on the IP of every worker node in the cluster. Autorecovery requires a constant port number across all nodes for checking servers. Use daemon sets when you deploy a custom server into a cluster.
ExpectedStatus | When the check type is HTTP, enter the HTTP server status that you expect to be returned from the check. For example, a value of 200 indicates that you expect an OK response from the server.
Route | When the check type is HTTP, enter the path that is requested from the HTTP server. This value is typically the metrics path for the server that runs on all the worker nodes.
Enabled | Enter true to enable the check or false to disable the check.
Namespace | Optional: To restrict checkpod.json to checking only pods in one namespace, add the Namespace field and enter the namespace.
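
Before you apply ibm-worker-recovery-checks.yaml, you can catch common mistakes locally. The following Python sketch checks one JSON check definition against the rules in this table (the accepted Check, Resource, and CorrectiveAction values, the HTTP-only fields, a boolean Enabled value, and TimeoutSeconds less than IntervalSeconds). It also estimates the longest wait before a corrective action as FailureThreshold × IntervalSeconds. The function names are hypothetical, the sketch covers only the rules stated on this page, and you would extract each JSON string from the data section of the ConfigMap yourself.

```python
import json

VALID_CHECKS = {"HTTP", "KUBEAPI"}
VALID_RESOURCES = {"NODE", "POD"}
VALID_ACTIONS = {"REBOOT", "RELOAD"}
HTTP_ONLY_FIELDS = ("Port", "ExpectedStatus", "Route")


def validate_check(name: str, raw: str) -> list[str]:
    """Return the problems found in one check definition; an empty list means none."""
    check = json.loads(raw)
    problems = []
    kind = check.get("Check")
    if kind not in VALID_CHECKS:
        problems.append(f"{name}: Check must be HTTP or KUBEAPI, not {kind!r}")
    if kind == "KUBEAPI" and check.get("Resource") not in VALID_RESOURCES:
        problems.append(f"{name}: Resource must be NODE or POD for a KUBEAPI check")
    if kind == "HTTP":
        missing = [f for f in HTTP_ONLY_FIELDS if f not in check]
        if missing:
            problems.append(f"{name}: HTTP check is missing {', '.join(missing)}")
    if check.get("CorrectiveAction") not in VALID_ACTIONS:
        problems.append(f"{name}: CorrectiveAction must be REBOOT or RELOAD")
    interval = check.get("IntervalSeconds", 0)
    timeout = check.get("TimeoutSeconds", 0)
    if not timeout < interval:
        problems.append(f"{name}: TimeoutSeconds ({timeout}) must be less than IntervalSeconds ({interval})")
    if not isinstance(check.get("Enabled"), bool):
        problems.append(f"{name}: Enabled must be true or false")
    return problems


def worst_case_seconds_to_action(raw: str) -> int:
    """Approximate longest delay from a node turning unhealthy to its corrective action.

    Ignores TimeoutSeconds, cool-off periods, and repairs already in progress on other nodes.
    """
    check = json.loads(raw)
    return check["FailureThreshold"] * check["IntervalSeconds"]


checknode = """{"Check": "KUBEAPI", "Resource": "NODE", "FailureThreshold": 3,
  "CorrectiveAction": "RELOAD", "CooloffSeconds": 1800, "IntervalSeconds": 180,
  "TimeoutSeconds": 10, "Enabled": true}"""
print(validate_check("checknode.json", checknode))  # []
print(worst_case_seconds_to_action(checknode) / 60, "minutes")  # 9.0 minutes
```

With the example values, a node that stays unhealthy is acted on within about 9 minutes, which matches three failed checks that run every 3 minutes.
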