Reviewing master health

Applies to Virtual Private Cloud and Classic infrastructure clusters.

Review your cluster master health.

Reviewing master health, status, and states

Your Red Hat OpenShift on IBM Cloud cluster includes an IBM-managed master with highly available replicas, automatic security patch updates applied for you, and automation in place to recover in case of an incident. You can check the health, status, and state of the cluster master by running ibmcloud oc cluster get --cluster <cluster_name_or_ID>.
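
For example, the following command retrieves the cluster details, including the master fields. The cluster name mycluster is a placeholder, and the output excerpt is illustrative; the exact fields and formatting depend on your CLI version.

  ibmcloud oc cluster get --cluster mycluster

  Master
  Status:   Ready (1 month ago)
  State:    deployed
  Health:   normal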

The Master Health reflects the state of the master components and notifies you if something needs your attention. The health is one of the following values.

  • error: The master is not operational. IBM is automatically notified and takes action to resolve this issue. You can continue monitoring the health until the master is normal. You can also open an IBM Cloud support case.
  • normal: The master is operational and healthy. No action is required.
  • unavailable: The master might not be accessible, which means some actions such as resizing a worker pool are temporarily unavailable. IBM is automatically notified and takes action to resolve this issue. You can continue monitoring the health until the master is normal.
  • unsupported: The master runs an unsupported version of Kubernetes. You must update your cluster to return the master to normal health.
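
If the master is not in a normal state, you can poll the same command until it recovers, for example with the standard watch utility. The 60-second interval and the cluster name are illustrative choices.

  watch -n 60 ibmcloud oc cluster get --cluster mycluster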

The Master Status provides details about which operation from the master state is in progress. The status includes a timestamp that indicates how long the master has been in the same state, such as Ready (1 month ago). The Master State reflects the lifecycle of operations that can be performed on the master, such as deploying, updating, and deleting. Each state is described in the following list.

Master states

  • deployed: The master is successfully deployed. Check the status to verify that the master is Ready or to see whether an update is available.
  • deploying: The master is currently deploying. Wait for the state to become deployed before working with your cluster, such as adding worker nodes.
  • deploy_failed: The master failed to deploy. IBM Support is notified and works to resolve the issue. Check the Master Status field for more information, or wait for the state to become deployed.
  • deleting: The master is currently deleting because you deleted the cluster. You can't undo a deletion. After the cluster is deleted, you can no longer check the master state because the cluster is completely removed.
  • delete_failed: The master failed to delete. IBM Support is notified and works to resolve the issue. You can't resolve the issue by trying to delete the cluster again. Instead, check the Master Status field for more information, or wait for the cluster to delete. You can also open an IBM Cloud support case.
  • scaled_down: The master resources are scaled down to zero replicas. This temporary state occurs while etcd is being restored after a backup. You can't interact with your cluster while it is in this state. Wait for the etcd restoration to complete and the master state to return to deployed.
  • updating: The master is updating its Kubernetes version. The update might be a patch update that is automatically applied, or a minor or major version that you applied by updating the cluster. During the update, your highly available master can continue processing requests, and your app workloads and worker nodes continue to run. After the master update is complete, you can update your worker nodes. If the update is unsuccessful, the master returns to a deployed state and continues running the previous version. IBM Support is notified and works to resolve the issue. You can check whether the update failed in the Master Status field.
  • update_cancelled: The master update is canceled because the cluster was not in a healthy state at the time of the update. Your master remains in this state until your cluster is healthy and you manually update the master. To update the master, use the ibmcloud oc cluster master update command, as shown in the example after this list. If you don't want to update the master to the default major.minor version, include the --version option and specify the latest patch version that is available for the major.minor version that you want, such as 1.29. To list available versions, run ibmcloud oc versions.
  • update_failed: The master update failed. IBM Support is notified and works to resolve the issue. You can continue to monitor the health of the master until the master reaches a normal state. If the master remains in this state for more than 1 day, open an IBM Cloud support case. IBM Support might identify other issues in your cluster that you must fix before the master can be updated.
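
For example, to recover from an update_cancelled state after your cluster is healthy again, you might run commands like the following. The cluster name is a placeholder; replace <version> with a version from the ibmcloud oc versions output.

  ibmcloud oc versions
  ibmcloud oc cluster master update --cluster mycluster --version <version>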

Understanding the impact of a master outage

The Red Hat OpenShift master is the main component that keeps your cluster up and running. The master stores cluster resources and their configurations in the etcd database that serves as the single point of truth for your cluster. The Red Hat OpenShift API server is the main entry point for all cluster management requests from the worker nodes to the master, or when you want to interact with your cluster resources.

If a master failure occurs, your workloads continue to run on the worker nodes, but you can't use oc commands to work with your cluster resources or view the cluster health until the Red Hat OpenShift API server in the master is back up. If a pod goes down during the master outage, the pod can't be rescheduled until the worker node can reach the Red Hat OpenShift API server again.

During a master outage, you can still run ibmcloud oc commands against the IBM Cloud Kubernetes Service API to work with your infrastructure resources, such as worker nodes or VLANs. However, if you change the current cluster configuration, such as by adding worker nodes to or removing worker nodes from the cluster, your changes don't take effect until the master is back up.
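
For example, the following command lists your worker nodes through the IBM Cloud Kubernetes Service API, which remains available during a master outage. The cluster name mycluster is a placeholder.

  ibmcloud oc worker ls --cluster mycluster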

Do not restart or reboot a worker node during a master outage. This action removes the pods from your worker node. Because the Red Hat OpenShift API server is unavailable, the pods can't be rescheduled onto other worker nodes in the cluster.