IBM Cloud Docs
Why do pods repeatedly fail to restart or are unexpectedly removed?

Applies to: Virtual Private Cloud and Classic infrastructure

Your pod was healthy but is unexpectedly removed or gets stuck in a restart loop.

Your containers might exceed their resource limits, or your pods might be preempted (removed) by higher-priority pods.

See the following sections:

Fixing container resource limits

  1. Get the name of your pod. If you used a label, you can include it to filter your results.
    oc get pods --selector='app=wasliberty'
    
  2. Describe the pod and look for the Restart Count.
    oc describe pod <pod_name>
    
  3. If the pod restarted many times in a short period of time, fetch its status.
    oc get pod <pod_name> -n <namespace> -o go-template='{{range .status.containerStatuses}}{{"Container Name: "}}{{.name}}{{"\nLastState: "}}{{.lastState}}{{"\n"}}{{end}}'
    
  4. Review the reason. For example, a reason of OOMKilled (out of memory) indicates that the container is crashing because it exceeds its memory limit.
  5. Add capacity to your cluster, such as by resizing your worker pools, so that the resource requests can be fulfilled. For more information, see Resize your Classic worker pool or Resize your VPC worker pool.
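
The restart checks can also be scripted. The following is a minimal sketch, assuming a POSIX or bash shell and an oc CLI that is already logged in to your cluster. The summarize_restarts function is a hypothetical helper, not an oc command. It uses the jsonpath output format to print each container's restart count and last termination reason, and flags containers whose last reason was OOMKilled.

```shell
#!/usr/bin/env bash
# Sketch: summarize restarts and the last termination reason per container.
# Assumes oc is logged in to the cluster. summarize_restarts is a
# hypothetical helper for illustration, not part of the oc CLI.

summarize_restarts() {
  local pod="$1" ns="$2"
  # jsonpath prints one tab-separated line per container:
  #   name, restartCount, lastState.terminated.reason
  # The reason is empty if the container has not terminated before.
  oc get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}' |
  while read -r name restarts reason; do
    if [ "$reason" = "OOMKilled" ]; then
      # The container was stopped for exceeding its memory limit.
      echo "$name: $restarts restarts, last reason OOMKilled (memory limit exceeded)"
    else
      echo "$name: $restarts restarts, last reason ${reason:-none}"
    fi
  done
}
```

For example, run summarize_restarts <pod_name> <namespace>. A container that reports OOMKilled with a high restart count points to a memory limit problem rather than an application error.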

Fixing pod replacement by higher priority pods

To see if your pod is being replaced by higher priority pods:

  1. Get the name of your pod.

    oc get pods
    
  2. Get the YAML for your pod.

    oc get pod <pod_name> -o yaml
    
  3. Check the priorityClassName field.

    1. If there is no priorityClassName field value, then your pod has the globalDefault priority class. If your cluster admin did not set a globalDefault priority class, then the default is zero (0), or the lowest priority. Any pod with a higher priority class can preempt, or remove, your pod.

    2. If there is a priorityClassName field value, get the priority class.

      oc get priorityclass <priority_class_name> -o yaml
      
    3. Note the value field to check your pod's priority.

  4. List existing priority classes in the cluster.

    oc get priorityclasses
    
  5. For each priority class, get the YAML file and note the value field.

    oc get priorityclass <priority_class_name> -o yaml
    
  6. Compare your pod's priority class value with the other priority class values to see if it is higher or lower in priority.

  7. Repeat steps 1 to 3 for other pods in the cluster to check which priority classes they use. If those pods have a higher priority than your pod, your pod is not provisioned unless the cluster has enough resources for your pod and for every pod with a higher priority.

  8. Contact your cluster admin to add more capacity to your cluster and confirm that the correct priority classes are assigned.
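
Instead of repeating these steps pod by pod, you can compare numeric priorities directly. The following is a minimal sketch, assuming a POSIX or bash shell, an oc CLI that is logged in with permission to list pods in all namespaces, and that the priority admission controller has copied each pod's priority class value into its spec.priority field. Pods without a value are treated as priority 0 here. The pod_priority and higher_priority_pods functions are hypothetical helpers, not oc commands.

```shell
#!/usr/bin/env bash
# Sketch: list pods whose priority is higher than a given pod's priority.
# Assumes spec.priority is populated from the pod's priority class; an
# empty value is treated as 0. Helper names are hypothetical.

pod_priority() {
  # Print the numeric priority of pod $1 in namespace $2 (empty -> 0).
  local p
  p="$(oc get pod "$1" -n "$2" -o jsonpath='{.spec.priority}')"
  echo "${p:-0}"
}

higher_priority_pods() {
  local mine
  mine="$(pod_priority "$1" "$2")"
  # One tab-separated line per pod: namespace, name, priority.
  oc get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.priority}{"\n"}{end}' |
  while read -r ns name prio; do
    prio="${prio:-0}"
    # Pods listed here can preempt the given pod when resources run short.
    if [ "$prio" -gt "$mine" ]; then
      echo "$ns/$name ($prio)"
    fi
  done
}
```

For example, higher_priority_pods <pod_name> <namespace> prints each higher-priority pod as namespace/name with its priority value. Compare that list against the output of oc get priorityclasses when you ask your cluster admin to confirm the assignments.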