IBM Cloud Docs
Why don't my pods restart after my workers are down when I use Container Registry?

When you are using IBM Cloud® Container Registry, the pods do not restart after your cluster workers are down.

Portieris is deployed. The cluster workers are showing as working correctly, but nothing is scheduled.

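By default, Portieris adds a fail-closed admission webhook, which means that requests are rejected when the webhook can't be reached. If all Portieris pods are down, no Portieris pod is available to approve pod creation, including the creation of the replacement Portieris pods themselves.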

To recover the cluster when it's in this state, you must change the webhook configuration to make it fail open instead of closed.
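Before you change anything, it can help to confirm which failure mode is currently configured. The following is a minimal sketch, assuming that both webhook configurations are named image-admission-config as in the steps later in this topic; the helper name show_failure_policy is hypothetical and is not part of Portieris or kubectl.

```shell
# Hypothetical helper: print the failurePolicy of every webhook entry in
# both image-admission-config webhook configurations.
show_failure_policy() {
  for kind in MutatingWebhookConfiguration ValidatingWebhookConfiguration; do
    printf '%s: ' "$kind"
    # jsonpath prints the failurePolicy values of all webhook entries,
    # separated by spaces.
    kubectl get "$kind" image-admission-config \
      -o jsonpath='{.webhooks[*].failurePolicy}'
    echo
  done
}
```

A value of Fail means that the webhook fails closed. A value of Ignore means that it fails open.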

You must have sufficient role-based access control (RBAC) privileges to use the GET and PATCH verbs on the following resources:

  • admissionregistration.k8s.io/v1/MutatingWebhookConfiguration
  • admissionregistration.k8s.io/v1/ValidatingWebhookConfiguration

For more information about RBAC, see Understanding RBAC permissions, Creating custom RBAC permissions for users, groups, or service accounts, and Kubernetes - Using RBAC Authorization.
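To check whether your current user already has these permissions, you can ask the cluster directly. The following is a minimal sketch that uses kubectl auth can-i; the helper name check_webhook_rbac is hypothetical, and the lowercase plural resource names are the standard Kubernetes names for these two kinds.

```shell
# Hypothetical helper: report whether the current user can GET and PATCH
# both webhook configuration resource types.
# Returns 0 only if all four checks pass.
check_webhook_rbac() {
  rbac_rc=0
  for rbac_resource in mutatingwebhookconfigurations validatingwebhookconfigurations; do
    for rbac_verb in get patch; do
      # kubectl auth can-i exits with a nonzero status when the action is denied.
      if kubectl auth can-i "$rbac_verb" "$rbac_resource" >/dev/null 2>&1; then
        echo "ok: $rbac_verb $rbac_resource"
      else
        echo "missing: $rbac_verb $rbac_resource"
        rbac_rc=1
      fi
    done
  done
  return "$rbac_rc"
}
```

Run check_webhook_rbac before you start the recovery steps. Any line that begins with "missing" identifies a permission that must be granted first.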

To change the webhook configuration so that it fails open and then, when at least one Portieris pod is running, restore the configuration so that it fails closed, complete the following steps:

  1. Update MutatingWebhookConfiguration by running the following command.

    kubectl edit MutatingWebhookConfiguration image-admission-config
    

    Change failurePolicy to Ignore, save, and close.

  2. Update ValidatingWebhookConfiguration by running the following command.

    kubectl edit ValidatingWebhookConfiguration image-admission-config
    

    Change failurePolicy to Ignore, save, and close.

  3. Wait for some Portieris pods to start. To check whether the pods have started, run the following command until the STATUS column for at least one pod displays Running:

    kubectl get po -n ibm-system -l app=ibmcloud-image-enforcement
    
  4. When at least one Portieris pod is running, update MutatingWebhookConfiguration by running the following command.

    kubectl edit MutatingWebhookConfiguration image-admission-config
    

    Change failurePolicy to Fail, save, and close.

  5. Update ValidatingWebhookConfiguration by running the following command.

    kubectl edit ValidatingWebhookConfiguration image-admission-config
    

    Change failurePolicy to Fail, save, and close.
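
If you prefer not to edit the resources interactively, the same sequence can be sketched with kubectl patch. This is an illustration under stated assumptions rather than the documented procedure: the helper names set_failure_policy and wait_for_portieris are hypothetical, and the JSON patch path assumes that the failurePolicy to change is on the first webhook entry (index 0) of each configuration. Check the resources in your cluster before you rely on that index.

```shell
# Hypothetical helper: set failurePolicy on the first webhook entry
# (index 0 is an assumption) of both image-admission-config resources.
# Usage: set_failure_policy Ignore|Fail
set_failure_policy() {
  policy="$1"
  case "$policy" in
    Ignore|Fail) ;;
    *) echo "usage: set_failure_policy Ignore|Fail" >&2; return 2 ;;
  esac
  for kind in MutatingWebhookConfiguration ValidatingWebhookConfiguration; do
    kubectl patch "$kind" image-admission-config --type=json \
      -p "[{\"op\":\"replace\",\"path\":\"/webhooks/0/failurePolicy\",\"value\":\"$policy\"}]" \
      || return 1
  done
}

# Hypothetical helper: poll every 5 seconds until the STATUS column (third
# column) shows Running for at least one Portieris pod.
wait_for_portieris() {
  until kubectl get po -n ibm-system -l app=ibmcloud-image-enforcement --no-headers \
      | awk '$3 == "Running" { found = 1 } END { exit !found }'; do
    sleep 5
  done
}
```

To mirror steps 1 to 5, run set_failure_policy Ignore, then wait_for_portieris, and then set_failure_policy Fail.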