Why does my worker node enter a critical state after reload with VNIs attached?
Applies to: Virtual Private Cloud · 4.20 and later · Bare metal worker nodes only · RHCOS only · OVN-Kubernetes CNI required
When you reload a bare metal worker node that has Virtual Network Interfaces (VNIs) attached, the worker node enters a critical state and shows `Not Ready` status.
When you check the worker node, you see the following issues:
- The `ovnkube-node` pod is in `CrashLoopBackOff` state.
- The `ovnkube-controller` logs show an error similar to the following:

  ```
  admission webhook "node.network-node-identity.openshift.io" denied the request: user: "system:ovn-node:..." is not allowed to set k8s.ovn.org/node-chassis-id on node
  ```

- Network policies are not properly reconciled.
- The Kubernetes API is unavailable from pods on the affected node.
Starting with OpenShift 4.20, OVN enforces immutability on the `k8s.ovn.org/node-chassis-id` annotation. During a bare metal worker node reload operation, the node's chassis ID (stored in `/etc/openvswitch/system-id.conf`) changes, but the old OVN annotations remain on the node object. When the `ovnkube-controller` pod attempts to set the new chassis ID, the admission webhook denies the request because the annotation cannot be changed once set.
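You can confirm that this is the failure mode by comparing the annotation on the node object with the system ID on the host. The following is a minimal sketch of that comparison; the variable names and sample values are hypothetical, and in practice the two values come from the `oc` and `cat` commands shown in the comments.

```shell
# Sketch: compare the stale node annotation with the host's current system ID.
# In practice the two values come from (not run here):
#   oc get node <NODE_NAME> -o jsonpath='{.metadata.annotations.k8s\.ovn\.org/node-chassis-id}'
#   cat /etc/openvswitch/system-id.conf        # run on the worker node itself
ANNOTATED_ID="old-chassis-id"   # hypothetical stale value from the node object
CURRENT_ID="new-chassis-id"     # hypothetical value on the host after the reload
if [ "$ANNOTATED_ID" != "$CURRENT_ID" ]; then
  echo "chassis-id mismatch: annotation=$ANNOTATED_ID host=$CURRENT_ID"
fi
```

If the two values match, the node is failing for a different reason and removing the annotations will not help.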
To resolve this issue, manually remove the stale OVN annotations from the worker node after it is powered off during the reload operation.
- Initiate the worker node reload.

  ```
  ibmcloud ks worker reload --cluster <cluster_name_or_ID> --worker <worker_node_ID>
  ```
- Wait for the worker node to power off and enter the `Reloading` state. You can monitor the reload process by checking the worker node status.

  ```
  ibmcloud ks worker get --cluster <cluster_name_or_ID> --worker <worker_node_ID>
  ```
- After the worker node is powered off (or if the `ovnkube-node` pod is already crashing), remove the stale OVN annotations from the node. Replace `<NODE_NAME>` with your worker node name.

  ```
  NODE="<NODE_NAME>"; oc annotate node "$NODE" \
    k8s.ovn.org/host-cidrs- \
    k8s.ovn.org/l3-gateway-config- \
    k8s.ovn.org/node-chassis-id- \
    k8s.ovn.org/node-encap-ips- \
    k8s.ovn.org/node-id- \
    k8s.ovn.org/node-masquerade-subnet- \
    k8s.ovn.org/node-primary-ifaddr- \
    k8s.ovn.org/node-subnets- \
    k8s.ovn.org/node-transit-switch-port-ifaddr- \
    k8s.ovn.org/zone-name- \
    --overwrite
  ```

  The timing is flexible: you can run this command after the node powers off during the reload, or when the `ovnkube-node` pod is already crashing. You don't need precise timing beyond waiting for the shutdown to begin.
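  The same cleanup can also be driven from a list of annotation keys, which makes the command easier to review or reuse in a script. The following is a sketch only: `worker-node-1` and the variable names are hypothetical placeholders, and the built command is echoed for inspection rather than executed.

  ```shell
  # Sketch: build the annotation-removal command from a list of stale keys,
  # then echo it for review before running it manually.
  NODE="worker-node-1"            # hypothetical placeholder for <NODE_NAME>
  STALE_KEYS="host-cidrs l3-gateway-config node-chassis-id node-encap-ips \
  node-id node-masquerade-subnet node-primary-ifaddr node-subnets \
  node-transit-switch-port-ifaddr zone-name"
  CMD="oc annotate node $NODE"
  for key in $STALE_KEYS; do
    # A trailing '-' after an annotation key tells oc to remove that annotation
    CMD="$CMD k8s.ovn.org/${key}-"
  done
  echo "$CMD"
  ```

  Echoing before executing gives you a chance to confirm the node name and key list against the command in the step above.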
- Wait for the reload to complete. The worker node should return to a `Normal` state with `Ready` status.

  ```
  ibmcloud ks worker get --cluster <cluster_name_or_ID> --worker <worker_node_ID>
  ```
- Verify that the `ovnkube-node` pod is running successfully on the reloaded worker node.

  ```
  oc get pods -n openshift-ovn-kubernetes -o wide | grep <NODE_NAME>
  ```
- Verify that the worker node is ready and network connectivity is restored.

  ```
  oc get node <NODE_NAME>
  ```
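If you are scripting the recovery, the readiness check above can be turned into a pass/fail test on the node's `Ready` condition. This is a sketch under the assumption that `STATUS` is populated from the `oc` query shown in the comment; the sample value is hypothetical.

```shell
# Sketch: script-friendly readiness check. STATUS would normally come from:
#   oc get node <NODE_NAME> -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
STATUS="True"   # hypothetical value for illustration
if [ "$STATUS" = "True" ]; then
  echo "node is Ready"
else
  echo "node is NOT Ready" >&2
fi
```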
For more information about managing VNIs with OpenShift Virtualization, see Managing virtual network interfaces for OpenShift Virtualization.