Why can't I create or delete clusters or worker nodes?

You can't perform infrastructure-related operations on your cluster, such as:

  • Adding worker nodes to an existing cluster or when creating a new cluster.
  • Removing worker nodes.
  • Reloading or rebooting worker nodes.
  • Resizing worker pools.
  • Updating your cluster.
  • Deleting your cluster.

Review the error messages in the following sections to troubleshoot infrastructure-related issues that are caused by incorrect cluster permissions, orphaned clusters in other infrastructure accounts, or a time-based one-time passcode (TOTP) on the account.

Unable to create or delete clusters or worker nodes due to permission and credential errors

You can't manage worker nodes for your cluster, and you receive an error message similar to one of the following examples.

The infrastructure authentication credentials are not authorized for the request.
We were unable to connect to your Softlayer account.
Creating a standard cluster requires that you have either a
Pay-As-You-Go account that is linked to an IBM Cloud infrastructure
account term or that you have used the Kubernetes service
CLI to set your Infrastructure API keys.
'Item' must be ordered with permission.
The worker node instance '<ID>' can't be found. Review '<provider>' infrastructure user permissions.
The worker node instance can't be found. Review '<provider>' infrastructure user permissions.
The worker node instance can't be identified. Review '<provider>' infrastructure user permissions.
The IAM token exchange request failed with the message: <message>
IAM token exchange request failed: <message>
The cluster could not be configured with the registry. Make sure that you have the Administrator role for Container Registry.

The infrastructure credentials that are set for the region and resource group are missing the appropriate infrastructure permissions.

The user's infrastructure permissions are most commonly stored as an API key for the region and resource group. More rarely, if you use a different IBM Cloud account type, you might have set infrastructure credentials manually.

If the credentials were changed manually with the ibmcloud oc credential set command, the region and resource group values might have changed, resulting in a mismatch. Credentials and the resource group API key are specific to both the region and the resource group that are targeted when the command is run. However, the ibmcloud oc credential and ibmcloud oc api-key commands accept an input only for the region value; the resource group is targeted separately with the ibmcloud target command before you change the credentials. If no resource group is targeted, the default resource group is applied. If the targeted resource group is not the same as the one that the cluster is deployed in, the new credentials do not apply to the cluster. In this case, the credentials that apply to the cluster might be different from what you expect.
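
For example, to make sure that new credentials apply to the resource group that the cluster is in, target that resource group before you set the credentials. The following is a minimal sketch with placeholder values that you replace with your own.

  ibmcloud target -g <resource_group>
  ibmcloud oc credential set classic --infrastructure-username <infrastructure_API_username> --infrastructure-api-key <infrastructure_API_authentication_key> --region <region>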

This error can also occur if you created a cluster with a linked IBM Cloud infrastructure account and then later added separate credentials. IBM Cloud infrastructure accounts that are linked to an IBM Cloud account do not require credentials to create clusters. However, if separate credentials were later added to or removed from the cluster, with either the ibmcloud oc credential set or ibmcloud oc credential unset commands, then those credentials might not match the specifications for the linked account. This can result in the credentials being unrecognized.
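
If your account is linked and you want to return to the default linked credentials, you can remove the manually set credentials for the region, as in the following sketch with a placeholder region value.

  ibmcloud oc credential unset --region <region>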

The account owner must set up the infrastructure account credentials properly. The credentials depend on what type of infrastructure account you are using.

Before you begin: Log in to your account. If applicable, target the appropriate resource group. Set the context for your cluster.
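
For example, the setup might look like the following sketch with placeholder values. Depending on your account, the ibmcloud login command might also require the --sso option.

  ibmcloud login
  ibmcloud target -g <resource_group>
  ibmcloud oc cluster config --cluster <cluster_name_or_ID>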

  1. Identify what user credentials are used for the region and resource group's infrastructure permissions.

    1. Check the API key for a region and resource group of the cluster.
      ibmcloud oc api-key info --cluster <cluster_name_or_ID>
      
      Example output
      Getting information about the API key owner for cluster <cluster_name>...
      OK
      Name                Email
      <user_name>         <name@email.com>
      
    2. Check if the classic infrastructure account for the region and resource group is manually set to use a different IBM Cloud infrastructure account.
      ibmcloud oc credential get --region <region>
      
      Example output if credentials are set to use a different classic account. In this case, the user's infrastructure credentials are used for the region and resource group that you targeted, even if a different user's credentials are stored in the API key that you retrieved in the previous step.
      OK
      Infrastructure credentials for user name <1234567_name@email.com> set for resource group <resource_group_name>.
      
      Example output if credentials are not set to use a different classic account. In this case, the API key owner that you retrieved in the previous step has the infrastructure credentials that are used for the region and resource group.
      FAILED
      No credentials set for resource group <resource_group_name>.: The user credentials could not be found. (E0051)
      
  2. Validate the infrastructure permissions that the user has.

    1. List the suggested and required infrastructure permissions for the region and resource group.

      ibmcloud oc infra-permissions get --region <region>
      

      For console and CLI commands to assign these permissions, see Classic infrastructure roles.

    2. Make sure that the infrastructure credentials owner for the API key or the manually set account has the correct permissions. You can change the API key or manually set infrastructure credentials owner for the region and resource group.
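
      For example, to make the user that you are logged in as the API key owner for the region, you can reset the API key, assuming that this user has the required infrastructure permissions. A sketch with a placeholder region value:

      ibmcloud oc api-key reset --region <region>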

  3. Try again to perform the infrastructure operation, such as deleting the cluster or worker node. If you still run into the permissions or credentials error, review these additional troubleshooting pages.

    1. If the worker node is not removed, review the State and Status fields and the common issues with worker nodes to continue debugging.
    2. If you manually set credentials and still can't see the cluster's worker nodes in your infrastructure account, check whether the cluster is orphaned.
  4. If the issue persists, gather the following information to submit to IBM Cloud support. Save the outputs from each command. Make sure that you have the correct resource group targeted with the ibmcloud target -g <resource_group> command. A consolidated sketch after these steps saves each output to a file.

    1. API key info.

      ibmcloud oc api-key info --cluster <cluster_name_or_ID>
      
    2. Account details.

      ibmcloud target
      
    3. Credential details for the expected region and resource group.

      ibmcloud oc credential get --region <region>
      
    4. Infrastructure permissions details.

      ibmcloud oc infra-permissions get --region <region>
      
  5. [Open an issue with IBM Cloud support](/docs/openshift?topic=openshift-get-help). Be sure to include all the information and command outputs gathered in the previous step.
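
    The following sketch consolidates the commands from the previous step and saves each output to a file that you can attach to the support case. The file names are arbitrary examples.

    ibmcloud target -g <resource_group>
    ibmcloud oc api-key info --cluster <cluster_name_or_ID> > api-key-info.txt
    ibmcloud target > account-details.txt
    ibmcloud oc credential get --region <region> > credential-details.txt
    ibmcloud oc infra-permissions get --region <region> > infra-permissions.txt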

Unable to create or delete worker nodes due to incorrect account error

Classic infrastructure

You can't manage worker nodes for your cluster or view the cluster worker nodes in your classic IBM Cloud infrastructure account. However, you can update and manage other clusters in the account.

Further, you verified that you have the proper infrastructure credentials.

You might receive an error message in your worker node status similar to the following example.

incorrect account for worker - The 'classic' infrastructure user credentials changed and no longer match the worker node instance infrastructure account.

The cluster might be provisioned in a classic IBM Cloud infrastructure account that is no longer linked to your Red Hat OpenShift on IBM Cloud account. The cluster is orphaned. Because the resources are in a different account, you don't have the infrastructure credentials to modify the resources.

Consider the following example scenario to understand how clusters might become orphaned.

  1. You have an IBM Cloud Pay-As-You-Go account.
  2. You create a cluster named Cluster1. The worker nodes and other infrastructure resources are provisioned into the infrastructure account that comes with your Pay-As-You-Go account.
  3. Later, you find out that your team uses a legacy or shared classic IBM Cloud infrastructure account. You use the ibmcloud oc credential set command to change the IBM Cloud infrastructure credentials to use your team account.
  4. You create another cluster named Cluster2. The worker nodes and other infrastructure resources are provisioned into the team infrastructure account.
  5. You notice that Cluster1 needs a worker node update, a worker node reload, or you just want to clean it up by deleting it. However, because Cluster1 was provisioned into a different infrastructure account, you can't modify its infrastructure resources. Cluster1 is orphaned.
  6. You follow the resolution steps in the following section, but don't set your infrastructure credentials back to your team account. You can delete Cluster1, but now Cluster2 is orphaned.
  7. You change your infrastructure credentials back to the team account that created Cluster2. Now, you no longer have an orphaned cluster!

Follow the steps to review your infrastructure credentials and determine why you are seeing the credentials error.

  1. Log in to the Red Hat OpenShift clusters console.

  2. Access your Red Hat OpenShift cluster.

  3. Check which infrastructure account the region that your cluster is in currently uses to provision clusters. Replace <region> with the IBM Cloud region that the cluster is in.

    ibmcloud oc credential get --region <region>
    

    If you see a message similar to the following, then the account uses the default, linked infrastructure account.

    No credentials set for resource group <resource group>.: The user credentials could not be found.
    
  4. Check which infrastructure account was used to provision the cluster.

    1. In the Worker Nodes tab, select a worker node and note its ID.
    2. Open the menu and click Classic Infrastructure.
    3. From the infrastructure navigation pane, click Devices > Device List.
    4. Search for the worker node ID that you previously noted.
    5. If you don't find the worker node ID, the worker node is not provisioned into this infrastructure account. Switch to a different infrastructure account and try again.
  5. Compare the infrastructure accounts.

    • If the worker nodes are in the linked infrastructure account: Use the ibmcloud oc credential unset command to resume using the default infrastructure credentials that are linked with your Pay-As-You-Go account.

    • If the worker nodes are in a different infrastructure account: Use the ibmcloud oc credential set command to change your infrastructure credentials to the account that the cluster worker nodes are provisioned in, which you found in the previous step. Example commands for both options follow these steps.

      If you no longer have access to the infrastructure credentials, you can open an IBM Cloud support case to determine an email address for the administrator of the other infrastructure account. However, IBM Cloud Support can't remove the orphaned cluster for you, and you must contact the administrator of the other account to get the infrastructure credentials.

    • If the infrastructure accounts match: Check the rest of the worker nodes in the cluster to see whether any are assigned to a different infrastructure account. Make sure that you checked the worker nodes in the cluster that have the credentials issue. Review other common infrastructure credential issues.

  6. Now that the infrastructure credentials are updated, retry the blocked action, such as updating or deleting a worker node, and verify that the action succeeds.

  7. If you have other clusters in the same region and resource group that require the previous infrastructure credentials, repeat Step 5 to reset the infrastructure credentials to the previous account. Note that if you created clusters with a different infrastructure account than the account that you switch to, you might orphan those clusters.

    Tired of switching infrastructure accounts each time you need to perform a cluster or worker action? Consider re-creating all the clusters in the region and resource group in the same infrastructure account. Then, migrate your workloads and remove the old clusters from the different infrastructure account.
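
For reference, the credential commands in step 5 look like the following sketch, where the placeholder values are examples that you replace with your own account details.

  # Resume the default infrastructure credentials that are linked to your account.
  ibmcloud oc credential unset --region <region>

  # Or, set the credentials to the infrastructure account that owns the worker nodes.
  ibmcloud oc credential set classic --infrastructure-username <infrastructure_API_username> --infrastructure-api-key <infrastructure_API_authentication_key> --region <region>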

Unable to create or delete worker nodes due to endpoints error

You can't manage worker nodes for your cluster, and you receive an error message similar to one of the following.

Worker deploy failed due to network communications failing to master or registry endpoints. Please verify your network setup is allowing traffic from this subnet then attempt a worker replace on this worker
Pending endpoint gateway creation

Worker nodes can communicate with the Kubernetes master through the cluster's virtual private endpoint (VPE).

One VPE gateway resource is created per cluster in your VPC. If the VPE gateway for your cluster is not correctly created in your VPC, the VPE gateway is deleted from your VPC, or the IP address that is reserved for the VPE is deleted from your VPC subnet, worker nodes lose connectivity with the Kubernetes master.
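
To check from the CLI instead of the console, you can list the endpoint gateways in your VPC. The following is a sketch; it assumes that the VPC infrastructure plug-in (ibmcloud is commands) is installed and that you are targeting the region that the cluster is in.

  ibmcloud is endpoint-gateways
  # In the output, look for a gateway with a name in the format iks-<cluster_ID>.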

Re-establish the VPE connection between your worker nodes and Kubernetes master.

  1. To check the VPE gateway for your cluster in the VPC infrastructure console, open the Virtual private endpoint gateways for VPC dashboard and look for the VPE gateway in the format iks-<cluster_ID>.

  2. Refresh the cluster master. If the VPE gateway does not exist in your VPC, it is created, and connectivity to the reserved IP addresses on the subnets that your worker nodes are connected to is re-established. After you refresh the cluster, wait a few minutes to allow the operation to complete.

    ibmcloud oc cluster master refresh -c <cluster_name_or_ID>
    
  3. Verify that the VPE gateway for your cluster is created by opening the Virtual private endpoint gateways for VPC dashboard and looking for the VPE gateway in the format iks-<cluster_ID>.

  4. If you still can't manage worker nodes after the cluster master is refreshed, replace the worker nodes that you can't access.

    1. List all worker nodes in your cluster and note the name of the worker node that you want to replace.

      oc get nodes
      

      The name that is returned by this command is the private IP address that is assigned to your worker node. You can find more information about your worker node by running the ibmcloud oc worker ls --cluster <cluster_name_or_ID> command and looking for the worker node with the same Private IP address. A sketch that filters the worker list this way follows these steps.

    2. Replace the worker node. As part of the replace process, the pods that run on the worker node are drained and rescheduled onto remaining worker nodes in the cluster. The worker node is also cordoned, or marked as unavailable for future pod scheduling. Use the worker node ID that is returned from the ibmcloud oc worker ls --cluster <cluster_name_or_ID> command.

      ibmcloud oc worker replace --cluster <cluster_name_or_ID> --worker <worker_node_ID>
      
    3. Verify that the worker node is replaced.

      ibmcloud oc worker ls --cluster <cluster_name_or_ID>
      
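For example, to map the node name that oc get nodes returns (the private IP address) to its worker node ID, you can filter the worker list, as in the following sketch with placeholder values.

  ibmcloud oc worker ls --cluster <cluster_name_or_ID> | grep <private_IP>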

Unable to create or delete worker nodes due to paid account or one-time password error

Classic infrastructure

You can't manage worker nodes for your cluster, and you receive an error message similar to one of the following examples.

Unable to connect to the IBM Cloud account. Ensure that you have a paid account.
can't authenticate the infrastructure user: Time-based One Time Password authentication is required to log in with this user.

Your IBM Cloud account uses its own automatically linked infrastructure through a Pay-As-You-Go account.

However, the account administrator enabled the time-based one-time passcode (TOTP) option, so users are prompted for a one-time passcode at login. This type of multifactor authentication (MFA) is account-based and affects all access to the account, including the access that IBM Cloud Kubernetes Service requires to make calls to IBM Cloud infrastructure. If TOTP is enabled for the account, you can't create and manage clusters and worker nodes in IBM Cloud Kubernetes Service.

The IBM Cloud account owner or an account administrator must take one of the following actions.

  • Disable TOTP for the account, and continue to use the automatically linked infrastructure credentials for IBM Cloud Kubernetes Service.
  • Continue to use TOTP, but create an infrastructure API key that IBM Cloud Kubernetes Service can use to make direct calls to the IBM Cloud infrastructure API.

Disabling TOTP MFA for the account

  1. Log in to the IBM Cloud console. From the menu bar, select Manage > Access (IAM).
  2. Click the Settings page.
  3. Under Multifactor authentication, click Edit.
  4. Select None, and click Update.

Using TOTP MFA to create an infrastructure API key for IBM Cloud Kubernetes Service

  1. From the IBM Cloud console, select Manage > Access (IAM) > Users and click the name of the account owner. Note: If you don't use the account owner's credentials, ensure that the user whose credentials you use has the correct permissions.

  2. In the API Keys section, find or create a classic infrastructure API key.

  3. Use the infrastructure API key to set the infrastructure API credentials for IBM Cloud Kubernetes Service. Repeat this command for each region where you create clusters.

    ibmcloud oc credential set classic --infrastructure-username <infrastructure_API_username> --infrastructure-api-key <infrastructure_API_authentication_key> --region <region>
    
  4. Verify that the correct credentials are set.

    ibmcloud oc credential get --region <region>
    

    Example output

    Infrastructure credentials for user name user@email.com set for resource group default.
    
  5. To ensure that existing clusters use the updated infrastructure API credentials, run the following command in each region where you have clusters.

    ibmcloud oc api-key reset --region <region>
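
    If you have clusters in multiple regions, you can loop over the regions in your shell, as in the following sketch, where us-south and eu-de are example regions that you replace with your own.

    # Reset the API key in each region; the command might prompt for confirmation.
    for region in us-south eu-de; do
      ibmcloud oc api-key reset --region "$region"
    done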