Preparing your cluster for autoscaling

Virtual Private Cloud, Classic infrastructure

With the cluster-autoscaler add-on, you can scale the worker pools in your IBM Cloud® Kubernetes Service classic or VPC cluster automatically to increase or decrease the number of worker nodes in the worker pool based on the sizing needs of your scheduled workloads. The cluster-autoscaler add-on is based on the Kubernetes Cluster-Autoscaler project. For a list of supported add-on versions by cluster version, see Supported cluster add-on versions.

You can't enable the cluster autoscaler on worker pools that use reservations.

Understanding autoscaling

The cluster autoscaler periodically scans the cluster to adjust the number of worker nodes within the worker pools that it manages in response to your workload resource requests and any custom settings that you configure, such as scanning intervals.
Every minute, the cluster autoscaler checks for the following situations.
- Pending pods to scale up: A pod is considered pending when insufficient compute resources exist to schedule the pod on a worker node. When the cluster autoscaler detects pending pods, the autoscaler scales up your worker nodes evenly across zones to meet the workload resource requests.
- Underutilized worker nodes to scale down: By default, a worker node is considered underutilized when the compute resources that are requested on it stay below 50% of its total compute resources for 10 minutes or more and its workloads can be rescheduled onto other worker nodes. If the cluster autoscaler detects underutilized worker nodes, it scales down your worker nodes one at a time so that you have only the compute resources that you need. If you want, you can customize the default scale-down utilization threshold of 50% and the default duration of 10 minutes.
Scanning and scaling happen at regular intervals. Depending on the number of worker nodes, a scale-up or scale-down might take a longer time to complete, such as 30 minutes.
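The scale-down check can be sketched in Python as follows. This is a simplified illustration and not the autoscaler's implementation: the `Node` class, its field names, and the sample numbers are hypothetical, the rescheduling check is reduced to a boolean flag, and taking the larger of the CPU and memory request ratios as the node's utilization is an assumption.

```python
from dataclasses import dataclass

# Defaults described in the text: below 50% for 10 minutes or more.
SCALE_DOWN_UTILIZATION_THRESHOLD = 0.5
SCALE_DOWN_UNNEEDED_MINUTES = 10


@dataclass
class Node:
    """Hypothetical, simplified view of a worker node."""
    name: str
    allocatable_cpu_m: int        # allocatable CPU in millicores
    requested_cpu_m: int          # sum of pod CPU requests on the node
    allocatable_mem_mi: int       # allocatable memory in Mi
    requested_mem_mi: int         # sum of pod memory requests on the node
    minutes_below_threshold: int  # how long the node has been below the threshold
    pods_can_move: bool           # whether its pods fit on other worker nodes


def utilization(node: Node) -> float:
    # Based on resource requests, not on actual usage.
    cpu = node.requested_cpu_m / node.allocatable_cpu_m
    mem = node.requested_mem_mi / node.allocatable_mem_mi
    return max(cpu, mem)  # assumption: the larger ratio counts


def is_underutilized(node: Node) -> bool:
    return (
        utilization(node) < SCALE_DOWN_UTILIZATION_THRESHOLD
        and node.minutes_below_threshold >= SCALE_DOWN_UNNEEDED_MINUTES
        and node.pods_can_move
    )


nodes = [
    Node("node-a", 4000, 1200, 16384, 4096, 12, True),  # 30% CPU, 12 minutes
    Node("node-b", 4000, 3000, 16384, 4096, 12, True),  # 75% CPU: too busy
    Node("node-c", 4000, 1200, 16384, 4096, 4, True),   # low, but only 4 minutes
]
candidates = [n.name for n in nodes if is_underutilized(n)]
print(candidates)
```

In this sketch, only `node-a` meets all three conditions and becomes a scale-down candidate.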
The cluster autoscaler adjusts the number of worker nodes by considering the resource requests that you define for your deployments, not actual worker node usage. If your pods and deployments don't request appropriate amounts of resources, you must adjust their configuration files.
The cluster autoscaler can't adjust them for you. Also, keep in mind that worker nodes use some compute resources for basic cluster functionality, default and custom add-ons, and resource reserves.
In general, the cluster autoscaler calculates the number of worker nodes that your cluster needs to run its workload. Scaling the cluster up or down depends on many factors, including the following.
- The minimum and maximum number of worker nodes per zone that you set.
- Your pending pod resource requests and certain metadata that you associate with the workload, such as anti-affinity, labels to place pods only on certain flavors, or pod disruption budgets.
- The worker pools that the cluster autoscaler manages, potentially across zones in a multizone cluster.
Earlier versions of the cluster autoscaler relied only on existing worker nodes for scheduling simulations. For example, if a worker pool scaled down to 0 nodes, the autoscaler had no information about that pool’s capacity or labels, which meant that it could not scale the worker pool back up. As a result, scale-to-zero was not supported.
Beginning with version 2.0.0, the autoscaler creates a template node for every worker pool. This template is used to provide a model of the allocatable CPU, memory, labels, and taints of a new node in the pool.
Additionally, in version 2.0.0, two new optional settings are available in the iks-ca-configmap in the kube-system namespace: OSReservedMemoryGi and OSReservedCPUMili.
- These values represent the amount of CPU and memory that the operating system reserves on each worker node.
- You can't use these values to adjust the resources that are reserved for the kernel, because those reservations are already defined.
- The autoscaler subtracts these values from the node’s capacity when computing allocatable resources for scheduling simulations.
- By default, the autoscaler uses the recommended OS-reserved values, but users can override them to adjust how much capacity can be scheduled.
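As a rough illustration of the subtraction described above, the following Python sketch computes the schedulable resources of a template node. The function name, the sample capacities, and the sample reservation values are hypothetical, and the units (millicores for `OSReservedCPUMili`, Gi for `OSReservedMemoryGi`) are inferred from the setting names. Other reservations that a real worker node carries are ignored here.

```python
def template_allocatable(capacity_cpu_m, capacity_mem_gi,
                         os_reserved_cpu_m, os_reserved_mem_gi):
    """Subtract the OS-reserved amounts from the node capacity.

    Returns (allocatable CPU in millicores, allocatable memory in Gi).
    """
    alloc_cpu_m = max(0, capacity_cpu_m - os_reserved_cpu_m)
    alloc_mem_gi = max(0.0, capacity_mem_gi - os_reserved_mem_gi)
    return alloc_cpu_m, alloc_mem_gi


# Hypothetical flavor with 4 vCPU and 16 Gi of memory.
cpu_m, mem_gi = template_allocatable(4000, 16.0, 100, 1.5)
print(cpu_m, mem_gi)

# A larger reservation leaves less capacity for scheduling simulations.
cpu_m_high, mem_gi_high = template_allocatable(4000, 16.0, 500, 4.0)
print(cpu_m_high, mem_gi_high)
```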

For more information, see the Kubernetes Cluster Autoscaler FAQ for How does scale-up work? and How does scale-down work?.

What are the best practices for autoscaling?

Make the most of the cluster autoscaler by using the following strategies for your worker nodes and workload deployments. For more information, see the Kubernetes Cluster Autoscaler FAQ.
Try out the cluster autoscaler with a few test workloads to get a good feel for how scale-up and scale-down work, what custom values you might want to configure, and any other aspects that you might want, like overprovisioning worker nodes or limiting apps.
Then, clean up your test environment and plan to include these custom values and additional settings with a fresh installation of the cluster autoscaler.

Can I change how scale-up and scale-down work?

Yes, you can customize settings or use other Kubernetes resources to affect how scaling up and down work.

For scale-up, you can customize the cluster autoscaler ConfigMap values such as scanInterval, expander, skipNodes, or maxNodeProvisionTime. Review ways to overprovision worker nodes so that you can scale up worker nodes before a worker pool runs out of resources. You can also set up Kubernetes pod disruption budgets and pod priority cutoffs to affect how scaling up works.
For scale-down, you can customize the cluster autoscaler ConfigMap values such as scaleDownUnneededTime, scaleDownDelayAfterAdd, scaleDownDelayAfterDelete, or scaleDownUtilizationThreshold.

Can I increase the minimum size per zone to trigger a scale-up of my cluster to that size?

No, setting a minSize does not automatically trigger a scale-up. The minSize is a threshold so that the cluster autoscaler does not scale to fewer than a certain number of worker nodes per zone.

If your cluster does not yet have that number per zone, the cluster autoscaler does not scale up until you have workload resource requests that require more resources. For example, if you have a worker pool with one worker node in each of three zones (three worker nodes total) and set the minSize to 4 per zone, the cluster autoscaler does not immediately provision an additional three worker nodes per zone to reach 12 worker nodes total. Instead, the scale-up is triggered by resource requests.
If you create a workload that requests the resources of 15 worker nodes, the cluster autoscaler scales up the worker pool to meet this request. From then on, the minSize means that the cluster autoscaler does not scale down to fewer than four worker nodes per zone, even if you remove the workload that requested those resources.
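The example above can be worked through with a small Python sketch. It is a simplified model and not the autoscaler's algorithm: the function name and the maxSize of 6 are hypothetical, and the workload is expressed directly as a number of worker nodes spread evenly across zones.

```python
import math


def target_per_zone(required_total, zones, min_size, max_size, current_per_zone):
    """Return the per-zone worker node count after one scaling decision."""
    needed = math.ceil(required_total / zones)
    if needed > current_per_zone:
        # Scale-up is driven by resource requests, capped at maxSize.
        return min(needed, max_size)
    # Scale-down stops at minSize, but minSize alone never adds nodes.
    return max(needed, min(min_size, current_per_zone))


zones, min_size, max_size = 3, 4, 6

# 1. One node per zone, minSize raised to 4, workload still fits: no scale-up.
step1 = target_per_zone(3, zones, min_size, max_size, current_per_zone=1)

# 2. A workload that needs 15 worker nodes arrives: scale up to 5 per zone.
step2 = target_per_zone(15, zones, min_size, max_size, current_per_zone=step1)

# 3. The workload is removed: scale down, but not below 4 per zone.
step3 = target_per_zone(0, zones, min_size, max_size, current_per_zone=step2)

print(step1, step2, step3)
print(step3 * zones, "worker nodes remain in total")
```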

How is this behavior different from worker pools that are not managed by the cluster autoscaler?

When you create a worker pool, you specify how many worker nodes per zone it has. The worker pool maintains that number of worker nodes until you resize or rebalance it. The worker pool does not add or remove worker nodes for you. If you have more pods than can be scheduled, the pods remain in pending state until you resize the worker pool. When you enable the cluster autoscaler for a worker pool, worker nodes are scaled up or down in response to your pod spec settings and resource requests. You don't need to resize or rebalance the worker pool manually.

How does GPU autoscaling work?

Autoscaling GPU worker nodes is supported only with cluster autoscaler version 1.2.4 and later and only with NVIDIA GPU worker node flavors. Scale-up happens when a pod goes into pending state because of insufficient GPU resources. The autoscaler then scales up the cluster by adding more nodes. Scale-down happens when the utilization of a node goes below the configured scaleDownGPUUtilizationThreshold. When this happens, the node is considered for scale-down.

Can I autoscale multiple worker pools at once?

Yes, after you install the cluster autoscaler, you can choose which worker pools within the cluster to autoscale in the ConfigMap. You can run only one autoscaler per cluster. Create and enable autoscaling on worker pools other than the default worker pool, because the default worker pool has system components that can prevent automatically scaling down.

How can I make sure that the cluster autoscaler responds to what resources my app needs?

The cluster autoscaler scales your cluster in response to your workload resource requests. As such, specify resource requests for all your deployments because the resource requests are what the cluster autoscaler uses to calculate how many worker nodes are needed to run the workload. Keep in mind that autoscaling is based on the compute usage that your workload configurations request, and does not consider other factors such as machine costs.
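To illustrate why resource requests matter, the following Python sketch estimates a lower bound on the number of worker nodes from the requests alone. The function name, the pod sizes, and the per-node allocatable values are hypothetical, and the estimate ignores how pods actually pack onto nodes, as well as affinity rules and taints.

```python
import math


def min_nodes_for_requests(pod_requests, node_alloc_cpu_m, node_alloc_mem_mi):
    """Lower bound on worker nodes, from summed (cpu_m, mem_mi) requests."""
    total_cpu_m = sum(cpu for cpu, _ in pod_requests)
    total_mem_mi = sum(mem for _, mem in pod_requests)
    return max(
        math.ceil(total_cpu_m / node_alloc_cpu_m),
        math.ceil(total_mem_mi / node_alloc_mem_mi),
    )


# Ten replicas that each request 500m CPU and 1024 Mi of memory.
with_requests = [(500, 1024)] * 10
# The same ten replicas with no requests set: they count as zero.
without_requests = [(0, 0)] * 10

estimate = min_nodes_for_requests(with_requests, 3900, 14336)
estimate_missing = min_nodes_for_requests(without_requests, 3900, 14336)
print(estimate, estimate_missing)
```

In this sketch, the pods without requests contribute nothing to the estimate, which mirrors why the cluster autoscaler cannot size the cluster for workloads that omit resource requests.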

Can I scale down a worker pool to zero (0) nodes?

Yes! Starting with Cluster Autoscaler add-on version 2.0.0, you can scale specific worker pools down to zero nodes.

Why is this useful?

Scaling to zero helps save costs when no workloads are running. The autoscaler automatically brings nodes back when needed.

How does it work?

When there are no pods to run, the autoscaler can reduce the worker pool to 0 nodes. If new pods need resources, the autoscaler automatically scales the pool back up.

What do you need to do?

Check your add-on version and ensure that you are using v2.0.0 or later.

Set minSize = 0 for the worker pool in your autoscaler configuration. If you have public ALBs enabled, set minSize = 2 per zone for high availability.

What is the cluster quorum requirement?

Note that the entire cluster cannot scale down to zero. A minimum number of nodes must remain active to keep the cluster healthy and maintain etcd quorum. If this quorum is satisfied, you can scale down other worker pools to zero.

Can I optimize my deployments for autoscaling?

Yes, you can add several Kubernetes features to your deployment to adjust how the cluster autoscaler considers your resource requests for scaling.

- Taint your worker pool to allow only the deployments or pods with the matching toleration to be deployed to your worker pool.
- Add a label to your worker pool other than the default worker pool. This label is used in your deployment configuration to specify nodeAffinity or nodeSelector, which limits the workloads that can be deployed on the worker nodes in the labeled worker pool.
- Use pod disruption budgets to prevent abrupt rescheduling or deletions of your pods.
- If you're using pod priority, you can edit the priority cutoff to change what types of priority trigger scaling up. By default, the priority cutoff is zero (0).

Can I use taints and tolerations with autoscaled worker pools?

Yes, but make sure to apply taints at the worker pool level so that all existing and future worker nodes get the same taint. Then, you must include a matching toleration in your workload configuration so that these workloads are scheduled onto your autoscaled worker pool with the matching taint. Keep in mind that if you deploy a workload that is not tolerated by the tainted worker pool, the worker nodes in that pool are not considered for scale-up, and more worker nodes might be ordered even if the cluster has sufficient capacity. However, the worker nodes in the tainted worker pool are still identified as underutilized if less than the threshold (by default 50%) of their resources is utilized, and thus they are considered for scale-down.

Preparing clusters for autoscaling

Before you install the IBM Cloud cluster autoscaler add-on, prepare your cluster for autoscaling.

The cluster autoscaler add-on is not supported for bare metal worker nodes.

Before you begin, install the required CLI and plug-ins.
- IBM Cloud CLI (ibmcloud)
- IBM Cloud Kubernetes Service plug-in (ibmcloud ks)
- IBM Cloud Container Registry plug-in (ibmcloud cr)
- Kubernetes (kubectl)
Create a standard cluster.
Log in to your account. If applicable, target the appropriate resource group. Set the context for your cluster.
Confirm that your IBM Cloud Identity and Access Management credentials are stored in the cluster. The cluster autoscaler uses this secret to authenticate credentials. If the secret is missing, create it by resetting credentials.
```
kubectl get secrets -n kube-system | grep storage-secret-store
```
Plan to autoscale a worker pool other than the default worker pool, because the default worker pool has system components that can prevent automatically scaling down. Include a label for the worker pool so that you can set node affinity for the workloads that you want to deploy to the worker pool that has autoscaling enabled. For example, your label might be app: nginx. Choose from the following options:
- Create a VPC or classic worker pool other than the default worker pool with the label that you want to use with the workloads to run on the autoscaled worker pool.
- Add the label to an existing worker pool other than the default worker pool.
Confirm that your worker pool has the necessary labels for autoscaling. In the output, you see the required ibm-cloud.kubernetes.io/worker-pool-id label and the label that you previously created for node affinity. If you don't see these labels, add a worker pool, then add your label for node affinity.
```
ibmcloud ks worker-pool get --cluster <cluster_name_or_ID> --worker-pool <worker_pool_name_or_ID> | grep Labels
```
Example output of a worker pool with the label.
```
Labels:             ibm-cloud.kubernetes.io/worker-pool-id=a1aa111111b22b22cc3c3cc444444d44-4d555e5
```
Taint the worker pools that you want to autoscale so that the worker pool does not accept workloads except the ones that you want to run on the autoscaled worker pool. You can learn more about taints and tolerations in the community Kubernetes documentation. As an example, you might set a taint of use=autoscale:NoExecute. In this example, the NoExecute taint evicts pods that don't have the toleration corresponding to this taint.
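The effect of the example taint can be sketched in Python. This is a simplified model of the Kubernetes matching rules for taints and tolerations, not code from the autoscaler or from Kubernetes itself. The function names and the sample tolerations are hypothetical, and the optional tolerationSeconds field is ignored.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if one toleration matches one taint."""
    # An empty or missing effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # Exists ignores the value; an empty key matches every key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def is_evicted(pod_tolerations: list, taint: dict) -> bool:
    """A NoExecute taint evicts pods that have no matching toleration."""
    if taint["effect"] != "NoExecute":
        return False
    return not any(tolerates(t, taint) for t in pod_tolerations)


# The example taint from the text: use=autoscale:NoExecute
taint = {"key": "use", "value": "autoscale", "effect": "NoExecute"}

matching = [{"key": "use", "operator": "Equal",
             "value": "autoscale", "effect": "NoExecute"}]
wrong_value = [{"key": "use", "operator": "Equal",
                "value": "batch", "effect": "NoExecute"}]

print(is_evicted(matching, taint))     # pod stays on the tainted pool
print(is_evicted(wrong_value, taint))  # pod is evicted
print(is_evicted([], taint))           # pod without tolerations is evicted
```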

Next steps

After preparing your cluster, install the cluster autoscaler add-on.