Setting up horizontal pod autoscaling on GPU worker nodes
Review the following steps to enable horizontal pod autoscaling on your GPU worker nodes.
- Why horizontal pod autoscaling?
- You might want to configure horizontal pod autoscaling (HPA) to scale the number of pods when the workload consumes more or less than a certain amount of GPU. Because GPUs are an expensive resource, you might not want workloads to run at full capacity for long periods of time. Instead, you can scale pods up or down based on the workload that is running in the cluster.
Prerequisites
To configure HPA, the following components must be installed on your cluster.
- NVIDIA Data Center GPU Manager (DCGM) exporter to gather GPU metrics in Kubernetes. The DCGM exporter exposes GPU metrics for Prometheus, which can be visualized by using Grafana. You can verify that the exporter is running by using the commands after this list.
- Prometheus and the Prometheus adapter to generate custom metrics.
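Before you continue, you can confirm that the DCGM exporter is running. The following check is a minimal sketch; it assumes that the NVIDIA GPU Operator components are deployed in the `nvidia-gpu-operator` namespace, which is the same namespace that the Prometheus scrape configuration later in this topic targets.

```sh
# List the DCGM exporter pods and service that Prometheus scrapes for GPU metrics.
oc get pods -n nvidia-gpu-operator | grep dcgm-exporter
oc get svc -n nvidia-gpu-operator | grep dcgm-exporter
```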
- Install Prometheus.

  ```sh
  helm install prom-stack prometheus-community/kube-prometheus-stack -f ~/ca-prom-val.yaml
  ```

  The command uses the following `~/ca-prom-val.yaml` values file (shown with `cat ~/ca-prom-val.yaml`), which adds a scrape job for the GPU metrics that the DCGM exporter exposes.

  ```yaml
  prometheus:
    prometheusSpec:
      additionalScrapeConfigs:
      - job_name: gpu-metrics
        scrape_interval: 1s
        metrics_path: /metrics
        scheme: http
        kubernetes_sd_configs:
        - role: endpoints
          namespaces:
            names:
            - nvidia-gpu-operator
        relabel_configs:
        - source_labels: [__meta_kubernetes_endpoints_name]
          action: drop
          regex: .*-node-feature-discovery-master
        - source_labels: [__meta_kubernetes_pod_node_name]
          action: replace
          target_label: kubernetes_node
  ```
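  If the `prometheus-community` Helm repository is not already configured on the workstation where you run `helm`, add it before you run the install command. The following sketch uses the public prometheus-community chart repository URL.

  ```sh
  # Add the prometheus-community chart repository and refresh the local index.
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  helm repo update
  ```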
- Get the Prometheus service details.

  ```sh
  oc get svc
  ```
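  To confirm that the `gpu-metrics` scrape job is active, you can optionally port forward the Prometheus service and review its targets page. This is a sketch that assumes the chart created a service named `prom-stack-kube-prometheus-prometheus`, the same service name that the Prometheus adapter URL in the next step references.

  ```sh
  # Forward the Prometheus UI locally, then open http://localhost:9090/targets
  # and check that the gpu-metrics job is listed and healthy.
  oc port-forward svc/prom-stack-kube-prometheus-prometheus 9090:9090
  ```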
- Install the Prometheus adapter.

  ```sh
  helm upgrade --install prometheus-adapter prometheus-community/prometheus-adapter --set prometheus.url="http://prom-stack-kube-prometheus-prometheus.default.svc.cluster.local"
  ```
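  After the adapter pod is running, you can check whether the DCGM metrics are discoverable through the custom metrics API. The following query is a sketch; the exact metric names that are exposed depend on the adapter's discovery rules and on the metrics that the DCGM exporter publishes.

  ```sh
  # List the custom metrics that the Prometheus adapter serves and
  # filter for the DCGM GPU utilization metric that the HPA uses later.
  oc get --raw /apis/custom.metrics.k8s.io/v1beta1 | grep -i dcgm_fi_dev_gpu_util
  ```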
Setting up HPA
Complete the following steps to create a deployment that uses HPA.
- Create a deployment.

  ```yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: cuda-test
    labels:
      app: cuda-test
  spec:
    selector:
      matchLabels:
        app: cuda-test
    template:
      metadata:
        labels:
          app: cuda-test
      spec:
        containers:
        - name: cuda-test-main
          image: "k8s.gcr.io/cuda-vector-add:v0.1"
          command: ["bash", "-c", "for (( c=1; c<=5000; c++ )); do ./vectorAdd; done"]
          resources:
            limits:
              nvidia.com/gpu: 1
  ```
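  You can save the manifest to a file and apply it. The file name `cuda-test-deployment.yaml` is only an example.

  ```sh
  # Create the deployment and confirm that the first replica is scheduled on a GPU node.
  oc apply -f cuda-test-deployment.yaml
  oc get deployment cuda-test
  ```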
- Create a `HorizontalPodAutoscaler` resource.

  ```yaml
  kind: HorizontalPodAutoscaler
  apiVersion: autoscaling/v2
  metadata:
    name: cuda-hpa
    namespace: default
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: cuda-test
    minReplicas: 1
    maxReplicas: 3
    metrics:
    - type: Pods
      pods:
        metric:
          name: DCGM_FI_DEV_GPU_UTIL # the metric that you want to use for autoscaling
        target:
          type: AverageValue
          averageValue: '5'
  ```
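  Apply the resource and check that the autoscaler picks up the GPU utilization metric. The file name `cuda-hpa.yaml` is only an example.

  ```sh
  # Create the HorizontalPodAutoscaler and check its current metric value and replica count.
  oc apply -f cuda-hpa.yaml
  oc get hpa cuda-hpa
  ```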
- Run the following commands to review the results.

  ```sh
  oc get pods | grep cuda
  ```

  ```
  cuda-test-d987464bf-brd48   1/1   Running   0   4m19s
  cuda-test-d987464bf-gsx82   0/1   Pending   0   4m19s
  cuda-test-d987464bf-zstzs   1/1   Running   0   7m35s
  ```

  The deployment started with 1 replica and was scaled up to 3 as the GPU workload increased.

  ```sh
  oc describe hpa cuda-hpa
  ```

  ```
  Min replicas:     1
  Max replicas:     3
  Deployment pods:  3 current / 3 desired
  Events:
    Type    Reason             Age  From                       Message
    ----    ------             ---  ----                       -------
    Normal  SuccessfulRescale  50s  horizontal-pod-autoscaler  New size: 3; reason: pods metric DCGM_FI_DEV_GPU_UTIL above target
  ```
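When the vector-add loops finish and the average GPU utilization drops below the target, the autoscaler scales the deployment back down after its stabilization window. You can watch this happen with the following command.

```sh
# Watch the HPA until the replica count returns to the minimum.
oc get hpa cuda-hpa --watch
```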