Managing GPUs and accelerators for VPC
The GPU-enabled family of profiles provides on-demand, cost-effective access to GPUs and accelerators. GPUs and accelerators reduce the processing time of compute-intensive workloads such as AI, machine learning, and inferencing. To use the GPUs and accelerators, install the appropriate driver and associated toolkit for your workloads.
Configuring a virtual server instance with an NVIDIA GPU
- Provision a virtual server instance by choosing an NVIDIA GPU profile in the Profile field. Stock and custom operating system images are supported.
- Install the NVIDIA GPU driver for your virtual server instance's image and GPU profile. The following tables list the minimum driver and CUDA software versions for Linux and Windows operating systems. For more information, see NVIDIA's Download drivers page. For an overview of drivers for NVIDIA data center products, see NVIDIA Data Center Drivers.
  NVIDIA drivers and CUDA version for Linux

  | GPU | Minimum NVIDIA driver | Minimum CUDA version |
  |---|---|---|
  | A100 | 550 | 12.4 |
  | L4 | 550 | 12.4 |
  | L40s | 550 | 12.4 |
  | V100 | 535 | 12.2 |
  | H100 | 550 | 12.4 |
  | H200 | 570 | 12.8 |
  | B300 (select availability) | 590 | 13.1 |

  NVIDIA drivers and CUDA version for Windows 2019, 2022, 2025

  | GPU | Minimum NVIDIA driver | Minimum CUDA version |
  |---|---|---|
  | A100 | 538 | 12.2 |
  | L4 | 538 | 12.2 |
  | L40s | 538 | 12.2 |
  | V100 | 535 | 12.2 |
  | H100 | N/A | N/A |
  | H200 | N/A | N/A |
  | B300 | N/A | N/A |

  NVIDIA drivers and CUDA version for Windows 2016

  | GPU | Minimum NVIDIA driver | Minimum CUDA version |
  |---|---|---|
  | A100 | 529 | 12.0 |
  | L4 | 529 | 12.0 |
  | L40s | N/A | N/A |
  | V100 | 535 | 12.0 |
  | H100 | N/A | N/A |
  | H200 | N/A | N/A |
  | B300 | N/A | N/A |

- Install the associated toolkit for your workload. See NVIDIA's CUDA toolkit downloads page.
For detailed instructions to complete Steps 2 and 3, other GPU tools, and examples, see How to Use V100-Based GPUs on IBM Cloud VPC.
For a Linux-focused guide on installing the NVIDIA drivers, see the NVIDIA Driver Installation Guide.
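After the driver is installed, it can help to confirm that its version meets the minimum listed for your GPU on Linux. The following is a minimal sketch, not an IBM or NVIDIA tool: the function names `min_linux_driver` and `driver_meets_minimum` are hypothetical, the version string `550.54.15` is only an illustrative value, and the comparison looks at the major branch number only.

```shell
#!/usr/bin/env bash
# Sketch: look up the minimum Linux driver branch for a GPU model and
# compare it with an installed driver version string.

# Print the minimum NVIDIA driver branch for a GPU model on Linux.
# Values are copied from the Linux table in this topic.
min_linux_driver() {
  case "$1" in
    A100|L4|L40s|H100) echo 550 ;;
    V100)              echo 535 ;;
    H200)              echo 570 ;;
    B300)              echo 590 ;;
    *) echo "unknown GPU model: $1" >&2; return 1 ;;
  esac
}

# Return success if the installed driver version is at or above the
# minimum branch for the given GPU model.
driver_meets_minimum() {
  local installed="$1" gpu="$2" minimum
  minimum="$(min_linux_driver "$gpu")" || return 2
  # Compare only the major branch number (the part before the first dot).
  [ "${installed%%.*}" -ge "$minimum" ]
}

# Example with a literal, illustrative version string, so that no GPU
# is needed to try the functions.
if driver_meets_minimum "550.54.15" "A100"; then
  echo "driver OK"
else
  echo "driver too old"
fi
```

On an instance with the driver installed, the installed version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed as the first argument to `driver_meets_minimum`.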
If you want to automate the installation of the drivers, you can use the User data section of the virtual server. By using the user data field, you can input a script that issues the commands to install the NVIDIA drivers.
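As a rough illustration of the user data approach, the following sketch writes a small installation script to a local file so that you can review it before pasting it into the User data field. Several details are assumptions and not taken from this topic: an Ubuntu image that processes user data with cloud-init, the apt package name pattern `nvidia-driver-<branch>`, the reboot at the end, and the file name `nvidia-user-data.sh`. Adjust the driver branch to the minimum version for your GPU.

```shell
#!/usr/bin/env bash
# Sketch: write a user data script to a local file for review.
# Assumptions (not from this topic): an Ubuntu image, the apt package
# name pattern nvidia-driver-<branch>, and the output file name.
DRIVER_BRANCH="${DRIVER_BRANCH:-550}"
OUT_FILE="${OUT_FILE:-nvidia-user-data.sh}"

# The heredoc is unquoted so that ${DRIVER_BRANCH} is filled in now.
cat > "$OUT_FILE" <<EOF
#!/bin/bash
# Intended to run as root at first boot (assumes cloud-init).
set -euo pipefail
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y nvidia-driver-${DRIVER_BRANCH}
# Assumed step: reboot so that the new kernel module is loaded.
reboot
EOF

# Check the generated script for syntax errors without running it.
bash -n "$OUT_FILE" && echo "wrote $OUT_FILE"
```

Running the sketch with `DRIVER_BRANCH=570` set in the environment would generate a script for the 570 driver branch instead. The generated file is only checked for syntax here; test it on a single instance before you rely on it for provisioning.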
Configuring a virtual server instance with an NVIDIA B300 GPU
NVIDIA HGX B300 accelerated virtual server profiles are available for select customers. Create a support case if you are interested in purchasing and using this offering.
When you use the B300 GPU profile, the guest operating system configuration must be updated. If the guest OS is not modified, you might see errors such as "NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:".
To configure the guest OS for the B300 GPU:
- These commands must be run as root. Sudo to root.
    sudo -i
- Edit the grub configuration file. The following example uses vi.
    vi /etc/default/grub.d/50-cloudimg-settings.cfg
- Add the following kernel parameters to the GRUB_CMDLINE_LINUX_DEFAULT line:
    pci=assign-busses pci=realloc pci=nocrs pci=big_root_window
  The updated line should look like this:
    # CLOUD_IMG: This file was created/modified by the Cloud Image build process
    GRUB_CMDLINE_LINUX_DEFAULT="nofb console=ttyS0 console=tty1 pci=assign-busses pci=realloc pci=nocrs pci=big_root_window"
- Update grub to apply the changes.
    update-grub2
- Restart the virtual server.
    reboot
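The manual edit in the preceding steps could also be scripted, for example when you prepare several B300 instances. The following is a minimal sketch under stated assumptions: `add_pci_params` is a hypothetical helper name, the configuration line is assumed to have the double-quoted form shown in this topic, and the example runs against a scratch file and not the real grub configuration.

```shell
#!/usr/bin/env bash
# Sketch: append the B300 PCI kernel parameters to a grub config file.
PCI_PARAMS="pci=assign-busses pci=realloc pci=nocrs pci=big_root_window"

# add_pci_params is a hypothetical helper name, not an IBM or NVIDIA tool.
add_pci_params() {
  local cfg="$1"
  # Do nothing if the parameters are already on the line (safe to re-run).
  if grep -Eq '^GRUB_CMDLINE_LINUX_DEFAULT=.*pci=big_root_window' "$cfg"; then
    return 0
  fi
  # Insert the parameters before the closing quote; keep a .bak copy.
  sed -i.bak -E \
    "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${PCI_PARAMS}\"|" "$cfg"
}

# Try it on a scratch copy instead of the real file.
scratch="$(mktemp)"
echo 'GRUB_CMDLINE_LINUX_DEFAULT="nofb console=ttyS0 console=tty1"' > "$scratch"
add_pci_params "$scratch"
cat "$scratch"
```

On the instance, as root, the same function would be called with `/etc/default/grub.d/50-cloudimg-settings.cfg`, followed by `update-grub2` and `reboot` as in the steps. Because the function checks for the parameters first, running it a second time leaves the file unchanged.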
Configuring a virtual server instance with an Intel Gaudi 3 AI Accelerator
- Provision a virtual server instance by choosing the Intel® Gaudi® 3 AI Accelerator instance profile in the Profile field. Stock and custom operating system images are supported.
- Install the Intel Gaudi 3 AI Accelerator software and drivers for your virtual server. To download the drivers, see the Intel Gaudi Driver and Software Installation page.
Integrating drivers into a custom image from volume
- Provision a virtual server instance with a GPU and install the drivers.
- Create an image from the boot volume of the virtual server instance that you provisioned from a stock image. For more information, see Creating an image from a volume.
- Repeat the Image from volume process to deploy across multiple instances.
Next steps
For more information, see the NVIDIA driver documentation.