A number of our customers and partners who run compute-intensive workloads, especially those deploying machine learning, have asked us whether they can combine virtual GPU (vGPU) and Passthrough (DirectPath I/O) modes to address different GPUs from different VMs. The answer is yes. The best practice is to use those two modes of GPU access in separate VMs, on one or more host servers. We go into the details of the various NVIDIA drivers that allow that setup here. For maximum operational flexibility, VMware and NVIDIA recommend vGPU as the access method for your GPUs, though there are some cases where Passthrough access fits the need.
Passthrough (DirectPath I/O) and Virtual GPU Comparison
If you have more than one physical GPU on an ESXi host server, then a subset of those physical GPUs can be used with the NVIDIA vGPU setup, seen on the right above, while a separate subset of your GPUs can be used as Passthrough GPUs (also called DirectPath I/O). This was the question posed in a number of customer requests that were presented to us. We will see an example later in this article with one GPU in each category. The VMs using these separate modes can be on the same ESXi host server or on different host servers.
To have a configuration that NVIDIA supports, you should use the two types of access (Passthrough mode and vGPU mode) in separate VMs, and use separate physical GPUs for each mode. A VM should NOT have a Passthrough GPU and a virtual GPU associated with it at the same time. You can, however, switch a VM from one mode to the other by shutting the VM down and then reconfiguring it using "Edit Settings - Add PCIe Device" in the vSphere Client. If you are converting a physical GPU from vGPU usage, you also need to assign that device to Passthrough mode in the vSphere Client.
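The placement rule above can be written down as a small check. This is a minimal sketch, not a VMware or NVIDIA API: the `GpuDevice`, `validate_vm` and `validate_host` names and the "passthrough"/"vgpu" labels are made up for illustration, and the device addresses are only sample values.

```python
from dataclasses import dataclass

# Hypothetical labels for the two access modes discussed in this article.
PASSTHROUGH = "passthrough"
VGPU = "vgpu"


@dataclass(frozen=True)
class GpuDevice:
    """One GPU device entry attached to a VM (illustrative model only)."""
    pci_address: str  # physical GPU that backs this device
    mode: str         # PASSTHROUGH or VGPU


def validate_vm(devices):
    """Return a list of problems found in one VM's GPU device list.

    An empty list means the VM uses either Passthrough or vGPU devices,
    never both at the same time.
    """
    problems = []
    modes = {d.mode for d in devices}
    unknown = modes - {PASSTHROUGH, VGPU}
    if unknown:
        problems.append(f"unknown access mode(s): {sorted(unknown)}")
    if {PASSTHROUGH, VGPU} <= modes:
        problems.append("VM mixes Passthrough and vGPU devices")
    return problems


def validate_host(vms):
    """Return the physical GPUs that are used in both modes across VMs.

    `vms` maps a VM name to its list of GpuDevice entries.
    """
    modes_by_gpu = {}
    for devices in vms.values():
        for d in devices:
            modes_by_gpu.setdefault(d.pci_address, set()).add(d.mode)
    return sorted(gpu for gpu, modes in modes_by_gpu.items() if len(modes) > 1)
```

A VM with one Passthrough device passes `validate_vm` with an empty list, while a VM holding one device of each kind is reported as mixing the two modes.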
Below, we show some example VMs with these GPU access modes. They happen to be from a vSphere with Tanzu Kubernetes Cluster (TKC), as that is the context in which our users raised their questions about GPUs.
The Differences between Passthrough and Virtual GPU (vGPU)
1. Passthrough, also called "DirectPath I/O", is a means of accessing a GPU from a VM in which the full GPU is made available to one VM only and the hypervisor intervenes minimally. The end-user applications’ commands to execute various instructions on the GPU are “passed through” the ESXi hypervisor on their way to the physical GPU with minimal hypervisor action and maximum speed. To set up Passthrough, you enable “passthrough mode” for the particular GPU device in the vSphere Client interface. Here is how your GPU appears in the “PCI Devices – Passthrough-Enabled Devices” list for the host.
Once you have set up Passthrough access and assigned the device to a VM, you can install the NVIDIA Data Center Driver into your VM. Let’s call this driver #1. You do not need a GPU-specific driver at the ESXi host level for this.
2. The Virtual GPU or vGPU approach (from NVIDIA with support on VMware) is an alternate method of accessing a GPU, or multiple GPUs, from a VM. You choose between vGPU and Passthrough access for a particular GPU device. The vGPU approach is ideal for sharing a physical GPU among two or more VMs, which you cannot achieve with Passthrough. You can also vMotion a vGPU-aware VM from one host to another suitably equipped host without interrupting the machine learning job it contains. Sharing a physical GPU among a number of VMs on one host and the vMotion of a VM with a GPU are not available today with Passthrough.
To enable vGPU access, you first set your ESXi host’s Graphics Mode. This mode can be set to SharedPassthru at the ESXi host command line using the command:
esxcli graphics host set --default-type SharedPassthru
This SharedPassthru graphics setting is required for the NVIDIA vGPU drivers (both host and guest vGPU drivers) to operate correctly. Note the two minus signs before the "default-type" argument. This setting is referred to as “Shared Direct” in the vSphere Client window that gives the Host Graphics settings below. Here is how that same graphics mode appears in the vSphere Client. Note the “Default graphics type” setting to “Shared Direct” in the right-side pane – this is the correct setting for vGPU support.
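The relationship between the command-line value and the label shown in the vSphere Client can be captured in a small lookup. The following is only a sketch: the function and dictionary names are invented for illustration, and the code builds the command as an argument list without running it.

```python
# Values accepted by `esxcli graphics host set --default-type ...`, mapped to
# the labels shown in the vSphere Client Host Graphics settings.
CLI_TO_CLIENT_LABEL = {
    "SharedPassthru": "Shared Direct",  # the setting vGPU requires
    "Shared": "Shared",
}


def build_set_graphics_type_cmd(default_type="SharedPassthru"):
    """Return the esxcli argument list that sets the host default graphics type."""
    if default_type not in CLI_TO_CLIENT_LABEL:
        raise ValueError(f"unsupported graphics type: {default_type!r}")
    return ["esxcli", "graphics", "host", "set", "--default-type", default_type]


def is_vgpu_ready(client_label):
    """True if the label seen in the vSphere Client is the one vGPU needs."""
    return client_label == CLI_TO_CLIENT_LABEL["SharedPassthru"]
```

Joining the default argument list with spaces gives the same command line as shown above, and `is_vgpu_ready("Shared Direct")` returns `True`.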
The vGPU mode of access is available to a VM if you have installed the NVIDIA vGPU Host Driver, referred to in the NVIDIA docs as the vGPU Manager, directly on the ESXi server on which the VM is to be placed. This is done as a VIB installation process onto ESXi. You get the NVIDIA vGPU Host driver with the NVIDIA AI Enterprise Suite package.
With that NVIDIA vGPU Host driver in place, the system administrator can then choose a vGPU profile for a powered-off VM, using the “Edit Settings – Add PCIe Device” sequence in the vSphere Client. A set of available vGPU profiles is presented in the vSphere Client when the vGPU host driver is installed and running. The presented vGPU profiles will differ for the various models of GPU (A100, A30, V100) and will also differ if a MIG setting is enabled for any one GPU (multi-instance GPU - see blog article). The vGPU profile determines how much space, in terms of framebuffer memory on a physical GPU, will be used by this VM. In the MIG case, it also determines how the streaming multiprocessors (SMs, or collections of cores) are allocated to a VM.
Assigning a vGPU profile allows the VM to access a part of, or all of, the physical GPU. Here is a view of some example vGPU profiles the system administrator can choose from, when we have an A100 GPU being presented to a VM at PCIe device addition time. These happen to be MIG-backed vGPU profiles, because we enabled that MIG feature at the physical GPU level, by logging in to the host server and issuing a command "nvidia-smi -i 0 -mig 1" earlier. They could instead be time-sliced backed vGPU profiles, if MIG were not enabled on that GPU. MIG is required to be explicitly turned on for any one GPU. Time-sliced access, the alternative to MIG, is the default behavior for a GPU if MIG is not on. You can learn more about this area here.
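To make the difference between time-sliced and MIG-backed profiles concrete, here is a hedged sketch that splits a profile name into its parts. The naming pattern is an assumption inferred from names such as "nvidia_a40-24c". The MIG-backed example name "nvidia_a100-3-20c" and the function name are illustrative only, so check the pattern against the profiles your own vSphere Client lists.

```python
import re

# Assumed naming pattern (not an official grammar):
#   <prefix>_<gpu model>-[<MIG slices>-]<framebuffer GB><series letter>
_PROFILE_RE = re.compile(
    r"^(?:nvidia|grid)_(?P<model>[a-z0-9]+)-"
    r"(?:(?P<slices>\d+)-)?"
    r"(?P<fb>\d+)(?P<series>[a-z])$"
)


def parse_vgpu_profile(name):
    """Split a vGPU profile name into its parts, or return None if it does not match."""
    m = _PROFILE_RE.match(name.strip().lower())
    if m is None:
        return None
    slices = m.group("slices")
    return {
        "model": m.group("model"),
        "framebuffer_gb": int(m.group("fb")),
        "series": m.group("series"),
        # A slice count in the name is treated as the sign of a MIG-backed profile.
        "mig_backed": slices is not None,
        "mig_slices": int(slices) if slices is not None else None,
    }
```

Under this assumed pattern, a name without a slice count is read as a time-sliced profile, and a name with one is read as MIG-backed.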
After you assign a vGPU profile to a VM, you then power on the VM. It is automatically placed by DRS onto an appropriate host server that has a GPU, if there is one. At this point, you install the Guest OS vGPU Driver, which we call driver #3 in the summary below.
If your end user should need multiple GPUs to be used in one VM, then with time-sliced vGPU profiles, you can assign multiple full vGPUs to a single VM, using multiple full-memory allocated vGPU profiles.
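The arithmetic behind sharing and multi-GPU assignment can be sketched as follows. This is a simplification that assumes every vGPU on a physical GPU uses the same time-sliced profile. The function names are hypothetical, and the framebuffer sizes are whatever values (in GB) you pass in for your own hardware.

```python
def max_vgpus_per_gpu(gpu_framebuffer_gb, profile_framebuffer_gb):
    """How many same-sized time-sliced vGPUs fit on one physical GPU."""
    if gpu_framebuffer_gb <= 0 or profile_framebuffer_gb <= 0:
        raise ValueError("framebuffer sizes must be positive")
    return gpu_framebuffer_gb // profile_framebuffer_gb


def qualifies_for_multi_vgpu(gpu_framebuffer_gb, profile_framebuffer_gb, mig_backed):
    """True if the profile is time-sliced and allocates the GPU's full memory."""
    return (not mig_backed) and profile_framebuffer_gb == gpu_framebuffer_gb
```

For example, a 24 GB profile on a 48 GB GPU leaves room for two such vGPUs, but it is not a full-memory profile, so it would not qualify for assigning multiple vGPUs to one VM under the rule above.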
So far, we have mentioned three separate NVIDIA GPU drivers. Let’s quickly summarize those:
1. The NVIDIA Data Center Driver
This is documented at https://www.nvidia.com/Download/driverResults.aspx/185202/en-us. This driver is for use in a Passthrough mode VM only. We will name this driver #1 for our current discussion. This driver is not to be used in vGPU settings.
2. The NVIDIA Virtual GPU Host Driver (also called the vGPU Manager)
This is installed into the ESXi hypervisor itself using a VIB install process. It is one of a pair of drivers used to support vGPU access from a VM. This is driver #2. The purpose of driver #2 is to offer different vGPU profiles for use in a VM and to control access to the parts of the GPU. Each vGPU profile expresses a share of the physical GPU’s resources such as memory and cores (or SMs).
3. The NVIDIA vGPU Grid Linux Driver. We refer to this one as driver #3.
This driver is referred to as the Guest vGPU driver. It is installed into the guest operating system of a VM, either directly by the user, or via an automated container-based installation. The latter can be done using the NVIDIA GPU Operator for Kubernetes. The VM will need first to have a vGPU profile assigned to it and be successfully powered up in the vSphere Client BEFORE the guest vGPU Driver (driver #3) is installed.
It is important for you to be aware of the difference between driver #1 and driver #3 above when you are setting up a VM for access to a GPU. They can look very similar at installation time. One easy way to tell them apart is the installation file name. Here are examples from the same family, version 470.103.
Driver #1: NVIDIA-Linux-x86_64-470.103.01.run
Driver #3: NVIDIA-Linux-x86_64-470.103.01-grid.run
Note the term “grid” in the vGPU guest driver name. That is your clue that you are dealing with the vGPU driver and not the Data Center driver. If you are in a vSphere with Tanzu Kubernetes environment, then your guest vGPU driver installation is normally done by the NVIDIA GPU Operator installation. In that case, you can still see the driver file name by doing a kubectl exec into one of your “driver daemonset” pods as follows:
kubectl exec -it nvidia-driver-daemonset-rn4mc -n gpu-operator -- bash
while you are logged into your Tanzu Kubernetes Cluster (TKC).
Then, issue an “ls” command within the directory you will be logged into in that pod. This is usually the “/drivers” directory within one of the GPU Operator's containers - to see the name of the driver installation file. The string “rn4mc” in the pod name used above is an example from our lab environment, and it will be different in your case.
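The file-name check described above is easy to automate. Here is a minimal sketch that classifies installer names like the two examples shown earlier. The function names and the "data-center"/"vgpu-guest" labels are made up for illustration, and the regular expression only covers names of the form seen in this article.

```python
import re

# Matches names such as NVIDIA-Linux-x86_64-470.103.01.run, with an optional
# "-grid" marker before the ".run" extension.
_RUN_FILE_RE = re.compile(
    r"^NVIDIA-Linux-(?P<arch>[A-Za-z0-9_]+)-"
    r"(?P<version>\d+(?:\.\d+)+)(?P<grid>-grid)?\.run$"
)


def classify_installer(filename):
    """Return (driver kind, version) for an installer name, or None if unrecognized."""
    m = _RUN_FILE_RE.match(filename)
    if m is None:
        return None
    kind = "vgpu-guest" if m.group("grid") else "data-center"
    return kind, m.group("version")


def classify_listing(ls_output):
    """Classify every installer name found in the text printed by `ls`."""
    results = {}
    for name in ls_output.split():
        info = classify_installer(name)
        if info is not None:
            results[name] = info
    return results
```

You could paste the output of the “ls” command from the pod into `classify_listing` to confirm that the file present is the “grid” (vGPU guest) installer.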
Avoiding Errors with Driver Setup
Some users have attempted to install driver #1 (the Data Center Driver) into a VM, expecting it to understand a virtual GPU that has been presented to the VM using a vGPU profile. Those vGPU profiles are made visible on an ESXi host by the vGPU Host Driver, driver #2. But the NVIDIA Data Center Driver (driver #1) is for Passthrough situations only, so it will try to access only GPU devices that are set up in Passthrough mode. Driver #1 does not understand vGPU profiles or vGPU concepts, and it does not cooperate with the vGPU Host Driver (the vGPU Manager).
Another situation we have met is that of users installing the vGPU Guest Driver by hand into their Kubernetes nodes (VMs) and then installing the GPU Operator into the same set of nodes (VMs). The GPU Operator does the vGPU Driver installation in container form for you, so there is no need to do a manual vGPU driver installation into your VMs in Tanzu. If both installation methods are used together, then there will be a clash between them.
Multiple Uses of the NVIDIA Guest OS vGPU Driver
The Data Center driver (driver #1) cannot behave in the same way as the vGPU driver (driver #3), as mentioned above. If you attempt to install the Data Center guest OS driver into a VM that has been given a vGPU profile in the vSphere Client, you will see errors indicating that this is the wrong driver when you try to boot up that VM. The error message may mention a "libvgu.so" incompatibility, which gives you a clue as to the cause.
The NVIDIA Guest OS vGPU driver (driver #3), however, CAN function in a VM that is configured for a Passthrough GPU. In fact, that guest OS vGPU driver can report on the processes that happen to be using the passthrough-enabled GPU, within the VM itself, using the nvidia-smi command. We tried this out in the lab using an NVIDIA GPU Operator setup, which installs the driver into a container that it runs in the VM. The complete discussion of the GPU Operator functionality is available here. Any GPU that is in Passthrough mode is not visible to the vGPU Host driver (#2) and so will not be reported by it.
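The driver and access-mode combinations discussed in this section can be summarized as a lookup table. This is just a restatement of the text in code form. The labels and the function name are hypothetical and do not come from any NVIDIA or VMware tool.

```python
# (guest driver, GPU access mode of the VM) -> whether the pairing works,
# as described in this article.
COMPATIBILITY = {
    ("data-center", "passthrough"): True,   # driver #1 in its intended setting
    ("data-center", "vgpu"): False,         # driver #1 does not understand vGPU profiles
    ("vgpu-guest", "vgpu"): True,           # driver #3 paired with host driver #2
    ("vgpu-guest", "passthrough"): True,    # driver #3 also functions with Passthrough
}


def guest_driver_ok(driver, access_mode):
    """Look up whether a guest driver works with a VM's GPU access mode."""
    try:
        return COMPATIBILITY[(driver, access_mode)]
    except KeyError:
        raise ValueError(f"unknown combination: {driver!r}, {access_mode!r}") from None
```

The only pairing in the table that fails is the Data Center driver in a VM that has a vGPU profile, which is the mistake described above.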
Multiple GPU Devices on One Host - Separate Your VMs
If you should need to, you can have two VMs running together – within the same Tanzu Kubernetes Cluster (TKC) or in separate ones – that are using two different approaches for accessing their own GPUs. This could be needed if one GPU is required for a machine learning, compute-intensive application and the second for another purpose, such as graphics-intensive display.
In our lab example here, we have two VMs that happen to be worker nodes in a TKG cluster. These VMs are running on separate ESXi host servers. The first VM highlighted below is on the ESXi host with IP address ending in .68. This first VM is in Passthrough mode to its GPU, as seen in the PCI Device 0 at the bottom of the screen, showing Dynamic DirectPath I/O (often abbreviated to “Passthrough”).
The second VM, seen highlighted below, is a node in the same TKG cluster. This VM is a vGPU-enabled one, as also seen at PCI Device 0 at the bottom right. Here, we see the vGPU profile, namely “nvidia_a40-24c”. This VM is running on the ESXi host whose IP address ends in .67 (seen at the "Host:" line in the center pane).
These two VMs belong to the same Tanzu Kubernetes Cluster, or TKC. But this is the simple case, where we are using different ESXi hosts. Now what if we have more than one GPU device on a single host? Can we do the same separation of GPU access? The answer is yes. Let's look at a different lab environment now, where the host servers have two GPU devices on them. They could of course have more than two GPUs if needed.
In this second lab, there is a pair of NVIDIA Ampere A30 model GPUs that are set up for Passthrough and vGPU respectively on one host. The A30 GPU with Device ID “0000:81:00.0” in the first row of the right-side "Graphics Devices" pane is enabled for Passthrough. We know that this is so because its Active Type is “Direct” and its Configured Type is “Shared” below.
Passthrough setup on one of two GPUs on a host
In contrast, the second A30 GPU device on the same host, seen with Device ID “0000:e2:00.0” in the center-right panel, is enabled for vGPU use. We know that because in the above screen, the second GPU’s Active Type and Configured Type columns both show “Shared Direct”, which signifies the vGPU setup.
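The way we read the "Graphics Devices" pane in this lab can be sketched as a small function. It encodes only the two cases observed in the screens described above. The function name and return labels are hypothetical, and any other combination of values is reported as unknown instead of being guessed.

```python
def classify_gpu_row(active_type, configured_type):
    """Interpret one row of the Graphics Devices pane as seen in this lab."""
    active = active_type.strip().lower()
    configured = configured_type.strip().lower()
    # Both columns showing "Shared Direct" signified the vGPU setup.
    if active == "shared direct" and configured == "shared direct":
        return "vgpu"
    # Active Type "Direct" marked the Passthrough GPU; its Configured Type
    # was "Shared" in the lab, but that column is not relied on here.
    if active == "direct":
        return "passthrough"
    return "unknown"
```

Applied to the two A30 rows above, the first GPU ("Direct" / "Shared") is classified as Passthrough and the second ("Shared Direct" in both columns) as vGPU.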
Now let's say that within one TKC (Tanzu Kubernetes Cluster), we want to have two separate nodes (i.e. VMs) that each have a different mode of access to the GPU they own. Here is the full view of the TKC cluster with its two different VMs, each accessing a GPU in their separate ways.
Here is more detail on the settings of the first VM with Passthrough access, whose name ends in “8vb9m”.
Below, we see the second VM, named “tkc-5-vgpuworker-16txd….”, with its settings.
Finally, the screen below confirms that the second VM, the tkc-5-vgpuworker VM, is running against a separate GPU on the same server. The “VMs associated with the graphics device NVIDIA A30” pane on the bottom right side shows that our vGPU worker VM is occupying the second A30 GPU in this ESXi host server. The "Shared Direct" value in the Active Type and Configured Type columns indicates that the second GPU is in vGPU mode for its graphics setting.
In this article, we explored the GPU setup steps for using Passthrough and NVIDIA vGPU modes to access a GPU, or part of one, from a VM. We can allocate GPUs in these two ways independently of one another to different VMs. Those VMs can be on separate host servers or, if a host has multiple GPUs, on the same host server. This shows the high level of flexibility that these two GPU access mechanisms offer with VMware vSphere and with Kubernetes clusters for modern applications running on VMware vSphere with Tanzu.
How Virtual GPUs Enhance Sharing in Kubernetes for Machine Learning on VMware vSphere
Accelerating Workloads on vSphere 7 with Tanzu - a Technical Preview of Kubernetes Clusters with GPUs
Determining GPU Memory for Machine Learning Application on VMware vSphere with Tanzu
Unexplored Territory Podcast : A Machine Learning Conversation with VMware and ITQ Consulting
NVIDIA AI Enterprise Documentation Site
NVIDIA AI Enterprise - Deploy the GPU Operator
Deploy an AI-Ready Enterprise Platform on VMware vSphere 7 with VMware Tanzu Kubernetes Grid Service
vSphere 7 Update 2 vGPU Operations Guide
Sizing Guidance for AI/ML in VMware Environments
Using GPUs with Virtual Machines on vSphere - Part 3: Installing the NVIDIA Virtual GPU Technology
vSphere 7 with Multi-Instance GPUs (MIG) on the NVIDIA A100 for Machine Learning Applications