Accelerating Workloads on vSphere 7 with Tanzu - A Technical Preview of Kubernetes Clusters with GPUs
In this article we walk you through, in technical preview, the proposed steps for configuring one or more GPUs/vGPUs on the VMs that support Kubernetes. This will be useful to systems administrators and developers/devops people who intend to use Kubernetes with vSphere for compute-intensive machine learning (ML) applications.
In the AI-Ready Enterprise Platform architecture, seen below, VMware’s vSphere with Tanzu allows developers, data scientists and devops people to rapidly provision multiple Kubernetes clusters onto a set of VMs to enable faster development turnaround. These Kubernetes clusters are called Tanzu Kubernetes Grid clusters (TKG Cluster or TKC for short).
TKG clusters host and manage the pods and containers that make up the NVIDIA AI Enterprise suite for the data scientist, as well as their own application pods. Training of an ML model and subsequent inference/deployment with that model very often require compute acceleration. This is because these operations are compute intensive - many matrix multiplications are being executed in parallel during the model training process on large quantities of data.
GPUs have been virtualized on VMware vSphere for many years and are used today for both graphics-intensive and compute-intensive workloads. Using the VMware vSphere Client, a virtual GPU (vGPU) profile can be associated with a VM. A profile can present part or all of a physical GPU to a VM. In vSphere with Tanzu, the parts of a vGPU profile are chosen when a VM class is created or edited, and this article guides you through that process. With Kubernetes tightly integrated into vSphere with Tanzu, vGPUs for VMs on vSphere can now accelerate the nodes of a TKG cluster. Naturally, vGPUs are a core component of the joint NVIDIA and VMware architecture for the AI-Ready Enterprise.
On vSphere, the Kubernetes “node” (i.e. a machine destination for deploying pods) is implemented as a VM. There are VMs that play the role of control plane nodes and worker nodes in a TKG cluster. When a devops person or data scientist creates a TKG cluster using “kubectl apply -f tkc-spec.yaml”, new VMs appear in the vSphere Client UI that carry out the work of those nodes in Kubernetes. This set of VMs is shown in the vSphere Client for a simple TKG cluster named "tkc-1", highlighted below.
To create a TKG cluster, all the devops/developer person needs is appropriate editing access to a namespace, access to the kubectl tool, along with a suitable YAML spec file for their cluster. That namespace access is provided to the user by the System Administrator who manages the Supervisor Cluster (i.e., the Kubernetes management cluster) in vSphere.
Looking at the example namespace in the vSphere Client below that was created as part of the Tanzu setup, we see the various aspects of that namespace, including any resource limitations that may be imposed on it.
If we choose the “Services” item in the menu shown above, we see a summary screen describing the set of VM classes that are part of the Tanzu setup. We drill down into the “VM Classes” tab within the VM Service screen to see the set of VM classes that come associated with this namespace.
Many of these VM class examples are provided with the vSphere with Tanzu functionality, but custom VM classes can also be created by the user. We look at the set of steps to create a custom VM class next. The following is a technical preview of proposed functionality. Let’s click on the “Create VM Class” tile from the above set of tiles to start that process. We enter a name for our new VM class, decide on VM sizing, and choose to add a PCIe device, which is going to be our GPU device. This is a familiar step to those who have dealt with vGPU profiles in the vSphere Client in the past.
You can adjust the amount of RAM and vCPUs for your VM class at this point. We can specify more VM memory than that shown here, if needed. Notice that when you choose to add a PCIe device, the VMs created from your VM class are going to have a full reservation set on their RAM. This is the normal case when GPUs or other PCIe devices are assigned to a VM. Hit “Next” and confirm on the next screen that we are adding a PCIe device. We choose “Add PCIe Device” on the next screen.
When we choose “Add PCIe Device”, we are given an option to choose the “NVIDIA vGPU” device type and then taken to a details screen for the GPU hardware type that will back the added vGPU. The options available here come from different types of GPU hardware that are visible to our Tanzu cluster. Different GPU models may be on different hosts in the cluster. Here, our cluster has a mix of A100 and A40 model GPUs.
In our example VM class, it is the A100 GPU that we are interested in using – especially for compute-intensive ML work. In your own setup, you may see different choices of GPU hardware here. These are presented to vSphere by the NVIDIA vGPU Host Driver that is installed into the ESXi kernel itself by the system administrator, as part of the installation and setup of the GPU-aware hosts. There may be a Passthrough option for a GPU here in certain cases, but we will set that to one side for now and focus on vGPU setup.
We proceed by choosing the NVIDIA A100 GPU option here. This takes us to a dialog where we supply more details on the vGPU setup that VMs created from this VM class will have associated with them.
There are some important features of a vGPU being described here. Firstly, the model of GPU hardware that backs this vGPU is seen at the top.
The vGPU Sharing type is either “Time Sharing” or “Multi-instance GPU Sharing”. These are alternative backing mechanisms, built by NVIDIA, that determine exactly how a vGPU shares the physical GPU. “Time Sharing” does not physically separate the GPU cores between the VMs that share the GPU: it segments GPU memory (i.e. framebuffer memory) for each VM, but not the cores. “Multi-instance GPU Sharing”, on the other hand, enforces strict hardware-level separation (of cores, memory, cache and other parts) between the VMs that share a physical GPU. This hardware-level isolation gives more predictable performance on each VM that shares the GPU. More technical detail on these options can be found at this site.
You will also see in the screen above that we chose “Compute” for the GPU Mode, indicating that we want the compute-intensive rather than graphics-intensive mode for the new vGPU. "Compute" here is designed to optimize performance of the GPU for the frequent Machine Learning mathematical calculations, such as matrix multiply and accumulate.
Importantly, we can decide how much of the GPU’s own framebuffer memory is assigned to the VM in the GPU Memory entry on this screen. This determines how large a portion of the total GPU memory is given to any VM of this VM class.
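To make the effect of the GPU Memory entry concrete, here is a minimal sketch of the arithmetic involved. It assumes a Time Sharing setup in which every VM on the GPU uses the same GPU Memory setting, and it uses a 40 GB framebuffer purely as an illustrative figure. The function name is hypothetical, and the set of memory sizes actually offered in the dialog is determined by the NVIDIA vGPU software, not by this calculation.

```python
def vms_per_gpu(total_framebuffer_gb: int, per_vm_framebuffer_gb: int) -> int:
    """Return how many equally sized vGPUs fit on one physical GPU.

    Hypothetical helper for illustration only: it models nothing more
    than dividing the framebuffer into equal shares.
    """
    if per_vm_framebuffer_gb <= 0:
        raise ValueError("per-VM framebuffer must be positive")
    if per_vm_framebuffer_gb > total_framebuffer_gb:
        raise ValueError("per-VM framebuffer exceeds the physical GPU's memory")
    # Integer division: any remainder is framebuffer that no VM can use.
    return total_framebuffer_gb // per_vm_framebuffer_gb


# Assumed example values: a 40 GB GPU shared in 10 GB portions.
shared = vms_per_gpu(40, 10)
# Giving the whole framebuffer to one VM leaves room for no others.
dedicated = vms_per_gpu(40, 40)
print(shared, dedicated)
```

A smaller GPU Memory value lets more VMs of this class share one physical GPU, at the cost of less framebuffer for each.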
The last entry on the screen above is the option to assign multiple vGPUs of this type to a VM. It is available only if we choose “Time Sharing” for “vGPU Sharing” and allocate ALL of the framebuffer memory on a GPU to one VM. It allows a VM to make use of multiple full physical GPUs, if that is needed; higher-end machine learning models may require multiple GPUs at training time. As we see here, the option is not available when we are in Multi-Instance GPU Sharing mode or when we use less than the GPU’s full physical memory.
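The rule for when the multiple-vGPU entry is offered can be written down as a small predicate. This is only a sketch of the condition as described in this article, not of how the vSphere Client evaluates it; the function name and the string values for the sharing type are our own illustrative choices.

```python
def multiple_vgpus_allowed(sharing: str, gpu_memory_gb: int,
                           total_framebuffer_gb: int) -> bool:
    """Encode the rule described in the text (hypothetical helper).

    Multiple vGPUs per VM are offered only when the sharing type is
    Time Sharing AND the VM class takes the GPU's entire framebuffer.
    """
    uses_time_sharing = sharing == "Time Sharing"
    uses_full_framebuffer = gpu_memory_gb == total_framebuffer_gb
    return uses_time_sharing and uses_full_framebuffer


# Assumed 40 GB framebuffer for illustration.
print(multiple_vgpus_allowed("Time Sharing", 40, 40))                # full GPU, time-shared
print(multiple_vgpus_allowed("Time Sharing", 20, 40))                # partial framebuffer
print(multiple_vgpus_allowed("Multi-instance GPU Sharing", 40, 40))  # MIG mode
```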
Click “Next” to complete the details of this new VM class.
We review the details here, confirm that all is in order and click “Finish” to complete the creation of the new VM class. Once the process has completed, we see the new VM class in the collection of VM classes that are available (third from the left on the top row).
There are just a few more steps to take before we are ready to use the VM class in a TKC specification YAML file. When we create a new VM class, it is initially not associated with any namespace, as you see in the tile for the “testvmclass” above.
We must first ensure that the new VM class is associated with a namespace. We can use the namespace that we installed when we set up Tanzu initially or a new one that is created at the Supervisor cluster level. In our example here, the namespace that was initially constructed is called “tkg-ns”. To do this, we choose “Namespaces” from the top-level menu and then choose “Summary” to see the user interface tiles that come with a namespace, as seen here.
Choose “Manage VM classes” in the VM Service tile above. This shows all the VM classes that are already associated with this tkg-ns namespace. Our new “testvmclass” is not yet associated with the namespace, as you see here.
We click on the checkbox adjacent to our “testvmclass” entry and hit “OK”. The VM class is added to the namespace – and so in the Summary screen, we see that the number of “Associated Classes” has increased to 20.
Our last setup step is to ensure that a suitable VM image (OVA) is available for creation of the VMs that will take their structure from the new VM class. This VM image is loaded into a vSphere Content Library. We see the names of the associated content libraries by clicking on “Manage Content Libraries” in the VM Service tile.
Earlier, we created a new content library in vSphere named “ubuntu2004” to store an OVA image that has been tailored to be ready for adding a vGPU profile to it. Loading an image into a content library is done by means of the “Import” functionality in vSphere content libraries. From the vSphere Client main navigation options, choose “Content Libraries”, click on the name of your user-created content library and you will see your imported OVA image there.
vSphere with Tanzu detects this new VM image and registers it such that it can be seen using a “kubectl get tkr” command – where “tkr” stands for Tanzu Kubernetes Release. The VM image is referred to in our YAML for TKG cluster creation using just its version number as shown below. In our example deployment YAML, we used version “v1.20.8+vmware.1-tkg.1” that is seen second from last in this list.
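In our cluster YAML, the version appears in two forms: “v1.20.8+vmware.1-tkg.1” as the full version and “v1.20.8---vmware.1-tkg.1” as the name of the tkr object, since Kubernetes object names cannot contain a “+” character. The sketch below shows the mapping between the two as inferred from that example. The function name is hypothetical, and the safest source for the exact name remains the output of “kubectl get tkr”.

```python
def tkr_reference_name(full_version: str) -> str:
    """Derive the tkr object name from a full version string.

    Assumption inferred from the example spec in this article: the '+'
    in the version is replaced by '---' to form a valid object name.
    """
    return full_version.replace("+", "---")


print(tkr_reference_name("v1.20.8+vmware.1-tkg.1"))
```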
Finally, provided suitable access to the tkg-ns namespace is allowed, the devops or developer can create their TKG cluster as seen in our first screenshot, using the YAML specification below. You will see that the VM class is used in the description of one category of Worker nodes – the ones that use the GPU. A second category of Worker nodes does not need GPUs and so does not participate in that class.
This TKG Cluster is deployed by the user, using a command “kubectl apply -f tkc1.yaml” where the contents of the tkc1.yaml file are shown below. Now, you are ready to deploy further pods and applications into this cluster. The TKG cluster can be expanded with further worker nodes that have vGPUs, as applicable, simply by editing the number of replicas in the "workers" Node Pool and re-applying the YAML specification, using the kubectl apply command again.
apiVersion: run.tanzu.vmware.com/v1alpha2
kind: TanzuKubernetesCluster
metadata:
  name: tkc-1
  namespace: tkg-ns
spec:
  distribution:
    fullVersion: v1.20.8+vmware.1-tkg.1
  topology:
    controlPlane:
      replicas: 3
      storageClass: k8s-gold-storage-policy
      tkr:
        reference:
          name: v1.20.8---vmware.1-tkg.1
      vmClass: best-effort-small
    nodePools:
    - name: workers
      replicas: 1
      storageClass: k8s-gold-storage-policy
      tkr:
        reference:
          name: v1.20.8---vmware.1-tkg.1
      vmClass: testvmclass
      volumes:
      - capacity:
          storage: 50Gi
        mountPath: /var/lib/containerd
        name: containerd
      - capacity:
          storage: 50Gi
        mountPath: /var/lib/kubelet
        name: kubelet
    - name: worker2
      replicas: 1
      storageClass: k8s-gold-storage-policy
      vmClass: best-effort-small
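Expanding the cluster amounts to changing one number in this specification. As a rough sketch, the edit can be expressed on a Python dictionary that mirrors only the relevant part of the YAML. The helper name is hypothetical, and in practice you would edit the YAML file itself and re-apply it with kubectl.

```python
import copy


def scale_node_pool(spec: dict, pool_name: str, replicas: int) -> dict:
    """Return a copy of the cluster spec with one node pool resized.

    Hypothetical helper: it changes only the 'replicas' field of the
    named node pool and leaves the input spec untouched.
    """
    updated = copy.deepcopy(spec)
    for pool in updated["spec"]["topology"]["nodePools"]:
        if pool["name"] == pool_name:
            pool["replicas"] = replicas
            return updated
    raise KeyError(f"no node pool named {pool_name!r}")


# A trimmed-down mirror of the specification above (most fields omitted).
tkc = {
    "metadata": {"name": "tkc-1", "namespace": "tkg-ns"},
    "spec": {"topology": {"nodePools": [
        {"name": "workers", "replicas": 1, "vmClass": "testvmclass"},
        {"name": "worker2", "replicas": 1, "vmClass": "best-effort-small"},
    ]}},
}

# Ask for three GPU-backed workers instead of one.
scaled = scale_node_pool(tkc, "workers", 3)
print(scaled["spec"]["topology"]["nodePools"][0]["replicas"])
```

Only the "workers" pool, which uses the GPU-aware VM class, is resized here; the "worker2" pool keeps its single replica.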
In this article we explored, in technical preview, the ease with which a vSphere system administrator may in the future create a Kubernetes namespace for resource management, design a VM class providing a pattern for GPU-aware VMs and provide access to these objects to data scientists/developers who want to build their own TKG clusters (TKCs) within their allocated namespaces. This gives the data scientist/developer the freedom to create the Kubernetes clusters they need, when they need them - whether that be for ML model training, testing or inference using their models. These end users can supply the clusters with the appropriate number of GPUs as they see fit. This removes the need for ticketing systems for such operations, once a library of suitable VM classes is created and supplied to the data scientist user. On vSphere, these different user communities (administrators, devops, data scientists) can now collaborate in an elegant way to accelerate the production of machine learning models that enhance the business and save costs.