vSphere 8 with TKG 2 introduces a new API resource called ClusterClass. The casual Kubernetes user may struggle to understand what it is or why it's needed. TKG is an implementation of the open source Kubernetes Cluster API specification, which defines an upstream-aligned, standard way to lifecycle-manage Kubernetes clusters. Prior to the inclusion of ClusterClass, Cluster API provided no native way to provision multiple clusters with the same configuration. DevOps users had to create a basket of underlying resources to define the machines on a particular infrastructure and bootstrap each cluster. VMware simplified this experience by implementing a single cluster resource called TanzuKubernetesCluster (TKC), whose specification is resolved into all of the underlying custom resources needed by the Cluster API specification. The TKC specification is specific to deploying a Kubernetes cluster into a vSphere based, Supervisor Cluster managed environment. Essentially, the TKC is a VMware-decorated cluster.
ClusterClass is a collection of templates that define a cluster topology and configuration. A ClusterClass can be associated with one or more clusters and is used to continually reconcile every cluster that references it. Though it covers far more configuration, you can think of it as T-shirt sizing for an entire cluster, much the way a VMClass specifies CPU and memory for VMs. The DevOps user can now write a much simpler cluster specification that references the ClusterClass. Unlike a TKC, this specification is more portable across infrastructures. ClusterClass attempts to provide a way to consistently provision machines, load balancers, and other resources across infrastructures, and to improve automation around deploying, scaling, upgrading, and deleting clusters.
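Since ClusterClass is just another Custom Resource, you can list the ClusterClasses available in a Supervisor namespace directly with kubectl. A quick sketch, using the tkg namespace from the examples below:

kubectl get clusterclass -n tkg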
Tanzu Kubernetes Cluster Specification and Reconciliation
You can think of the TKC as the legacy approach to creating a Tanzu Kubernetes cluster. It defines a Custom Resource of kind: TanzuKubernetesCluster, which is reconciled by a controller purpose-built by VMware to instantiate the underlying resources needed by Cluster API. This specification cannot be reconciled on infrastructure that does not contain this controller; in other words, it only works on vSphere with Tanzu Supervisor clusters. The specification below is not comprehensive, but one example of its proprietary syntax is the settings section, where the supported overlay networking for the TKC is defined.
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkc-cluster
  namespace: tkg
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-small
      storageClass: k8s-policy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    nodePools:
    - name: tkg-cluster-nodeool-1
      replicas: 2
      failureDomain: zone1
      vmClass: best-effort-medium
      storageClass: k8s-policy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
  settings:
    network:
      cni:
        name: antrea
      pods:
        cidrBlocks:
        - 100.96.0.0/11
      services:
        cidrBlocks:
        - 100.64.0.0/13
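Assuming the manifest above is saved as tkc-cluster.yaml (the filename is arbitrary), deploying it is a single apply against the Supervisor cluster:

kubectl apply -f tkc-cluster.yaml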
When the TKC is deployed, you can see that it is reconciled into a Cluster Custom Resource, a set of Machine resources, and finally into VirtualMachine resources. Many other custom resources are created and reconciled along the way but are not shown here. If you are interested in a more comprehensive discussion of the TKC and Cluster API, check out this series of blogs: tkc deep dive
ubuntu@cli-vm:~/demo-applications$ kubectl get tkc,cluster,machines,vm
NAME                                                      CONTROL PLANE   WORKER   TKR NAME                              AGE   READY   TKR COMPATIBLE   UPDATES AVAILABLE
tanzukubernetescluster.run.tanzu.vmware.com/tkc-cluster   1               2        v1.23.8---vmware.2-tkg.2-zshippable   19m   True    True

NAME                                   PHASE         AGE   VERSION
cluster.cluster.x-k8s.io/tkc-cluster   Provisioned   19m   v1.23.8+vmware.2

NAME                                                                                CLUSTER       PROVIDERID                                       PHASE         AGE   VERSION
machine.cluster.x-k8s.io/tkc-cluster-qp4qq-5hlv6                                    tkc-cluster   vsphere://423f39fd-642e-60d1-2c60-edc360c5566e   Provisioned   19m   v1.23.8+vmware.2
machine.cluster.x-k8s.io/tkc-cluster-tkg-cluster-nodeool-1-475gt-5f895d8d79-2hxkc   tkc-cluster   vsphere://423f39fd-642e-60d1-2c60-edc360c5566e   Provisioned   19m   v1.23.8+vmware.2
machine.cluster.x-k8s.io/tkc-cluster-tkg-cluster-nodeool-1-475gt-5f895d8d79-pdsl7   tkc-cluster   vsphere://423f39fd-642e-60d1-2c60-edc360c5566e   Provisioned   19m   v1.23.8+vmware.2

NAME                                                              POWERSTATE   AGE
virtualmachine.vmoperator.vmware.com/tkc-cluster-hsc8h-xc8m4      poweredOn    19m
virtualmachine.vmoperator.vmware.com/tkc-cluster-nodeoop-1-defsa  poweredOn    19m
virtualmachine.vmoperator.vmware.com/tkc-cluster-qp4qq-5hlv6      poweredOn    19m
ubuntu@cli-vm:~/demo-applications$
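If you want to trace the reconciliation chain yourself, the ownerReferences on each resource show which parent created it. As a sketch, using the control plane machine from the output above (the exact owner kinds you see may vary by release):

kubectl -n tkg get machine tkc-cluster-qp4qq-5hlv6 \
  -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'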
ClusterClass Specification and Reconciliation
The first thing you will notice in the ClusterClass specification is that you are not creating a TanzuKubernetesCluster Custom Resource. You are creating a resource of kind: Cluster. The other custom resources (Machines, VirtualMachines, etc.) are created much as they are during TKC reconciliation, but there is no TKC resource. In the topology section of the cluster specification, notice the reference to class: tanzukubernetescluster. This is the ClusterClass used to define the cluster's details; the tanzukubernetescluster ClusterClass is the default ClusterClass deployed when you enable vSphere with Tanzu. ClusterClass templates can be modified to change the cluster topology without meaningfully changing the specification files.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: cc-01
  namespace: tkg
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["198.51.100.0/12"]
    pods:
      cidrBlocks: ["192.0.2.0/16"]
    serviceDomain: "cluster.local"
  topology:
    class: tanzukubernetescluster
    version: v1.23.8---vmware.2-tkg.2-zshippable
    controlPlane:
      metadata:
        annotations:
          # run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu  # Optional: selects the OS when the referenced TKR supports multiple OSImages; if omitted, the default "Photon" OS is used
      replicas: 1
    workers:
      # node pools
      machineDeployments:
      - class: node-pool
        name: node-pool-1
        failureDomain: zone-1
        replicas: 1
      - class: node-pool
        name: node-pool-2
        failureDomain: zone-2
        replicas: 1
      - class: node-pool
        name: node-pool-3
        failureDomain: zone-3
        replicas: 1
    variables:
    - name: vmClass
      value: best-effort-small
    # default storageClass for control plane and node pools
    - name: storageClass
      value: "k8s-storage-policy"
The tanzukubernetescluster ClusterClass is immutable; however, one way to customize a cluster is with variables. Variables are name:value pairs defined in the variables section of the cluster spec. storageClass and vmClass are required to provision the cluster, but additional variables can be specified to further customize your cluster. A comprehensive list of the available variables can be found HERE. Variables can also be included in the workers and controlPlane scopes so that you can override a cluster-wide variable. For example, vmClass best-effort-small could be specified as the cluster default, while guaranteed-large is specified for the worker nodes in the node-pool named compute.
    workers:
      machineDeployments:
      - class: node-pool
        name: compute
        replicas: 1
        variables:
          overrides:
          - name: vmClass
            value: guaranteed-large
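To confirm what a running cluster's effective variables look like, you can read them back off the Cluster object. For example, against the cc-01 cluster defined earlier (cluster-level variables only; per-node-pool overrides appear under spec.topology.workers.machineDeployments):

kubectl -n tkg get cluster cc-01 -o jsonpath='{.spec.topology.variables}'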
ClusterClass Custom Configuration
Let's put the pieces together and deploy a cluster with a few customizations. We want our cluster to run PhotonOS on the control plane nodes but Ubuntu on the gpuworkers node-pool. We also plan to run some applications that require GPUs, so we will use a GPU-enabled vmClass for some of the workers and a non-GPU vmClass for the others. Our GPU-enabled nodes will need a couple of large Persistent Volumes, so we will configure those as well. There are many other customizations we could add, and some examples are HERE. This is the entire manifest for the cluster. Below it we will break out the sections relevant to our customizations.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster   # We are creating a Cluster, not a TanzuKubernetesCluster
metadata:
  name: clusterclass-01
  namespace: utilities
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["198.51.100.0/12"]
    pods:
      cidrBlocks: ["192.0.2.0/16"]
    serviceDomain: "isvlab.vmware.com"
  topology:
    class: tanzukubernetescluster
    version: v1.23.8---vmware.2-tkg.2-zshippable
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
      - class: node-pool
        name: gpuworkers
        metadata:
          annotations:
            run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu  # Optional: selects the OS when the referenced TKR supports multiple OSImages; if omitted, the default "Photon" OS is used
        replicas: 1
        variables:
          overrides:
          - name: nodePoolVolumes
            value:
            - name: containerd
              mountPath: /var/lib/containerd
              storageClass: kubernetes-policy
              capacity:
                storage: 50Gi
            - name: kubelet
              mountPath: /var/lib/kubelet
              storageClass: kubernetes-policy
              capacity:
                storage: 50Gi
          - name: vmClass
            value: gpuclass-a100
      - class: node-pool
        name: non-gpuworkers-1
        replicas: 1
      - class: node-pool
        name: non-gpuworkers-2
        replicas: 1
    variables:
    - name: vmClass
      value: best-effort-small
    # default storageClass for control plane and node pools
    - name: storageClass
      value: "kubernetes-policy"
    - name: nodePoolVolumes
      value: []
This is a standard cluster deployment manifest until we get to the definition of the worker nodes. We want workers grouped into pools that support GPUs and pools that do not, which is accomplished by defining node-pools. For our customizations we will focus on the gpuworkers node-pool. The first thing to note is that clusters default to PhotonOS for the machine image unless we add an annotation specifying another OS. We plan to run the NVIDIA GPU Operator in our cluster, which requires Ubuntu, so we add the run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu annotation. Our GPU-enabled nodes also need additional storage volumes, which we add through the nodePoolVolumes override. Overrides are a way to replace default cluster configuration with specific changes for part of the cluster. Whatever you override must be declared as a variable at the cluster level; note the empty nodePoolVolumes declaration in the cluster-level variables above:
    - name: nodePoolVolumes
      value: []
Since we want machines in this node-pool placed on hosts with specific GPU resources, and we need a GPU PCIe device configured on the VM, we must associate an appropriate vmClass with this node-pool. We override the default best-effort-small vmClass with gpuclass-a100 for this node-pool. That's it. Now we can deploy the cluster.
      - class: node-pool
        name: gpuworkers
        metadata:
          annotations:
            run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu  # Optional: selects the OS when the referenced TKR supports multiple OSImages; if omitted, the default "Photon" OS is used
        replicas: 1
        variables:
          overrides:
          - name: nodePoolVolumes
            value:
            - name: containerd
              mountPath: /var/lib/containerd
              storageClass: kubernetes-policy
              capacity:
                storage: 50Gi
            - name: kubelet
              mountPath: /var/lib/kubelet
              storageClass: kubernetes-policy
              capacity:
                storage: 50Gi
          - name: vmClass
            value: gpuclass-a100
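Before applying the manifest, it's worth confirming that the vmClass names you reference are actually available to the namespace. On vSphere with Tanzu, VM classes are exposed as Custom Resources, so a quick check looks something like this (exact availability of the binding resource varies by vSphere release):

kubectl get virtualmachineclass
kubectl -n utilities get virtualmachineclassbinding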
Cluster Deployment
After applying the cluster with kubectl apply -f manifest.yaml, we can see the VMs deployed, and the gpuworkers worker is running the Ubuntu OS as expected. Expanding the disks shows our additional 50Gi disks added to the VM.
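One way to verify the result from the command line, rather than the vSphere UI, is a sketch like the following: list the VMs in the Supervisor namespace, then check the OS image on the nodes from inside the workload cluster.

# From the Supervisor cluster context:
kubectl -n utilities get vm

# After logging in to the clusterclass-01 workload cluster:
kubectl get nodes -o wide   # the OS-IMAGE column should show Ubuntu for the gpuworkers nodes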
These are just a couple of examples of straightforward customizations that can be made using the ClusterClass configuration, new in vSphere 8. More extensive customization can happen post-deployment through Tanzu package management and will be covered in a future blog post.