TKG 2 - What is ClusterClass and why do I need it?

November 07, 2022

vSphere 8 with TKG 2 introduces a new API resource called ClusterClass. The casual Kubernetes user may struggle to understand what this is or why it's needed. TKG is an implementation of the open source Kubernetes Cluster API specification, which defines an upstream-aligned, standard way to manage the lifecycle of Kubernetes clusters. Before ClusterClass was added, Cluster API provided no native way to provision multiple clusters with the same configuration. DevOps users would create a basket of underlying resources to define the machines on a particular infrastructure and to bootstrap the cluster. VMware simplified this experience by implementing a single cluster resource called TanzuKubernetesCluster (TKC), whose specification is resolved into all of the underlying custom resources required by Cluster API. The TKC specification is specific to deploying a Kubernetes cluster into an environment managed by a vSphere Supervisor Cluster. Essentially, the TKC is a VMware-decorated cluster.

ClusterClass is a collection of templates that define a cluster's topology and configuration. A ClusterClass can be associated with one or more clusters and is used to continually reconcile every cluster associated with it. Although it covers much more than compute sizing, you can think of it as a T-shirt size for an entire cluster, in the same way that a VMClass specifies CPU and memory for a VM. The DevOps user can now write a much simpler cluster specification that references the ClusterClass. Unlike the TKC, this specification is more portable across infrastructure. ClusterClass aims to provide a consistent way to provision machines, load balancers, and so on across infrastructure, and to improve automation around deploying, scaling, upgrading, and deleting clusters.
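To make the "one class, many clusters" idea concrete, here is a minimal Python sketch (not part of TKG). It stamps out several Cluster manifests that all reference the same ClusterClass and differ only in name and size. The field layout mirrors the Cluster manifests shown later in this post, with clusterNetwork omitted for brevity. The helper `make_cluster`, the cluster names, and the dev/test/prod sizes are hypothetical.

```python
import json

# Hypothetical helper: builds a Cluster (cluster.x-k8s.io/v1beta1) manifest that
# references a ClusterClass by name. Only per-cluster details are filled in here;
# everything else is supplied by the referenced class.
def make_cluster(name, namespace, class_name, version, cp_replicas, pools, variables):
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "Cluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "topology": {
                "class": class_name,
                "version": version,
                "controlPlane": {"replicas": cp_replicas},
                "workers": {
                    "machineDeployments": [
                        {"class": "node-pool", "name": pool, "replicas": replicas}
                        for pool, replicas in pools
                    ]
                },
                "variables": [{"name": k, "value": v} for k, v in variables.items()],
            }
        },
    }

# Three clusters, one ClusterClass: (control plane replicas, worker replicas) per environment.
sizes = {"dev": (1, 1), "test": (1, 2), "prod": (3, 3)}
clusters = [
    make_cluster(
        name=f"cc-{env}",
        namespace="tkg",
        class_name="tanzukubernetescluster",
        version="v1.23.8---vmware.2-tkg.2-zshippable",
        cp_replicas=cp,
        pools=[("node-pool-1", workers)],
        variables={"vmClass": "best-effort-small", "storageClass": "k8s-storage-policy"},
    )
    for env, (cp, workers) in sizes.items()
]

if __name__ == "__main__":
    for c in clusters:
        print(json.dumps(c, indent=2))
```

You could save each printed manifest to a file and apply it with `kubectl apply -f`, which accepts JSON as well as YAML.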

Tanzu Kubernetes Cluster Specification and Reconciliation

You can think of the TKC as the legacy approach to creating a Tanzu Kubernetes Cluster. It defines a custom resource of kind: TanzuKubernetesCluster. A controller purpose-built by VMware reconciles this resource and instantiates the underlying resources needed by Cluster API. As a result, this specification cannot be reconciled on infrastructure that lacks this controller. In other words, it works only on vSphere with Tanzu Supervisor Clusters. The specification below is not comprehensive. One example of its proprietary syntax is the settings section, where the supported overlay networking is defined for the TKC.

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkc-cluster
  namespace: tkg
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-small
      storageClass: k8s-policy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    nodePools:
    - name: tkg-cluster-nodeool-1
      replicas: 2
      failureDomain: zone1
      vmClass: best-effort-medium
      storageClass: k8s-policy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
  settings:
    network:
      cni:
        name: antrea
      pods:
        cidrBlocks:
        - 100.96.0.0/11
      services:
        cidrBlocks:
        - 100.64.0.0/13

 

When the TKC is deployed, you can see that it is reconciled into a Cluster custom resource, a set of Machine resources, and finally VirtualMachine resources. Many other custom resources are created and reconciled but are not shown here. If you are interested in a more comprehensive discussion of the TKC and Cluster API, check out this blog series: tkc deep dive

 

ubuntu@cli-vm:~/demo-applications$ kubectl get tkc,cluster,machines,vm
NAME                                                      CONTROL PLANE   WORKER   TKR NAME                              AGE   READY   TKR COMPATIBLE   UPDATES AVAILABLE
tanzukubernetescluster.run.tanzu.vmware.com/tkc-cluster   1               2        v1.23.8---vmware.2-tkg.2-zshippable   19m   True    True

NAME                                   PHASE         AGE   VERSION
cluster.cluster.x-k8s.io/tkc-cluster   Provisioned   19m   v1.23.8+vmware.2

NAME                                                                                CLUSTER       PROVIDERID                                       PHASE         AGE   VERSION
machine.cluster.x-k8s.io/tkc-cluster-qp4qq-5hlv6                                    tkc-cluster   vsphere://423f39fd-642e-60d1-2c60-edc360c5566e   Provisioned   19m   v1.23.8+vmware.2
machine.cluster.x-k8s.io/tkc-cluster-tkg-cluster-nodeool-1-475gt-5f895d8d79-2hxkc   tkc-cluster   vsphere://423f39fd-642e-60d1-2c60-edc360c5566e   Provisioned   19m   v1.23.8+vmware.2
machine.cluster.x-k8s.io/tkc-cluster-tkg-cluster-nodeool-1-475gt-5f895d8d79-pdsl7   tkc-cluster   vsphere://423f39fd-642e-60d1-2c60-edc360c5566e   Provisioned   19m   v1.23.8+vmware.2

NAME                                                                        POWERSTATE   AGE
virtualmachine.vmoperator.vmware.com/tkc-cluster-hsc8h-xc8m4                poweredOn    19m
virtualmachine.vmoperator.vmware.com/tkc-cluster-nodeoop-1-defsa            poweredOn    19m
virtualmachine.vmoperator.vmware.com/tkc-cluster-qp4qq-5hlv6                poweredOn    19m
ubuntu@cli-vm:~/demo-applications$
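The number of Machines in this output follows directly from the TKC topology. There is one Machine per control plane replica plus one per worker replica across all node pools, and here each Machine is backed by one VirtualMachine. A minimal Python sketch of that arithmetic, with the TKC spec transcribed by hand into a dict. The function name `expected_machines` is hypothetical.

```python
# TKC topology from the manifest above, reduced to the fields that determine
# how many Machines (and, in this output, VirtualMachines) get created.
tkc_topology = {
    "controlPlane": {"replicas": 1},
    "nodePools": [
        {"name": "tkg-cluster-nodeool-1", "replicas": 2},
    ],
}

def expected_machines(topology):
    """One Machine per control plane replica plus one per node pool replica."""
    control_plane = topology["controlPlane"]["replicas"]
    workers = sum(pool["replicas"] for pool in topology["nodePools"])
    return control_plane + workers

print(expected_machines(tkc_topology))  # 3
```

The result (1 + 2 = 3) matches the three Machine rows and three VirtualMachine rows shown above.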

 

ClusterClass Specification and Reconciliation

The first thing you will notice in the ClusterClass-based specification is that you are not creating a TanzuKubernetesCluster custom resource. You are creating a resource of kind: Cluster. The other custom resources (Machines, VirtualMachines, etc.) are created much as they are during TKC reconciliation, but there is no TKC resource. In the topology section of the Cluster specification, notice the reference to class: tanzukubernetescluster. This is the ClusterClass that defines the cluster details. The tanzukubernetescluster ClusterClass is the default ClusterClass deployed when you enable vSphere with Tanzu. Because a cluster only references its ClusterClass, changes to ClusterClass templates can change the cluster topology without meaningful changes to the individual cluster specification files.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: cc-01
  namespace: tkg
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["198.51.100.0/12"]
    pods:
      cidrBlocks: ["192.0.2.0/16"]
    serviceDomain: "cluster.local"
  topology:
    class: tanzukubernetescluster
    version: v1.23.8---vmware.2-tkg.2-zshippable
    controlPlane:
      metadata:
        annotations:
          #  run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu # This optional annotation is to be used to make OS selection in case the TKR referenced supports multiple OSImages. If omitted the default OS "Photon" is going to be used for the cluster
      replicas: 1
    workers:
      # node pools
      machineDeployments:
        - class: node-pool
          name: node-pool-1
          failureDomain: zone-1
          replicas: 1
        - class: node-pool
          name: node-pool-2
          failureDomain: zone-2
          replicas: 1
        - class: node-pool
          name: node-pool-3
          failureDomain: zone-3
          replicas: 1
    variables:
      - name: vmClass
        value: best-effort-small
      # default storageclass for control plane and node pool
      - name: storageClass
        value: "k8s-storage-policy"
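Before applying a manifest like this, it is worth sanity-checking the clusterNetwork ranges. The sketch below uses Python's standard `ipaddress` module to check that the pod and service ranges do not overlap, and to flag any value with host bits set. Both ranges in this example have host bits set: 198.51.100.0/12 normalizes to 198.48.0.0/12, and 192.0.2.0/16 normalizes to 192.0.0.0/16, so a strict parser rejects them. This sketch does not establish whether Cluster API or the Supervisor normalizes or rejects such values, so treat the result as advisory. The function names are hypothetical.

```python
import ipaddress

def parse(cidr):
    # strict=False masks off host bits instead of raising ValueError,
    # so "198.51.100.0/12" is read as 198.48.0.0/12.
    return ipaddress.ip_network(cidr, strict=False)

def has_host_bits(cidr):
    # An interface keeps the literal address; compare it with the masked network address.
    iface = ipaddress.ip_interface(cidr)
    return iface.ip != iface.network.network_address

def check_cluster_network(services, pods):
    svc, pod = parse(services), parse(pods)
    return {
        "services": str(svc),
        "pods": str(pod),
        "overlap": svc.overlaps(pod),
        "host_bits_set": [c for c in (services, pods) if has_host_bits(c)],
    }

# Values copied from the clusterNetwork block of the cc-01 manifest.
print(check_cluster_network("198.51.100.0/12", "192.0.2.0/16"))
# Values from the earlier TKC settings section, for comparison.
print(check_cluster_network("100.64.0.0/13", "100.96.0.0/11"))
```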

The tanzukubernetescluster ClusterClass is immutable. However, one way to customize a cluster is to use variables. Variables are name:value pairs defined in the variables section of the Cluster spec. The storageClass and vmClass variables are required to provision the cluster, but you can specify additional variables to further customize it. A comprehensive list of the available variables can be found here. Variables can also be specified in the workers and controlPlane scopes to override a cluster-wide variable for that part of the cluster. For example, best-effort-small could be set as the cluster-wide default vmClass, while guaranteed-large is specified for the worker nodes in the node pool named compute.

workers:
  machineDeployments:
  - class: node-pool
    name: compute
    replicas: 1
    variables:
      overrides:
      - name: vmClass
        value: guaranteed-large
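The precedence rule can be modeled in a few lines. A node pool gets the cluster-wide value for each variable unless it lists an override with the same name, in which case the override wins. This is a simplified Python sketch of that rule, not the Cluster API topology controller's actual implementation. The storageClass value comes from the cc-01 manifest above, and the function name `effective_variables` is hypothetical.

```python
# Cluster-level variables and one machineDeployment, transcribed from the
# examples above as plain dicts.
cluster_variables = [
    {"name": "vmClass", "value": "best-effort-small"},
    {"name": "storageClass", "value": "k8s-storage-policy"},
]

compute_pool = {
    "class": "node-pool",
    "name": "compute",
    "replicas": 1,
    "variables": {"overrides": [{"name": "vmClass", "value": "guaranteed-large"}]},
}

def effective_variables(cluster_vars, pool):
    """Start from the cluster-wide values, then replace any overridden by name."""
    values = {v["name"]: v["value"] for v in cluster_vars}
    for override in pool.get("variables", {}).get("overrides", []):
        values[override["name"]] = override["value"]
    return values

print(effective_variables(cluster_variables, compute_pool))
# {'vmClass': 'guaranteed-large', 'storageClass': 'k8s-storage-policy'}
print(effective_variables(cluster_variables, {"name": "default-pool"}))
# {'vmClass': 'best-effort-small', 'storageClass': 'k8s-storage-policy'}
```

A pool with no overrides simply inherits every cluster-wide value, which is why the cluster-level variables act as the defaults.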

 

ClusterClass Custom Configuration

 

Let's put the pieces together and deploy a cluster with a few customizations. We want our cluster to run Photon OS on the control plane nodes, but Ubuntu on the gpuworkers node pool. We also plan to run some applications that require GPUs, so we will use a GPU-enabled vmClass for some of the workers and a non-GPU vmClass for the others. Our GPU-enabled nodes will need a couple of large additional volumes, so we will configure those as well. There are many other customizations we could add, and some examples are here. This is the entire manifest for the cluster. Below it, we break out the relevant sections for our customizations.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster    # We are creating a Cluster
metadata:
  name: clusterclass-01
  namespace: utilities
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["198.51.100.0/12"]
    pods:
      cidrBlocks: ["192.0.2.0/16"]
    serviceDomain: "isvlab.vmware.com"
  topology:
    class: tanzukubernetescluster
    version: v1.23.8---vmware.2-tkg.2-zshippable
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
      - class: node-pool
        name: gpuworkers     
        metadata:
          annotations:
            run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu # This optional annotation is to be used to make OS selection in case the TKR referenced supports multiple OSImages. If omitted the default OS "Photon" is going to be used for the cluster
        replicas: 1
        variables:
          overrides:      
            - name: nodePoolVolumes
              value:  
                - name: containerd
                  mountPath: /var/lib/containerd
                  storageClass: kubernetes-policy
                  capacity:
                    storage: 50Gi
                - name: kubelet
                  mountPath: /var/lib/kubelet
                  storageClass: kubernetes-policy
                  capacity:
                    storage: 50Gi
            - name: vmClass
              value: gpuclass-a100
      - class: node-pool
        name: non-gpuworkers-1
        replicas: 1
      - class: node-pool
        name: non-gpuworkers-2
        replicas: 1
    variables:
    - name: vmClass
      value: best-effort-small
      # default storageclass for control plane and node pool
    - name: storageClass
      value: "kubernetes-policy"
    - name: nodePoolVolumes
      value: []

 

This is a standard cluster deployment manifest until we get to the definition of the worker nodes. We want workers grouped into pools that support GPUs and pools that do not, which we accomplish by defining node pools. For our customizations we will focus on the gpuworkers node pool. The first thing to note is that clusters default to Photon OS for the machine image unless we add an annotation specifying another OS. We plan to run the NVIDIA GPU Operator in our cluster, which requires Ubuntu, so we add the run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu annotation. Our GPU-enabled nodes also need additional storage volumes, which we add through the nodePoolVolumes override. Overrides replace the default cluster configuration with specific changes for part of the cluster. Whatever you override must also be declared as a variable at the Cluster level. Note the empty nodePoolVolumes declaration in the cluster-level variables above:

- name: nodePoolVolumes
  value: []
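The rule that every override needs a matching Cluster-level declaration is easy to check before applying. Below is a minimal Python sketch over the relevant parts of the manifest above, transcribed by hand. The function name `undeclared_overrides` is hypothetical, and this is a local pre-flight check only.

```python
# Relevant parts of the clusterclass-01 topology, transcribed by hand.
# Override values are trimmed: only the variable names matter for this check.
topology = {
    "variables": [
        {"name": "vmClass", "value": "best-effort-small"},
        {"name": "storageClass", "value": "kubernetes-policy"},
        {"name": "nodePoolVolumes", "value": []},
    ],
    "workers": {"machineDeployments": [
        {"name": "gpuworkers", "variables": {"overrides": [
            {"name": "nodePoolVolumes", "value": []},  # volume list trimmed
            {"name": "vmClass", "value": "gpuclass-a100"},
        ]}},
        {"name": "non-gpuworkers-1"},
        {"name": "non-gpuworkers-2"},
    ]},
}

def undeclared_overrides(topology):
    """(pool, variable) pairs that override a variable the Cluster never declares."""
    declared = {v["name"] for v in topology.get("variables", [])}
    missing = []
    for md in topology.get("workers", {}).get("machineDeployments", []):
        for override in md.get("variables", {}).get("overrides", []):
            if override["name"] not in declared:
                missing.append((md["name"], override["name"]))
    return missing

print(undeclared_overrides(topology))  # []

# Dropping the empty cluster-level declaration leaves the override undeclared.
without_decl = dict(topology, variables=[
    v for v in topology["variables"] if v["name"] != "nodePoolVolumes"])
print(undeclared_overrides(without_decl))  # [('gpuworkers', 'nodePoolVolumes')]
```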

We want machines in this node pool to be placed on hosts with specific GPU resources, and we need a GPU PCIe device configured on the VM. For both reasons, we must associate an appropriate vmClass with this node pool. We override the default best-effort-small VM class with gpuclass-a100 for this node pool. That's it. Now we can deploy the cluster.

- class: node-pool
  name: gpuworkers
  metadata:
    annotations:
      run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu # This optional annotation is to be used to make OS selection in case the TKR referenced supports multiple OSImages. If omitted the default OS "Photon" is going to be used for the cluster
  replicas: 1
  variables:
    overrides:
      - name: nodePoolVolumes
        value:
          - name: containerd
            mountPath: /var/lib/containerd
            storageClass: kubernetes-policy
            capacity:
              storage: 50Gi
          - name: kubelet
            mountPath: /var/lib/kubelet
            storageClass: kubernetes-policy
            capacity:
              storage: 50Gi
      - name: vmClass
        value: gpuclass-a100

 

Cluster Deployment

After applying the manifest with the kubectl apply -f manifest.yaml command, we can see the VMs deployed, with the gpuworkers node running Ubuntu as expected. Expanding the disks shows the additional 50Gi disks attached to the VM.

[Screenshots: the deployed cluster VMs, and the expanded disks of the gpuworkers VM]
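To know what to look for after deployment, you can reduce the manifest to a per-pool summary: OS, VM class, and extra disks, plus the total VM count (3 control plane + 3 workers = 6). Below is a minimal Python sketch, assuming the relevant parts of the clusterclass-01 manifest are transcribed into a dict by hand. The names `summarize`, `os_for`, and `volume` are hypothetical, "photon" is only a label for the default OS, and parsing the annotation value as comma-separated key=value pairs is an assumption.

```python
# Relevant parts of the clusterclass-01 manifest, transcribed by hand.
OS_ANNOTATION = "run.tanzu.vmware.com/resolve-os-image"

def volume(name, mount_path):
    return {"name": name, "mountPath": mount_path,
            "storageClass": "kubernetes-policy", "capacity": {"storage": "50Gi"}}

topology = {
    "controlPlane": {"replicas": 3},
    "variables": [
        {"name": "vmClass", "value": "best-effort-small"},
        {"name": "storageClass", "value": "kubernetes-policy"},
        {"name": "nodePoolVolumes", "value": []},
    ],
    "workers": {"machineDeployments": [
        {
            "name": "gpuworkers",
            "metadata": {"annotations": {OS_ANNOTATION: "os-name=ubuntu"}},
            "replicas": 1,
            "variables": {"overrides": [
                {"name": "nodePoolVolumes", "value": [
                    volume("containerd", "/var/lib/containerd"),
                    volume("kubelet", "/var/lib/kubelet"),
                ]},
                {"name": "vmClass", "value": "gpuclass-a100"},
            ]},
        },
        {"name": "non-gpuworkers-1", "replicas": 1},
        {"name": "non-gpuworkers-2", "replicas": 1},
    ]},
}

def os_for(md, default_os="photon"):
    """OS requested by the resolve-os-image annotation, else the default (Photon)."""
    selector = md.get("metadata", {}).get("annotations", {}).get(OS_ANNOTATION, "")
    # Assumption: the value is comma-separated key=value pairs such as "os-name=ubuntu".
    pairs = dict(p.split("=", 1) for p in selector.split(",") if "=" in p)
    return pairs.get("os-name", default_os)

def summarize(topology):
    """Per-pool (name, replicas, os, vmClass, extra disks) after applying overrides."""
    defaults = {v["name"]: v["value"] for v in topology["variables"]}
    rows = []
    for md in topology["workers"]["machineDeployments"]:
        values = dict(defaults)
        for override in md.get("variables", {}).get("overrides", []):
            values[override["name"]] = override["value"]
        disks = [f'{v["name"]} {v["capacity"]["storage"]} at {v["mountPath"]}'
                 for v in values["nodePoolVolumes"]]
        rows.append((md["name"], md["replicas"], os_for(md), values["vmClass"], disks))
    return rows

total_vms = topology["controlPlane"]["replicas"] + sum(
    md["replicas"] for md in topology["workers"]["machineDeployments"])

for name, replicas, os_name, vm_class, disks in summarize(topology):
    print(f"{name}: {replicas} x {vm_class} ({os_name}), extra disks: {disks or 'none'}")
print("total VMs:", total_vms)
```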

 

These are just a couple of examples of straightforward customizations that can be made with ClusterClass configuration, new in vSphere 8. More extensive customization can happen post-deployment through Tanzu package management and will be covered in a future blog post.
