February 16, 2022

Deploying Kubernetes clusters with GitOps and ArgoCD

vSphere with Tanzu can deploy K8s clusters via K8s CustomResourceDefinitions (CRDs), allowing you to define your K8s clusters declaratively, just like you define your K8s workloads. In this blog, we'll take a look at the process.

Introduction

vSphere with Tanzu has the ability to deploy K8s clusters via K8s CustomResourceDefinitions (CRDs) - K8s managing K8s, if you like. For those new to the space, this might seem a little arcane, but there are very good reasons why we built it this way. For starters, it lets you define your K8s clusters declaratively, just like you define your K8s workloads. The benefit is that your K8s clusters are constantly reconciled against the desired state stored in the K8s API, including through upgrades, expansions, shrink operations and so on.

One of the major advantages of managing K8s clusters through K8s CRDs is the ability to use a GitOps model to manage your entire estate, including workload configuration, secrets, even the K8s clusters themselves! The high-level pitch is that GitOps allows you to manage your entire app stack - or indeed anything managed through the K8s API - in a version-controlled, declarative, scalable manner that self-heals.

Sounds cool? Good! Let’s take a look at how to do it with ArgoCD and vSphere with Tanzu.

Requirements

We are going to assume you have a few things already set up:

  • An ArgoCD instance (probably deployed on a TKG cluster)
  • vSphere with Tanzu on vSphere 7.0 U3+
  • K8s CLI tools (kubectl, helm)
  • Some standard CLI tools (jq, xargs, base64)

To allow ArgoCD to manage K8s clusters on vSphere with Tanzu, it needs to be able to access the Supervisor cluster API and authenticate with it. In 7.0 U3 we added the ability to create K8s ServiceAccounts and RoleBindings on the Supervisor cluster, which allows ArgoCD to authenticate with the Supervisor cluster using a consistent, non-changing token.

This ServiceAccount is created, using your regular login and kubectl, in the vSphere with Tanzu Namespace that you want ArgoCD to manage objects in - so let's take a look at how to do that first.

Authenticate with the Supervisor cluster

The first thing we need to do is authenticate ourselves with the Supervisor cluster and target the Namespace we’re going to use for ArgoCD. So let’s login:

$ kubectl vsphere login --server https://10.198.53.128 --insecure-skip-tls-verify -u administrator@vsphere.local

Logged in successfully.

You have access to the following contexts:
   10.198.53.128
   myles-ns-01-10.198.53.128

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

Let’s change into the Namespace that we’re targeting (pro tip: kubectx/kubens are great tools for managing contexts quickly):

$ kubectl config use-context myles-ns-01-10.198.53.128
Switched to context "myles-ns-01-10.198.53.128".

Create a ServiceAccount

Now that we're targeting the Namespace we want to use, let's create a ServiceAccount and a RoleBinding for ArgoCD to use:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: myles-ns-01
  name: argocd-robot
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argocd-edit-binding
  namespace: myles-ns-01
subjects:
- kind: ServiceAccount
  name: argocd-robot
  namespace: myles-ns-01
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: ""
EOF

serviceaccount/argocd-robot created
rolebinding.rbac.authorization.k8s.io/argocd-edit-binding configured

We have just created an account for ArgoCD to use, and K8s will automatically create a secret token for the account and store it in a Secret object for us to retrieve and use later.
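
You can confirm that the token Secret was generated by describing the ServiceAccount - the Tokens field should list a Secret named something like argocd-robot-token-xxxxx (the suffix will differ in your environment):

$ kubectl describe serviceaccount argocd-robot -n myles-ns-01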

The second part of the manifest is the RoleBinding. This takes the ServiceAccount (argocd-robot) and binds it to an already existing ClusterRole on the cluster - in the spirit of least privilege, we're binding it to the edit role, but you could equally bind it to the cluster-admin role should you so wish.
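
As a quick sketch of that alternative (not required for this walkthrough), the only change would be the roleRef stanza in the manifest above - and because this is a RoleBinding, it still only grants those rights within the Namespace it lives in:

roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io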

Retrieve the Secret token

At this stage, we can retrieve the Secret that K8s created for the ServiceAccount and use it to build out an authentication Secret that ArgoCD will use to target this Namespace with:

$ kubectl get serviceaccounts argocd-robot -o json | jq -r '.secrets[] .name' | xargs -I {} sh -c "kubectl get secret -o json {} | jq -r '.data .token'" | base64 -d

eyJhbGciOiJSUzI1NiIsImtpZCI6Imw3V3RXLS1jQ3YzTTJVbVMtVXl0a0loT2NTRmVYZTFqUjFtX2w0TnM4d1kifQeyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlY-rest-of-the-token-here-rHKrcxCguqbx8d0O_QNX7eoyCuWjHEd4mg2r49lpFq_vaGjvnPVzYfIph15k7fgNCgX2Dxti0GtZ_fzVKoQ0rJY035f9awE0X7vkihSI6zoOclzFfev-_ldvLbmmxy2tYPsEbRNU4WTL4REdNBkN7A

That looks like a lot, but bit by bit it's quite simple: we retrieve the ServiceAccount details and output them as JSON, query the JSON with jq to get the name of the Secret associated with the account, then use xargs to run a shell command that retrieves the Secret data as JSON, use jq again to get the token, and finally decode the base64-encoded token and output it to the shell.
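
If you'd rather skip the pipeline gymnastics, a roughly equivalent one-liner using kubectl's jsonpath output (assuming the ServiceAccount has exactly one token Secret, as it does here) is:

$ kubectl get secret $(kubectl get serviceaccount argocd-robot -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 -d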

This is the token that we need to use in our ArgoCD config.

Constructing the ArgoCD config

ArgoCD allows you to feed it configuration for clusters via a Secret object, so let's create that on the cluster and namespace where ArgoCD is running - for me, I've deployed ArgoCD onto a TKG cluster, so I'll change to that context:

kubectl config use-context tkc-3

Change into the namespace that you have ArgoCD deployed into:

kubens argocd

First off, we need to build the configuration that ArgoCD is expecting. We need the token from the ServiceAccount we created earlier, as well as the Namespace that we want to target and the server URL that we're going to use to connect to the cluster - in my case, that info looks like this:

  • Token: eyJhbGci........BkN7A
  • Namespace: myles-ns-01
  • Server: https://10.198.53.128:6443

We need to plug this info into the below format for ArgoCD (your token will be a lot longer, I’ve shortened mine for brevity):

{
  "bearerToken": "eyJhbGciOiJSUzI1NiIsI......EdNBkN7A",
  "tlsClientConfig": {
    "insecure": true
  }
}

Let's build out the Secret that ArgoCD will use to connect to the cluster. Of particular note here is the label argocd.argoproj.io/secret-type: cluster - this is what tells ArgoCD that it should use this object to connect to the cluster (the stringData.name can be whatever you want the cluster to be referred to as within ArgoCD):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  labels:
    argocd.argoproj.io/secret-type: cluster
  name: sv-cluster-argocd-ns
  namespace: argocd
type: Opaque
stringData:
  config: |
    {
      "bearerToken": "eyJhbGciOiJSUz......_fg6jiqwUQZLUrg",
      "tlsClientConfig": {
        "insecure": true
      }
    }
  name: sv-clu01-argocd-ns
  namespaces: myles-ns-01
  server: https://10.198.53.128:6443
EOF

With the Secret created, you should now see the Namespace show up in your ArgoCD instance. Not to worry if it doesn't show Successful yet - we have to make some changes to the ArgoCD config to make it honour the Supervisor cluster's restrictive RBAC policies.

ArgoCD Cluster screen
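
If you have the argocd CLI installed and logged into your ArgoCD instance, you can also sanity-check the registration from the command line - the new server URL should be listed alongside the default in-cluster entry:

$ argocd cluster list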

Configuring ArgoCD

Because the Supervisor cluster uses a least-privilege approach to permissions, we need to configure ArgoCD to only reconcile and watch the Resources and CustomResourceDefinitions that we are allowed access to - I've pre-created a list of CRDs that is reasonable to allow and will let you configure pretty much everything through GitOps.

In the argocd-cm ConfigMap object that holds the configuration for the ArgoCD server, add the following keys to the data section (replacing https://10.198.53.128:6443 with your SV cluster server URL from above) - if you are using Helm, you can find my config here:

resource.exclusions: |
  - apiGroups:
    - "*"
    kinds:
    - PodTemplate
    clusters:
    - https://10.198.53.128:6443
resource.inclusions: |
  - apiGroups:
    - "run.tanzu.vmware.com"
    kinds:
    - TanzuKubernetesAddon
    - TanzuKubernetesCluster
    - TanzuKubernetesRelease
    - TkgServiceConfiguration
    clusters:
    - https://10.198.53.128:6443
  - apiGroups:
    - "cluster.x-k8s.io"
    kinds:
    - Cluster
    - Machine
    - MachineDeployment
    - MachineHealthCheck
    - MachineSet
    clusters:
    - https://10.198.53.128:6443
  - apiGroups:
    - "vmoperator.vmware.com"
    kinds:
    - "*"
    clusters:
    - https://10.198.53.128:6443
  - apiGroups:
    - "controlplane.cluster.x-k8s.io"
    - "bootstrap.cluster.x-k8s.io"
    kinds:
    - KubeadmControlPlane
    - KubeadmConfig
    - KubeadmConfigTemplate
    clusters:
    - https://10.198.53.128:6443
  - apiGroups:
    - "infrastructure.cluster.vmware.com"
    kinds:
    - "*"
    clusters:
    - https://10.198.53.128:6443
  - apiGroups:
    - "rbac.authorization.k8s.io"
    kinds:
    - Role
    - RoleBinding
    clusters:
    - https://10.198.53.128:6443
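
If you're not managing ArgoCD through Helm, the simplest way to get these keys in place (assuming a default install, where the ConfigMap is named argocd-cm and lives in the argocd namespace) is to edit it directly and add them under the data section:

$ kubectl edit configmap argocd-cm -n argocd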

You may need to delete the ArgoCD server pod to get it to take the new config:

kubectl delete pod -l app.kubernetes.io/name=argocd-server

Creating TKG Clusters with ArgoCD

At this stage, ArgoCD should be successfully connected to the Supervisor cluster Namespace, and we are now able to actually deploy a TKG cluster with it!

ArgoCD requires two things to manage K8s apps: the Application object and the manifests you wish to deploy. I have a full GitOps repo that runs on my Tanzu cluster that you can see the config for here, but to keep things simple, we're going to start off with a single Application that just points to a folder from which we will deploy all manifests.

The Application object tells ArgoCD where to look for manifests, and what cluster to deploy and manage them on - so let's create one that targets our SV Namespace. The comments I've added explain what the critical sections do, and you should change them to suit your needs:

kubectl apply -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tanzu-clusters
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    # The repository that your TKC manifests are stored in - I'm using my GitOps repo here
    repoURL: https://github.com/mylesagray/tanzu-cluster-gitops.git
    # What version do you want ArgoCD to deploy from the repo? HEAD is the latest
    targetRevision: HEAD
    # What folder in the repo do you want to deploy manifests from?
    # You can see mine here: https://github.com/mylesagray/tanzu-cluster-gitops/tree/master/manifests/tanzu-clusters
    path: manifests/tanzu-clusters
    # What type of folder is this, and do we want to recurse into subfolders?
    # ArgoCD supports many types, including Helm, jsonnet, bare YAML, and more.
    # In this example we're using bare YAML to keep it simple, but my GitOps repo above has
    # many more complex examples and uses ArgoCD's app-of-apps pattern if you want them.
    directory:
      recurse: true
  destination:
    # The cluster to deploy the manifests to - this is the Supervisor cluster
    server: 'https://10.198.53.128:6443'
    # The Namespace to deploy the manifests to - this is the Supervisor cluster Namespace
    namespace: myles-ns-01
  syncPolicy:
    # Allow ArgoCD to automatically clean up leftovers if the manifests are removed
    # and self-heal any issues that arise
    automated:
      prune: true
      selfHeal: true
EOF

As you can see from the above, we are telling ArgoCD to deploy from the manifests/tanzu-clusters folder in my repo, and to deploy to the myles-ns-01 Namespace on the Supervisor cluster.

Inside the manifests/tanzu-clusters folder is a single manifest that defines a TKC:

apiVersion: run.tanzu.vmware.com/v1alpha2
kind: TanzuKubernetesCluster
metadata:
  name: tap-cluster-01
  namespace: myles-ns-01
spec:
  distribution:
    version: v1.21
  settings:
    storage:
      classes:
      - vsan-default-storage-policy
      defaultClass: vsan-default-storage-policy
  topology:
    controlPlane:
      replicas: 3
      storageClass: vsan-default-storage-policy
      vmClass: best-effort-large
    nodePools:
    - name: workers
      replicas: 4
      storageClass: vsan-default-storage-policy
      vmClass: best-effort-large

You obviously can - and should - change the above to meet your environment specs. If you choose to deploy my manifest using my repo as the source, that's fine - but be aware that any time I change my GitOps repo, ArgoCD will sync the changes from the repo to your cluster. As an example, if I delete the tap-cluster-01.yaml manifest from that folder, the TKC will be deleted from your environment.

It is highly advisable that you fork my repo and instead target the fork in your ArgoCD config, so that you have full control over everything and can see how adding, removing, or adjusting things results in ArgoCD updating your environment to match that state.

If we’ve done everything correctly - you should now have a TKC deployed in your SV cluster, matching the spec you had in your manifest:

TKC Cluster in vCenter

As you update the manifest in your repo - ArgoCD will automatically reconcile the changes to the cluster, so try adding, removing or changing node sizes to get a feel for the workflow!
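
To follow the reconciliation from the CLI, you can watch the TKC object from the Supervisor Namespace context and, once it reports ready, log into the new cluster with the vSphere plugin - a rough sketch, substitute your own cluster name, Namespace and server address:

# Watch the cluster build out and reconcile (from the Supervisor Namespace context)
$ kubectl get tanzukubernetescluster tap-cluster-01 -n myles-ns-01 -w

# Once it's ready, log into the new TKC
$ kubectl vsphere login --server https://10.198.53.128 --insecure-skip-tls-verify \
    -u administrator@vsphere.local \
    --tanzu-kubernetes-cluster-name tap-cluster-01 \
    --tanzu-kubernetes-cluster-namespace myles-ns-01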

If you have any questions - feel free to reach out to @mylesagray on Twitter.
