Tanzu Proof of Concept Guide

vSphere
Tanzu
7.0u2

POC Guide Overview

The purpose of this document is to act as a simple guide for proof of concepts involving vSphere with Tanzu as well as VMware Cloud Foundation (VCF) with Tanzu.

This document is intended for data center cloud administrators who architect, administer and deploy VMware vSphere and VMware Cloud Foundation technologies. The information in this guide is written for experienced data center cloud administrators.

This document is not a replacement for official product documentation; rather, it should be treated as a structured guide that augments existing guidance throughout the lifecycle of a proof-of-concept exercise. Official documentation supersedes the guidance documented here if there is a divergence between this document and product documentation.

Statements in this document regarding supported capabilities, minimums and maximums should be cross-checked against the official VMware configuration maximums at https://configmax.vmware.com/, which may contain more recent updates or amendments to what is stated here.

This document is laid out into several distinct sections to make the guide more consumable depending on the use case and proof of concept scenario:

 

Section 1: Overview & Setup
Product information and getting started

Section 2: App Deployment & Testing
Use-case defined testing with examples

Section 3: Lifecycle Operations
Scaling, upgrades and maintenance

Section 4: Monitoring
Essential areas of focus to monitor the system

 

A GitHub repository with code samples to accompany this document is available at:
https://github.com/vmware-tanzu-experiments/vsphere-with-tanzu-proof-of-concept-samples

 

Overview and Setup

In this guide we detail the two networking options available in vSphere with Tanzu, namely vSphere or NSX-T networking. With the latter, we show how VMware Cloud Foundation with Tanzu can be utilised to quickly stand up a private cloud with Tanzu enabled.

Note that Tanzu itself comes in three different flavours, or ‘Editions’, see https://tanzu.vmware.com/tanzu.

 

Document Scope

This POC guide describes the following topics & tasks to build and manage Kubernetes container platforms on vSphere with Tanzu.

Architectural choices

  • Architectural choices for Network Stack

VI-Admin tasks

  • Setting up the network stack (we explore the available options: VDS with NSX ALB, VDS with HAProxy, and NSX-T)
  • Creating Content Library in vSphere
  • Enabling vSphere Cluster HA
  • Enabling Workload Management
  • Deploying a Supervisor Cluster
  • Creating Namespaces
  • Creating SPBM storage policies and assigning them to Namespaces
  • Setting up a standalone Harbor Image repository

Platform Management tasks

  • Creating a Tanzu Kubernetes Cluster (TKC aka guest cluster)
  • Deploying Sample Workloads on TKC
  • Installing Tanzu Extensions (CertManager, Contour, Fluentbit, Prometheus, Grafana)

App Deployment 

  • App deployment & Testing

Terminology

For readability, the following abbreviations and terms are used throughout this document.

K8S: Kubernetes
LCM: Lifecycle management, including Day 0, Day 1 and Day 2 operations
TKG-S: Tanzu Kubernetes Grid Service, also known as vSphere with Tanzu
vNamespace: vSphere Namespace, introduced in vSphere 7 to provide multi-tenancy. A vSphere Namespace is a vSphere construct that segregates the resources belonging to a particular tenant; it is not a Kubernetes namespace
TKG Cluster: Tanzu Kubernetes Grid cluster, an upstream-conformant Kubernetes cluster created for DevOps workloads
TKC: Tanzu Kubernetes Cluster, a synonym for TKG cluster
Guest Cluster: Another synonym for TKG cluster, used to denote that the cluster sits outside of vSphere primitives and that its lifecycle management is independent of vSphere LCM
VDS: vSphere Distributed Switch (defined and managed by vCenter)

Architectural choices

Network Stack

The network stack is responsible for connecting the Kubernetes nodes and for providing load balancing for the Kubernetes control plane and container workloads. VMware offers two options for the networking stack on which vSphere with Tanzu can be built.


Note: At the time of writing (vSphere 7.0u2), Supervisor Services (the vSphere Pod service, the built-in image registry service, etc.) are available only when the stack is built with NSX-T SDN.

(Option-1) VDS

In this model, a vSphere Distributed Switch (VDS) provides the network connectivity for the Kubernetes cluster nodes in both the Supervisor cluster and the Tanzu Kubernetes (guest) clusters. The NSX Advanced Load Balancer (AVI) provides load balancing for the Kubernetes control planes and for container workloads. Note: AVI is the default load balancer shipped with vSphere with Tanzu; however, customers can instead bring their own load balancer (for example, HAProxy).

(Option-2) NSX-T:

In this model, NSX-T SDN serves all the networking needs of the stack: the Kubernetes cluster node network, the container network, load balancing for the control plane and for workload apps, and layer-7 ingress for workload apps. In addition, NSX-T enables Supervisor Services (vSphere Pods, the image registry service, etc.), network security policies between namespaces, Kubernetes clusters and nodes, and other advanced SDN features. Note: VMware recommends NSX-T as the network choice, as it enables the complete set of enterprise-grade features in an all-in-one network solution.

Container Network Interface (CNI)

A CNI provides connectivity and network policy for pods on a Kubernetes cluster. Kubernetes itself provides only the API; the platform team must deploy a CNI-compatible network solution such as Antrea, Calico or Flannel. The CNI maintains a clear separation between the container network and the infrastructure network. Antrea is the VMware-recommended, default CNI solution and is delivered out of the box with Tanzu Kubernetes clusters. Note: as an alternative to Antrea, customers can use a CNI of their own choice, for example Calico.

Antrea & NSX-T    

In addition to the required network features, such as the Kubernetes pod and Service networks and network policies, Antrea provides advanced network policies and out-of-the-box integration with NSX-T. This direct integration allows NSX-T to reconcile Antrea features into NSX-T and vice versa.

With Antrea as the CNI and NSX-T as the network stack, customers benefit from enterprise-grade network policy management and a single interface for managing all network policies: for VMs, Kubernetes nodes and container workloads, as well as cross-cluster and cross-namespace policies.

Load Balancer

On a Kubernetes cluster, a load balancer is used for two main purposes: (1) to access a multi-node Kubernetes control plane, and (2) to access Kubernetes Service objects of type LoadBalancer, which are served by the backend apps. vSphere with Tanzu includes the NSX Advanced Load Balancer Essentials edition (AVI) at no additional cost; however, it also allows you to bring your own load balancer, for example HAProxy.
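For reference, below is a minimal sketch of a Kubernetes Service of type LoadBalancer, the kind of object that the load balancer described above fulfils; the name, app label and ports are illustrative:

web-lb.yaml

apiVersion: v1
kind: Service
metadata:
  name: web-lb
spec:
  type: LoadBalancer        # fulfilled by NSX ALB, HAProxy or NSX-T, depending on the chosen stack
  selector:
    app: web                # assumes pods labelled app=web exist
  ports:
  - port: 80
    targetPort: 8080

Applying such a manifest causes the load balancer to allocate an address from its configured range and publish it in the Service's EXTERNAL-IP column.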

vSphere with Tanzu — vSphere Networking

Here, we will describe the setup of vSphere with Tanzu using vSphere Networking, with both the NSX Advanced Load Balancer (ALB) and the open-source HaProxy options.

 


 

Getting Started

The basic steps and requirements to get started with vSphere with Tanzu are shown below. For more information, please refer to the official documentation.


1. Network Requirements

In vCenter, configure a vDS with at least two port groups for ‘Management’ and ‘Workload Network’.


 

The following IP addresses are required:

Management Network:

5x consecutive routable IP addresses for Workload Management, plus one for the network appliance (i.e. either NSX ALB or HaProxy)

Workload Network:

For simplicity, one /24 routable network (which will be split into subnets). In the example below, we will use the network 172.168.161.0/24 with 172.168.161.1 as the gateway.

Next, decide on the network solution to be used, either:

2(a) NSX ALB, or
2(b) HaProxy


2(a) NSX Advanced Load Balancer Configuration

In vSphere 7.0 Update 2, a new option for load balancer is available. The NSX Advanced Load Balancer (NSX ALB) also known as AVI, provides a feature-rich and easy to manage load balancing solution. The NSX ALB is available for download in OVA format from my.vmware.com.

 


Below, we will briefly run through the steps to configure the NSX ALB. For full instructions, please refer to the documentation, https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-AC9A7044-6117-46BC-9950-5367813CD5C1.html

The download link will redirect you to the AVI Networks Portal. Select the VMware Controller OVA:


For more details on download workflow, see https://kb.vmware.com/s/article/82049?lang=en_US

Once the OVA has been downloaded, proceed to your vCenter and deploy the OVA by supplying a management IP address.

Note, supplying a sysadmin login authentication key is not required.


Once the appliance has been deployed and powered on, login to the UI using the supplied management IP/FQDN. Note, depending on the version used, the UI will vary. At the time of writing, the latest version available is 20.1.5.

Create username and password. Email is optional.


Add supplemental details, such as DNS, passphrase, etc.


Next, the Orchestrator needs to be set to vSphere. Select ‘Infrastructure’ from the menu on the top left:


Then select ‘Clouds’ from the menu at the top:


Edit ‘Default-Cloud’ – on the pop-up window, navigate to ‘select cloud’ and set the orchestrator to ‘VMware’.


Follow the screens to supply the username, password and vCenter information so that the NSX ALB can connect to vCenter. For permissions, leave “Write” selected, as this will allow for easier deployment and automation between ALB and vCenter. Leave SDN Integration set to “None”.

Finally, on the Network tab, under ‘Management Network’, select the workload network as previously defined on the vDS. Provide the IP subnet, gateway, and IP address pool to be utilized. This IP pool is a range of IP addresses to be used by the Service Engine (SE) VMs.

Note, in a production environment, a separate 'data network' for the SEs may be desired. For more information, see the documentation, https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-489A842E-1A74-4A94-BC7F-354BDB780751.html

Here, we have created a block of 99 addresses in the workload network, from our /24 range:


After the initial configuration, we will need to either import a certificate or create a self-signed certificate to be used in Supervisor cluster communication. For the purposes of a PoC, a self-signed certificate should suffice.

Navigate to Administration by selecting this option from the drop-down menu on the upper left corner.

In the administration pane, select Settings and edit the System Access Settings by clicking on the pencil icon:


Remove the default certificates under ‘SSL/TLS Certificate’. Then click on the caret underneath to expand the options and click on the green ‘Create Certificate’ box.


Create a self-signed certificate by providing the required information. You can add Subject Alternate Names if desired. Note, ensure the IP address of the appliance has been captured, either in the Name or in a SAN.


For more information on certificates, including creating a CSR, see the AVI documentation, https://avinetworks.com/docs/20.1/ssl-certificates/

Next, we need to create an IPAM profile. This is needed to tell the controller to use the frontend (VIP) network when allocating VIPs via IPAM.
Navigate to Templates > Profiles > IPAM/DNS Profiles > Create


In the IPAM profile, set the cloud for the usable network to ‘Default-Cloud’, and set the usable network to the VIP network, in this case DSwitch-wld:


 

At this stage, if you have successfully deployed the NSX ALB, proceed to step 3.

 

2(b) HaProxy Configuration

As an alternative to the NSX ALB, VMware has packaged HaProxy in a convenient OVA format, which can be downloaded and deployed quickly. This is hosted on GitHub: https://github.com/haproxytech/vmware-haproxy

In the simplest configuration, the HA Proxy appliance will need a minimum of two interfaces, one on the ‘Management’ network and the other on a ‘Workload’ network, with a  static IP address in each. (An option to deploy with three networks, i.e. with an additional ‘Frontend’ network is also available but is beyond the scope of this guide).

Below we will go through the basic setup of HaProxy and enabling Workload Management to quickly get started.

First, download and configure the latest HaProxy OVA from the GitHub site.

Here, we will use the ‘Default’ configuration, which will deploy the appliance with two network interfaces:


The two port groups for Management and Workload Network should be populated with the appropriate values. The Frontend network can be ignored:


Use the following parameters as a guide, substituting the workload network for your own.

As per the table below, we subnet the Workload network to a /25 for the load-balancer IP ranges in step 3.1. In addition, the HaProxy will require an IP for itself in the workload network.

Step   Property                          Example Value

1.2    Permit Root Login                 True
2.1    Host Name                         <Set a Host Name>
2.2    DNS                               <DNS Server>
2.3    Management IP                     <IP in Mgmt range>
2.4    Management Gateway                <Mgmt Gateway>
2.5    Workload IP                       172.168.161.3
2.6    Workload Gateway                  172.168.161.1
3.1    Load Balancer IP Ranges (CIDR)    172.168.161.128/25
3.2    Dataplane API Management Port     5556
3.3    HaProxy User ID                   admin
3.4    HaProxy Password                  <set a password>

N.B.: Take special care with step 3.1; this must be in CIDR format. Moreover, it must cover the ‘IP Address Ranges for Virtual Servers’ which will be used later to enable Workload Management in vCenter (see below). Note that the vCenter wizard will require the range defined here in a hyphenated format: from the example above, 172.168.161.128/25 covers the range 172.168.161.129-172.168.161.240
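If in doubt about the bounds of a CIDR block, they can be computed on the jump host; a quick sketch, assuming python3 is available (the command prints the first and last usable host addresses of the block, and any hyphenated range entered in the wizard must fall within these bounds):

# python3 -c "import ipaddress; n = ipaddress.ip_network('172.168.161.128/25'); print(n[1], '-', n[-2])"
172.168.161.129 - 172.168.161.254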

 

3. TKG Content Library

Before we can start the Workload Management wizard, we first need to set up the TKG content library to pull the TKG VM images from the VMware repository. The vCenter where the TKG content library is created must have internet access in order to connect to the repository.

Create a subscribed content library (Menu > Content Libraries > Create New Content Library) pointing to the URL:

https://wp-content.vmware.com/v2/latest/lib.json


For the detailed procedure, see the documentation: https://via.vmw.com/tanzu_content_library

 

4. Load Balancer Certificate

The first step is to obtain the certificate from the deployed network appliance.

For NSX ALB, export the certificate from the ALB UI by going to Templates > Security > SSL/TLS Certificates. Select the self-signed certificate you created and export it.


 


Copy the certificate and make a note of it for the steps below.

If using the HaProxy appliance, log into it using SSH. List the contents of the file /etc/haproxy/ca.crt.
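Alternatively, the certificate presented by either appliance can be captured from the jump host with openssl, assuming it is installed (replace the address with your appliance’s management IP, and the port with the one serving the certificate, e.g. 443 for NSX ALB):

# openssl s_client -connect <appliance mgmt IP>:443 -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM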

 

5. Configure Workload Management

In vCenter, ensure that DRS and HA are enabled for the cluster and a storage policy for the control plane VMs exists. In a vSAN environment, the default vSAN policy can be used.

Navigate to Menu > Workload Management and click ‘Get Started’ to start the wizard.


 


 

Below we’ll focus on the networking, i.e. step 5 onwards in the wizard. For more details, please see the documentation, https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-8D7D292B-43E9-4CB8-9E20-E4039B80BF9B.html

Use the following as a guide, again, replacing values for your own:

Load Balancer:

Name*: lb1
Type: NSX ALB | HaProxy
Data plane API Address(es): <NSX ALB mgmt IP>:443 | <HaProxy mgmt IP>:5556
Username: admin
Password: <password from appliance>
IP Address Ranges for Virtual Servers^ : 172.168.161.129–172.168.161.240
Server Certificate Authority: <cert from NSX ALB or HaProxy>

* Note that this is a Kubernetes construct, not the DNS name of the HaProxy appliance.
^ HaProxy only. This must be within the CIDR range defined in step 3.1 of the HaProxy configuration
 

Management Network:

Network: <mgmt port group>
Starting IP: <first IP of consecutive range>
Subnet: <mgmt subnet>
Gateway: <management gateway>
DNS: <dns server>
NTP: <ntp server>

 

Workload Network:

Name: <any you choose>
Port Group: <workload port group>
Gateway: 172.168.161.1
Subnet: 255.255.255.0
IP Address Ranges*:  172.168.161.20–172.168.161.100

* These must not overlap with the load-balancer addresses

 

Note, it may be useful to use a tool such as ‘arping’ or ‘nmap’ to check where IPs are being used. For example:

# arping -I eth0 -c 3 10.156.163.3
ARPING 10.156.163.3 from 10.156.163.10 eth0
Unicast reply from 10.156.163.3 [00:50:56:9C:5A:F5]  0.645ms
Unicast reply from 10.156.163.3 [00:50:56:9C:5A:F5]  0.891ms
Unicast reply from 10.156.163.3 [00:50:56:9C:5A:F5]  0.714ms
Sent 3 probes (1 broadcast(s))
Received 3 response(s)
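Similarly, a ping sweep with nmap (if installed) lists the hosts already responding in the workload network, which helps avoid assigning addresses that are in use:

# nmap -sn 172.168.161.0/24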

 

vSphere with Tanzu — NSX-T Networking

Overview

In this section, we show how to quickly deploy vSphere with Tanzu and NSX-T using VMware Cloud Foundation (VCF). NSX provides a container plug-in (NCP) that interfaces with Kubernetes to automatically serve networking requests (such as ingress and load balancer) from NSX Manager. For more details on NCP, visit: https://via.vmw.com/ncp.

In addition, NSX-T networking enables two further elements: ‘vSphere Pods’ and a built-in version of the Harbor registry. The vSphere Pod service enables services from VMware and partners to run directly on top of ESXi hosts, providing a performant, secure and tightly integrated Kubernetes environment.

For more details on vSphere Pods see https://via.vmw.com/vsphere_pods and https://blogs.vmware.com/vsphere/2020/04/vsphere-7-vsphere-pod-service.html

 


 

Once the VCF environment with SDDC manager has been deployed (see https://docs.vmware.com/en/VMware-Cloud-Foundation/index.html for more details), Workload Management can be enabled. Note that both standard and consolidated deployments can be used.

Getting Started

Below is a summary of the detailed steps found in the VCF POC Guide.

First, in SDDC Manager, click on Solutions; this should show “Kubernetes – Workload Management”. Click on Deploy; this will show a window with the deployment pre-requisites, i.e.:

  • Hosts are licenced correctly
  • An NSX-T based Workload Domain has been provisioned
  • NTP and DNS have been set up correctly
  • NSX Edge cluster deployed with a ‘large’ form factor
  • The following IP addresses have been reserved for use:
    • non-routable /22 subnet for pod networking
    • non-routable /24 subnet for Kubernetes services
    • two routable /27 subnets for ingress and egress
    • 5x consecutive IP addresses in the management range for Supervisor services

 

Clicking on Begin will start the Kubernetes deployment wizard.


Select the appropriate cluster from the drop-down box. Click on the radio button next to the compatible cluster and click on Next:


The next screen will go through some validation checks.

Check that the validation succeeds. After clicking on Next again, check the details in the final Review window:


Click on Complete in vSphere to continue the wizard in vCenter.

Ensure the correct cluster has been pre-selected:


 

To show the Storage section, click on Next. Select the appropriate storage policies for the control plane, ephemeral disks and image cache:


Click on Next to show the review window. Clicking on Finish will start the supervisor deployment process:


For an interactive guide of the steps above, visit:

https://core.vmware.com/delivering-developer-ready-infrastructure#step_by_step_guide_to_deploying_developer_ready_infrastructure_on_cloud_foundation_isim_based_demos

 

TKG Content Library

To later set up Tanzu Kubernetes clusters, we first need to set up the TKG content library to pull the TKG VM images from the VMware repository.

Create a subscribed content library (Menu > Content Libraries > Create New Content Library) pointing to the URL:

https://wp-content.vmware.com/v2/latest/lib.json


For the detailed procedure, see the documentation: https://via.vmw.com/tanzu_content_library

 

 

Supervisor Cluster Setup

After the process has been completed, navigate to Cluster > Monitor > Namespaces > Overview to ensure the correct details are shown and the health is green. Note that whilst the operations are in progress, there may be ‘errors’ shown on this page, as it is monitoring a desired state model:


 

Configure Supervisor Cluster Namespace(s) with RBAC

Once the supervisor cluster has been configured, a namespace should be created in order to set permissions, storage policies, capacity limitations and other settings. In Kubernetes, a namespace is a collection of resources such as containers, disks, etc.

To create a namespace, navigate to Menu > Workload Management > Click on Namespaces > New Namespace.
Fill in the necessary fields and click create.


 

The new namespace area will be presented. This is where permissions, storage policies and other options can be set.


After clicking the “Got It” button, the summary will show a widget where permissions can be set.


Click on Add Permissions and fill in the necessary fields. It is important to note that the user/group to be added to this namespace should have already been created ahead of time. This can be an Active Directory user/group (see  https://via.vmw.com/ad_setup) or ‘vsphere.local’:


After adding permissions, the summary screen will show who has permissions and of what type. Clicking the Manage Permissions link will take you to the Permissions tab for this namespace.


From the permissions tab, you can add/remove/edit permissions for a particular namespace. Thus, here we can enable access for a developer to be able to consume the namespace.


 

Configure Supervisor Cluster Namespace(s) Storage Policy

First, configure any storage policies as needed, either by defining a VM storage policy (as is the case for vSAN) or by tagging an existing datastore. Note that vSAN comes with a default storage policy ‘vSAN Default Storage Policy’ that can be used without any additional configuration.

To create a VM storage policy, navigate to Menu > Policies and Profiles > VM Storage Policies and click on ‘Create’. Follow the prompts for either a vSAN storage policy or tag-based policy under ‘Datastore Specific rules’.


To create a tag-based VM storage policy, reference the documentation: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.storage.doc/GUID-D025AA68-BF00-4FC2-9C7E-863E5787E743.html

Once a policy has been created, navigate back to the namespace and click on ‘Add Storage’:


Select the appropriate storage policy to add to the namespace:


 

 

 

Configure Supervisor Cluster Namespace(s) with Resource Limitations

Resource limitations such as CPU, memory, and storage can be tied to a namespace. Under the namespace, click on the Configure tab and select Resource Limits.


By clicking on the edit button, resources can be limited for this specific Namespace. Resource limitations can also be set at the container level.


Note that under the Configure tab, it is also possible to limit objects such as Replica Sets, Persistent Volume Claims (PVC), and network services among others.
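For reference, once you are able to log in to the supervisor namespace with kubectl (see the Lab VM Setup section later in this guide), the limits configured here are expected to surface as standard Kubernetes ResourceQuota and LimitRange objects and can be inspected from the CLI; a quick sketch, with the namespace name being illustrative:

# kubectl get resourcequota,limitrange -n ns01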


 

 

 

Lab VM Setup

Whilst many of the operations in this guide can be performed on a standard end-user machine (be it Windows, MacOS or Linux), it is a good idea to deploy a jump host VM, which has the tools and configuration ready to work with. A Linux VM is recommended.

Conveniently, there is a TKG Demo Appliance fling that we can leverage for our purposes. Download and deploy the OVA file from the link below (look for the ‘offline download’ of the TKG Demo Appliance OVA): https://via.vmw.com/tkg_demo

Note that throughout this guide, we use Bash as the command processor and shell. 

 

Downloading the kubectl plugin

See https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-0F6E45C4-3CB1-4562-9370-686668519FCA.html

Once a namespace has been created (see steps above), a command-line utility (kubectl-vsphere) needs to be downloaded to be able to login to the namespace. First, navigate to the namespace in vCenter: Menu > Workload Management > Namespace then select ‘Copy link’:


This will provide the VIP address needed to login to the namespace. Make a note of this address. Then on your jump VM, download the zip file ‘vsphere-plugin.zip’, either using a browser or via wget, pointing to https://<VIP>/wcp/plugin/linux-amd64/vsphere-plugin.zip
 

For example:

# wget https://172.168.61.129/wcp/plugin/linux-amd64/vsphere-plugin.zip --no-check-certificate

Unzip this file and place the contents in the system path (such as /usr/local/bin). The zip file contains two files, namely kubectl and kubectl-vsphere. Remember to set execute permissions.
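For example, a sketch of the install steps on the jump VM (the zip typically extracts into a bin/ directory; adjust the paths if your download differs):

# unzip vsphere-plugin.zip
# mv bin/kubectl bin/kubectl-vsphere /usr/local/bin/
# chmod +x /usr/local/bin/kubectl /usr/local/bin/kubectl-vsphere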

To log into a namespace on the supervisor cluster, issue the following command, replacing the VIP IP with your own:

# kubectl vsphere login --server=172.168.61.129 --insecure-skip-tls-verify

Use the credentials of the user added to the namespace to log-in.

Note that the ‘insecure’ option needs to be specified unless the appropriate TLS certificates have been installed on the jump host. For more details see the ‘Shell Tweaks’ sub-section below.

Once logged in, perform a quick check to verify the health of the cluster using ‘kubectl cluster-info’:

# kubectl cluster-info
Kubernetes master is running at https://172.168.61.129:6443
KubeDNS is running at https://172.168.61.129:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

 

 

Shell Tweaks (optional)

In order to have a better experience (with less typing and mistakes) it’s advisable to spend a little time further setting up our lab VM.

Installing Certificates:

In order to setup trust with vCenter, and to avoid skipping the TLS verify step on every login, we need to download the certificate bundle and copy the certificates to the appropriate location.

The outline procedure for this is given in https://kb.vmware.com/s/article/2108294 with more details here, https://via.vmw.com/tanzu_tls

First, we download the certificate bundle from vCenter and unzip it:

# wget --no-check-certificate https://vCenter-lab/certs/download.zip
# unzip download.zip

 

Then copy the certificates to the correct location. This is determined by the operating system, in the case of the TKG Appliance / Photon OS, it is /etc/ssl/certs:

# cp certs/lin/* /etc/ssl/certs

Finally, either use an OS utility to update the system certificates, or reboot the system.

 

Password as an environment variable:

We can store the password used to login to the supervisor cluster in an environment variable. This can then be combined with the login command for quicker/automated logins, for example (here we have also installed the certificates, thus we have a shorter login command):

# export KUBECTL_VSPHERE_PASSWORD=P@ssw0rd
# kubectl vsphere login --vsphere-username administrator@vsphere.local --server=https://172.168.161.101

For autocomplete:

# source <(kubectl completion bash)
# echo "source <(kubectl completion bash)" >> ~/.bashrc

To set the alias of kubectl to just ‘k’:  

# echo "alias k='kubectl'" >> ~/.bashrc
# complete -F __start_kubectl k

 

YAML validator

It is a good idea to get any manifest files checked for correct syntax, etc. before applying. Tools such as ‘yamllint’ (or similar, including online tools) validate files quickly and detail where there may be errors.
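For example, yamllint can be installed with pip and run directly against a manifest file (the file name below is illustrative):

# pip3 install yamllint
# yamllint my-app.yaml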

 

For more details and other tools see the following links:
https://kubernetes.io/docs/reference/kubectl/cheatsheet/
https://yamllint.readthedocs.io/

 

 

Tanzu Kubernetes Cluster Deployment

Once the Supervisor cluster has been enabled, and a Namespace created (as above), we can create an upstream-compliant Tanzu Kubernetes Cluster (TKC). This is done by applying a manifest on the supervisor cluster which will define how the cluster is setup. (Note that the terms TKC and TKG cluster are used interchangeably within this guide.)

First, make sure that the Supervisor Namespace has been correctly configured. A content library should have been created to pull down the TKG VMs. In vSphere 7 update 2a there is a further requirement to add a VM class.

Navigating to Hosts and Clusters > Namespaces > [namespace] will give you a view of the information cards. The card labelled ‘Tanzu Kubernetes Grid Service’ should have the name of the content library hosting the TKG VMs.


On the ‘VM Service’ card click on ‘Add VM Class’ to add VM class definitions to the Namespace:


This will bring up a window to enable you to add the relevant VM classes (or to create your own). Select all available classes and add them to the Namespace:


For more details on the sizing see: https://via.vmw.com/tanzu_vm_classes.

Next, we can proceed to login to the supervisor namespace using ‘kubectl vsphere login’. If necessary, use the ‘kubectl config use-context’ command to switch to the correct supervisor namespace.

To get the contexts available (the asterisk shows the current context used):

# kubectl config get-contexts
CURRENT   NAME             CLUSTER           AUTHINFO             NAMESPACE
*         172.168.61.129   172.168.61.129    dev@vsphere.local
          ns01             172.168.61.129    dev@vsphere.local    ns01

And to switch between them:

# kubectl config use-context ns01
Switched to context "ns01".
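While connected to the supervisor namespace, the VM classes bound to it can also be checked from the CLI; a quick sketch, assuming the VM class resources exposed in vSphere 7.0u2 (output will vary with your environment):

# kubectl get virtualmachineclasses
# kubectl get virtualmachineclassbindings -n ns01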

 

If we have set up our TKC content library correctly, we should be able to see the downloaded VM images using the command ‘kubectl get tkr’:

# kubectl get tkr
NAME                                VERSION                      
v1.16.12---vmware.1-tkg.1.da7afe7   1.16.12+vmware.1-tkg.1.da7afe7
v1.16.14---vmware.1-tkg.1.ada4837   1.16.14+vmware.1-tkg.1.ada4837
v1.16.8---vmware.1-tkg.3.60d2ffd    1.16.8+vmware.1-tkg.3.60d2ffd
v1.17.11---vmware.1-tkg.1.15f1e18   1.17.11+vmware.1-tkg.1.15f1e18
v1.17.11---vmware.1-tkg.2.ad3d374   1.17.11+vmware.1-tkg.2.ad3d374
v1.17.13---vmware.1-tkg.2.2c133ed   1.17.13+vmware.1-tkg.2.2c133ed
v1.17.17---vmware.1-tkg.1.d44d45a   1.17.17+vmware.1-tkg.1.d44d45a
v1.17.7---vmware.1-tkg.1.154236c    1.17.7+vmware.1-tkg.1.154236c
v1.17.8---vmware.1-tkg.1.5417466    1.17.8+vmware.1-tkg.1.5417466
v1.18.10---vmware.1-tkg.1.3a6cd48   1.18.10+vmware.1-tkg.1.3a6cd48
v1.18.15---vmware.1-tkg.1.600e412   1.18.15+vmware.1-tkg.1.600e412
v1.18.5---vmware.1-tkg.1.c40d30d    1.18.5+vmware.1-tkg.1.c40d30d
v1.19.7---vmware.1-tkg.1.fc82c41    1.19.7+vmware.1-tkg.1.fc82c41
v1.20.2---vmware.1-tkg.1.1d4f79a    1.20.2+vmware.1-tkg.1.1d4f79a

Thus versions through to v1.20.2 are available to use.

We then need to create a manifest to deploy the TKC VMs. An example manifest is shown below; this will create a cluster called ‘tkgcluster1’ in the ns01 supervisor namespace, consisting of one control-plane node and three worker nodes, with Kubernetes version 1.17.8:

TKG-deploy.yaml

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkgcluster1
  namespace: ns01
spec:
  distribution:
   version: v1.17.8
  topology:
   controlPlane:
      count: 1
      class: guaranteed-small
      storageClass: vsan-default-storage-policy
   workers:
      count: 3
      class: guaranteed-small
      storageClass: vsan-default-storage-policy

 

Let’s dissect this manifest to examine the components:


 

apiVersion / kind: These lines specify the API version and the kind; they should not be modified. To list the available API versions for Tanzu, run ‘kubectl api-versions | grep tanzu’.

metadata: The Tanzu Kubernetes cluster name is defined in the ‘name’ field and the supervisor namespace in the ‘namespace’ field.

spec.distribution.version: The Kubernetes version (v1.17.8) is defined here. This depends on the TKG VM images downloaded into the content library; use the command ‘kubectl get tkr’ to obtain the available versions.

class: The created VMs will use the ‘guaranteed-small’ VM class.

storageClass: The storage policy to be used by the control plane and worker VMs.

For clarity, some fields have been omitted (the defaults will be used). For a full list of parameters, refer to the documentation: https://via.vmw.com/tanzu_params and further manifest file examples: https://via.vmw.com/tanzu_yaml

Once this file has been created, use kubectl to start the deployment. For example, having created the manifest file ‘TKG-deploy.yaml’ (as above), apply it:

# kubectl apply -f TKG-deploy.yaml

The supervisor cluster will create the required VMs and configure the TKC as needed. This can be monitored using the get and describe verbs on the ‘tkc’ noun:

# kubectl get tkc -o wide
NAME          CONTROL PLANE WORKER   DISTRIBUTION                     AGE   PHASE
tkgcluster1   1               1      v1.17.8+vmware.1-tkg.1.5417466   28d   running

 

# kubectl describe tkc
Name:         tkgcluster1
Namespace:    ns01
Labels:       <none>
Annotations:  API Version:  run.tanzu.vmware.com/v1alpha1
Kind:         TanzuKubernetesCluster
.
.
Node Status:
    tkgcluster1-control-plane-jznzb:            ready
    tkgcluster1-workers-fl7x8-59849ddbb-g8qjq:  ready
    tkgcluster1-workers-fl7x8-59849ddbb-jqzn4:  ready
    tkgcluster1-workers-fl7x8-59849ddbb-kshrt:  ready
  Phase:                                        running
  Vm Status:
    tkgcluster1-control-plane-jznzb:            ready
    tkgcluster1-workers-fl7x8-59849ddbb-g8qjq:  ready
    tkgcluster1-workers-fl7x8-59849ddbb-jqzn4:  ready
    tkgcluster1-workers-fl7x8-59849ddbb-kshrt:  ready
Events:                                         <none>

For more verbose output and to watch the cluster being built out, select yaml as the output with the ‘-w’ switch:

# kubectl get tkc -o yaml -w
.
.
  nodeStatus:
    tkc-1-control-plane-lvfdt: notready
    tkc-1-workers-fxspd-894697d7b-nz682: pending
  phase: creating
  vmStatus:
    tkc-1-control-plane-lvfdt: ready
    tkc-1-workers-fxspd-894697d7b-nz682: pending

 

In vCenter, we can see the TKC VMs being created (as per the manifest) within the supervisor namespace:


Once provisioned, we should be able to see the created VMs in the namespace:

# kubectl get wcpmachines
NAME                                    PROVIDERID   IPADDR
tkgcluster1-control-plane-scsz5-2dr55   vsphere://421075449  172.168.61.33
tkgcluster1-workers-tjpzq-gkdn2         vsphere://421019aa  172.168.61.35
tkgcluster1-workers-tjpzq-npw88         vsphere://421055cf  172.168.61.38
tkgcluster1-workers-tjpzq-vpcwx         vsphere://4210d90c  172.168.61.36

 

Once the TKC has been created, login to it by using ‘kubectl vsphere’ with the following options:

# kubectl vsphere login --server=<VIP> \
--insecure-skip-tls-verify \
--tanzu-kubernetes-cluster-namespace=<supervisor namespace> \
--tanzu-kubernetes-cluster-name=<TKC name>

For example:

# kubectl-vsphere login --server=https://172.168.61.129 \
--insecure-skip-tls-verify \
--tanzu-kubernetes-cluster-namespace=ns01 \
--tanzu-kubernetes-cluster-name=tkgcluster1

Login using the user/credentials assigned to the namespace. You can then change contexts between the TKC and the supervisor namespace with the ‘kubectl config’ command (as above).

 

Developer Access to TKCs

Once a TKG cluster has been provisioned, developers will need sufficient permissions to deploy apps and services.

A basic RBAC profile is shown below:

tkc-rbac.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: all:psp:privileged
roleRef:
  kind: ClusterRole
  name: psp:privileged
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io

This can also be achieved using the kubectl command:

# kubectl create clusterrolebinding default-tkg-admin-privileged-binding --clusterrole=psp:vmware-system-privileged --group=system:authenticated

For more information, see the documentation to grant developer access to the cluster: https://via.vmw.com/tanzu_rbac

 

TKG Extension Deployment & Operations

Along with core Kubernetes, the DevOps team needs additional platform tools for connecting, monitoring, and accessing the container workloads running on the cluster, such as a layer-7 ingress, a log forwarder, and observability tools. Most of the platform tools provided by the Tanzu extensions are delivered as CRDs.

Document Scope

In this POC guide, we will walk you through deploying and managing the following container platform tools (TKG extensions) on the TKC clusters. We will also validate our setup by deploying and accessing sample apps. The following platform tools are shipped as part of the TKG extensions bundle:

  • Kapp-controller & CertManager (Pre-requisite, common tools)
  • Contour - Layer 7 Ingress
  • FluentBit - Log forwarder
  • Prometheus - Metric Server
  • Grafana - Metric Dashboard

Download the TKG Extensions v1.3.1 Bundle

The TKG extensions package can be downloaded from my.vmware.com: Product Downloads > VMware Tanzu Kubernetes Grid > Go to Downloads > VMware Tanzu Kubernetes Grid Extension Manifests 1.3.1 > Download Now. In this TKG extensions section, we will use the pre-created CLI VM.

Extract Tanzu Extensions to CLI-VM

Once downloaded, move the tar file to the CLI VM (your Linux box) and untar the package using the following commands:

# tar -xzf tkg-extensions-manifests-v1.3.1-vmware.1.tar.gz

# ls  ./tkg-extensions-v1.3.1+vmware.1/extensions

Deploying TKGExtension Pre-Requisite tools

The TKG extensions require two prerequisite tools: (1) kapp-controller and (2) cert-manager. These two components are used by the other tools in the TKG extensions package.

  • Kapp controller: Reconciles the TKGExtension components.
  • CertManager: Most of the Kubernetes platform components need SSL certificates. Cert-manager adds certificates and certificate issuers as resource types in Kubernetes clusters and simplifies the process of obtaining, renewing, and using those certificates.

Install Kapp Controller

The kapp-controller.yaml file is available in the ./tkg-extensions-v1.3.1+vmware.1/extensions directory.

Apply the kapp-controller.yaml file using kubectl.

This deployment creates the following objects

  • Namespace: tkg-system 
  • ServiceAccount: kapp-controller-sa
  • CRD: apps.kappctrl.k14s.io
  • Deployment: kapp-controller
  • ClusterRole & ClusterRoleBinding: kapp-controller-cluster-role

# cd ./tkg-extensions-v1.3.1/extensions/
# kubectl apply -f kapp-controller.yaml

namespace/tkg-system created
serviceaccount/kapp-controller-sa created
customresourcedefinition.apiextensions.k8s.io/apps.kappctrl.k14s.io created
deployment.apps/kapp-controller created
clusterrole.rbac.authorization.k8s.io/kapp-controller-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/kapp-controller-cluster-role-binding created

Verify kapp-controller object creation

#  kubectl get ns tkg-system
NAME         STATUS   AGE
tkg-system   Active   17m

# kubectl get crd | grep kapp
apps.kappctrl.k14s.io                                              2021-07-06T20:39:04Z

# kubectl get clusterroles -n tkg-system | grep kapp
kapp-controller-cluster-role                                           2021-07-06T20:39:04Z

Verify that the kapp-controller deployment and pods are running

# kubectl get all -n tkg-system
NAME                                  READY   STATUS    RESTARTS   AGE
pod/kapp-controller-bcffd9c44-g5qcc   1/1     Running   0          15m

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kapp-controller   1/1     1            1           15m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/kapp-controller-bcffd9c44   1         1         1       15m

The second prerequisite for the TKG extensions package is cert-manager. The cert-manager installation YAML files are located at ./tkg-extensions-v1.3.1+vmware.1/cert-manager

CertManager installation creates the following objects

  • Namespace: cert-manager 
  • CRDs: Creates multiple CRDs including certificaterequests, certificates, challenges, clusterissuers, issuers, orders.acme
  • Deployment: Creates multiple deployments including cainjector, cert-manager, cert-manager-webhook
  • ClusterRoles & RoleBindings: multiple cert-manager roles and bindings (see the output below)

Go to ./tkg-extensions-v1.3.1+vmware.1 and apply all the files from the cert-manager folder.

# cd ..
# cd ./tkg-extensions-v1.3.1
# kubectl apply -f cert-manager/
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created

Verify CertManager Installation

# kubectl get ns,crd,clusterroles  --all-namespaces | egrep 'cert'
namespace/cert-manager                   Active   81s
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io                                2021-07-06T20:48:29Z
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io                                       2021-07-06T20:48:29Z
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io                                    2021-07-06T20:48:29Z
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io                                     2021-07-06T20:48:29Z
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io                                            2021-07-06T20:48:30Z
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io                                        2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector                                                2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates                                   2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges                                     2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers                                 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim                                   2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers                                        2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders                                         2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-edit                                                      2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-view                                                      2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:certificatesigningrequests:nodeclient       2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:certificatesigningrequests:selfnodeclient   2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:kube-apiserver-client-approver              2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:kube-apiserver-client-kubelet-approver      2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:kubelet-serving-approver                    2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:legacy-unknown-approver                     2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:controller:certificate-controller                               2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:controller:root-ca-cert-publisher                               2021-07-06T20:10:44Z
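As an optional smoke test of the cert-manager installation, a self-signed Issuer and Certificate can be created and checked for a Ready condition. This is a minimal sketch with illustrative names; the apiVersion may need adjusting to match the cert-manager version shipped in the bundle (check with ‘kubectl api-versions | grep cert-manager’):

cert-test.yaml

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: default
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-cert
  namespace: default
spec:
  secretName: test-cert-tls          # cert-manager writes the generated key pair into this secret
  dnsNames:
  - test.example.internal            # illustrative DNS name
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer

# kubectl apply -f cert-test.yaml
# kubectl get certificate test-cert -n default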

Contour Ingress

Contour is a Kubernetes ingress controller that uses the Envoy reverse proxy; deploying it installs both the Contour control plane and the Envoy data plane. Kubernetes itself provides only the Ingress API, hence we deploy Contour as the ingress controller.

The native Kubernetes Ingress API is quite limited and may not serve the traffic-routing and security needs of the DevOps team, which can include multi-team FQDNs, TLS delegation, inclusions, rate limiting, traffic shifting, request rewriting, and out-of-the-box integration with observability tools.

With a few CRDs, including HTTPProxy, TLSCertificateDelegation and ExtensionService, Contour provides advanced ingress and traffic-management features.
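For illustration, below is a minimal sketch of an HTTPProxy manifest of the kind a DevOps user might apply once Contour is installed; the FQDN and backend Service name are placeholders:

web-proxy.yaml

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: web-proxy
  namespace: default
spec:
  virtualhost:
    fqdn: web.example.internal       # placeholder FQDN, which should resolve to the Envoy EXTERNAL-IP
  routes:
  - conditions:
    - prefix: /
    services:
    - name: web                      # assumes a ClusterIP Service named web exists in this namespace
      port: 80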


Contour as the control plane: Contour is the control plane for the ingress solution; it synchronizes user ingress requests with the Envoy proxy, i.e. Contour is the management and configuration server for Envoy.

Envoy as the data plane: Envoy is a high-performance reverse proxy that implements the filters required to fulfil the DevOps team’s ingress object requests. Envoy provides HTTP(S) traffic management and security filters through which the packets flow, plus a rich set of observability features over the traffic.

Detailed information about Contour ingress can be found at https://projectcontour.io/docs

 

Pre-requisites

Ensure the following prerequisites are met:

  • TKC / Guest cluster is ready
  • TKG Extension pre-requisites have been deployed on the TKC (Kapp-controller & Cert-manager)

Configuration & Installation

The Contour ingress installation process will create the following objects:

  • Namespace: tanzu-system-ingress
  • Service Account: contour-extension-sa
  • Deployments/DaemonSets: contour (control plane), envoy (data plane)
  • Secrets: contour-data-values, contour-extension-sa-token-xxx
  • CRDs: HTTPProxy, TLSCertificateDelegation, ExtensionService
  • Pods: contour, envoy

Go to ./tkg-extensions-v1.3.1/extensions/ingress/contour

# cd ./tkg-extensions-v1.3.1/extensions/ingress/contour/

Create namespace, service account and roles by applying namespace-role.yaml file.

# kubectl apply -f namespace-role.yaml
namespace/tanzu-system-ingress created
serviceaccount/contour-extension-sa created
role.rbac.authorization.k8s.io/contour-extension-role created
rolebinding.rbac.authorization.k8s.io/contour-extension-rolebinding created
clusterrole.rbac.authorization.k8s.io/contour-extension-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/contour-extension-cluster-rolebinding created

Create the Contour config file by copying the template file provided in the package:

 # cp vsphere/contour-data-values-lb.yaml.example vsphere/contour-data-values.yaml

Update the data values file, ensuring the correct version of the Envoy image is used (Envoy version v1.17.3_vmware.1 must be used).

Edit contour-data-values.yaml and set the Envoy image tag to v1.17.3_vmware.1:

# cat vsphere/contour-data-values.yaml
#@data/values
#@overlay/match-child-defaults missing_ok=True
---
infrastructure_provider: "vsphere"
contour:
  image:
    repository: projects.registry.vmware.com/tkg
envoy:
  image:
    repository: projects.registry.vmware.com/tkg
    tag: v1.17.3_vmware.1
  service:
    type: "LoadBalancer"

Note: Do not use Envoy image v1.16.2_vmware.1 due to a CVE. Specify v1.17.3_vmware.1 in the configuration as shown. For more information, see the Release Notes.

Create a secret object for contour

# kubectl create secret generic contour-data-values --from-file=values.yaml=vsphere/contour-data-values.yaml -n tanzu-system-ingress

secret/contour-data-values created

Verify the secret object creation

# kubectl get secrets -n tanzu-system-ingress
NAME                               TYPE                                  DATA   AGE
contour-data-values                Opaque                                1      83s
contour-extension-sa-token-8bm88   kubernetes.io/service-account-token   3      16m
default-token-wdtr6                

Deploy Contour app

# kubectl apply -f contour-extension.yaml
app.kappctrl.k14s.io/contour created 

Validate contour app installation

# kubectl get service envoy -n tanzu-system-ingress -o wide
NAME    TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE   SELECTOR
envoy   LoadBalancer   10.103.217.168   10.198.53.141   80:31673/TCP,443:32227/TCP   19m   app=envoy,kapp.k14s.io/app=1625607181111946240

Key points:

  • The Envoy proxy received an EXTERNAL-IP value from the load balancer installed along with the infrastructure.
  • All ingress objects created by the DevOps team will be served by the Envoy proxy, hence external (layer-7) access to any workload on this cluster goes through this EXTERNAL-IP.
  • We will use this EXTERNAL-IP for all ingress (layer-7) communications, as shown in the example below.
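For example, if an HTTPProxy such as the earlier web-proxy sketch has been applied, the app can be reached through the Envoy EXTERNAL-IP either by creating a DNS record for the FQDN or, for a quick test, by overriding the Host header (the hostname is illustrative and the IP is the EXTERNAL-IP from the output above):

# curl -H "Host: web.example.internal" http://10.198.53.141/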

Verify Envoy DaemonSet

# kubectl get daemonsets -n tanzu-system-ingress
NAME    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
envoy   3         3         3       3            3           <none>          24m 

Verify the custom CRDs belonging to Contour

# kubectl get crd | grep -i contour
extensionservices.projectcontour.io                                2021-07-06T21:33:01Z
httpproxies.projectcontour.io                                      2021-07-07T01:57:37Z
tlscertificatedelegations.projectcontour.io                        2021-07-07T01:57:37Z 

Before using ingress in a workload, let's verify the status of the Contour app objects. Make sure all the resources are running and that Envoy has an EXTERNAL-IP:

# kubectl get pod,svc -n tanzu-system-ingress
NAME                          READY   STATUS    RESTARTS   AGE
pod/contour-d968f749d-8tvl4   1/1     Running   0          26m
pod/contour-d968f749d-jmmkm   1/1     Running   0          26m
pod/envoy-2kgxs               2/2     Running   0          26m
pod/envoy-4lmxc               2/2     Running   0          26m
pod/envoy-wm2k9               2/2     Running   0          26m

NAME              TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
service/contour   ClusterIP      10.110.232.3     <none>          8001/TCP                     26m
service/envoy     LoadBalancer   10.103.217.168   10.198.53.141   80:31673/TCP,443:32227/TCP   26m

Fluentbit – Log forwarder

Fluent Bit is an open-source log processor and forwarder which allows you to collect data such as metrics and logs from different sources, enrich them with filters, and send them to multiple destinations.

Installation Scope: Fluent Bit is deployed at the cluster level, i.e. platform operators deploy Fluent Bit on each TKC whose logs should be forwarded.

Configuration & Setup

Design choices:

Fluent Bit supports tens of outputs including Elasticsearch, HTTP, Kafka, Splunk, Syslog, etc. In this example, we will use the "syslog" output and forward the logs to a vRealize Log Insight server.

Pre-requisites

  • TKC / Guest cluster is ready
  • TKG Extension pre-requisites have been deployed on the TKC (kapp-controller & Cert-manager)
  • Log destination is available and reachable from TKC Cluster.
Configuration & Installation

Fluent Bit runs as a DaemonSet (one pod per node) which serves as a log collector, aggregator and forwarder.

Fluentbit installation process will create the following objects

  • Namespace:  tanzu-system-logging
  • Service Account: fluent-bit-extension-sa
  • Roles: fluent-bit-extension-role, fluent-bit-extension-cluster-role

Navigate to Fluentbit installation yaml file

# cd ./tkg-extensions-v1.3.1/extensions/logging/fluent-bit	

Create namespace, service account and roles by applying namespace-role.yaml file

# kubectl apply -f namespace-role.yaml
namespace/tanzu-system-logging created
serviceaccount/fluent-bit-extension-sa created
role.rbac.authorization.k8s.io/fluent-bit-extension-role created
rolebinding.rbac.authorization.k8s.io/fluent-bit-extension-rolebinding created
clusterrole.rbac.authorization.k8s.io/fluent-bit-extension-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/fluent-bit-extension-cluster-rolebinding created

Configure Fluentbit config values

Fluent Bit config values are found in the appropriate sub-folders of the fluent-bit extension. In this document we will use the syslog example.

 # cp syslog/fluent-bit-data-values.yaml.example syslog/fluent-bit-data-values.yaml

List out config values file before updating

# cat syslog/fluent-bit-data-values.yaml
#@data/values
#@overlay/match-child-defaults missing_ok=True
---
logging:
  image:
    repository: projects.registry.vmware.com/tkg
tkg:
  instance_name: "<TKG_INSTANCE_NAME>" 
  cluster_name: "<CLUSTER_NAME>"
fluent_bit:
  output_plugin: "syslog"
  syslog:
    host: "<SYSLOG_HOST>"
    port: "<SYSLOG_PORT>"
    mode: "<SYSLOG_MODE>"
    format: "<SYSLOG_FORMAT>"

Note:

  • instance_name: mandatory but arbitrary; appears in the logs
  • cluster_name: name of the target TKC / guest cluster

Update config file to point to the target log server (VRLI in this example)

  1. # vi syslog/fluent-bit-data-values.yaml
    #@data/values
    #@overlay/match-child-defaults missing_ok=True
    ---
    logging:
      image:
        repository: projects.registry.vmware.com/tkg
    tkg:
      instance_name: "prasad-tkc-clu-01"
      cluster_name: "prasad-clu-01"
    fluent_bit:
      output_plugin: "syslog"
      syslog:
        host: "10.156.134.90"
        port: "514"
        mode: "tcp"
        format: "rfc5424"
    

Create a FluentBit Secret with data values for our log destination

# kubectl create secret generic fluent-bit-data-values --from-file=values.yaml=syslog/fluent-bit-data-values.yaml -n tanzu-system-logging

secret/fluent-bit-data-values created 

Note: Repeat the above two steps (updating the config and creating the secret) for the destination type of your choice, such as Elasticsearch, HTTP, Kafka or Splunk; a hypothetical example is sketched below.
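
As an illustration only, here is a hypothetical data-values sketch for an Elasticsearch destination. The field names under fluent_bit are assumptions modelled on the syslog example above; the authoritative keys are in the corresponding fluent-bit-data-values.yaml.example file shipped in the extension sub-folder, which should be used as the starting point.

#@data/values
#@overlay/match-child-defaults missing_ok=True
---
logging:
  image:
    repository: projects.registry.vmware.com/tkg
tkg:
  instance_name: "prasad-tkc-clu-01"
  cluster_name: "prasad-clu-01"
fluent_bit:
  output_plugin: "elasticsearch"
  elasticsearch:
    host: "<ELASTICSEARCH_HOST>"
    port: "<ELASTICSEARCH_PORT>"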

Verify created Secret

# kubectl get secret -n tanzu-system-logging
NAME                                  TYPE                                  DATA   AGE
default-token-zt8qr                   kubernetes.io/service-account-token   3      21m
fluent-bit-data-values                Opaque                                1      44s
fluent-bit-extension-sa-token-w5826   kubernetes.io/service-account-token   3      21m

Deploy Fluentbit app

# kubectl apply -f fluent-bit-extension.yaml
app.kappctrl.k14s.io/fluent-bit created

Check  Fluentbit app deployment status

# kubectl get app fluent-bit -n tanzu-system-logging
NAME         DESCRIPTION           SINCE-DEPLOY   AGE
fluent-bit   Reconcile succeeded   38s            63s
Note: The status should change from "Reconciling" to "Reconcile succeeded".

Check the pods for the FluentBit app

# kubectl get pods -n tanzu-system-logging
NAME               READY   STATUS    RESTARTS   AGE
fluent-bit-bxqf5   1/1     Running   0          17m
fluent-bit-dpmpf   1/1     Running   0          17m
fluent-bit-h72hp   1/1     Running   0          17m
fluent-bit-r9dq9   1/1     Running   0          17m 

Note: These pod names are useful for troubleshooting Fluent Bit in case of any issues.
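
As a further check, confirm that the Fluent Bit DaemonSet created by the extension is fully scheduled across the worker nodes:

# kubectl get daemonsets -n tanzu-system-logging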

Prometheus Metric Server

Once deployed, Prometheus can scrape metrics from supported resources (such as deployments exposing /metrics or any other accessible API). Many modern apps and tools implement observability patterns such as a /metrics endpoint from which Prometheus can scrape metrics.

Scope: In this section we will deploy the TKG Extension for Prometheus to collect and view metrics for Tanzu Kubernetes clusters. In addition, we will also perform day-1 and day-2 lifecycle management changes.

Pre-Requisites

  • TKC/Guest cluster is available with the default service domain (cluster.local) and a default persistent storage class.
  • On the CLI-VM, the TKG Extensions v1.3.1 package has been downloaded and unpacked.

Note: If there is no default persistent storage class, we can create one and update the persistent storage class name in the Prometheus config file.

Validate default persistent storage class

# kubectl get sc
NAME                                    PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
vsan-default-storage-policy (default)   csi.vsphere.vmware.com   Delete          Immediate           true                   6d 

Connect to TKC

# kubectl vsphere login --server https://${SUPERVISOR_CLUSTER_IP} --insecure-skip-tls-verify  -u ${PRE_USER_NAME} --tanzu-kubernetes-cluster-name  ${TKC_CLUSTER_NAME}
kubectl config use-context ${TKC_CLUSTER_NAME}

Switched to context "prasad-clu-01".

Configuration & Installation  

Prometheus installation process will create the following objects:

  • Namespace: tanzu-system-monitoring
  • Service Account: prometheus-extension-sa
  • Roles: prometheus-extension-role, prometheus-extension-cluster-role
  • Deployment(s): Prometheus creates 4 Deployment objects
  • DaemonSet(s): Prometheus creates 2 DaemonSet objects

Ref: For complete details of Prometheus you can refer to VMware official docs & Prometheus official docs.

Create namespace & roles

# cd  ./tkg-extensions-v1.3.1/extensions/monitoring/prometheus/

# kubectl apply -f namespace-role.yaml
namespace/tanzu-system-monitoring created
serviceaccount/prometheus-extension-sa created
role.rbac.authorization.k8s.io/prometheus-extension-role created
rolebinding.rbac.authorization.k8s.io/prometheus-extension-rolebinding created
clusterrole.rbac.authorization.k8s.io/prometheus-extension-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-extension-cluster-rolebinding created

Customise Configuration / data values

Extract a configuration file:  Prometheus config files are available at ./tkg-extensions-v1.3.1/extensions/monitoring/prometheus/

# cp prometheus-data-values.yaml.example prometheus-data-values.yaml

Note: There is no need to change any default values unless the cluster doesn't have a default storage class, or you wish to use a specific storage class for Prometheus and AlertManager.

We will make the following additions to prometheus-data-values.yaml:

  • ingress: With this section, we will be able to access the Prometheus GUI using our ingress API. For the Prometheus ingress object to be created successfully, Contour (or another ingress controller) must be installed on the cluster.
  • prometheus_server.pvc: To specify the storage class name for the Prometheus PVC object. This entry is needed only if there is no default storage class defined in the cluster.
  • alertmanager.pvc: To specify the storage class name for the AlertManager PVC object. This entry is needed only if there is no default storage class defined in the cluster.

Customize Storage Class for Prometheus & AlertManager

# vi   prometheus-data-values.yaml
monitoring:
  ingress:
    enabled: true
    virtual_host_fqdn: "prometheus.cluster.test"
    prometheus_prefix: "/"
    alertmanager_prefix: "/alertmanager/"  
  prometheus_server:
    image:
      repository: projects.registry.vmware.com/tkg/prometheus
    pvc:
      storage_class: vsan-default-storage-policy
      storage: "8Gi"
  alertmanager:
    image:
      repository: projects.registry.vmware.com/tkg/prometheus
    pvc:
      storage_class: vsan-default-storage-policy
      storage: "8Gi"
  kube_state_metrics:
    image:
      repository: projects.registry.vmware.com/tkg/prometheus
  node_exporter:
    image:
      repository: projects.registry.vmware.com/tkg/prometheus
  pushgateway:
    image:
      repository: projects.registry.vmware.com/tkg/prometheus
  cadvisor:
    image:
      repository: projects.registry.vmware.com/tkg/prometheus
  prometheus_server_configmap_reload:
    image:
      repository: projects.registry.vmware.com/tkg/prometheus
  prometheus_server_init_container:
    image:
      repository: projects.registry.vmware.com/tkg/prometheus

Note: After the objects have been created successfully, don't forget to create a DNS entry or host entry mapping the FQDN (specified in the above config file) to the Envoy proxy EXTERNAL-IP value. As a reminder, all ingress requests on our cluster are served on Envoy's LB IP address.

Create Prometheus secret using the Prometheus-data-values (edited in the previous step) 

# kubectl create secret generic prometheus-data-values --from-file=values.yaml=prometheus-data-values.yaml -n tanzu-system-monitoring

secret/prometheus-data-values created

Deploy Prometheus App

# kubectl apply -f prometheus-extension.yaml
app.kappctrl.k14s.io/prometheus created

Ensure the Prometheus app status turned to  Reconcile succeeded.

# kubectl get app prometheus  -n tanzu-system-monitoring
NAME         DESCRIPTION           SINCE-DEPLOY   AGE
prometheus   Reconcile succeeded   15s            91s

Full details of the Prometheus app configuration can be viewed as follows:

# kubectl get app prometheus -n tanzu-system-monitoring -o yaml
……..

  inspect:
    exitCode: 0
    stdout: |-
      Target cluster 'https://10.96.0.1:443'
      08:47:37PM: debug: Resources: Ignoring group version: schema.GroupVersionResource{Group:"stats.antrea.tanzu.vmware.com", Version:"v1alpha1", Resource:"antreanetworkpolicystats"}
      08:47:37PM: debug: Resources: Ignoring group version: schema.GroupVersionResource{Group:"stats.antrea.tanzu.vmware.com", Version:"v1alpha1", Resource:"networkpolicystats"}
      Resources in app 'prometheus-ctrl'
      Namespace  Name                           Kind                Owner  Conds.  Rs  Ri  Age
      (cluster)  prometheus-alertmanager        ClusterRole         kapp   -       ok  -   1m
      ^          prometheus-alertmanager        ClusterRoleBinding  kapp   -       ok  -   1m
      ^          prometheus-cadvisor            ClusterRole         kapp   -       ok  -   1m
      ^          prometheus-cadvisor            ClusterRoleBinding  kapp   -       ok  -   1m
      ^          prometheus-kube-state-metrics  ClusterRole         kapp   -       ok  -   1m
      ^          prometheus-kube-state-metrics  ClusterRoleBinding  kapp   -       ok  -   1m
      ^          prometheus-node-exporter       ClusterRole         kapp   -       ok  -   1m
……

Let’s check for Deployments & DaemonSet object creation status

# kubectl get daemonsets -n tanzu-system-monitoring
NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
prometheus-cadvisor        3         3         3       3            3           <none>          7m48s
prometheus-node-exporter   3         3         3       3            3           <none>          7m48s

# kubectl get deployments -n tanzu-system-monitoring
NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
prometheus-alertmanager         1/1     1            1           9m43s
prometheus-kube-state-metrics   1/1     1            1           9m43s
prometheus-pushgateway          1/1     1            1           9m43s
prometheus-server               1/1     1            1           9m43s

Let’s check for the PVC objects created by Prometheus & AlertManager

# kubectl get pvc -n tanzu-system-monitoring
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
prometheus-alertmanager   Bound    pvc-c014d1da-1aa2-4f01-a57f-87de57464ca0   2Gi        RWO            vsan-default-storage-policy   19m
prometheus-server         Bound    pvc-39cd774f-9b4c-42cd-b1ad-1042fb3273bb   8Gi        RWO            vsan-default-storage-policy   19m
Accessing Prometheus Web interface

Create a Host entry on your CLI-VM to access Prometheus GUI

echo "10.198.53.141 prometheus.cluster.test" | sudo tee -a /etc/hosts

Access the Prometheus GUI from the CLI-VM web browser; the URIs referenced below are all served by the Prometheus server. Start with https://prometheus.cluster.test/

image 125

Example - Metrics based on the  “node_memory_MemAvailable_bytes”

  • Browse to Prometheus GUI  https://prometheus.cluster.test/graph
  • Type “node_memory_MemAvailable_bytes” in the Expression text box
  • Select Graph Tab in the GUI to see the results in graph view.

image 126

 

Alternatively, you can also choose the tab “Console” on the same UI, which provides the events filtered by the query values.

image 127

To view the configured metrics https://prometheus.cluster.test/metrics

image 128

Cluster Status

Access Prometheus Cluster status GUI using https://prometheus.cluster.test/status

image 129

Grafana – Observability Dashboard

With Grafana we can create, explore and share all of our data through flexible dashboards. In this section we will go through deploying and maintaining Grafana on a TKC with the help of Tanzu Extensions v1.3.1.

Configuration & Installation

Prerequisites

  • TKC/Guest cluster with the default service domain (cluster.local) is up and running
  • On the CLI-VM, the TKG Extensions v1.3.1 package has been downloaded and unpacked

Prepare Configuration

Create grafana-data-values file from the given samples

# cd ./tkg-extensions-v1.3.1/extensions/monitoring/grafana/
# cp grafana-data-values.yaml.example grafana-data-values.yaml

Edit Grafana configuration values

  • Add an entry monitoring.grafana.secret.admin_user with the base64 encoded value YWRtaW4=
  • Replace <ADMIN_PASSWORD> with a base64 encoded password of your choice
  • In this example, we are using admin as the password

Note: Remember this username/password ("admin"/"admin") for accessing the Grafana GUI.

# echo -n admin | base64

YWRtaW4=

# vi grafana-data-values.yaml

#@data/values
#@overlay/match-child-defaults missing_ok=True
---
monitoring:
  grafana:
    image:
      repository: "projects.registry.vmware.com/tkg/grafana"
    secret:
      admin_user: YWRtaW4=
      admin_password: YWRtaW4=
  grafana_init_container:
    image:
      repository: "projects.registry.vmware.com/tkg/grafana"
  grafana_sc_dashboard:
    image:
      repository: "projects.registry.vmware.com/tkg/grafana"

You can use the remaining default values as they are. Alternatively, you can customize the data values according to your deployment needs. The full list of config values can be found in the VMware official documentation.

Create namespace and RBAC roles for Grafana

# kubectl apply -f namespace-role.yaml
namespace/tanzu-system-monitoring unchanged
serviceaccount/grafana-extension-sa created
role.rbac.authorization.k8s.io/grafana-extension-role created
rolebinding.rbac.authorization.k8s.io/grafana-extension-rolebinding created
clusterrole.rbac.authorization.k8s.io/grafana-extension-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/grafana-extension-cluster-rolebinding created

Create a secret object for Grafana

# kubectl -n tanzu-system-monitoring create secret generic grafana-data-values --from-file=values.yaml=grafana-data-values.yaml
secret/grafana-data-values created

Deploy Grafana 

# kubectl apply -f grafana-extension.yaml

app.kappctrl.k14s.io/grafana created

Validate deployment

# kubectl get app grafana -n tanzu-system-monitoring 

NAME      DESCRIPTION   SINCE-DEPLOY   AGE
grafana   Reconcile succeeded 55s            56s

Validate deployment with full config values

# kubectl get app grafana -n tanzu-system-monitoring -o yaml
apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kappctrl.k14s.io/v1alpha1","kin":"App","metadata":{"annotations":{"tmc.cloud.vmware.com/managed":"false"},"name":"grafana","namespace":"tanzu-system-monitoring"},"spec":{"deploy":[{"kapp":{"rawOptions":["--wait-timeout=5m"]}}],"fetch":[{"image":{"url":"projects.registry.vmware.com/tkg/tkg-extensions-templates:v1.3.1_vmware.1"}}],"serviceAccountName":"grafana-extension-sa","syncPeriod":"5m","template":[{"ytt":{"ignoreUnknownComments":true,"inline":{"pathsFrom":[{"secretRef":{"name":"grafana-data-values"}}]},"paths":["tkg-extensions/common","tkg-extensions/monitoring/grafana"]}}]}}
    tmc.cloud.vmware.com/managed: "false"
  creationTimestamp: "2021-07-13T13:11:10Z"
  finalizers:
  - finalizers.kapp-ctrl.k14s.io/delete
  generation: 2
  managedFields:
  - apiVersion: kappctrl.k14s.io/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
…

Accessing Grafana web-interface

The Grafana default configuration has been deployed with the grafana.system.tanzu FQDN. You can validate the object creation as follows:

# kubectl get httpproxy -o wide -n tanzu-system-monitoring

NAME                   FQDN                      TLS SECRET       STATUS   STATUS DESCRIPTION
grafana-httpproxy      grafana.system.tanzu      grafana-tls      valid    Valid HTTPProxy
prometheus-httpproxy   prometheus.cluster.test   prometheus-tls   valid    Valid HTTPProxy 

The FQDN grafana.system.tanzu is served by Envoy's EXTERNAL-IP (the Contour ingress data plane).

Make a host entry on the CLI-VM, or add a DNS A record on your DNS server, mapping Envoy's EXTERNAL-IP to grafana.system.tanzu.

Get Envoy’s EXTERNAL_IP

# kubectl get -n tanzu-system-ingress service envoy -o wide	

NAME    TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE     SELECTOR
envoy   LoadBalancer   10.103.217.168   10.198.53.141   80:31673/TCP,443:32227/TCP   6d20h   app=envoy,kapp.k14s.io/app=1625607181111946240

Add a host entry on the CLI-VM mapping Envoy's EXTERNAL-IP to the FQDN:

# echo "10.198.53.141 grafana.system.tanzu" | sudo tee -a /etc/hosts

Open a browser on your CLI-VM with https://grafana.system.tanzu/login

image 130

Log in to the dashboard using the credentials:

  • User: admin
  • Password: admin

Note: The first-time login will prompt you to change the login password for future use.

A successful login will give us a Grafana welcome page.

 

image 131

Configuring data source

Grafana has two core concepts: (1) data sources and (2) dashboards.

Accessing default data source (Prometheus)

Grafana from TKG Extensions comes with a default data source: the Prometheus instance running on the same TKC/guest cluster. Navigate to the left-side menu panel, then click Settings -> Configuration -> Data Sources.

image 132

Click on the Prometheus row marked as default. Notice that the connection details have already been filled in by default.

image 133

Creating & accessing Dashboard

Create your first dashboard using the web interface

Navigate to Menu->”+”->Create->Dashboard -> Click “+ Add new panel”

image 134

A new panel will be created with empty values.

image 135

  • Ensure the default Prometheus data source has been selected
  • Enter the query "node_memory_MemAvailable_bytes" in the Metrics section and press Shift+Enter to execute the query

image 136

Notice the query results displayed in the graph.

image 137

 

App Deployment and Testing

Contour - Example with Ingress API

Sample app with Layer7 Ingress (HTTP & HTTPS)

The same TKG Extensions package includes a sample app with HTTP and HTTPS examples. Let's use these to validate Contour ingress using the standard K8s Ingress API.

In order to perform the validations, we need to create the following objects:

  • Namespace: a Kubernetes namespace (test-ingress) to hold the sample objects
  • Deployment: the app deployment manifest
  • Services: two services, s1 and s2, which can optionally be used to demonstrate traffic shifting between services
  • Ingress object: a layer-7 access definition. An ingress can be HTTP or HTTPS, and can send all traffic to one service or split it between two services
  • Secrets: TLS certificate and key values, needed only for the HTTPS use case

In the package folder $HOME/tkg-extensions-v1.3.1/ingress/examples, we have three subfolders. The folder named common contains the app with its services, the folder http-ingress contains the HTTP ingress object definition, and the folder https-ingress contains the TLS secret and the HTTPS ingress object.

Deploy the app: Deployment and Service objects. As defined in the YAML, the objects will be created in the test-ingress namespace.

# ls ./common/
00-namespaces.yaml  01-services.yaml  02-deployments.yaml

# kubectl apply -f common/
namespace/test-ingress created
service/s1 created
service/s2 created
deployment.apps/helloweb created

Let's verify the objects we just created (deployment, pods, services, etc.):

# kubectl get all -n test-ingress
NAME                            READY   STATUS    RESTARTS   AGE
pod/helloweb-749c995f85-6zj7s   1/1     Running   0          13m
pod/helloweb-749c995f85-dmf8g   1/1     Running   0          13m
pod/helloweb-749c995f85-j9qmh   1/1     Running   0          13m

NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/s1   ClusterIP   10.96.253.139   <none>        80/TCP    13m
service/s2   ClusterIP   10.107.93.77    <none>        80/TCP    13m

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/helloweb   3/3     3            3           13m

NAME                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/helloweb-749c995f85   3         3         3       13m

Ingress Object for HTTP

# kubectl apply -f ./http-ingress/
ingress.extensions/http-ingress created

image 138

A: Object type. Here we are using “ingress” from the K8S standard API.

B: FQDN name through which this ingress object can be accessed.

C: subdomain/route to access backend workload 

D: Backend service which processes the requests via the ingress path
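
Since the manifest itself is shown above only as a screenshot, the following is a minimal sketch of an equivalent ingress object for the sample app (using the networking.k8s.io/v1beta1 schema; the file shipped in the http-ingress folder may differ in detail):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: http-ingress
  namespace: test-ingress
spec:
  rules:
  - host: foo.bar.com          # B: FQDN used to reach the ingress
    http:
      paths:
      - path: /foo             # C: route to the first backend service
        backend:
          serviceName: s1      # D: backend service for this path
          servicePort: 80
      - path: /bar
        backend:
          serviceName: s2
          servicePort: 80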

Key points:

  • All FQDNs are served using the Envoy (external LB) IP address.
  • We should have either a DNS entry or a host entry added for this FQDN.
  • All Ingress and HTTPProxy objects will share the same IP address, i.e. the Envoy EXTERNAL-IP.

Verify the Envoy LB IP that will be used to access the ingress:

# kubectl get -n tanzu-system-ingress service envoy -o wide
NAME    TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE   SELECTOR
envoy   LoadBalancer   10.103.217.168   10.198.53.141   80:31673/TCP,443:32227/TCP   12h   app=envoy,kapp.k14s.io/app=1625607181111946240

Since we don't yet have an external DNS server to resolve our FQDN, let's add a host entry for the app FQDN so that we can access the app over HTTP using the FQDN directly.

# echo "10.198.53.141 foo.bar.com" | sudo tee -a /etc/hosts
10.198.53.141 foo.bar.com

# cat /etc/hosts
127.0.0.1	localhost
10.198.53.141 foo.bar.com

Access the first service of the app at http://foo.bar.com/foo using curl or a web browser from your CLI-VM:

# curl http://foo.bar.com/foo
Hello, world!
Version: 1.0.0
Hostname: helloweb-749c995f85-j9qmh

image 140

Access the second service of the app using curl or a web browser from your CLI-VM:

# curl http://foo.bar.com/bar
Hello, world!
Version: 1.0.0
Hostname: helloweb-749c995f85-dmf8g

For the HTTPS ingress object we need to create an additional object, a Secret, containing tls.crt and tls.key values. The Secret is a standard K8s object, which we will reference in our HTTPS ingress.

image 141

A: Object type. Here we are using “Secret” from K8S standard API.

B: Reference name for this object

C: TLS Cert value

D: TLS Key value
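
As the manifest is shown only as a screenshot, here is a minimal sketch of such a TLS Secret (the name https-secret matches the object created by the sample manifests; the certificate and key data are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: https-secret                          # B: reference name used by the https-ingress object
  namespace: test-ingress
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>       # C: TLS cert value
  tls.key: <base64-encoded private key>       # D: TLS key value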

Key points:  We will refer to this object in our next object https-ingress.

image 142

A: Reference to the Secret object we created earlier, used by our https-ingress object.

Key points: There is not much difference between the http-ingress and https-ingress objects apart from the important tls section; a sketch of this addition is shown below.
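
As a sketch, the additional section in the https-ingress object looks broadly like the following (the secretName matches the Secret above; the rules section is the same as in the HTTP example):

spec:
  tls:
  - hosts:
    - foo.bar.com
    secretName: https-secret   # A: reference to the Secret created earlier
  rules:
    ...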

# kubectl apply -f https-ingress/
Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
ingress.extensions/https-ingress configured
secret/https-secret unchanged
ingress.extensions/http-ingress created

Since we already have the host entry, we can test the app by accessing https://foo.bar.com/foo using curl or a web browser from your CLI-VM:

# curl https://foo.bar.com/foo --insecure
Hello, world!
Version: 1.0.0
Hostname: helloweb-749c995f85-j9qmh

Let us check from the browser.

Since it is a self-signed certificate, we have to accept the browser's security warning before we get to the page:

image 143

Now let us check the other path, https://foo.bar.com/bar, using curl or a web browser from the CLI-VM:

# curl https://foo.bar.com/bar --insecure
Hello, world!
Version: 1.0.0
Hostname: helloweb-749c995f85-dmf8g

image 144

Prometheus metrics for Custom App

Implementing metrics in custom apps

App owners need to implement a /metrics endpoint (or an equivalent API) in their app in order for it to be scraped by Prometheus. Once that functionality is available in the app, DevOps users can enable metrics collection by adding annotations to the pods; the annotations must be part of the pod metadata.

Note: Adding these annotations to logical objects such as Services or DaemonSets has no effect; they must be set on the pod template metadata.

An example of the annotations applied to a workload's pod template:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: weave
  labels:
    app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9102'
    spec:
      containers:
      - name: fluentd-elasticsearch
        image: gcr.io/google-containers/fluentd-elasticsearch:1.20
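
Once such annotated pods are running, one way to confirm that Prometheus has discovered them is to query its targets API through the ingress FQDN configured earlier; this is a quick check assuming the prometheus.cluster.test host entry is in place and jq is installed on the CLI-VM:

# curl -sk https://prometheus.cluster.test/api/v1/targets | jq -r '.data.activeTargets[].labels.job' | sort -u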

Deploy Kuard to verify setup

A very basic test to see if the K8s cluster is operational is to deploy KUARD (Kubernetes Up And Running demo).

Use the commands below to pull the KUARD image and assign an IP to it. (HaProxy will serve the IP from the workload subnet):

# kubectl run --restart=Never --image=gcr.io/kuar-demo/kuard-amd64:blue kuard
# kubectl expose pod kuard --type=LoadBalancer --name=kuard --port=8080

Once deployed, we can list the external IP assigned to it using the ‘get service’ command:

# kubectl get service
NAME       TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)          AGE
kuard      LoadBalancer   10.96.0.136   152.17.31.132   8080:30243/TCP   6s

 

Therefore, opening a browser to the 'EXTERNAL-IP' on port 8080, i.e. http://152.17.31.132:8080, should give us a webpage showing the KUARD output:


 

Persistent Volume Claims (PVC)

To create a PVC, first we need to map any storage policies (defined in vCenter) we wish to use to the supervisor namespace.

In this example, we describe how to do this with standard (block) vSAN volumes. Note, at the time of writing, using the vSAN File Service to provision RWX volumes for Tanzu is not supported.

First, create the storage policy in vCenter, under Menu > Policies and Profiles > VM Storage Policies. Note the convention of using lowercase names:


Then add them to the namespace by clicking on ‘Edit Storage’


Select any additional storage policies. In the example below, we add the new ‘raid-1’ policy:


To list all of the available storage classes, we run:

# kubectl get storageclass
NAME     PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
raid-1   csi.vsphere.vmware.com   Delete          Immediate           true                  3m54s

We can then create a PVC using a manifest. In the example below, we create a 2Gi volume:

2g-block-r1.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc-r1-2g
spec:
  storageClassName: raid-1
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

Then apply this:

# kubectl apply -f 2g-block-r1.yaml

To see the details:

# kubectl get pvc
NAME                STATUS   VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS      AGE
block-pvc-r1-2g     Bound    pvc-0a612267  2Gi        RWO            raid-1            51m

Now that we have a volume, we can attach it to a pod. In the example below, we create a pod using busybox and mount the volume created above:

simple-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: simple-pod
spec:
  containers:
  - name: simple-pod
    image: "k8s.gcr.io/busybox"
    volumeMounts:
    - name: block-vol
      mountPath: "/mnt/volume1"
    command: [ "sleep", "1000000" ]
  volumes:
    - name: block-vol
      persistentVolumeClaim:
        claimName: block-pvc-r1-2g

Once the pod has been created, we can examine the storage within it.

First we run a shell on the pod:

# kubectl exec -it simple-pod -- /bin/sh

Using the df command, we can see the volume has been attached and is available for consumption:

# df -h /mnt/volume1/
Filesystem                Size      Used Available Use%  Mounted on
/dev/sdb                  1.9G      6.0M      1.8G   0%  /mnt/volume1

Furthermore, we can see the PVCs created by a Kubernetes admin in vCenter by navigating to either Datacenter > Container Volumes or Cluster > Monitor > Container Volumes:


Clicking on the square next to the volume icon shows more information about the PVC and where it is used. From our example, we see the guest cluster, the pod name “simple pod” and the PVC name given in the manifest:


 


Clicking on Physical Placement shows (as we are using a vSAN store) the backing vSAN details:


We can also see details of the PVC in vCenter under Cluster > Namespaces > Namespace > Storage > Persistent Volume Claims:


Here, we can see more details – specifically Kubernetes parameters, if we click on ‘View YAML’:


 


 

 

 

Wordpress & MySQL app

The Kubernetes documentation has a practical example on using PVCs using WordPress and MySQL:
https://kubernetes.io/docs/tutorials/stateful-application/mysql-wordpress-persistent-volume/

However, the PVC claims in the example manifests do not include a storage policy (which is required for the PVC to be created). To successfully deploy this app, we must either add a default storage policy into our TKC manifest or edit the manifests to define a storage policy.

 The outline steps for this example app are as follows:

  1. Ensure that a TKC RBAC profile has been applied to the cluster (see the previous section on creating TKG clusters and granting developer access)
  2. Create a new directory on the jump VM
  3. Generate the kustomization.yaml file with a password
  4. Download the two manifest files for mySQL and Wordpress using curl
  5. Add the two files to the kustomization.yaml as shown
  6. Follow one of the two options below to satisfy the storage policy requirement. (For the quickest solution, copy and paste the awk line in option 2)

Thus, firstly we define our RBAC profile; as before:

# kubectl create clusterrolebinding default-tkg-admin-privileged-binding --clusterrole=psp:vmware-system-privileged --group=system:authenticated

We create a directory ‘wordpress’:

# mkdir wordpress; cd wordpress

As per the example, we generate the kustomization.yaml file, entering a password (we combine steps 3&5 for brevity):

# cat <<EOF > kustomization.yaml
secretGenerator:
- name: mysql-pass
  literals:
  - password=P@ssw0rd
resources:
  - mysql-deployment.yaml
  - wordpress-deployment.yaml
EOF

Then download the two manifests:

# curl -LO https://k8s.io/examples/application/wordpress/mysql-deployment.yaml
# curl -LO https://k8s.io/examples/application/wordpress/wordpress-deployment.yaml

Looking at the manifest file wordpress-deployment.yaml:

Wordpress-deployment.yaml

apiVersion: v1
kind: Service
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  ports:
    - port: 80
  selector:
    app: wordpress
    tier: frontend
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: frontend
    spec:
      containers:
      - image: wordpress:4.8-apache
        name: wordpress
        env:
        - name: WORDPRESS_DB_HOST
          value: wordpress-mysql
        - name: WORDPRESS_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        ports:
        - containerPort: 80
          name: wordpress
        volumeMounts:
        - name: wordpress-persistent-storage
          mountPath: /var/www/html
      volumes:
      - name: wordpress-persistent-storage
        persistentVolumeClaim:
          claimName: wp-pv-claim

We notice that:

  • It creates a LoadBalancer service in the first instance. This will interact with the network provider we have provisioned (either HAProxy/NSX ALB, or NCP in the case of NSX-T).
  • A Persistent Volume Claim of 20Gi is instantiated
  • The WordPress containers are specified (to be pulled/downloaded) 

Now, there is no mapping to a storage class given, so as-is this deployment will fail. There are two options to add this:

Option 1: Patch or Edit the TKC manifest to add a default StorageClass

Here, we will define a default storage policy ‘defaultClass’ for our TKG cluster. First change context to the namespace that the TKG cluster resides. In the example below, this is ‘ns01’:

# kubectl config use-context ns01

Then patch with the storage class we want to make the default; in this case “vsan-default-storage-policy”:

# kubectl patch storageclass vsan-default-storage-policy -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

Alternatively, another way to achieve this is to edit the TKC manifest for your TKG cluster, for instance:

# kubectl edit tkc/tkgcluster1

Then add the following lines under spec/settings:

storage:
   defaultClass: <storage policy>

For example, we add the ‘vsan-default-storage-policy’:

spec:
  distribution:
    fullVersion: v1.17.8+vmware.1-tkg.1.5417466
    version: v1.17.8
  settings:
    network:
      cni:
        name: antrea
      pods:
        cidrBlocks:
        - 192.168.0.0/16
      serviceDomain: cluster.local
      services:
        cidrBlocks:
        - 10.96.0.0/12
    storage:
      defaultClass: vsan-default-storage-policy

We should then see the effects when running a ‘get storageclass’:

# kubectl get storageclass
NAME                                    PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
vsan-default-storage-policy (default)   csi.vsphere.vmware.com   Delete          Immediate           true                   40h

For more details on the default StorageClass, see the Kubernetes documentation, https://kubernetes.io/docs/tasks/administer-cluster/change-default-storage-class/

For more details on editing the TKC manifest, see the documentation: https://via.vmw.com/tanzu_update_manifest

Option 2: Edit the app manifest files to explicitly add the storage class:

Add the following line to the two manifest files after the line ‘- ReadWriteOnce’:

storageClassName: <storage policy>

For example:

spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: vsan-default-storage-policy

We could also use a script to add this line in to both files. For example, using awk:

# for x in $(grep -l 'app: wordpress' *); do awk '/ReadWriteOnce/{print;print "  storageClassName: vsan-default-storage-policy";next}1' $x >> ${x/.yaml/}-patched.yaml; done

Patched versions are also available in the GitHub repository.

After the storage policy has been set, run the following command within the directory:

# kubectl apply -k ./

Once the manifests are applied, we can see that the PVC has been created:

# kubectl get pvc
NAME              STATUS   VOLUME     CAPACITY   ACCESS    STORAGECLASS                  
mysql-pv-claim    Bound    pvc-6d9d   20Gi       RWO       vsan-default-storage-policy   
wp-pv-claim       Bound    pvc-1906   20Gi       RWO       vsan-default-storage-policy 

We can see that the Loadbalancer service has been created with a dynamic IP address. The external IP can be obtained from the service ‘wordpress’:

# kubectl get services wordpress
NAME        TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)        AGE
wordpress   LoadBalancer   10.101.154.101   172.168.61.132   80:31840/TCP   3m21s

If we were to look within these network providers, we would see our service there.

For example, in NSX ALB, if we navigate to Applications > Virtual Services:


Further settings, logs, etc. can then be explored inside of the network provider.

In vCenter, we can see that the PVC volumes have been created and tagged with the application name:


 

Finally, putting the external IP (in this case 172.168.61.132) into a browser should give the WordPress setup page:


 

To remove the app,

# kubectl delete -k ./

 

 

Re-deploy WordPress app with a Static Load balancer address

Earlier we saw that the load balancer address (172.168.61.132) had been automatically assigned. With NSX-T and NSX ALB, we can statically define the load balancer address.

We edit our load balancer spec, defined in wordpress-deployment.yaml, and add the extra line ‘loadBalancerIP’ pointing to the address 172.168.161.108:

apiVersion: v1
kind: Service
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  ports:
    - port: 80
  selector:
    app: wordpress
    tier: frontend
  type: LoadBalancer
  loadBalancerIP: 172.168.161.108

 
Apply this again:

# kubectl apply -k ./

We can confirm that the service uses the static IP:

# kubectl get service wordpress
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)        AGE
wordpress   LoadBalancer   10.107.115.82   172.168.161.108   80:30639/TCP   5m1s

 

For more information on using the load balancer service with a static IP address, see the example given in the official documentation (which also covers an important security consideration): https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-83060EA7-991B-4E1E-BBE4-F53258A77A9C.html

 

Developer Self-Service Namespace: Create a new Supervisor Namespace and TKC

Supervisor Namespaces provide logical segmentation between sets of resources and permissions.  Traditionally, a vSphere admin manages infrastructure resources that are then made available into environments for users to consume. Whilst this model ensures that the vSphere admin is able to fairly manage resources across the organisation, there is an operational overhead to this.

Here, we give a devops user the ability to create Supervisor Namespaces, using a resource template that has been created by the vSphere admin. Then we show how the devops user can make use of this to create another TKG cluster.

First, in vCenter, navigate to the cluster that has Workload Management enabled, then navigate to Configure > Namespaces > General. Expand the ‘Namespace Service’ box and toggle to enable:


This will then bring up a configuration window for a new template, for resource assignment:


Add permissions to an existing devops user:


And confirm:


The devops user (as assigned permissions by the vSphere admin) is now able to create supervisor namespaces.

First, we switch contexts to the Supervisor Cluster:

# kubectl config use-context 172.168.161.101
Switched to context "172.168.161.101"

Then create the namespace:

# kubectl create namespace ns3
namespace/ns3 created

To ensure the local information is synchronised, re-issue a login (a logout is not needed).

Switch to the new namespace:

# kubectl config use-context ns3
Switched to context "ns3"

To create our TKC, we define our manifest, as before:

TKG-deploy.yaml

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkgcluster2
  namespace: ns3
spec:
  distribution:
    version: 1.20.2+vmware.1-tkg.1.1d4f79a
  topology:
    controlPlane:
      class: best-effort-small
      count: 1
      storageClass: vsan-default-storage-policy
    workers:
      class: best-effort-small
      count: 3
      storageClass: vsan-default-storage-policy

And apply:

# kubectl apply -f TKG-deploy.yaml
tanzukubernetescluster.run.tanzu.vmware.com/tkgcluster2 created

As before, we can watch the deployment:

# kubectl get tkc tkgcluster2 -o yaml -w

 

For more information on the self-service namespaces, visit: https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-BEEA763E-43B7-4923-847F-5E0398174A88.html

 

 

Deploy a Private Registry VM using the VM Service and add to TKG Service Config

The VM Service is a new feature available in vSphere 7 Update 2a which allows you to provision VMs using kubectl within a Supervisor Namespace, thus allowing developers the ability to deploy and manage VMs in the same way they manage other Kubernetes resources.

Note that a VM created through the VM Service can only be managed using kubectl: vSphere administrators can see the VM in vCenter, display its details and monitor the resources it uses, but cannot edit or otherwise alter it. For more information, see Monitor VMs in the vSphere Client.

We also have the ability, from vSphere 7 Update 2a, to use private registries for TKG clusters.

In this example, we will use the VM Service feature to deploy a VM as a devops user and then install a Harbor registry on it. Finally, we will use that Harbor instance as a private registry for a TKG cluster.

First the VI-admin must configure the VM service in vCenter.

Similar to TKG, we need to set up a content library to pull from. At the time of writing, CentOS and Ubuntu images are available for testing from the VMware Marketplace:
https://marketplace.cloud.vmware.com

To obtain a subscription link, first sign in using your ‘myvmware’ credentials.


Clicking on ‘Subscribe’ will take you through the wizard to enter settings and accept the EULA:


The site will then create a Subscription URL:


See the VMware Marketplace documentation for more details, https://docs.vmware.com/en/VMware-Marketplace/services/vmware-marketplace-for-consumers/GUID-0BB96E5E-123F-4BAE-B663-6C391F57C884.html

Back in vCenter, create a new content library with the link provided:


We then proceed to configure a namespace. If needed, create a new namespace and note the ‘VM Service’ info box:


Add at least one VM class:


Further VM classes can be defined by navigating to Workload Management > Services > VM Service > VM Classes

Add the content library configured above:


Now the service is ready, the rest of the steps can be performed as a devops user.

 

Deploy VM using VM Service

As usual, login to our cluster and switch contexts to the configured namespace. We can then see the Virtual Machine images available (we exclude the TKG images for our purposes):

# kubectl get vmimage | grep -v tkg
NAME                                            OSTYPE                FORMAT   AGE
bitnami-jenkins-2.222.3-1                       otherLinux64Guest     ovf      2d2h
centos-stream-8-vmservice-v1alpha1.20210222     centos8_64Guest       ovf      2d2h

Here we will deploy the CentOS image.

First, we create a file named ‘centos-user-data’ that captures the user, password and any customisation parameters. Use the following as a guide, replacing the password and authorized keys, etc.:

#cloud-config
chpasswd:
    list: |
      centos:P@ssw0rd
    expire: false
packages:
  - wget
  - yum-utils
groups:
  - docker
users:
  - default
  - name: centos
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EA… root@tkg.vmware.corp
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: sudo, docker
    shell: /bin/bash
network:
  version: 2
  ethernets:
      ens192:
          dhcp4: true

Next, we encode that file in base64 (and remove any newlines):

# cat centos-user-data | base64 | tr -d '\n'
I2Nsb3VkLWNvbmZpZwpjaHBhc3N3ZDoKICAgIGxpc3Q6IHwKICAgICAgdWJ1bnR1OlBAc3N3MHJkCiAgICBleHBpcmU6IGZhbHNlCnBhY2thZ2VfdXBncmFkZTogdHJ1ZQpwYWNrYWdlczoKICAtIGRvY2tlcgpncm91cHM6CiAgLSBk

For the next step, re-confirm the network name that was defined:

# kubectl get network
network-1

Then we create a manifest for the VM (cloudinit-centos.yaml) and add the encoded line in the previous step, under ‘user-data’. Note the values for the namespace, network, class name, image name, storage class, and hostname and adjust accordingly:

apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: centos-vmsvc
  namespace: ns2
spec:
  networkInterfaces:
  - networkName: network-1
    networkType: vsphere-distributed
  className: best-effort-small
  imageName: centos-stream-8-vmservice-v1alpha1.20210222
  powerState: poweredOn
  storageClass: vsan-default-storage-policy
  vmMetadata:
    configMapName: centos-vmsvc
    transport: OvfEnv
---
apiVersion: v1
kind: ConfigMap
metadata:
    name: centos-vmsvc
    namespace: ns2
data:
  user-data: |
    I2Nsb3VkLWNvbmZpZwpjaHBhc3N3ZDoKICAgICAgdWJ1bn…
  hostname: centos-vmsvc

Note: Ensure that the base64 encoded data is indented correctly. Use a YAML validator such as yamllint to make sure the format is correct; see the example below.
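
For example, assuming yamllint is installed on the CLI-VM:

# yamllint cloudinit-centos.yaml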

We then apply this manifest:

# kubectl apply -f cloudinit-centos.yaml

We should see this now being created:

# kubectl get vm
NAME                        POWERSTATE   AGE
centos-vmsvc                             4s

Just like the TKC deployment, we can watch the status (and wait for the IP address):

# kubectl get vm centos-vmsvc -o yaml -w

Once the VM has been deployed, we can query the IP address:

# kubectl get vm centos-vmsvc -o yaml | grep Ip
        f:vmIp: {}
  vmIp: 172.168.161.6

We should be able to login to our VM. If the private key was added to the manifest, this should drop straight to a prompt:

# ssh centos@172.168.161.6
[centos@centos-vmsvc ~]$

 

Prepare the deployed VM and Install Harbor

We need to prepare the VM by installing Docker:

❯ sudo yum-config-manager --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo

 

❯ sudo yum install -y docker-ce docker-ce-cli containerd.io

See https://docs.docker.com/engine/install/centos/ for further details on installing Docker on CentOS

Next, within our new VM, we’ll download the Harbor installation script, as per the guide at https://goharbor.io/docs/2.0.0/install-config/quick-install-script/

❯ wget https://via.vmw.com/harbor-script

And set execute permissions and run it:

❯ chmod +x harbor.sh
❯ sudo ./harbor.sh

Follow the prompts (install using the IP address).

Next, copy the Harbor manifest template:

❯ sudo cp harbor/harbor.yml.tmpl harbor/harbor.yml

Edit the Harbor manifest file and update the hostname field with the IP address of the VM.
For example:

# Configuration file of Harbor

# The IP address or hostname to access admin UI and registry service.
# DO NOT use localhost or 127.0.0.1, because Harbor needs to be accessed by external clients.
hostname: 172.168.161.6

For the next step, we will need to create a self-signed certificate, as per: https://goharbor.io/docs/1.10/install-config/configure-https/

First the CA cert, remember to update as needed:

❯ openssl genrsa -out ca.key 4096

 

❯ openssl req -x509 -new -nodes -sha512 -days 3650 \
-subj "/C=CN/ST=UK/L=UK/O=example/OU=Personal/CN=172.168.161.6" \
-key ca.key  -out ca.crt

Then the Server Cert, updating the site name as needed:

❯ openssl genrsa -out testdmain.com.key 4096

 

❯ openssl req -sha512 -new \
    -subj "/C=CN/ST=UK/L=UK/O=example/OU=Personal/CN=172.168.161.6" \
    -key testdmain.com.key \
    -out testdmain.com.csr

 

❯ cat > v3.ext <<-EOF
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
IP.1=172.168.161.6
EOF

 

❯ openssl x509 -req -sha512 -days 3650 \
    -extfile v3.ext \
    -CA ca.crt -CAkey ca.key -CAcreateserial \
    -in testdmain.com.csr \
    -out testdmain.com.crt

We will then need to copy the cert files to the appropriate directory:

❯ sudo cp testdmain.com.* /etc/pki/ca-trust/source/anchors/

Run the following command to ingest the certificates:

❯ sudo update-ca-trust

Convert the crt file for use by Docker and copy:

❯ openssl x509 -inform PEM -in testdmain.com.crt -out testdmain.com.cert

 

❯ sudo mkdir -p /etc/docker/certs.d/testdmain.com/
❯ sudo cp testdmain.com.cert /etc/docker/certs.d/testdmain.com/
❯ sudo cp testdmain.com.key /etc/docker/certs.d/testdmain.com/

Restart Docker:

❯ sudo systemctl restart docker

Now, we must configure Harbor to use the certificate files:

❯ sudo vi harbor/harbor.yml

In the https section, update the certificate and private key lines to point to the correct files, for example:

  certificate: /etc/pki/ca-trust/source/anchors/testdmain.com.crt
  private_key: /etc/pki/ca-trust/source/anchors/testdmain.com.key

Next, we run the Harbor prepare script:

❯ cd harbor
❯ sudo ./prepare

Then restart the Harbor instance:

❯ sudo docker-compose down -v
❯ sudo docker-compose up -d

Wait for the services to start and logout of the CentOS VM.

 

Configure the TKG Service to Trust the Deployed Repository

Test the instance by using a browser to navigate to the IP address of the CentOS VM. The Harbor login page should be seen:


The default credentials are:

admin / Harbor12345

We can also test access using ‘docker login’. First obtain the certificate and store locally:

# echo | openssl s_client -connect 172.168.161.6:443 2>/dev/null -showcerts | openssl x509 > harbor.crt

Then move the certificate into the OS's certificate store. For Photon OS / the TKG appliance this is /etc/ssl/certs:

# mv harbor.crt /etc/ssl/certs/

Then update the OS to use the new certificate (a reboot may be needed); an example is shown below.
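
The exact command depends on the CLI-VM's operating system; treat the command for your distribution as an assumption to verify. On RPM-based systems this is typically:

# update-ca-trust

and on Debian or Ubuntu-based systems:

# update-ca-certificates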

Finally, login to the Harbor instance, i.e. (credentials are admin/Harbor12345) – there should not be any certificate errors or warnings:

# docker login 172.168.161.6

Next, we will configure the TKG service to be able to use this registry.

Get the certificate from the CentOS VM in base64 format:

# echo | openssl s_client -connect 172.168.161.6:443 2>/dev/null -showcerts | openssl x509 | base64 | tr -d '\n'

We can then add this to a manifest to amend the TKG service configuration. Here we create ‘tks.yaml’. Add the certificate from the previous step:

tks.yaml

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TkgServiceConfiguration
metadata:
  name: tkg-service-configuration
spec:
  defaultCNI: antrea
  trust:
    additionalTrustedCAs:
      - name: harbor-ca
        data: [CERT GOES HERE]

As usual, apply:

# kubectl apply -f tks.yaml

Thus, any new TKG clusters created will automatically trust the registry.
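
To confirm the change was accepted, the configuration object can be read back (assuming the default object name used above):

# kubectl get tkgserviceconfiguration tkg-service-configuration -o yaml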

 

For more information on the VM service, see: https://core.vmware.com/blog/introducing-virtual-machine-provisioning-kubernetes-vm-service. This blog article also includes a GitHub repository with examples.

For more information on private registry support, see: https://core.vmware.com/blog/vsphere-tanzu-private-registry-support

 

Pulling from a Private Repository

In the previous exercise, we created a private Harbor repository to use with any new TKG clusters created. Here, we will push an image to the private repository and pull it into our TKG cluster.

First, obtain a test container, for instance busybox:

# docker pull busybox

We can then push this to our Harbor instance. First login to the Harbor instance (replacing the IP address with your own):

# docker login 172.168.161.6

Next, tag the image and provide a repository name to save to:

# docker tag busybox:latest 172.168.161.6/library/myrepo:busybox

Finally, push the image:

# docker push 172.168.161.6/library/myrepo:busybox

See the Harbor documentation for further details on pushing images, https://goharbor.io/docs/1.10/working-with-projects/working-with-images/pulling-pushing-images/

Looking at our Harbor UI, under Projects > library > myrepo we can see that the image has been pushed.


Click on the image to bring up the information screen:


Clicking on the squares next to the image gives the pull command. Confirm that this is the same image we have tagged above.

Next, we create a Namespace and a new TKG cluster (see the section earlier in this guide). Login to this new TKG cluster.

We then create a simple manifest that will pull the container. Replace the image string with the name shown in the Harbor UI.
We'll call this manifest bb.yaml:

bb.yaml

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  labels:
    app: busybox
spec:
  containers:
  - image: "172.168.161.6/library/myrepo:busybox"
    command:
      - sleep
      - "3600"
    imagePullPolicy: Always
    name: busybox
  restartPolicy: Always

Then apply:

# kubectl apply -f bb.yaml

This should pull very quickly, and we can get and describe the pod:

# kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
busybox      1/1     Running   0          28m
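
As an optional check (not part of the original steps), the pod's image reference can be inspected to confirm it was pulled from the private registry:

# kubectl describe pod busybox | grep -i image
# kubectl get pod busybox -o jsonpath='{.spec.containers[0].image}'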

 

 

Further Examples

Further examples of workloads on Tanzu Kubernetes Clusters can be found in the official documentation:

https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-E217C538-2241-4FD9-9D67-6A54E97CA800.html

 

 

Lifecycle Operations

Scale Out Tanzu Kubernetes Clusters

Scaling out Tanzu Kubernetes Clusters involves changing the number of nodes. You can increase the number of control-plane VMs, Worker VMs or both at the same time.

There are a couple of methods to approach this.

Method 1: Edit the YAML file used for deployment and apply the file just as it was done to create the TKC.

Method 2: Use Kubectl edit to directly edit this YAML file. After the file is saved, the changes will be triggered.

We will focus on Method 2, as it is a more direct approach than Method 1. A non-interactive alternative using kubectl patch is sketched below.
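
As a sketch of that alternative (using the same merge-patch mechanism demonstrated later in this guide for cluster upgrades), the control-plane count could be changed without opening an editor:

# read -r -d '' PATCH <<'EOF'
spec:
  topology:
    controlPlane:
      count: 3
EOF
# kubectl patch tkc tkgcluster1 --type merge --patch "$PATCH"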

First, switch to the Supervisor Namespace context where the TKC resides (here, ns1):

# kubectl config use-context ns1

Then list TKG clusters:

# kubectl get tkc
NAME          CONTROL PLANE   WORKER   DISTRIBUTION                 
tkgcluster1   1               3        1.18.15+vmware.1-tkg.1.600e412

Here we can see that there is only one cluster, and it has 1 control-plane VM and 3 worker VMs.

Edit the TKC manifest

# kubectl edit tkc/tkgcluster1

The cluster manifest will open in the text editor defined by your KUBE_EDITOR or EDITOR environment variable

Locate the ‘topology’ section and change controlPlane count from 1 to 3:

  topology:
    controlPlane:
      class: best-effort-xsmall
      count: 3
      storageClass: vsan-default-storage-policy
    workers:
      class: best-effort-xsmall
      count: 3
      storageClass: vsan-default-storage-policy

Save the file.

You can ‘see’ the VM creation using the watch command with jq:

# watch "kubectl get tkc -o json | jq -r '.items[].status.vmStatus'"

We can see that there are now 3 control-plane VMs:

# kubectl get tkc
NAME          CONTROL PLANE   WORKER   DISTRIBUTION                 
tkgcluster1   3               3        1.18.15+vmware.1-tkg.1.600e412

In vCenter, we see that the extra VMs have been created


In the same manner, you can scale out by increasing the number of worker nodes.

First, switch to the Supervisor Namespace where the TKG cluster resides:

# kubectl config use-context ns1

Then list the available TKG Clusters

# kubectl get tkc
NAME          CONTROL PLANE   WORKER   DISTRIBUTION                 
tkgcluster1   3               3        1.18.15+vmware.1-tkg.1.600e412

Here we can see that there is only one cluster, and it has 3 control-plane VMs and 3 worker VMs.

Edit the TKC manifest

# kubectl edit tkc/tkgcluster1

The cluster manifest will open in the text editor defined by your KUBE_EDITOR or EDITOR environment variable (vi by default)

As before, locate the ‘topology’ section. Change workers count from 3 to 5 and save the file:

  topology:
    controlPlane:
      class: best-effort-xsmall
      count: 3
      storageClass: vsan-default-storage-policy
    workers:
      class: best-effort-xsmall
      count: 5
      storageClass: vsan-default-storage-policy

We can see that there are now 5 worker VMs:

# kubectl get tkc
NAME          CONTROL PLANE   WORKER   DISTRIBUTION                 
tkgcluster1   3               5        1.18.15+vmware.1-tkg.1.600e412

Again, in vCenter, the new VMs can be seen:


 

 

Scale-In Tanzu Kubernetes Clusters

Scaling in Tanzu Kubernetes Clusters is just as easy as scaling out. The same procedure applies, except that this time we decrease the number of worker nodes. Note that the control plane cannot be scaled in.

First, switch to the Supervisor Namespace where the TKG cluster resides:

# kubectl config use-context ns1

Then list the available TKG Clusters

# kubectl get tkc
NAME          CONTROL PLANE   WORKER   DISTRIBUTION                 
tkgcluster1   3               5        1.18.15+vmware.1-tkg.1.600e412

Edit the TKC manifest

# kubectl edit tkc/tkgcluster1

The cluster manifest will open in the text editor defined by your KUBE_EDITOR or EDITOR environment variable (vi by default)

As before, locate the ‘topology’ section, decrease the number of worker nodes, and save the file:

  topology:
    controlPlane:
      class: best-effort-xsmall
      count: 3
      storageClass: vsan-default-storage-policy
    workers:
      class: best-effort-xsmall
      count: 3
      storageClass: vsan-default-storage-policy

We can see that the number of workers has scaled back in to 3:

# kubectl get tkc
NAME          CONTROL PLANE   WORKER   DISTRIBUTION                 
tkgcluster1   3               3        1.18.15+vmware.1-tkg.1.600e412

 

Update Tanzu Supervisor Cluster

To update one or more Supervisor clusters, including the version of Kubernetes for the environment and the infrastructure supporting TKG clusters, you perform a vCenter and Namespace upgrade.

Note: it is necessary to upgrade the Supervisor Cluster first before upgrading any TKG clusters.


Upgrade vCenter

There are several methods of upgrading the vCenter appliance. Follow VMware’s best practices while conducting this upgrade.

Upgrade details are located in the official documentation for Upgrading the vCenter Appliance.

 

Procedure to upgrade Namespace:

  • Log in to the vCenter Server as a vSphere administrator.
  • Select Menu > Workload Management.
  • Select the Namespaces > Updates tab.
  • Select the Available Version that you want to update to.
  • For example, select the version v1.18.2-vsc0.0.5-16762486.

 

Note: You must update incrementally. Do not skip versions, for example going directly from 1.16 to 1.18; the path should be 1.16, 1.17, 1.18.

  • Select one or more Supervisor Clusters to apply the update to.
  • To initiate the update, click Apply Updates.
  • Use the Recent Tasks pane to monitor the status of the update.

 

Update Tanzu Kubernetes Clusters

As opposed to the Supervisor Cluster, which is administered and upgraded in vCenter, the child TKG clusters need to be updated using the standard Kubernetes toolset.

Updating the Tanzu Kubernetes Cluster includes variables such as version, virtual machine class, and storage class. However, there are several methods of updating this information for TKG clusters. You can refer to the official documentation for further details.

This approach includes utilizing commands such as kubectl edit, kubectl patch, and kubectl apply.

For this guide, we will highlight the “patch” method to perform an in-place update of the cluster.

To upgrade the Kubernetes version, we will create a variable and apply it to the cluster using the patch command. The approach demonstrated here uses the UNIX shell builtin read with a here-document to assign the patch body to a variable named PATCH.

The kubectl patch command invokes the Kubernetes API to update the cluster manifest. The ‘--type merge’ flag indicates that the data contains only those properties that are different from the existing manifest.

First, we will need to change the ‘fullVersion’ parameter to ‘null’. The ‘version’ parameter should then be changed to the version of Kubernetes we want to upgrade to.

For this exercise, we have our TKG cluster deployed at version v1.18.15 that will be upgraded to version v1.19.7

We can inspect the current version of our TKG cluster:

# kubectl get tkc tkgcluster1 -o json | jq -r '.spec.distribution'
{
  "fullVersion": "1.18.15+vmware.1-tkg.1.600e412",
  "version": "1.18.15+vmware.1-tkg.1.600e412"
}

Looking at the available Tanzu Kubernetes releases, we can see that versions from 1.16.8 through 1.20.2 are available:

# kubectl get tkr
NAME                                VERSION                      
v1.16.12---vmware.1-tkg.1.da7afe7   1.16.12+vmware.1-tkg.1.da7afe7
v1.16.14---vmware.1-tkg.1.ada4837   1.16.14+vmware.1-tkg.1.ada4837
v1.16.8---vmware.1-tkg.3.60d2ffd    1.16.8+vmware.1-tkg.3.60d2ffd
v1.17.11---vmware.1-tkg.1.15f1e18   1.17.11+vmware.1-tkg.1.15f1e18
v1.17.11---vmware.1-tkg.2.ad3d374   1.17.11+vmware.1-tkg.2.ad3d374
v1.17.13---vmware.1-tkg.2.2c133ed   1.17.13+vmware.1-tkg.2.2c133ed
v1.17.17---vmware.1-tkg.1.d44d45a   1.17.17+vmware.1-tkg.1.d44d45a
v1.17.7---vmware.1-tkg.1.154236c    1.17.7+vmware.1-tkg.1.154236c
v1.17.8---vmware.1-tkg.1.5417466    1.17.8+vmware.1-tkg.1.5417466
v1.18.10---vmware.1-tkg.1.3a6cd48   1.18.10+vmware.1-tkg.1.3a6cd48
v1.18.15---vmware.1-tkg.1.600e412   1.18.15+vmware.1-tkg.1.600e412
v1.18.5---vmware.1-tkg.1.c40d30d    1.18.5+vmware.1-tkg.1.c40d30d
v1.19.7---vmware.1-tkg.1.fc82c41    1.19.7+vmware.1-tkg.1.fc82c41
v1.20.2---vmware.1-tkg.1.1d4f79a    1.20.2+vmware.1-tkg.1.1d4f79a

We construct our ‘PATCH’ variable:

# read -r -d '' PATCH <<'EOF'
spec:
  distribution:
    fullVersion: null    # set to null as just updating version
    version: v1.19.7
EOF

Then we apply the patch to the existing tkc that we are targeting. The system should return that the TKG cluster has been patched:

# kubectl patch tkc tkgcluster1 --type merge --patch "$PATCH"
tanzukubernetescluster.run.tanzu.vmware.com/tkgcluster1 patched

Check the status of the TKG cluster; we can see that the ‘phase’ is shown as ‘updating’:

# kubectl get tkc
NAME        CONTROL PLANE   WORKER   DISTRIBUTION                     AGE  PHASE    
tkgcluster1 1               3        v1.19.7+vmware.1-tkg.1.fc82c41   7m   updating

In vCenter, we can see a rolling upgrade of the control-plane VMs, as well as the workers: new VMs are created with the new version of Kubernetes and, once each new VM is ready, the old one is deleted. This is done one VM at a time, starting with the control plane, until all nodes have been replaced.
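
The rolling upgrade can also be followed from the command line using the same watch/jq pattern shown earlier for scaling:

# watch "kubectl get tkc -o json | jq -r '.items[].status.vmStatus'"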


 


After a few minutes, you will see the status change from ‘updating’ to ‘running’, at which point you can verify the cluster by running:

# kubectl get tkc
NAME          CONTROL PLANE   WORKER   DISTRIBUTION                 
tkgcluster1   3               5        1.19.7+vmware.1-tkg.1.fc82c41

 

 

 

Delete Operations

Destroy TKC and related objects

In order to delete a Tanzu Kubernetes Cluster, first switch to the Supervisor Namespace where the cluster is located. Visually, this can be seen in vCenter:


We change context to the Supervisor Namespace that contains the TKG cluster that we would like to destroy:

# kubectl config use-context ns1

Double-check the namespace is the correct one; a star next to the name indicates the currently selected context:

# kubectl config get-contexts
CURRENT   NAME                  CLUSTER           AUTHINFO      NAMESPACE
          172.168.161.101       172.168.161.101   wcp: ...
*         ns1                   172.168.161.101   wcp: ...      ns1
          ns2                   172.168.161.101   wcp: ...      ns2
          ns3                   172.168.161.101   wcp: ...      ns3

See which TKG cluster(s) reside in the namespace:

# kubectl get tkc
NAME          CONTROL PLANE   WORKER   DISTRIBUTION       AGE   PHASE 
tkgcluster1   1               3        v1.20.2+vmware...  10d   running

Prior to deletion, search for the TKG cluster in the vCenter search field to see all related objects.

Finally, delete the TKG cluster, in this case named ‘tkgcluster1’:

# kubectl delete tkc tkgcluster1
tanzukubernetescluster.run.tanzu.vmware.com "tkgcluster1" deleted

vCenter will show tasks relating to the deletion of the TKG cluster and all related objects.

From vCenter, we can then confirm that there are no more resources relating to the TKG cluster.

Delete Namespaces

To delete namespaces from the UI, navigate to Menu > Workload Management > Namespaces. Select the namespace to be removed, then click Remove.

Note: ensure that there are no TKG clusters remaining in the namespace before removal; a quick CLI check is shown below.
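
For example, switch to the namespace context and confirm that no TKC objects remain (assuming the namespace is ns1, as used earlier in this guide):

# kubectl config use-context ns1
# kubectl get tkc

If any clusters are returned, delete them first as described in the previous section.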

 


 

 

 

 

 

Delete Supervisor Cluster and Confirm Resources are Released

The Supervisor Cluster gets deleted when you disable Workload Management for a specific cluster. This action will also delete any existing Namespaces and TKG clusters that exist within the cluster. Proceed with caution when disabling Workload Management for a cluster.

You can first verify the Supervisor Cluster members by using the following command:

# kubectl get nodes
NAME                               STATUS   ROLES    AGE   VERSION
421c2fba09ab60c0ffe80c27a82d04af   Ready    master   12d   v1.19.1+wcp.3
421c4fcf29033faecfb403bb13656a39   Ready    master   12d   v1.19.1+wcp.3
421cefbcbfaeb030defbb8fcec097c48   Ready    master   12d   v1.19.1+wcp.3

From vCenter, use the search field to look for ‘supervisor’. This will return the supervisor VMs. You can add the DNS Name field and compare this with the output from the ‘kubectl get nodes’ command:

Once you have verified the Supervisor Cluster, you can delete it and all objects within it by going to Menu > Workload Management > Clusters tab, selecting the cluster to be deleted, and clicking DISABLE to remove the cluster and all of its objects.


 

In this case, you can see that the Supervisor Cluster houses a namespace and a TKG cluster.

You will receive a confirmation prompt prior to continuing with the deletion task:

Once you select the check box and click Disable, you will see some tasks such as powering off the TKC workers, deleting these virtual machines, deleting related folders, and lastly shutting down and deleting the Supervisor Cluster VMs.

 


 


 


When the tasks are complete, the Clusters tab will no longer list the previously selected cluster, and you will not be able to connect to it via kubectl as the cluster no longer exists.


 


 

 

Lifecycle Operations - TKG Extensions

Contour Ingress

Day 1 Ops – Log Management

In this section we will examine a few key operational activities for the Contour ingress. The Contour components run as two apps: (a) contour and (b) envoy.

Let’s extract the pod details for Envoy and Contour:

# kubectl get pods  -n tanzu-system-ingress
NAME                      READY   STATUS    RESTARTS   AGE
contour-d968f749d-8tvl4   1/1     Running   5          14h
contour-d968f749d-jmmkm   1/1     Running   5          14h
envoy-2kgxs               2/2     Running   0          14h
envoy-4lmxc               2/2     Running   0          14h
envoy-t8nc5               2/2     Running   0          11h

Now that we know the contour and envoy pod names, we can extract their logs for troubleshooting purposes.

Extract the Contour logs using one of the pod names listed above:

# kubectl logs contour-d968f749d-8tvl4   -c contour -n tanzu-system-ingress

time="2021-07-07T01:58:04Z" level=info msg="args: [serve --incluster --xds-address=0.0.0.0 --xds-port=8001 --envoy-service-http-port=80 --envoy-service-https-port=443 --contour-cafile=/certs/ca.crt --contour-cert-file=/certs/tls.crt --contour-key-file=/certs/tls.key --config-path=/config/contour.yaml]"

time="2021-07-07T01:58:05Z" level=info msg="Watching Service for Ingress status" envoy-service-name=envoy envoy-service-namespace=tanzu-system-ingress

Extract the Envoy logs using one of the pod names listed above:

# kubectl logs envoy-2kgxs -c envoy -n tanzu-system-ingress

[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:324] initializing epoch 0 (base id=0, hot restart version=11.104)
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:326] statically linked extensions:
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:328]   envoy.resolvers: envoy.ip
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:328]   envoy.retry_priorities: envoy.retry_priorities.previous_priorities
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:328]   envoy.thrift_proxy.transports: auto, framed, header, unframed
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:328]   envoy.udp_listeners: quiche_quic_listener, raw_udp_listener
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:328]   envoy.filters.http: envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.ext_proc, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.admission_control, envoy.filters.http.aws_lambda, envoy.filters.http.aws_request_signing, envoy.filters.http.buffer, envoy.filters.http.cache, envoy.filters.http.cdn_loop, envoy.filters.http.compressor, envoy.filters.http.cors, envoy.filters.http.csrf, envoy.filters.http.decompressor,  

Since Envoy is the actual data plane and dynamically implements filters to fulfil DevOps ingress object requests, the Envoy logs are the more important ones to troubleshoot.
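
For live troubleshooting, the Envoy logs can also be streamed and limited to recent entries using standard kubectl flags, for example:

# kubectl logs -f --tail=100 envoy-2kgxs -c envoy -n tanzu-system-ingress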

In a production or scaled environment, it is hard to go through the logs for each pod, CRD and other object. We can simplify this by forwarding logs to a log server, and metrics to a metrics server and dashboards. We will explore the necessary tooling, such as Fluent Bit, Prometheus and Grafana, which are already part of the TKG Extensions package.

Day 1 Ops - Config changes

Contour is a highly configurable ingress controller, providing various options to customise the deployment according to the customer's environmental needs. These configurations are broadly categorised under two sections: (1) contour.config and (2) envoy.config.

The Contour and Envoy config values can be found at:

# ls $HOME/tkg-extensions-v1.3.1/ingress/contour/03-contour.yaml  
# ls $HOME/tkg-extensions-v1.3.1/ingress/contour/03-envoy.yaml

A few example config values:

  • contour.namespace: the namespace into which Contour and its packaged objects are deployed. Organisations may have a standard mechanism for defining namespaces according to their own conventions.
  • contour.config.default.HTTPVersion: the default HTTP version to be used by Contour.
  • contour.config.timeouts.requestTimeout: timeout for an entire ingress request.
  • envoy.hostPort.http: port number for HTTP requests, defaulting to 80.
  • envoy.hostPort.https: port number for HTTPS requests, defaulting to 443.

Note: for config parameters with a timeout value, zero means that no value has been set in Contour, in which case Contour falls back on the Envoy default values. A full list of config values can be found in the official VMware documentation.

Day 2 Ops - upgrade Contour 

As with other immutable architectural patterns, the best way to upgrade is to delete Contour and install the new version of the ingress.

Note: you should take a backup of the current config entries before deleting, so that they can be restored once the new version has been installed. This way the configuration remains the same after the upgrade.

Back up the config file:

# kubectl get secret contour-data-values -n tanzu-system-ingress -o 'go-template={{ index .data "values.yaml" }}' | base64 -d > contour-data-values.yaml
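
Once the new Contour version has been installed (and the tanzu-system-ingress namespace recreated), the backed-up values can be restored by recreating the secret from the saved file; a minimal sketch, following the same pattern used for the other extension secrets in this guide:

# kubectl create secret generic contour-data-values --from-file=values.yaml=contour-data-values.yaml -n tanzu-system-ingress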

Day 2 Ops - Deleting Contour 

Like other TKG Extensions, Contour can be deleted, upgraded or changed at any time without impacting the core K8S setup. To delete the Contour ingress, we need to delete the following objects:

  • The Contour app
  • The namespace containing Contour and its dependent objects
  • The roles created for Contour

Delete the app:

# kubectl delete app contour -n tanzu-system-ingress
app.kappctrl.k14s.io "contour" deleted

Delete the namespace and roles:

# kubectl delete -f namespace-role.yaml

FluentBit - Log forwarder

Day 1 Ops -Troubleshooting

Extract the Fluent Bit data values configured on the cluster:

# kubectl get secret fluent-bit-data-values -n tanzu-system-logging -o 'go-template={{ index .data "values.yaml" }}' | base64 -d > fluent-bit-data-values.yaml

# cat fluent-bit-data-values.yaml

Note: in K8S, secrets are base64 encoded, hence we decode the secret values with base64 to make them readable.

Check the pods for the Fluent Bit app:

# kubectl get pods -n tanzu-system-logging
NAME               READY   STATUS    RESTARTS   AGE
fluent-bit-bxqf5   1/1     Running   0          17m
fluent-bit-dpmpf   1/1     Running   0          17m
fluent-bit-h72hp   1/1     Running   0          17m
fluent-bit-r9dq9   1/1     Running   0          17m 

Read the logs generated by the fluent-bit container running inside one of the pods listed above:

# kubectl logs pod/fluent-bit-bxqf5 -c fluent-bit -n tanzu-system-logging
Fluent Bit v1.6.9
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2021/07/12 16:25:38] [ info] [engine] started (pid=1)
[2021/07/12 16:25:38] [ info] [storage] version=1.0.6, initializing...
[2021/07/12 16:25:38] [ info] [storage] in-memory
[2021/07/12 16:25:38] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/07/12 16:25:38] [ info] [input:systemd:systemd.1] seek_cursor=s=657e7711b1764c8bbb38b81ee2c7349b;i=82f... OK
[2021/07/12 16:25:38] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2021/07/12 16:25:38] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2021/07/12 16:25:38] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2021/07/12 16:25:38] [ info] [filter:kubernetes:kubernetes.0] API server connectivity OK
[2021/07/12 16:25:38] [ info] [output:syslog:syslog.0] setup done for 10.156.134.90:514
[2021/07/12 16:25:38] [ info] [output:syslog:syslog.1] setup done for 10.156.134.90:514
[2021/07/12 16:25:38] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2021/07/12 16:25:38] [ info] [sp] stream processor started
[2021/07/12 16:25:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=793191 watch_fd=1 name=/var/log/containers/antrea-agent-vckf8_kube-system_install-cni-9d3c3ccd3ec44477b29ec0c8481e6f51c2eba2493cd9cebf6b90e7e2e67dbbc5.log

Day 2 Ops - Config changes

The Fluent Bit configuration can be updated in fluent-bit-data-values.yaml and the updated config file re-applied.

Get the current config values from the Fluent Bit secret object:

# kubectl get secret fluent-bit-data-values -n tanzu-system-logging -o 'go-template={{ index .data "values.yaml" }}' | base64 -d > fluent-bit-data-values.yaml

Check that the file has been created:

# ls fluent-bit-data-values.yaml

fluent-bit-data-values.yaml

Update the configuration in fluent-bit-data-values.yaml:

# vi  fluent-bit-data-values.yaml
#@data/values
#@overlay/match-child-defaults missing_ok=True
---
logging:
  image:
    repository: projects.registry.vmware.com/tkg
tkg:
  instance_name: "prasad-clu-01"
  cluster_name: "prasad-clu-01"
fluent_bit:
  output_plugin: "syslog"
  syslog:
    host: "10.156.134.90"
    port: "514"
    mode: "tcp"
    format: "rfc5424"

For more detailed information on the config values, please refer to the official VMware documentation.

Update/recreate the Fluent Bit secret object:

# kubectl create secret generic fluent-bit-data-values --from-file=values.yaml=fluent-bit-data-values.yaml -n tanzu-system-logging -o yaml --dry-run | kubectl replace -f-

Check the status of the FluentBit extension

# kubectl get app fluent-bit -n tanzu-system-logging
NAME         DESCRIPTION           SINCE-DEPLOY   AGE
fluent-bit   Reconcile succeeded   32s            66m

For detailed status and troubleshooting:

# kubectl get app fluent-bit -n tanzu-system-logging -o yaml

apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kappctrl.k14s.io/v1alpha1","kind":"App","metadata":{"annotations":{"tmc.cloud.vmware.com/managed":"false"},"name":"fluent-bit","namespace":"tanzu-system-logging"},"spec":{"deploy":[{"kapp":{"rawOptions":["--wait-timeout=5m"]}}],"fetch":[{"image":{"url":"projects.registry.vmware.com/tkg/tkg-extensions-templates:v1.3.1_vmware.1"}}],"serviceAccountName":"fluent-bit-extension-sa","syncPeriod":"5m","template":[{"ytt":{"inline":{"pathsFrom":[{"secretRef":{"name":"fluent-bit-data-values"}}]},"paths":["tkg-extensions/common","tkg-extensions/logging/fluent-bit"]}}]}}
    tmc.cloud.vmware.com/managed: "false"
  creationTimestamp: "2021-07-12T16:25:29Z"
  finalizers:
  - finalizers.kapp-ctrl.k14s.io/delete
….

Day 2 Ops - Upgrade Fluent Bit

As with other immutable patterns, to upgrade Fluent Bit you need to delete the current version of the Fluent Bit resources and deploy the new version. The config values (input and output connection details) are independent of the Fluent Bit resources, hence you can re-use the current config values.

 

  • Extract the current config values from the cluster
  • Delete Fluentbit from the Cluster
  • Download new version from the Tanzu Extensions package
  • Deploy Fluentbit extension by re-using the config values file.

Day 2 Ops - Delete Fluentbit

The Fluent Bit app can be deleted from the cluster in two steps: (1) delete the app; (2) delete the namespace and roles.

Delete Fluentbit App    

# kubectl delete app fluent-bit -n tanzu-system-logging 
app.kappctrl.k14s.io "fluent-bit" deleted

Delete NameSpace-role

# kubectl delete -f namespace-role.yaml
namespace "tanzu-system-logging" deleted
serviceaccount "fluent-bit-extension-sa" deleted
role.rbac.authorization.k8s.io "fluent-bit-extension-role" deleted
rolebinding.rbac.authorization.k8s.io "fluent-bit-extension-rolebinding" deleted
clusterrole.rbac.authorization.k8s.io "fluent-bit-extension-cluster-role" deleted
clusterrolebinding.rbac.authorization.k8s.io "fluent-bit-extension-cluster-rolebinding" deleted

Prometheus Metric Server

Day 1 Ops – Troubleshooting

Ensure that the Prometheus app is in the ‘Reconcile succeeded’ state; a quick check is shown below. A failure to reconcile could be due to an issue with YAML file syntax, API mismatches, or other resource issues.
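
For example, using the same app-status check shown for the other extensions in this guide:

# kubectl get app prometheus -n tanzu-system-monitoring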

To troubleshoot Prometheus, extract the pods running for Prometheus and verify the log messages from those pods.

Fetch the Prometheus pods (both Prometheus and Alertmanager):

# kubectl get pods -n tanzu-system-monitoring
NAME                                             READY   STATUS    RESTARTS   AGE
prometheus-alertmanager-5c49dfb98c-2jqfn         2/2     Running   0          13h
prometheus-cadvisor-g6vbg                        1/1     Running   0          13h
prometheus-cadvisor-ngtsg                        1/1     Running   0          13h
prometheus-cadvisor-pfwlx                        1/1     Running   0          13h
prometheus-kube-state-metrics-6f44c86df6-d5mql   1/1     Running   0          13h
prometheus-node-exporter-l2fd7                   1/1     Running   0          13h
prometheus-node-exporter-mqsrr                   1/1     Running   0          13h
prometheus-node-exporter-zlc4f                   1/1     Running   0          13h
prometheus-pushgateway-6d5f49cbcb-wf8mq          1/1     Running   0          13h
prometheus-server-8cc9dc559-6cxjh                2/2     Running   0          13h

Validate the log output from the prometheus-alertmanager container running in the pod listed above:

# kubectl logs pod/prometheus-alertmanager-5c49dfb98c-2jqfn -c prometheus-alertmanager -n tanzu-system-monitoring

level=info ts=2021-07-12T21:30:55.076Z caller=main.go:231 msg="Starting Alertmanager" version="(version=, branch=, revision=)"
level=info ts=2021-07-12T21:30:55.076Z caller=main.go:232 build_context="(go=go1.13.15, user=, date=)"
level=info ts=2021-07-12T21:30:55.109Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/config/alertmanager.yml
level=info ts=2021-07-12T21:30:55.109Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/config/alertmanager.yml
level=info ts=2021-07-12T21:30:55.113Z caller=main.go:497 msg=Listening address=:9093

Verify the log output from the prometheus-server container running in the pod listed earlier:

# kubectl logs pod/prometheus-server-8cc9dc559-6cxjh -c prometheus-server -n tanzu-system-monitoring
level=info ts=2021-07-12T21:30:54.978Z caller=main.go:337 msg="Starting Prometheus" version="(version=2.18.1, branch=non-git, revision=non-git)"
level=info ts=2021-07-12T21:30:54.979Z caller=main.go:338 build_context="(go=go1.14.8, user=root@781f2e89c308, date=20200907-23:58:33)"
level=info ts=2021-07-12T21:30:54.979Z caller=main.go:339 host_details="(Linux 4.19.190-1.ph3-esx #1-photon SMP Thu May 20 06:33:45 UTC 2021 x86_64 prometheus-server-8cc9dc559-6cxjh (none))"
level=info ts=2021-07-12T21:30:54.979Z caller=main.go:340 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2021-07-12T21:30:54.979Z caller=main.go:341 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2021-07-12T21:30:54.981Z caller=web.go:523 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2021-07-12T21:30:54.981Z caller=main.go:678 msg="Starting TSDB ..."
level=info ts=2021-07-12T21:30:54.989Z caller=head.go:575 component=tsdb msg="Replaying WAL, this may take awhile"
level=info ts=2021-07-12T21:30:54.989Z caller=head.go:624 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2021-07-12T21:30:54.989Z caller=head.go:627 component=tsdb msg="WAL replay completed" duration=242.481µs
level=info ts=2021-07-12T21:30:54.990Z caller=main.go:694 fs_type=EXT4_SUPER_MAGIC
level=info ts=2021-07-12T21:30:54.990Z caller=main.go:695 msg="TSDB started"
level=info ts=2021-07-12T21:30:54.990Z caller=main.go:799 msg="Loading configuration file" filename=/etc/config/prometheus.yml
level=info ts=2021-07-12T21:30:54.992Z caller=kubernetes.go:253 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"

In case of an app reconcile failure, verify the YAML syntax in prometheus-data-values.yaml. Update the YAML file and re-apply the secret and app YAML files:

# kubectl create secret generic prometheus-data-values --from-file=values.yaml=prometheus-data-values.yaml -n tanzu-system-monitoring

# kubectl apply -f prometheus-extension.yaml

Day 2 Ops – Update Prometheus Configuration

Update the configuration for a Prometheus extension that is deployed to a Tanzu Kubernetes cluster.

Update the YAML file and re-apply the secret and app YAML files:

# cd  ./tkg-extensions-v1.3.1/extensions/monitoring/prometheus/
# vi prometheus-data-values.yaml

# kubectl create secret generic prometheus-data-values --from-file=values.yaml=prometheus-data-values.yaml -n tanzu-system-monitoring

# kubectl apply -f prometheus-extension.yaml

Ref: the supported Prometheus configuration parameters can be found in the official VMware documentation.

Note: by default, the kapp-controller will sync apps every 5 minutes, so the update should take effect in 5 minutes or less. If you want the update to take effect sooner, change the sync period in prometheus-extension.yaml to a smaller value (see the excerpt below) and apply the Prometheus extension using kubectl apply -f prometheus-extension.yaml.
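
An illustrative excerpt of where that field sits (the syncPeriod field name matches the App spec shown earlier for Fluent Bit; treat the exact file layout as an assumption to verify against your downloaded extension bundle):

# excerpt from prometheus-extension.yaml (App custom resource)
spec:
  syncPeriod: 1m   # default is 5m; a smaller value picks up config changes faster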

Check for app status

# kubectl get app prometheus -n tanzu-system-monitoring -o yaml

Day 2 Ops - Delete Prometheus

Prometheus and Grafana share the same namespace, ‘tanzu-system-monitoring’, hence Grafana resources (if already installed) should be deleted before deleting the common namespace. Deleting the Prometheus setup involves three steps: (1) delete the app, (2) delete the namespace and roles, and (3) delete the secret object.

Delete Prometheus App

# kubectl delete app prometheus -n tanzu-system-monitoring
app.kappctrl.k14s.io "prometheus" deleted

Delete namespaces and roles

# kubectl delete -f namespace-role.yaml

Delete the secret object

# kubectl delete secret prometheus-data-values -n tanzu-system-monitoring

Grafana

Day 1 Ops – Troubleshooting

Check the Grafana app deployment status:

# kubectl get app grafana -n tanzu-system-monitoring

NAME      DESCRIPTION           SINCE-DEPLOY   AGE
grafana   Reconcile succeeded   113s           63m

If the app status is ‘Reconcile failed’, verify the Grafana data values file, make the necessary changes, and redeploy.

Access Grafana pod logs

# kubectl get pods -n tanzu-system-monitoring -l "app.kubernetes.io/name=grafana"

NAME                       READY   STATUS    RESTARTS   AGE
grafana-5b575c6cc9-r7mb9   2/2     Running   0          72m

# kubectl logs pod/grafana-5b575c6cc9-r7mb9    -c grafana -n tanzu-system-monitoring

t=2021-07-13T17:13:19+0000 lvl=info msg="Starting Grafana" logger=server version=7.3.5 commit=unknown-dev branch=master compiled=2021-04-14T17:36:56+0000
t=2021-07-13T17:13:19+0000 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2021-07-13T17:13:19+0000 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
t=2021-07-13T17:13:19+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
t=2021-07-13T17:13:19+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
t=2021-07-13T17:13:19+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"

If there is an issue accessing the Grafana web interface:

Verify Grafana FQDN

# kubectl get httpproxy -n tanzu-system-monitoring -l app=grafana
NAME                FQDN                   TLS SECRET    STATUS   STATUS DESCRIPTION
grafana-httpproxy   grafana.system.tanzu   grafana-tls   valid    Valid HTTPProxy

Get the Envoy EXTERNAL-IP value:

# kubectl get -n tanzu-system-ingress service envoy -o wide
NAME    TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE     SELECTOR
envoy   LoadBalancer   10.103.217.168   10.198.53.141   80:31673/TCP,443:32227/TCP   6d21h   app=envoy,kapp.k14s.io/app=1625607181111946240

Create a host entry on the CLI VM, or a DNS record on the mapped DNS server:

# echo "10.198.53.141 grafana.system.tanzu" | sudo tee -a /etc/hosts

Ensure network connectivity to the load balancer IP by pinging the Envoy EXTERNAL-IP:

# ping -c 3 10.198.53.141

If the username and password do not work on the very first access to Grafana:

  • Validate grafana-data-values.yaml
  • Make sure the base64-encoded values for monitoring.grafana.secret.admin_user and monitoring.grafana.secret.admin_password are accurate

For example:

      admin_user: "YWRtaW4="

      admin_password: "YWRtaW4="
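
These example values are simply the base64 encoding of the string ‘admin’. They can be generated and verified from the CLI:

# echo -n 'admin' | base64
YWRtaW4=
# echo 'YWRtaW4=' | base64 -d
admin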

Day 2 Ops – Update / change Grafana configuration

Get the current Grafana data values:

# kubectl get secret grafana-data-values -n tanzu-system-monitoring -o 'go-template={{ index .data "values.yaml" }}' | base64 -d > grafana-data-values.yaml

Update the Grafana data values file:

# vi grafana-data-values.yaml

Re-create the Grafana secret object with the updated data values, then re-apply the Grafana extension:
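
A minimal sketch for recreating the secret, following the same replace pattern used for the Fluent Bit secret earlier in this guide:

# kubectl create secret generic grafana-data-values --from-file=values.yaml=grafana-data-values.yaml -n tanzu-system-monitoring -o yaml --dry-run | kubectl replace -f-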

# kubectl apply -f grafana-extension.yaml

app.kappctrl.k14s.io/grafana configured

Day 2 Ops – Delete Grafana

Delete Secret

# kubectl delete secret grafana-data-values -n tanzu-system-monitoring

secret "grafana-data-values" deleted	

Delete App

# kubectl delete app grafana -n tanzu-system-monitoring

app.kappctrl.k14s.io "grafana" deleted

Note: Both Grafana and Prometheus share the same common namespace, hence deleting tanzu-system-monitoring will delete both Grafana & Prometheus.

 

Monitoring

Monitor Namespaces and K8s Objects Resource Utilization (vCenter)

Resource monitoring is an important aspect of managing a Tanzu environment. As part of the vSphere integration, monitoring the resource utilization of namespaces and Kubernetes objects is possible through vCenter.

At the cluster level, it is possible to monitor the different namespaces that exist within the vCenter. The overview pane provides a high-level view of the health, Kubernetes version and status, as well as the Control Plane IP and node health.

Navigate to Cluster > Monitor > Namespaces > Overview.


 

Under the compute tab for the namespace, the resources for Tanzu Kubernetes as well as Virtual Machines display key information about the environment such as version, IP address, phase, etc.


 


For Tanzu Kubernetes Clusters, the Monitor tab also provides specific insights into the particular TKG cluster. Information such as the performance overview, tasks and events, as well as resource allocation, helps the admin understand the state and performance of the Tanzu Kubernetes Cluster.


 

 

Deploy Octant (optional)

Octant is a highly extensible Kubernetes management tool that, amongst many other features, allows for a graphical view of the Kubernetes environment. This is useful in a PoC environment to see the relationship between the different components. See https://github.com/vmware-tanzu/octant for more details.

Octant demo

If the TKG Demo Appliance is being used, Octant is already installed. Otherwise, download and install Octant, as described in the Octant getting started page:
https://reference.octant.dev/?path=/docs/docs-intro--page#getting-started

Launch Octant simply with the command ‘octant’:

# octant &

Open an SSH tunnel to port 7777 of the jump host.

For instance, from a Mac terminal:

 $ ssh -L 7777:127.0.0.1:7777 -N -f -l root <jump host IP>

On Windows, using PuTTY, navigate to Connection > SSH > Tunnels in the left panel. Enter ‘7777’ for the source port and ‘127.0.0.1:7777’ as the destination. Then click ‘Add’ and open a session to the jump host VM:


Thus, if we open a browser to http://127.0.0.1:7777 (note http, not https) we can see the Octant console:


Modern Applications Cloud Foundation ESXi 7 vCenter Server 7 vSphere 7 vSphere with Tanzu Container Registry Kubernetes vSphere Distributed Switch (vDS) Document Proof of Concept Advanced Deploy Manage