Tanzu Proof of Concept Guide
vSphere with Tanzu
7.0u2
POC Guide Overview
The purpose of this document is to act as a simple guide for proof of concepts involving vSphere with Tanzu as well as VMware Cloud Foundation (VCF) with Tanzu.
This document is intended for data center cloud administrators who architect, administer and deploy VMware vSphere and VMware Cloud Foundation technologies. The information in this guide is written for experienced data center cloud administrators.
This document is not a replacement for official product documentation; rather, it should be thought of as a structured guide to augment existing guidance throughout the lifecycle of a proof-of-concept exercise. Official documentation supersedes the guidance documented here if there is a divergence between this document and product documentation.
Any statements made in this document regarding support capabilities, minimums and maximums should be cross-checked against the official VMware configuration maximums at https://configmax.vmware.com/ in case of more recent updates or amendments to what is stated here.
This document is laid out into several distinct sections to make the guide more consumable depending on the use case and proof of concept scenario:
Section 1: Overview & Setup
Product information and getting started
Section 2: App Deployment & Testing
Use-case defined testing with examples
Section 3: Lifecycle Operations
Scaling, upgrades and maintenance
Section 4: Monitoring
Essential areas of focus to monitor the system
A Github repository with code samples to accompany this document is available at:
https://github.com/vmware-tanzu-experiments/vsphere-with-tanzu-proof-of-concept-samples
Overview and Setup
In this guide we detail the two networking options available in vSphere with Tanzu, namely vSphere or NSX-T networking. With the latter, we show how VMware Cloud Foundation with Tanzu can be utilised to quickly stand up a private cloud with Tanzu enabled.
Note that Tanzu itself comes in three different flavours, or ‘Editions’, see https://tanzu.vmware.com/tanzu.
Document Scope
This POC guide describes the following topics & tasks to build and manage Kubernetes container platforms on vSphere with Tanzu.
Architectural choices
- Architectural choices for Network Stack
VI-Admin tasks
- Setting up the network stack (all three options are explored: VDS with NSX ALB, VDS with HAProxy, and NSX-T)
- Creating Content Library in vSphere
- Enabling vSphere Cluster HA
- Enabling Workload Management
- Deploying a Supervisor Cluster
- Creating Namespaces
- Creating SPBM storage policies and assigning them to Namespaces
- Setting up a standalone Harbor Image repository
Platform Management tasks
- Creating a Tanzu Kubernetes Cluster (TKC aka guest cluster)
- Deploying Sample Workloads on TKC
- Installing Tanzu Extensions (CertManager, Contour, Fluentbit, Prometheus, Grafana)
App Deployment
- App deployment & Testing
Terminology
For readability, the following abbreviations and terms are used in this document.
K8S | Kubernetes
LCM | Lifecycle management, including Day 0, Day 1 and Day 2 operations
TKG-S | Tanzu Kubernetes Grid Service, aka vSphere with Tanzu
vNamespaces | vSphere Namespace, newly introduced in vSphere 7 to provide multi-tenancy. A vSphere Namespace is a vSphere construct that segregates the resources belonging to a particular tenant; it is not a Kubernetes namespace
TKG Cluster | Tanzu Kubernetes Grid Cluster, an upstream-compliant K8S cluster created for DevOps workloads
TKC | Synonym for TKG Cluster; stands for Tanzu Kubernetes Cluster
Guest Cluster | Synonym for TKG Cluster; denotes that the cluster sits outside of vSphere primitives and that its lifecycle management is independent of vSphere LCM
VDS | vSphere Distributed Switch (defined and managed by vCenter)
Architectural choices
Network Stack
The network stack provides connectivity between Kubernetes nodes and load balancing for the K8S control plane and container workloads. VMware offers two options for the networking stack on which vSphere with Tanzu can be built.
Note: At the time of writing (vSphere 7.0u2), Supervisor Services (the vSphere Pod service, the built-in image registry service, etc.) are available only when the stack is built with NSX-T SDN.
(Option-1) VDS
In this model, the vSphere VDS provides network connectivity for the Kubernetes cluster nodes in both the Supervisor cluster and the Tanzu Kubernetes (guest) clusters. The AVI load balancer provides load balancing for the K8S control planes and for container workloads. Note: AVI is the default load balancer shipped with vSphere with Tanzu; however, customers can also bring an existing load balancer (e.g. HAProxy) in place of the AVI LB.
(Option-2) NSX-T:
In this model, NSX-T SDN serves all the networking needs of the stack: the Kubernetes cluster node network, the container network, a load balancer for the control plane, a load balancer for workload apps, and layer-7 ingress for workload apps. In addition, NSX-T enables Supervisor Services (vSphere Pods, the image registry service, etc.), network security policies between namespaces, K8S clusters and nodes, and further advanced SDN features. Note: VMware recommends NSX-T as the network choice, as it provides the complete set of enterprise-grade features in an all-in-one network solution.
Container Network Interface (CNI)
The CNI provides connectivity and network policy for Pods on a Kubernetes cluster. Kubernetes itself provides only the API; the platform team must deploy a CNI-compatible network solution, for example Antrea, Calico or Flannel. The CNI makes a clear separation between the container network and the infrastructure network. Antrea is the VMware-recommended, default CNI solution and is delivered out of the box with Tanzu Kubernetes Clusters. Note: As an alternative to Antrea, customers can use their own choice of CNI, for example Calico.
Antrea & NSX-T
In addition to the required network features such as the K8S Pod and Service networks and network policies, Antrea provides advanced network policies and out-of-the-box integration with NSX-T. This direct integration allows NSX-T to reconcile Antrea features to NSX-T and vice versa.
Together, Antrea as the CNI and NSX-T as the network stack give customers enterprise-grade network policy management and a single interface for managing network policies across VMs, K8S nodes and container workloads, including cross-cluster and cross-namespace policies.
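To confirm that Antrea is the active CNI on a provisioned Tanzu Kubernetes Cluster, a quick check (a simple sketch, assuming the default Antrea CNI is in use) is to list the Antrea pods in the kube-system namespace while logged into the TKC:
# kubectl get pods -n kube-system | grep antrea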
Load Balancer
On a Kubernetes cluster, a load balancer is used for two main purposes: (1) to access the multi-node Kubernetes control plane, and (2) to access Kubernetes Service objects of type LoadBalancer served by the backend apps. vSphere with Tanzu comes with a free version of the NSX Advanced Load Balancer (AVI Essentials edition); however, it also allows you to bring your own load balancer, for example HAProxy.
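For illustration, the minimal manifest below shows how a developer requests layer-4 load balancing from the platform via a Service of type LoadBalancer; the service name and selector are hypothetical, and the external IP is allocated by NSX ALB, HAProxy or NSX-T depending on the networking stack chosen:
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80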
vSphere with Tanzu — vSphere Networking
Here, we will describe the setup of vSphere with Tanzu using vSphere Networking, with both the NSX Advanced Load Balancer (ALB) and the open-source HaProxy options.
Getting Started
The basic steps and requirements to get started with vSphere with Tanzu are shown below. For more information, please refer to the official documentation.
1. Network Requirements
In vCenter, configure a vDS with at least two port groups for ‘Management’ and ‘Workload Network’.
The following IP addresses are required:
Management Network:
5x consecutive routable IP addresses for Workload Management, plus one for the network appliance (i.e. either NSX ALB or HaProxy)
Workload Network:
For simplicity, one /24 routable network (which will be split into subnets). In the example below, we will use the network 172.168.161.0/24 with 172.168.161.1 as the gateway.
Next, decide on the network solution to be used, either:
2(a) NSX ALB or —
2(b) HaProxy
2(a) NSX Advanced Load Balancer Configuration
In vSphere 7.0 Update 2, a new load balancer option is available. The NSX Advanced Load Balancer (NSX ALB), also known as AVI, provides a feature-rich and easy-to-manage load balancing solution. The NSX ALB is available for download in OVA format from my.vmware.com.
Below, we will briefly run through the steps to configure the NSX ALB. For full instructions, please refer to the documentation, https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-AC9A7044-6117-46BC-9950-5367813CD5C1.html
The download link will redirect you to the AVI Networks Portal. Select the VMware Controller OVA:
For more details on download workflow, see https://kb.vmware.com/s/article/82049?lang=en_US
Once the OVA has been downloaded, proceed to your vCenter and deploy the OVA by supplying a management IP address.
Note, supplying a sysadmin login authentication key is not required.
Once the appliance has been deployed and powered on, login to the UI using the supplied management IP/FQDN. Note, depending on the version used, the UI will vary. At the time of writing, the latest version available is 20.1.5.
Create username and password. Email is optional.
Add supplemental details, such as DNS, passphrase, etc.
Next, the Orchestrator needs to be set to vSphere. Select ‘Infrastructure’ from the menu on the top left:
Then select ‘Clouds’ from the menu at the top:
Edit ‘Default-Cloud’ – on the pop-up window, navigate to ‘select cloud’ and set the orchestrator to ‘VMware’.
Follow the screens to supply the username, password and vCenter information so that the NSX ALB can connect to vCenter. For permissions, leave “Write” selected, as this will allow for easier deployment and automation between ALB and vCenter. Leave SDN Integration set to “None”.
Finally, on the Network tab, under ‘Management Network’, select the workload network as previously defined on the vDS. Provide the IP subnet, gateway, and IP address pool to be utilized. This IP pool is a range of IP addresses to be used for the Service Engine (SE) VMs.
Note, in a production environment, a separate 'data network' for the SEs may be desired. For more information, see the documentation, https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-489A842E-1A74-4A94-BC7F-354BDB780751.html
Here, we have created a block of 99 addresses in the workload network, from our /24 range:
After the initial configuration, we will need to either import a certificate or create a self-signed certificate to be used in Supervisor cluster communication. For the purposes of a PoC, a self-signed certificate should suffice.
Navigate to Administration by selecting this option from the drop-down menu on the upper left corner.
In the administration pane, select Settings and edit the System Access Settings by clicking on the pencil icon:
Remove the default certificates under ‘SSL/TLS Certificate’. Then click on the caret underneath to expand the options and click on the green ‘Create Certificate’ box.
Create a self-signed certificate by providing the required information. You can add Subject Alternate Names if desired. Note, ensure the IP address of the appliance has been captured, either in the Name or in a SAN.
For more information on certificates, including creating a CSR, see the AVI documentation, https://avinetworks.com/docs/20.1/ssl-certificates/
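Once the certificate is in place, it can be checked from any machine with OpenSSL installed; the command below is a simple sketch that assumes the controller answers on port 443 (substitute your own appliance address):
# openssl s_client -connect <NSX-ALB-IP>:443 -showcerts </dev/null | openssl x509 -noout -subject -dates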
Next, we need to create an IPAM Profile. This is needed to tell the controller to use the Frontend network to allocate VIPs via IPAM.
Navigate to Templates > Profiles > IPAM/DNS Profiles > create
In the IPAM profile, set the cloud for the usable network to ‘Default-Cloud’, and set the usable network to the VIP network, in this case DSwitch-wld:
At this stage, if you have successfully deployed the NSX ALB, proceed to step 3.
2(b) HaProxy Configuration
As an alternative to the NSX ALB, VMware have packaged HaProxy in a convenient OVA format, which can be downloaded and deployed quickly. This is hosted on GitHub: https://github.com/haproxytech/vmware-haproxy
In the simplest configuration, the HA Proxy appliance will need a minimum of two interfaces, one on the ‘Management’ network and the other on a ‘Workload’ network, with a static IP address in each. (An option to deploy with three networks, i.e. with an additional ‘Frontend’ network is also available but is beyond the scope of this guide).
Below we will go through the basic setup of HaProxy and enabling Workload Management to quickly get started.
First, download and configure the latest HaProxy OVA from the GitHub site.
Here, we will use the ‘Default’ configuration, which will deploy the appliance with two network interfaces:
The two port groups for Management and Workload Network should be populated with the appropriate values. The Frontend network can be ignored:
Use the following parameters as a guide, substituting the workload network for your own.
As per the table below, we subnet the Workload network to a /25 for the load-balancer IP ranges in step 3.1. In addition, the HaProxy will require an IP for itself in the workload network.
Step | Parameter | Value
1.2 | Permit Root Login | True
2.1 | Host Name | <Set a Host Name>
2.2 | DNS | <DNS Server>
2.3 | Management IP | <IP in Mgmt range>
2.4 | Management Gateway | <Mgmt Gateway>
2.5 | Workload IP | 172.168.161.3
2.6 | Workload Gateway | 172.168.161.1
3.1 | Load Balancer IP Ranges (CIDR) | 172.168.161.128/25
3.2 | Dataplane API Management Port | 5556
3.3 | HaProxy User ID | admin
3.4 | HaProxy Password | <set a password>
N.B.: Take special care with step 3.1; this must be in CIDR format. Moreover, it must cover the ‘IP Address Ranges for Virtual Servers’ which will be used later to enable Workload Management in vCenter (see below). Note that the vCenter wizard requires the range defined here in hyphenated format: from the example above, 172.168.161.128/25 corresponds to the usable range 172.168.161.129-172.168.161.254.
3. TKG Content Library
Before we can start the Workload Management wizard, we first need to set up the TKG Content Library to pull in the TKG VM images from the VMware repository. The vCenter Server on which the TKG content library is created must have internet access in order to connect to the repository.
Create a subscribed content library (Menu > Content Libraries > Create New Content Library) pointing to the URL:
https://wp-content.vmware.com/v2/latest/lib.json
For the detailed procedure, see the documentation: https://via.vmw.com/tanzu_content_library
4. Load Balancer Certificate
The first step is to obtain the certificate from the deployed network appliance.
For NSX ALB export the certificate from the ALB UI by going to Templates > Security > SSL/TLS Certificates. Select the self-signed certificate you created and export it.
Copy the certificate and make a note of it for the steps below.
If using the HaProxy appliance, log into it using SSH. List the contents of the file /etc/haproxy/ca.crt.
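For example, from an SSH session on the HaProxy appliance, the certificate can be displayed and copied directly from the console:
# cat /etc/haproxy/ca.crt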
5. Configure Workload Management
In vCenter, ensure that DRS and HA are enabled for the cluster and a storage policy for the control plane VMs exists. In a vSAN environment, the default vSAN policy can be used.
Navigate to Menu > Workload Management and click ‘Get Started’ to start the wizard.
Below we’ll focus on the networking, i.e. step 5 onwards in the wizard. For more details, please see the documentation, https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-8D7D292B-43E9-4CB8-9E20-E4039B80BF9B.html
Use the following as a guide, again, replacing values for your own:
Load Balancer:
Name*: lb1
Type: NSX ALB | HaProxy
Data plane API Address: <NSX ALB mgmt IP>:443 | <HaProxy mgmt IP>:5556
Username: admin
Password: <password from appliance>
IP Address Ranges for Virtual Servers^: 172.168.161.129–172.168.161.254
Server Certificate Authority: <cert from NSX ALB or HaProxy>
* Note that this is a Kubernetes construct, not the DNS name of the HaProxy appliance.
^ HaProxy only. This must be within the CIDR range defined in step 3.1 of the HaProxy configuration
Management Network:
Network: <mgmt port group>
Starting IP: <first IP of consecutive range>
Subnet: <mgmt subnet>
Gateway: <management gateway>
DNS: <dns server>
NTP: <ntp server>
Workload Network:
Name: <any you choose>
Port Group: <workload port group>
Gateway: 172.168.161.1
Subnet: 255.255.255.0
IP Address Ranges*: 172.168.161.20–172.168.161.100
* These must not overlap with the load-balancer addresses
Note, it may be useful to use a tool such as ‘arping’ or ‘nmap’ to check where IPs are being used. For example:
# arping -I eth0 -c 3 10.156.163.3
ARPING 10.156.163.3 from 10.156.163.10 eth0
Unicast reply from 10.156.163.3 [00:50:56:9C:5A:F5] 0.645ms
Unicast reply from 10.156.163.3 [00:50:56:9C:5A:F5] 0.891ms
Unicast reply from 10.156.163.3 [00:50:56:9C:5A:F5] 0.714ms
Sent 3 probes (1 broadcast(s))
Received 3 response(s)
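Similarly, if nmap is installed, a ping scan gives a quick view of which addresses in the workload range already respond (substitute your own subnet):
# nmap -sn 172.168.161.0/24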
vSphere with Tanzu — NSX-T Networking
Overview
In this section, we show how to quickly deploy vSphere with Tanzu and NSX-T using VMware Cloud Foundation (VCF). NSX provides a container plug-in (NCP) that interfaces with Kubernetes to automatically serve networking requests (such as ingress and load balancer) from NSX Manager. For more details on NCP, visit: https://via.vmw.com/ncp.
In addition, NSX-T networking enables two further elements: ‘vSphere Pods’ and a built-in version of the Harbor registry. The vSphere Pod service enables services from VMware and partners to run directly on top of ESXi hosts, providing a performant, secure and tightly integrated Kubernetes environment.
For more details on vSphere Pods see https://via.vmw.com/vsphere_pods and https://blogs.vmware.com/vsphere/2020/04/vsphere-7-vsphere-pod-service.html
Once the VCF environment with SDDC manager has been deployed (see https://docs.vmware.com/en/VMware-Cloud-Foundation/index.html for more details), Workload Management can be enabled. Note that both standard and consolidated deployments can be used.
Getting Started
Below is a summary of the detailed steps found in the VCF POC Guide.
First, in SDDC Manager, click on Solutions, this should show “Kubernetes – Workload Management”. Click on Deploy and this will show a window with the deployment pre-requisites, i.e.:
- Hosts are licenced correctly
- An NSX-T based Workload Domain has been provisioned
- NTP and DNS have been set up correctly
- NSX Edge cluster deployed with a ‘large’ form factor
- The following IP addresses have been reserved for use:
- non-routable /22 subnet for pod networking
- non-routable /24 subnet for Kubernetes services
- two routable /27 subnets for ingress and egress
- 5x consecutive IP addresses in the management range for Supervisor services
Clicking on Begin will start the Kubernetes deployment wizard.
Select the appropriate cluster from the drop-down box. Click on the radio button next to the compatible cluster and click on Next:
The next screen will go through some validation checks
Check that the validation succeeds. After clicking on Next again, check the details in the final Review window:
Click on Complete in vSphere to continue the wizard in vCenter
Ensure the correct cluster has been pre-selected:
To show the Storage section, click on Next. Select the appropriate storage policies for the control plane, ephemeral disks and image cache:
Click on Next to show the review window. Clicking on Finish will start the supervisor deployment process:
For an interactive guide of the steps above, visit:
TKG Content Library
To later setup Tanzu Kubernetes Clusters, we need to first setup the TKG Content Library to pull in the TKG VMs from the VMware repository.
Create a subscribed content library (Menu > Content Libraries > Create New Content Library) pointing to the URL:
https://wp-content.vmware.com/v2/latest/lib.json
For the detailed procedure, see the documentation: https://via.vmw.com/tanzu_content_library
Supervisor Cluster Setup
After the process has been completed, navigate to Cluster > Monitor > Namespaces > Overview to ensure the correct details are shown and the health is green. Note that whilst the operations are in progress, there may be ‘errors’ shown on this page, as it is monitoring a desired state model:
Configure Supervisor Cluster Namespace(s) with RBAC
Once the supervisor cluster has been configured, a namespace should be created in order to set permissions, storage policies, and capacity limitations, among other settings. In Kubernetes, a namespace is a logical grouping of resources such as containers and disks.
To create a namespace, navigate to Menu > Workload Management > Click on Namespaces > New Namespace.
Fill in the necessary fields and click create.
The new namespace area will be presented. This is where permissions, storage policies and other options can be set.
After clicking the “Got It” button, the summary will show a widget where permissions can be set.
Click on Add Permissions and fill in the necessary fields. It is important to note that the user/group to be added to this namespace should have already been created ahead of time. This can be an Active Directory user/group (see https://via.vmw.com/ad_setup) or ‘vsphere.local’:
After adding permissions, the summary screen will show who has permissions and of what type. Clicking the Manage Permissions link will take you to the Permissions tab for this namespace.
From the permissions tab, you can add/remove/edit permissions for a particular namespace. Thus, here we can enable access for a developer to be able to consume the namespace.
Configure Supervisor Cluster Namespace(s) Storage Policy
First, configure any storage policies as needed, either by defining a VM storage policy (as is the case for vSAN) or by tagging an existing datastore. Note that vSAN comes with a default storage policy ‘vSAN Default Storage Policy’ that can be used without any additional configuration.
To create a VM storage policy, navigate to Menu > Policies and Profiles > VM Storage Policies and click on ‘Create’. Follow the prompts for either a vSAN storage policy or tag-based policy under ‘Datastore Specific rules’.
To create a tag-based VM storage policy, reference the documentation: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.storage.doc/GUID-D025AA68-BF00-4FC2-9C7E-863E5787E743.html
Once a Policy has been created, navigate back to the namespace and click on ‘add storage’
Select the appropriate storage policy to add to the namespace:
Configure Supervisor Cluster Namespace(s) with Resource Limitations
Resource limitations such as CPU, memory, and storage can be tied to a namespace. Under the namespace, click on the Configure tab and select Resource Limits.
By clicking on the edit button, resources can be limited for this specific Namespace. Resource limitations can also be set at the container level.
Note that under the Configure tab, it is also possible to limit objects such as Replica Sets, Persistent Volume Claims (PVC), and network services among others.
Lab VM Setup
Whilst many of the operations in this guide can be performed on a standard end-user machine (be it Windows, MacOS or Linux), it is a good idea to deploy a jump host VM, which has the tools and configuration ready to work with. A Linux VM is recommended.
Conveniently, there is a TKG Demo Appliance fling that we can leverage for our purposes. Download and deploy the OVA file from the link below (look for the ‘offline download’ of the TKG Demo Appliance OVA): https://via.vmw.com/tkg_demo
Note that throughout this guide, we use Bash as the command processor and shell.
Downloading the kubectl plugin
Once a namespace has been created (see steps above), a command-line utility (kubectl-vsphere) needs to be downloaded to be able to login to the namespace. First, navigate to the namespace in vCenter: Menu > Workload Management > Namespace then select ‘Copy link’:
This will provide the VIP address needed to login to the namespace. Make a note of this address. Then on your jump VM, download the zip file ‘vsphere-plugin.zip’, either using a browser or via wget, pointing to https://<VIP>/wcp/plugin/linux-amd64/vsphere-plugin.zip
For example:
# wget https://172.168.61.129/wcp/plugin/linux-amd64/vsphere-plugin.zip --no-check-certificate
Unzip this file and place the contents in the system path (such as /usr/local/bin). The zip file contains two files, namely kubectl and kubectl-vsphere. Remember to set execute permissions.
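For example, assuming the zip file has been downloaded to the current directory and extracts to a bin/ sub-directory:
# unzip vsphere-plugin.zip
# cp bin/kubectl bin/kubectl-vsphere /usr/local/bin/
# chmod +x /usr/local/bin/kubectl /usr/local/bin/kubectl-vsphere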
To log into a namespace on the supervisor cluster, issue the following command, replacing the VIP IP with your own:
# kubectl vsphere login --server=172.168.61.129 --insecure-skip-tls-verify
Use the credentials of the user added to the namespace to log-in.
Note that the ‘insecure’ option needs to be specified unless the appropriate TLS certificates have been installed on the jump host. For more details see the ‘Shell Tweaks’ sub-section below.
Once logged in, perform a quick check to verify the health of the cluster using ‘kubectl cluster-info’:
# kubectl cluster-info
Kubernetes master is running at https://172.168.61.129:6443
KubeDNS is running at https://172.168.61.129:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Shell Tweaks (optional)
In order to have a better experience (with less typing and mistakes) it’s advisable to spend a little time further setting up our lab VM.
Installing Certificates:
In order to setup trust with vCenter, and to avoid skipping the TLS verify step on every login, we need to download the certificate bundle and copy the certificates to the appropriate location.
The outline procedure for this is given in https://kb.vmware.com/s/article/2108294 with more details here, https://via.vmw.com/tanzu_tls
First, we download the certificate bundle from vCenter and unzip it:
# wget --no-check-certificate https://vCenter-lab/certs/download.zip
# unzip download.zip
Then copy the certificates to the correct location. This is determined by the operating system, in the case of the TKG Appliance / Photon OS, it is /etc/ssl/certs:
# cp certs/lin/* /etc/ssl/certs
Finally, either use an OS utility to update the system certificates, or reboot the system.
Password as an environment variable:
We can store the password used to login to the supervisor cluster in an environment variable. This can then be combined with the login command for quicker/automated logins, for example (here we have also installed the certificates, thus we have a shorter login command):
# export KUBECTL_VSPHERE_PASSWORD=P@ssw0rd
# kubectl vsphere login --vsphere-username administrator@vsphere.local --server=https://172.168.161.101
For autocomplete:
# source <(kubectl completion bash)
# echo "source <(kubectl completion bash)" >> ~/.bashrc
To set the alias of kubectl to just ‘k’:
# echo "alias k='kubectl'" >> ~/.bashrc
# complete -F __start_kubectl k
YAML validator
It is a good idea to get any manifest files checked for correct syntax, etc. before applying. Tools such as ‘yamllint’ (or similar, including online tools) validate files quickly and detail where there may be errors.
For more details and other tools see the following links:
https://kubernetes.io/docs/reference/kubectl/cheatsheet/
https://yamllint.readthedocs.io/
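For example, assuming yamllint has been installed (e.g. via pip), the TKC manifest created later in this guide can be checked before applying it:
# yamllint TKG-deploy.yaml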
Tanzu Kubernetes Cluster Deployment
Once the Supervisor cluster has been enabled, and a Namespace created (as above), we can create an upstream-compliant Tanzu Kubernetes Cluster (TKC). This is done by applying a manifest on the supervisor cluster which will define how the cluster is setup. (Note that the terms TKC and TKG cluster are used interchangeably within this guide.)
First, make sure that the Supervisor Namespace has been correctly configured. A content library should have been created to pull down the TKG VMs. In vSphere 7 update 2a there is a further requirement to add a VM class.
Navigating to Hosts and Clusters > Namespaces > [namespace] will give you a view of the information cards. The card labelled ‘Tanzu Kubernetes Grid Service’ should have the name of the content library hosting the TKG VMs.
On the ‘VM Service’ card click on ‘Add VM Class’ to add VM class definitions to the Namespace:
This will bring up a window to enable you to add the relevant VM classes (or to create your own). Select all available classes and add them to the Namespace:
For more details on the sizing see: https://via.vmw.com/tanzu_vm_classes.
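Once logged in to the supervisor namespace (see the next step), the VM classes and their bindings to the namespace can also be listed with kubectl; note that these resource names assume vSphere 7.0 U2a or later:
# kubectl get virtualmachineclasses
# kubectl get virtualmachineclassbindings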
Next, we can proceed to login to the supervisor namespace using ‘kubectl vsphere login’. If necessary, use the ‘kubectl config use-context’ command to switch to the correct supervisor namespace.
To get the contexts available (the asterisk shows the current context used):
# kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* 172.168.61.129 172.168.61.129 dev@vsphere.local
ns01 172.168.61.129 dev@vsphere.local ns01
And to switch between them:
# kubectl config use-context ns01
Switched to context "ns01".
If we have set up our TKC content library correctly, we should be able to see the downloaded VM images using the command ‘kubectl get tkr’:
# kubectl get tkr
NAME VERSION
v1.16.12---vmware.1-tkg.1.da7afe7 1.16.12+vmware.1-tkg.1.da7afe7
v1.16.14---vmware.1-tkg.1.ada4837 1.16.14+vmware.1-tkg.1.ada4837
v1.16.8---vmware.1-tkg.3.60d2ffd 1.16.8+vmware.1-tkg.3.60d2ffd
v1.17.11---vmware.1-tkg.1.15f1e18 1.17.11+vmware.1-tkg.1.15f1e18
v1.17.11---vmware.1-tkg.2.ad3d374 1.17.11+vmware.1-tkg.2.ad3d374
v1.17.13---vmware.1-tkg.2.2c133ed 1.17.13+vmware.1-tkg.2.2c133ed
v1.17.17---vmware.1-tkg.1.d44d45a 1.17.17+vmware.1-tkg.1.d44d45a
v1.17.7---vmware.1-tkg.1.154236c 1.17.7+vmware.1-tkg.1.154236c
v1.17.8---vmware.1-tkg.1.5417466 1.17.8+vmware.1-tkg.1.5417466
v1.18.10---vmware.1-tkg.1.3a6cd48 1.18.10+vmware.1-tkg.1.3a6cd48
v1.18.15---vmware.1-tkg.1.600e412 1.18.15+vmware.1-tkg.1.600e412
v1.18.5---vmware.1-tkg.1.c40d30d 1.18.5+vmware.1-tkg.1.c40d30d
v1.19.7---vmware.1-tkg.1.fc82c41 1.19.7+vmware.1-tkg.1.fc82c41
v1.20.2---vmware.1-tkg.1.1d4f79a 1.20.2+vmware.1-tkg.1.1d4f79a
Thus versions through to v1.20.2 are available to use.
We then need to create a manifest to deploy the TKC VMs. An example manifest is shown below, this will create a cluster in the ns01 supervisor namespace called ‘tkgcluster1’ consisting of one control-plane and three worker-nodes, with the Kubernetes version 1.17.8:
TKG-deploy.yaml
apiVersion: run.tanzu.vmware.com/v1alpha1        # A
kind: TanzuKubernetesCluster                     # A
metadata:
  name: tkgcluster1                              # B
  namespace: ns01                                # B
spec:
  distribution:
    version: v1.17.8                             # C
  topology:
    controlPlane:
      count: 1
      class: guaranteed-small                    # D
      storageClass: vsan-default-storage-policy  # E
    workers:
      count: 3
      class: guaranteed-small                    # D
      storageClass: vsan-default-storage-policy  # E
Let’s dissect this manifest to examine the components:
A: These lines specify the API version and the kind; they should not be modified. To list the available API versions for Tanzu, run ‘kubectl api-versions | grep tanzu’.
B: The Tanzu Kubernetes cluster name is defined in the ‘name’ field and the supervisor namespace in the ‘namespace’ field.
C: The K8S version (v1.17.8) is defined here. This depends on the TKG VM images downloaded via the content library; use ‘kubectl get tkr’ to obtain the available versions.
D: The created VMs will use the ‘guaranteed-small’ VM class.
E: The storage policy to be used by the control plane and worker VMs.
For clarity, some fields have been omitted (the defaults will be used). For a full list of parameters, refer to the documentation: https://via.vmw.com/tanzu_params and further manifest file examples: https://via.vmw.com/tanzu_yaml
Once this file has been created, use kubectl to start the deployment. For example, having created the manifest file ‘TKG-deploy.yaml’ (as above), apply it:
# kubectl apply -f TKG-deploy.yaml
The supervisor cluster will create the required VMs and configure the TKC as needed. This can be monitored using the get and describe verbs on the ‘tkc’ noun:
# kubectl get tkc -o wide
NAME CONTROL PLANE WORKER DISTRIBUTION AGE PHASE
tkgcluster1 1 1 v1.17.8+vmware.1-tkg.1.5417466 28d running
# kubectl describe tkc
Name: tkgcluster1
Namespace: ns01
Labels: <none>
Annotations: API Version: run.tanzu.vmware.com/v1alpha1
Kind: TanzuKubernetesCluster
.
.
Node Status:
tkgcluster1-control-plane-jznzb: ready
tkgcluster1-workers-fl7x8-59849ddbb-g8qjq: ready
tkgcluster1-workers-fl7x8-59849ddbb-jqzn4: ready
tkgcluster1-workers-fl7x8-59849ddbb-kshrt: ready
Phase: running
Vm Status:
tkgcluster1-control-plane-jznzb: ready
tkgcluster1-workers-fl7x8-59849ddbb-g8qjq: ready
tkgcluster1-workers-fl7x8-59849ddbb-jqzn4: ready
tkgcluster1-workers-fl7x8-59849ddbb-kshrt: ready
Events: <none>
For more verbose output and to watch the cluster being built out, select yaml as the output with the ‘-w’ switch:
# kubectl get tkc -o yaml -w
.
.
nodeStatus:
tkc-1-control-plane-lvfdt: notready
tkc-1-workers-fxspd-894697d7b-nz682: pending
phase: creating
vmStatus:
tkc-1-control-plane-lvfdt: ready
tkc-1-workers-fxspd-894697d7b-nz682: pending
In vCenter, we can see the TKC VMs being created (as per the manifest) within the supervisor namespace:
Once provisioned, we should be able to see the created VMs in the namespace:
# kubectl get wcpmachines
NAME PROVIDERID IPADDR
tkgcluster1-control-plane-scsz5-2dr55 vsphere://421075449 172.168.61.33
tkgcluster1-workers-tjpzq-gkdn2 vsphere://421019aa 172.168.61.35
tkgcluster1-workers-tjpzq-npw88 vsphere://421055cf 172.168.61.38
tkgcluster1-workers-tjpzq-vpcwx vsphere://4210d90c 172.168.61.36
Once the TKC has been created, login to it by using ‘kubectl vsphere’ with the following options:
# kubectl vsphere login --server=<VIP> \
--insecure-skip-tls-verify \
--tanzu-kubernetes-cluster-namespace=<supervisor namespace> \
--tanzu-kubernetes-cluster-name=<TKC name>
For example:
# kubectl-vsphere login --server=https://172.168.61.129 \
--insecure-skip-tls-verify \
--tanzu-kubernetes-cluster-namespace=ns01 \
--tanzu-kubernetes-cluster-name=tkgcluster1
Login using the user/credentials assigned to the namespace. You can then change contexts between the TKC and the supervisor namespace with the ‘kubectl config’ command (as above).
Developer Access to TKCs
Once a TKG cluster has been provisioned, developers will need sufficient permissions to deploy apps and services.
A basic RBAC profile is shown below:
tkc-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: all:psp:privileged
roleRef:
  kind: ClusterRole
  name: psp:vmware-system-privileged
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io
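Save the manifest (here assumed to be named tkc-rbac.yaml, matching the heading above) and apply it while logged into the TKC context:
# kubectl apply -f tkc-rbac.yaml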
This can also be achieved using the kubectl command:
# kubectl create clusterrolebinding default-tkg-admin-privileged-binding --clusterrole=psp:vmware-system-privileged --group=system:authenticated
For more information, see the documentation to grant developer access to the cluster: https://via.vmw.com/tanzu_rbac
TKG Extension Deployment & Operations
Along with core Kubernetes, the DevOps team needs additional platform tools for connecting, monitoring, and accessing container workloads running on the K8S cluster, such as a Layer 7 ingress controller, a log forwarder, and observability tools. Most of the platform tools provided by the Tanzu Extensions are delivered as CRDs with accompanying controllers.
Document Scope
In this POC guide, we will walk you through deploying and managing the following container platform tools (TKG Extensions) on the TKC clusters. We will also validate our setup by deploying and accessing sample apps. The following platform tools are shipped as part of the TKG Extensions bundle:
- Kapp-controller & CertManager (Pre-requisite, common tools)
- Contour - Layer 7 Ingress
- FluentBit - Log forwarder
- Prometheus - Metric Server
- Grafana - Metric Dashboard
Download the TKG Extensions v1.3.1 Bundle
The TKG Extensions package can be downloaded from my.vmware.com -> Product Downloads -> VMware Tanzu Kubernetes Grid -> Go to Downloads -> VMware Tanzu Kubernetes Grid Extension Manifests 1.3.1 -> Download Now. Throughout this TKG Extensions section, we will use the pre-created CLI VM.
Extract Tanzu Extensions to CLI-VM
Once downloaded, move the tar file to the CLI VM and untar the package using the following command:
# tar -xzf tkg-extensions-manifests-v1.3.1-vmware.1.tar.gz
# ls ./tkg-extensions-v1.3.1+vmware.1/extensions
Deploying TKGExtension Pre-Requisite tools
The TKG Extensions require two prerequisite tools: (1) kapp-controller and (2) cert-manager. These two components are used by the other tools in the TKG Extensions package.
- Kapp controller: Reconciles the TKGExtension components.
- CertManager: Most of the Kubernetes platform components need SSL certificates. Cert-manager adds certificates and certificate issuers as resource types in Kubernetes clusters and simplifies the process of obtaining, renewing, and using those certificates.
Install Kapp Controller
The kapp-controller.yaml file is available in the ./tkg-extensions-v1.3.1+vmware.1/extensions directory.
Apply the kapp-controller.yaml file using kubectl.
This deployment creates the following objects
- Namespace: tkg-system
- ServiceAccount: kapp-controller-sa
- CRD: apps.kappctrl.k14s.io
- Deployment: kapp-controller
- ClusterRole & Rolebinding: kapp-controller-cluster-role
# cd ./tkg-extensions-v1.3.1+vmware.1/extensions/
# kubectl apply -f kapp-controller.yaml
namespace/tkg-system created
serviceaccount/kapp-controller-sa created
customresourcedefinition.apiextensions.k8s.io/apps.kappctrl.k14s.io created
deployment.apps/kapp-controller created
clusterrole.rbac.authorization.k8s.io/kapp-controller-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/kapp-controller-cluster-role-binding created
Verify kapp-controller object creation
# kubectl get ns tkg-system
NAME STATUS AGE
tkg-system Active 17m
# kubectl get crd | grep kapp
apps.kappctrl.k14s.io 2021-07-06T20:39:04Z
# kubectl get clusterroles -n tkg-system | grep kapp
kapp-controller-cluster-role 2021-07-06T20:39:04Z
Verify that the kapp-controller deployment and pods are running
# kubectl get all -n tkg-system
NAME READY STATUS RESTARTS AGE
pod/kapp-controller-bcffd9c44-g5qcc 1/1 Running 0 15m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kapp-controller 1/1 1 1 15m
NAME DESIRED CURRENT READY AGE
replicaset.apps/kapp-controller-bcffd9c44 1 1 1 15m
The second prerequisite for the TKG Extensions package is cert-manager. The cert-manager installation YAML files are located at ./tkg-extensions-v1.3.1+vmware.1/cert-manager
CertManager installation creates the following objects
- Namespace: cert-manager
- CRDs: Creates multiple CRDs including certificaterequests, certificates, challenges, clusterissuers, issuers, orders.acme
- Deployment: Creates multiple deployments including cainjector, cert-manager, cert-manager-webhook
- ClusterRoles & RoleBindings: multiple cert-manager roles and bindings (cainjector, controller, webhook)
Go to ./tkg-extensions-v1.3.1+vmware.1 and apply all the files from the cert-manager folder.
# cd ..
# cd ./tkg-extensions-v1.3.1+vmware.1
# kubectl apply -f cert-manager/
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
Verify CertManager Installation
# kubectl get ns,crd,clusterroles --all-namespaces | egrep 'cert'
namespace/cert-manager Active 81s
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io 2021-07-06T20:48:29Z
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io 2021-07-06T20:48:29Z
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io 2021-07-06T20:48:29Z
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io 2021-07-06T20:48:29Z
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io 2021-07-06T20:48:30Z
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-edit 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/cert-manager-view 2021-07-06T20:48:30Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:certificatesigningrequests:nodeclient 2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:certificatesigningrequests:selfnodeclient 2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:kube-apiserver-client-approver 2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:kube-apiserver-client-kubelet-approver 2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:kubelet-serving-approver 2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:certificates.k8s.io:legacy-unknown-approver 2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:controller:certificate-controller 2021-07-06T20:10:44Z
clusterrole.rbac.authorization.k8s.io/system:controller:root-ca-cert-publisher 2021-07-06T20:10:44Z
Contour Ingress
Contour is a Kubernetes ingress controller that uses the Envoy reverse proxy. The installation deploys both the Contour control plane and the Envoy data plane. Kubernetes itself provides only the Ingress API, hence we deploy Contour as the ingress controller.
The Kubernetes Ingress API has limited features and might not serve the traffic routing and security needs of the DevOps team, which can include multi-team FQDNs, TLS delegation, inclusion, rate limiting, traffic shifting, request rewriting, and out-of-the-box integration with observability tools.
With a few CRDs, including HTTPProxy, TLSCertificateDelegation and ExtensionService, Contour provides advanced ingress and traffic management features.
Contour as the control plane: Contour synchronizes user ingress requests with the Envoy proxy, i.e. Contour acts as a management and configuration server for Envoy. Envoy as the data plane: Envoy is a high-performance reverse proxy that implements the filters required to fulfil the DevOps user's ingress object requests. Envoy provides HTTP(S) traffic management and security filters through which the packets flow, along with a rich set of observability features over the traffic. Detailed information about Contour ingress can be found at https://projectcontour.io/docs
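As an illustration, the minimal HTTPProxy manifest below (the FQDN, service name and namespace are hypothetical) routes traffic for one virtual host to a backend Service; it can be applied with kubectl once Contour is running:
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: demo-proxy
  namespace: default
spec:
  virtualhost:
    fqdn: demo.cluster.test
  routes:
  - conditions:
    - prefix: /
    services:
    - name: demo-svc
      port: 80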
Pre-requisites
Ensure the following pre-requisites are met:
- TKC / Guest cluster is ready
- TKG Extension pre-requisites have been deployed on the TKC (Kapp-controller & Cert-manager)
Configuration & Installation
The Contour ingress installation process creates the following objects:
- Namespace: tanzu-system-ingress
- Service Account: contour-extension-sa
- Control plane: Contour; data plane: Envoy proxy
- Secrets: contour-data-values, contour-extension-sa-token-xxx
- CRDs: HTTPProxy, TLSCertificateDelegation, ExtensionService
- Pods: contour, envoy
Go to ./tkg-extensions-v1.3.1+vmware.1/extensions/ingress/contour
# cd ./tkg-extensions-v1.3.1+vmware.1/extensions/ingress/contour/
Create namespace, service account and roles by applying namespace-role.yaml file.
# kubectl apply -f namespace-role.yaml
namespace/tanzu-system-ingress created
serviceaccount/contour-extension-sa created
role.rbac.authorization.k8s.io/contour-extension-role created
rolebinding.rbac.authorization.k8s.io/contour-extension-rolebinding created
clusterrole.rbac.authorization.k8s.io/contour-extension-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/contour-extension-cluster-rolebinding created
Create the Contour config file by copying the template file provided in the package:
# cp vsphere/contour-data-values-lb.yaml.example vsphere/contour-data-values.yaml
Update the data values file: ensure the correct version of the Envoy image is referenced (we must use Envoy version v1.17.3_vmware.1).
Edit contour-data-values.yaml and set the Envoy image tag to v1.17.3_vmware.1:
# cat vsphere/contour-data-values.yaml
#@data/values
#@overlay/match-child-defaults missing_ok=True
---
infrastructure_provider: "vsphere"
contour:
  image:
    repository: projects.registry.vmware.com/tkg
envoy:
  image:
    repository: projects.registry.vmware.com/tkg
    tag: v1.17.3_vmware.1
  service:
    type: "LoadBalancer"
Note: Do not use Envoy image v1.16.2_vmware.1 due to a CVE. Specify v1.17.3_vmware.1 in the configuration as shown. For more information, see the Release Notes.
Create a secret object for contour
# kubectl create secret generic contour-data-values --from-file=values.yaml=vsphere/contour-data-values.yaml -n tanzu-system-ingress
secret/contour-data-values created
Verify the secret object creation
# kubectl get secrets -n tanzu-system-ingress
NAME TYPE DATA AGE
contour-data-values Opaque 1 83s
contour-extension-sa-token-8bm88 kubernetes.io/service-account-token 3 16m
default-token-wdtr6
Deploy Contour app
# kubectl apply -f contour-extension.yaml
app.kappctrl.k14s.io/contour created
Validate contour app installation
# kubectl get service envoy -n tanzu-system-ingress -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
envoy LoadBalancer 10.103.217.168 10.198.53.141 80:31673/TCP,443:32227/TCP 19m app=envoy,kapp.k14s.io/app=1625607181111946240
Key points:
- The Envoy proxy received an EXTERNAL-IP value from the load balancer deployed along with the infrastructure.
- All ingress objects created by the DevOps team are served by the Envoy proxy, hence external (layer 7) access to any workload on this cluster is routed via this EXTERNAL-IP.
- We will use this EXTERNAL-IP for all ingress (layer 7) communications, as shown in the example below.
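For example, once an ingress or HTTPProxy object has been created for a workload, it can be tested from the CLI VM either by resolving the virtual host name to the Envoy EXTERNAL-IP, or simply by passing a Host header with curl (the hostname below is hypothetical; the IP is the EXTERNAL-IP from the output above):
# curl -H "Host: demo.cluster.test" http://10.198.53.141/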
Verify Envoy DaemonSet
# kubectl get daemonsets -n tanzu-system-ingress
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
envoy 3 3 3 3 3 <none> 24m
Verify the custom CRDs belonging to Contour
# kubectl get crd | grep -i contour
extensionservices.projectcontour.io 2021-07-06T21:33:01Z
httpproxies.projectcontour.io 2021-07-07T01:57:37Z
tlscertificatedelegations.projectcontour.io 2021-07-07T01:57:37Z
Before using ingress in a workload, let's verify the status of the Contour app objects. Make sure all the resources are running and Envoy has an EXTERNAL-IP:
# kubectl get pod,svc -n tanzu-system-ingress
NAME READY STATUS RESTARTS AGE
pod/contour-d968f749d-8tvl4 1/1 Running 0 26m
pod/contour-d968f749d-jmmkm 1/1 Running 0 26m
pod/envoy-2kgxs 2/2 Running 0 26m
pod/envoy-4lmxc 2/2 Running 0 26m
pod/envoy-wm2k9 2/2 Running 0 26m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/contour ClusterIP 10.110.232.3 <none> 8001/TCP 26m
service/envoy LoadBalancer 10.103.217.168 10.198.53.141 80:31673/TCP,443:32227/TCP 26m
Fluentbit – Log forwarder
Fluent Bit is an open-source Log Processor and Forwarder which allows you to collect any data like metrics and logs from different sources, enrich them with filters, and send them to multiple destinations.
Installation Scope: Fluent Bit is deployed at the cluster level, i.e. platform operators install a separate Fluent Bit instance on each TKC.
Configuration & Setup
Design choices:
Fluent Bit supports tens of outputs, including Elasticsearch, HTTP, Kafka, Splunk, syslog, etc. In this example, we will use the syslog output and forward the logs to a vRealize Log Insight server. Ensure the following pre-requisites are met:
- TKC / Guest cluster is ready
- TKG Extension pre-requisites have been deployed on the TKC (kapp-controller & Cert-manager)
- Log destination is available and reachable from TKC Cluster.
Configuration & Installation
Fluent Bit runs as a DaemonSet (one pod per node), serving as a log collector, aggregator, and forwarder.
The Fluent Bit installation process creates the following objects:
- Namespace: tanzu-system-logging
- Service Account: fluent-bit-extension-sa
- Roles: fluent-bit-extension-role, fluent-bit-extension-cluster-role
Navigate to the Fluent Bit installation YAML files:
# cd ./tkg-extensions-v1.3.1+vmware.1/extensions/logging/fluent-bit
Create namespace, service account and roles by applying namespace-role.yaml file
# kubectl apply -f namespace-role.yaml
namespace/tanzu-system-logging created
serviceaccount/fluent-bit-extension-sa created
role.rbac.authorization.k8s.io/fluent-bit-extension-role created
rolebinding.rbac.authorization.k8s.io/fluent-bit-extension-rolebinding created
clusterrole.rbac.authorization.k8s.io/fluent-bit-extension-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/fluent-bit-extension-cluster-rolebinding created
Configure Fluentbit config values
The Fluent Bit config values are found in the appropriate sub-folders of the fluent-bit extension; in this document we will use the syslog example.
# cp syslog/fluent-bit-data-values.yaml.example syslog/fluent-bit-data-values.yaml
List out config values file before updating
# cat syslog/fluent-bit-data-values.yaml
#@data/values
#@overlay/match-child-defaults missing_ok=True
---
logging:
  image:
    repository: projects.registry.vmware.com/tkg
tkg:
  instance_name: "<TKG_INSTANCE_NAME>"
  cluster_name: "<CLUSTER_NAME>"
fluent_bit:
  output_plugin: "syslog"
  syslog:
    host: "<SYSLOG_HOST>"
    port: "<SYSLOG_PORT>"
    mode: "<SYSLOG_MODE>"
    format: "<SYSLOG_FORMAT>"
Note:
- Instance_name: Mandatory but arbitrary; Appears in the logs
- Cluster_name: name of the target TKC / guest cluster
Update the config file to point to the target log server (vRealize Log Insight in this example):
# vi syslog/fluent-bit-data-values.yaml
#@data/values
#@overlay/match-child-defaults missing_ok=True
---
logging:
  image:
    repository: projects.registry.vmware.com/tkg
tkg:
  instance_name: "prasad-tkc-clu-01"
  cluster_name: "prasad-clu-01"
fluent_bit:
  output_plugin: "syslog"
  syslog:
    host: "10.156.134.90"
    port: "514"
    mode: "tcp"
    format: "rfc5424"
Create a FluentBit Secret with data values for our log destination
# kubectl create secret generic fluent-bit-data-values --from-file=values.yaml=syslog/fluent-bit-data-values.yaml -n tanzu-system-logging
secret/fluent-bit-data-values created
Note: Repeat the above two steps (updating config & creating a secret) per destination type of your choice like Elasticsearch, HTTP, Kafka, Splunk etc.
Verify created Secret
# kubectl get secret -n tanzu-system-logging
NAME TYPE DATA AGE
default-token-zt8qr kubernetes.io/service-account-token 3 21m
fluent-bit-data-values Opaque 1 44s
fluent-bit-extension-sa-token-w5826 kubernetes.io/service-account-token 3 21m
Deploy Fluentbit app
# kubectl apply -f fluent-bit-extension.yaml
app.kappctrl.k14s.io/fluent-bit created
Check Fluentbit app deployment status
# kubectl get app fluent-bit -n tanzu-system-logging
NAME DESCRIPTION SINCE-DEPLOY AGE
fluent-bit Reconcile succeeded 38s 63s
Note: The status should change to 'Reconcile succeeded'.
Check the pods for the FluentBit app
# kubectl get pods -n tanzu-system-logging
NAME READY STATUS RESTARTS AGE
fluent-bit-bxqf5 1/1 Running 0 17m
fluent-bit-dpmpf 1/1 Running 0 17m
fluent-bit-h72hp 1/1 Running 0 17m
fluent-bit-r9dq9 1/1 Running 0 17m
Note: These pod names are important for troubleshooting Fluent Bit in case of any issues.
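If log entries do not appear at the destination, the Fluent Bit pod logs are the first place to look, for example (using one of the pod names listed above):
# kubectl logs fluent-bit-bxqf5 -n tanzu-system-logging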
Prometheus Metric Server
Once deployed, Prometheus can scrape metrics from supported resources (such as deployments exposing a /metrics endpoint or any other accessible API). Many modern apps and tools implement observability patterns such as a /metrics endpoint from which Prometheus can scrape metrics.
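A common pattern (whether it is picked up depends on the scrape configuration actually deployed with the extension) is to annotate workload pods so that annotation-based service discovery finds them; the deployment name, image and port below are hypothetical:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-metrics
  template:
    metadata:
      labels:
        app: demo-metrics
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: app
        image: registry.example.com/demo-metrics:latest
        ports:
        - containerPort: 8080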
Scope: In this section we will deploy the TKG Extension for Prometheus to collect and view metrics for Tanzu Kubernetes clusters. In addition, we will perform Day 1 and Day 2 lifecycle management changes.
Pre-Requisites
- TKC/guest cluster is available with the default service domain (cluster.local) and a default persistent storage class.
- The TKG Extensions v1.3.1 package has been downloaded and unpacked on the CLI VM.
Note: If there is no default persistent storage class, we can create one and update the storage class name in the Prometheus config file.
Validate default persistent storage class
# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
vsan-default-storage-policy (default) csi.vsphere.vmware.com Delete Immediate true 6d
Connect to TKC
# kubectl vsphere login --server https://${SUPERVISOR_CLUSTER_IP} --insecure-skip-tls-verify -u ${PRE_USER_NAME} --tanzu-kubernetes-cluster-name ${TKC_CLUSTER_NAME}
# kubectl config use-context ${TKC_CLUSTER_NAME}
Switched to context "prasad-clu-01".
Configuration & Installation
The Prometheus installation process creates the following objects:
- Namespace: tanzu-system-monitoring
- Service Account: prometheus-extension-sa
- Roles: prometheus-extension-role, prometheus-extension-cluster-role
- Deployments: Prometheus creates 4 Deployment objects
- DaemonSets: Prometheus creates 2 DaemonSet objects
Ref: For complete details of Prometheus, refer to the official VMware and Prometheus documentation.
Create namespace & roles
# cd ./tkg-extensions-v1.3.1+vmware.1/extensions/monitoring/prometheus/
# kubectl apply -f namespace-role.yaml
namespace/tanzu-system-monitoring created
serviceaccount/prometheus-extension-sa created
role.rbac.authorization.k8s.io/prometheus-extension-role created
rolebinding.rbac.authorization.k8s.io/prometheus-extension-rolebinding created
clusterrole.rbac.authorization.k8s.io/prometheus-extension-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-extension-cluster-rolebinding created
Customise Configuration / data values
Extract a configuration file: the Prometheus config files are available at ./tkg-extensions-v1.3.1+vmware.1/extensions/monitoring/prometheus/
# cp prometheus-data-values.yaml.example prometheus-data-values.yaml
Note: There is no need to change any default values unless the cluster does not have a default storage class, or you wish to use a specific storage class for Prometheus and Alertmanager.
We will make the following additions to prometheus-data-values.yaml:
- ingress: This section exposes the Prometheus GUI via the ingress API. For the Prometheus ingress object to be created successfully, Contour (or another ingress controller) must already be installed on the cluster.
- prometheus_server.pvc: Specifies the storage class name for the Prometheus PVC object. This entry is needed only if there is no default storage class defined in the cluster.
- alertmanager.pvc: Specifies the storage class name for the AlertManager PVC object. This entry is needed only if there is no default storage class defined in the cluster.
Customize Storage Class for Prometheus & AlertManager
# vi prometheus-data-values.yaml
monitoring:
ingress:
enabled: true
virtual_host_fqdn: "prometheus.cluster.test"
prometheus_prefix: "/"
alertmanager_prefix: "/alertmanager/"
prometheus_server:
image:
repository: projects.registry.vmware.com/tkg/prometheus
pvc:
storage_class: vsan-default-storage-policy
storage: "8Gi"
alertmanager:
image:
repository: projects.registry.vmware.com/tkg/prometheus
pvc:
storage_class: vsan-default-storage-policy
storage: "8Gi"
kube_state_metrics:
image:
repository: projects.registry.vmware.com/tkg/prometheus
node_exporter:
image:
repository: projects.registry.vmware.com/tkg/prometheus
pushgateway:
image:
repository: projects.registry.vmware.com/tkg/prometheus
cadvisor:
image:
repository: projects.registry.vmware.com/tkg/prometheus
prometheus_server_configmap_reload:
image:
repository: projects.registry.vmware.com/tkg/prometheus
prometheus_server_init_container:
image:
repository: projects.registry.vmware.com/tkg/prometheus
Note: Once the objects have been created successfully, don't forget to create a DNS or hosts-file entry mapping the FQDN specified in the config file above to the Envoy proxy EXTERNAL-IP value. As a reminder, all ingress requests on our cluster are served on Envoy's load balancer IP address.
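A quick way to look up Envoy's EXTERNAL-IP for that DNS or hosts-file entry (the same command is used again in the Grafana section):
# kubectl get -n tanzu-system-ingress service envoy -o wide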
Create the Prometheus secret using the prometheus-data-values.yaml file edited in the previous step
# kubectl create secret generic prometheus-data-values --from-file=values.yaml=prometheus-data-values.yaml -n tanzu-system-monitoring
secret/prometheus-data-values created
Deploy Prometheus App
# kubectl apply -f prometheus-extension.yaml
app.kappctrl.k14s.io/prometheus created
Ensure the Prometheus app status has turned to Reconcile succeeded.
# kubectl get app prometheus -n tanzu-system-monitoring
NAME DESCRIPTION SINCE-DEPLOY AGE
prometheus Reconcile succeeded 15s 91s
Full details of the Prometheus app configuration can be viewed as follows:
# kubectl get app prometheus -n tanzu-system-monitoring -o yaml
……..
inspect:
exitCode: 0
stdout: |-
Target cluster 'https://10.96.0.1:443'
08:47:37PM: debug: Resources: Ignoring group version: schema.GroupVersionResource{Group:"stats.antrea.tanzu.vmware.com", Version:"v1alpha1", Resource:"antreanetworkpolicystats"}
08:47:37PM: debug: Resources: Ignoring group version: schema.GroupVersionResource{Group:"stats.antrea.tanzu.vmware.com", Version:"v1alpha1", Resource:"networkpolicystats"}
Resources in app 'prometheus-ctrl'
Namespace Name Kind Owner Conds. Rs Ri Age
(cluster) prometheus-alertmanager ClusterRole kapp - ok - 1m
^ prometheus-alertmanager ClusterRoleBinding kapp - ok - 1m
^ prometheus-cadvisor ClusterRole kapp - ok - 1m
^ prometheus-cadvisor ClusterRoleBinding kapp - ok - 1m
^ prometheus-kube-state-metrics ClusterRole kapp - ok - 1m
^ prometheus-kube-state-metrics ClusterRoleBinding kapp - ok - 1m
^ prometheus-node-exporter ClusterRole kapp - ok - 1m
……
Let’s check for Deployments & DaemonSet object creation status
# kubectl get daemonsets -n tanzu-system-monitoring
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
prometheus-cadvisor 3 3 3 3 3 <none> 7m48s
prometheus-node-exporter 3 3 3 3 3 <none> 7m48s
# kubectl get deployments -n tanzu-system-monitoring
NAME READY UP-TO-DATE AVAILABLE AGE
prometheus-alertmanager 1/1 1 1 9m43s
prometheus-kube-state-metrics 1/1 1 1 9m43s
prometheus-pushgateway 1/1 1 1 9m43s
prometheus-server 1/1 1 1 9m43s
Let’s check for the PVC objects created by Prometheus & AlertManager
# kubectl get pvc -n tanzu-system-monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-alertmanager Bound pvc-c014d1da-1aa2-4f01-a57f-87de57464ca0 2Gi RWO vsan-default-storage-policy 19m
prometheus-server Bound pvc-39cd774f-9b4c-42cd-b1ad-1042fb3273bb 8Gi RWO vsan-default-storage-policy 19m
Accessing Prometheus Web interface
Create a Host entry on your CLI-VM to access Prometheus GUI
# echo "10.198.53.141 prometheus.cluster.test" | sudo tee -a /etc/hosts
Access the Prometheus GUI from the CLI-VM web browser at https://prometheus.cluster.test/. The following URI references are served by the Prometheus server.
Example - Metrics based on the “node_memory_MemAvailable_bytes”
- Browse to Prometheus GUI https://prometheus.cluster.test/graph
- Type “node_memory_MemAvailable_bytes” in the Expression text box
- Select Graph Tab in the GUI to see the results in graph view.
Alternatively, choose the "Console" tab in the same UI, which shows the values returned by the query in tabular form.
To view the configured metrics, browse to https://prometheus.cluster.test/metrics
Cluster Status
Access Prometheus Cluster status GUI using https://prometheus.cluster.test/status
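The same queries can also be run programmatically against the Prometheus HTTP API, which is useful for scripted checks during a POC. A minimal sketch using curl (the -k flag skips verification of the self-signed ingress certificate):
# curl -k "https://prometheus.cluster.test/api/v1/query?query=node_memory_MemAvailable_bytes"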
Grafana – Observability Dashboard
With Grafana we can create, explore, and share all of our data through flexible dashboards. In this section we will deploy and maintain Grafana on a TKC with the help of the TKG Extensions v1.3.1 package.
Configuration & Installation
Prerequisites
- A TKC/guest cluster with the default service domain (cluster.local) is up and running.
- The TKG Extensions v1.3.1 package has been downloaded and unpacked on the CLI-VM.
Prepare Configuration
Create grafana-data-values file from the given samples
# cd ./tkg-extensions-v1.3.1/extensions/monitoring/grafana/
# cp grafana-data-values.yaml.example grafana-data-values.yaml
Edit Grafana configuration values
- Add an entry monitoring.grafana.secret.admin_user with base64 encoded value YWRtaW4=
- Replace <ADMIN_PASSWORD> with a base64-encoded password of your choice
- In this example, we are using admin as the password
Note: Remember this username/password (admin/admin) for accessing the Grafana GUI.
# echo admin| base64
YWRtaW4=
# vi grafana-data-values.yaml
#@data/values
#@overlay/match-child-defaults missing_ok=True
---
monitoring:
grafana:
image:
repository: "projects.registry.vmware.com/tkg/grafana"
secret:
admin_user: YWRtaW4=
admin_password: YWRtaW4=
grafana_init_container:
image:
repository: "projects.registry.vmware.com/tkg/grafana"
grafana_sc_dashboard:
image:
repository: "projects.registry.vmware.com/tkg/grafana"
You can leave the remaining default values as they are. Alternatively, you can customize the data values according to your deployment needs; the full list of config values can be found in the official VMware documentation.
Create namespace and RBAC roles for Grafana
# kubectl apply -f namespace-role.yaml
namespace/tanzu-system-monitoring unchanged
serviceaccount/grafana-extension-sa created
role.rbac.authorization.k8s.io/grafana-extension-role created
rolebinding.rbac.authorization.k8s.io/grafana-extension-rolebinding created
clusterrole.rbac.authorization.k8s.io/grafana-extension-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/grafana-extension-cluster-rolebinding created
Create a secret object for Grafana
# kubectl -n tanzu-system-monitoring create secret generic grafana-data-values --from-file=values.yaml=grafana-data-values.yaml
secret/grafana-data-values created
Deploy Grafana
# kubectl apply -f grafana-extension.yaml
app.kappctrl.k14s.io/grafana created
Validate deployment
# kubectl get app grafana -n tanzu-system-monitoring
NAME DESCRIPTION SINCE-DEPLOY AGE
grafana Reconcile succeeded 55s 56s
Validate deployment with full config values
# kubectl get app grafana -n tanzu-system-monitoring -o yaml
apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"kappctrl.k14s.io/v1alpha1","kin":"App","metadata":{"annotations":{"tmc.cloud.vmware.com/managed":"false"},"name":"grafana","namespace":"tanzu-system-monitoring"},"spec":{"deploy":[{"kapp":{"rawOptions":["--wait-timeout=5m"]}}],"fetch":[{"image":{"url":"projects.registry.vmware.com/tkg/tkg-extensions-templates:v1.3.1_vmware.1"}}],"serviceAccountName":"grafana-extension-sa","syncPeriod":"5m","template":[{"ytt":{"ignoreUnknownComments":true,"inline":{"pathsFrom":[{"secretRef":{"name":"grafana-data-values"}}]},"paths":["tkg-extensions/common","tkg-extensions/monitoring/grafana"]}}]}}
tmc.cloud.vmware.com/managed: "false"
creationTimestamp: "2021-07-13T13:11:10Z"
finalizers:
- finalizers.kapp-ctrl.k14s.io/delete
generation: 2
managedFields:
- apiVersion: kappctrl.k14s.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
…
Accessing Grafana web-interface
The Grafana default configuration has been deployed with the grafana.system.tanzu FQDN. You can validate the object creation as follows:
# kubectl get httpproxy -o wide -n tanzu-system-monitoring
NAME FQDN TLS SECRET STATUS STATUS DESCRIPTION
grafana-httpproxy grafana.system.tanzu grafana-tls valid Valid HTTPProxy
prometheus-httpproxy prometheus.cluster.test prometheus-tls valid Valid HTTPProxy
FQDN grafana.system.tanzu is being served by Envoy’s EXTERNAL_IP (Contour ingress data plane).
Make a hosts-file entry on the CLI-VM, or add a DNS A record in your DNS server, mapping Envoy's EXTERNAL-IP to grafana.system.tanzu.
Get Envoy’s EXTERNAL_IP
# kubectl get -n tanzu-system-ingress service envoy -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
envoy LoadBalancer 10.103.217.168 10.198.53.141 80:31673/TCP,443:32227/TCP 6d20h app=envoy,kapp.k14s.io/app=1625607181111946240
Add a host entry on the CLI-VM mapping Envoy's EXTERNAL-IP to the FQDN
# echo "10.198.53.141 grafana.system.tanzu" | sudo tee -a /etc/hosts
Open a browser on your CLI-VM and go to https://grafana.system.tanzu/login
Log in to the dashboard using the credentials:
- User: admin
- Password: admin
Note: The first login will prompt you to change the password for future use.
A successful login will give us a Grafana welcome page.
Configuring data source
Grafana has two core concepts: (1) data sources and (2) dashboards.
Accessing default data source (Prometheus)
Grafana from the TKG Extensions comes with a default data source: the Prometheus instance running on the same TKC/guest cluster. In the left-hand menu panel, click Settings -> Configuration -> Data Sources.
Click on the Prometheus row marked as default. Notice that the connection details have already been filled in.
Creating & accessing Dashboard
Create your first dashboard using the web interface
Navigate to Menu->”+”->Create->Dashboard -> Click “+ Add new panel”
A new panel will be created, with empty values.
- Ensure the default Prometheus data source has been selected
- Enter the query "node_memory_MemAvailable_bytes" in the Metrics section and press Shift+Enter to execute the query
The query results are displayed as a graph.
App Deployment and Testing
Contour - Example with Ingress API
Sample app with Layer7 Ingress (HTTP & HTTPS)
The TKG Extensions package also includes a sample app with HTTP and HTTPS ingress examples. Let's use these to validate the Contour ingress using the standard Kubernetes Ingress API.
In order to perform the validations, we need to create the following objects
- Deployment: App deployment manifest
- Services: Two services, s1 and s2, which can optionally be used to demonstrate traffic shifting between services, plus the Kubernetes namespace they live in.
- Ingress object: A Layer-7 access definition. The ingress can be HTTP or HTTPS, and can send all traffic to one service or split it between two services.
- Secret: TLS certificate and key, needed only for the HTTPS use case.
The package folder $HOME/tkg-extensions-v1.3.1/ingress/examples contains three subfolders: common contains the app deployment and services, http-ingress contains the HTTP ingress object definition, and https-ingress contains the TLS secret and the HTTPS ingress object.
Deploy the app (Deployment and Service objects). As defined in the YAML, the objects will be created in the test-ingress namespace.
# ls ./common/
00-namespaces.yaml 01-services.yaml 02-deployments.yaml
# kubectl apply -f common/
namespace/test-ingress created
service/s1 created
service/s2 created
deployment.apps/helloweb created
Let's verify the objects we just created (deployment, pods, services, etc.):
# kubectl get all -n test-ingress
NAME READY STATUS RESTARTS AGE
pod/helloweb-749c995f85-6zj7s 1/1 Running 0 13m
pod/helloweb-749c995f85-dmf8g 1/1 Running 0 13m
pod/helloweb-749c995f85-j9qmh 1/1 Running 0 13m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/s1 ClusterIP 10.96.253.139 <none> 80/TCP 13m
service/s2 ClusterIP 10.107.93.77 <none> 80/TCP 13m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/helloweb 3/3 3 3 13m
NAME DESIRED CURRENT READY AGE
replicaset.apps/helloweb-749c995f85 3 3 3 13m
Ingress Object for HTTP
# kubectl apply -f ./http-ingress/
ingress.extensions/http-ingress created
A: Object type. Here we are using "Ingress" from the standard Kubernetes API.
B: FQDN through which this ingress object can be accessed.
C: Path/route used to reach the backend workload.
D: Backend service that processes requests arriving via the ingress path.
(A sketch of the corresponding manifest is shown after the key points below.)
Key points:
• All FQDNs are served using the Envoy (EXTERNAL LB) IP address
• We should have either a DNS entry or a host entry added for this FQDN.
• All ingress & HTTPProxy objects will have the same IP address, i.e. Envoy EXTERNAL_IP
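For reference, a minimal sketch of what the http-ingress definition looks like, matching the callouts above. This illustration uses the networking.k8s.io/v1 Ingress API; the packaged example may use the older extensions/v1beta1 form, as the deprecation warning later in this section suggests:
apiVersion: networking.k8s.io/v1
kind: Ingress                     # A: standard Kubernetes Ingress object
metadata:
  name: http-ingress
  namespace: test-ingress
spec:
  rules:
  - host: foo.bar.com             # B: FQDN used to reach the ingress
    http:
      paths:
      - path: /foo                # C: route to the first backend
        pathType: Prefix
        backend:
          service:
            name: s1              # D: backend service for this path
            port:
              number: 80
      - path: /bar
        pathType: Prefix
        backend:
          service:
            name: s2
            port:
              number: 80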
Verify Ingress object and envoy LB-IP to access that ingress
# kubectl get -n tanzu-system-ingress service envoy -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
envoy LoadBalancer 10.103.217.168 10.198.53.141 80:31673/TCP,443:32227/TCP 12h app=envoy,kapp.k14s.io/app=1625607181111946240
Since we don't yet have an external DNS server to resolve our FQDN, for now let's add a host entry for the app FQDN so that we can access the app over HTTP using the FQDN directly.
# echo "10.198.53.141 foo.bar.com" | sudo tee -a /etc/hosts
10.198.53.141 foo.bar.com
# cat /etc/hosts
127.0.0.1 localhost
10.198.53.141 foo.bar.com
Access the first service at http://foo.bar.com/foo using curl or a web browser from your CLI-VM
# curl http://foo.bar.com/foo
Hello, world!
Version: 1.0.0
Hostname: helloweb-749c995f85-j9qmh
Access the second service at http://foo.bar.com/bar using curl or a web browser from your CLI-VM
# curl http://foo.bar.com/bar
Hello, world!
Version: 1.0.0
Hostname: helloweb-749c995f85-dmf8g
For the HTTPS ingress object we need to create an additional Secret object containing tls.crt and tls.key values. The Secret is a standard Kubernetes object, which we will reference from our HTTPS ingress.
A: Object type. Here we are using "Secret" from the standard Kubernetes API.
B: Reference name for this object.
C: TLS certificate value.
D: TLS key value.
Key point: We will refer to this Secret in our next object, https-ingress.
A: The https-ingress object references the Secret created above.
Key point: There is little difference between the http-ingress and https-ingress objects apart from the addition of the tls section (a sketch is shown below).
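A minimal sketch of the two objects described by the callouts (again shown with the networking.k8s.io/v1 API for illustration; the certificate and key values are placeholders for base64-encoded PEM data):
apiVersion: v1
kind: Secret                              # A: standard Kubernetes Secret object
metadata:
  name: https-secret                      # B: reference name for the secret
  namespace: test-ingress
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>   # C: TLS cert value
  tls.key: <base64-encoded private key>   # D: TLS key value
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: https-ingress
  namespace: test-ingress
spec:
  tls:
  - hosts:
    - foo.bar.com
    secretName: https-secret              # A: reference to the Secret above
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        pathType: Prefix
        backend:
          service:
            name: s1
            port:
              number: 80
      - path: /bar
        pathType: Prefix
        backend:
          service:
            name: s2
            port:
              number: 80
Now apply the https-ingress folder: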
# kubectl apply -f https-ingress/
Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
ingress.extensions/https-ingress configured
secret/https-secret unchanged
ingress.extensions/http-ingress create
Since we already have a host entry, we can test the app by accessing https://foo.bar.com/foo using curl or a web browser from your CLI-VM
# curl https://foo.bar.com/foo --insecure
Hello, world!
Version: 1.0.0
Hostname: helloweb-749c995f85-j9qmh
Let us check from the browser. Since it is a self-signed certificate, we need to accept the browser's security warning before we get to the page.
Now let us check the other path, https://foo.bar.com/bar, using curl or a web browser from the CLI-VM
# curl https://foo.bar.com/bar --insecure
Hello, world!
Version: 1.0.0
Hostname: helloweb-749c995f85-dmf8g
Prometheus metrics for Custom App
Implementing metrics in custom apps
App owners need to implement a /metrics endpoint (or an equivalent API) in their app in order for it to be scraped by Prometheus. Once the functionality is available in the app, DevOps users can enable metric collection by adding annotations to the pods. The annotations must be part of the pod metadata (i.e. the pod template).
Note: Adding these annotations to logical objects such as Services, or to the DaemonSet object itself rather than its pod template, has no effect.
Example of applying the annotations to a workload:
apiVersion: apps/v1 # apps/v1beta2 and extensions/v1beta1 are removed in recent Kubernetes versions
kind: DaemonSet
metadata:
name: fluentd-elasticsearch
namespace: weave
labels:
app: fluentd-logging
spec:
selector:
matchLabels:
name: fluentd-elasticsearch
template:
metadata:
labels:
name: fluentd-elasticsearch
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9102'
spec:
containers:
- name: fluentd-elasticsearch
image: gcr.io/google-containers/fluentd-elasticsearch:1.20
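Once an annotated workload is running, a quick way to confirm that Prometheus has picked it up is to check the scrape targets, either at https://prometheus.cluster.test/targets in the GUI or via the HTTP API (a sketch, assuming the ingress FQDN configured earlier):
# curl -k "https://prometheus.cluster.test/api/v1/targets"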
Deploy Kuard to verify setup
A very basic test to see if the K8s cluster is operational is to deploy KUARD (Kubernetes Up And Running)
Use the commands below to pull the KUARD image and assign an IP to it. (HaProxy will serve the IP from the workload subnet):
# kubectl run --restart=Never --image=gcr.io/kuar-demo/kuard-amd64:blue kuard
# kubectl expose pod kuard --type=LoadBalancer --name=kuard --port=8080
Once deployed, we can list the external IP assigned to it using the ‘get service’ command:
# kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kuard LoadBalancer 10.96.0.136 152.17.31.132 8080:30243/TCP 6s
Therefore, opening a browser to the 'External-IP' on port 8080, i.e. http://152.17.31.132:8080 should give us a webpage showing the KUARD output:
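The same check can be scripted from the CLI-VM; a minimal sketch that simply confirms an HTTP 200 response from the load-balanced KUARD service:
# curl -s -o /dev/null -w "%{http_code}\n" http://152.17.31.132:8080
200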
Persistent Volume Claims (PVC)
To create a PVC, first we need to map any storage policies (defined in vCenter) we wish to use to the supervisor namespace.
In this example, we describe how to do this with standard (block) vSAN volumes. Note, at the time of writing, using the vSAN File Service to provision RWX volumes for Tanzu is not supported.
First, create the storage policy in vCenter, under Menu > Policies and Profiles > VM Storage Policies. Note the convention of using lowercase names:
Then add them to the namespace by clicking on ‘Edit Storage’
Select any additional storage policies. In the example below, we add the new ‘raid-1’ policy:
To list all of the available storage classes, we run:
# kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
raid-1 csi.vsphere.vmware.com Delete Immediate true 3m54s
We can then create a PVC using a manifest. In the example below, we create a 2Gi volume:
2g-block-r1.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: block-pvc-r1-2g
spec:
storageClassName: raid-1
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
Then apply this:
# kubectl apply -f 2g-block-r1.yaml
To see the details:
# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
block-pvc-r1-2g Bound pvc-0a612267 2Gi RWO raid-1 51m
Now that we have a volume, we can attach it to a pod. In the example below, we create a pod using BusyBox and mount the volume created above:
simple-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: simple-pod
spec:
containers:
- name: simple-pod
image: "k8s.gcr.io/busybox"
volumeMounts:
- name: block-vol
mountPath: "/mnt/volume1"
command: [ "sleep", "1000000" ]
volumes:
- name: block-vol
persistentVolumeClaim:
claimName: block-pvc-r1-2g
Once the pod has been created, we can examine the storage within it.
First we run a shell on the pod:
# kubectl exec -it simple-pod -- /bin/sh
Using the df command, we can see the volume has been attached and is available for consumption:
# df -h /mnt/volume1/
Filesystem Size Used Available Use% Mounted on
/dev/sdb 1.9G 6.0M 1.8G 0% /mnt/volume1
Furthermore, we can see the PVCs created by a Kubernetes admin in vCenter by navigating to either Datacenter > Container Volumes or Cluster > Monitor > Container Volumes:
Clicking on the square next to the volume icon shows more information about the PVC and where it is used. From our example, we see the guest cluster, the pod name “simple pod” and the PVC name given in the manifest:
Clicking on Physical Placement shows (as we are using a vSAN store) the backing vSAN details:
We can also see details of the PVC in vCenter under Cluster > Namespaces > Namespace > Storage > Persistent Volume Claims:
Here, we can see more details – specifically Kubernetes parameters, if we click on ‘View YAML’:
Wordpress & MySQL app
The Kubernetes documentation has a practical example on using PVCs using WordPress and MySQL:
https://kubernetes.io/docs/tutorials/stateful-application/mysql-wordpress-persistent-volume/
However, the PVCs in the example manifests do not specify a storage class (which is required for the PVCs to be created). To deploy this app successfully, we must either add a default storage class to our TKC manifest or edit the manifests to define a storage class.
The outline steps for this example app are as follows:
- Ensure that a TKC RBAC profile has been applied to the cluster (see the previous section on creating TKG clusters and granting developer access)
- Create a new directory on the jump VM
- Generate the kustomization.yaml file with a password
- Download the two manifest files for MySQL and WordPress using curl
- Add the two files to the kustomization.yaml as shown
- Follow one of the two options below to satisfy the storage policy requirement. (For the quickest solution, copy and paste the awk line in option 2)
Thus, firstly we define our RBAC profile; as before:
# kubectl create clusterrolebinding default-tkg-admin-privileged-binding --clusterrole=psp:vmware-system-privileged --group=system:authenticated
We create a directory ‘wordpress’:
# mkdir wordpress; cd wordpress
As per the example, we generate the kustomization.yaml file, entering a password (we combine steps 3&5 for brevity):
# cat <<EOF > kustomization.yaml
secretGenerator:
- name: mysql-pass
literals:
- password=P@ssw0rd
resources:
- mysql-deployment.yaml
- wordpress-deployment.yaml
EOF
Then download the two manifests:
# curl -LO https://k8s.io/examples/application/wordpress/mysql-deployment.yaml
# curl -LO https://k8s.io/examples/application/wordpress/wordpress-deployment.yaml
Looking at the manifest file wordpress-deployment.yaml:
Wordpress-deployment.yaml
apiVersion: v1
kind: Service
metadata:
name: wordpress
labels:
app: wordpress
spec:
ports:
- port: 80
selector:
app: wordpress
tier: frontend
type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: wp-pv-claim
labels:
app: wordpress
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: wordpress
labels:
app: wordpress
spec:
selector:
matchLabels:
app: wordpress
tier: frontend
strategy:
type: Recreate
template:
metadata:
labels:
app: wordpress
tier: frontend
spec:
containers:
- image: wordpress:4.8-apache
name: wordpress
env:
- name: WORDPRESS_DB_HOST
value: wordpress-mysql
- name: WORDPRESS_DB_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-pass
key: password
ports:
- containerPort: 80
name: wordpress
volumeMounts:
- name: wordpress-persistent-storage
mountPath: /var/www/html
volumes:
- name: wordpress-persistent-storage
persistentVolumeClaim:
claimName: wp-pv-claim
We notice that:
- It creates a LoadBalancer service in the first instance. This will interact with the network provider we have provisioned (either HAProxy/NSX ALB, or NCP in the case of NSX-T).
- A Persistent Volume Claim of 20Gi is instantiated
- The WordPress containers are specified (to be pulled/downloaded)
Now, there is no mapping to a storage class given, so as-is this deployment will fail. There are two options to add this:
- Option 1: Patch or Edit the TKC manifest to add a default StorageClass
Here, we will define a default storage class for our TKG cluster (via the 'defaultClass' setting). First change context to the Supervisor Namespace in which the TKG cluster resides. In the example below, this is 'ns01':
# kubectl config use-context ns01
Then patch with the storage class we want to make the default; in this case “vsan-default-storage-policy”:
# kubectl patch storageclass vsan-default-storage-policy -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Alternatively, another way to achieve this is to edit the TKC manifest for your TKG cluster, for instance:
# kubectl edit tkc/tkgcluster1
Then add the following lines under spec/settings:
storage:
  defaultClass: <storage policy>
For example, we add the ‘vsan-default-storage-policy’:
spec:
  distribution:
    fullVersion: v1.17.8+vmware.1-tkg.1.5417466
    version: v1.17.8
  settings:
    network:
      cni:
        name: antrea
      pods:
        cidrBlocks:
        - 192.168.0.0/16
      serviceDomain: cluster.local
      services:
        cidrBlocks:
        - 10.96.0.0/12
    storage:
      defaultClass: vsan-default-storage-policy
We should then see the effects when running a ‘get storageclass’:
# kubectl get storageclass
NAME                                    PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
vsan-default-storage-policy (default)   csi.vsphere.vmware.com   Delete          Immediate           true                   40h
For more details on the default StorageClass, see the Kubernetes documentation, https://kubernetes.io/docs/tasks/administer-cluster/change-default-storage-class/
For more details on editing the TKC manifest, see the documentation: https://via.vmw.com/tanzu_update_manifest
- Option 2: Edit the app manifest files to explicitly add the storage class:
Add the following line to the two manifest files after the line ‘- ReadWriteOnce’
storageClassName: <storage policy>
For example:
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: vsan-default-storage-policy
We could also use a script to add this line in to both files. For example, using awk:
# for x in $(grep -l 'app: wordpress' *); do awk '/ReadWriteOnce/{print;print " storageClassName: vsan-default-storage-policy";next}1' $x >> ${x/.yaml/}-patched.yaml; done
Patched versions of the manifests are also available in the GitHub repository.
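Note that if you use the patched copies, the kustomization.yaml created earlier still references the original file names; a minimal sketch of the updated file, assuming the -patched suffix produced by the awk one-liner above:
secretGenerator:
- name: mysql-pass
  literals:
  - password=P@ssw0rd
resources:
- mysql-deployment-patched.yaml
- wordpress-deployment-patched.yaml
Then apply the kustomization: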
# kubectl apply -k ./
Once the manifests are applied, we can see that the PVC has been created:
# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS STORAGECLASS
mysql-pv-claim Bound pvc-6d9d 20Gi RWO vsan-default-storage-policy
wp-pv-claim Bound pvc-1906 20Gi RWO vsan-default-storage-policy
We can see that the Loadbalancer service has been created with a dynamic IP address. The external IP can be obtained from the service ‘wordpress’:
# kubectl get services wordpress
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
wordpress LoadBalancer 10.101.154.101 172.168.61.132 80:31840/TCP 3m21s
If we were to have a look within these network providers, we would see our service there
For example, in NSX ALB, if we navigate to Applications > Virtual Services:
Further settings, logs, etc. can then be explored inside of the network provider.
In vCenter, we can see that the PVC volumes have been created and tagged with the application name:
Finally, putting the external IP (in this case 172.168.61.132) into a browser should give the WordPress setup page:
To remove the app,
# kubectl delete -k ./
Re-deploy WordPress app with a Static Load balancer address
Earlier we saw that the load balancer address (172.168.61.132 in our example) had been assigned automatically. With NSX-T and NSX ALB, we can statically define the load balancer address.
We edit our load balancer spec, defined in wordpress-deployment.yaml, and add the extra line ‘loadBalancerIP’ pointing to the address 172.168.161.108:
apiVersion: v1
kind: Service
metadata:
name: wordpress
labels:
app: wordpress
spec:
ports:
- port: 80
selector:
app: wordpress
tier: frontend
type: LoadBalancer
loadBalancerIP: 172.168.161.108
Apply this again:
# kubectl apply -k ./
We can confirm that the service uses the static IP:
# kubectl get service wordpress
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
wordpress LoadBalancer 10.107.115.82 172.168.161.108 80:30639/TCP 5m1s
For more information on using the load balancer service with a static IP address, see the example given in the official documentation (which also covers an important security consideration): https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-83060EA7-991B-4E1E-BBE4-F53258A77A9C.html
Developer Self-Service Namespace: Create a new Supervisor Namespace and TKC
Supervisor Namespaces provide logical segmentation between sets of resources and permissions. Traditionally, a vSphere admin manages infrastructure resources that are then made available into environments for users to consume. Whilst this model ensures that the vSphere admin is able to fairly manage resources across the organisation, there is an operational overhead to this.
Here, we give a devops user the ability to create Supervisor Namespaces, using a resource template that has been created by the vSphere admin. Then we show how the devops user can make use of this to create another TKG cluster.
First, in vCenter, navigate to the cluster that has Workload Management enabled, then navigate to Configure > Namespaces > General. Expand the ‘Namespace Service’ box and toggle to enable:
This will then bring up a configuration window for a new template, for resource assignment:
Add permissions to an existing devops user:
And confirm:
The devops user (as assigned permissions by the vSphere admin) is now able to create supervisor namespaces.
First, we switch contexts to the supervisor namespace:
# kubectl config use-context 172.168.161.101
Switched to context "172.168.161.101"
Then create the namespace:
# kubectl create namespace ns3
namespace/ns3 created
To ensure the local information is synchronised, re-issue a login (a logout is not needed).
Switch to the new namespace:
# kubectl config use-context ns3
Switched to context "ns3"
To create our TKC, we define our manifest, as before:
TKG-deploy.yaml
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
name: tkgcluster2
namespace: ns3
spec:
distribution:
version: 1.20.2+vmware.1-tkg.1.1d4f79a
topology:
controlPlane:
class: best-effort-small
count: 1
storageClass: vsan-default-storage-policy
workers:
class: best-effort-small
count: 3
storageClass: vsan-default-storage-policy
And apply:
# kubectl apply -f TKG-deploy.yaml
tanzukubernetescluster.run.tanzu.vmware.com/tkgcluster2 created
As before, we can watch the deployment:
# kubectl get tkc tkgcluster2 -o yaml -w
For more information on the self-service namespaces, visit: https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-tanzu/GUID-BEEA763E-43B7-4923-847F-5E0398174A88.html
Deploy a Private Registry VM using the VM Service and add to TKG Service Config
The VM Service is a new feature available in vSphere 7 Update 2a which allows you to provision VMs using kubectl within a Supervisor Namespace, thus allowing developers the ability to deploy and manage VMs in the same way they manage other Kubernetes resources.
Note that a VM created through the VM Service can only be managed using kubectl: vSphere administrators can see the VM in vCenter and can display its details and monitor the resources it uses, but cannot edit or otherwise alter the VM. For more information, see Monitor VMs in the vSphere Client.
We also have the ability, from vSphere 7 Update 2a, to use private registries for TKG clusters.
In this example, we will use the VM Service feature to deploy a VM as a devops user and then install a Harbor registry on it. Finally, we will use that Harbor instance as a private registry for a TKG cluster.
First the VI-admin must configure the VM service in vCenter.
Similar to TKG, we need to set up a content library to pull from. At the time of writing, CentOS and Ubuntu images are available for testing from the VMware Marketplace:
https://marketplace.cloud.vmware.com
To obtain a subscription link, first sign in using your ‘myvmware’ credentials.
Clicking on ‘Subscribe’ will take you through the wizard to enter settings and accept the EULA:
The site will then create a Subscription URL:
See the VMware Marketplace documentation for more details, https://docs.vmware.com/en/VMware-Marketplace/services/vmware-marketplace-for-consumers/GUID-0BB96E5E-123F-4BAE-B663-6C391F57C884.html
Back in vCenter, create a new content library with the link provided:
We then proceed to configure a namespace. If needed, create a new namespace and note the ‘VM Service’ info box:
Add at least one VM class:
Further VM classes can be defined by navigating to Workload Management > Services > VM Service > VM Classes
Add the content library configured above:
Now the service is ready, the rest of the steps can be performed as a devops user.
Deploy VM using VM Service
As usual, login to our cluster and switch contexts to the configured namespace. We can then see the Virtual Machine images available (we exclude the TKG images for our purposes):
# kubectl get vmimage | grep -v tkg
NAME OSTYPE FORMAT AGE
bitnami-jenkins-2.222.3-1 otherLinux64Guest ovf 2d2h
centos-stream-8-vmservice-v1alpha1.20210222 centos8_64Guest ovf 2d2h
Here we will deploy the CentOS image.
First, we create a file named ‘centos-user-data’ that captures the user, password and any customisation parameters. Use the following as a guide, replacing the password and authorized keys, etc.:
#cloud-config
chpasswd:
list: |
centos:P@ssw0rd
expire: false
packages:
- wget
- yum-utils
groups:
- docker
users:
- default
- name: centos
ssh-authorized-keys:
- ssh-rsa AAAAB3NzaC1yc2EA… root@tkg.vmware.corp
sudo: ALL=(ALL) NOPASSWD:ALL
groups: sudo, docker
shell: /bin/bash
network:
version: 2
ethernets:
ens192:
dhcp4: true
Next, we encode that file in base64 (and remove any newlines):
# cat centos-user-data | base64 | tr -d '\n'
I2Nsb3VkLWNvbmZpZwpjaHBhc3N3ZDoKICAgIGxpc3Q6IHwKICAgICAgdWJ1bnR1OlBAc3N3MHJkCiAgICBleHBpcmU6IGZhbHNlCnBhY2thZ2VfdXBncmFkZTogdHJ1ZQpwYWNrYWdlczoKICAtIGRvY2tlcgpncm91cHM6CiAgLSBk
For the next step, re-confirm the network name that was defined:
# kubectl get network
network-1
Then we create a manifest for the VM (cloudinit-centos.yaml) and add the encoded line in the previous step, under ‘user-data’. Note the values for the namespace, network, class name, image name, storage class, and hostname and adjust accordingly:
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
name: centos-vmsvc
namespace: ns2
spec:
networkInterfaces:
- networkName: network-1
networkType: vsphere-distributed
className: best-effort-small
imageName: centos-stream-8-vmservice-v1alpha1.20210222
powerState: poweredOn
storageClass: vsan-default-storage-policy
vmMetadata:
configMapName: centos-vmsvc
transport: OvfEnv
---
apiVersion: v1
kind: ConfigMap
metadata:
name: centos-vmsvc
namespace: ns2
data:
user-data: |
I2Nsb3VkLWNvbmZpZwpjaHBhc3N3ZDoKICAgICAgdWJ1bn…
hostname: centos-vmsvc
Note: ensure that the base64-encoded data is indented correctly. Use a YAML validator, such as yamllint, to make sure the format is correct.
We then apply this manifest:
# kubectl apply -f cloudinit-centos.yaml
We should see this now being created:
# kubectl get vm
NAME POWERSTATE AGE
centos-vmsvc 4s
Just like the TKC deployment, we can watch the status (and wait for the IP address):
# kubectl get vm centos-vmsvc -o yaml -w
Once the VM has been deployed, we can query the IP address:
# kubectl get vm centos-vmsvc -o yaml | grep Ip
f:vmIp: {}
vmIp: 172.168.161.6
We should be able to log in to our VM. If the matching public key was added to the user-data (ssh-authorized-keys), this should drop straight to a prompt:
# ssh centos@172.168.161.6
[centos@centos-vmsvc ~]$
Prepare the deployed VM and Install Harbor
We need to prepare the VM by installing Docker:
❯ sudo yum-config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
❯ sudo yum install -y docker-ce docker-ce-cli containerd.io
See https://docs.docker.com/engine/install/centos/ for further details on installing Docker on CentOS
Next, within our new VM, we’ll download the Harbor installation script, as per the guide at https://goharbor.io/docs/2.0.0/install-config/quick-install-script/
❯ wget https://via.vmw.com/harbor-script -O harbor.sh
And set execute permissions and run it:
❯ chmod +x harbor.sh
❯ sudo ./harbor.sh
Follow the prompts (install using the IP address).
Next, copy the Harbor manifest template:
❯ sudo cp harbor/harbor.yml.tmpl harbor/harbor.yml
Edit the Harbor manifest file and update the hostname field with the IP address of the VM.
For example:
# Configuration file of Harbor
# The IP address or hostname to access admin UI and registry service.
# DO NOT use localhost or 127.0.0.1, because Harbor needs to be accessed by external clients.
hostname: 172.168.161.6
For the next step, we will need to create a self-signed certificate, as per: https://goharbor.io/docs/1.10/install-config/configure-https/
First the CA cert, remember to update as needed:
❯ openssl genrsa -out ca.key 4096
❯ openssl req -x509 -new -nodes -sha512 -days 3650 \
-subj "/C=CN/ST=UK/L=UK/O=example/OU=Personal/CN=172.168.161.6" \
-key ca.key -out ca.crt
Then the Server Cert, updating the site name as needed:
❯ openssl genrsa -out testdmain.com.key 4096
❯ openssl req -sha512 -new \
-subj "/C=CN/ST=UK/L=UK/O=example/OU=Personal/CN=172.168.161.6" \
-key testdmain.com.key \
-out testdmain.com.csr
❯ cat > v3.ext <<-EOF
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
IP.1=172.168.161.6
EOF
❯ openssl x509 -req -sha512 -days 3650 \
-extfile v3.ext \
-CA ca.crt -CAkey ca.key -CAcreateserial \
-in testdmain.com.csr \
-out testdmain.com.crt
We will then need to copy the cert files to the appropriate directory:
❯ sudo cp testdmain.com.* /etc/pki/ca-trust/source/anchors/
Run the following command to ingest the certificates:
❯ sudo update-ca-trust
Convert the crt file for use by Docker and copy:
❯ openssl x509 -inform PEM -in testdmain.com.crt -out testdmain.com.cert
❯ sudo mkdir -p /etc/docker/certs.d/testdmain.com/
❯ sudo cp testdmain.com.cert /etc/docker/certs.d/testdmain.com/
❯ sudo cp testdmain.com.key /etc/docker/certs.d/testdmain.com/
Restart Docker:
❯ sudo systemctl restart docker
Now, we must configure Harbor to use the certificate files:
❯ sudo vi harbor/harbor.yml
In the https section, update the certificate and private key lines to point to the correct files, for example:
certificate: /etc/pki/ca-trust/source/anchors/testdmain.com.crt
private_key: /etc/pki/ca-trust/source/anchors/testdmain.com.key
Next, we run the Harbor prepare script:
❯ cd harbor
❯ sudo ./prepare
Then restart the Harbor instance:
❯ sudo docker-compose down -v
❯ sudo docker-compose up -d
Wait for the services to start and logout of the CentOS VM.
Configure the TKG Service to Trust the Deployed Repository
Test the instance by using a browser to navigate to the IP address of the CentOS VM. The Harbor login page should be seen:
The default credentials are:
admin / Harbor12345
We can also test access using ‘docker login’. First obtain the certificate and store locally:
# echo | openssl s_client -connect 172.168.161.6:443 2>/dev/null -showcerts | openssl x509 > harbor.crt
Then move the certificate into the OS’ cert store. For Photon OS/TKG Appliance this is /etc/ssl/certs.
# mv harbor.crt /etc/ssl/certs/
Then update the OS to use the new certificate (a reboot may be needed).
Finally, log in to the Harbor instance (credentials are admin/Harbor12345); there should not be any certificate errors or warnings:
# docker login 172.168.161.6
Next, we will configure the TKG service to be able to use this registry.
Get the certificate from the CentOS VM in base64 format:
# echo | openssl s_client -connect 172.168.161.6:443 2>/dev/null -showcerts | openssl x509 | base64 | tr -d '\n'
We can then add this to a manifest to amend the TKG service configuration. Here we create ‘tks.yaml’. Add the certificate from the previous step:
tks.yaml
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TkgServiceConfiguration
metadata:
name: tkg-service-configuration
spec:
defaultCNI: antrea
trust:
additionalTrustedCAs:
- name: harbor-ca
data: [CERT GOES HERE]
As usual, apply:
# kubectl apply -f tks.yaml
Thus, any new TKG clusters created will automatically trust the registry.
For more information on the VM service, see: https://core.vmware.com/blog/introducing-virtual-machine-provisioning-kubernetes-vm-service. This blog article also includes a GitHub repository with examples.
For more information on private registry support, see: https://core.vmware.com/blog/vsphere-tanzu-private-registry-support
Pulling from a Private Repository
In the previous exercise, we created a private Harbor repository to use with any new TKG clusters created. Here, we will push an image to the private repository and pull it into our TKG cluster.
First, obtain a test container, for instance busybox:
# docker pull busybox
We can then push this to our Harbor instance. First login to the Harbor instance (replacing the IP address with your own):
# docker login 172.168.161.6
Next, tag the image and provide a repository name to save to:
# docker tag busybox:latest 172.168.161.6/library/myrepo:busybox
Finally, push the image:
# docker push 172.168.161.6/library/myrepo:busybox
See the Harbor documentation for further details on pushing images, https://goharbor.io/docs/1.10/working-with-projects/working-with-images/pulling-pushing-images/
Looking at our Harbor UI, under Projects > library > myrepo we can see that the image has been pushed.
Click on the image to bring up the information screen:
Clicking on the squares next to the image gives the pull command. Confirm that this is the same image we have tagged above.
Next, we create a Namespace and a new TKG cluster (see the section earlier in this guide). Login to this new TKG cluster.
We then create a simple manifest that will pull the container. Replace image string with the name saved from the Harbor UI.
We’ll call this manifest bb.yaml:
bb.yaml
apiVersion: v1
kind: Pod
metadata:
name: busybox
labels:
app: busybox
spec:
containers:
- image: "172.168.161.6/library/myrepo:busybox"
command:
- sleep
- "3600"
imagePullPolicy: Always
name: busybox
restartPolicy: Always
Then apply:
# kubectl apply -f bb.yaml
This should pull very quickly, and we can get and describe the pod:
# kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 0 28m
Further Examples
Further examples of workloads on Tanzu Kubernetes Clusters can be found in the official documentation.
Lifecycle Operations
Scale Out Tanzu Kubernetes Clusters
Scaling out Tanzu Kubernetes Clusters involves changing the number of nodes. You can increase the number of control-plane VMs, Worker VMs or both at the same time.
There are a couple of methods to approach this.
Method 1: Edit the YAML file used for deployment and apply the file just as it was done to create the TKC.
Method 2: Use Kubectl edit to directly edit this YAML file. After the file is saved, the changes will be triggered.
We will focus on Method 2 since this is a more automated approach over method 1.
First, switch to the Supervisor Namespace where the TKG cluster resides:
# kubectl config use-context ns1
Then list TKG clusters:
# kubectl get tkc
NAME CONTROL PLANE WORKER DISTRIBUTION
tkgcluster1 1 3 1.18.15+vmware.1-tkg.1.600e412
Here we can see that there is only one cluster, and it has 1 control-plane VM and 3 worker VMs.
Edit the TKC manifest
# kubectl edit tkc/tkgcluster1
The cluster manifest will open in the text editor defined by your KUBE_EDITOR or EDITOR environment variable
Locate the ‘topology’ section and change controlPlane count from 1 to 3:
topology:
controlPlane:
class: best-effort-xsmall
count: 3
storageClass: vsan-default-storage-policy
workers:
class: best-effort-xsmall
count: 3
storageClass: vsan-default-storage-policy
Save the file.
You can ‘see’ the VM creation using the watch command with jq:
# watch "kubectl get tkc -o json | jq -r '.items[].status.vmStatus'"
We can see that there are now 3 control-plane VMs:
# kubectl get tkc
NAME CONTROL PLANE WORKER DISTRIBUTION
tkgcluster1 3 3 1.18.15+vmware.1-tkg.1.600e412
In vCenter, we see that the extra VMs have been created
In the same manner, you can scale out by increasing the number of worker nodes.
First, switch to the Supervisor Namespace where the TKG cluster resides:
# kubectl config use-context ns1
Then list the available TKG Clusters
# kubectl get tkc
NAME CONTROL PLANE WORKER DISTRIBUTION
tkgcluster1 3 3 1.18.15+vmware.1-tkg.1.600e412
Here we can see that there is only one cluster, and it has 3 control-plane VMs and 3 worker VMs.
Edit the TKC manifest
# kubectl edit tkc/tkgcluster1
The cluster manifest will open in the text editor defined by your KUBE_EDITOR or EDITOR environment variable (vi by default)
As before, locate the ‘topology’ section. Change workers count from 3 to 5 and save the file:
topology:
controlPlane:
class: best-effort-xsmall
count: 3
storageClass: vsan-default-storage-policy
workers:
class: best-effort-xsmall
count: 5
storageClass: vsan-default-storage-policy
We can see that there are now 5 worker VMs:
# kubectl get tkc
NAME CONTROL PLANE WORKER DISTRIBUTION
tkgcluster1 3 5 1.18.15+vmware.1-tkg.1.600e412
Again, in vCenter, the new VMs can be seen:
Scale-In Tanzu Kubernetes Clusters
Scaling in Tanzu Kubernetes Clusters is just as easy as scaling out. The same procedure applies, except that this time we will decrease the number of worker nodes. Note that the control plane cannot be scaled in.
First, switch to the Supervisor Namespace where the TKG cluster resides:
# kubectl config use-context ns1
Then list the available TKG Clusters
# kubectl get tkc
NAME CONTROL PLANE WORKER DISTRIBUTION
tkgcluster1 3 5 1.18.15+vmware.1-tkg.1.600e412
Edit the TKC manifest
# kubectl edit tkc/tkgcluster1
The cluster manifest will open in the text editor defined by your KUBE_EDITOR or EDITOR environment variable (vi by default)
Like previously, locate ‘topology’ section and then decrease the number of worker nodes and save the file:
topology:
controlPlane:
class: best-effort-xsmall
count: 3
storageClass: vsan-default-storage-policy
workers:
class: best-effort-xsmall
count: 3
storageClass: vsan-default-storage-policy
We can see that the number of workers scales back in to 3:
# kubectl get tkc
NAME CONTROL PLANE WORKER DISTRIBUTION
tkgcluster1 3 3 1.18.15+vmware.1-tkg.1.600e412
Update Tanzu Supervisor Cluster
To update one or more Supervisor clusters, including the version of Kubernetes for the environment and the infrastructure supporting TKG clusters, you perform a vCenter and Namespace upgrade.
Note: it is necessary to upgrade the Supervisor Cluster first before upgrading any TKG clusters.
Upgrade vCenter
There are several methods of upgrading the vCenter appliance. Follow VMware's best practices while conducting this upgrade.
Upgrade details are located in the official documentation for upgrading the vCenter Server Appliance.
Procedure to upgrade Namespace:
- Log in to the vCenter Server as a vSphere administrator.
- Select Menu > Workload Management.
- Select the Namespaces > Updates tab.
- Select the Available Version that you want to update to.
- For example, select the version v1.18.2-vsc0.0.5-16762486.
Note: You must update incrementally. Do not skip updates, such as from 1.16 to 1.18. The path should be 1.16, 1.17, 1.18.
- Select one or more Supervisor Clusters to apply the update to.
- To initiate the update, click Apply Updates.
- Use the Recent Tasks pane to monitor the status of the update.
Update Tanzu Kubernetes Clusters
As opposed to the Supervisor cluster, which is administered and upgraded in vCenter, the child TKG clusters need to be updated using the standard Kubernetes toolset.
Updating the Tanzu Kubernetes Cluster includes variables such as version, virtual machine class, and storage class. However, there are several methods of updating this information for TKG clusters. You can refer to the official documentation for further details.
This approach includes utilizing commands such as kubectl edit, kubectl patch, and kubectl apply.
For this guide, we will highlight the "patch" method to perform an in-place update of the cluster.
To upgrade the Kubernetes version we will create a variable and apply it to the cluster using the patch command. The approach demonstrated here uses the UNIX shell builtin read with a here-document to assign the patch content to a variable named $PATCH.
The kubectl patch command invokes the Kubernetes API to update the cluster manifest. The '--type merge' flag indicates that the data contains only those properties that are different from the existing manifest.
First, we will need to change ‘fullVersion’ parameter to ‘null’. The ‘version’ parameter should then be changed to the version of Kubernetes we want to upgrade to.
For this exercise, our TKG cluster is deployed at version v1.18.15 and will be upgraded to version v1.19.7.
We can inspect the current version of our TKG cluster:
# kubectl get tkc tkgcluster1 -o json | jq -r '.spec.distribution'
{
"fullVersion": "1.18.15+vmware.1-tkg.1.600e412",
"version": "1.18.15+vmware.1-tkg.1.600e412"
}
Looking at our available versions, we can see that we have versions from 1.16.8 to 1.20.2 available:
# kubectl get tkr
NAME VERSION
v1.16.12---vmware.1-tkg.1.da7afe7 1.16.12+vmware.1-tkg.1.da7afe7
v1.16.14---vmware.1-tkg.1.ada4837 1.16.14+vmware.1-tkg.1.ada4837
v1.16.8---vmware.1-tkg.3.60d2ffd 1.16.8+vmware.1-tkg.3.60d2ffd
v1.17.11---vmware.1-tkg.1.15f1e18 1.17.11+vmware.1-tkg.1.15f1e18
v1.17.11---vmware.1-tkg.2.ad3d374 1.17.11+vmware.1-tkg.2.ad3d374
v1.17.13---vmware.1-tkg.2.2c133ed 1.17.13+vmware.1-tkg.2.2c133ed
v1.17.17---vmware.1-tkg.1.d44d45a 1.17.17+vmware.1-tkg.1.d44d45a
v1.17.7---vmware.1-tkg.1.154236c 1.17.7+vmware.1-tkg.1.154236c
v1.17.8---vmware.1-tkg.1.5417466 1.17.8+vmware.1-tkg.1.5417466
v1.18.10---vmware.1-tkg.1.3a6cd48 1.18.10+vmware.1-tkg.1.3a6cd48
v1.18.15---vmware.1-tkg.1.600e412 1.18.15+vmware.1-tkg.1.600e412
v1.18.5---vmware.1-tkg.1.c40d30d 1.18.5+vmware.1-tkg.1.c40d30d
v1.19.7---vmware.1-tkg.1.fc82c41 1.19.7+vmware.1-tkg.1.fc82c41
v1.20.2---vmware.1-tkg.1.1d4f79a 1.20.2+vmware.1-tkg.1.1d4f79a
We construct our ‘PATCH’ variable:
# read -r -d '' PATCH <<'EOF'
spec:
distribution:
fullVersion: null # set to null as just updating version
version: v1.19.7
EOF
Then we apply the patch to the existing tkc that we are targeting. The system should return that the TKG cluster has been patched:
# kubectl patch tkc tkgcluster1 --type merge --patch "$PATCH"
tanzukubernetescluster.run.tanzu.vmware.com/tkgcluster1 patched
Check the status of the TKG cluster; we can see that the ‘phase’ is shown as ‘updating’:
# kubectl get tkc
NAME CONTROL PLANE WORKER DISTRIBUTION AGE PHASE
tkgcluster1 1 3 v1.19.7+vmware.1-tkg.1.fc82c41 7m updating
In vCenter, we can see a rolling upgrade of the control-plane VMs as well as the workers: new VMs are created with the new version of Kubernetes, and once a new VM is ready, the old one is deleted. This is done one VM at a time, starting with the control plane, until they are all completed.
After a few minutes, you will see the status change from updating to running, at which point you can verify the cluster by running:
# kubectl get tkc
NAME CONTROL PLANE WORKER DISTRIBUTION
tkgcluster1 3 5 1.19.7+vmware.1-tkg.1.fc82c41
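To watch the phase change without polling manually, a small sketch using jq (the same approach used earlier for vmStatus):
# watch "kubectl get tkc tkgcluster1 -o json | jq -r '.status.phase'"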
Delete Operations
Destroy TKC and related objects
In order to delete a Tanzu Kubernetes Cluster, first switch to the Supervisor Namespace where the cluster is located. Visually, this can be seen in vCenter:
We change context to the Supervisor Namespace that contains the TKG cluster that we would like to destroy:
# kubectl config use-context ns1
Double-check the namespace is the correct one; a star next to the name indicates the currently selected context:
# kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
172.168.161.101 172.168.161.101 wcp: ...
* ns1 172.168.161.101 wcp: ... ns1
ns2 172.168.161.101 wcp: ... ns2
ns3 172.168.161.101 wcp: ... ns3
See which TKG cluster(s) reside in the namespace:
# kubectl get tkc
NAME CONTROL PLANE WORKER DISTRIBUTION AGE PHASE
tkgcluster1 1 3 v1.20.2+vmware... 10d running
Prior to deletion, conduct a search for the TKG cluster within the vCenter search field to see all related objects:
Finally, delete the TKG cluster, in this case named 'tkgcluster1':
# kubectl delete tkc tkgcluster1
tanzukubernetescluster.run.tanzu.vmware.com "tkgcluster1" deleted
vCenter will have tasks regarding the deletion of the TKG cluster and all related objects:
From vCenter, we can see that there are no more resources relating to the TKG cluster:
Delete Namespaces
To delete namespaces from the UI, navigate to Menu > Workload Management > Namespaces. Select the namespace to be removed, then click Remove.
Note, ensure that there are no TKG clusters contained within the namespace before removal.
Delete Supervisor Cluster and Confirm Resources are Released
The supervisor cluster gets deleted when you disable Workload Management for a specific cluster. This action will also delete any existing Namespaces and TKG clusters that exist within this cluster. Proceed with caution when disabling Workload Management for a cluster.
You can first verify the supervisor cluster members by using the following command:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
421c2fba09ab60c0ffe80c27a82d04af Ready master 12d v1.19.1+wcp.3
421c4fcf29033faecfb403bb13656a39 Ready master 12d v1.19.1+wcp.3
421cefbcbfaeb030defbb8fcec097c48 Ready master 12d v1.19.1+wcp.3
From vCenter, use the search field to look for ‘supervisor’. This will return the supervisor VMs. You can add the DNS Name field and compare this with the output from the ‘kubectl get nodes’ command:
Once you have verified the supervisor cluster, you can delete this cluster and all other objects within this cluster by going to Menu > Workload Management > Select Clusters tab > Select the cluster to be deleted > Click DISABLE to remove the cluster and all of its objects
In this case you can see that the supervisor cluster houses a namespace and TKG cluster
You will receive a confirmation prompt prior to continuing with the deletion task:
Once you select the check box and click Disable, you will see some tasks such as powering off the TKC workers, deleting these virtual machines, deleting related folders, and lastly shutting down and deleting the Supervisor Cluster VMs.
When the tasks are complete, the clusters tab will no longer have the previously selected cluster and you will not be able to connect to it via kubectl as the cluster no longer exists.
Lifecycle Operations - TKG Extension
Contour Ingress
Day 1 Ops – Log Management
In this section we will examine a few key operational activities on the Contour ingress. Contour components run as two different apps: (a) contour and (b) envoy.
Let’s extract the POD details for the Envoy & Contour
# kubectl get pods -n tanzu-system-ingress
NAME READY STATUS RESTARTS AGE
contour-d968f749d-8tvl4 1/1 Running 5 14h
contour-d968f749d-jmmkm 1/1 Running 5 14h
envoy-2kgxs 2/2 Running 0 14h
envoy-4lmxc 2/2 Running 0 14h
envoy-t8nc5 2/2 Running 0 11h
Now that we know the contour and envoy pod names, we can extract the logs for troubleshooting purposes.
Extract Contour logs by using the pod name we listed before
# kubectl logs contour-d968f749d-8tvl4 -c contour -n tanzu-system-ingress
time="2021-07-07T01:58:04Z" level=info msg="args: [serve --incluster --xds-address=0.0.0.0 --xds-port=8001 --envoy-service-http-port=80 --envoy-service-https-port=443 --contour-cafile=/certs/ca.crt --contour-cert-file=/certs/tls.crt --contour-key-file=/certs/tls.key --config-path=/config/contour.yaml]"
time="2021-07-07T01:58:05Z" level=info msg="Watching Service for Ingress status" envoy-service-name=envoy envoy-service-namespace=tanzu-system-ingress
Extract Envoy logs by using the pod name we listed before
# kubectl logs envoy-2kgxs -c envoy -n tanzu-system-ingress
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:324] initializing epoch 0 (base id=0, hot restart version=11.104)
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:326] statically linked extensions:
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:328] envoy.resolvers: envoy.ip
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:328] envoy.retry_priorities: envoy.retry_priorities.previous_priorities
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:328] envoy.thrift_proxy.transports: auto, framed, header, unframed
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:328] envoy.udp_listeners: quiche_quic_listener, raw_udp_listener
[2021-07-06 21:33:22.912][1][info][main] [source/server/server.cc:328] envoy.filters.http: envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.ext_proc, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.admission_control, envoy.filters.http.aws_lambda, envoy.filters.http.aws_request_signing, envoy.filters.http.buffer, envoy.filters.http.cache, envoy.filters.http.cdn_loop, envoy.filters.http.compressor, envoy.filters.http.cors, envoy.filters.http.csrf, envoy.filters.http.decompressor,
Since Envoy is the actual data plane and dynamically implements filters to fulfil DevOps ingress requests, the Envoy logs are usually the more useful ones for troubleshooting.
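If you do not want to look up individual pod names, the logs from all Envoy pods can be pulled at once using the label selector visible on the Envoy service (a convenience sketch; adjust the selector if your deployment labels differ):
# kubectl logs -n tanzu-system-ingress -l app=envoy -c envoy --tail=20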
In a production or scaled environment, it is hard to go through the logs of each pod, CRD and other object individually. We can simplify this by forwarding logs to a log server and metrics to a metrics server and dashboards. We will explore the necessary tooling, such as Fluent Bit, Prometheus and Grafana, which are already part of the TKG Extensions package.
Day 1 Ops - Config changes
Contour is a highly configurable ingress, providing various options to customize the deployment according to the needs of the customer environment. These configurations are broadly categorised under two sections: (1) contour.config and (2) envoy.config.
Contour & envoy config values can be found at
# ls $HOME/tkg-extensions-v1.3.1/ingress/contour/03-contour.yaml
# ls $HOME/tkg-extensions-v1.3.1/ingress/contour/03-envoy.yaml
A few example config values:
- contour.namespace: Namespace into which Contour and its packaged objects are deployed. Organizations might have a standard mechanism to define namespaces according to their own conventions.
- contour.config.default.HTTPVersion: The default HTTP version to be used by Contour.
- contour.config.timeouts.requestTimeout: Timeout for an entire ingress request.
- envoy.hostPort.http: Port number for HTTP requests, defaulting to 80.
- envoy.hostPort.https: Port number for HTTPS requests, defaulting to 443.
Note: For config params with a timeout value, zero means no value has been set in Contour, and Contour falls back on the Envoy default values. A full list of config values can be found in the official VMware documentation.
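As an illustration only (the exact schema should be checked against the official documentation), the dotted parameter names above map to nested YAML keys in the Contour data values, along these lines:
contour:
  namespace: tanzu-system-ingress
  config:
    timeouts:
      requestTimeout: 0s
envoy:
  hostPort:
    http: 80
    https: 443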
Day 2 Ops - Upgrade Contour
As with other immutable architectural patterns, the best way to upgrade is to delete the current Contour deployment and install the new version of the ingress.
Note: You should take a backup of the current config entries before you delete Contour; these can be restored once the new version has been installed, so the configuration remains the same after the upgrade.
Back up the config file:
kubectl get secret contour-data-values -n tanzu-system-ingress -o 'go-template={{ index .data "values.yaml" }}' | base64 -d > contour-data-values.yaml
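Once the new version has been installed, the backed-up values can be fed back in by re-creating the data values secret from the saved file, following the same pattern used later for Fluent Bit (a sketch, assuming the secret keeps the name contour-data-values):
# kubectl create secret generic contour-data-values --from-file=values.yaml=contour-data-values.yaml -n tanzu-system-ingress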
Day 2 Ops - Deleting Contour
Like the other TKG Extensions, Contour can be deleted, upgraded or changed at any time without impacting the core K8S setup. To delete the Contour ingress, we need to delete the following objects:
- The Contour app
- The namespace containing Contour and its dependent objects
- The roles created for Contour
Delete the app:
# kubectl delete app contour -n tanzu-system-ingress
app.kappctrl.k14s.io "contour" deleted
Delete NameSpace & roles
# kubectl delete -f namespace-role.yaml
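As a quick sanity check once the deletion completes, the namespace should no longer exist; the following command should report NotFound:
# kubectl get namespace tanzu-system-ingress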
FluentBit - Log forwarder
Day 1 Ops - Troubleshooting
Extract the Fluent Bit data values configured on the cluster:
# kubectl get secret fluent-bit-data-values -n tanzu-system-logging -o 'go-template={{ index .data "values.yaml" }}' | base64 -d > fluent-bit-data-values.yaml
# cat fluent-bit-data-values.yaml
Note: In K8S, secrets are base64 encoded, hence we decode the secret values with base64 to make them readable.
Check the pods for the Fluent Bit app:
# kubectl get pods -n tanzu-system-logging
NAME READY STATUS RESTARTS AGE
fluent-bit-bxqf5 1/1 Running 0 17m
fluent-bit-dpmpf 1/1 Running 0 17m
fluent-bit-h72hp 1/1 Running 0 17m
fluent-bit-r9dq9 1/1 Running 0 17m
Read the logs generated by the fluent-bit container running inside one of the pods:
# kubectl logs pod/fluent-bit-7qg5h -c fluent-bit -n tanzu-system-logging
Fluent Bit v1.6.9
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2021/07/12 16:25:38] [ info] [engine] started (pid=1)
[2021/07/12 16:25:38] [ info] [storage] version=1.0.6, initializing...
[2021/07/12 16:25:38] [ info] [storage] in-memory
[2021/07/12 16:25:38] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/07/12 16:25:38] [ info] [input:systemd:systemd.1] seek_cursor=s=657e7711b1764c8bbb38b81ee2c7349b;i=82f... OK
[2021/07/12 16:25:38] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2021/07/12 16:25:38] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2021/07/12 16:25:38] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2021/07/12 16:25:38] [ info] [filter:kubernetes:kubernetes.0] API server connectivity OK
[2021/07/12 16:25:38] [ info] [output:syslog:syslog.0] setup done for 10.156.134.90:514
[2021/07/12 16:25:38] [ info] [output:syslog:syslog.1] setup done for 10.156.134.90:514
[2021/07/12 16:25:38] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2021/07/12 16:25:38] [ info] [sp] stream processor started
[2021/07/12 16:25:39] [ info] [input:tail:tail.0] inotify_fs_add(): inode=793191 watch_fd=1 name=/var/log/containers/antrea-agent-vckf8_kube-system_install-cni-9d3c3ccd3ec44477b29ec0c8481e6f51c2eba2493cd9cebf6b90e7e2e67dbbc5.log
Day 2 Ops - Config changes
The Fluent Bit configuration can be updated in fluent-bit-data-values.yaml, after which the updated config file is re-applied.
Get the current config values from the Fluent Bit secret object:
# kubectl get secret fluent-bit-data-values -n tanzu-system-logging -o 'go-template={{ index .data "values.yaml" }}' | base64 -d > fluent-bit-data-values.yaml
Check that the file has been created:
# ls fluent-bit-data-values.yaml
fluent-bit-data-values.yaml
Update the configuration in fluent-bit-data-values.yaml:
# vi fluent-bit-data-values.yaml
#@data/values
#@overlay/match-child-defaults missing_ok=True
---
logging:
  image:
    repository: projects.registry.vmware.com/tkg
tkg:
  instance_name: "prasad-clu-01"
  cluster_name: "prasad-clu-01"
fluent_bit:
  output_plugin: "syslog"
  syslog:
    host: "10.156.134.90"
    port: "514"
    mode: "tcp"
    format: "rfc5424"
For more detailed information on the config values, please refer to the official VMware documentation.
Update/recreate the Fluent Bit secret object:
# kubectl create secret generic fluent-bit-data-values --from-file=values.yaml=fluent-bit-data-values.yaml -n tanzu-system-logging -o yaml --dry-run | kubectl replace -f-
Check the status of the FluentBit extension
# kubectl get app fluent-bit -n tanzu-system-logging
NAME DESCRIPTION SINCE-DEPLOY AGE
fluent-bit Reconcile succeeded 32s 66m
For detailed status and troubleshooting:
# kubectl get app fluent-bit -n tanzu-system-logging -o yaml
apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kappctrl.k14s.io/v1alpha1","kind":"App","metadata":{"annotations":{"tmc.cloud.vmware.com/managed":"false"},"name":"fluent-bit","namespace":"tanzu-system-logging"},"spec":{"deploy":[{"kapp":{"rawOptions":["--wait-timeout=5m"]}}],"fetch":[{"image":{"url":"projects.registry.vmware.com/tkg/tkg-extensions-templates:v1.3.1_vmware.1"}}],"serviceAccountName":"fluent-bit-extension-sa","syncPeriod":"5m","template":[{"ytt":{"inline":{"pathsFrom":[{"secretRef":{"name":"fluent-bit-data-values"}}]},"paths":["tkg-extensions/common","tkg-extensions/logging/fluent-bit"]}}]}}
    tmc.cloud.vmware.com/managed: "false"
  creationTimestamp: "2021-07-12T16:25:29Z"
  finalizers:
  - finalizers.kapp-ctrl.k14s.io/delete
….
Day 2 Ops - Upgrade Fluent Bit
As with other immutable patterns, to upgrade Fluent Bit you delete the current version of the Fluent Bit resources and deploy the new version. The config values (input and output connection details) are independent of the Fluent Bit resources, so you can re-use the current config values. The steps are outlined below, with a command sketch following the list.
- Extract the current config values from the cluster
- Delete Fluent Bit from the cluster
- Download the new version from the Tanzu Extensions package
- Deploy the Fluent Bit extension by re-using the config values file
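A minimal sketch of this sequence, assuming the new extensions bundle provides namespace-role.yaml and fluent-bit-extension.yaml files analogous to those used for the original deployment:
# kubectl get secret fluent-bit-data-values -n tanzu-system-logging -o 'go-template={{ index .data "values.yaml" }}' | base64 -d > fluent-bit-data-values.yaml
# kubectl delete app fluent-bit -n tanzu-system-logging
# kubectl delete -f namespace-role.yaml
# kubectl apply -f namespace-role.yaml
# kubectl create secret generic fluent-bit-data-values --from-file=values.yaml=fluent-bit-data-values.yaml -n tanzu-system-logging
# kubectl apply -f fluent-bit-extension.yaml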
Day 2 Ops - Delete Fluentbit
The Fluent Bit app can be deleted from the cluster in two steps: (1) delete the app, and (2) delete the namespace and roles.
Delete the Fluent Bit app:
# kubectl delete app fluent-bit -n tanzu-system-logging
app.kappctrl.k14s.io "fluent-bit" deleted
Delete the namespace and roles:
# kubectl delete -f namespace-role.yaml
namespace "tanzu-system-logging" deleted
serviceaccount "fluent-bit-extension-sa" deleted
role.rbac.authorization.k8s.io "fluent-bit-extension-role" deleted
rolebinding.rbac.authorization.k8s.io "fluent-bit-extension-rolebinding" deleted
clusterrole.rbac.authorization.k8s.io "fluent-bit-extension-cluster-role" deleted
clusterrolebinding.rbac.authorization.k8s.io "fluent-bit-extension-cluster-rolebinding" deleted
Prometheus Metric Server
Day 1 Ops – Troubleshooting
Ensure the Prometheus app is in the 'Reconcile succeeded' state. A failure to reconcile could be caused by issues with the YAML file syntax, API mismatches, or other resource problems.
To troubleshoot Prometheus, list the pods running for Prometheus and verify the log messages from those pods.
Fetch the Prometheus pods (both Prometheus and Alertmanager):
# kubectl get pods -n tanzu-system-monitoring
NAME READY STATUS RESTARTS AGE
prometheus-alertmanager-5c49dfb98c-2jqfn 2/2 Running 0 13h
prometheus-cadvisor-g6vbg 1/1 Running 0 13h
prometheus-cadvisor-ngtsg 1/1 Running 0 13h
prometheus-cadvisor-pfwlx 1/1 Running 0 13h
prometheus-kube-state-metrics-6f44c86df6-d5mql 1/1 Running 0 13h
prometheus-node-exporter-l2fd7 1/1 Running 0 13h
prometheus-node-exporter-mqsrr 1/1 Running 0 13h
prometheus-node-exporter-zlc4f 1/1 Running 0 13h
prometheus-pushgateway-6d5f49cbcb-wf8mq 1/1 Running 0 13h
prometheus-server-8cc9dc559-6cxjh 2/2 Running 0 13h
Validate the log output from the “prometheus-alertmanager” container running in the Alertmanager pod listed above:
# kubectl logs pod/prometheus-alertmanager-5c49dfb98c-2jqfn -c prometheus-alertmanager -n tanzu-system-monitoring
level=info ts=2021-07-12T21:30:55.076Z caller=main.go:231 msg="Starting Alertmanager" version="(version=, branch=, revision=)"
level=info ts=2021-07-12T21:30:55.076Z caller=main.go:232 build_context="(go=go1.13.15, user=, date=)"
level=info ts=2021-07-12T21:30:55.109Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/config/alertmanager.yml
level=info ts=2021-07-12T21:30:55.109Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/config/alertmanager.yml
level=info ts=2021-07-12T21:30:55.113Z caller=main.go:497 msg=Listening address=:9093
Verify the log output from the “prometheus-server” container running in the Prometheus server pod listed above:
# kubectl logs pod/prometheus-server-8cc9dc559-6cxjh -c prometheus-server -n tanzu-system-monitoring
level=info ts=2021-07-12T21:30:54.978Z caller=main.go:337 msg="Starting Prometheus" version="(version=2.18.1, branch=non-git, revision=non-git)"
level=info ts=2021-07-12T21:30:54.979Z caller=main.go:338 build_context="(go=go1.14.8, user=root@781f2e89c308, date=20200907-23:58:33)"
level=info ts=2021-07-12T21:30:54.979Z caller=main.go:339 host_details="(Linux 4.19.190-1.ph3-esx #1-photon SMP Thu May 20 06:33:45 UTC 2021 x86_64 prometheus-server-8cc9dc559-6cxjh (none))"
level=info ts=2021-07-12T21:30:54.979Z caller=main.go:340 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2021-07-12T21:30:54.979Z caller=main.go:341 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2021-07-12T21:30:54.981Z caller=web.go:523 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2021-07-12T21:30:54.981Z caller=main.go:678 msg="Starting TSDB ..."
level=info ts=2021-07-12T21:30:54.989Z caller=head.go:575 component=tsdb msg="Replaying WAL, this may take awhile"
level=info ts=2021-07-12T21:30:54.989Z caller=head.go:624 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2021-07-12T21:30:54.989Z caller=head.go:627 component=tsdb msg="WAL replay completed" duration=242.481µs
level=info ts=2021-07-12T21:30:54.990Z caller=main.go:694 fs_type=EXT4_SUPER_MAGIC
level=info ts=2021-07-12T21:30:54.990Z caller=main.go:695 msg="TSDB started"
level=info ts=2021-07-12T21:30:54.990Z caller=main.go:799 msg="Loading configuration file" filename=/etc/config/prometheus.yml
level=info ts=2021-07-12T21:30:54.992Z caller=kubernetes.go:253 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
In the case of an app reconcile failure, verify the YAML syntax in prometheus-data-values.yaml. Update the YAML file, then re-apply the secret and app YAML files:
# kubectl create secret generic prometheus-data-values --from-file=values.yaml=prometheus-data-values.yaml -n tanzu-system-monitoring -o yaml --dry-run | kubectl replace -f-
# kubectl apply -f prometheus-extension.yaml
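If the app still fails to reconcile, describing the App resource surfaces the fetch, template and deploy status messages reported by kapp-controller, which usually point at the offending value:
# kubectl describe app prometheus -n tanzu-system-monitoring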
Day 2 Ops – Update Prometheus Configuration
Update the configuration for a Prometheus extension that is deployed to a Tanzu Kubernetes cluster.
Update the YAML file and re-apply the secret and app YAML files:
# cd ./tkg-extensions-v1.3.1/extensions/monitoring/prometheus/
# vi prometheus-data-values.yaml
# kubectl create secret generic prometheus-data-values --from-file=values.yaml=prometheus-data-values.yaml -n tanzu-system-monitoring -o yaml --dry-run | kubectl replace -f-
# kubectl apply -f prometheus-extension.yaml
Ref: The supported Prometheus configuration parameters can be found in the official VMware documentation.
Note: By default, the kapp-controller will sync apps every 5 minutes, so the update should take effect in 5 minutes or less. If you want the update to take effect sooner, change the sync period in prometheus-extension.yaml to a smaller value and re-apply the Prometheus extension using kubectl apply -f prometheus-extension.yaml.
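For illustration, the sync period is the syncPeriod field of the App spec (the same field visible in the Fluent Bit App spec shown earlier); a hypothetical fragment of prometheus-extension.yaml with a shorter period could look like this, with the rest of the spec left unchanged:
apiVersion: kappctrl.k14s.io/v1alpha1
kind: App
metadata:
  name: prometheus
  namespace: tanzu-system-monitoring
spec:
  syncPeriod: 1m
  # fetch/template/deploy sections unchanged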
Check the app status:
# kubectl get app prometheus -n tanzu-system-monitoring -o yaml
Day 2 Ops - Delete Prometheus
Prometheus and Grafana share the same namespace, tanzu-system-monitoring, hence one should delete the Grafana resources (if Grafana is already installed) before deleting the common namespace. Deleting the Prometheus setup involves three steps: (1) delete the app, (2) delete the namespace and roles, and (3) delete the secret object.
Delete Prometheus App
# kubectl delete app prometheus -n tanzu-system-monitoring
app.kappctrl.k14s.io "prometheus" deleted
Delete namespaces and roles
# kubectl delete -f namespace-role.yaml
Delete the secret object
# kubectl delete secret prometheus-data-values -n tanzu-system-monitoring
Grafana
Day 1 Ops – Troubleshooting
Check for grafana app deployment status
# kubectl get app grafana -n tanzu-system-monitoring
NAME DESCRIPTION SINCE-DEPLOY AGE
grafana Reconcile succeeded 113s 63m
If the app status is 'Reconcile failed', verify the Grafana data values file, make the necessary changes and redeploy.
Access Grafana pod logs
# kubectl get pods -n tanzu-system-monitoring -l "app.kubernetes.io/name=grafana"
NAME READY STATUS RESTARTS AGE
grafana-5b575c6cc9-r7mb9 2/2 Running 0 72m
# kubectl logs pod/grafana-5b575c6cc9-r7mb9 -c grafana -n tanzu-system-monitoring
t=2021-07-13T17:13:19+0000 lvl=info msg="Starting Grafana" logger=server version=7.3.5 commit=unknown-dev branch=master compiled=2021-04-14T17:36:56+0000
t=2021-07-13T17:13:19+0000 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2021-07-13T17:13:19+0000 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
t=2021-07-13T17:13:19+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
t=2021-07-13T17:13:19+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
t=2021-07-13T17:13:19+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"
Issue: unable to access the Grafana web interface
Verify Grafana FQDN
# kubectl get httpproxy -n tanzu-system-monitoring -l app=grafana
NAME FQDN TLS SECRET STATUS STATUS DESCRIPTION
grafana-httpproxy grafana.system.tanzu grafana-tls valid Valid HTTPProxy
Get the Envoy EXTERNAL-IP value:
# kubectl get -n tanzu-system-ingress service envoy -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
envoy LoadBalancer 10.103.217.168 10.198.53.141 80:31673/TCP,443:32227/TCP 6d21h app=envoy,kapp.k14s.io/app=1625607181111946240
Create a hosts entry on the CLI VM, or a DNS record on the mapped DNS server:
# echo "10.198.53.141 grafana.system.tanzu" | sudo tee -a /etc/hosts
Ensure network connectivity to the load balancer IP by pinging the Envoy EXTERNAL-IP:
# ping -c 3 10.198.53.141
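If ping succeeds but the browser still cannot reach Grafana, curl can be used to test the FQDN without editing /etc/hosts (a quick check, assuming curl is available on the CLI VM; -k skips verification of the self-signed certificate):
# curl -k --resolve grafana.system.tanzu:443:10.198.53.141 https://grafana.system.tanzu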
Issue: username and password not working on first access to Grafana
- Validate grafana-data-values.yaml
- Make sure the base64-encoded values for monitoring.grafana.secret.admin_user and monitoring.grafana.secret.admin_password are accurate
For example:
admin_user: "YWRtaW4="
admin_password: "YWRtaW4="
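Both example values above are simply the string admin encoded with base64; they can be generated or verified on the CLI VM:
# echo -n 'admin' | base64
YWRtaW4=
# echo 'YWRtaW4=' | base64 -d
admin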
Day 2 Ops – Update / change Grafana configuration
Get the current Grafana data values:
# kubectl get secret grafana-data-values -n tanzu-system-monitoring -o 'go-template={{ index .data "values.yaml" }}' | base64 -d > grafana-data-values.yaml
Update the Grafana data values file:
# vi grafana-data-values.yaml
Re-create the Grafana secret object with the updated data values:
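A sketch of the secret re-creation, following the same dry-run/replace pattern used for the other extensions' data values (assuming the secret keeps the name grafana-data-values):
# kubectl create secret generic grafana-data-values --from-file=values.yaml=grafana-data-values.yaml -n tanzu-system-monitoring -o yaml --dry-run | kubectl replace -f-
Then re-apply the Grafana extension: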
# kubectl apply -f grafana-extension.yaml
app.kappctrl.k14s.io/grafana configured
Day 2 Ops – Delete Grafana
Delete Secret
# kubectl delete secret grafana-data-values -n tanzu-system-monitoring
secret "grafana-data-values" deleted
Delete App
# kubectl delete app grafana -n tanzu-system-monitoring
app.kappctrl.k14s.io "grafana" deleted
Note: Both Grafana and Prometheus share the same common namespace, hence deleting tanzu-system-monitoring will delete both Grafana & Prometheus.
Monitoring
Monitor Namespace and K8s Object Resource Utilization (vCenter)
Resource monitoring is an important aspect of managing a Tanzu environment. As part of the integration, the resource utilization of namespaces and Kubernetes objects can be monitored through vCenter.
At the cluster level, it is possible to monitor the different namespaces that exist within the vCenter. The overview pane provides a high-level view of the health, Kubernetes version and status, as well as the Control Plane IP and node health.
Navigate to Cluster>Monitor>Namespaces>Overview
Under the Compute tab for the namespace, the Tanzu Kubernetes and Virtual Machine resources display key information about the environment, such as version, IP address, phase, etc.
For the Tanzu Kubernetes Clusters, the Monitor tab also provides insights specific to the particular TKG cluster. Information such as the performance overview, tasks and events, as well as resource allocation, helps the admin understand the state and performance of the Tanzu Kubernetes Cluster.
Deploy Octant (optional)
Octant is a highly extensible Kubernetes management tool that, amongst many other features, allows for a graphical view of the Kubernetes environment. This is useful in a PoC environment to see the relationship between the different components. See https://github.com/vmware-tanzu/octant for more details.
If the TKG Demo Appliance is being used, Octant is already installed. Otherwise, download and install Octant, as described in the Octant getting started page:
https://reference.octant.dev/?path=/docs/docs-intro--page#getting-started
Launch Octant simply by running the ‘octant’ command:
# octant &
Open an SSH tunnel to port 7777 of the jump host.
For instance, from a Mac terminal:
$ ssh -L 7777:127.0.0.1:7777 -N -f -l root <jump host IP>
On Windows, using PuTTY, navigate to Connection > SSH > Tunnels in the left panel. Enter ‘7777’ for the source port and ‘127.0.0.1:7777’ as the destination, then click ‘Add’ and open a session to the jump host VM:
Thus, if we open a browser to http://127.0.0.1:7777 (note http, not https) we can see the Octant console: