VMware Enterprise PKS on VMware vSAN
Executive Summary
This section covers the business case, solution overview, and key results of VMware Enterprise PKS on VMware vSAN for production.
Business Case
Digital transformation is driving a new application development and deployment approach called cloud-native. Cloud-native applications empower your developers by providing the resources and environments they need to innovate and deliver applications faster on-premises or in the cloud. Leverage containers, cloud resources, and automation without compromising security and compliance.
VMware® Enterprise PKS enables enterprises to deploy and consume container services with production-grade Kubernetes orchestration. It provides a comprehensive solution for enterprises developing and deploying cloud-native applications.
VMware vSAN™ supports hybrid and all-flash deployment modes. With vSAN all-flash deployments, users benefit from high performance with less time spent managing and tuning performance, as well as cost savings. The space-saving features of vSAN, such as deduplication and compression, can compound these savings further. The Storage Policy Based Management (SPBM) provided with vSAN gives users the flexibility to define policies on demand and eases the management of container storage. Data services such as snapshots, cloning, encryption, deduplication, and compression are available at container-volume granularity. A container’s volume is backed by a VMDK on vSAN, which is associated with a storage policy. This gives developers the flexibility to manage container storage at a much finer granularity.
vSAN, as a shared storage system for vSphere, provides a uniform management plane for container storage. The deep integration between VMware Enterprise PKS and vSAN means that developers can consume storage as code with freedom by abstracting the complexity of the underlying storage infrastructure. vSAN’s SPBM offers users flexibility to define policies on demand in VMware vCenter® and delivers ease of management of storage for containers.
With Project Hatchway/Cloud Native Storage and vSAN services, cloud-native applications benefit from hyperconverged storage and compute as well as seamless application failover and rapid recovery. Advanced storage features, such as deduplication, compression and high availability, are all part of the core solution.
Solution Overview
This solution is a showcase of using VMware vSAN as a platform for deploying VMware Enterprise PKS in a vSphere environment. All storage management moves into a single software stack, thus taking advantage of the security, operational simplicity, and cost-effectiveness of vSAN in production environments. Workloads can be easily migrated from bare-metal configurations to a modern, dynamic, and consolidated HCI solution based on vSAN. vSAN is natively integrated with vSphere, and helps to provide smarter solutions to reduce the design and operational burden of a data center.
This solution describes a single Availability Zone (AZ) architecture in Kubernetes (K8s). We plan to update this solution to reflect the Multi-AZ architecture of K8s.
Key Results
The reference architecture:
- Provides the solution architecture for deploying VMware Enterprise PKS with Single-AZ mode in a vSAN cluster for production.
- Uses the NSX-T network.
- Measures performance when running VMware Enterprise PKS in a vSAN cluster, to the extent of the testing and cluster size described.
- Measures performance of the workloads running on top of VMware Enterprise PKS in a vSAN cluster.
- Evaluates the impact of different parameter settings in the performance testing.
- Performs operation testing and validates the scalability of running VMware Enterprise PKS in vSAN.
- Identifies steps required to ensure resiliency and availability against various failures.
- Provides best practice guides.
Introduction
This section provides the purpose, scope, and audience of this document.
Purpose
This solution illustrates how VMware Enterprise PKS can be run in a vSAN environment and provides testing results based on parameter variations running various workloads.
Scope
The reference architecture covers the following testing scenarios:
- Deploy Kubernetes clusters using VMware Enterprise PKS management plane
- Scale out Kubernetes clusters using VMware Enterprise PKS management plane
- MongoDB performance running in the Kubernetes cluster deployed by VMware Enterprise PKS
- MySQL performance running in the Kubernetes cluster deployed by VMware Enterprise PKS
- Operation testing
- Scalability testing
- Resiliency and availability testing
Audience
This paper is intended for cloud-native application administrators and storage architects involved in planning, designing, or administering VMware Enterprise PKS on vSAN.
Technology Overview
This section provides an overview of the technologies used in this solution:
• VMware vSphere® 6.5 Update 2
• VMware vSAN 6.6.1 Update 2
• VMware Enterprise PKS 1.2
• VMware NSX-T™ Data Center 2.2
• MongoDB 3.6
• MySQL 5.7
VMware vSphere 6.5 Update 2
VMware vSphere 6.5 is the infrastructure for next-generation applications. It provides a powerful, flexible, and secure foundation for business agility that accelerates the digital transformation to cloud computing and promotes success in the digital economy.
vSphere 6.5 supports both existing and next-generation applications through its:
- Simplified customer experience for automation and management at scale
- Comprehensive built-in security for protecting data, infrastructure, and access
- Universal application platform for running any application anywhere
With vSphere 6.5, customers can run, manage, connect, and secure their applications in a common operating environment, across clouds and devices.
VMware vSAN 6.6.1 Update 2
VMware vSAN, the market leader in Hyperconverged Infrastructure (HCI), enables low-cost and high-performance next-generation HCI solutions, converges traditional IT infrastructure silos onto industry-standard servers and virtualizes physical infrastructure to help customers easily evolve their infrastructure without risk, improve TCO over traditional resource silos, and scale to tomorrow with support for new hardware, applications, and cloud strategies. The natively integrated VMware infrastructure combines radically simple VMware vSAN storage, the market-leading VMware vSphere Hypervisor, and the VMware vCenter Server® unified management solution all on the broadest and deepest set of HCI deployment options.
See VMware vSAN documentation for more information.
VMware NSX-T Data Center 2.2
NSX-T Data Center is focused on emerging application frameworks and architectures that have heterogeneous endpoints and technology stacks. In addition to vSphere hypervisors, these environments include other hypervisors such as KVM, containers, and bare metal.
NSX-T Data Center is designed for management, operations, and consumption by development organizations. NSX-T Data Center allows IT and development teams to choose the technologies best suited for their applications.
See the NSX-T Data Center documentation for more information.
VMware Enterprise PKS 1.2
VMware Enterprise PKS is a purpose-built container solution to operationalize Kubernetes for multicloud enterprises and service providers. It significantly simplifies the deployment and management of Kubernetes clusters with day 1 and day 2 operations support. With hardened production-grade capabilities, VMware Enterprise PKS takes care of your container deployments from the application layer all the way to the infrastructure layer.
VMware Enterprise PKS has critical production capabilities built in, such as high availability, autoscaling, health checks, self-healing, and rolling upgrades for Kubernetes clusters. With constant compatibility to GKE, VMware Enterprise PKS provides the latest stable Kubernetes release so developers have the latest features and tools available to them. It also integrates with VMware NSX-T for advanced container networking, including micro-segmentation, ingress controller, load balancing, and security policy. Through an integrated private registry, VMware Enterprise PKS secures container images via vulnerability scanning, image signing, and auditing.
VMware Enterprise PKS exposes Kubernetes in its native form without adding any layers of abstraction or proprietary extensions, which lets developers use the native Kubernetes CLI that they are most familiar with. VMware Enterprise PKS can be easily deployed and operationalized via Pivotal Operations Manager, which allows a common operating model to deploy VMware Enterprise PKS across multiple IaaS abstractions like vSphere, Google Cloud Platform (GCP), and Amazon Web Services (AWS) EC2.
MongoDB 3.6
MongoDB is a document-oriented database. The data structure is composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.
The advantages of using documents are:
- Documents (for example, objects) correspond to native data types in many programming languages.
- Embedded documents and arrays reduce the need for expensive joins.
- Dynamic schema supports fluent polymorphism.
For more key features of MongoDB, refer to Introduction to MongoDB.
MySQL 5.7
MySQL is the most popular open source database system. It enables the cost-effective delivery of reliable, high-performance, and scalable web-based and embedded database applications. It is an integrated, transaction-safe, ACID-compliant database with full commit, rollback, crash recovery, and row-level locking capabilities. MySQL delivers ease of use, scalability, and high performance, as well as a full suite of database drivers and visual tools to help developers and DBAs build and manage their business-critical MySQL applications.
Configuration
This section introduces the resources and configurations:
• Solution architecture
• Network architecture
• Hardware resources
• Software resources
• VM configurations
• Test tool and workload
Solution Architecture
We used a 4-node vSphere and vSAN cluster, and deployed the necessary virtual machines on it. The cluster architecture is depicted in Figure 1.
Figure 1. High-Level Architecture of VMware Enterprise PKS on vSAN Reference Architecture
The virtual machines are divided into four groups:
- Infrastructure VMs:
  - vCenter: Manages this vSphere and vSAN cluster.
  - DNS Server: Acts as a DNS forwarder and contains the local DNS entries for NSX Manager, Pivotal Operations Manager, and Harbor.
  - VMware Enterprise PKS Client: Runs the ‘pks’ and ‘kubectl’ binaries for managing the VMware Enterprise PKS environment.
- NSX-T VMs: The NSX-T virtual machines comprise NSX-T Manager, NSX-T Edges, and NSX-T Controllers. These virtual machines are deployed based on the NSX-T installation guide: NSX-T documentation.
- VMware Enterprise PKS VMs: The VMware Enterprise PKS management virtual machines are deployed based on the VMware Enterprise PKS installation guide: VMware Enterprise PKS documentation.
- Deployed Kubernetes Clusters: These Kubernetes clusters are deployed by VMware Enterprise PKS. The virtual machines are grouped by Kubernetes cluster. Each Kubernetes cluster contains at least one virtual machine acting as the primary node and multiple virtual machines acting as the worker nodes. We can deploy more than one Kubernetes cluster, depending on the vSAN cluster’s physical resources.
Network Architecture
Physical network connection
On each physical server, four Network Interface Card (NIC) ports are used. Figure 2 shows the NIC and physical switch connection architecture. NIC port 1 and NIC port 2 are configured as the uplinks for a vSphere Distributed Switch (vDS).
- Four port groups (Management, VMware vSphere vMotion® VMkernel network, Edge VTEP, and Edge Uplink) use NIC 1 as the active uplink and NIC 2 as the standby uplink.
- The vSAN VMkernel network uses NIC 2 as the active uplink and NIC 1 as the standby uplink.
- NIC 3 is used for the NSX-T host overlay transport zone.
- NIC 4 is the standby NIC for the NSX-T host overlay transport zone.
MTU settings
NSX-T requires MTU to be at least 1600 for the physical switches and NICs.
vSAN supports MTU 1500 and MTU 9000 and the performance tests show that larger MTU settings can help vSAN improve throughput.
Based on the above requirements, we set the MTU to 9000 across the whole environment to reduce the complexity of managing the physical network and to achieve higher performance.
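One common way to confirm that a given MTU is honored end to end is to send a non-fragmentable ICMP packet whose payload exactly fills the frame. The payload is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. The following is a minimal sketch of that arithmetic; the function name, the VMkernel interface vmk1, and the peer address are placeholders rather than values from our testbed.

```shell
# Largest ICMP payload that fits in a single frame for a given MTU:
# MTU minus the 20-byte IPv4 header minus the 8-byte ICMP header.
icmp_payload_size() {
  echo $(( $1 - 20 - 8 ))
}

icmp_payload_size 9000   # prints 8972
icmp_payload_size 1600   # prints 1572

# On an ESXi host the result could be used with vmkping, with the
# "do not fragment" bit set (-d). vmk1 and 192.168.10.12 are placeholders
# for a vSAN VMkernel interface and a peer host:
# vmkping -I vmk1 -d -s "$(icmp_payload_size 9000)" 192.168.10.12
```

If the ping with the full-size payload fails while a smaller one succeeds, some device on the path is likely configured with a smaller MTU.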
For high availability, NSX-T recommends deploying more than one NSX-T controller to form an NSX-T controller cluster. It also recommends deploying more than one NSX-T edge node to form an NSX-T edge cluster. Meanwhile, DRS is enabled in the cluster so that VMware Enterprise PKS can automatically deploy virtual machines to a vSphere resource pool. We should therefore use DRS anti-affinity rules to place the NSX-T controller virtual machines and the NSX-T edge virtual machines on different physical hosts, so that they remain available if a host fails.
The NSX-T virtual machines have a low requirement for storage I/O. We cloned the vSAN default storage policy, modified the ‘Fault Tolerance Method’ to ‘Erasure Coding (RAID 5/RAID 6)’, and applied this policy to the NSX-T virtual machines to save storage space.
Figure 2. NIC Ports and Switch Physical Connection Architecture
Hardware Resources
Table 1 shows the hardware resources used in this solution.
We used a vSAN cluster with 4 physical servers. Each ESXi Server in the vSAN cluster has the following configuration.
Table 1. Hardware Resources per ESXi Server
PROPERTY | SPECIFICATION
Server | Quanta Cloud Technology D52B-1U
CPU and cores | 2 sockets, 20 cores each at 2.0GHz, with hyper-threading enabled
RAM | 256GB
Network adapter | 4 x 10Gb NIC ports
Disks | Cache-layer SSD: 1 x 375GB Intel Optane SSD DC P4800X NVMe (controller included); Capacity-layer SSD: 3 x 2000GB Intel SSD DC P4500 NVMe (controller included)
In our various vSAN reference architectures, we recommend a vSAN cluster with at least 4 physical hosts. If there are only 3 hosts in a vSAN cluster, no rebuild can take place when a host fails, so some data may be at risk during the repair window. With a vSAN cluster of 4 hosts, data degraded by a host failure can be rebuilt immediately as long as there is enough free space.
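The host-count reasoning above can be checked with a little arithmetic. With RAID-1 mirroring, vSAN needs 2 x FTT + 1 hosts to tolerate FTT host failures, and one additional host gives it a place to rebuild the affected components. The sketch below computes both numbers; the function names are illustrative and not part of any VMware tool.

```shell
# Minimum hosts to tolerate N host failures with RAID-1 mirroring: 2N + 1.
min_hosts_raid1() {
  echo $(( 2 * $1 + 1 ))
}

# Hosts needed so that data can also be rebuilt after one host fails:
# the minimum plus one spare host.
hosts_for_rebuild() {
  echo $(( $(min_hosts_raid1 "$1") + 1 ))
}

min_hosts_raid1 1     # prints 3
hosts_for_rebuild 1   # prints 4
```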
Tailored for a hyperconverged solution, QuantaGrid D52B-1U features ultimate compute and storage density in a 1U platform. Quanta Cloud Technology’s (QCT) well-designed second-generation Purley server platform, combined with market-leading virtualization software developed by VMware, delivers a reliable choice for customers. With the careful validation of QuantaGrid D52B-1U as a vSAN ReadyNode by QCT and VMware, customers can rest assured of the solution’s reliability and focus on strategic and productive tasks.
QCT’s powerful server provides ultimate compute and storage density, flexible and scalable I/O options, and solid reliability, making it excellent for diversified cloud-native application workloads. The compute capability of the Intel Xeon Gold 6138 CPUs empowers VMware vSAN by supporting a wide range of critical workloads. Enterprises that value performance can benefit from a cache layer that uses the Intel SSD Data Center Family with NVMe. By adopting the Optane SSD DC P4800X as the cache layer in vSAN, we can deliver extremely high performance and reduce the transaction cost while running write-intensive workloads. Therefore, QuantaGrid D52B-1U is an optimum option for next-generation workloads.
Visit the QCT website for more details of Quanta Cloud Technology and Quanta hardware platform.
Software Resources
Table 2 shows the software resources used in this solution.
Table 2. Software Resources
Software | Version | Purpose
VMware vCenter Server and ESXi | 6.5 Update 2 (vSAN 6.6.1 Update 2 is included) | ESXi Cluster to host virtual machines and provide vSAN Cluster. VMware vCenter Server provides a centralized platform for managing VMware vSphere environments.
VMware vSAN | 6.6.1 Update 2 | Solution for HCI.
Ubuntu | 16.04 | Ubuntu 16.04 is used as the guest operating system of the DNS server, testing clients, etc.
NSX-T Data Center | 2.2 | NSX-T 2.2
Pivotal Operations Manager | 2.2 | Pivotal Operations Manager is the central place for deploying and managing Pivotal products such as VMware Enterprise PKS.
VMware Enterprise PKS | 1.2 | VMware Enterprise PKS 1.2
Harbor | 1.4.1 | Harbor is used for deploying a private enterprise-class container image repository.
VM Configurations
We used the virtual machine settings as the base configuration as shown in Table 3 and Table 4. The virtual machines are grouped by different categories: NSX-T virtual machines, VMware Enterprise PKS management virtual machines and VMware Enterprise PKS deployed virtual machines.
For VM sizing, the rule is that the aggregated vCPU count and memory of all virtual machines should not exceed the physical resources, to avoid contention. When calculating physical CPU resources, we should count the physical cores before hyper-threading is taken into consideration.
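As a rough illustration of this rule, the sketch below computes the physical capacity of the 4-node cluster from the values in Table 1 and checks a planned total against it. The function names and the example plan of 120 vCPUs and 800 GB are hypothetical, and the check ignores the CPU and memory that ESXi and vSAN themselves consume.

```shell
# Physical capacity of the cluster, counted before hyper-threading
# (per-host values taken from Table 1).
HOSTS=4
SOCKETS_PER_HOST=2
CORES_PER_SOCKET=20
RAM_GB_PER_HOST=256

physical_cores() {
  echo $(( HOSTS * SOCKETS_PER_HOST * CORES_PER_SOCKET ))
}

physical_ram_gb() {
  echo $(( HOSTS * RAM_GB_PER_HOST ))
}

# fits_cluster <total vCPUs> <total memory in GB>
# Succeeds only if both totals stay within the physical resources.
fits_cluster() {
  [ "$1" -le "$(physical_cores)" ] && [ "$2" -le "$(physical_ram_gb)" ]
}

physical_cores     # prints 160
physical_ram_gb    # prints 1024

# Hypothetical plan: 120 vCPUs and 800 GB of memory in total.
if fits_cluster 120 800; then
  echo "plan fits"
else
  echo "plan oversubscribes the cluster"
fi
```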
Table 3. NSX-T Virtual Machines Configuration
PROPERTY | vCPU | MEMORY (GB) | STORAGE (GB)
NSX-T Manager | 8 | 32 | 140
NSX-T Controller | 4 | 16 | 120
NSX-T Edge | 8 | 16 | 120
The configuration of NSX-T Controller is for one instance of controller. We should properly size for the controller cluster since a controller cluster is recommended to have at least 3 controllers. The configuration of NSX-T Edge in the table is also for one edge instance.
Note: For NSX-T edge, we used the ‘large’ profile when deploying edge virtual machines. ‘Large’ edge instances are required for VMware Enterprise PKS deployment with NSX-T integration.
Table 4 shows the configuration of the VMware Enterprise PKS management plane virtual machines.
Table 4. VMware Enterprise PKS Management Plane Virtual Machines Configuration
PROPERTY | vCPU | MEMORY (GB) | STORAGE (GB)
Pivotal Operations Manager | 1 | 8 | 160
BOSH Director | 4 | 16 | 3+100+100
VMware Enterprise PKS controller | 2 | 8 | 3+16+100
Harbor | 2 | 8 | 3+64+1000
We used 1,000GB for Harbor’s data disk. This data disk size can be set on the Harbor deployment configuration page in Pivotal Operations Manager. The data disk size is determined by the number and size of the images to be stored. In vSAN, a 1,000GB VMDK is split into at least 4 components because the maximum size of a component in vSAN is 255GB. If Harbor’s data disk is smaller than 255GB, consider increasing the stripe width of the VMDK by changing its vSAN storage policy, which can improve Harbor’s read and write performance.
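The component count follows from a ceiling division of the VMDK size by the 255GB component limit. The sketch below shows that arithmetic under a simplifying assumption: it counts only the capacity components of a single copy of the data, whereas mirroring, stripe width, and witness components add to the real number. The function name is illustrative.

```shell
# Minimum number of vSAN components for one copy of a VMDK,
# given the 255 GB maximum component size (ceiling division).
min_components() {
  echo $(( ($1 + 255 - 1) / 255 ))
}

min_components 1000   # prints 4
min_components 255    # prints 1
min_components 64     # prints 1
```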
Test Tool and Workload
MongoDB and YCSB
For MongoDB, we used the YCSB tool to evaluate the performance. YCSB is a popular open-source specification and program suite, written in Java and developed at Yahoo, for comparing the relative performance of various NoSQL databases. Its workloads are used in various comparative studies of NoSQL databases.
We used YCSB workload A as summarized:
- Workload A (Update heavy workload): 50/50% mix of reads/writes
Some key configuration parameters are in Table 5.
Table 5. YCSB Parameter Settings Used in this Solution Testing
PROPERTY | SPECIFICATION
Record count | 100,000,000
Operation count | 50,000,000
Threads on each client | 16
Write acknowledgement | Majority
MySQL and SysBench
In this solution, we used SysBench to measure the performance of the MySQL cluster.
SysBench is a modular, cross-platform, and multithreaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load. The OLTP test mode is used in solution validation to benchmark real MySQL database performance.
In this solution validation, we used the SysBench OLTP library to populate the MySQL test database and to generate the workload for performance and protection testing.
We used a containerized SysBench client published on Docker Hub. We needed to modify the parameters and hostnames in the yaml files to match our own testbed. Table 6 lists some key configuration parameters.
Table 6. SysBench Parameter Settings
PROPERTY | SPECIFICATION
Oltp_table_size | 10,000,000
Oltp_tables_count | 8
Threads on each client | 8
Time (seconds) | 3,600
Solution Validation
In this section, we present the test methodologies and results to validate this solution.
Overview
We conducted an extensive validation to demonstrate vSAN as a persistent storage layer for VMware Enterprise PKS.
The solution goal was to prove the functionality, flexibility, and performance of vSAN when used as the persistent storage layer for VMware Enterprise PKS. We tested how quickly multiple Kubernetes clusters can be created in parallel. We also tested the performance of a traditional database, MySQL, and a next-generation NoSQL database, MongoDB.
This solution included the following tests:
- Performance testing of Kubernetes cluster operations: This set of tests validated the performance of Kubernetes cluster operations by using VMware Enterprise PKS, such as cluster creation and scaling out.
- Performance testing of MongoDB: This set of tests validated the basic performance of running MongoDB as pods in a Kubernetes cluster deployed by VMware Enterprise PKS. We also tested the scaling out performance of MongoDB by increasing the number of MongoDB clusters from 1 to 2 to 4.
- Performance testing of MySQL: This set of tests validated the basic performance of running MySQL as pods in a Kubernetes cluster deployed by VMware Enterprise PKS. We also tested the scaling out performance of MySQL by increasing the number of MySQL clusters from 1 to 2 to 4.
- Resiliency and availability testing: From different layers, we tested different failure types. The failure types included host failure, virtual machine failure, vSAN disk failure, and Kubernetes pod failure.
Result Collection and Aggregation
- The test results are aggregated because each client instance produces its own output:
  - For MongoDB, the total throughput (ops/sec) is the sum of the ops/sec of all clients, and the latency is the average across all client instances.
  - For MySQL, TPS (transactions per second) is the sum of the TPS of all clients, and the latency is the average across all client instances.
- If there are any abnormal results, we use vSAN Performance Service to monitor each level of performance in a top-down approach of the vSAN stack.
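The aggregation rule above can be sketched as a small awk filter: sum the per-client throughput and average the per-client latency. The one-line-per-client input format and the numbers are made up for illustration. They are not measured results, and the actual YCSB and SysBench output would need to be parsed into this shape first.

```shell
# Each input line: <client> <ops_per_sec> <avg_latency_ms>
# (a hypothetical, pre-parsed format, not raw YCSB or SysBench output).
aggregate_results() {
  awk '{ ops += $2; lat += $3; n++ }
       END { if (n > 0) printf "total_ops=%.2f avg_latency_ms=%.2f\n", ops, lat / n }'
}

# Made-up sample values for three clients:
printf '%s\n' \
  "client1 1000 2.0" \
  "client2 1500 3.0" \
  "client3 500 4.0" | aggregate_results
# prints: total_ops=3000.00 avg_latency_ms=3.00
```

Note that averaging the per-client averages weights every client equally, which matches the description above but differs from a per-operation average when clients complete different numbers of operations.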
Kubernetes Workloads Deployment and vSAN Interaction
In this section, we take deploying MongoDB workloads as an example to illustrate ‘storage class creation’, ‘persistent volume claim creation’ and other automations when running VMware Enterprise PKS with vSAN.
We used storage policies and persistent volume claims to demonstrate the easy management, automation, and integration of vSAN and VMware Enterprise PKS. For other operations such as persistent volume creation and manual binding, refer to the VMware Enterprise PKS and Kubernetes user guides.
Deploy Kubernetes Clusters using VMware Enterprise PKS
After we successfully installed VMware Enterprise PKS and the VMware Enterprise PKS command line tools, we could deploy Kubernetes clusters for subsequent use. We managed (created, deleted, and resized) the Kubernetes clusters by using the ‘pks’ command line tool.
1. First, we used the following command to create a Kubernetes cluster named “k8s-cluster-1”:
$ pks create-cluster k8s-cluster-1 -e k8s-cluster-1.vsphere.local -p 2xlarge -n 3
Figure 3. Creating a Kubernetes Cluster by Using the ‘pks’ Command Line Tool
2. As depicted in the screenshot, we could use the ‘pks cluster k8s-cluster-1’ command to monitor the creation status of the cluster.
The ‘Last Action State’ was “in progress” when the cluster was not ready as shown in Figure 4. After a while, the cluster was successfully created and the ‘Last Action State’ was shown as ‘succeeded’.
Figure 4. Monitoring the Cluster Creation Status
3. From Figure 4, we could see that the IP address ‘192.168.130.9’ was allocated as the Kubernetes primary IP.
We used ‘k8s-cluster-1.vsphere.local’ as the custom domain name for this cluster. After the Kubernetes primary IP was allocated, we could therefore update our own corporate DNS server to map ‘k8s-cluster-1.vsphere.local’ to ‘192.168.130.9’. In the NSX-T management UI, we could view the corresponding load balancer, routers, and switches created by NSX-T. Their names ended with the UUID of the cluster, as shown in Figure 5, Figure 6, and Figure 7.
Figure 6. The Routers Created by NSX-T
Figure 7. The Switches Created by NSX-T
NSX-T created the corresponding components automatically. After creation, we could use the Kubernetes primary IP to reach the Kubernetes cluster, and all containers created later are attached to the switches and routers automatically to gain network access.
Scale out Kubernetes Clusters using VMware Enterprise PKS
After we initially deploy a Kubernetes cluster, we can scale it out as the workloads grow. We used the following command to increase the number of worker nodes to 4. The process is shown in Figure 8.
$ pks resize k8s-cluster-1 -n 4
Figure 8. Scaling out the Kubernetes Cluster and Monitoring the Resizing Status
After the operation was successfully completed, the ‘Last Action’ showed ‘UPDATE’ and the ‘Last Action State’ showed ‘succeeded’. VMware Enterprise PKS just added a new worker VM to the cluster and all the NSX-T components remained the same.
Create Storage Class
Kubernetes supports various vSphere storage types such as VMFS, NFS, and vSAN. For example, we can define a storage class for creating VMDKs with the ‘zeroedthick’ disk format in an NFS datastore:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: zeroedthick
  datastore: NFSDatastore
If we intend to use a different datastore, we need to modify the ‘datastore’ section in the storage class definition and add the corresponding parameters such as storage policy in vSAN.
If we use vSAN as the datastore, we gain vSAN advantages such as deduplication and compression, checksum verification, user-defined stripe width, and more flexible data protection methods (RAID 1/RAID 5/RAID 6). All users need to do is modify the storage class definition; the rest of the management is automated.
Tag-based placement allows an administrator to assign tags to different datastores and then create a storage policy that restricts access to datastores based on those tags. VMware vSAN supports the full capabilities of SPBM, whereas non-vSAN vSphere-compatible storage backends support only the tag-based placement capability of SPBM. Using tag-based placement, a developer can reference an SPBM policy from the K8s manifest using the storagePolicyName parameter. When the K8s primary schedules the pod for provisioning and requests a volume, the storage policy ensures that the volume can only be provisioned from datastores that match the tags defined in the specified storage policy. If using vSAN, the following parameters can be dynamically tuned for each pod/deployment via the K8s manifest: cacheReservation, diskStripes, forceProvisioning, hostFailuresToTolerate, iopsLimit, objectSpaceReservation.
So, vSAN is a better solution for VMware Enterprise PKS compared to other vSphere storage types.
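As an illustration of tuning these vSAN capabilities directly from the manifest, the sketch below writes a storage class that sets hostFailuresToTolerate and diskStripes inline instead of referencing a named policy. The class name “striped-vsan-sc” and the chosen values are assumptions for illustration and were not part of our tested configuration.

```shell
# Write a StorageClass manifest that sets vSAN capabilities inline.
# The class name "striped-vsan-sc" and the values are illustrative assumptions.
cat > striped-vsan-sc.yaml <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: striped-vsan-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
  hostFailuresToTolerate: "1"
  diskStripes: "2"
  datastore: vsanDatastore
EOF

# Then, against a live cluster:
# kubectl create -f striped-vsan-sc.yaml
```

Inline capabilities suit one-off tuning, while a named storage policy (via storagePolicyName) keeps the definition in vCenter where it can be reused and audited.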
This section describes in detail the procedure for creating a Kubernetes storage class that uses the vSAN datastore.
We created the necessary Kubernetes storage class before we deployed the actual stateful sets. This storage class is used when creating the stateful sets in later steps.
1. First, we defined the storage class yaml file called default-vsan-sc.yaml:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: default-vsan-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
  storagePolicyName: "vSAN Default Storage Policy"
  datastore: vsanDatastore
The storagePolicyName parameter uses the “vSAN Default Storage Policy” defined in vCenter. We can check the predefined policies in vCenter.
Figure 9. Predefined vSAN Storage Policies in vCenter
2. We also created a vSAN storage policy called “Raid5-Policy” that uses erasure coding as the fault tolerance method. We could then create another Kubernetes storage class that uses this “Raid5-Policy”, so that the VMDKs created in Kubernetes use this vSAN policy. We created another storage class yaml file called raid5-vsan-sc.yaml:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: raid5-vsan-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
  storagePolicyName: "Raid5-Policy"
  datastore: vsanDatastore
3. We used “kubectl” to create the storage classes and checked their status.
Figure 10. Create Storage Classes and Check the Status
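A sketch of this step, assuming the two manifest files above are in the current directory, is shown below. The wrapper function and the KUBECTL variable are illustrative conveniences that allow a dry run without a cluster; they are not part of kubectl.

```shell
# KUBECTL can be overridden (for example with "echo") for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Create both storage classes, then list the storage classes in the cluster.
create_storage_classes() {
  for manifest in default-vsan-sc.yaml raid5-vsan-sc.yaml; do
    "$KUBECTL" create -f "$manifest" || return 1
  done
  "$KUBECTL" get storageclass
}

# Dry run that only prints the kubectl arguments, in a subshell so the
# override does not leak into the calling shell:
( KUBECTL=echo; create_storage_classes )

# Against a live cluster:
# create_storage_classes
```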
Persistent Volume Claim
After the corresponding Kubernetes storage classes were created, we could use persistent volume claims to create persistent VMDKs in vSAN to be consumed by containers.
For each MongoDB component, we used a Kubernetes stateful set. For example, we used the following yaml file to define MongoDB’s ConfigDB. This file is called mongodb-configdb-service.yml.
apiVersion: v1
kind: Service
metadata:
  name: mongodb-configdb-service
  labels:
    name: mongo-configdb
spec:
  ports:
  - port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    app: mongo-configdb
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongod-configdb
spec:
  serviceName: mongodb-configdb-service
  selector:
    matchLabels:
      app: mongo-configdb
  replicas: 1
  template:
    metadata:
      labels:
        app: mongo-configdb
        tier: configdb
        replicaset: ConfigDBRepSet
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: replicaset
                  operator: In
                  values:
                  - ConfigDBRepSet
              topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 10
      containers:
      - name: mongod-configdb-container
        image: harbor1.vsphere.local/library/mongo:3.6.9
        command:
        - "numactl"
        - "--interleave=all"
        - "mongod"
        - "--port"
        - "27017"
        - "--wiredTigerCacheSizeGB"
        - "0.25"
        - "--bind_ip"
        - "0.0.0.0"
        - "--configsvr"
        - "--replSet"
        - "ConfigDBRepSet"
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongo-configdb-persistent-storage-claim
          mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: mongo-configdb-persistent-storage-claim
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: default-vsan-sc
      resources:
        requests:
          storage: 4Gi
We used the following Kubernetes command to create this stateful set:
$ kubectl create -f mongodb-configdb-service.yml
During the execution of this command, a VMDK of 4 GB was automatically created in the vsanDatastore.
Figure 11. VMDK Creation and Attachment with Persistent Volume Claim
This VMDK used the ‘vSAN Default Storage Policy’ because we used the “default-vsan-sc” storage class in the persistent volume claim definition section. In the previous step, we created this “default-vsan-sc” storage class with the ‘vSAN Default Storage Policy’.
VMware Enterprise PKS automatically selected a worker in this Kubernetes cluster and attached this VMDK to the worker VM. In this way, this VMDK created by the persistent volume claim can be consumed by the stateful set.
Scale out the Stateful Set
In the previous step of creating the “ConfigDB” stateful set, we used “replicas: 1” in the yaml file as an initial configuration point. In a production environment, it is recommended to use more than one replica to provide redundancy. We can easily scale out the stateful set’s replicas by using this command:
$ kubectl scale --replicas=2 statefulset/mongod-configdb
Figure 12. Status of Persistent Volume Claim after Scaling out the Stateful Set
After scaling the stateful set by running the above command, another persistent volume claim and VMDK of 4 GB were created. It used the same configuration as defined in the yaml file.
This persistent volume claim was automatically created and automatically attached to another worker VM in the Kubernetes cluster. It used the same “vSAN Default Storage Policy” as defined in the storage class. The only command that users need to run is the one-line ‘kubectl’ command shown above.
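The claims created during scale-out follow the standard Kubernetes StatefulSet naming rule: the volume claim template name, the stateful set name, and the pod ordinal joined by hyphens. The following is a minimal sketch that predicts the claim names and the total requested capacity for the ‘ConfigDB’ example. The function names are hypothetical helpers for illustration; the template name, stateful set name, and 4Gi size are taken from the YAML and the scale command above.

```python
from typing import List


def expected_pvc_names(template_name: str, statefulset_name: str, replicas: int) -> List[str]:
    """Return the claim names Kubernetes creates for ordinals 0..replicas-1.

    StatefulSet claims are named <template-name>-<statefulset-name>-<ordinal>.
    """
    return [f"{template_name}-{statefulset_name}-{i}" for i in range(replicas)]


def total_requested_gib(per_replica_gib: int, replicas: int) -> int:
    """Total capacity requested across all replicas, in GiB."""
    return per_replica_gib * replicas


if __name__ == "__main__":
    # Names and sizes below come from the ConfigDB example in this paper.
    for name in expected_pvc_names(
        "mongo-configdb-persistent-storage-claim", "mongod-configdb", replicas=2
    ):
        print(name)
    print(total_requested_gib(4, 2), "GiB requested in total")
```

Each predicted name should correspond to one entry in the ‘kubectl get pvc’ output, and each claim is backed by its own VMDK on the vSAN datastore.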
Use Harbor for Local Image Registry
Harbor is integrated into VMware Enterprise PKS and we can use it for the local image registry. Harbor also resides on the vSAN datastore. In the previous example of the MongoDB’s ConfigDB yaml, we used the following line to indicate that we were using a local harbor registry instead of the Docker.com’s public registry:
image: harbor1.vsphere.local/library/mongo:3.6.9
To achieve this, we need to push the images to Harbor before we deploy any Kubernetes workloads.
On a Linux client that can access the Internet, run the following commands (using MongoDB v3.6.9 as an example):
$ docker pull mongo:3.6.9
$ docker tag mongo:3.6.9 harbor1.vsphere.local/library/mongo:3.6.9
$ docker push harbor1.vsphere.local/library/mongo:3.6.9
We pulled the MongoDB image from the docker.com public registry and stored it on our local Linux client. Then we tagged it with our custom Harbor domain and library. Finally, we pushed it to our local Harbor registry.
After pushing the image to the local Harbor registry, we can reference the custom image URL in the stateful set’s YAML configuration file. The Kubernetes clusters can then pull the images from the local Harbor registry even if they do not have Internet access.
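When several images need to be mirrored, the pull, tag, and push sequence can be generated rather than typed by hand. The following is a minimal sketch, not part of the original test procedure. The helper names are hypothetical, and the defaults for the registry host and project are the ‘harbor1.vsphere.local’ and ‘library’ values used in this paper.

```python
from typing import List


def harbor_reference(image: str,
                     registry: str = "harbor1.vsphere.local",
                     project: str = "library") -> str:
    """Build the image reference as it will appear in the local Harbor registry."""
    return f"{registry}/{project}/{image}"


def mirror_commands(image: str,
                    registry: str = "harbor1.vsphere.local",
                    project: str = "library") -> List[str]:
    """Return the docker commands that copy a public image into Harbor."""
    target = harbor_reference(image, registry, project)
    return [
        f"docker pull {image}",
        f"docker tag {image} {target}",
        f"docker push {target}",
    ]


if __name__ == "__main__":
    # Print the commands for review; this sketch does not execute them.
    for command in mirror_commands("mongo:3.6.9"):
        print(command)
```

The value returned by harbor_reference is the same string used in the ‘image:’ field of the stateful set definition.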
Delete Stateful Set
We used the following command to delete the ‘ConfigDB’ stateful set:
$ kubectl delete -f mongodb-configdb-service.yml
Figure 13. Deletion of Stateful Set and Persistent Volume Claim
From Figure 13, we can see that after the stateful set was deleted, the corresponding persistent volume claims were retained, which is why they are called persistent volumes. VMware Enterprise PKS detached them from the worker VMs, and they remained as standalone VMDKs in the vSAN datastore.
If we create the stateful set again later without changing the configuration, these retained persistent volume claims are reused, so the data persists. This process is fully automatic. Users only need to run the ‘kubectl’ command.
If we want to delete the data and start a fresh installation of the MongoDB stateful set, delete the persistent volume claims as shown in the screenshot with the following command:
$ kubectl delete pvc --all
Delete Kubernetes Clusters Deployed by VMware Enterprise PKS
If the Kubernetes clusters used in previous sections were no longer needed, we could delete the cluster by using the ‘pks’ command line tool.
We used the following command to delete a cluster:
$ pks delete-cluster k8s-cluster-1
Figure 14. Deletion of a Kubernetes Cluster Deployed by VMware Enterprise PKS
During the deletion as shown in Figure 14, the ‘Last Action State’ was shown as ‘in progress’. After the deletion was successfully completed, we used ‘pks cluster k8s-cluster-1’ to query the status of the cluster and the cluster was not found.
The corresponding NSX-T load balancer, routers, and switches were automatically deleted by VMware Enterprise PKS.
Performance Testing
VMware Enterprise PKS Plan Definition
In VMware Enterprise PKS, a plan is used to define the VM’s CPU, memory and disk size of the primary nodes and worker nodes. We could define up to 3 different plans in VMware Enterprise PKS for running different workloads.
In this paper, we defined 3 plans as follows. The primary nodes and worker nodes are the same size within each plan.
- ‘small’ plan: (cpu: 1, ram: 2GB, disk: 8GB)
- ‘large’ plan: (cpu: 4, ram: 16GB, disk: 32GB)
- ‘2xlarge’ plan: (cpu: 8, ram: 32GB, disk: 64GB)
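Because every node in a cluster uses the same plan, the total resources a cluster consumes follow directly from the plan and the node count. The following is a minimal sketch of that arithmetic. The Plan class and cluster_footprint function are hypothetical names for illustration, and the calculation counts only the per-node values listed above, not any additional disks or overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    cpu: int       # virtual CPUs per node
    ram_gb: int    # memory per node, in GB
    disk_gb: int   # disk per node, in GB


# Values copied from the three plans defined in this paper.
PLANS = {
    "small": Plan(cpu=1, ram_gb=2, disk_gb=8),
    "large": Plan(cpu=4, ram_gb=16, disk_gb=32),
    "2xlarge": Plan(cpu=8, ram_gb=32, disk_gb=64),
}


def cluster_footprint(plan_name: str, primaries: int, workers: int) -> Plan:
    """Sum the plan's per-node resources over all primary and worker nodes."""
    plan = PLANS[plan_name]
    nodes = primaries + workers
    return Plan(plan.cpu * nodes, plan.ram_gb * nodes, plan.disk_gb * nodes)


if __name__ == "__main__":
    print(cluster_footprint("2xlarge", primaries=1, workers=6))
```

For example, a ‘2xlarge’ cluster with 1 primary node and 6 worker nodes adds up to 56 virtual CPUs, 224 GB of memory, and 448 GB of disk.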
Kubernetes Cluster Operations
One of the advantages that VMware Enterprise PKS provides is the ability to automatically create and resize the Kubernetes clusters. Creating, deleting or resizing the Kubernetes clusters requires lots of virtual machine cloning operations. Virtual machine cloning operations are mostly intensive ‘write’ operations for the storage layer, especially when there are multiple clusters being created in parallel.
In this section, we measured the time taken to create and resize Kubernetes clusters under different conditions.
Note: The ‘Worker VM Max in Flight’ parameter was set to 1 in the VMware Enterprise PKS configuration tab. So the worker VMs were created sequentially and at most one worker VM was created at any time.
Figure 15. Cluster Creation Time with Different Plans and Different Number of Worker Nodes
Figure 15 shows the Kubernetes cluster creation time with different plans and different numbers of worker nodes. With just 1 worker node, creation took between 600 and 650 seconds. Clusters with 3 worker nodes took longer because more virtual machines had to be created and more initialization work had to be done inside them.
Across cluster plans, the creation time varied only slightly. Larger plans took less time for the following reasons:
- The cloning time of the virtual machines was almost the same across plans, because VMware Enterprise PKS only needs to edit the virtual CPU and memory settings after the virtual machines are cloned.
- For virtual machines with larger plans, there were more virtual CPU and memory resources for the Kubernetes initialization work, which can accelerate the creation process.
Figure 16. Cluster Creation Time with Different Number of Worker Nodes for ‘2xlarge’ Plan
We further tested the cluster creation time with different numbers of worker nodes for the ‘2xlarge’ plan. Figure 16 shows the result. As the number of worker nodes grew, the cluster creation time grew nearly linearly, from 610 seconds to 900 seconds, 1,180 seconds, and 1,390 seconds respectively.
Figure 17. Cluster Creation Time for Different Number of Clusters Created in Parallel
Figure 17 shows the cluster creation time for different numbers of clusters created in parallel. The creation time also grew nearly linearly, from 900 seconds to 1,450 seconds to 2,050 seconds when 1, 2, and 4 clusters were created in parallel. Creating more clusters in parallel means more cloning operations happen at the same time, which takes longer. Meanwhile, initializing more Kubernetes clusters at once led to more CPU and memory contention. Thus, the overall time grew.
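One way to read the parallel creation numbers is to divide the total time by the number of clusters created and to compare each total with the single-cluster baseline. The following is a minimal sketch of that arithmetic using only the three values reported for Figure 17. The function names are hypothetical.

```python
from typing import Dict

# Total creation time in seconds, keyed by the number of clusters created in parallel.
PARALLEL_CREATION_SECONDS = {1: 900, 2: 1450, 4: 2050}


def seconds_per_cluster(times: Dict[int, int]) -> Dict[int, float]:
    """Total elapsed time divided by the number of clusters created."""
    return {n: total / n for n, total in times.items()}


def slowdown_vs_single(times: Dict[int, int]) -> Dict[int, float]:
    """Total elapsed time relative to creating a single cluster."""
    baseline = times[1]
    return {n: total / baseline for n, total in times.items()}


if __name__ == "__main__":
    per_cluster = seconds_per_cluster(PARALLEL_CREATION_SECONDS)
    slowdown = slowdown_vs_single(PARALLEL_CREATION_SECONDS)
    for n in sorted(PARALLEL_CREATION_SECONDS):
        print(f"{n} cluster(s): {per_cluster[n]:.1f} s per cluster, "
              f"{slowdown[n]:.2f}x the single-cluster time")
```

With these values, creating 4 clusters in parallel takes about 2.28 times as long as creating 1 cluster, which works out to 512.5 seconds per cluster compared with 900 seconds for a single cluster.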
Figure 18. Cluster Scales out with Worker Nodes Increasing from 1 to 3
Then we tested the time to scale out a cluster after its initial creation. For comparison, we created 3 Kubernetes clusters using the 3 different plans. Initially, each had just 1 worker node. Then we increased the number of worker nodes from 1 to 3. Figure 18 shows the time taken by the scale-out procedure. The ‘small’ plan took the longest, around 750 seconds, because it had the least CPU and memory resources. The ‘2xlarge’ plan took the shortest, around 400 seconds.
Figure 19. Cluster Scales out to Different Number of Worker Nodes
We also tested the time taken to scale out the cluster to different numbers of worker nodes. Figure 19 shows the result for the ‘2xlarge’ plan. Initially we created a cluster with only 1 worker node. Then we increased the number of worker nodes to 3, then to 5, and then to 7, adding 2 worker nodes each time and measuring the time taken. Increasing the number of worker nodes from 1 to 3 took only 400 seconds, while increasing it from 3 to 5 and from 5 to 7 each took around 700 seconds.
MongoDB Performance in VMware Enterprise PKS
Testing Procedure
We referred to k8smongodb for deploying sharded MongoDB clusters.
We used the ‘pks’ command line tool to deploy Kubernetes clusters. Each Kubernetes cluster used for MongoDB was deployed with the ‘2xlarge’ plan described in the VMware Enterprise PKS Plan Definition section. Each Kubernetes cluster contained 1 control plane node and 6 worker nodes.
Note: A single control plane node is for testing and demonstration purposes only. For production, we recommend using at least 3 control plane nodes, which VMware Enterprise PKS fully supports.
For each MongoDB cluster, we deployed 1 ‘configdb statefulset’ to act as the internal configuration database of MongoDB. There were also 4 ‘shards statefulsets’ as MongoDB’s actual data nodes and 1 ‘mongos statefulset’ as the routing service of MongoDB.
We used YCSB and the workload type of ‘workload A (50% write/50% read)’ as described in the Test Tool and Workload section.
Each test round consisted of a 30-minute warm-up followed by 1 hour of performance testing. The results are shown in Figure 20.
We started by deploying only 1 Kubernetes cluster and thus only 1 MongoDB cluster. Then we deployed another Kubernetes cluster and MongoDB cluster to test the scalability. Finally, there were four clusters running and being tested in parallel.
As the number of clusters grew from 1 to 2 to 4, the throughput (ops/sec) increased from 13,914 to 27,306 to 46,323. This is an increase of 96.2% from 1 cluster to 2 clusters and 69.6% from 2 clusters to 4 clusters.
Meanwhile, the average read latency grew from 1.097ms to 1.11ms to 1.303ms, an increase of 1.18% from 1 cluster to 2 clusters and 17.3% from 2 clusters to 4 clusters.
The average write latency grew from 1.195ms to 1.234ms to 1.469ms, an increase of 3.26% from 1 cluster to 2 clusters and 19.0% from 2 clusters to 4 clusters.
The throughput scaled out well as the number of clusters increased, while the average latency increased relatively slowly.
Figure 20. MongoDB Performance
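The increase rates quoted for MongoDB are step-to-step percentage changes between consecutive cluster counts. The following is a minimal sketch of that calculation, applied to the throughput and average write latency values reported above. The function names are hypothetical.

```python
from typing import List


def percent_increase(before: float, after: float) -> float:
    """Percentage change from one measurement to the next."""
    return (after - before) / before * 100.0


def step_increases(series: List[float]) -> List[float]:
    """Percentage change between each pair of consecutive measurements."""
    return [percent_increase(a, b) for a, b in zip(series, series[1:])]


if __name__ == "__main__":
    # Measurements for 1, 2, and 4 clusters, as reported in this paper.
    throughput_ops = [13914, 27306, 46323]
    write_latency_ms = [1.195, 1.234, 1.469]

    print([round(p, 1) for p in step_increases(throughput_ops)])
    print([round(p, 1) for p in step_increases(write_latency_ms)])
```

The same helper can be applied to any other series measured at 1, 2, and 4 clusters.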
MySQL Performance in VMware Enterprise PKS
Testing Procedure
We referred to this Kubernetes tutorial for deploying MySQL primary-secondary clusters.
We used the ‘pks’ command line tool to deploy Kubernetes clusters. Each Kubernetes cluster used for MySQL was deployed with the ‘2xlarge’ plan described in the VMware Enterprise PKS Plan Definition section. Each Kubernetes cluster contained 1 control plane node and 4 worker nodes.
For each MySQL cluster, we deployed 1 primary pod and 2 secondary pods, running in primary-secondary mode.
We used SysBench to test the MySQL cluster performance as described in the Test Tool and Workload section.
Each test round consisted of a 30-minute warm-up followed by 1 hour of performance testing. The results are shown in Figure 21.
Note: The transaction time was measured by the SysBench client for a complete database transaction. It includes more than the storage layer read and write latency, so the average transaction time is usually much higher than the storage backend read and write latency. It is a common performance metric in SQL database testing.
We started by deploying only 1 Kubernetes cluster and thus only 1 MySQL cluster. Then we deployed another Kubernetes cluster and MySQL cluster to test the scalability. Finally, there were four clusters running and being tested in parallel.
As the number of clusters grew from 1 to 2 to 4, the TPS increased from 464 to 959 to 1,410. This is an increase of 106.6% from 1 cluster to 2 clusters and 47.0% from 2 clusters to 4 clusters.
Meanwhile, the average transaction time grew from 17.26ms to 19.88ms to 32.75ms, an increase of 15.17% from 1 cluster to 2 clusters and 64.7% from 2 clusters to 4 clusters.
The TPS scaled out well as the number of clusters increased, while the average transaction time increased at a relatively slow rate.
Figure 21. MySQL Performance
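Another way to summarize the MySQL scale-out result is scaling efficiency: the measured aggregate TPS divided by the TPS that perfectly linear scaling from the single-cluster baseline would give. The following is a minimal sketch using the TPS values reported above. The metric and the function name are illustrative additions, not part of the original test procedure.

```python
def scaling_efficiency(baseline_tps: float, measured_tps: float, clusters: int) -> float:
    """Measured aggregate TPS divided by ideal linear TPS (baseline * clusters)."""
    return measured_tps / (baseline_tps * clusters)


if __name__ == "__main__":
    baseline = 464  # TPS with 1 cluster, as reported in this paper
    for clusters, tps in [(2, 959), (4, 1410)]:
        efficiency = scaling_efficiency(baseline, tps, clusters)
        print(f"{clusters} clusters: {efficiency:.2f} of linear scaling")
```

With these values, 2 clusters reach about 1.03 of linear scaling and 4 clusters reach about 0.76, which is consistent with the larger rise in average transaction time at 4 clusters.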
Deduplication and Compression
A Kubernetes cluster deployed by VMware Enterprise PKS has at least one primary virtual machine and one or more worker virtual machines. These virtual machines are all cloned from one template and share the same operating system, so their operating system disks can be highly deduplicated in a vSAN cluster. The deduplication ratio can be very high if many Kubernetes clusters are deployed.
On the other hand, the persistent volumes containing the data had a lower deduplication ratio. In our previous testing, we deployed four MySQL clusters and four MongoDB clusters. The data population methods are described in the Test Tool and Workload section. After the performance testing finished, the deduplication and compression ratio was 1.65x, which saved 3.07 TB of space.
Note: This value was based on our testbed. It will vary with different testbeds and workloads.
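The reported ratio and savings can be related to each other with simple arithmetic. The following is a minimal sketch that assumes the ratio is defined as the capacity used before deduplication and compression divided by the capacity used after. That definition is an assumption for illustration, and the derived before and after capacities are not figures reported in this paper.

```python
def capacity_before_tb(saved_tb: float, ratio: float) -> float:
    """Capacity used before dedup and compression.

    Assumes ratio = before / after, so saved = before * (1 - 1 / ratio).
    """
    return saved_tb / (1.0 - 1.0 / ratio)


def capacity_after_tb(saved_tb: float, ratio: float) -> float:
    """Capacity used after dedup and compression, under the same assumption."""
    return capacity_before_tb(saved_tb, ratio) / ratio


if __name__ == "__main__":
    saved, ratio = 3.07, 1.65  # values reported in this paper
    print(f"before: {capacity_before_tb(saved, ratio):.2f} TB")
    print(f"after:  {capacity_after_tb(saved, ratio):.2f} TB")
```

Under this assumption, a 1.65x ratio with 3.07 TB saved corresponds to roughly 7.79 TB before and 4.72 TB after deduplication and compression.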
Resiliency and Availability
Storage Level Protection by vSAN
We used an FTT=1 setting for the virtual machines in this vSAN cluster. With FTT=1, vSAN can tolerate one physical disk failure or one host failure from the storage perspective, and the storage remains accessible to virtual machines after the failure.
vSphere Fault Tolerance for Virtual Machines
For VMware Enterprise PKS version 1.0, the following virtual machines can only be deployed as a single instance, without a clustering service: Pivotal Operations Manager, VMware Enterprise PKS Controller, and Harbor. To ensure zero downtime from the compute perspective in addition to the storage perspective, we used vSphere Fault Tolerance to protect these virtual machines.
vSphere High Availability (HA) for Virtual Machines
Some other virtual machines, such as the NSX-T Edges, are already clustered. However, we also want these virtual machines to recover quickly in case of a host failure. We used vSphere High Availability (HA) to restart the virtual machines if a physical host fails.
Best Practices
This section provides the recommended best practices for this solution.
When configuring VMware Enterprise PKS in a vSAN cluster, consider the following best practices:
- For NSX-T manager, controllers and edges, use a vSAN storage policy with erasure coding (RAID 5/RAID 6) to save space.
- Enable HA in the vSAN cluster.
- Enable DRS in the vSAN cluster.
- Use an anti-affinity DRS rule to force the virtual machines of the NSX-T controller cluster to reside on separate physical hosts.
- Use an anti-affinity DRS rule to force the virtual machines of the NSX-T edge cluster to reside on separate physical hosts.
- If Harbor’s data disk is smaller than 255GB, increase its stripe width to improve Harbor’s performance.
- Enable vSAN cluster’s deduplication and compression feature to save space.
- Set the MTU to 9000 on all the physical switches.
- Choose different VMware Enterprise PKS plans for different workload demands.
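The space benefit of erasure coding in the first recommendation can be estimated from the standard vSAN capacity multipliers: RAID 1 mirroring stores 2 copies at FTT=1 and 3 copies at FTT=2, RAID 5 uses a 3+1 layout, and RAID 6 uses a 4+2 layout. The following is a minimal sketch of that estimate. The table and function names are hypothetical, and the result ignores witness components, metadata, and any deduplication or compression savings.

```python
# Raw capacity consumed per unit of usable capacity, keyed by (scheme, FTT).
CAPACITY_MULTIPLIER = {
    ("RAID-1", 1): 2.0,      # two full copies
    ("RAID-1", 2): 3.0,      # three full copies
    ("RAID-5", 1): 4 / 3,    # 3 data + 1 parity
    ("RAID-6", 2): 6 / 4,    # 4 data + 2 parity
}


def raw_capacity_gb(usable_gb: float, scheme: str, ftt: int) -> float:
    """Estimate the raw vSAN capacity consumed by an object of the given size."""
    return usable_gb * CAPACITY_MULTIPLIER[(scheme, ftt)]


if __name__ == "__main__":
    disk_gb = 64  # hypothetical virtual disk size used for illustration
    for scheme, ftt in CAPACITY_MULTIPLIER:
        raw = raw_capacity_gb(disk_gb, scheme, ftt)
        print(f"{scheme} FTT={ftt}: {raw:.2f} GB raw")
```

For a 64 GB disk at FTT=1, RAID 5 consumes about 85.33 GB of raw capacity compared with 128 GB for RAID 1 mirroring. Erasure coding in vSAN requires an all-flash configuration and enough hosts for the chosen layout.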
Conclusion
Overall, deploying, running, and managing VMware Enterprise PKS on VMware vSAN provides predictable performance and scalability by taking advantage of the security, performance, scalability, operational simplicity, and cost-effectiveness of vSAN. Furthermore, by combining VMware Enterprise PKS and vSAN as a solution, all storage management, including the infrastructure and operations of vSphere and vSAN, moves into a single software stack, so you do not need two separate infrastructures for traditional storage and containers.
About the Author
Victor Chen, Senior Solutions Engineer in the Product Enablement team of the Storage and Availability Business Unit, wrote the original version of this paper.
Catherine Xu, Senior Technical Writer in the Product Enablement team of the Storage and Availability Business Unit, edited this paper to ensure that the contents conform to the VMware writing style.