Red Hat OpenShift Container Platform 4.10 on VMware Cloud Foundation 4.5

Executive Summary

Business Case

Red Hat® OpenShift® Container Platform offers automated installation, upgrades, and lifecycle management throughout the container stack—the operating system, Kubernetes and cluster services, and applications on any cloud. OpenShift helps teams build with speed, agility, confidence, and choice. OpenShift is focused on security at every level of the container stack and throughout the application lifecycle. It includes long-term and enterprise support from one of the leading Kubernetes contributors and open-source software companies.

Operating OpenShift on virtualized infrastructure can be more manageable than operating it on traditional bare-metal infrastructure: demand for resources fluctuates with business needs, which can leave a bare-metal OpenShift cluster either under-powered or over-provisioned. IT needs a more flexible, scalable, and secure infrastructure to handle the ever-changing demands of OpenShift. With a single architecture that is easy to deploy, VMware Cloud Foundation™ can provision compute, network, and storage on demand. VMware Cloud Foundation protects network and data with micro-segmentation and satisfies compliance requirements with data-at-rest encryption. Policy-based management delivers business-critical performance. VMware Cloud Foundation delivers flexible, consistent, secure infrastructure and operations across private and public clouds and is ideally suited to meet the demands of OpenShift.

In this solution, we provide the generic design and deployment guidelines for running OpenShift on VMware Cloud Foundation.

Why Run OpenShift on VMware Cloud Foundation?

VMware Cloud Foundation combines automation with a standardized and repeatable approach to infrastructure, giving IT Operations the infrastructure agility necessary to support developers by providing developer-ready infrastructure.

The infrastructure automation capabilities of VMware Cloud Foundation enable administrators to quickly deploy, manage, and scale the underlying infrastructure with cloud-like agility and at the speed of business.

By running OpenShift on VMware Cloud Foundation, you get all the benefits of a modern private cloud based on the proven VMware Software-Defined Data Center architecture:

  • A consistent and repeatable approach to standardized infrastructure.
  • Advanced automation that reduces human error and fosters admin productivity.
  • Cloud scale and agility that enable easy scaling at the speed of the business.
  • Integration with the proven networking solution from VMware: VMware NSX™.
  • Integration with the proven storage solution from VMware: vSAN™.
  • The advantages of VMware vSphere® features, such as VMware vSphere Distributed Resource Scheduler™ (DRS), vSphere vMotion®, and Fault Tolerance.

Benefits of OpenShift on vSphere over OpenShift on Physical Servers

vSphere offers efficiencies and security in storage, networking, and memory usage. Meanwhile, virtualization brings operational savings to containerized workloads.

Virtualization Brings Operational Savings to Containerized Workloads

vSphere provides a developer-ready container platform where OpenShift runs natively on the hypervisor. Well-known vCenter tools and processes can now manage both traditional VM-based and containerized workloads across your hybrid cloud. vSphere brings trusted capabilities such as high availability and policy-based management to ensure availability and resiliency for all workloads. vSphere also enhances the security of containers by isolating pods inside VMs. In addition, vSphere lifecycle management and enterprise resiliency reduce the admin time required to manage bare-metal updates and failures. All of these benefits help improve daily operational efficiency for DevOps.

To learn more about these benefits in detail, check out Why Choose VMware Virtualization for Kubernetes and Containers.

Higher Container Pod Density Results in Lower Capex

The higher pod density translates to lower cost as the number of physical hosts required to run the same number of containers will be lower. By abstracting physical hardware, running OpenShift on vSphere allows for better utilization of resources than OpenShift on bare metal, which is a key advantage that virtualization offers over bare metal.

Audience

This reference architecture paper is intended for the following audiences:

  • Corporate CTOs and CIOs who are architecting OpenShift or Kubernetes in a private datacenter.
  • vSphere VI administrators who are familiar with VMware virtualized infrastructure and need to deploy and manage OpenShift in a virtualized environment.
  • DevOps engineers who are deploying, managing, or using OpenShift on vSphere.
  • Any other engineers, operators, or end users who are interested in OpenShift, Kubernetes, or vSphere and have basic knowledge of VMware Cloud Foundation, vSAN, NSX, NSX Container Plug-in (NCP), Cloud Native Storage (CNS), Container Storage Interface (CSI), OpenShift, and Kubernetes.

Technology Overview

Solution technology components are listed below:

  • VMware Cloud Foundation
  • VMware vSphere
  • VMware vSAN
  • VMware NSX Data Center
  • Kubernetes vSphere CSI Driver
  • Red Hat OpenShift Container Platform
  • VMware NSX Container Plug-in for OpenShift

VMware Cloud Foundation

VMware Cloud Foundation is an integrated software stack that combines compute virtualization (VMware vSphere), storage virtualization (VMware vSAN), network virtualization (VMware NSX), and cloud management and monitoring (VMware vRealize® Suite) into a single platform that can be deployed on-premises as a private cloud or run as a service within a public cloud. This documentation focuses on the private cloud use case. VMware Cloud Foundation bridges the traditional administrative silos in data centers, merging compute, storage, network provisioning, and cloud management to facilitate end-to-end support for application deployment. 

VMware vSphere

VMware vSphere is VMware's virtualization platform, which transforms data centers into aggregated computing infrastructures that include CPU, storage, and networking resources. vSphere manages these infrastructures as a unified operating environment and provides operators with the tools to administer the data centers that participate in that environment. The two core components of vSphere are ESXi™ and vCenter Server®. ESXi is the hypervisor platform used to create and run virtualized workloads. vCenter Server is the management plane for the hosts and workloads running on the ESXi hosts.

VMware vSAN

VMware vSAN is the industry-leading software powering VMware’s software-defined storage and Hyperconverged Infrastructure (HCI) solution. vSAN helps customers evolve their data center without risk, control IT costs, and scale to tomorrow’s business needs. vSAN, native to the market-leading hypervisor, delivers flash-optimized, secure storage for all of your critical vSphere workloads, and is built on industry-standard x86 servers and components that help lower TCO in comparison to traditional storage. It delivers the agility to scale IT easily and offers the industry’s first native HCI encryption.

VMware NSX Data Center

VMware NSX Data Center is the network virtualization and security platform that enables the virtual cloud network, a software-defined approach to networking that extends across data centers, clouds, and application frameworks. With NSX Data Center, networking and security are brought closer to the application wherever it’s running, from virtual machines to containers to bare metal. Like the operational model of VMs, networks can be provisioned and managed independently of the underlying hardware. NSX Data Center reproduces the entire network model in software, enabling any network topology—from simple to complex multitier networks—to be created and provisioned in seconds. Users can create multiple virtual networks with diverse requirements, leveraging a combination of the services offered via NSX or from a broad ecosystem of third-party integrations ranging from next-generation firewalls to performance management solutions to build inherently more agile and secure environments. These services can then be extended to a variety of endpoints within and across clouds.

Kubernetes vSphere CSI Driver

Cloud Native Storage (CNS) is a vSphere and Kubernetes (K8s) feature that makes K8s aware of how to provision storage on vSphere on-demand, in a fully automated, scalable fashion as well as providing visibility for the administrator into container volumes through the CNS User Interface within vCenter. Run, monitor, and manage containers and virtual machines on the same platform—in the same way:

  • Simplify your infrastructure needs, lifecycle, and operations.
  • Lower costs, using a platform you already know for consistent operations across workloads and across clouds.
  • Spend less time managing infrastructure and more time building apps that provide business value.

The main goal of CNS is to make vSphere and vSphere storage, including vSAN, a platform to run stateful Kubernetes workloads. vSphere’s data path is highly reliable, highly performant, and mature for enterprise. CNS enables access of this data path to Kubernetes and brings an understanding of Kubernetes volume and pod abstractions to vSphere.

See https://www.vmware.com/products/cloud-native-storage.html for detailed information regarding CNS.

Red Hat OpenShift Container Platform

Red Hat OpenShift Container Platform ships with Red Hat Enterprise Linux® CoreOS for the Kubernetes control plane nodes and supports both Red Hat Enterprise Linux CoreOS and Red Hat Enterprise Linux for worker nodes. OpenShift supports the Open Container Initiative (OCI), which is an open governance structure around container formats and runtimes. OpenShift includes hundreds of fixes to defect, security, and performance issues for upstream Kubernetes in every release. It is tested with dozens of technologies and is a robust tightly integrated platform. OpenShift includes software-defined networking and validates additional common networking solutions. OpenShift also validates numerous storage and third-party plug-ins for every release.

See https://www.openshift.com/products/container-platform for detailed information regarding OpenShift Container Platform.

VMware NSX Container Plug-in for OpenShift

VMware NSX Container Plug-in (NCP) provides the integration between NSX Data Center and container orchestrators such as Kubernetes, as well as integration between NSX Data Center and container-based PaaS (platform as a service) software products such as OpenShift.

The main component of NCP runs in a container and communicates with NSX Manager and with the OpenShift control plane. NCP monitors changes to containers and other resources and manages networking resources such as logical ports, switches, routers, and security groups for the containers by calling the NSX Policy API.

The NSX CNI plug-in runs on each OpenShift node. It monitors container life cycle events, connects a container interface to the guest vSwitch, and programs the guest vSwitch to tag and forward container traffic between the container interfaces and the vNIC.

See https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.2/ncp-kubernetes/GUID-52A92986-0FDF-43A5-A7BB-C037889F7559.html for detailed information regarding NCP.

Solution Configuration

This section introduces the resources and configurations:

  • Architecture diagram
  • Hardware resources
  • Software resources
  • VMware Cloud Foundation Installation
  • Workload Domain Preparation
  • Network configuration
  • vSAN configuration
  • OpenShift installation

Architecture Diagram

In this solution, the VMware Cloud Foundation test environment was composed of a management domain and a workload domain.

Typically, a VMware Cloud Foundation management domain can simultaneously manage multiple workload domains. All of the workload domains’ management VMs, such as vCenter Server and NSX® Manager™ instances, are placed in the management domain. All the workload domains can share one management domain.

Different workload domains can serve for different business purposes. The one-to-many mapping simplifies the overall management of the whole VMware Cloud Foundation environment.

We deployed the OpenShift Container Platform in one of the workload domains. The workload domain also contains the VMware NSX® Edge™ nodes. All other infrastructure VMs were in the shared management domain (figure 1).

Figure 1 shows only the OpenShift Container Platform workload domain, which is the focus of this architecture. The management domain can manage multiple workload domains, but the other workload domains are not shown in the figure.

Figure 1. OpenShift on VMware Cloud Foundation Solution Standard Architecture

Notation in Figure 1:

  • Temp Bootstrap: This is the temporary bootstrap node. It only exists during the installation process and will be deleted after the OpenShift Container Platform is fully deployed.
  • Control plane-0,1,2: These are the control plane nodes of Kubernetes deployed and managed by OpenShift.
  • Worker-0,1,2,3: These are the worker nodes of Kubernetes deployed and managed by OpenShift. We deployed 4 worker nodes as the starting point. More worker nodes can be added on demand through the OpenShift control plane.

The architecture above is the VMware Cloud Foundation standard architecture.

Note: Apart from the architecture in figure 1, for a small test environment, the management domain and the workload domain can be consolidated into one domain. This architecture only requires a minimum of 4 ESXi hosts, which is ideal for a small or test environment. See https://docs.vmware.com/en/VMware-Cloud-Foundation/4.4/vcf-getting-started/GUID-C6AF75AE-569C-49F8-A15E-E9A6EF9549DA.html for details about VMware Cloud Foundation Consolidated Architecture Model.

Figure 2. OpenShift on VMware Cloud Foundation Solution Consolidated Architecture

In our solution, we tested both architectures.

Standard Architecture

For the standard architecture, we created a 4-node ESXi cluster for the VMware Cloud Foundation management domain, running management virtual machines and appliances. The management domain can be shared with other workload domains.

 

Table 1. Management Domain VMs

| VM Role | vCPU | Memory (GB) | VM Count |
|---|---|---|---|
| Management Domain vCenter Server | 4 | 16 | 1 |
| SDDC Manager | 4 | 16 | 1 |
| Management Domain NSX Manager | 6 | 24 | 3 |
| Workload Domain NSX Manager | 12 | 48 | 3 |
| Workload Domain vCenter Server | 8 | 28 | 1 |

 

For the workload domain, we created another 4-node ESXi cluster with a separate NSX Fabric, deployed an NSX Edge Cluster, and deployed the OpenShift VMs in the workload domain.

Table 2 shows the deployment of the workload domain edge nodes and OpenShift VMs. For the workload domain edge node, we recommend that NSX Edge transport nodes are deployed with “Large” form factor.

Table 2. Workload Domain VMs

| VM Role | Minimum vCPU | Minimum Memory (GB) | Storage | Deployment Size | VM Count |
|---|---|---|---|---|---|
| Workload Domain Edge node | 8 | 32 | 200 GB | Large | 2 |
| OpenShift Control Plane Nodes | 4 | 16 | 120 GB for OS | n/a | 3 |
| OpenShift Compute Nodes | 2 | 8 | 120 GB for OS | n/a | Minimum of 2 for a standard OpenShift cluster |
| OpenShift Bootstrap Node (Temporary) | 4 | 16 | 120 GB for OS | n/a | 1 |

 

Consolidated Architecture

For the consolidated architecture, the virtual machines and their roles are the same. The difference is that in the consolidated architecture, there is only one cluster consisting of four ESXi hosts, and all virtual machines are in this cluster.

The solution architecture shown in figure 1 and figure 2, in either the Standard Architecture Model or the Consolidated Architecture Model, is a building block for a basic installation of OpenShift with VMware Cloud Foundation. Based on customer demands and database size, we can expand the workload domain to include more physical hosts. A vSAN-enabled cluster supports up to 64 physical hosts in a non-stretched configuration. Adding hosts to the vSAN cluster increases not only the CPU and memory capacity for computing but also the vSAN storage capacity. This is one of the benefits of HCI: compute and storage capacity grow together and proportionally.
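
As a rough sizing aid, the minimum footprint of the workload domain VMs listed in Table 2 can be totaled as in the following sketch. The figures are the Table 2 minimums; the dictionary layout and function name are illustrative assumptions, and the compute node count uses the stated minimum of 2, so a real cluster is likely to need more.

```python
# Minimum per-VM resources from Table 2: (vCPU, memory in GB, VM count).
# The compute node count uses the stated minimum of 2; real clusters may run more.
WORKLOAD_DOMAIN_VMS = {
    "NSX Edge node (Large)": (8, 32, 2),
    "OpenShift control plane node": (4, 16, 3),
    "OpenShift compute node": (2, 8, 2),
    "OpenShift bootstrap node (temporary)": (4, 16, 1),
}


def total_footprint(vms, include_bootstrap=True):
    """Return (total vCPU, total memory in GB) across all VM roles."""
    vcpu_total = 0
    memory_total = 0
    for role, (vcpu, memory_gb, count) in vms.items():
        if not include_bootstrap and "bootstrap" in role:
            continue
        vcpu_total += vcpu * count
        memory_total += memory_gb * count
    return vcpu_total, memory_total


install_vcpu, install_mem = total_footprint(WORKLOAD_DOMAIN_VMS)
steady_vcpu, steady_mem = total_footprint(WORKLOAD_DOMAIN_VMS, include_bootstrap=False)
print(f"During installation: {install_vcpu} vCPU, {install_mem} GB")
print(f"After bootstrap removal: {steady_vcpu} vCPU, {steady_mem} GB")
```

The totals cover only the VMs in Table 2. Hypervisor overhead, vSAN, and headroom for host failures are not included and would need to be added on top.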

Hardware Resources

In this solution, for the workload domain of OpenShift, we used a total of four Dell R630 nodes. Each server was configured with two disk groups, and each disk group consisted of one cache-tier write-intensive SAS SSD and four capacity-tier read-intensive SAS SSDs.

Each ESXi node in the cluster had the following configuration, as shown in table 3.

Table 3. Hardware Configuration for Workload Cluster

| Property | Specification |
|---|---|
| Server model name | Dell PowerEdge R630 |
| CPU | 2 x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 28 core each |
| RAM | 512GB |
| Network adapter | 2 x Broadcom BCM57414 NetXtreme-E 25Gb RDMA Ethernet Controller |
| Storage adapter | 1 x Dell HBA330 Adapter |
| Disks | Cache - 2 x 800GB Write Intensive SAS SSDs; Capacity - 8 x 3.84TB Read Intensive SAS SSDs |

Software Resources

Table 4 shows the software resources used in this solution.

Table 4. Software Resources

| Software | Version | Purpose |
|---|---|---|
| VMware Cloud Foundation | 4.5 | A unified SDDC platform that brings together VMware vSphere, vSAN, NSX, and optionally, vRealize Suite components, into a natively integrated stack to deliver enterprise-ready cloud infrastructure for the private and public cloud. See BOM of VMware Cloud Foundation for details. We used VMware Cloud Foundation 4.5 in this solution; later versions are also supported. See Preparing to install on vSphere for VMware vSphere infrastructure requirements. |
| VMware vSphere | 7.0.3 | VMware vSphere is a suite of products: vCenter Server and ESXi. |
| VMware vSAN | 7.0.3 | vSAN is the storage component in VMware Cloud Foundation to provide low-cost and high-performance next-generation HCI solutions. |
| NSX Data Center | 3.2 | NSX Data Center is the key network component in VMware Cloud Foundation and is deployed automatically. It is designed for networking management and operation. |
| OpenShift | 4.10 | The version of OpenShift software being tested in this solution. See OpenShift Container Platform for the latest release introduction. |

VMware Cloud Foundation Installation

The key steps for VMware Cloud Foundation installation are as follows:

  1. Deploy a management domain.
  2. Add ESXi hosts into the system.
  3. Create a workload domain with the idle ESXi hosts.

Follow the official document for the detailed information about VMware Cloud Foundation installation steps.

After the installation, verify that NSX and vSAN are successfully enabled on this workload domain. NSX and vSAN are integrated in this solution and are used in the subsequent configuration of OpenShift.

Figure 3. Check the Configuration and State of VMware Cloud Foundation Deployment

Workload Domain Preparation

After the workload domain is created for OpenShift, NSX Managers are deployed in the Management Domain. However, the NSX Edge cluster is not deployed by default.

We must add an edge cluster after the workload domain is created, as shown in Figure 4.

After the edge cluster is deployed, its state should be ‘active’ and the edge nodes’ names are also shown, as in Figure 4.

Figure 4. Enable NSX Edge Cluster on the Workload Domain

Network Configuration

Figure 5 shows the VMware vSphere Distributed Switch™ network configuration for the OpenShift cluster in the workload domain of VMware Cloud Foundation. NSX, which underlies the vSphere infrastructure, is used for the OpenShift cluster networking. To enable external access for the OpenShift cluster, an NSX Edge cluster must be deployed. BGP peering and route distribution with the upstream network must also be configured. For more details, refer to VMware’s NSX documentation.

  • The ‘VCF-xxxx-External-1’ and ‘External-2’ port groups are created by VMware Cloud Foundation automatically for NSX.
  • The ‘xxxx-management’, ‘xxxx-vsan’, and ‘xxxx-vmotion’ port groups are also created by VMware Cloud Foundation automatically. They are used for management, vSAN, and vMotion respectively.
  • The ‘ocp-v-c2-segment’ is a logical switch manually created in NSX. It is used for the OpenShift VM nodes.

For the management domain, the port groups are similar. For each domain, two 10 GbE vmnics were used and configured with teaming policies. The management domain can be shared among different workload domains.

Figure 5. vSphere Distributed Switch Network Configuration

Follow the NSX Container Plugin for OpenShift - Installation and Administration Guide to create the necessary NSX components, such as tier-0 and tier-1 routers.

After creation, the overall network topology is shown in Figure 6.

Figure 6. The Overall NSX Networking Architecture

The NSX controllers resided in the management domain. In our case, the OpenShift virtual machines were configured with a VM network called ‘ocp-v-c2-segment’ on an NSX segment. DHCP is enabled on this segment, which is required by OpenShift. VMware vSphere vMotion®, vSAN, and VXLAN VTEP for NSX had another dedicated segment created.

Jumbo Frame (MTU=9000) was enabled on the physical switches, vSAN VMkernel, and all the virtual switches to improve performance.

NSX Managers and Edges each run more than one instance, forming NSX clusters that provide HA and better load balancing. In addition, the vCPU and memory may be adjusted based on workloads to achieve better performance. Table 5 shows the configuration of the NSX Manager and edge node virtual machines. The NSX Managers reside in the management domain, so they do not consume the compute resources available to the OpenShift VMs. The NSX edge nodes, however, reside in the OpenShift workload domain and do consume some of its CPU and memory resources. Take this into consideration when sizing the cluster before OpenShift is deployed.

Table 5. NSX Data Center VM Configuration

| NSX Data Center VM Role | Instance | vCPU | Memory (GB) | VM Name | Virtual Disk Size | Operating System |
|---|---|---|---|---|---|---|
| NSX Manager | 3 | 12 | 48 | NSX-unified-appliance-<version> | 200GB | Ubuntu |
| NSX Edge Nodes | 2 | 4 | 8 | Edge-<UUID> | 120GB | Ubuntu |

vSAN Configuration

The solution validation was based on a 4-node vSAN cluster as a building block.

The validation tests were conducted using the default vSAN datastore storage policy of RAID 1 FTT=1, with checksums enabled. The vSAN cluster had deduplication and compression deactivated and no encryption. The following sections explain the detailed configuration of the vSAN cluster and some items in Storage Policy Based Management (SPBM).

Deduplication and Compression

The ‘Deduplication and Compression’ option is configured at the cluster level and can be enabled or deactivated for the whole vSAN cluster. We deactivated it in our testing. Enabling it can reduce vSAN storage usage but may induce higher latencies for OpenShift applications. Customers should weigh this tradeoff for their own workloads.

Failures to Tolerate (FTT)

Failures to Tolerate (FTT) is a configuration item in vSAN’s storage policy. For the ‘StorageClass’ in OpenShift and the corresponding vSAN storage policy, we recommend setting FTT to 1. In our testing, we set FTT to 1 as the baseline. Do not set FTT to 0 in an OpenShift with vSAN deployment, because with FTT=0 the data of different replicas of the same pod may be stored on the same physical disk. A failure of that disk could then cause data loss.

With RAID 1 in the vSAN policy, there are two copies of each piece of data in vSAN. So, the estimated database capacity requirement should not exceed half of vSAN’s overall capacity. With RAID 5, each piece of data consumes 1.33 times its size in raw capacity, and you can calculate the storage usage accordingly. If more capacity is needed, additional machines can be added to the cluster, and vSAN can increase the storage capacity for OpenShift online without service interruption to OpenShift users.

Erasure Coding (RAID 1 vs. RAID 5)

Erasure Coding is a configuration item in vSAN’s storage policy. It is also known as configuring RAID 5 or RAID 6 for vSAN objects. With FTT=1 and RAID 1, the data in vSAN is mirrored and consumes 2 times its size in raw capacity. With FTT=1 and RAID 5, the data is stored as RAID 5 and consumes 1.33 times its size in raw capacity.

In our testing, we used FTT=1 without Erasure Coding (RAID 1). Enabling Erasure Coding could save some vSAN storage space but may induce higher latencies for the Kubernetes applications. Again, customers should weigh this tradeoff for their own workloads.
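
The capacity effect of the two policies can be worked through with a short calculation. The sketch below assumes the capacity tier described in the hardware section (four hosts, each with eight 3.84 TB capacity SSDs) and applies only the 2x and 1.33x (4/3) multipliers. It ignores slack space, metadata overhead, and any reserved capacity, so the results are upper bounds rather than sizing guidance.

```python
from fractions import Fraction

# Capacity tier of the validated cluster: 4 hosts x 8 capacity SSDs x 3.84 TB.
HOSTS = 4
CAPACITY_DISKS_PER_HOST = 8
DISK_TB = Fraction("3.84")

# Raw capacity consumed per unit of stored data with FTT=1.
OVERHEAD = {
    "RAID 1 (mirroring)": Fraction(2),
    "RAID 5 (erasure coding)": Fraction(4, 3),  # the 1.33x figure
}

raw_tb = HOSTS * CAPACITY_DISKS_PER_HOST * DISK_TB


def usable_tb(raw, overhead):
    """Upper bound on data that fits in `raw` capacity for a given overhead."""
    return raw / overhead


for policy, overhead in OVERHEAD.items():
    usable = usable_tb(raw_tb, overhead)
    print(f"{policy}: {float(usable):.2f} TB usable of {float(raw_tb):.2f} TB raw")
```

Under these assumptions, moving from RAID 1 to RAID 5 raises the upper bound on usable capacity from half to three quarters of the raw capacity, which is the space saving traded against the higher latency noted above.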

OpenShift Installation

There are two methods to install OpenShift on vSphere: User-Provisioned Infrastructure (UPI) and Installer-Provisioned Infrastructure (IPI). IPI is the easier method because it automates more of the deployment and the follow-up management.

As of OpenShift version 4.10, which is used in this reference architecture, IPI is fully supported for installing OpenShift on VMware Cloud Foundation. IPI is fully compatible with all the software components such as vSphere, Cloud Native Storage (CNS) and VMware NSX Container Plugin (NCP).

During the installation process, we mainly used this OpenShift documentation and NSX Container Plugin for OpenShift - Installation and Administration Guide for reference.

This reference architecture covers the major steps and highlights; the documents above explain each step in more detail.

With IPI installation, OpenShift installer creates virtual machines automatically in the vSphere cluster. There is no need for manually creating virtual machines.

For the network configuration, sample customized NCP operator installation YAML files are available on this GitHub page for reference.

The major steps are:

  1. In NSX Data Center, create the corresponding networking resources. With VMware Cloud Foundation, the edge cluster and overlay transport zones are already created. We do not need to manually create them again.

We need to manually create the following resources:

  • A Tier-0 router
  • A Tier-1 router
  • A segment, for example, named ‘ocp-segment’
  • An IP Block for Kubernetes Pods
  • An External IP Pool
  2. Create a Linux virtual machine as the client, such as RHEL 8. We recommend connecting this client to the same segment as OpenShift nodes, ‘ocp-segment’ in our example. On this client, prepare the following basic software:
  3. On this Linux Client machine for OpenShift, create the install-config.yaml file.

The command is:

./openshift-install create install-config --dir ./

 

The required information is very simple as shown in figure 7.

Figure 7. Create the install-config.yaml file for OpenShift

 

  4. In the install-config.yaml file created above, specify ‘ncp’ as the networking type instead of the default ‘OpenShiftSDN’, as follows:

networking:
  networkType: ncp
  clusterNetwork:
  - cidr: 10.4.0.0/16
    hostPrefix: 23
  machineCIDR: 10.114.16.0/24
  serviceNetwork:
  - 172.30.0.0/16

 

  5. Run the following command to create the manifest YAML files from the install-config.yaml file.

```
./openshift-install create manifests --dir ./
```

  6. The NCP YAML files are included in the NCP download file from download.vmware.com. You can go to https://github.com/vmware/nsx-container-plugin-operator/releases, find the corresponding operator release (for example, v3.1.1) and download openshift4.tar.gz.
  7. Extract the YAML files package:

$ tar xzvf openshift4.tar.gz

 

  8. Modify the YAML files for the NCP operator, as described in the official documentation. We only need to modify the following two files:
  • configmap.yaml – Update this file with the NSX information.
  • operator.yaml – Specify the NCP image location in this file.
  9. For configmap.yaml, modify the following parameters according to your environment:
  • cluster
  • nsx_api_managers
  • nsx_api_user
  • nsx_api_password
  • external_ip_pools
  • tier0_gateway
  • overlay_tz
  • edge_cluster
  • apiserver_host_ip
  • apiserver_host_port
  10. In operator.yaml, you must specify the location of the NCP operator image in the container’s image field and the location of the NCP image in the ‘NCP_IMAGE’ environment variable in the env section.
  • For the NCP Operator image, the default location is Docker Hub. We recommend pulling the image and uploading it to a private registry.
  • For the NCP image, download it from https://downloads.vmware.com -> NSX Data Center Product Page, then find it in the “Drivers & Tools” table. After manually downloading it, also upload it to a private registry.

These locations are the image field and the NCP_IMAGE value in the following example.

kind: Deployment
metadata:
  name: nsx-ncp-operator
  namespace: nsx-system-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: nsx-ncp-operator
  template:
    metadata:
      labels:
        name: nsx-ncp-operator
    spec:
      hostNetwork: true
      serviceAccountName: nsx-ncp-operator
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        key: node.kubernetes.io/not-ready
      containers:
      - name: nsx-ncp-operator
        # Replace this with the built image name
        image: vmware/nsx-container-plugin-operator:latest
        command: ["/bin/bash", "-c", "nsx-ncp-operator --zap-time-encoding=iso8601"]
        imagePullPolicy: Always
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: OPERATOR_NAME
          value: "nsx-ncp-operator"
        - name: NCP_IMAGE
          value: "{NCP Image}"

 

  11. Copy the NCP operator configuration files prepared in the preceding steps to the OpenShift manifests folder.

cp openshift4/*.yaml manifests/

 

  12. Create the OpenShift cluster.

./openshift-install create cluster --dir ./ --log-level=debug

The cluster is then deployed with NSX and NCP providing the networking.
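
Before running the installer, it can help to sanity-check the address ranges in the networking stanza of install-config.yaml. The sketch below uses the sample values from the steps above and Python's standard ipaddress module. It assumes the usual install-config semantics, where hostPrefix is the prefix length of the subnet carved out of the cluster network for each node. With NCP, pod addressing is driven by the IP block configured in NSX, so treat this purely as arithmetic on the sample values.

```python
import ipaddress
from itertools import combinations

# Values copied from the sample networking stanza in install-config.yaml.
cluster_network = ipaddress.ip_network("10.4.0.0/16")
host_prefix = 23
machine_network = ipaddress.ip_network("10.114.16.0/24")
service_network = ipaddress.ip_network("172.30.0.0/16")

# How many /23 subnets fit in the /16 cluster network, and how large each is.
node_subnet_count = 2 ** (host_prefix - cluster_network.prefixlen)
addresses_per_node_subnet = 2 ** (32 - host_prefix)

networks = {
    "clusterNetwork": cluster_network,
    "machineCIDR": machine_network,
    "serviceNetwork": service_network,
}

# The three ranges must not overlap each other.
overlapping = [
    (a, b)
    for a, b in combinations(networks, 2)
    if networks[a].overlaps(networks[b])
]

print(f"Per-node subnets available: {node_subnet_count}")
print(f"Addresses per node subnet: {addresses_per_node_subnet}")
print(f"Overlapping ranges: {overlapping or 'none'}")
```

The same check can be rerun with the ranges from your own environment, including the NSX IP block and external IP pool, by adding them to the networks dictionary.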

Some other configurations during the installation include the following key points:

DHCP

We must use DHCP in the vSphere cluster for the OpenShift cluster. We must also make sure that the IP addresses allocated to the virtual machines persist across reboots and maintenance.

Since we used NSX in this reference architecture, turn on ‘DHCP’ service in the NSX segment for OpenShift.

Static IP Addresses

An installer-provisioned vSphere installation requires two static IP addresses:

  • The API address is used to access the cluster API.
  • The Ingress address is used for cluster ingress traffic.

DNS Server

We must have access to the DNS server and create two DNS records in the server for the API and Ingress IP addresses.

Table 6. DNS Server Components

| Component | Record | Description |
|---|---|---|
| API VIP | api.<cluster_name>.<base_domain>. | This DNS A/AAAA or CNAME record must point to the load balancer for the control plane machines. This record must be resolvable by both clients external to the cluster and from all the nodes within the cluster. |
| Ingress VIP | *.apps.<cluster_name>.<base_domain>. | A wildcard DNS A/AAAA or CNAME record that points to the load balancer that targets the machines that run the Ingress router pods, which are the worker nodes by default. This record must be resolvable by both clients external to the cluster and from all the nodes within the cluster. |

See Installing a Cluster on vSphere for detailed information.
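
The two record names can be derived from the cluster name and base domain, and then checked from the client machine before installation. The following is a minimal sketch using only the Python standard library. The cluster name ‘ocp’ and base domain ‘example.com’ are hypothetical placeholders, and the helper names are illustrative.

```python
import socket


def required_dns_records(cluster_name, base_domain):
    """Build the two record names an installer-provisioned cluster needs."""
    return {
        "API VIP": f"api.{cluster_name}.{base_domain}",
        "Ingress VIP": f"*.apps.{cluster_name}.{base_domain}",
    }


def probe_name(record):
    """Turn a wildcard record into a concrete name that it should match."""
    return record.replace("*", "test", 1)


def resolves(hostname):
    """Return True if the local resolver returns at least one address."""
    try:
        return bool(socket.getaddrinfo(hostname, None))
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


# Hypothetical values; substitute your own cluster name and base domain.
records = required_dns_records("ocp", "example.com")
for component, record in records.items():
    print(f"{component}: {record} (probe with {probe_name(record)})")
```

Calling resolves(probe_name(record)) for each record, both from a client outside the cluster network and from a machine on the OpenShift segment, gives a quick indication of whether the records are visible from both sides as the table requires.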

With VMware Cloud Foundation, Cloud Native Storage (CNS) is enabled by default. There is no need to manually install CNS or any third-party CSI driver. The CSI driver ships with OCP by default and is used in this solution.

 

Solution Validation

Test Tools

We leveraged the following monitoring tools in this solution.

vSAN Monitoring Tools

vSAN Performance Service

vSAN Performance Service is used to monitor the performance of the vSAN environment through the vSphere Client. The performance service collects and analyzes performance statistics and displays the data in a graphical format. You can use the performance charts to manage your workload and determine the root cause of problems.

vSAN Health Check

vSAN Health Check delivers a simplified troubleshooting and monitoring experience for everything related to vSAN. Through the vSphere Client, it offers multiple health checks specifically for vSAN, including cluster, hardware compatibility, data, limits, and physical disks. We used it to check the vSAN health before deploying the mixed-workload environment.

Application Validation Tools

Redis

Redis is an in-memory data structure store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Redis supports different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indices.

To persist data in a Redis deployment on Kubernetes, a persistent volume claim must be created. This makes Redis a suitable workload for testing Cloud Native Storage usage.

We used Redis as one of the applications for OpenShift’s functional validation.

Functional Testing

Installation Validation

During the cluster bootstrap process, we can run ‘kubectl’ commands to check how the pods are being created and whether the cluster is completely installed.

To monitor the installation status during bootstrap, SSH into the temporary bootstrap virtual machine:

$ ssh core@<bootstrap.ocp-v-c1.xxx.com>

Then, switch to the root user to monitor the pods:

$ sudo su -

# kubectl --kubeconfig=/opt/openshift/auth/kubeconfig get po --all-namespaces

For example, we can check whether all the pods are running without errors. At this stage, some pods are already running and others are still being created.

Note: The pods may vary between versions.

The sample pods include the following:

Figure 8. Monitoring the Pods’ Creation Status during Cluster Bootstrap
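While the bootstrap is in progress, the pod listing can be condensed into a count per status. The following is a minimal sketch under stated assumptions: the function name summarize_pod_status is hypothetical, and it relies on the default column layout of kubectl get po --all-namespaces, in which STATUS is the fourth column.

```shell
# Sketch only: reads the output of
#   kubectl get po --all-namespaces --no-headers
# on standard input and prints one "<status> <count>" line per status,
# sorted by status name.
summarize_pod_status() {
  awk '{ count[$4]++ } END { for (s in count) print s, count[s] }' | sort
}
```

On the bootstrap virtual machine, it could be used as follows: kubectl --kubeconfig=/opt/openshift/auth/kubeconfig get po --all-namespaces --no-headers | summarize_pod_status. Running it repeatedly shows the number of pods that are still being created shrinking as the installation proceeds.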

Then, go back to the installation client virtual machine.

When OpenShift Container Platform is successfully installed, the command line tool automatically shows the following information, including how to access the cluster.

INFO Install complete!

INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/vmware/ocp/auth/kubeconfig'

INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp-v-c1.xxx.com

INFO Login to the console with user: "kubeadmin", and password: "xxxx"

INFO Time elapsed: 33m14s

After successfully installing OpenShift, we can run some simple ‘oc’ commands to check that OpenShift is healthy.

For example:

$ oc get nodes

In our environment, the output listed three control plane nodes and three worker nodes, all in the Ready state.
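The same check can be scripted. The following is a minimal sketch, assuming the default column layout of oc get nodes, in which STATUS is the second column; the function name all_nodes_ready is hypothetical.

```shell
# Sketch only: reads the output of `oc get nodes --no-headers` on standard
# input. Succeeds when at least one node is listed and the STATUS column of
# every node starts with "Ready" (so "NotReady" fails the check).
all_nodes_ready() {
  awk '$2 !~ /^Ready/ { bad++ } END { exit (NR == 0 || bad > 0) ? 1 : 0 }'
}
```

For example: oc get nodes --no-headers | all_nodes_ready && echo "all nodes are ready". A node reported as "Ready,SchedulingDisabled" still passes this check, which may or may not be what is wanted during maintenance.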

Application Deployment Validation

After OpenShift Container Platform was installed, we deployed a sample Redis application that uses a persistent volume claim (PVC). This validates that OpenShift can successfully provision an application and create a PVC through CNS backed by vSAN.

There are two ways to deploy the sample Redis application:

  1. Deploy the Redis Operator through the OpenShift Web Console.
  2. Deploy Redis through the command line tool with YAML configuration files.

Deploying Redis Operator through the OpenShift Web Console

Key steps are:

  1. Go to https://console-openshift-console.apps.ocp-v-c1.xxx.com

Log in with the credentials provided during installation:

INFO Login to the console with user: "kubeadmin", and password: "xxxx"

  2. Look for the Redis operator.

  3. After the operator is installed, edit the configuration files.

  4. Validate that Redis is successfully deployed.

 

Deploying Redis through the Command Line Tool with YAML Configuration Files

Redis has an approved operator in OpenShift’s OperatorHub. For applications that have no available operator, we can deploy the application through YAML configuration files instead. We use this sample Redis deployment as an example.

Key steps are:

  1. Create namespace:

$ kubectl create ns redis

  1. Create a storage class which uses the vSAN’s RAID5 Storage Policy.

NOTE: The RAID5 Storage Policy is only an example of how to create a storage class in OCP backed by vSAN. We can define different policies in vSAN to meet the requirements of different applications, and then create storage classes here that are backed by those storage policies.

$ kubectl apply -f vsan-raid5-sc.yaml

A sample vsan-raid5-sc.yaml is:

apiVersion: storage.k8s.io/v1

kind: StorageClass

metadata:

  name: vsan-raid5-sc

  annotations:

    storageclass.kubernetes.io/is-default-class: "false"

provisioner: csi.vsphere.vmware.com

parameters:

  storagePolicyName: "vSAN-raid5-policy"

  datastore: vSANDatastore

  diskformat: thin

  1. Create a persistent volume claim:

$ kubectl apply -f pvc.yaml

A sample pvc.yaml is:

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

  name: fileserver-claim

spec:

  accessModes:

  - ReadWriteOnce

  storageClassName: "vsan-raid5-sc"

  resources:

    requests:

      storage: 1G

  1. Deploy Redis:

$ kubectl apply -f redis.yaml

A sample Redis Pod configuration YAML file is:

apiVersion: v1

kind: Pod

metadata:

  name: redis-leader

  labels:

    app: redis

spec:

  containers:

    - name: leader

      image: harbor-repo.vmware.com/dockerhub-proxy-cache/redis

      env:

        - name: LEADER

          value: "true"

      volumeMounts:

        - mountPath: /mnt/fileserver

          name: redis-pvc

      ports:

        - containerPort: 6379

  volumes:

    - name: redis-pvc

      persistentVolumeClaim:

        claimName: fileserver-claim

        readOnly: false

After the PVC is successfully created, we can find the corresponding PV entry on the Cloud Native Storage monitoring page in the vSphere Client, as shown in Figure 8.


Figure 8. PVC Validation in Cloud Native Storage Monitoring Page in vSphere Client
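The binding can also be confirmed from the command line before checking the vSphere Client. The following is a minimal sketch, not the procedure used in the validation: the function names and the PHASE_CMD hook are hypothetical, and the claim is looked up in the current namespace of the active kubeconfig context.

```shell
# Sketch only: helper names and the PHASE_CMD hook are illustrative.

# Prints the phase of a PVC, for example "Bound" or "Pending".
# $1 = claim name.
pvc_phase() {
  kubectl get pvc "$1" -o jsonpath='{.status.phase}'
}

# Polls until the claim reports the "Bound" phase or the attempts run out.
# $1 = claim name, $2 = maximum attempts (default 30),
# $3 = seconds to wait between attempts (default 5).
# PHASE_CMD can be set to another command that prints the phase; it exists
# only so that the loop can be exercised without a cluster.
wait_for_pvc_bound() {
  local name="$1" attempts="${2:-30}" delay="${3:-5}" i=0 phase
  while [ "$i" -lt "$attempts" ]; do
    phase="$(${PHASE_CMD:-pvc_phase} "$name")"
    [ "$phase" = "Bound" ] && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_for_pvc_bound fileserver-claim waits up to about 150 seconds with the defaults. If the claim stays in the Pending phase, kubectl describe pvc fileserver-claim shows the provisioning events.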

Failure Testing

This section introduces the failure scenarios and the failover and failback behavior. It covers:

  • Physical host failure
  • Physical cache disk failure
  • Physical capacity disk failure

These scenarios relate to vSAN storage functional testing under hardware failures. Under each failure condition, the data in vSAN stayed intact and no data was lost. For more detailed information, refer to the vSAN Operations Guide.

Best Practices

vSAN Best Practices

  • Enable jumbo frames on the physical switches, and use jumbo frames on the vSAN VMkernel adapters and all virtual switches.
  • Apply the vSAN default storage policy, or a policy with a higher Failures to Tolerate (FTT) value, depending on the business data protection needs.
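To confirm that jumbo frames work end to end, a common check is to send a ping that is not allowed to fragment and whose size exactly fills the MTU. The following is a minimal sketch of the arithmetic, assuming an MTU of 9000 bytes (this paper does not state the value used), a 20-byte IPv4 header without options, and an 8-byte ICMP header; the function name is hypothetical.

```shell
# Sketch only: prints the largest ICMP payload that fits in one unfragmented
# IPv4 packet for the given MTU, that is, the MTU minus 20 bytes of IPv4
# header and 8 bytes of ICMP header. $1 = MTU in bytes.
icmp_payload_for_mtu() {
  echo $(( $1 - 20 - 8 ))
}
```

With an MTU of 9000, the payload is 8972 bytes. In an ESXi shell, the value can be passed to vmkping with fragmentation disabled, for example vmkping -I vmk1 -d -s 8972 followed by the vSAN VMkernel IP address of another host, where vmk1 is a placeholder for the vSAN VMkernel adapter. If this ping fails while a default-size ping succeeds, a device on the path is likely not configured for jumbo frames.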

vSphere Best Practices

  • Enable vSphere HA in the cluster.

We recommend enabling vSphere High Availability for the workload domain cluster.

With vSphere HA enabled, if a physical host fails and the remaining resources are sufficient to satisfy the resource reservations (for example, because a spare host is available), vSphere automatically powers on the impacted virtual machines on the surviving hosts.

If a physical host fails and the remaining resources are not sufficient to satisfy the resource reservations, vSphere HA does not restart the impacted virtual machines. This is by design, because forcing a virtual machine to restart on a surviving host may cause resource contention and imbalanced performance among the OpenShift nodes. We suggest setting resource reservations at least for all the control plane nodes.

  • Enable vSphere DRS in the cluster.

For DRS anti-affinity rules, see the DRS documentation.

  • Disable vSphere Storage DRS in the cluster.

vSphere Storage DRS is not supported for OpenShift Container Platform.

  • Use compute-only vMotion and do not use Storage vMotion.

OpenShift Container Platform generally supports compute-only vMotion. Storage vMotion of the vSphere volumes used by pods is not supported.

 

Other Recommendations

  • Use the same server model for the physical hosts in the workload domain.
  • Follow the guidelines from OpenShift documentation for the detailed deployment and optimization items.
  • Follow the NCP documentation for the NCP installation and configuration.

Conclusion

VMware Cloud Foundation delivers flexible, consistent, secure infrastructure and operations across private and public clouds. It is ideally suited to meet the demands of modern applications running on Red Hat OpenShift Container Platform in a virtualized environment.

With VMware Cloud Foundation, we can easily manage the lifecycle of the hybrid cloud environment and have a unified management plane for all applications, including OpenShift. We can also leverage leading virtualization technologies, including vSphere, NSX, and vSAN.

In this solution paper, we demonstrated the architecture of running Red Hat OpenShift Container Platform with VMware Cloud Foundation. We showed the configuration details, the hardware and software resources used in the solution validation, the various configuration options, and the best practices. VMware Cloud Foundation Manager provided the lifecycle management. vSAN provided reliable, high-performance, and flexible storage to OpenShift. NSX Data Center provided fine-grained, secure, and high-performance virtual networking infrastructure to OpenShift. vSphere DRS and vSphere HA provided efficient resource usage and high availability. Together, these components form a consolidated solution for running Red Hat OpenShift Container Platform with VMware Cloud Foundation.

About the Author

Victor (Shi) Chen, Sr. Technical Marketing Manager of the Workload Technical Marketing team, wrote the original version of this paper.

The following reviewers also contributed to the paper contents: 

  • Chen Wei, Director of the Workload Technical Marketing team in VMware
  • Ka Kit Wong, Staff Technical Marketing Architect of the Workload Technical Marketing team in VMware
  • Catherine Xu, Senior Manager of the Workload Technical Marketing team in VMware
  • Gregory Charot, Senior Principal Product Manager in Red Hat
  • Vivien Wang, Ecosystem Partner Manager in Red Hat
  • Mike Guerette, Principal Product Marketing Manager in Red Hat
  • Ju Lim, Senior Manager, Product Management in Red Hat
  • Hemant Kumar, Principal Software Engineer in Red Hat
  • Ramon Acedo Rodriguez, Senior Principal Product Manager in Red Hat

 
