
Solution

  • Automation
  • Lifecycle Management
  • Modern Applications
  • Networking
  • Storage
  • Upgrade

Type

  • Document

Level

  • Advanced

Category

  • Proof of Concept

Product

  • Cloud Foundation 4

Technology

  • Kubernetes
  • Management Domain
  • vSphere Lifecycle Manager (vLCM)
  • Workload Domain

Phase

  • Design
  • Deploy
  • Manage

VCF 4.0 Proof of Concept Guide

Pre-Requisites and Bring-Up Process

Prerequisites and Preparation

VMware Cloud Foundation (VCF) deployment is orchestrated by the Cloud Builder appliance, which builds and configures VCF components. To deploy VCF, a parameter file (in the form of an Excel workbook or JSON file) is used to set deployment parameters such as host names, IP addresses, and initial passwords. Detailed descriptions of VCF components may be found in the VCF Architecture and Deployment Guide.

The Cloud Builder appliance may be deployed on an existing vSphere cluster, a standalone host, or a laptop (the latter requires VMware Workstation or VMware Fusion). The appliance should have network access to the Management Network segment defined in the parameter file to enable connectivity to the ESXi hosts composing the management workload domain.

There are specific requirements that need to be fulfilled before the automated build process or ‘bring-up’ may begin – for instance, DNS records of the hosts, vCenter, NSX Manager, etc. should have been configured. Before starting, download the parameter spreadsheet to support planning and configuration of deployment prerequisites.

The OVA for the Cloud Builder appliance and the parameter workbook (Cloud Builder Deployment Parameter Guide) for version 4.0 can be found at:

https://my.vmware.com/web/vmware/details?downloadGroup=VCF400&productId=994&rPId=45721 

 

Alternatively, the parameter workbook may also be downloaded from the Cloud Builder appliance after it has been deployed.

Once the workbook has been completed, the file should be uploaded to the appliance, whereupon a script converts the Excel to a JSON file. This JSON file is then validated and used in the bring-up process.
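For scripted or repeatable POC runs, the same conversion-and-validation flow can be driven over the Cloud Builder REST API rather than the web UI. The sketch below is a non-authoritative example: the hostname and spec filename are placeholders, and the endpoint paths are assumptions based on the VCF 4.x Cloud Builder API, so verify them against the API reference for your exact version.

```shell
# Sketch: validate a bring-up JSON spec via the Cloud Builder API.
# Assumptions: cloudbuilder.sddc.lab resolves to the appliance, and
# vcf-ems.json is a spec produced from the parameter workbook.
CB_HOST="cloudbuilder.sddc.lab"

# Submit the spec for validation (curl prompts for the admin password
# set during OVA deployment):
curl -k -u admin -X POST \
  -H "Content-Type: application/json" \
  -d @vcf-ems.json \
  "https://${CB_HOST}/v1/sddcs/validations"

# The response contains a validation ID; poll it until the execution
# status reaches COMPLETED, then review the result for errors/warnings:
curl -k -u admin "https://${CB_HOST}/v1/sddcs/validations/<validation-id>"
```

Once validation succeeds, the same spec can be submitted to start bring-up; the web UI remains the simplest path for a one-off POC.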

The VMware Cloud Foundation YouTube channel is a useful resource to reference alongside this guide.

Deploy Cloud Builder Appliance

Download the Cloud Builder appliance and import the OVA. Once the OVA has been imported, complete the appliance configuration:

 

The ‘Deployment Architecture’ should be set to ‘vcf’ (default). 

Enter credentials for the admin and root accounts; the hostname and IP address of the appliance and gateway, DNS, and NTP details.

Bring-Up Parameters

Parameters required for configuring VCF during the bring-up process are entered into an Excel workbook, which may be downloaded from the Cloud Builder download page or from the appliance itself. Each version of VCF has a specific version of the parameter workbook associated with it.

There are several worksheets within the Excel workbook. Certain fields are subject to validation based on inputs elsewhere in the workbook. Care should be taken not to copy/paste cells, or otherwise alter the structure of the spreadsheet.

'Prerequisite Checklist’: This worksheet lists deployment prerequisites. Mark the ‘Status’ column for each row ‘Verified’ when each prerequisite is satisfied.

‘Management Workloads’: Shows the VMs deployed on the management domain. Only licenses (i.e. column L) should be populated on this worksheet. For the current versions of VCF (4.0.x), leave the SDDC Manager Appliance license empty. Entering information into the 'Management Workload Domain Calculations' section will help you understand expected resource usage after deployment.

‘Users and Groups’: Enter a password for each service account. Ensure that each password entered meets cell validation requirements.

‘Hosts and Networks’: VLANs, IP addresses/gateways, and management workload domain hostnames should be entered in this worksheet. If the ‘Validate ESXi Thumbprints’ option is set to ‘No’, then the respective host SSH fingerprints will be ignored. Any native VLAN should be marked with a zero (0). In many cases, and especially for POC deployments, the vSAN and vMotion networks may be non-routable and not have a gateway. In this case, enter a gateway value within the respective subnet range, but not used by any device (this will produce a warning on bring-up which may be ignored).
Note: The MTU used here is not reflective of a production environment; it was chosen due to internal lab restrictions when creating this document. Supported MTU sizes are 1600 - 9000 for NSX-T based traffic.

‘Deploy Parameters’: This worksheet contains the bulk of the information required. Here, infrastructure dependencies such as DNS and NTP are specified, along with hostnames and IP addresses of management components. There are several sections to this worksheet:

  • Existing infrastructure details, i.e. DNS/NTP, etc.

  • vSphere Infrastructure, vCenter and Host details, etc.

  • NSX-T details

  • For more details about 'Application Virtual Networks' see the next section on 'Network and VLAN configuration'.
  • SDDC Manager details

Specifications related to host network configurations, as well as object names within the vSphere hierarchy are also specified within this worksheet.

To view an interactive demonstration of this process with step-by-step instructions, please visit Deployment Parameters Worksheet in the VCF resource library on StorageHub.

Network and VLAN Configuration

There are four VLANs that must be configured for the management domain:

  • Management (for hosts and management VMs)
  • vMotion
  • vSAN
  • NSX-T Host Overlay/Edge

In addition, for VCF version 4.0.x, two uplink VLANs are required for BGP peering between the NSX-T Edge VMs and the top of rack switch (see below).

For initial host bring-up, it is important to note that the default ‘VM Network’ port group should be on the same VLAN as the Management port group. The Cloud Builder appliance and SDDC Manager should be deployed to the same VLAN.

Jumbo frames are required for NSX / VTEP traffic (MTU of at least 1600) and recommended for the other VLANs (MTU 9000). Configure the network infrastructure to facilitate frames of 9000 bytes.
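Jumbo-frame connectivity can be verified end-to-end from an ESXi shell with vmkping before starting bring-up. The VMkernel interface names and target addresses below are lab-specific examples; substitute your own.

```shell
# Verify jumbo frames end-to-end on the vSAN VMkernel interface.
# 8972 = 9000-byte MTU minus 20 bytes IP header and 8 bytes ICMP header.
# -d sets the don't-fragment bit so oversized frames fail rather than fragment.
vmkping -I vmk2 -d -s 8972 192.168.102.12

# For the NSX-T host overlay (minimum MTU 1600), a 1572-byte payload should pass:
vmkping -I vmk10 -d -s 1572 192.168.104.12
```

If the larger ping fails while a default-size ping succeeds, an intermediate switch port is likely not configured for the required MTU.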

Note: In the above example, the VLAN was marked zero (0) due to internal lab restrictions. As mentioned previously, a native VLAN should be marked with a zero (0).
Also note that the MTU used here is not reflective of a production environment; it was chosen due to internal lab restrictions when creating this document. Supported MTU sizes are 1600 - 9000 for NSX-T based traffic.

Finally, a DHCP server is required on the Host Overlay VLAN to issue addresses on each host. If there is no DHCP server available, there will be warnings during the bring-up process. To bypass this issue for the purpose of a POC, static IP addresses may be assigned directly to the newly created VMkernel ports on each host. The bring-up process may then be resumed/restarted.
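As a sketch of this workaround, the static TEP address can be set from an ESXi shell once the bring-up process has created the VMkernel port. The interface name and addresses below are lab-specific examples.

```shell
# Assign a static IP to the host overlay (TEP) VMkernel port when no DHCP
# server is available on the Host Overlay VLAN. vmk10 and the addresses
# here are examples; check the actual TEP vmk with 'esxcli network ip
# interface list' first.
esxcli network ip interface ipv4 set \
  -i vmk10 -t static -I 192.168.104.11 -N 255.255.255.0

# Confirm the assignment:
esxcli network ip interface ipv4 get -i vmk10
```

Repeat on each host in the management domain, then resume bring-up from the Cloud Builder UI.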

Application Virtual Networks

In order to support Application Virtual Networks (AVNs), BGP peering between the NSX-T Edge Service Gateways and upstream network switches is required for the management domain.

The diagram below shows an overview of the BGP AS setup between the two NSX-T Edges deployed with VCF and the physical top of rack switches:

Inside the rack, the two NSX-T Edges and the Tier-0 gateway form one BGP AS (autonomous system). Upstream, we connect to two separate ToR switches, each in its own BGP AS. The two uplink VLANs connect northbound from each Edge to both ToRs.

The BGP configuration is defined in the parameter spreadsheet, in the 'Deploy Parameters' tab, under the section 'Application Virtual Networks'. We define the ToR details (as per the diagram above), with the respective IP address, BGP AS and password:

To complete the peering, the IP addresses of the two edges, with the ASN should be configured on the ToR (as BGP neighbors).

Note: The BGP password is required and cannot be blank. NSX-T supports a maximum of 20 characters for the BGP password. The Cloud Builder appliance should be able to resolve and connect to the NSX-T Edges in order to validate the BGP setup.

Note that for the purposes of a PoC, a virtual router (such as Quagga or FRRouting) may be used as the BGP peer. In this case, make sure that northbound communication for NTP and DNS is available.
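As an illustration, a minimal Quagga/FRRouting bgpd configuration for such a lab peer might look like the fragment below. All ASNs, addresses, and the password are hypothetical examples; they must match what is entered in the 'Application Virtual Networks' section of the parameter workbook.

```text
! Sketch of a bgpd.conf fragment for a lab ToR substitute.
! 65001 is the ToR-side AS; 65003 is the AS shared by the NSX-T Edges.
router bgp 65001
 bgp router-id 172.27.11.1
 ! One neighbor statement per Edge uplink interface:
 neighbor 172.27.11.2 remote-as 65003
 neighbor 172.27.11.2 password VMware123!
 neighbor 172.27.12.2 remote-as 65003
 neighbor 172.27.12.2 password VMware123!
```

Remember that the real deployment uses two uplink VLANs per Edge; a single virtual router peering on both subnets is usually sufficient for a POC.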

ESXi Installation and Configuration

Hardware components should be checked to ensure they align with the VMware vSphere Compatibility Guide (VCG). Drives and storage controllers must be vSAN certified, and firmware/drivers must be aligned with those specified in the VCG. See section 4.1.1 for a full list of host requirements.

Note that VCF requires identical hardware and software configuration for each ESXi host within a given workload domain, including the Management workload domain.

ESXi should be installed on each host. Hosts must match the ESXi build number specified in the VCF Bill of Materials (BOM) for the version of VCF being deployed; failure to do so may result in failures when upgrading ESXi hosts via SDDC Manager. It is permissible to use a custom image from a hardware vendor as long as the ESXi build number still matches the VCF BOM. The BOM may be located within the Release Notes for each version of VCF.

The release notes for VCF 4.0.x are located at: https://docs.vmware.com/en/VMware-Cloud-Foundation/4.0/rn/VMware-Cloud-Foundation-40-Release-Notes.html 

From here, we can see that the ESXi build number should be 15843807.

Therefore, ensure the correct version and build of ESXi is installed on the hosts:
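The installed version and build can be confirmed from an ESXi shell (or over SSH) with standard commands:

```shell
# Print the ESXi version and build number; the build should match the
# VCF BOM (15843807 for VCF 4.0):
vmware -vl

# Equivalent esxcli form:
esxcli system version get
```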

After ESXi has been installed, login to the host client on each host and ensure that:

  • The login password is the same as in the parameter spreadsheet
  • The correct management IP address and VLAN (as per the parameter spreadsheet) have been configured
  • Only one physical adapter is connected to the Standard Switch
  • No vSAN configuration is present, and all disks (other than the boot disk) have no partitions present
  • NTP is configured with the IP address or hostname of the NTP server
  • Both the SSH and NTP services are started, with their policy changed to ‘Start and stop with host’

Finally, ensure that the hosts are not in maintenance mode.
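Several of the checks above can be performed or corrected directly from an ESXi shell. The commands below are a sketch using standard ESXi CLI tools; verify behavior against the vSphere documentation for your build.

```shell
# Enable and start the SSH service:
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh

# Check maintenance mode state, and exit it if necessary:
esxcli system maintenanceMode get
esxcli system maintenanceMode set --enable false

# Verify no stray vSAN configuration is present; on a clean host this
# reports that the host is not a member of a vSAN cluster (expected here):
esxcli vsan cluster get
```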

DNS Configuration

Every IP address and hostname combination defined in the parameter workbook (i.e. hosts, NSX Manager, vCenter, etc.) must have forward and reverse entries in DNS before bring-up.

Ensure entries are correct and accounted for before starting the bring-up process, and test each DNS entry for forward and reverse lookup.

Post bring-up tasks such as creating new VI Workload domains, new clusters, adding hosts, etc. also require creating forward and reverse DNS lookup entries for associated components.
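A simple shell loop can spot-check forward and reverse lookups before bring-up. The DNS server address and FQDNs below are hypothetical lab examples; substitute the values from your parameter workbook.

```shell
# Forward lookups for each FQDN in the workbook:
DNS_SERVER="10.0.0.250"
for fqdn in esxi-1.vcf.sddc.lab vcenter-mgmt.vcf.sddc.lab \
            nsx-mgmt.vcf.sddc.lab sddc-manager.vcf.sddc.lab; do
  nslookup "${fqdn}" "${DNS_SERVER}"
done

# Reverse lookups for the corresponding IP addresses:
for ip in 10.0.0.100 10.0.0.12 10.0.0.20 10.0.0.4; do
  nslookup "${ip}" "${DNS_SERVER}"
done
```

Any entry that fails either direction should be corrected in DNS before starting bring-up.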

Management Workload Domain Overview

SDDC Manager and other vSphere, vSAN, and NSX components that form the core of VMware Cloud Foundation are initially deployed to an environment known as the Management workload domain. This is a special-purpose grouping of systems devoted to managing the VMware Cloud Foundation infrastructure. 

Each Cloud Foundation deployment begins by establishing the Management workload domain, which initially contains the following components:

  • SDDC Manager
  • vCenter Server with integrated Platform Services Controller
  • vSAN
  • NSX-T

Management Workload Domain Logical View:

In addition to the Cloud Foundation components that are provisioned during the bring-up process, additional virtual machine workloads may be deployed to the Management workload domain if required. These optional workloads may include third party virtual appliances or other virtual machine infrastructure workloads necessary to support a particular Cloud Foundation instance. 

The vCenter with internal Platform Service Controller instance deployed to the Management workload domain is responsible for SSO authentication services for all other workload domains and vSphere clusters that are subsequently deployed after the initial Cloud Foundation bring-up is completed.

Additional details regarding the configuration and usage of Cloud Foundation workload domains may be found in the following section of this guide, Workload Domain Creation.

SDDC Bring-Up

Once each host has been configured, DNS entries confirmed, and networks setup, verify that the parameter workbook is complete, then begin the bring-up process.

Power-on the Cloud Builder appliance. If configured correctly, the appliance will boot to a console displaying the IP address of the appliance:

 

To start the bring-up process, navigate to the Cloud Builder in a web browser and log in with the credentials that were provided during the OVA import.

Select ‘VMware Cloud Foundation’ as the platform.

Next, review the bring-up checklist to ensure all steps have been completed:

On the next page, we are given the option to download the parameter spreadsheet and upload a completed file for validation. If needed, download the Deployment Parameter Spreadsheet.

After the spreadsheet has been completed, upload it to Cloud Builder.

Introduced in VCF 3.9.1, Application Virtual Networks (AVNs) are the network foundation for supporting workload mobility in applications such as VMware vRealize Automation, VMware vRealize Operations Manager, and VMware vRealize Orchestrator. VMware recommends enabling AVNs from the beginning: configuring AVNs later is possible, but it is a manual process.

The configuration for AVNs can be found in the deployment spreadsheet: 

  

Once the parameter spreadsheet has been uploaded, click on ‘Next’ to begin the validation process.

Once the process has completed, review any errors and warnings. Pay close attention to any password, DNS, or network warnings (note that in many cases, especially for POCs, both vSAN and vMotion networks may not be routable – and therefore the gateway for that network may show as unreachable).

Once satisfied that the issues have been addressed, click Next:

Click on 'Deploy SDDC’ to begin the deployment process:

During the bring-up process, periodically monitor the running tasks. Filter for ‘In-progress' to see the current task. The deployment of VCF usually completes in 2-4 hours:

To monitor progress with greater visibility, use tail to display the bring-up logs on the Cloud Builder appliance: open an SSH session to the appliance and log in using the admin account. Run the command below to tail the bring-up logs. Note that there will be a considerable number of messages:

tail -Fqn0 /var/log/vmware/vcf/bringup/* | grep -v "Handling get all"

It may also be useful to log in to the deployed vCenter instance (check the status messages to determine when it is available) to monitor bring-up progress.

To view an interactive demonstration of this process with step-by-step instructions, please visit VCF4 Bringup without AVN or VCF4 Bringup with AVN  in the VCF resource library.

Once all tasks have been finished, the appliance will indicate that the SDDC setup has been successfully completed:

Bring-up is complete, and the Cloud Builder appliance may be powered off.

Post Bring-Up Health Check

SDDC Manager: Dashboard

After the bring-up process has finished, log in to SDDC Manager. Upon login, a dashboard presents an overview of the VCF environment.

All VCF management activities are accomplished through SDDC Manager – no configuration changes should be made to any of the deployed components, including vCenter.

SDDC Manager: User Management

The ‘Users’ panel on the left of the interface shows a list of users inherited from vCenter. To add a user or group, click on '+ User or Group':

Identity sources added to vCenter (Active Directory, LDAP, and OpenLDAP) will appear here. Note that there are two roles defined in SDDC Manager: ADMIN and OPERATOR.

SDDC Manager: Repository Settings

Once SDDC Manager is set up, users are required to enter ‘My VMware’ account details to enable software bundle downloads. This may require configuration of a proxy in some environments.

Navigate to the ‘Repository Settings’ panel on the left of the interface and enter the account details:

 

Once bundles are available to download, the ‘Bundles’ panel will populate:

See the section on ‘LCM Management’ for further information on managing bundles.

SDDC Manager: Backup Configuration

It is recommended that the NSX managers are backed up to an external destination (currently SFTP is supported).

Navigate to ‘Backup Configuration’ on the panel on the left and click on ‘Register External’:

Enter the IP address, port, user credentials, etc. for the external destination:

SDDC Manager: Password Management

Service account passwords for deployed infrastructure components (e.g. ESXi hosts, NSX Manager) may be changed with SDDC Manager. SDDC Manager either updates passwords with a user-specified password (‘Update’ option) or generates new random passwords automatically (‘Rotate’ option).

From the left panel, select Security > Password Management. Then, from the drop-down menu, select the component that will have passwords updated or rotated:

 

To rotate the password with a new, randomly generated password, select the user account(s) that needs to be updated and click ‘Rotate’. This will bring up a window to confirm the change:

To update a particular password with a new user-specified password, select only one user account, and click ‘Update’:

Note that the SDDC Manager password must be manually updated using the passwd command.

Passwords may be viewed by opening an SSH session to SDDC manager and issuing the following command:

/usr/bin/lookup_passwords

Workload Domain Creation

Workload Domain Overview

In VMware Cloud Foundation, a “workload domain” (or WLD) is a policy-based resource container with specific availability and performance attributes that combines compute (vSphere), storage (vSAN, NFS, or VMFS on Fibre Channel), and networking (NSX-T) into a single consumable entity. Each workload domain may be created, expanded, and deleted as part of SDDC lifecycle operations, and may contain one or more clusters of physical hosts.

Every Cloud Foundation deployment begins with provisioning a management workload domain, which hosts SDDC components necessary for Cloud Foundation to function. After the management workload domain is successfully deployed, SDDC Manager may be used to deploy additional Virtual Infrastructure (VI) workload domains to host VM and container workloads. Each VI workload domain is managed by a corresponding vCenter instance, which resides within the VCF management domain; other management-related workloads associated with each workload domain instance may also be deployed within the management domain.

While the management domain always uses vSAN for storage, workload domains may use vSAN, NFS (version 3), or VMFS on FibreChannel (FC). The type of storage used by a workload domain is defined when each workload domain is initially created. After the workload domain has been created with a specific storage type, the storage type cannot be changed later. Additionally, the storage type selected during workload domain creation applies to all clusters that are created within the workload domain.

Each VCF workload domain requires a minimum of three (3) hosts. Exact requirements vary depending on the workload domain type; see the component requirements listed below.

Servers

  • For vSAN-backed VI workload domains, three (3) compatible vSAN ReadyNodes are required. For information about compatible vSAN ReadyNodes, see the VMware Compatibility Guide.
  • For NFS-backed workload domains, three (3) servers compatible with the vSphere version included with the Cloud Foundation BOM are required. For information about the BOM, see the Cloud Foundation Release Notes. For compatible servers, see the VMware Compatibility Guide.
  • For VMFS on FibreChannel-backed workload domains, three (3) servers compatible with the vSphere version included with the Cloud Foundation BOM are required. For information about the BOM, see the Cloud Foundation Release Notes. In addition, the servers must have supported FibreChannel (FC) cards (Host Bus Adapters) and drivers installed and configured. For compatible servers and FibreChannel cards, see the VMware Compatibility Guide

 

Servers within a cluster must be of the same model and type.

CPU, Memory, and Storage

  • For vSAN-backed VI workload domains, supported vSAN configurations are required.
  • For NFS-backed VI workload domains, configurations must be compatible with the vSphere version included with the Cloud Foundation BOM. For more information about the BOM, see the Cloud Foundation Release Notes.
  • For VMFS on FibreChannel-backed workload domains, configurations must be compatible with the vSphere version included with the Cloud Foundation BOM. For information about the BOM, see the Cloud Foundation Release Notes.

NICs

  • Two 10GbE (or faster) NICs. Must be IOVP certified.
  • (Optional) One 1GbE BMC NIC

 

In this proof of concept guide, we will focus on configuration of workload domains with vSAN-backed storage. For configuration of NFS or FC-backed storage, please consult the Cloud Foundation documentation in conjunction with documentation from the NFS or FC storage array vendor.

Create VI Workload Domain

To configure a new VI workload domain, a minimum of three unused vSphere hosts must be available in the Cloud Foundation inventory.

Further, the host management interfaces should be accessible by SDDC Manager, and appropriate upstream network configurations should be made to accommodate vSphere infrastructure traffic (i.e. vMotion, vSAN, NSX-T, management traffic, and any required VM traffic).

If available hosts that meet requirements are not already in the Cloud Foundation inventory, they must be added to the inventory via the Commission Hosts process. Hosts that are to be commissioned should not be associated with a vCenter and should not be a member of any cluster. Additionally, prior to commissioning, each host must meet certain configuration prerequisites:

  • Hosts for vSAN-backed workload domains must be vSAN compliant and certified per the VMware Hardware Compatibility Guide. BIOS, HBA, SSD, HDD, etc. must match the VMware Hardware Compatibility Guide.
  • Host has a standard virtual switch backed by two (2) physical NIC ports with a minimum 10 Gbps speed. NIC numbering should begin with vmnic0 and increase sequentially.
  • Host has the drivers and firmware versions specified in the VMware Compatibility Guide.
  • Host has ESXi installed on it. The host must be preinstalled with supported versions listed in the BOM.
  • SSH and syslog are enabled on the host.
  • Host is configured with a DNS server for forward and reverse lookup and FQDN.
  • The hostname should be the same as the FQDN.
  • The management IP is configured on the first NIC port.
  • Ensure that the host has a standard switch and that the default uplinks with 10Gb speed are configured starting with traditional numbering (e.g., vmnic0) and increasing sequentially.
  • Host hardware health status is healthy without any errors.
  • All disk partitions on HDD / SSD are deleted.
  • Ensure required network pool is created and available before host commissioning.
  • Ensure hosts to be used for a vSAN workload domain are associated with a vSAN-enabled network pool.
  • Ensure hosts to be used for an NFS workload domain are associated with an NFS-enabled network pool.
  • Ensure hosts to be used for a VMFS on FC workload domain are associated with an NFS- or vMotion-only enabled network pool.

Host Commissioning Steps:

  • To commission a host in SDDC manager, navigate to the Inventory > Hosts view, and select ‘Commission Hosts’ at the top right of the user interface.
  • Verify that all host configuration requirements have been met, then click ‘Proceed’.
  • On the next screen, add one or more hosts to be commissioned.  These may be added via the GUI interface, or alternatively may be added through a bulk import process. To add hosts via the GUI, ensure the ‘Add new’ radio button has been selected, and fill in the form. Then, click ‘Add’.
  • Alternatively, to bulk import hosts, click the ‘JSON’ hyperlink to download a JSON template for entering host information.  After entering host details into the .JSON file, save it locally and select the ‘Import’ radio button.  Then, click ‘Browse’ to select the .JSON file, and click ‘Upload’ at the lower right to upload the file to SDDC Manager.
  • When all hosts for commissioning are added, confirm the host fingerprints by selecting all hosts in the ‘Hosts Added’ table by clicking the grey circle with a check-mark located beside each host fingerprint listed in the ‘Confirm Fingerprint’ column. When the circle turns green, click the ‘Validate All’ button located near the upper right corner of the table.
  • After clicking ‘Validate All’, wait for the host validation process to complete.  This may take some time.  When the validation process completes, verify that all hosts have validated successfully, then click ‘Next’ to advance the wizard.
  • On the final screen of the wizard, review the details for each host, then click ‘Commission’ to complete the process.
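For reference, a bulk-import file for host commissioning follows the shape sketched below. The field names are assumptions based on the JSON template downloadable from SDDC Manager; always start from the template for your VCF version. The hostname, password, and network pool name are placeholders.

```json
{
  "hostsSpec": [
    {
      "hostfqdn": "esxi-5.vcf.sddc.lab",
      "username": "root",
      "password": "ExamplePassword1!",
      "networkPoolName": "bringup-networkpool",
      "storageType": "VSAN"
    }
  ]
}
```

Add one object per host to commission; the storage type must match the network pool association described in the prerequisites above.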

Workload Domain Creation Steps:

  • To create a VI workload domain, navigate to the Workload Domains inventory view. Then, at the top right of the screen, click “+Workload Domain”, then select VI – Virtual Infrastructure from the dropdown.
  • Choose the storage type to use for this workload domain: vSAN, NFS, or VMFS on Fibre Channel (this cannot be changed later).
  • Enter configuration details for the workload domain. Note that this information will be used to provision a new instance of vCenter. This instance’s VM resides within the Management workload domain and manages the clusters associated with its respective VI workload domain. Please ensure that valid forward and reverse DNS entries for the vCenter FQDN are configured, then click ‘Next’.
  • On the next screen, choose the version of NSX to deploy, and enter the deployment parameters. Ensure that forward and reverse DNS entries for the NSX Manager FQDNs are configured, and that the correct NSX software bundles have been downloaded on SDDC Manager. Then, click ‘Next’.
  • On the fourth screen in the wizard, configure the vSAN default Failures To Tolerate (FTT). Enabling the Dedupe and Compression feature for all-flash clusters is optional. Then, click ‘Next’.
  • The next step requires selecting available hosts from inventory to add to the workload domain.  If there are no hosts available, please follow the instructions above for commissioning hosts within SDDC Manager. VMware recommends deploying no less than 4 hosts per workload domain in order to ensure that compliance with vSAN FTT=1 policies may be maintained if a vSAN cluster host is offline. However, in cases where hosts available for the POC are limited, it is acceptable to construct a workload domain with the minimum three (3) required hosts, then later add an additional host for the purposes of demonstrating workload domain expansion functionality. For clusters supporting vSAN FTT polices greater than one (1) (i.e. FTT=2 or FTT=3), it is recommended to deploy at least one additional host above the minimum required for policy compliance. See the vSAN Design and Sizing guide for additional details. After selecting the hosts to be added, click ‘Next’.
  • Now, choose licenses available within SDDC Manager for the workload domain. If no applicable licenses are present, please add them to SDDC Manager (Administration > Licensing) then click ‘Next’.
  • Finally, review the deployment summary, then click ‘Finish’ to launch the deployment.

To view an interactive demonstration of this process with step-by-step instructions, please visit Create Workload Domain (NSX-T and vSAN) in the VCF resource library on TechZone.

Review Workload Domain Components

Components deployed during the workload domain creation process may be viewed within SDDC Manager. To view these components, navigate to Inventory > Workload Domains within SDDC Manager, then click the name of the workload domain you would like to inspect. 

To view an interactive walk-through of VCF SDDC components, please visit the Review Workload Domain Components demonstration in the VCF resource center on TechZone.

Expand Workload Domain Cluster

To expand the host resources available within a workload domain, the SDDC Manager interface is used to move one or more unused hosts from the SDDC Manager inventory to a workload domain cluster.

Before attempting to add additional hosts to a workload domain, verify that ‘Unassigned’ hosts are available in the SDDC Manager Inventory. If no hosts are presently ‘Unassigned’, please follow the host commissioning process to make one or more hosts available for use. 

To view an interactive demonstration of this process with step-by-step instructions, please visit the Expand Cluster demonstration in the VCF resource center on StorageHub.

NSX Integration

NSX Overview

NSX provides the core networking infrastructure in the software-defined data center stack within VCF. Every workload domain is integrated with and backed by an NSX-T platform. The Management workload domain is preconfigured with NSX-T. For VI workload domains, NSX-T can be deployed alongside new workload domains, or new workload domains can be added to existing NSX-T deployments. By default, workload domains do not include any NSX-T Edge clusters and as such are isolated.

NSX Configuration: Management Workload Domain

During the initial VCF bring-up process, NSX-T is automatically deployed and configured in the management workload domain. The default deployment comprises an NSX-T 3.0 instance made up of three manager/controller nodes with a VIP for management access. Follow the steps below to review the main components of the NSX-T architecture and how they relate to VCF 4.0.

  1. Log into SDDC Manager
  2. In the left panel, navigate to Inventory > Workload Domains
  3. Select the workload domain that is of type ‘MANAGEMENT’ (‘mgmt-domain’ in this example):

Select the management cluster, (‘mgmt01’ in this example):

We can see that there is an NSX-T instance (nsx-mgmt-vcf.ssdc.lab) deployed with an associated NSX-T Edge Cluster (mgmt-edge-cluster), as we have chosen to deploy VCF with the option of AVN (Application Virtual Networks). 

Note: If AVN was not chosen as part of bring-up, an Edge Cluster would not be available, as per the following screenshot:

Accessing NSX-T interface from SDDC Manager

Click on the hyperlink to launch the NSX-T web interface, and log in with the administrative credentials defined during bring-up (i.e. admin).

Once logged in, the NSX-T dashboard presents four main sections: Networking, Security, Inventory, and System.

In the next section, we will focus on System and Networking and how they relate to VCF.

NSX-T Fabric Overview

Let’s review the main components that make up the fabric.

NSX-T Appliances

The NSX-T Data Center Unified Appliance is included in the installation of NSX-T. It can be deployed in the roles of NSX Manager, Policy Manager, or Cloud Service Manager. VMware has combined both the NSX Manager and NSX Controller into a single virtual appliance, the ‘NSX unified appliance’, which can be run in a clustered configuration.

During initial VCF 4.0 bring-up, NSX-T appliances are deployed on the management cluster and automatically configured as per the bring-up spec. The screenshot below shows the section of the VCF 4.0 parameter workbook relating to NSX-T:

To inspect the NSX-T appliances and cluster status:

  1. Click on System to review the Fabric.
  2. On the left-hand navigation pane, click on Appliances.

We will first inspect the NSX-T appliances. Three appliances are deployed and clustered together. To access the cluster, a virtual IP (10.0.0.20 in our case) is automatically configured as per the VCF 4.0 bring-up spec.

The NSX-T cluster status should be ‘Stable’.
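The same check can be made outside the UI. Below is a minimal sketch that interprets a cluster-status payload such as the one returned by the NSX-T REST API (GET /api/v1/cluster/status); the exact endpoint and response shape should be verified against your NSX-T version, and the sample payload here is hypothetical and trimmed to the fields checked.

```python
# Sketch: evaluate an NSX-T management cluster status payload.
# The response shape is an assumption modeled on the NSX-T 3.0 REST API;
# verify against your NSX-T version before relying on it.

def cluster_is_stable(status_payload: dict) -> bool:
    """Return True when the management cluster reports STABLE."""
    mgmt = status_payload.get("mgmt_cluster_status", {})
    return mgmt.get("status") == "STABLE"

# Hypothetical example payload, trimmed to the field we check:
sample = {"mgmt_cluster_status": {"status": "STABLE"}}
print(cluster_is_stable(sample))  # → True
```

In practice the payload would come from an authenticated GET against the cluster virtual IP rather than a literal.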

Transport Zones

In NSX-T Data Center, a transport zone (TZ) is a logical construct that controls which hosts a logical switch can reach. A transport zone defines a collection of hosts that can communicate with each other across a physical network infrastructure. This communication happens over one or more interfaces defined as Tunnel Endpoints (TEPs).

There are two types of transport zones: Overlay and VLAN. An overlay transport zone is used by ESXi host transport nodes and NSX-T Edge Nodes. When an ESXi host or NSX-T Edge transport node is added to an Overlay transport zone, an N-VDS is installed on the ESXi host or NSX Edge Node.
VCF 4.0 will automatically configure three transport zones (two if no AVNs are specified)

To inspect the transport zones automatically configured by VCF, click on Fabric > Transport Zones


We have three configured transport zones. The VLAN transport zone is used by NSX-T Edge Nodes and ESXi host transport nodes for their VLAN uplinks. When an NSX-T Edge Node is added to a VLAN transport zone, a VLAN N-VDS is installed on the NSX-T Edge Node.

  • Overlay transport zone for host transport nodes and edge nodes
  • VLAN-backed transport zone for host management networks, e.g. vSAN and vMotion
  • VLAN-backed edge transport zone for Edge uplinks
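The layout above can be sketched as data: assuming API-style records with a transport_type field (the display names below are illustrative examples, not necessarily what your deployment uses), a quick tally confirms the one-overlay, two-VLAN arrangement.

```python
# Sketch: tally transport zones by type, mirroring what VCF bring-up creates
# when AVNs are enabled (one overlay TZ plus two VLAN TZs). Record shape
# loosely follows the NSX-T API's "transport_type" field; the VLAN TZ names
# here are illustrative assumptions.
from collections import Counter

transport_zones = [
    {"display_name": "mgmt-domain-tz-overlay01", "transport_type": "OVERLAY"},
    {"display_name": "mgmt-domain-tz-vlan01", "transport_type": "VLAN"},       # host management
    {"display_name": "mgmt-domain-tz-edge-vlan01", "transport_type": "VLAN"},  # edge uplinks
]

counts = Counter(tz["transport_type"] for tz in transport_zones)
print(counts)  # Counter({'VLAN': 2, 'OVERLAY': 1})
```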

To inspect host transport zone overlay

  1. Click on the transport zone name, in our example mgmt-domain-tz-overlay01. The overview shows the number of hosts and edges associated, and the number of switches and switch ports.
  2. Click on Monitor to review the health and status of the transport nodes, in this case hosts and edge appliances.

You may repeat this procedure for the remaining transport zones.

Host Transport Nodes

In NSX-T Data Center, a Transport Node allows nodes to exchange traffic for virtual networks.

The vSphere hosts were defined in the VCF 4.0 Excel workbook and act as transport nodes for NSX-T.

To inspect the host transport nodes from an NSX-T perspective

1. From the System view, click on Fabric > Nodes

2. From Host Transport Nodes, click on the drop-down list next to "Managed by"

3. Select the Compute Manager, in our case vcenter-mgmt.vcf.sddc.lab 

4. Expand the Cluster, in our case mgmt-cluster 

Since this is a management cluster, we should now see a minimum of four vSphere hosts prepared successfully, and the Node Status should be Up.

The hosts were defined in the VCF 4.0 Excel workbook as esxi-1 through esxi-4.

Edge Transport Nodes

The NSX Edge provides routing services and connectivity to networks that are external to the NSX-T Data Center deployment. An NSX Edge is required if you want to deploy a tier-0 router or a tier-1 router with stateful services such as network address translation (NAT), VPN and so on.

An NSX Edge can belong to one overlay transport zone and multiple VLAN transport zones. If a Virtual Machine requires access to the outside world, the NSX Edge must belong to the same transport zone that the VM's logical switch belongs to. Generally, the NSX Edge belongs to at least one VLAN transport zone to provide the uplink access.

We defined the NSX-T Edges in the VCF 4.0 Excel workbook.

To review the edge transport nodes and clusters 

  1. Click on Fabric > Nodes > Edge Transport Nodes

2. Click on one of the edge transport nodes for more details.

We can see this edge is associated with two transport zones: a VLAN transport zone, sfo01-m01-edge-uplink-tz, and a host overlay transport zone, mgmt-domain-tz-overlay01.

3. Click on Monitor to review the system resources and how each interface on the appliance is associated with each uplink. The fp-ethX interfaces can be mapped to the virtual NIC interfaces on the edge appliance.

 

Compute Manager

A compute manager, such as vCenter Server, manages resources such as hosts and VMs.
NSX-T Data Center is decoupled from vCenter. When the VCF bring-up process adds a vCenter Server compute manager to NSX-T, it uses the vCenter Server credentials defined in the VCF 4.0 bring-up specification.
Once registered, NSX-T polls compute managers for changes, such as the addition or removal of hosts or VMs, and updates its inventory accordingly.

To inspect the configuration

  1. Click Fabric > Compute Managers
  2. Click on the registered Compute Manager to gather more details; in this case it is the management vCenter Server.

NSX-T Logical Networking Overview

In this section we will review the logical networking concepts and how they relate to the VCF 4.0 Management Domain bring-up.

A few terms will help with this overview; for more information, please review the NSX-T 3.0 Installation Guide: https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.0/installation/GUID-A1BBC650-CCE3-4AC3-A774-92B195183492.html

Tier-0 Gateway or Tier-0 Logical Router
The Tier-0 Gateway in the Networking tab interfaces with the physical network. The Tier-0 gateway runs BGP and peers with physical routers.
Tier-1 Gateway or Tier-1 Logical Router
The Tier-1 Gateway in the Networking tab connects to one Tier-0 gateway for northbound connectivity and one or more overlay networks for southbound connectivity.

Segment
This is also known as a logical switch. A segment provides virtual Layer 2 switching for Virtual Machine interfaces and Gateway interfaces. A segment is a logical entity independent of the physical hypervisor infrastructure and spans many hypervisors, connecting VMs regardless of their physical location. 

Virtual machines attached to the same segments can communicate with each other across transport nodes through encapsulation over tunneling.
When segments are created, they appear as a port group on vSphere.

Tier-0 and Tier-1 Gateways

To review the Tier-0 Gateway deployment

1. From the main NSX-T Dashboard, click on Networking > Tier-0 Gateways. We can see that the Tier-0 gateway, mgmt-edge-cluster-t0-gw01, has been deployed as part of VCF 4.0 bring-up.

2. Click on Linked Tier-1 Gateways. We now see that the Tier-1 Gateway, mgmt-edge-cluster-t1-gw01, is associated with the Tier-0 gateway.

3. Click on Networking > Tier-1 Gateways

4. Click on Linked Segments. We see the Tier-1 Gateway is associated with two segments. These segments were defined in the VCF 4.0 bring-up spec.

Tier-0 BGP Routing

To enable access between your VMs and the outside world, you can configure an external or internal BGP (eBGP or iBGP) connection between a tier-0 gateway and a router in your physical infrastructure.

 

This is a general review of BGP routing, how it was defined during VCF 4.0 bring-up, and what it looks like in NSX-T Manager.

Here are the deployment parameters for the AVNs in the VCF 4.0 workbook. This part of the bring-up defined how the Edges for the management cluster would be deployed and what the configuration of the Tier-0 (with BGP) and Tier-1 gateways should look like.

Note the following details:

  • Edge Autonomous System ID 65003
  • Top of Rack Autonomous System ID 65001
  • Top of Rack IPs 192.168.16.10 and 192.168.17.10 

When configuring BGP, you must configure a local Autonomous System (AS) number for the Tier-0 gateway. The VCF spec sets this value to 65003. Both edges must use the same AS number.
You must also configure the remote AS number for the Top of Rack switches. As per the VCF bring-up spec, the physical Top of Rack switch AS number is 65001.

EBGP neighbors must be directly connected and in the same subnet as the tier-0 uplink.
We can see from the VCF screenshot above that both edge node 1 and edge node 2 have uplinks defined on 192.168.16.0/24 and 192.168.17.0/24.
We also see that both top of rack switches have IP addresses in the same subnets, i.e. 192.168.16.10 and 192.168.17.10.
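The directly-connected rule above is easy to check programmatically with only the standard library; the addresses below are the sample values used in this guide.

```python
# Sketch: verify the eBGP adjacency rule -- each edge uplink IP must share
# a subnet with its ToR peer. Addresses are the sample values from this guide.
import ipaddress

uplinks = {  # edge uplink -> (interface CIDR, ToR peer IP)
    "edge1-uplink1": ("192.168.16.2/24", "192.168.16.10"),
    "edge1-uplink2": ("192.168.17.2/24", "192.168.17.10"),
    "edge2-uplink1": ("192.168.16.3/24", "192.168.16.10"),
    "edge2-uplink2": ("192.168.17.3/24", "192.168.17.10"),
}

for name, (cidr, peer) in uplinks.items():
    subnet = ipaddress.ip_interface(cidr).network
    assert ipaddress.ip_address(peer) in subnet, f"{name}: peer not directly connected"
print("all eBGP peers are directly connected")
```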

  1. From the main NSX-T Dashboard, click on Networking > Tier-0 Gateways. 
  2. Expand the Tier-0 Gateway for more details.
  3. Expand BGP Details.
  4. We can see the local AS number is 65003, which matches the Excel workbook entry.

5. Next we will look at the BGP neighbors. Click on the BGP Neighbors count, in this case 2, to see the details.

Now we see that there are two neighbors configured, 192.168.16.10 and 192.168.17.10, with AS number 65001; this matches the Top of Rack switch details defined in the VCF workbook.

6. For a graphical representation of the Tier-0 BGP configuration, close the BGP Neighbors detail and click on the topology view from step 5, as highlighted below in red.
We can see the IP addresses 192.168.16.2, 192.168.16.3, 192.168.17.2 and 192.168.17.3 are configured and peered to the Top of Rack switches (192.168.16.10 and 192.168.17.10).

Segments

A segment is also known as a logical switch. As per the VCF 4.0 bring-up spec, we have defined two segments as part of the AVN setup for virtual machine traffic.

These are the Region-A logical segment, local-segment, and the xRegion logical segment, xregion-segment.

To view the AVN Segments, click on Networking > Segments 

Take note of the two segments highlighted below; these are backed by the management domain overlay transport zone.

These segments are presented as port groups in vSphere.

To view them in vSphere, log in to the management vCenter Server, navigate from Home > Networking > Management Networks, expand the management distributed switch, and locate the segments.

Edge Segments

The remaining two segments are for VLAN-backed uplink connectivity for the NSX-T Edges.

These VLANs were defined during bring-up in the VCF 4.0 Excel workbook; see NSX-T Edge Uplink-1 and Edge Uplink-2.

This is a detailed view of one of the NSX-T Edge uplink segments (Edge Uplink 1):

NSX-T Edge Overlay

An NSX-T Edge overlay is also defined in the VCF 4.0 bring-up Excel workbook.

Separate VLANs and subnets are required for the NSX-T Host Overlay (Host TEP) VLAN and the NSX-T Edge Overlay (Edge TEP) VLAN, to isolate the traffic for each onto a separate VLAN.
In this way we use a separate Host TEP VLAN for each cluster - so if you had three clusters you could have three separate Host TEP VLANs and one Edge TEP VLAN.
By separating the traffic onto different VLANs and subnets we remove a potential single point of failure; e.g., a broadcast storm in the Host TEP VLAN for one cluster would not impact the other clusters or the Edge cluster.

The NSX-T Host Overlay (Host TEP) VLAN and NSX-T Edge Overlay (Edge TEP) VLAN must be routed to each other. In our case, the NSX-T Host Overlay VLAN 0 is routed to the NSX-T Edge Overlay VLAN 1252.
You cannot use DHCP for the NSX-T Edge Overlay (Edge TEP) VLAN.
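The TEP rules above can be captured as a quick pre-flight check. The host TEP subnet below is illustrative (this guide only gives the VLAN IDs); the Edge TEP subnet matches the example values used later in this guide.

```python
# Sketch: encode the Host/Edge TEP rules as quick checks -- separate VLANs,
# non-overlapping subnets, and static (non-DHCP) addressing for the Edge TEP.
# The host TEP subnet is an illustrative assumption; VLANs are the guide's values.
import ipaddress

host_tep = {"vlan": 0, "subnet": "192.168.51.0/24", "dhcp": True}     # subnet illustrative
edge_tep = {"vlan": 1252, "subnet": "192.168.52.0/24", "dhcp": False}

assert host_tep["vlan"] != edge_tep["vlan"], "Host and Edge TEPs need separate VLANs"
assert not ipaddress.ip_network(host_tep["subnet"]).overlaps(
    ipaddress.ip_network(edge_tep["subnet"])), "TEP subnets must not overlap"
assert not edge_tep["dhcp"], "Edge TEP VLAN cannot use DHCP"
print("TEP plan OK")
```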

Note: The NSX Manager interface provides two modes for configuring resources: Policy and Manager View. For more information read the NSX-T 3.0 Overview guide  https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.0/administration/GUID-BB26CDC8-2A90-4C7E-9331-643D13FEEC4A.html#GUID-BB26CDC8-2A90-4C7E-9331-643D13FEEC4A 

1. To review the NSX-T overlay configuration you may have to switch to Manager view. Click on Manager at the top right of the NSX-T main menu if not already in Manager mode.

2. Now click on Logical Switches on the Networking dashboard.

3. Click on the Edge overlay name as defined in the VCF 4.0 Excel workbook, in this case sddc-edge-overlay.
The summary shows this logical switch is associated with the overlay transport zone mgmt-domain-tz-overlay01

4. Click on Transport Zone to view the transport zone and "Where Used"

5. To review where VLAN 1252 is defined, click on System > Fabric > Nodes > Edge Transport Nodes.

6. Select an edge node and select Edit. The edge node is associated with two transport zones and a profile that defines VLAN 1252.

7. Click on System > Fabric > Profiles

An uplink profile defines policies for the links from hosts to NSX-T Data Center logical switches or from NSX Edge nodes to top-of-rack switches.

The settings defined by uplink profiles include teaming policies, active/standby links, the transport VLAN ID, and the MTU setting.

In our case, uplink-profile-1252 has the teaming and VLAN settings defined on the uplink profile associated with the Edge transport nodes.
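As a rough illustration, an uplink profile can be thought of as the following mapping. The field names only approximate the NSX-T API, and the teaming policy and MTU values are assumptions for illustration, not values taken from this deployment.

```python
# Sketch: the pieces an uplink profile carries, expressed as a plain mapping.
# Field names approximate the NSX-T API; the policy and MTU are illustrative
# assumptions, not this deployment's actual values.
uplink_profile = {
    "display_name": "uplink-profile-1252",
    "transport_vlan": 1252,                     # transport VLAN ID
    "mtu": 9000,                                # assumed MTU
    "teaming": {
        "policy": "LOADBALANCE_SRCID",          # assumed teaming policy
        "active_list": ["uplink-1", "uplink-2"],  # active links
    },
}

assert uplink_profile["transport_vlan"] == 1252
print(uplink_profile["display_name"], "->", uplink_profile["teaming"]["policy"])
```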

NSX Configuration: VI Workload Domain(s)

When creating a VI workload domain, NSX-T is deployed to support its networking stack. There are prerequisites for deploying NSX-T; please refer to the VCF Product Documentation for details.

A cluster of three NSX-T Manager nodes is deployed by default when an NSX-T based workload domain is created.

On the workload domain page, select the Summary view:

If an NSX-T Edge cluster is also created, it will be visible and associated with the workload domain's NSX-T instance.

Click the FQDN of the NSX-T cluster. This will open a new browser tab and automatically log in to one of the NSX-T Manager instances:

 

Confirm that the NSX-T management cluster is in ‘STABLE’ state. Also verify that the Cluster Connectivity for each node is ‘Up’:

To review the Transport Zones configured, Select System > Fabric > Transport zones

There are two overlay transport zones and one VLAN transport zone.

Hosts associated with this workload domain are connected to the default NSX-T overlay for the workload domain; in this case there are four hosts.

Select System > Fabric > Nodes and select the Host Transport Nodes tab. In the “Managed by” drop-down list, select the vCenter instance associated with the workload domain to show all transport nodes in the cluster. Ensure ‘Configuration’ is set to ‘Success’ and “Node Status” is ‘Up’ for each node:

vCenter

All NSX Managers for new workload domains are deployed in the Management workload domain resource pool.

From the management vCenter, go to Hosts and Clusters and expand the resource pool mgmt-rp.

vSphere Networking

With previous versions of NSX-T, installing NSX-T required setting up an N-VDS and migrating from the vDS. Now it is possible to use a single vSphere Distributed Switch for both NSX-T 3.0 and vSphere 7 networking.

When installing NSX-T 3.0, it can run directly on top of the existing vDS without needing to move physical NICs to an N-VDS.
Note: When NSX-T is associated with the vSphere VDS, the VDS summary page is updated to show that it is managed by the NSX-T instance.

NSX-T Edge Cluster Deployment

Intro

You can add multiple NSX-T Edge clusters to workload domains for scalability and resiliency. However, multiple Edge clusters cannot reside on the same vSphere cluster.
NSX-T Data Center supports a maximum of 16 Edge clusters per NSX Manager cluster and 8 Edge clusters per vSphere cluster.
The north-south routing and network services provided by an NSX-T Edge cluster created for a workload domain are shared with all other workload domains that use the same NSX Manager cluster.
For more information please review VCF documentation https://docs.vmware.com/en/VMware-Cloud-Foundation/4.0/com.vmware.vcf.admin.doc_40/GUID-AFD3A096-8CC2-4271-98A3-57454313FC2F.html

At this point in the POC guide, an additional workload domain has already been deployed.
The purpose of this section is to walk through the configuration to understand the network requirements and, finally, to check and validate that the edge(s) were deployed successfully.

Prerequisites

As per the documentation, https://docs.vmware.com/en/VMware-Cloud-Foundation/4.0/com.vmware.vcf.admin.doc_40/GUID-D17D0274-7764-43BD-8252-D9333CA7415A.html 

  • Separate VLANs and subnets are available for NSX-T Host Overlay (Host TEP) VLAN and NSX-T Edge Overlay (Edge TEP) VLAN. A DHCP server must be configured on the NSX-T Host Overlay (Host TEP) VLAN. 
  • You cannot use DHCP for the NSX-T Edge Overlay (Edge TEP) VLAN.
  • NSX-T Host Overlay (Host TEP) VLAN and NSX-T Edge Overlay (Edge TEP) VLAN are routed to each other.
  • For dynamic routing, set up two Border Gateway Protocol (BGP) Peers on Top of Rack (ToR) switches with an interface IP, BGP autonomous system number (ASN), and BGP password.
  • Reserve a BGP ASN to use for the NSX-T Edge cluster’s Tier-0 gateway.
  • DNS entries for the NSX-T Edge nodes are populated in the customer-managed DNS server.
  • The vSphere cluster hosting an NSX-T Edge cluster must include hosts with identical management, uplink, host TEP, and Edge TEP networks (L2 uniform).
  • You cannot deploy an Edge cluster on a vSphere cluster that is stretched. You can stretch an L2 uniform vSphere cluster that hosts an Edge cluster.
  • The management network and management network gateway for the Edge nodes must be reachable.
  • Workload Management supports one Tier-0 gateway per transport zone. When creating an Edge cluster for Workload Management, ensure that its overlay transport zone does not have other Edge clusters (with Tier-0 gateways) connected to it.

Deployment

As a proof of concept we will deploy a new Edge Cluster to an already deployed workload domain called wld01. 

  • We will deploy a small Edge Node Form Factor: 4 GB memory, 2 vCPU, 200 GB disk space. The NSX Edge Small VM appliance size is suitable for lab and proof-of-concept deployments.
    Note: You cannot change the size of the edge form factor after deployment. Only use SMALL for labs or proof-of-concept deployments.
  • We will deploy two Edges in Active-Active high availability mode. (In active-active mode, traffic is load balanced across all members; if an active member fails, another member is elected to be active.)

We have gathered the following details prior to deployment

  • ASN number: 65004 for Tier-0 BGP
    • ToR switch IP addresses and subnets
  • NSX-T Edge Overlay VLAN 1252 (routable to the host overlay)
    • Static IP addresses for Edge Overlay VLAN 1252
  • Edge uplink VLANs 2081 and 2082 (for connectivity to the Top of Rack switches)
    • Static IP addresses for VLANs 2081 and 2082

Procedure

  1.  From SDDC Manager, click on Workload Domains, select a workload domain, and click on Actions.
    Choose Add Edge Cluster.
  2.  Click Begin to walk through the wizard

The following walk-through demo can be reviewed here to understand the process.
Please navigate to Add NSX-T Edge Cluster.

This is an example table to complement the walk-through demo, which may help work out the required IP addresses and VLANs used through the wizard. Having this detail upfront will help with the deployment and avoid confusion.

Edge Cluster Configuration

  • Edge Cluster Name: wld01-edge01
  • MTU: 9000
  • ASN: 65004
  • Tier-0 Name: edge01-t0
  • Tier-1 Name: edge01-t1
  • Edge Cluster Profile Type: Default
  • Edge Cluster Profile Name: Custom
  • Use Case: Custom
  • Edge Form Factor: Small (Note: only use for proof of concepts)
  • Tier-0 Service High Availability: Active-Active
  • Tier-0 Routing Type: EBGP

Node Configuration (Edge-Node-1 / Edge-Node-2 sample values)

  • Edge Node Name (FQDN): edge01-wld01.vcf.sddc.lab / edge02-wld01.vcf.sddc.lab
  • Management IP (CIDR): 10.0.0.53/24 / 10.0.0.54/24
  • Management Gateway: 10.0.0.250 / 10.0.0.250
  • Edge TEP 1 IP (CIDR): 192.168.52.14/24 / 192.168.52.16/24
  • Edge TEP 2 IP (CIDR): 192.168.52.15/24 / 192.168.52.17/24
  • Edge TEP Gateway: 192.168.52.1 / 192.168.52.1
  • Edge TEP VLAN (must route to host overlay): 1252 / 1252
  • Cluster (workload domain cluster): wld01-cluster01 / wld01-cluster01
  • Cluster Type: L2 Uniform / L2 Uniform
  • First Uplink VLAN: 2081 / 2081
  • First Uplink Interface IP (CIDR): 192.168.16.4/24 / 192.168.16.5/24
  • First Uplink Peer IP (CIDR): 192.168.16.10/24 / 192.168.16.10/24
  • First Uplink ASN Peer: 65001 / 65001
  • First Uplink BGP Peer Password: ******* / *******
  • Second Uplink VLAN: 2082 / 2082
  • Second Uplink Interface IP (CIDR): 192.168.17.4/24 / 192.168.17.5/24
  • Second Uplink Peer IP (CIDR): 192.168.17.10/24 / 192.168.17.10/24
  • Second Uplink ASN Peer: 65001 / 65001
  • Second Uplink BGP Peer Password: ******* / *******
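For readers who later want to drive this through the VCF public API rather than the wizard, the table values can be collected into a single structure first. The field names below only loosely resemble the SDDC Manager edge cluster creation spec and are assumptions; check the VCF 4.0 API reference before using them for anything real.

```python
# Sketch: the wizard inputs from the table above gathered into one structure.
# Field names are assumptions loosely modeled on the SDDC Manager public API;
# validate against the VCF 4.0 API reference before scripting.
edge_cluster_spec = {
    "edgeClusterName": "wld01-edge01",
    "mtu": 9000,
    "asn": 65004,
    "tier0Name": "edge01-t0",
    "tier1Name": "edge01-t1",
    "edgeFormFactor": "SMALL",                       # lab/PoC only
    "tier0ServicesHighAvailability": "ACTIVE_ACTIVE",
    "tier0RoutingType": "EBGP",
    "edgeNodeSpecs": [
        {"edgeNodeName": "edge01-wld01.vcf.sddc.lab", "managementIP": "10.0.0.53/24"},
        {"edgeNodeName": "edge02-wld01.vcf.sddc.lab", "managementIP": "10.0.0.54/24"},
    ],
}

assert len(edge_cluster_spec["edgeNodeSpecs"]) == 2  # two edges, active-active
print("spec for", edge_cluster_spec["edgeClusterName"], "ready for review")
```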

Validation

From the SDDC Manager UI, the new edge cluster is listed on the workload domain summary.
Validation is also explored in the walk-through demo reviewed here; please navigate to the demo Add NSX-T Edge Cluster.
Once the demo is started, navigate to step 47 of the demo.

From the SDDC Manager shortcut, launch the NSX-T web interface.

Click System > Fabric > Nodes > Edge Transport Nodes to see the edge node details. We can see edge01-wld01.vcf.sddc.lab and edge02-wld01.vcf.sddc.lab deployed.

From the top menu click Networking to view the Tier-0 and Tier-1 dashboards

From the dashboard we can see the Tier-0 gateways, which are responsible for north-south routing. We can see BGP is enabled and a peer is configured. The Tier-1 gateway is used for east-west traffic.

To view the topology layout between Tier-1, Tier-0 and the outside physical infrastructure, select Network Topology.

We can see 192.168.17.4/24, 192.168.16.4/24, 192.168.17.5/24 and 192.168.16.5/24 represent the IP addresses on the edges that are peered to the top of rack AS 65001.

To verify BGP connectivity and peering status to the top of rack switches:

1. Navigate back to Network Overview, select Tier-0 Gateways, then select the Tier-0 gateway, edge01-t0, to expand the details.

2. Click on BGP to expand

3. Click on the BGP Neighbors count for neighbor details (we have two neighbors configured).

We can see the status of both Top of Rack BGP Peers. Status of Success indicates peering has been successfully established.
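A sketch of how that peering check can be expressed in code; the record shape is a simplification of what NSX-T reports (neighbor address, remote AS, session state), not a verbatim API response.

```python
# Sketch: interpret BGP neighbor status records like those shown in the UI.
# Record shape is a simplified assumption, not a literal NSX-T API payload;
# addresses and AS numbers are this guide's sample values.
neighbors = [
    {"neighbor": "192.168.16.10", "remote_as": 65001, "state": "ESTABLISHED"},
    {"neighbor": "192.168.17.10", "remote_as": 65001, "state": "ESTABLISHED"},
]

peered = all(n["state"] == "ESTABLISHED" for n in neighbors)
print("both ToR peers established:", peered)  # → both ToR peers established: True
```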

vSphere 

The NSX-T Edge cluster will be deployed on the associated workload domain. An Edge cluster resource pool is created and the edges are deployed onto the workload domain cluster, in this case wld01.
Note: the vCenter and NSX-T unified appliances are deployed on the Management Domain vSphere cluster.
To view the edges from a vSphere perspective, log in to the vSphere Client, navigate from Hosts and Clusters to the vCenter instance associated with the workload domain, and expand the cluster and resource pool to inspect the NSX-T Edges.

vSphere Networking and vDS details

Two additional vDS port groups are created on the workload domain vDS.

From the vSphere Client, navigate to vSphere Networking, the workload domain vCenter, and the associated VDS to inspect the edge port groups.

Edge vNICs

Each Edge will have a similar VM networking configuration

  • Network adapter 1 is for the mgmt network connectivity (MGMT VLAN 0)
  • Network Adapter 2 is associated with the Edge Uplink (VLAN 2081)
  • Network Adapter 3 is associated with the Edge Uplink (VLAN 2082)

This configuration can be explored on the summary of the Edge Virtual Machine appliance.

Reusing an existing NSX-T manager for a new workload domain

If you already have an NSX Manager cluster for a different VI workload domain, you can reuse that NSX Manager cluster.

In order to share an NSX Manager cluster, the workload domains must use the same update manager mechanism. The workload domains must both use vSphere Lifecycle Manager (vLCM) or they must both use vSphere Update Manager (VUM).

Note: Do not share an NSX Manager cluster between workload domains catering to different use cases that would require different NSX-T Edge cluster specifications and configurations.

Please review the click through demo that complements this guide Add Workload Domain with NSX Manager Reuse

The demo first reviews an existing workload domain and then walks through deploying a new workload domain

Here is an overview that complements the demo

To quickly go through this scenario we will go through the main parts of the demo

From SDDC Manager, start the deployment wizard for a new VI (Virtual Infrastructure) workload domain.
Once the new workload domain wizard is launched, add the entries for the workload domain name and the new vCenter instance.

Reuse existing NSX-T Manager Instance

Instead of deploying a brand new NSX-T instance, we will reuse the NSX-T instance associated with an existing workload domain, in our case wld01.

The VLAN ID for the preexisting NSX-T host overlay needs to be validated

All NSX-T entries are greyed out as we are using the NSX-T instance associated with wld01, which SDDC Manager is already aware of.

The following resources will be prefixed with the workload domain name, wld01:

  • vSphere Distributed Switch
  • Resource Pool
  • Distributed port-group vSAN
  • Distributed port-group vMotion
  • vSAN Datastore

Once the Workload domain has been deployed it will simply appear as a new workload domain on SDDC Manager but associated with the NSX-T instance belonging to wld01

From a vSphere perspective, a new vCenter Server is deployed, a new datacenter and cluster object are created, and hosts are added and configured.

We can also observe the vCenter server appliance vcenter-wld02.vcf.sddc.lab is hosted on the management workload domain with no further additional NSX-T instances.

vSphere Networking comprises a vDS and three port groups for mgmt, vSAN and vMotion.

NSX-T

The vCenter Server is registered as an additional compute manager to the existing NSX-T instance (as specified on the new workload domain wizard)

The vSphere hosts are configured as Host Transport nodes associated with that vCenter

However, they are added to the same transport zone as the transport nodes in the first workload domain, wld01, i.e. overlay-tx-nsx-wld01.vcf.sddc.lab.

Deploying vSphere 7.0 with Tanzu on VCF

vSphere with Tanzu Benefits

vSphere with Tanzu provides the capability to create upstream compliant Kubernetes clusters within dedicated resource pools by leveraging Tanzu Kubernetes Clusters. Another advantage of vSphere with Tanzu is the ability to run Kubernetes workloads directly on ESXi hosts (vSphere Pods).

vSphere with Tanzu brings Kubernetes awareness to vSphere and bridges the gap between IT Operations and Developers. This awareness fosters collaboration between vSphere Administrators and DevOps teams as both roles are working with the same objects. 

IT Operators continue to provision, view and monitor their virtual infrastructure as they have always done, but now with the Kubernetes awareness and insight that has eluded them in the past.

Developers can now deploy K8s and container-based workloads directly on vSphere using the same methods and tools they have always used in the public cloud. VMware vSphere with Tanzu provides flexibility as developers can choose to run pods native to ESXi (native pods) or inside purpose-build Kubernetes clusters hosted on top of namespaces configured on the vSphere clusters (Tanzu Kubernetes Clusters).

Both teams benefit by being able to use their existing tools; nobody has to change the way they work, learn new tools, or make concessions. At the same time, both teams have a consistent view and are able to manage the same objects.

Benefits of Cloud Foundation

Running vSphere with Tanzu on VMware Cloud Foundation (VCF) provides a best-in-class modern hybrid cloud platform for hosting both traditional and modern application workloads. VMware Cloud Foundation is a proven, prescriptive approach for implementing a modern VMware based private cloud. One of the key benefits of VCF is the advanced automation capabilities to deploy, configure, and manage the full VMware SDDC software stack including products such as vSphere with Tanzu, vSAN, and NSX among others. 

Enabling vSphere with Tanzu

In order to enable vSphere with Tanzu it is necessary to complete a set of tasks. vSphere with Tanzu will be deployed in a Virtual Infrastructure Workload Domain; however, there is also an option to deploy vSphere with Tanzu on a Consolidated VCF deployment (Management Domain). For more information about vSphere with Tanzu supportability on the VCF Management Domain please refer to this Blog Post and this White Paper. An NSX-T Edge cluster will be required, as well as tasks including enabling Workload Management, creating a content library, creating a namespace, deploying Harbor, obtaining the CLI tools, creating guest clusters and deploying containers.

vSphere with Tanzu Workflow

This is a workflow overview of the procedure from a two persona perspective (IT Operator and Developer).

vSphere with Tanzu Requirements

The requirements are as below; a VI workload domain needs to be created with at least three hosts, backed by an NSX-T edge cluster.

vSphere with Tanzu on Consolidated Architecture Requirements

This is a special case whereby a K8s cluster can be stood up with just four hosts in total. In order to achieve this, an NSX Edge cluster must be created for the Management domain. Application Virtual Networks (AVNs) are now supported on the management domain together with K8s. The requirements are:

  • Cloud Foundation 4.0 deployed with one vSphere cluster on the management domain
  • NSX-T configured (edge cluster (large form factor) created, hosts added, etc.)
  • Enough capacity on the vSAN datastore for all components

NOTE: vSphere with Tanzu on consolidated architecture requires some important steps to be followed. Please refer to this document for step-by-step instructions: https://blogs.vmware.com/cloud-foundation/files/2020/05/VMW-WP-vSphr-KUBERNETES-USLET-101-WEB.pdf

See this blog post for more information: https://cormachogan.com/2020/05/26/vsphere-with-kubernetes-on-vcf-4-0-consolidated-architecture/

Creating VI Workload Domain

Creating a VI Workload Domain (VI WLD) falls under the IT Operator persona. The IT Operator will create a new VI WLD from SDDC Manager by following the steps from that particular POC section. However, there are a few aspects that should be taken into consideration when creating a VI WLD for the vSphere with Tanzu use case.

Note that the VI WLD for Kubernetes should be created using VUM (as opposed to vLCM):

Requirements:

  • Minimum of 3 hosts; 4 or more hosts recommended
  • Licensed for vSphere with Tanzu
  • New NSX-T Fabric
  • VI WLD with VUM enabled (no vLCM)
  • IP subnets for pod networking, service cluster, ingress and egress defined
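The last requirement (subnets for pod networking, the service cluster, ingress and egress) can be sanity-checked for overlaps before starting the wizard. The CIDRs below are placeholders, not values from this guide.

```python
# Sketch: quick overlap check for the four subnets vSphere with Tanzu needs.
# The CIDR values are placeholder assumptions for illustration.
import ipaddress
from itertools import combinations

subnets = {
    "pods": "10.244.0.0/20",      # pod networking
    "services": "10.96.0.0/23",   # service cluster IPs
    "ingress": "192.168.60.0/24",
    "egress": "192.168.61.0/24",
}

for (a, na), (b, nb) in combinations(subnets.items(), 2):
    assert not ipaddress.ip_network(na).overlaps(
        ipaddress.ip_network(nb)), f"{a} and {b} overlap"
print("no overlapping subnets")
```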

Click HERE for a step by step demonstration.

Deploying Edge Cluster

Deploying an NSX Edge cluster falls under the IT Operator persona. The IT Operator will deploy a new NSX Edge cluster from SDDC Manager by following the steps below. After creation, the NSX Manager UI can be used to manage the cluster.

Requirements:

  • One edge cluster per domain
  • Edge cluster type = “Workload Management”
  • Two edge nodes 
  • Large form factor
  • Configured as active/active 
  • Configures Tier-0 and Tier-1 logical routers

From SDDC Manager, navigate to the VI Workload Domain and click on the three vertical dots that appear when hovering over the domain name. Choose "Add Edge Cluster":

Verify all the Prerequisites have been met and click Begin:

Enter all the necessary information for the Edge Cluster. 

Important: make sure that there are no other Tier-0 edge clusters connected to the overlay transport zone of the vSphere cluster.

Ensure 'Workload Management' is set for the use case.

Add the details for the first node by filling out the information needed and clicking on 'add edge node':

After adding the first node, fill out the information for the second node and click "add edge node". Click 'next' to continue:

Double-check the values entered in the summary section and click 'next':

Click Next in the validation section, and then Finish after all statuses show as Succeeded.

Monitor the creation of the Edge Cluster in the Task pane of SDDC Manager.

Once completed, open the NSX Manager UI to verify the status of the Edge cluster.

Result:

Click HERE for a step by step demonstration.

Enabling vSphere with Tanzu

The IT Operator can enable vSphere with Tanzu from SDDC Manager by following the steps below.

Overview:

  • Deploys Workload Management from SDDC Manager
  • Domain and Edge Cluster Validation
  • Hand-off to vSphere Client
  • Installs Kubernetes VIBs on hosts
  • Deploys ’Supervisor’ Pods
  • Instantiates Pod Service

In SDDC Manager click on the "Solutions" section, click on "Deploy":

Verify that all the prerequisites have been met, and click "Begin":

Select the VI Workload Domain and the cluster within the VI WLD to be used for vSphere with Tanzu, then click Next. 

After Validation is Successful, click Next. 

Review the input information, then click "Complete in vSphere" to go to vCenter to add the remaining information. This button will take you directly to the appropriate location in the correct vCenter server.  

In vCenter UI, select the cluster and click Next.

Select the size for the Control Plane VMs. Click Next.

Enter the information for K8s ingress and egress, and the management network for the control plane, corresponding to the diagram below

Select the storage where the Control Plane VMs will reside. If using vSAN, you can select the storage policy.

Review all the information for accuracy and click Finish. 

Monitor for success in the task pane.

Once completed, the Supervisor Control Plane VMs will be visible under the Namespaces Resource Pool

Result:

Click HERE for a step by step demonstration.

Creating Content Library

IT Operator Persona

Before creating namespaces, the IT Operator needs to configure a content library. A subscribed or local content library needs to be created on each Supervisor Cluster. For Tanzu Kubernetes, create a content library with the subscription pointing to:

https://wp-content.vmware.com/v2/latest/lib.json

To create the Content Library, navigate to the Content Libraries section of the vSphere Client.

From vSphere Client, select Menu > Content Libraries > Create

Provide a Name for the Content Library, and the correct vCenter server. Click Next.

Choose "Subscribed content library" and provide the subscription URL to be used. Click Next.

You may get a certificate warning from the subscription source. Click yes if you trust the subscription host.

Select the storage to be used. Click Next. 

Then click Finish to create the Subscribed Content Library.

Result:

Creating Namespace

IT Operator Persona

vSphere with Tanzu introduces a new object in vSphere called a Namespace.

A namespace sets the resource boundaries where vSphere Pods and Tanzu Kubernetes clusters created by using the Tanzu Kubernetes Grid (TKG) Service can run. When initially created, the namespace has unlimited resources within the Supervisor Cluster. As a vSphere administrator, you can set limits for CPU, memory, storage, as well as the number of Kubernetes objects that can run within the namespace. A resource pool is created for each namespace in vSphere. Storage limitations are represented as storage quotas in Kubernetes.

To provide access to namespaces, as a vSphere administrator you assign permission to users or user groups available within an identity source that is associated with vCenter Single Sign-On.

To create a namespace, navigate to Workload Management and select the Namespaces tab.

Steps:

In vCenter, navigate to Menu > Workload Management and click Create Namespace

Select the cluster where the namespace will be created and provide a name for the Namespace. Click Create

That's it

Result:

Click HERE for a step by step demonstration.

Enable Harbor Registry

IT Operator Persona

Along with the content library, we must also enable a private image registry on the Supervisor Cluster. DevOps engineers use the registry to push and pull images from the registry as well as deploy vSphere Pods by using these images. Harbor Registry stores, manages, and secures container images.

From the vSphere cluster view, navigate to the Configure tab and scroll down to Harbor Registry, then click the link to enable the Harbor registry.

Click on the vSphere with Tanzu enabled Cluster, select Configure. Under Namespaces, select Image Registry

 

Click Enable Harbor and select the Storage for the Image Registry. The new Harbor Registry will be visible under Namespaces

Result:

Click HERE for a step by step demonstration.

Kubernetes CLI Tools

Developer Persona

The previous steps in this section of the POC Guide have resulted in a successful deployment and configuration of vSphere with Tanzu. Those steps were conducted by an IT Operator; this step covers the developer-side tasks required to utilize the deployed environment. 

The namespace has already been created and is ready to be handed off to the developer by simply providing the namespace name along with the Kubernetes Control Plane IP address.

The developer will be able to access the Control Plane IP address to download the vSphere CLI plugin along with the Docker Credential Helper. This plugin allows the developer to log in to the Kubernetes environment and to deploy and manage workloads. 

The link to the CLI Tools can be obtained from the vSphere Client by clicking on the namespace previously created. The link can be copied and provided to the developer or can be opened from the UI. 

Select the operating system being used and follow the steps provided to install the kubectl and kubectl-vsphere commands

You can open a terminal window from this location to execute the commands

Deploying Tanzu Kubernetes Cluster (TKG)

Developer Persona

Developers will usually start by deploying a Tanzu Kubernetes cluster. A Tanzu Kubernetes cluster is a full distribution of open-source Kubernetes that is easily provisioned and managed using the Tanzu Kubernetes Grid (TKG) Service. Note that the TKG Service provides an "opinionated" implementation of Kubernetes optimized for vSphere and supported by VMware.  

Note that there are two Kubernetes environments: the Pod Service, which hosts "native" pods on the Supervisor Cluster, and the Tanzu Kubernetes cluster, which runs the vSphere-optimized Kubernetes pods.

Use the kubectl-vsphere binary downloaded in the previous step to log in to the supervisor cluster, e.g.

kubectl-vsphere login --server <supervisor-cluster IP> --insecure-skip-tls-verify
Username: administrator@vsphere.local
Password:
Logged in successfully.


You have access to the following contexts:
   172.16.69.1
   mgmt-cluster
   ns01

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`

Here we see that the namespaces 'mgmt-cluster' and 'ns01' are available.

We can see the list of nodes, etc. using the standard K8s commands, e.g.

$ kubectl get nodes
NAME                               STATUS   ROLES    AGE     VERSION
421923bfdd5501a22ba2568827f1a954   Ready    master   4d21h   v1.16.7-2+bfe512e5ddaaaa
4219520b7190d95cd347337e37d5b647   Ready    master   4d21h   v1.16.7-2+bfe512e5ddaaaa
4219e077b6f728851843a55b83fda918   Ready    master   4d21h   v1.16.7-2+bfe512e5ddaaaa
dbvcfesx01.vsanpe.vmware.com       Ready    agent    4d20h   v1.16.7-sph-4d52cd1
dbvcfesx02.vsanpe.vmware.com       Ready    agent    4d20h   v1.16.7-sph-4d52cd1
dbvcfesx03.vsanpe.vmware.com       Ready    agent    4d20h   v1.16.7-sph-4d52cd1
dbvcfesx04.vsanpe.vmware.com       Ready    agent    4d20h   v1.16.7-sph-4d52cd1

Here we see our three K8s master VMs and the four ESXi servers as agents. At the time of writing, the supervisor cluster runs K8s version 1.16.7.

To get a list of contexts, we can run the following:

$ kubectl config get-contexts
CURRENT   NAME           CLUSTER       AUTHINFO                                      NAMESPACE
*         172.16.69.1    172.16.69.1   wcp:172.16.69.1:administrator@vsphere.local
          mgmt-cluster   172.16.69.1   wcp:172.16.69.1:administrator@vsphere.local   mgmt-cluster
          ns01           172.16.69.1   wcp:172.16.69.1:administrator@vsphere.local   ns01

Switch to the appropriate context. In this case, 'ns01':

$ kubectl config use-context ns01
Switched to context "ns01".

We can see the storage classes by using the following command - in this case we are using vSAN so we can see the default SPBM policy mapped to the storage class:

$  kubectl get sc
NAME                          PROVISIONER              AGE
vsan-default-storage-policy   csi.vsphere.vmware.com   4d21h

Ensure that we have access to the Tanzu VM images (configured as the subscribed content library previously):

$ kubectl get virtualmachineimages
NAME                                                        AGE
ob-15957779-photon-3-k8s-v1.16.8---vmware.1-tkg.3.60d2ffd   31s

Next, we construct a manifest to create the TKG guest cluster - for more details on the various parameters, see https://docs.vmware.com/en/VMware-vSphere/7.0/vmware-vsphere-with-kubernetes/GUID-360B0288-1D24-4698-A9A0-5C5217C0BCCF.html

Create a new file, e.g.

vi tanzu-deploy.yaml

First, we set the API version. At the time of writing it is:

apiVersion: run.tanzu.vmware.com/v1alpha1      #TKG API endpoint

Next, we set the 'kind' parameter correctly:

kind: TanzuKubernetesCluster                   #required parameter

And then set the name and namespace:

metadata:
   name: tkgcluster1
   namespace: tkg-guest

Next comes the spec section, where we set the K8s version to v1.16:

spec:
   distribution:
      version: v1.16

Then we set the topology; first the controlPlane:

topology:
        controlPlane:
            count: 1

Next, we define the VM class for the control plane node. We can see the available classes by using the command:

$ kubectl get virtualmachineclasses
NAME                 AGE
best-effort-large    4d21h
best-effort-medium   4d21h
best-effort-small    4d21h
best-effort-xlarge   4d21h
best-effort-xsmall   4d21h
guaranteed-large     4d21h
guaranteed-medium    4d21h
guaranteed-small     4d21h
guaranteed-xlarge    4d21h
guaranteed-xsmall    4d21h

The recommended class is 'guaranteed-small', thus:

 class: guaranteed-small

Finally, we define the storage class:

  storageClass: vsan-default-storage-policy

Then we define the topology for the worker nodes. We create three workers using the same settings as above:

workers:
          count: 3
          class: guaranteed-small
          storageClass: vsan-default-storage-policy

Putting it all together, we have:

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
   name: tkgcluster1
   namespace: tkg-guest
spec:
  distribution:
   version: v1.16
  topology:
   controlPlane:
     count: 1
     class: guaranteed-small
     storageClass: vsan-default-storage-policy
   workers:
     count: 3
     class: guaranteed-small
     storageClass: vsan-default-storage-policy

We can then apply this manifest to create the deployment:

$ kubectl apply -f tanzu-deploy.yaml

To monitor we can use the following commands:

$ kubectl get tkc
NAME          CONTROL PLANE   WORKER   DISTRIBUTION                     AGE    PHASE
tkgcluster1   1               3        v1.16.8+vmware.1-tkg.3.60d2ffd   3m7s   creating
$ kubectl describe tkc
Name:         tkgcluster1
Namespace:    tkg-guest
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"run.tanzu.vmware.com/v1alpha1","kind":"TanzuKubernetesCluster","metadata":{"annotations":{},"name":"tkgcluster1","namespace...
API Version:  run.tanzu.vmware.com/v1alpha1
Kind:         TanzuKubernetesCluster
Metadata:
  Creation Timestamp:  2020-06-01T15:54:52Z
  Finalizers:
    tanzukubernetescluster.run.tanzu.vmware.com
  Generation:        1
  Resource Version:  2569051
  Self Link:         /apis/run.tanzu.vmware.com/v1alpha1/namespaces/tkg-guest/tanzukubernetesclusters/tkgcluster1
  UID:               51408c8e-9096-4139-b52d-ff7e74547e39
Spec:
  Distribution:
    Full Version:  v1.16.8+vmware.1-tkg.3.60d2ffd
    Version:       v1.16
  Settings:
    Network:
      Cni:
        Name:  calico
      Pods:
        Cidr Blocks:
          192.168.0.0/16
      Service Domain:  cluster.local
      Services:
        Cidr Blocks:
          10.96.0.0/12
  Topology:
    Control Plane:
      Class:          guaranteed-small
      Count:          1
      Storage Class:  vsan-default-storage-policy
    Workers:
      Class:          guaranteed-small
      Count:          3
      Storage Class:  vsan-default-storage-policy
Status:
  Addons:
    Cloudprovider:
      Name:
      Status:  pending
    Cni:
      Name:
      Status:  pending
    Csi:
      Name:
      Status:  pending
    Dns:
      Name:
      Status:  pending
    Proxy:
      Name:
      Status:  pending
    Psp:
      Name:
      Status:  pending
  Cluster API Status:
    Phase:  provisioning
  Node Status:
    tkgcluster1-control-plane-tdl5z:             pending
    tkgcluster1-workers-lkp87-7d9df77586-9lzdj:  pending
    tkgcluster1-workers-lkp87-7d9df77586-kkjmt:  pending
    tkgcluster1-workers-lkp87-7d9df77586-vqr6g:  pending
  Phase:                                         creating
  Vm Status:
    tkgcluster1-control-plane-tdl5z:             pending
    tkgcluster1-workers-lkp87-7d9df77586-9lzdj:  pending
    tkgcluster1-workers-lkp87-7d9df77586-kkjmt:  pending
    tkgcluster1-workers-lkp87-7d9df77586-vqr6g:  pending
Events:                                          <none>

Under the namespace, the TKC cluster will now be visible.

Navigating to the namespace in vCenter (Menu > Workload Management > ns01 > Tanzu Kubernetes) shows the newly created TKG cluster

Result:

Click HERE for a step by step demonstration.

Deploying Containers in TKG

Developer Persona

Once the Tanzu Kubernetes Cluster has been deployed, the developer will manage it just like any other Kubernetes instance. All the Kubernetes and vSphere features and capabilities are available to the developer.

We can now login to the TKG cluster using the following command:

kubectl-vsphere login --server=<ip> --insecure-skip-tls-verify --tanzu-kubernetes-cluster-namespace=<namespace> --tanzu-kubernetes-cluster-name=<tkg cluster>

In our case,

$ kubectl-vsphere login --server=https://152.17.31.129 --insecure-skip-tls-verify --tanzu-kubernetes-cluster-namespace=ns01 --tanzu-kubernetes-cluster-name=tkgcluster1


Username: administrator@vsphere.local
Password:
Logged in successfully.


You have access to the following contexts:
   152.17.31.129
   ns01
   tkgcluster1


If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`

We can see that the context 'tkgcluster1', i.e. our TKG cluster, is now available.

To switch to this context, we issue the command:

$ kubectl config use-context tkgcluster1

Now we can issue our usual K8s commands on this context. To see our TKG nodes, we issue the command:

$ kubectl get nodes
NAME                                        STATUS   ROLES    AGE   VERSION
tkgcluster1-control-plane-g5mgc             Ready    master   76m   v1.16.8+vmware.1
tkgcluster1-workers-swqxc-c86bf7684-5jdth   Ready    <none>   72m   v1.16.8+vmware.1
tkgcluster1-workers-swqxc-c86bf7684-7hfdc   Ready    <none>   72m   v1.16.8+vmware.1
tkgcluster1-workers-swqxc-c86bf7684-g6vks   Ready    <none>   72m   v1.16.8+vmware.1

At this point the developer can deploy application workloads to Tanzu Kubernetes clusters using Kubernetes constructs such as pods, services, persistent volumes, stateful sets, and deployments.

Stretching VCF Management and Workload Domains

Stretched Cluster Prerequisites

The process of stretching Cloud Foundation workload domains initiates a vSAN stretched cluster task. Rather than running this task directly within a managed vSAN cluster, the process is initiated by SDDC Manager, keeping SDDC Manager aware of this topology change. The same prerequisites that apply to vSAN stretched clusters also apply to Cloud Foundation stretched clusters.

Stretching Cloud Foundation workload domains allows for the extension of a domain across two availability zones (AZ) running on distinct physical infrastructure. Although there is no distance limitation, key requirements include:

  • Latency below 5ms round trip time (RTT) between each availability zone
  • At least 10Gbps of bandwidth between availability zones
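The two requirements above can be sanity-checked before attempting the stretch operation. A minimal sketch in Python (the function and the measured values are illustrative placeholders, not part of any VCF tooling):

```python
# Sanity-check inter-AZ link measurements against the VCF stretched
# cluster requirements: RTT below 5 ms and at least 10 Gbps of bandwidth.

MAX_RTT_MS = 5.0          # latency must be below this round trip time
MIN_BANDWIDTH_GBPS = 10.0  # minimum bandwidth between availability zones

def check_inter_az_link(rtt_ms: float, bandwidth_gbps: float) -> list:
    """Return a list of requirement violations (empty list means OK)."""
    problems = []
    if rtt_ms >= MAX_RTT_MS:
        problems.append(f"RTT {rtt_ms} ms is not below the {MAX_RTT_MS} ms limit")
    if bandwidth_gbps < MIN_BANDWIDTH_GBPS:
        problems.append(f"bandwidth {bandwidth_gbps} Gbps is below the "
                        f"{MIN_BANDWIDTH_GBPS} Gbps minimum")
    return problems

# Example: a link measured at 3.2 ms RTT and 20 Gbps passes both checks.
print(check_inter_az_link(3.2, 20.0))   # []
print(check_inter_az_link(7.5, 10.0))   # one violation (latency)
```

The measurements themselves would come from standard tools (ping between vmkernel interfaces, iperf between hosts); the sketch only encodes the acceptance thresholds.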

Additionally, prior to stretching a cluster in a VI workload domain, the management domain cluster must be stretched first. vCenter Servers for all workload domains are hosted within the management domain. Hence, the management domain must be stretched to protect against availability zone failure, ensuring that supporting SDDC components can continue to manage the workload domains.

Each stretched cluster requires a vSAN witness appliance in a third location. The witness should not share infrastructure dependencies with either availability zone; deploying the witness to either availability zone it serves is not supported. The maximum latency between the vSAN witness appliance and the vSAN hosts is 200ms round trip time (RTT). This appliance is currently not part of SDDC Manager workflows; it must be deployed manually and upgraded separately from the SDDC LCM process. TCP and UDP ports must be permitted for witness traffic between the witness host and the vSAN cluster data nodes; see KB article 52959.

An in-depth list of requirements may be found in the "Deployment for Multiple Availability Zones" document; please review this document prior to any attempt to stretch Cloud Foundation workload domains.

Each AZ must have an equal number of hosts in order to ensure sufficient resources are available in case of an availability zone outage.

License Verification

Prior to stretching VCF workload domains, please verify that licenses are not expired and that the correct license type for each product is entered within SDDC Manager.

vSAN Licensing

Stretching a workload domain in VCF requires that vSAN Enterprise or Enterprise Plus licensing is present within SDDC Manager in order to stretch vSAN clusters. Refer to KB 70328 for information about a known licensing issue.

 

VLAN Configuration Requirements

The management VLAN, vSAN VLAN, and vMotion VLAN must be stretched between each availability zone. VLAN IDs must be identical at each availability zone.

Availability Zone Network Configurations

Each availability zone must have its own vSAN, vMotion and VXLAN VLAN networks.

Any VMs on an external network must be on an NSX virtual wire. If they are on a separate VLAN, that VLAN must be stretched as well.

L3 Routing for vSAN

vSAN Witness management and vSAN Witness traffic may utilize Layer 3 networks. Additional configuration may be required, such as Witness Traffic Separation (WTS) as well as static routing. Please consult storagehub.vmware.com for further details.

Stretching Workload Domains

The Management workload domain must be stretched prior to stretching any VI workload domains. The vCenter servers for each workload domain are placed within the management domain cluster. Therefore, the management domain must be protected against availability zone failures to ensure management of the workload domains remains available.

After the Management workload domain has been successfully stretched, it is possible to apply stretched cluster configurations to other VI workload domains that are managed by the Cloud Foundation instance. The process of stretching VI workload domains is the same as the process that was previously used to stretch the Management workload domain.

Network Pool Creation

Prior to stretching the management domain, a network pool must be created for vMotion and storage networks.

The subnet in a network pool cannot overlap the subnet of another pool. IP ranges cannot be edited after the network pool has been created, so please ensure the correct IP address range is entered.

To create the network pool:

  • From SDDC Manager Dashboard, click Administration > Network Settings
  • Click ‘+ Create Network Pool’

  • Enter a name for the network pool
  • Select the storage network type
  • Provide the following information for vMotion and the selected storage network type
    1. VLAN ID between 1-4094
    2. MTU between 1500-9216 (N.B. make sure any physical switch traffic overhead is accounted for)
  • In the Network field, enter a subnet IP address
  • Enter the subnet mask
  • Enter the default gateway
  • Enter an IP address range for hosts to be associated with this network pool
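Because pool subnets must not overlap and IP ranges cannot be edited after creation, it is worth validating a planned subnet before creating the pool. A minimal sketch using Python's ipaddress module (the subnets shown are illustrative):

```python
import ipaddress

def overlaps_existing(new_subnet: str, existing_subnets: list) -> list:
    """Return the existing pool subnets that overlap the proposed one."""
    candidate = ipaddress.ip_network(new_subnet)
    return [s for s in existing_subnets
            if candidate.overlaps(ipaddress.ip_network(s))]

# Subnets already used by existing network pools (illustrative values).
pools = ["172.16.20.0/24", "172.16.30.0/24"]

print(overlaps_existing("172.16.40.0/24", pools))    # [] -> safe to create
print(overlaps_existing("172.16.30.128/25", pools))  # overlaps 172.16.30.0/24
```

An empty result means the proposed subnet is safe to enter in the Network field; any returned subnet indicates a conflict that must be resolved before creating the pool.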

Commission Hosts

Hosts are added to the Cloud Foundation inventory via the commissioning workflow. Hosts may be added individually, or multiple hosts may be added at once using a JSON template. For additional details and requirements, refer to section 4.1.1 of the VCF Admin Guide.

In order to stretch the VCF management domain, a number of hosts equal to the number presently in the management domain cluster must be commissioned. These hosts will be used to construct the second availability zone (AZ2). 

Associate Hosts to Network Pool

During the commissioning process, the network pool previously created for AZ2 must be associated with the hosts being provisioned for the stretched management domain cluster in AZ2.

Verify Host Health

Verify that all hosts commissioned are free of errors and are healthy prior to stretching the management domain.

Deploy vSAN Witness

Deploying the vSAN witness is a critical dependency supporting stretched management domains. The witness host may be a physical ESXi host, or the VMware-provided virtual witness appliance may be used (preferred). Please refer to vSAN witness information in StorageHub for further details.

The vSAN witness host/appliance must be located in a third location outside of either availability zone it is associated with.  Wherever the witness host/appliance is located, it should not share infrastructure dependencies with either availability zone. Due to its relatively relaxed latency requirement of 200ms RTT, the witness may even be hosted in the cloud. Witness traffic may utilize either Layer 2 or Layer 3 connectivity. Note that witness traffic is not encrypted, as it only contains non-sensitive metadata.

It is important to highlight that as of the VCF 4.0 release, witness deployment and lifecycle management are currently not part of any SDDC manager workflows. Therefore, the witness host/appliance must be deployed and upgraded independently from any SDDC Manager automation or management.

Please refer to StorageHub for detailed instructions for deployment of the witness appliance.

SDDC Manager Configuration

In VCF 4.0, the stretch cluster operation is completed using the API in the SDDC Manager Developer Center. To perform the stretch cluster operation, complete the following tasks. 

Retrieve the IDs of the hosts commissioned for the second availability zone. Host IDs are retrieved by completing the following steps. 

On the SDDC Manager Dashboard, click Developer Center | API Explorer

  1. Under the APIs for managing Hosts, click GET /v1/hosts
  2. Click Execute to fetch the host information. 
  3. Click Download to download the JSON file. 
  4. Retrieve the host IDs from the downloaded JSON file.
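Picking the IDs out of the downloaded file can be scripted rather than done by eye. A sketch that parses a GET /v1/hosts response; the field names ('elements', 'status') and the 'UNASSIGNED_USEABLE' status string follow the VCF API response format, but verify them against your downloaded file, and the sample data below is purely illustrative:

```python
import json

def host_ids(hosts_json: str, status: str = "UNASSIGNED_USEABLE") -> list:
    """Extract the IDs of hosts in the given status from a GET /v1/hosts response."""
    doc = json.loads(hosts_json)
    return [h["id"] for h in doc.get("elements", [])
            if h.get("status") == status]

# Illustrative sample of a downloaded response.
sample = json.dumps({
    "elements": [
        {"id": "2c1744dc-6cb1-4225-9195-5cbd2b893be6",
         "fqdn": "esx05.rainpole.local", "status": "UNASSIGNED_USEABLE"},
        {"id": "11111111-2222-3333-4444-555555555555",
         "fqdn": "esx01.rainpole.local", "status": "ASSIGNED"},
    ]
})
print(host_ids(sample))  # IDs of the unassigned hosts only
```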

Retrieve the Cluster ID

  1. On the SDDC Manager Dashboard, click Developer Center | API Explorer
  2. Under APIs for managing Clusters, click GET /v1/clusters. 
  3. Click Execute to fetch the cluster information. 
  4. Click Download to download the JSON file. 
  5. Retrieve the cluster ID from the JSON file.

Prepare the JSON file to trigger stretch cluster validation

  1. On the SDDC Manager Dashboard page, click Developer Center > API Explorer 
  2. Under APIs for managing Clusters, click POST /v1/clusters/{id}/validations
  3. Under clusterUpdateSpec, click Cluster Update Data ClusterOperationSpecValidation{…} 
  4. Update the downloaded JSON file to keep only stretch-related information. Below is an example of the updated JSON file. 
{
  "clusterUpdateSpec": {
    "clusterStretchSpec": {
      "hostSpecs": [
        { "id": "2c1744dc-6cb1-4225-9195-5cbd2b893be6",
          "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX" },
        { "id": "6b38c2ea-0429-4c04-8d2d-40a1e3559714",
          "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX" },
        { "id": "5b704db6-27f2-4c87-839d-95f6f84e2fd0",
          "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX" },
        { "id": "5333f34f-f41a-44e4-ac5d-8568485ab241",
          "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX" }
      ],
      "secondaryAzOverlayVlanId": 1624,
      "witnessSpec": {
        "fqdn": "sfo03m01vsanw01.sfo.rainpole.local",
        "vsanCidr": "172.17.13.0/24",
        "vsanIp": "172.17.13.201"
      }
    }
  }
}
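Hand-editing the downloaded file is error-prone; the same spec can instead be generated from the retrieved host IDs. A sketch (the host ID, license key, VLAN, and witness values are the placeholders from the example above, not real values):

```python
import json

def build_stretch_payload(host_ids, license_key, overlay_vlan,
                          witness_fqdn, witness_cidr, witness_ip,
                          wrap_for_validation=True):
    """Assemble the stretch-cluster spec; optionally wrap it in
    clusterUpdateSpec as shown for the validations endpoint."""
    spec = {
        "clusterStretchSpec": {
            "hostSpecs": [{"id": h, "licenseKey": license_key}
                          for h in host_ids],
            "secondaryAzOverlayVlanId": overlay_vlan,
            "witnessSpec": {
                "fqdn": witness_fqdn,
                "vsanCidr": witness_cidr,
                "vsanIp": witness_ip,
            },
        }
    }
    return {"clusterUpdateSpec": spec} if wrap_for_validation else spec

payload = build_stretch_payload(
    ["2c1744dc-6cb1-4225-9195-5cbd2b893be6"],
    "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX",
    1624,
    "sfo03m01vsanw01.sfo.rainpole.local",
    "172.17.13.0/24",
    "172.17.13.201",
)
print(json.dumps(payload, indent=2))
```

With wrap_for_validation=False the function emits the unwrapped clusterStretchSpec body used later for the PATCH call.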

Execute the validate stretch cluster API

  1. From the API Explorer, under APIs for managing Clusters, select POST /v1/clusters/{id}/validations. 
  2. Enter the cluster UID in the id (required) field and paste the prepared JSON (with the host UIDs) into the ClusterOperationSpecValidation field. 
  3. Click Execute to run the stretch cluster validation. 
  4. The validation result appears in the Response area. 
  5. Make sure the validation result is successful; if unsuccessful, correct any errors and retry. 
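The same validation call can also be made outside the API Explorer with any HTTP client. A sketch that builds (but does not send) the request; the hostname, cluster ID, and token are placeholders, and the bearer token would be obtained separately from SDDC Manager:

```python
import json
import urllib.request

def validation_request(sddc_manager: str, cluster_id: str,
                       token: str, spec: dict) -> urllib.request.Request:
    """Construct the POST /v1/clusters/{id}/validations request."""
    url = f"https://{sddc_manager}/v1/clusters/{cluster_id}/validations"
    return urllib.request.Request(
        url,
        data=json.dumps(spec).encode(),
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )

req = validation_request("sfo-vcf01.sfo.rainpole.local",  # placeholder FQDN
                         "cluster-uuid",                  # placeholder ID
                         "PLACEHOLDER-TOKEN",
                         {"clusterUpdateSpec": {}})
print(req.get_method(), req.full_url)
```

Sending it would be a call to urllib.request.urlopen(req); in a POC environment with self-signed certificates an appropriate SSL context would also be needed.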

Prepare the JSON payload to trigger stretch cluster API

  1. Under APIs for managing Clusters, click PATCH /v1/clusters/{id}
  2. Under clusterUpdateSpec, click Cluster Update Data ClusterUpdateSpec{…} 
  3. Click the download arrow to download the JSON file. 
  4. Update the downloaded PATCH JSON file to keep only stretch-cluster-related information. Below is an example.
{
  "clusterStretchSpec": {
    "hostSpecs": [
      { "id": "2c1744dc-6cb1-4225-9195-5cbd2b893be6",
        "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX" },
      { "id": "6b38c2ea-0429-4c04-8d2d-40a1e3559714",
        "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX" },
      { "id": "5b704db6-27f2-4c87-839d-95f6f84e2fd0",
        "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX" },
      { "id": "5333f34f-f41a-44e4-ac5d-8568485ab241",
        "licenseKey": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX" }
    ],
    "secondaryAzOverlayVlanId": 1624,
    "witnessSpec": {
      "fqdn": "sfo03m01vsanw01.sfo.rainpole.local",
      "vsanCidr": "172.17.13.0/24",
      "vsanIp": "172.17.13.201"
    }
  }
}

Execute Stretch Cluster API

  1. On the SDDC Manager Dashboard, click Developer Center > API Explorer
  2. Under APIs for managing Clusters, click PATCH /v1/clusters/{id}.
  3. Enter the cluster UID in the id (required) field and paste the prepared JSON (with the host UIDs) into the ClusterUpdateSpec field.
  4. Click Execute to run the Stretch Cluster workflow. 
  5. The task will appear in the SDDC Manager UI.

Check vSAN Health

While the cluster is being stretched, monitor the state of the task from the SDDC Manager Dashboard. When the task completes successfully, check the health of the vSAN cluster and validate that stretched cluster operations are working correctly by logging in to the vCenter UI associated with the workload domain.

To check the vSAN Health page: 

  • On the home page, click Hosts and Clusters and then select the stretched cluster. 
  • Click Monitor > vSAN > Health 
  • Click Retest 
  • Troubleshoot any warnings or errors

Refresh vSAN Storage Policies and Check Compliance

It is imperative to check the vSAN storage policy compliance to ensure all objects achieve a state of compliance.

To check vSAN storage policies:

  • On the vCenter home page, click Policies and Profiles > VM Storage Policies 
  • Select the policy associated with the vCenter Server for the stretched cluster
  • Click Monitor > VMs and Virtual Disks
  • Click Refresh 
  • Click Trigger VM storage policy compliance check
  • Check the Compliance Status column for each VM component 
  • Troubleshoot any warnings or errors

Lifecycle Management (LCM)

Lifecycle Management Overview

The Lifecycle Management (LCM) feature of VMware Cloud Foundation enables automatic updating of both the Cloud Foundation software components (SDDC Manager, HMS, and LCM) as well as the VMware SDDC Components such as vCenter Server, ESXi, vSAN and NSX.

Lifecycle Management in SDDC Manager may be applied to the entire infrastructure or to a specific workload domain. The process is designed to be non-disruptive to tenant virtual machines. As new software updates become available, the SDDC Manager provides notifications to VCF Administrators, who may review the update details and, at a time convenient to them, download and schedule the updates.

This module demonstrates usage of the Cloud Foundation Lifecycle Management feature to upgrade from VMware Cloud Foundation 4.0.

Bundle Types

Cloud Foundation utilizes two types of bundles for Lifecycle Management: Upgrade Bundles, and Install Bundles.

Upgrade Bundles

An upgrade bundle contains patches and other software necessary to update VCF software components. In most cases, an upgrade bundle must be applied to the management domain before it may be applied to workload domains.

Some upgrade bundles are cumulative bundles. In cases where a workload domain is multiple versions behind the target version, cumulative bundles allow Cloud Foundation to directly upgrade to the target version (rather than requiring the installation of multiple bundles in a sequential progression). Cumulative bundles are only available for vCenter Server and ESXi.

Install Bundles

Install bundles contain software necessary to deploy new instances of Cloud Foundation components. For instance, VI workload domain install bundles are used to deploy more recent versions of the software components that were not present in the initial Cloud Foundation BOM; these install bundles include software for vCenter Server and NSX-T Data Center.

Downloading Bundles

If SDDC Manager is configured with 'My VMware' credentials, Lifecycle Management automatically polls the VMware software depot to access software bundles. SDDC Manager will prompt administrators when a bundle is available and ready for download.

If SDDC Manager does not have Internet connectivity, software bundles may either be acquired via HTTP(S) proxy, or through a manual download and transfer process.

This guide demonstrates procedures for automatically downloading bundles, and manually downloading and transferring bundles. For the procedure to download bundles with a proxy server, please refer to the VMware Cloud Foundation Upgrade Guide.

Configure Credentials

Login to SDDC Manager. 

On the left navigation pane, navigate to Administration > Repository Settings. In the My VMware Account Authentication wizard, enter valid My VMware credentials.

Once the My VMware credentials are validated, the repository settings will display as ‘Active’.

In some environments, it may be necessary to configure SDDC Manager to utilize an HTTP(S) proxy.

Download Bundles

After registering My VMware credentials, navigate to Repository > Bundle Management.

Locate and click ‘Schedule for Download’ or ‘Download Now’ to obtain the VMware Software Install Bundle - vRealize Suite Lifecycle Manager.

Other bundles may be downloaded as well, but the above bundle is required for configuration steps that follow.

Deploy vRealize Lifecycle Manager

After the bundles have been downloaded successfully, navigate to Administration > vRealize Suite.

Select ‘Deploy’ from the landing page. The install wizard launches, displaying the vRealize Suite Lifecycle Manager installation prerequisites.

Select the ‘Select All’ option, then click ‘Begin’.

DNS, reverse DNS, and NTP should already be configured. Click ‘Next’:

Note: If AVN was selected during bring-up, vRealize Suite Lifecycle Manager will be deployed in the xregion-segment (xRegion network). If not, this should be set manually after bring-up, or follow the instructions in KB 78608 to use a VLAN-backed network.

Enter the FQDN of the appliance, the system admin password, and the appliance root password:

Click ‘Next’.

Review the settings and click ‘Finish’ to deploy the appliance.

Monitor Deployment Progress

The vRealize Suite deployment may be monitored from the Dashboard and Tasks views in SDDC Manager:

Wait for the deployment to complete before proceeding.

Validate Deployment

Once vRealize Suite Lifecycle Manager has been successfully deployed, connect to it by following the blue hyperlink.

Log in using the system admin account, admin@localhost:

From vRealize Suite Lifecycle Manager, the entire vRealize Suite can be managed and updated.
 

Installing Upgrade Bundles

Completing an upgrade of all the components of an SDDC without Cloud Foundation requires careful planning and execution. Cloud Foundation’s ability to orchestrate a non-disruptive upgrade of SDDC components is a key benefit of the Cloud Foundation platform.

When new updates or software packages are available, a notification will appear in the SDDC Manager interface:

 

Clicking ‘View Updates’ navigates to the Lifecycle Management interface, which displays the available bundles.

SDDC Manager will automatically complete all the tasks required to install this update. When the first update in the series completes successfully, the remaining updates may be applied using the same steps until all components are updated to the latest version.

Composable Infrastructure (Redfish API) Integration

Composable Infrastructure (Redfish API) Integration Overview

Beginning with version 3.8, Cloud Foundation supports integration with software-defined composable infrastructure, allowing dynamic composition and decomposition of physical system resources via SDDC Manager. This integration currently supports the HPE Synergy and Dell MX composable infrastructure platforms and leverages each platform’s Redfish API.

HPE Synergy Integration

To enable infrastructure composability features, deploy the HPE OneView Connector server.

Procedure:

  • Deploy a Linux server (physical or VM)
  • Install the HPE OneView connector for VCF on the Linux server
  • Complete bring-up of SDDC Manager, if not already done
  • Increase queue capacity for the thread pool
    • Connect to SDDC Manager via SSH using the vcf account
    • Escalate to root privileges with su
    • Open the file application-prod.properties:
vi /opt/vmware/vcf/operationsmanager/config/application-prod.properties
  • Update the queue capacity line:
om.executor.queuecapacity=300
  • Save and close the file
  • If using a self-signed certificate, import the Redfish certificate from the OneView Connector server:
    • SSH in to SDDC Manager using the vcf account
    • Enter su to escalate to root
    • Import the Redfish certificate to SDDC Manager:
/opt/vmware/vcf/commonsvcs/scripts/cert-fetch-import-refresh.sh --ip=<redfish-ip> --port=<SSL/TLS port> --service-restart=operationsmanager
  • Restart the SDDC Operations Manager service:
systemctl restart operationsmanager
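The queue capacity edit above can also be scripted rather than performed interactively in vi. The sketch below demonstrates the sed replacement against a sample copy of the file; on SDDC Manager the real path is /opt/vmware/vcf/operationsmanager/config/application-prod.properties, and the pre-existing value shown is an assumption for illustration.

```shell
#!/bin/sh
# Stand-in for /opt/vmware/vcf/operationsmanager/config/application-prod.properties
PROPS=/tmp/application-prod.properties

# Sample file with an assumed default value, standing in for the real one.
printf 'om.executor.queuecapacity=100\n' > "$PROPS"

# Replace the existing queue capacity value with 300, leaving other lines intact.
sed -i 's/^om\.executor\.queuecapacity=.*/om.executor.queuecapacity=300/' "$PROPS"

cat "$PROPS"

# After editing the real file, restart the service:
# systemctl restart operationsmanager
```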

Wait a few minutes for the service to restart, then connect Cloud Foundation to the composability translation layer:

 

Dell MX Integration

Dell MX Composable Infrastructure does not require a separate server instance to be deployed, as the Redfish API translation layer is integrated into the MX management module.

Certificates

A signed certificate is necessary to establish a connection with the OME Modular interface. The FQDN of the MX management module should be added to DNS, as it is included in the certificate. Note that the certificate presented by the MX platform must have a CN matching the FQDN of the management module; VCF will not connect if the default self-signed certificate (CN=localhost) is used.

The certificate CSR can be generated from the OME Modular Interface on the MX7000.

  1. Log in to the OME Modular interface
  2. Select Application Settings from the main menu
  3. Navigate to Security > Certificates
  4. Generate a Certificate Signing Request
  5. Upload the signed certificate when it is available
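Before connecting VCF, it is worth confirming that the certificate CN actually matches the management module FQDN. The sketch below generates a stand-in self-signed certificate for a hypothetical FQDN and extracts its subject; against a live system you would instead inspect the presented certificate with `openssl s_client -connect <mx-fqdn>:443 </dev/null | openssl x509 -noout -subject`.

```shell
#!/bin/sh
# Hypothetical MX management module FQDN -- replace with your own.
MX_FQDN="mx7000-msm.vcf.example.com"

# Generate a throwaway self-signed certificate carrying that CN,
# standing in for the certificate the MX platform would present.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/mx.key -out /tmp/mx.crt \
  -subj "/CN=${MX_FQDN}" 2>/dev/null

# Extract the subject CN; this is the value that must match the FQDN
# used by SDDC Manager (a CN of "localhost" indicates the default
# self-signed certificate, which VCF will reject).
openssl x509 -noout -subject -in /tmp/mx.crt
```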

Configure Translation Layer

The translation layer must be configured prior to connecting the SDDC Manager to the composable infrastructure platform.

Procedure:

  • Increase queue capacity for the thread pool
    • Connect to SDDC Manager via SSH using the vcf account
    • Escalate to root privileges with su
    • Open the file application-prod.properties:
vi /opt/vmware/vcf/operationsmanager/config/application-prod.properties
  • Update the queue capacity line:
om.executor.queuecapacity=300
  • Save and close the file
  • If using a self-signed certificate, import the Redfish certificate from the MX MSM to SDDC Manager. 
    • SSH in to SDDC Manager using the vcf account
    • Enter su to escalate to root
    • Import the Redfish certificate to SDDC Manager:
/opt/vmware/vcf/commonsvcs/scripts/cert-fetch-import-refresh.sh --ip=<MSM-ip> --port=<SSL/TLS port> --service-restart=operationsmanager
  • Restart the SDDC Operations Manager service:
systemctl restart operationsmanager
  • Wait a few minutes for the service to restart
  • From SDDC Manager, click Administration > Composable Infrastructure
  • Enter the URL for the Redfish translation layer

 

  • Enter username and password for Redfish translation layer

  • Click Connect 

Composable resources will now be visible within the VCF UI. 
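Before entering the URL in SDDC Manager, the Redfish translation layer can be sanity-checked from a shell. `/redfish/v1` is the standard Redfish service root defined by the DMTF specification; the host below is a hypothetical MX management module FQDN, and the commented curl call is an illustrative reachability check.

```shell
#!/bin/sh
# Hypothetical Redfish translation layer host (MX management module,
# or the OneView Connector server for HPE Synergy).
REDFISH_HOST="mx7000-msm.vcf.example.com"

# Standard Redfish service root per the DMTF specification.
REDFISH_ROOT="https://${REDFISH_HOST}/redfish/v1"

echo "$REDFISH_ROOT"

# Uncomment to query a live endpoint; a healthy service root returns
# HTTP 200 with a JSON document:
# curl -sk -o /dev/null -w '%{http_code}\n' "$REDFISH_ROOT"
```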
