November 24, 2021

Additional level of resilience by vSAN Stretched Clusters for VMware Cloud on AWS

Stretched cluster SDDC powered by VMC on AWS offers a great availability strategy. Customers can choose to use that extra layer of resilience to satisfy their need for protection against unexpected host-level failures or an entire AZ failure within an Amazon region.

Many companies are choosing VMC on AWS for their application workloads because it offers a simple, secure, and cost-effective transition to modern cloud-based infrastructure. With only a few clicks, admins can deploy and run their modern composite applications on a single, easy-to-manage platform. Additionally, VMC on AWS ensures a higher level of resilience and higher protection against ransomware attacks.

VMC on AWS Infrastructure

How does VMC on AWS achieve a higher level of resilience? To answer this question, we should first examine Amazon’s infrastructure. Amazon’s global infrastructure is broken up into Regions. Each Region supports the services for a given geography. Within each Region, Amazon builds isolated and redundant islands of infrastructure called Availability Zones (AZs). An Availability Zone is similar to the fault domain concept in vSAN. When VMware deploys a vSphere cluster in a standard deployment of the VMware Cloud on AWS managed service, all hosts for a given cluster are placed into a single AZ. As a result, these VMC on AWS SDDC clusters are vulnerable to the incredibly rare but real threat of an AZ failure. To mitigate this risk, Amazon recommends deploying a given service across multiple Availability Zones and using network failover to handle any failures.

vSAN Stretched cluster SDDC

A vSAN Stretched cluster SDDC provides an efficient way to ensure the data availability of the workloads deployed in VMC on AWS. The hosts of the Stretched cluster SDDC are evenly split between two AZs within an AWS Region, with a hidden “witness” host in a third AZ. Both data sites store the data redundantly, and the hidden third site, where the witness resides, stores only witness components. The role of the witness is to act as a tiebreaker in a “split-brain” scenario, when the two data sites lose contact with each other. Having these isolated and redundant islands of infrastructure enables the Stretched cluster SDDC to survive the loss of an AZ.
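
The tiebreaker role of the witness can be illustrated with a small quorum model. This is a simplified sketch, not how vSAN is implemented: it assumes each of the three sites carries a single vote, whereas vSAN assigns votes per object component. The site labels and the `winning_partition` function are hypothetical names chosen for this illustration.

```python
# Simplified model: each of the three sites carries one vote (an assumption;
# real vSAN tracks votes per object component).
PREFERRED = "preferred-az"
NON_PREFERRED = "non-preferred-az"
WITNESS = "witness-az"
ALL_SITES = frozenset({PREFERRED, NON_PREFERRED, WITNESS})


def winning_partition(partitions):
    """Return the group of mutually reachable sites that holds a strict
    majority of the votes, or None if no group has a majority."""
    for group in partitions:
        if 2 * len(group) > len(ALL_SITES):
            return frozenset(group)
    return None


# Split-brain: the two data sites lose contact with each other, but the
# witness still reaches the preferred site and breaks the tie.
split_brain = [{PREFERRED, WITNESS}, {NON_PREFERRED}]

# AZ failure: the preferred site is gone; the remaining data site plus the
# witness still form a majority.
az_failure = [{NON_PREFERRED, WITNESS}]

print(winning_partition(split_brain))
print(winning_partition(az_failure))
```

In both scenarios exactly one side keeps a majority, so data stays accessible in one AZ and the isolated side cannot accept conflicting writes.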

SC_AWS

Stretched cluster SDDC deployment process

The Stretched cluster topology is selected in the first step of deploying an SDDC in VMC on AWS. It is important to note that customers cannot change the topology later: a standard deployment cannot be converted into a stretched one, and vice versa. In the same step, the customer chooses the AWS Region, the host type, the SDDC name, and the number of hosts to be deployed.

The next step is to specify the AWS account to associate with the SDDC.

The cloud admin also selects which VPC subnets should be linked to the tenant workload logical network. Exactly two subnets must be selected, one for each of the two Availability Zones. The AZ of the first subnet becomes the “preferred” site in vSAN, and the AZ of the second subnet becomes the “non-preferred” site.
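
The deployment choices described above can be summarized as a small pre-flight check. The sketch below is purely illustrative: `Subnet`, `StretchedSddcRequest`, and `validate` are hypothetical names, not part of any VMware or AWS API, and the values in the usage example are placeholders. It encodes only the two constraints stated in this post: hosts are split evenly across two AZs, and exactly two subnets in different AZs are selected.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Subnet:
    subnet_id: str           # hypothetical identifier
    availability_zone: str   # AZ the subnet lives in


@dataclass(frozen=True)
class StretchedSddcRequest:
    name: str
    region: str
    host_type: str
    num_hosts: int
    subnets: Tuple[Subnet, ...]  # first = preferred site, second = non-preferred


def validate(req: StretchedSddcRequest) -> List[str]:
    """Return a list of problems; an empty list means the request is consistent."""
    problems = []
    if req.num_hosts <= 0 or req.num_hosts % 2 != 0:
        problems.append("host count must be a positive even number")
    if len(req.subnets) != 2:
        problems.append("exactly two subnets must be selected")
    elif req.subnets[0].availability_zone == req.subnets[1].availability_zone:
        problems.append("the two subnets must be in different AZs")
    return problems


# Placeholder values for illustration only.
request = StretchedSddcRequest(
    name="demo-sddc",
    region="us-west-2",
    host_type="i3.metal",
    num_hosts=6,
    subnets=(Subnet("subnet-a", "us-west-2a"), Subnet("subnet-b", "us-west-2b")),
)
print(validate(request))
```

Because the topology cannot be changed after deployment, it is worth checking choices like these before the SDDC is created.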

After completing these easy steps, the SDDC Stretched cluster will be deployed and ready to be managed by the customer.

How to manage a stretched cluster on VMC on AWS

Logging into the vCenter instance of your newly deployed Multi-AZ SDDC displays a vSAN stretched cluster with an equal number of data nodes in each AZ and one witness node, located outside of the cluster. The vSAN Fault Domains section provides more insight into how the nodes are distributed across fault domains (AZs). It is important to highlight that both AZs are active, and workloads can run in both of them. However, workloads should not use more than 50% of the capacity of each AZ. Following this rule ensures that the surviving AZ has enough capacity to run all workloads if an entire AZ fails.
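
The reasoning behind the 50% guideline is simple arithmetic, sketched below. The function names and the capacity units are hypothetical, and the model assumes both AZs have equal capacity, which follows from the hosts being split evenly.

```python
def within_half_rule(demand: int, capacity_per_az: int) -> bool:
    """True if one AZ's workload demand uses at most 50% of that AZ's capacity."""
    return 2 * demand <= capacity_per_az


def survives_az_loss(demand_az1: int, demand_az2: int, capacity_per_az: int) -> bool:
    """True if the workloads of both AZs fit into a single surviving AZ."""
    return demand_az1 + demand_az2 <= capacity_per_az


# Capacity and demand in arbitrary units (for example, GHz or GB).
capacity = 100

# Both AZs stay under 50%, so either AZ can absorb the other's workloads.
print(within_half_rule(40, capacity), within_half_rule(45, capacity))
print(survives_az_loss(40, 45, capacity))

# One AZ exceeds 50% and the combined demand no longer fits in one AZ.
print(survives_az_loss(60, 50, capacity))
```

If each AZ uses at most half of its capacity, the combined demand can never exceed the capacity of one AZ, which is why the guideline is sufficient to cover a full AZ failure.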

Every VM deployed in a stretched cluster SDDC can be protected against a host loss within an AZ and against the loss of an entire AZ, since data is stored redundantly at both levels. This higher availability is achieved by configuring a “Dual site mirroring (stretched cluster)” storage policy on a per-VM basis. The VM data is written synchronously to both sites, and in case of a site failure, vSphere HA simply restarts the VM in the other AZ, keeping downtime to a minimum. In addition, the cloud admin can specify how vSAN should store the data within each fault domain (AZ) using the Failures to Tolerate setting.

VMC_VM_dataplacement

Of course, not every VM’s data needs to be replicated across sites. For such a VM, customers can apply a site affinity rule so that its data is stored only in the AZ where the VM is running. Customers can use storage policy-based management to tune the availability of a VM or an individual VMDK and change it dynamically when the need arises.
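
The trade-off between dual site mirroring and site affinity can be made concrete by counting data copies. The sketch below is a rough estimate under stated assumptions: mirroring (RAID-1) is used within each AZ, so tolerating n failures requires n + 1 copies per site, and witness components are ignored. Erasure coding would have a different overhead. `StoragePolicy`, `replica_count`, and `raw_footprint_gb` are hypothetical names, not vSAN API objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    dual_site_mirroring: bool   # False models a site affinity policy
    failures_to_tolerate: int   # within each AZ; RAID-1 mirroring assumed


def replica_count(policy: StoragePolicy) -> int:
    """Number of full data copies kept across the stretched cluster."""
    sites = 2 if policy.dual_site_mirroring else 1
    return sites * (policy.failures_to_tolerate + 1)


def raw_footprint_gb(vmdk_gb: int, policy: StoragePolicy) -> int:
    """Approximate raw capacity consumed by a VMDK, ignoring witness components."""
    return vmdk_gb * replica_count(policy)


mirrored = StoragePolicy(dual_site_mirroring=True, failures_to_tolerate=1)
site_local = StoragePolicy(dual_site_mirroring=False, failures_to_tolerate=1)

print(raw_footprint_gb(100, mirrored))
print(raw_footprint_gb(100, site_local))
```

Under these assumptions, a VM that does not need cross-AZ protection consumes half the raw capacity of a dual-site mirrored one, which is the practical reason to apply policies per VM or per VMDK.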

VMC_AWS_storage_policy

Summary

Stretched cluster SDDC powered by VMC on AWS offers a great availability strategy. Customers can choose to use that extra layer of resilience to satisfy their need for protection against unexpected host-level failures or an entire AZ failure within an Amazon region.

