Nested fault domain for 2 Node cluster deployments

September 21, 2021

Companies with many remote offices and branch offices (ROBO) need a scalable, easy-to-maintain solution for their edge deployments. Often no trained admins are available at the edge location, so troubleshooting, hardware replacements, and software upgrades can take longer than customers can afford. A 2 Node cluster requires two ESXi nodes located at the remote office and one witness host or appliance deployed at the main data center. This affordable configuration maintains data availability even if one of the hosts in the cluster becomes unavailable. vSAN 7 U3 provides even higher resiliency by introducing the nested fault domain feature for 2 Node clusters, which helps ROBO businesses keep their data available when more than one failure occurs.

nested fault domains 2-node cluster

This new feature is built on the concept of fault domains, where each host or group of hosts can redundantly store VM object replicas. In a 2 Node cluster configuration, fault domains can be created at the disk group level, enabling disk-group-based data replication. This means that each of the two data nodes can host multiple object replicas. Thanks to this secondary level of resilience, the 2 Node cluster can ensure data availability in the event of more than one device failure. For instance, a host failure followed by an additional device or disk group failure will not impact the data availability of VMs that have a nested fault domain policy applied. The vSAN demo below shows the object replication process across disk groups and across hosts.

At least 3 disk groups per host are required to provide the nested level of fault tolerance. For example, with a failures to tolerate setting of “FTT = 1 Failure – RAID-1”, vSAN needs one disk group for each of the two data replicas and one disk group for the witness component. A new type of SPBM policy, “Host mirroring - 2 node cluster”, enables replication inside a single host in a 2 Node cluster. The data placement logic is similar to the one used for vSAN stretched clusters. The primary level of resilience in a stretched cluster is per site, while in a 2 Node cluster it is per host. With the nested fault domain feature, 2 Node clusters also gain a secondary level of protection like the one in stretched clusters, but here it is applied per disk group rather than per host.
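To make the placement logic more concrete, here is a small toy model in Python. It is only a sketch under simplifying assumptions: the `Host` class, the component names, and the majority rule are all hypothetical and are not vSAN's actual vote-based quorum implementation. The model also assumes the witness host at the main data center stays reachable.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """One data node with three disk groups (hypothetical toy model)."""
    name: str
    # RAID-1, FTT=1 inside the host: two data replicas and one witness,
    # each placed on its own disk group.
    components: dict = field(default_factory=lambda: {
        "dg1": "replica", "dg2": "replica", "dg3": "witness"})
    up: bool = True
    failed_disk_groups: set = field(default_factory=set)

    def copy_available(self) -> bool:
        """Assumed rule: the host-local copy is usable if the host is up,
        more than half of its components survive, and at least one
        surviving component is a data replica."""
        if not self.up:
            return False
        alive = [kind for dg, kind in self.components.items()
                 if dg not in self.failed_disk_groups]
        has_majority = len(alive) * 2 > len(self.components)
        return has_majority and "replica" in alive


def object_available(hosts) -> bool:
    """With host mirroring, one usable host-local copy is enough here."""
    return any(h.copy_available() for h in hosts)


if __name__ == "__main__":
    a, b = Host("esxi-a"), Host("esxi-b")
    a.up = False                      # first failure: a whole host
    b.failed_disk_groups.add("dg1")   # second failure: one disk group
    print(object_available([a, b]))   # True
    b.failed_disk_groups.add("dg2")   # third failure: another disk group
    print(object_available([a, b]))   # False
```

In this model the object survives a host failure plus one disk group failure on the remaining host, which matches the scenario described above, and becomes unavailable only after a further disk group is lost.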

Nested fault domain policy

A few things need to be highlighted here. RAID-6 is not supported, because it requires at least 6 disk groups and the maximum number of disk groups that can be created is 5. If RAID-0 has been applied initially, a secondary level of resilience is not supported. An efficient way for the admin to balance resiliency and performance is to apply different policies depending on the needs of the corresponding VMs or VM objects.
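The disk group arithmetic can be written as a quick sanity check. The following Python sketch uses only the numbers given in this post (3 disk groups for RAID-1 with FTT = 1, 6 for RAID-6, and a maximum of 5 disk groups). The function and dictionary names are made up for illustration and are not part of any vSAN API.

```python
# Limit stated in this post: at most 5 disk groups can be created.
MAX_DISK_GROUPS_PER_HOST = 5

# Minimum disk groups per host for the secondary (in-host) level.
# Only values stated in this post are listed; other policies are omitted.
MIN_DISK_GROUPS = {
    "RAID-1 (FTT=1)": 3,  # two data replicas plus one witness component
    "RAID-6": 6,
}


def policy_is_possible(policy: str) -> bool:
    """True if any allowed host configuration can satisfy the policy."""
    return MIN_DISK_GROUPS[policy] <= MAX_DISK_GROUPS_PER_HOST


def host_satisfies(policy: str, disk_groups: int) -> bool:
    """True if a host with this many disk groups can serve the policy."""
    if not 1 <= disk_groups <= MAX_DISK_GROUPS_PER_HOST:
        raise ValueError("disk_groups must be between 1 and 5")
    return disk_groups >= MIN_DISK_GROUPS[policy]


if __name__ == "__main__":
    print(host_satisfies("RAID-1 (FTT=1)", 3))  # True
    print(host_satisfies("RAID-1 (FTT=1)", 2))  # False
    print(policy_is_possible("RAID-6"))         # False
```

The last call shows why RAID-6 cannot be used at the nested level: its minimum of 6 disk groups exceeds the limit of 5, whatever the host configuration.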

Summary

The nested fault domain feature for 2 Node clusters is software-based and easy to manage through the familiar SPBM management pane. A simple policy change can ensure a higher level of resilience for the data stored at the edge, which means less time spent on troubleshooting and lower costs for companies with many remote offices and branch offices.
