Nested Fault Domains in vSAN
Introduction
The majority of workloads require resilience to drive and host failure in an HCI cluster. HCI powered by vSAN naturally addresses this requirement with a number of options for fault tolerance:
- RAID-1 mirroring that tolerates one, two, or three drive or host failures
- RAID-5 erasure coding that tolerates one drive or host failure
- RAID-6 erasure coding that tolerates two drive or host failures
The Fault Tolerance Method (mirroring or erasure coding) and the number of Failures to Tolerate or “FTT” rules in a storage policy determine how data is distributed across hosts to achieve compliance with the storage policy. For example, a virtual disk with a RAID-1 mirroring, FTT=1 storage policy assigned will have two mirrored copies of the virtual disk. The copies reside on separate hosts to allow the data to remain accessible if one drive or one host fails.
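To make the relationship between these options concrete, the sketch below tabulates the standard single-level figures: RAID-1 with FTT=n keeps n+1 copies and needs at least 2n+1 hosts (the extra hosts hold witness components), while RAID-5 and RAID-6 use 3+1 and 4+2 layouts. The function name and the returned dictionary are illustrative assumptions and are not part of any vSAN API.

```python
from fractions import Fraction


def protection_profile(method: str, ftt: int) -> dict:
    """Return minimum hosts and raw-capacity multiplier for a single-level policy.

    Illustrative helper only; it is not a vSAN API.
    """
    if method == "RAID-1" and ftt in (1, 2, 3):
        # n failures tolerated -> n+1 full copies, 2n+1 hosts for quorum
        return {"min_hosts": 2 * ftt + 1, "capacity_multiplier": Fraction(ftt + 1)}
    if method == "RAID-5" and ftt == 1:
        # 3 data + 1 parity
        return {"min_hosts": 4, "capacity_multiplier": Fraction(4, 3)}
    if method == "RAID-6" and ftt == 2:
        # 4 data + 2 parity
        return {"min_hosts": 6, "capacity_multiplier": Fraction(6, 4)}
    raise ValueError(f"unsupported combination: {method}, FTT={ftt}")


for method, ftt in [("RAID-1", 1), ("RAID-1", 2), ("RAID-1", 3), ("RAID-5", 1), ("RAID-6", 2)]:
    profile = protection_profile(method, ftt)
    print(method, f"FTT={ftt}", profile["min_hosts"], "hosts,",
          f"{float(profile['capacity_multiplier']):.2f}x raw capacity")
```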
A vSAN stretched cluster architecture takes this concept to the next level. Stretched clusters provide redundancy across sites and, optionally, within each site of the stretched cluster. As an example, a virtual disk (VMDK) is assigned a storage policy that mirrors (RAID-1) the virtual disk across sites and protects each copy of the virtual disk within a site using erasure coding (RAID-5). This is sometimes referred to as multi-level replication or “nested fault domains.”
Diagram 1. vSAN stretched cluster
This combination of vSAN stretched cluster architecture and storage policy options enables a higher level of resilience against failures within a site (drive and host failure) as well as the failure of an entire site.
Problem
Standard vSAN clusters provide a single level of replication across individual hosts or groups of hosts called vSAN fault domains. It is common for administrators to install vSAN hosts across server racks and configure the group of hosts in each rack as a fault domain. vSAN distributes the components that make up objects such as virtual disks across racks (fault domains) to enable resilience against rack failure. However, storage policies in standard vSAN clusters do not provide an option to enable additional redundancy within a fault domain. This is sometimes referred to as single-level replication.
A vSAN stretched cluster is capable of nested fault domains. However, vSAN stretched clusters are limited to three fault domains – two sites that each consist of multiple hosts for running virtual machine workloads and a third site (fault domain) that runs the vSAN Witness Host. There is no option with stretched clusters to configure more than two fault domains for running workloads.
Many organizations would benefit from higher levels of resilience offered by multi-level replication without implementing a stretched cluster configuration. This brings up the need for nested fault domains in standard (non-stretched) vSAN clusters.
Solution
IMPORTANT: The following solution is a new vSAN feature that is only supported through a Request for Product Qualification (RPQ) process. It should not be used in a production setting until this RPQ process is completed. Contact your VMware account team for details on completing an RPQ.
vSAN 6.7 Update 1 introduces the ability to create nested fault domains in a standard vSAN cluster. This enables vSAN objects to achieve higher levels of resilience against a variety of failure scenarios. The following diagram shows one possible scenario: three server racks, each containing three vSAN hosts, with a vSAN fault domain configured for each rack. Nested fault domains enable a vSAN object such as a virtual disk to remain accessible during simultaneous failures, such as loss of power to an entire rack (fault domain) and a host failure in another rack.
Diagram 2. Rack failure and host failure
A closer look at how vSAN creates and distributes components across fault domains and hosts within a fault domain shows how this resilience is achieved. Building on the example above, the next diagram shows one possible component distribution for a 100GB virtual disk protected by a storage policy where Fault Domain Failures to Tolerate (FDFTT)=1, Failures to Tolerate (FTT)=1 for hosts within each fault domain, and RAID-1 mirroring is used at both levels. The virtual disk object has two mirrored copies at the fault domain level. Each copy at the fault domain level has two copies at the host level. Witness components are distributed across other hosts in the cluster to achieve quorum, as needed.
Diagram 3. Nested fault domain (FD) component placement
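The placement described above can be modeled in a few lines to see why a rack failure combined with a host failure in another rack still leaves a data copy available. This is a deliberately simplified sketch: the host and rack names are made up, and it only checks whether a full data copy sits on a healthy host. It ignores witness components and the quorum voting that vSAN actually uses to decide object accessibility.

```python
# Hypothetical placement for the three-rack example: each fault-domain-level
# replica maps to the hosts holding its two host-level mirror copies.
placement = {
    "rack1": {"r1-h1", "r1-h2"},
    "rack2": {"r2-h1", "r2-h2"},
}


def data_copy_survives(placement, failed_racks=(), failed_hosts=()):
    """Return True if at least one full data copy is on a healthy host.

    Simplification: witness components and quorum votes are not modeled.
    """
    failed_racks = set(failed_racks)
    failed_hosts = set(failed_hosts)
    for rack, hosts in placement.items():
        if rack in failed_racks:
            continue  # every copy in a failed rack is unavailable
        if hosts - failed_hosts:
            return True  # a copy remains on a healthy host in this rack
    return False


# Rack 1 loses power while one host in rack 2 fails at the same time.
print(data_copy_survives(placement, failed_racks=["rack1"], failed_hosts=["r2-h1"]))
```

In this model the object only loses all data copies when every host holding a copy in the surviving rack also fails.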
Component count and placement can vary from the scenario shown above. As an example, objects larger than 255GB are mirrored at the fault domain level and then striped (RAID-0) and mirrored at the host level. This is because the maximum vSAN component size is 255GB. It is also possible to specify RAID-5 erasure coding at the host level. As you can see, the creation and proper placement of vSAN components can get complex. Fortunately, vSAN takes care of creating and placing the necessary components to achieve resilience. An administrator simply assigns a storage policy to the virtual machine and vSAN handles the rest.
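As a rough illustration of how the 255GB component limit drives component count, the sketch below computes a lower bound on the number of data components for an object mirrored at both levels. The function name is an assumption for illustration. Actual counts can be higher, because witness components are excluded here and vSAN may split components for other reasons.

```python
import math

MAX_COMPONENT_GB = 255  # maximum vSAN component size stated above


def min_data_components(object_gb, fd_copies=2, host_copies=2):
    """Lower bound on data components with RAID-1 at both levels.

    Each host-level copy is striped into pieces no larger than 255GB.
    Witness components are not counted.
    """
    stripes_per_copy = math.ceil(object_gb / MAX_COMPONENT_GB)
    return fd_copies * host_copies * stripes_per_copy


for size_gb in (100, 255, 256, 600):
    print(size_gb, "GB ->", min_data_components(size_gb), "data components (minimum)")
```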
It is important to understand that this higher level of resilience comes at a cost in the form of higher capacity consumption. The following formulas can be used to calculate maximum raw capacity consumption based on a nested fault domain storage policy:
FDFTT=1, FTT=1 with RAID-1 mirroring: Object size x 4
FDFTT=1, FTT=1 with RAID-5 erasure coding: Object size x 2.66
As an example, a 100GB virtual disk can consume up to 400GB of raw capacity if RAID-1 mirroring is specified at both the fault domain and host levels. RAID-5 erasure coding at the host level reduces this number to 266GB.
Note: The numbers above do not factor in potential reductions in raw capacity consumption as a result of thin-provisioning (vSAN Object Space Reservation set to 0%, which is the default setting) or enabling vSAN Deduplication and Compression.
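The formulas above can be expressed as a small calculation. The sketch below multiplies the two fault-domain-level copies by the host-level overhead (2 for RAID-1, 4/3 for RAID-5 with its 3+1 layout). The names are illustrative assumptions. Note that 2 x 4/3 is about 2.67, which the text above rounds down to 2.66 and 266GB.

```python
from fractions import Fraction

FD_LEVEL_COPIES = 2  # FDFTT=1 with RAID-1 mirroring across fault domains
HOST_LEVEL_OVERHEAD = {
    "RAID-1": Fraction(2),     # FTT=1 mirroring: two full copies
    "RAID-5": Fraction(4, 3),  # FTT=1 erasure coding: 3 data + 1 parity
}


def max_raw_capacity_gb(object_gb, host_level):
    """Maximum raw capacity before thin provisioning or space efficiency."""
    return float(FD_LEVEL_COPIES * HOST_LEVEL_OVERHEAD[host_level] * object_gb)


for method in ("RAID-1", "RAID-5"):
    print(method, f"{max_raw_capacity_gb(100, method):.2f} GB")
```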
Nested fault domains require at least nine hosts when using RAID-1 mirroring at the fault domain and host levels (minimum of three fault domains x minimum of three hosts per fault domain). 12 or more hosts are required when using RAID-5 erasure coding at the host level (minimum of three fault domains x minimum of four hosts per fault domain). The fault domain level only supports RAID-1 mirroring at this time. RAID-5 erasure coding is only supported at the host level. FDFTT=1 and FTT=1 are the only nested fault domain rules in a standard (non-stretched) vSAN cluster that are currently supported. Values higher than one for these rules are not supported when using nested fault domains.
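These minimums lend themselves to a quick pre-check before enabling the feature. The sketch below takes a mapping of fault domain name to host count and reports anything that falls short of the requirements stated above. The function name and the fault domain names are hypothetical, and the host counts would have to be gathered separately.

```python
MIN_FAULT_DOMAINS = 3
MIN_HOSTS_PER_FD = {"RAID-1": 3, "RAID-5": 4}  # host-level method -> hosts per fault domain


def check_cluster_layout(hosts_per_fd, host_level):
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if len(hosts_per_fd) < MIN_FAULT_DOMAINS:
        problems.append(
            f"{len(hosts_per_fd)} fault domains configured, {MIN_FAULT_DOMAINS} required"
        )
    needed = MIN_HOSTS_PER_FD[host_level]
    for fd, count in sorted(hosts_per_fd.items()):
        if count < needed:
            problems.append(f"{fd}: {count} hosts, {needed} required for {host_level}")
    return problems


layout = {"rack1": 3, "rack2": 3, "rack3": 3}  # hypothetical nine-host cluster
print(check_cluster_layout(layout, "RAID-1"))
print(check_cluster_layout(layout, "RAID-5"))
```

A nine-host layout like this one passes for RAID-1 at the host level but is flagged for RAID-5, which needs four hosts in each fault domain.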
As with any standard (non-stretched) vSAN cluster, there are specific network requirements. These can be found in the vSAN Planning and Deployment documentation.
Configuring Hosts for Nested Fault Domains
It is recommended to plan and create vSAN fault domains prior to enabling the nested fault domains feature. See Create a New Fault Domain in vSAN Cluster in the vSAN 6.7 documentation.
Nested fault domains are enabled using this command line advanced setting on each host:
esxcfg-advcfg -s 1 /VSAN/GenericNestedFD
This setting must be applied to every host in the cluster before creating a nested fault domain storage policy. A host reboot is not required for the setting to take effect. If this advanced setting is not set to 1 on every host in the cluster, an “Operation failed” error message will likely be generated when attempting to assign a nested fault domain policy to a virtual machine.
Use this command to get the current setting:
esxcfg-advcfg -g /VSAN/GenericNestedFD
This command disables nested fault domain support on the host:
esxcfg-advcfg -s 0 /VSAN/GenericNestedFD
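Because the setting must match on every host, it can help to generate the commands from one place and to verify the values read back. The sketch below only builds the command strings shown above and checks a mapping of host name to reported value. The host names are made up, and how the commands are run on each host and how the values are collected is left out and would depend on the environment.

```python
ADVANCED_OPTION = "/VSAN/GenericNestedFD"


def set_command(enabled):
    """Build the command that enables (1) or disables (0) nested fault domains."""
    return f"esxcfg-advcfg -s {1 if enabled else 0} {ADVANCED_OPTION}"


def get_command():
    """Build the command that reads the current value."""
    return f"esxcfg-advcfg -g {ADVANCED_OPTION}"


def hosts_not_enabled(reported_values):
    """Return hosts whose reported value is not 1.

    reported_values maps host name -> integer read back from that host.
    """
    return sorted(host for host, value in reported_values.items() if value != 1)


print(set_command(True))
print(get_command())
# Hypothetical values collected from three hosts.
print(hosts_not_enabled({"esx01": 1, "esx02": 0, "esx03": 1}))
```

An empty list from hosts_not_enabled would indicate that the cluster is ready for a nested fault domain storage policy.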
After the advanced setting has been enabled on all hosts in the cluster, a nested fault domain storage policy can be created and assigned to virtual machines. The “none – standard cluster with nested fault domains” option becomes available in the “Site disaster tolerance” drop-down menu when configuring a storage policy. “Fault domain failures to tolerate” should be set to “1 failure – RAID-1 (Mirroring)” and “Failures to tolerate” should be set to “1 failure – RAID-1 (Mirroring)” or “1 failure – RAID-5 (Erasure Coding).”
Diagram 4. Nested fault domain storage policy rules
It might take some time for vSAN to achieve compliance with the new policy after it is assigned, because vSAN typically has to create additional copies of the components and redistribute them across fault domains and hosts within each fault domain. This resync activity can be monitored in the vSphere Client by clicking on the cluster > Monitor > Resyncing Objects. Because objects with a nested fault domain policy consume more capacity, it is best to change policy assignments for a few virtual machines at a time and actively monitor the amount of free space on the vSAN datastore. Do not allow the vSAN datastore to run out of free space.
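One way to approach the "few virtual machines at a time" advice is to estimate the extra raw capacity each policy change needs and only pick virtual machines that fit within a safety margin. The sketch below is a planning aid under stated assumptions: the VM names, the reserve value, and the default multipliers (moving from a 2x RAID-1 FTT=1 policy to the 4x nested RAID-1 policy) are hypothetical, and a resync can temporarily consume more than this steady-state difference.

```python
def plan_batch(vm_disks_gb, free_gb, reserve_gb, old_multiplier=2.0, new_multiplier=4.0):
    """Pick VMs, in the given order, whose extra capacity fits above the reserve.

    vm_disks_gb maps VM name -> virtual disk size in GB.
    reserve_gb is free space that should remain untouched (hypothetical margin).
    """
    budget = free_gb - reserve_gb
    batch = []
    for vm, size_gb in vm_disks_gb.items():
        extra_gb = size_gb * (new_multiplier - old_multiplier)
        if extra_gb <= budget:
            batch.append(vm)
            budget -= extra_gb
    return batch


# Hypothetical inventory and datastore figures.
vms = {"vm-a": 100, "vm-b": 200, "vm-c": 50}
print(plan_batch(vms, free_gb=1000, reserve_gb=500))
```

Free space should still be checked in the vSphere Client after each batch, since the estimate does not account for thin provisioning or space efficiency features.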
Conclusion
vSAN nested fault domains is a new feature in vSAN 6.7 U1 that provides higher levels of resilience in a standard vSAN cluster. This is achieved by mirroring vSAN data across fault domains such as server racks and applying mirroring or erasure coding within each fault domain. This method of redundancy consumes additional raw capacity but offers protection against more complex failure scenarios such as the loss of one fault domain and a host in another fault domain simultaneously. The nested fault domains feature is currently supported only through an RPQ process. Contact your VMware account team for details on completing an RPQ.