Understanding the vSAN Witness Host
What is a vSAN witness host? Why do we need it? How does it work? What happens in a vSAN stretched cluster failure scenario without a witness host? We answer these frequently asked questions in the following paragraphs. We’ll also cover witness host configuration, the witness lifecycle, the deployment options, and the specifics that apply when deploying on VMC on AWS or VMware Cloud Foundation (VCF). Let’s dive into the details.
What’s a vSAN Witness host?
The vSAN witness host is an ESXi host that stores no VM data, only VM metadata: it holds the witness components for each VM object in a Stretched cluster or 2-node cluster configuration. A vSAN witness host can be deployed as a witness appliance, which is delivered as an OVA template and does not require an additional vSphere license. The witness appliance VM is treated as an ESXi host and is shown in blue in the vSphere user interface.
Why do we need a witness host and how does it work?
Whenever there’s a risk of an entire site failing, vSAN Stretched cluster is the recommended configuration to protect workloads from that kind of interruption. vSAN Stretched cluster resolves the site availability issue by building on the concept of a fault domain. With fault domains, vSAN treats each host as an independent failure domain and makes sure that each component of a given VM object is placed on a separate host for redundancy. Stretched cluster applies the same model at the site level and introduces the witness host as a tiebreaker in case of a failure. Each VM object should have at least one complete replica on each site to satisfy a policy of “Site disaster tolerance = 1”, which is the default policy for this type of cluster. Then, depending on the number of Failures to tolerate, each object might have additional replicas or parity components placed on different hosts or devices inside each site. The user has the following options for local protection: for a Stretched cluster, 1 to 3 failures (RAID-1, RAID-5, or RAID-6); for a 2-node cluster, 1 failure (RAID-1). See the availability levels in Fig. 2 below.
Fig.2. Availability levels in a stretched cluster storage policy
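To make the policy options above more concrete, here is a rough capacity sketch in Python. It is not taken from the vSAN documentation. It assumes the classic vSAN layouts (RAID-1 keeps failures + 1 full copies, RAID-5 is 3 data + 1 parity, RAID-6 is 4 data + 2 parity) and ignores the small witness components. The function names are hypothetical.

```python
from fractions import Fraction


def local_multiplier(raid: str, failures: int) -> Fraction:
    """Capacity multiplier inside ONE site for a given local protection level.

    Assumption: classic vSAN layouts (RAID-5 as 3+1, RAID-6 as 4+2).
    """
    if raid == "RAID-1" and failures in (1, 2, 3):
        return Fraction(failures + 1)      # failures + 1 full copies
    if raid == "RAID-5" and failures == 1:
        return Fraction(4, 3)              # 3 data + 1 parity
    if raid == "RAID-6" and failures == 2:
        return Fraction(3, 2)              # 4 data + 2 parity
    raise ValueError(f"unsupported combination: {raid}, failures={failures}")


def raw_capacity_gb(vmdk_gb: int, raid: str, failures: int,
                    dual_site_mirroring: bool = True) -> float:
    """Approximate raw capacity consumed across the whole stretched cluster."""
    sites = 2 if dual_site_mirroring else 1  # one full copy set per site
    return float(vmdk_gb * sites * local_multiplier(raid, failures))


if __name__ == "__main__":
    for raid, failures in [("RAID-1", 1), ("RAID-5", 1), ("RAID-6", 2)]:
        used = raw_capacity_gb(100, raid, failures)
        print(f"100 GB with {raid}, {failures} local failure(s): ~{used:.0f} GB raw")
```

Under these assumptions, a 100 GB object with dual site mirroring and local RAID-1 for one failure consumes roughly 400 GB of raw capacity, which is why parity-based local protection is often attractive in stretched clusters.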
When a policy with Site Disaster Tolerance set to Dual Site Mirroring is assigned to an object, vSAN commits every write to that object synchronously to both the preferred and non-preferred sites, while reads are served locally from the site where the VM resides.
So, let’s get back to the main question – Why do we need a witness host? As we mentioned, it will serve as a tiebreaker in a split-brain scenario, where the witness host will determine which site is the authoritative one.
What happens in case of a site failure and no witness host?
A split-brain scenario may happen when an actual fault occurs within one of the sites and the sites cannot decide independently which one is the surviving one. This may leave an application active in both sites at once, which might break it. vSAN removes this possibility by having the witness host provide additional votes, so that it forms a quorum with the remaining site and elects the healthy site where the VM should be vMotioned and restarted.
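The tiebreaker idea can be illustrated with a small majority-vote model. This is a simplified sketch, not vSAN’s actual vote assignment: it assumes one vote each for the preferred site, the secondary site, and the witness, and treats a partition as having quorum only when it holds more than half of all votes. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoteHolder:
    name: str
    votes: int


def has_quorum(holders: list[VoteHolder], reachable: set[str]) -> bool:
    """True if the reachable holders own strictly more than half of all votes."""
    total = sum(h.votes for h in holders)
    alive = sum(h.votes for h in holders if h.name in reachable)
    return alive * 2 > total


# Assumed vote layout: one vote per site plus one for the witness.
with_witness = [VoteHolder("preferred", 1), VoteHolder("secondary", 1),
                VoteHolder("witness", 1)]
without_witness = [VoteHolder("preferred", 1), VoteHolder("secondary", 1)]

if __name__ == "__main__":
    # Inter-site link fails; the witness can still reach the preferred site.
    print(has_quorum(with_witness, {"preferred", "witness"}))  # preferred side
    print(has_quorum(with_witness, {"secondary"}))             # isolated side
    # Same failure with no witness: neither side holds a majority.
    print(has_quorum(without_witness, {"preferred"}))
    print(has_quorum(without_witness, {"secondary"}))
```

In this model the side that can still reach the witness keeps a majority and stays authoritative, while an isolated site does not. Without the witness, neither half can reach a majority, which is exactly the tie the witness exists to break.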
Witness host or appliance recommendations
Now that we know how crucial the role of the witness host is, we can look at some of the most important recommendations concerning its configuration and deployment. For example, each witness appliance or witness host should be deployed outside of the Stretched cluster or the 2-Node cluster, but it should be part of the same vCenter. The general recommendation is to place the vSAN Witness Host in a different datacenter.
Fig.4. vSAN Witness Host in a separate datacenter
Another golden rule is that the witness host should run the same version and build as the vSAN data nodes: although the vSAN Witness Host is not part of the vSphere Cluster, it contributes disks to the vSAN Cluster.
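A check like this is easy to script once you have an inventory of versions and builds. The sketch below only shows the comparison logic. It assumes you already collected each host’s version and build string by whatever means you prefer; the host names and build numbers are made-up placeholders, and no vSphere API is called.

```python
from typing import NamedTuple


class HostRelease(NamedTuple):
    name: str
    version: str
    build: str


def mismatched_hosts(witness: HostRelease,
                     data_nodes: list[HostRelease]) -> list[str]:
    """Return the names of data nodes whose version or build differs from the witness."""
    target = (witness.version, witness.build)
    return [n.name for n in data_nodes if (n.version, n.build) != target]


if __name__ == "__main__":
    # Placeholder inventory; values are illustrative, not real build numbers.
    witness = HostRelease("witness-01", "7.0.2", "1000001")
    nodes = [
        HostRelease("esx-a1", "7.0.2", "1000001"),
        HostRelease("esx-b1", "7.0.2", "1000002"),  # different build
    ]
    print(mismatched_hosts(witness, nodes))
```

An empty result means the witness and all data nodes agree on version and build.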
Keep in mind that the physical witness host doesn’t store any VM data. If you need those hardware resources for VM data, you can deploy a vSAN witness appliance instead of using a dedicated physical ESXi host. The deployment process is the same as deploying a VM from a template. The witness appliance is simply a pre-configured virtual machine that runs ESXi and is distributed as an OVA template. Before you start the deployment, consider the number of VMs you’ll need to run in your stretched or 2-node cluster, because the size of the witness determines the maximum number of VMs you can run in one cluster. In vSAN 7 Update 2, VMware introduced a new shared witness option for the 2-node cluster configuration. It lets you share a single vSAN witness host or appliance among up to 64 2-node clusters. This feature led to the creation of a new “extra-large” size option. Take a look at the quick sizing guide table in this blog post on shared witness.
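As a quick planning aid for the shared witness limit, the arithmetic can be sketched as follows. It only models the limit of 64 2-node clusters per witness stated above; the per-size component limits from the sizing guide are not modeled, so treat the result as a lower bound. The function name is hypothetical.

```python
import math

MAX_CLUSTERS_PER_SHARED_WITNESS = 64  # limit stated for vSAN 7 Update 2


def shared_witnesses_needed(two_node_clusters: int) -> int:
    """Minimum number of shared witnesses for a given count of 2-node clusters."""
    if two_node_clusters < 0:
        raise ValueError("cluster count cannot be negative")
    return math.ceil(two_node_clusters / MAX_CLUSTERS_PER_SHARED_WITNESS)


if __name__ == "__main__":
    for count in (10, 64, 65, 200):
        print(f"{count} clusters: at least {shared_witnesses_needed(count)} witness(es)")
```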
The vSAN Witness Host is a stateless appliance that can be easily replaced in the vSphere client. As with most stateless apps, there is no need to implement data protection and recovery plans for it. See this section of the 2-node cluster guide for more details.
VCF stretched cluster witness
The VCF stretched cluster is no exception: it also requires a witness host. The witness should not share infrastructure dependencies with either availability zone (AZ), and deploying the witness into either availability zone is not supported.
Note that there’s a list of networks that must be stretched between the two availability zones where the hosts will be located. The following is the list of the witness networks to be stretched:
- ID of witness host
- FQDN of witness host
- vSAN subnet CIDR of witness host
vSAN stretched cluster witness on VMC on AWS
The Stretched cluster concept remains the same when configured on VMC on AWS: it protects against the loss of a single AZ by stretching the vSphere cluster across two Availability Zones within a single region. The witness host is deployed in a third availability zone and again provides a quorum in case of a split-brain scenario. If the witness fails, all VMs remain running in place, which also applies to a vSAN stretched cluster running on-premises. What’s specific to a Stretched cluster running on VMC on AWS is that the automated support system deploys a new witness appliance into the cluster if there’s an unexpected witness host failure. vSAN rebuilds any missing cross-site witness components as soon as the new healthy witness becomes part of the stretched cluster configuration.
The witness host or appliance is a vital part of the vSAN Stretched cluster and of the vSAN 2-node cluster. It stores all the metadata required in case of a site failure and serves as a tiebreaker. On-premises, at the Edge, or in the cloud, it can be easily deployed as part of the vSAN stretched cluster configuration to help maintain higher availability of your data.
- vSAN Stretched cluster guide
- vSAN Operations guide
- vSAN Interactive infographic
- Shared Witness for 2-Node vSAN Deployments
- Upgrading vSAN 2-node Clusters with a Shared Witness from 7U1 to 7U2