Achieve higher levels of resiliency with VMware Virtual Watchdog in a Clustered Setup

VMware virtual watchdog, introduced in vSphere 7, helps the VMs in a clustered setup (RHEL HA, MS SQL Failover Cluster Instances, etc.) recover from application crashes and operating system (OS) faults. It does so by restarting the VM when the application or guest operating system fails to reset the watchdog timer, which indicates a crash or hang. After the restart, it also informs the guest operating system that a crash caused the restart.

Overview

My colleague Niels Hagoort has written a fantastic blog describing VMware virtual watchdog in detail, and this blog expands on how to enable the feature. Here, we present a step-by-step procedure for adding and activating the virtual watchdog device in Linux and Windows-based VMs.

Consider a three-node (VM) cluster running any application. If one of the nodes hangs, the number of available nodes drops to two. In an unmonitored cluster, availability might drop even further if maintenance, patching, or an underlying system error takes another node out of service. Therefore, ensuring that nodes recover from application or OS-related faults is critical in a clustered setup.

We recommend adding a VMware virtual watchdog device to every node (VM) in a clustered setup.

The virtual watchdog device is exposed to the guest operating system (GOS) through BIOS/EFI ACPI tables.

Prerequisites for enabling VMware virtual watchdog

VM hardware version 17 or later.
The VM must be powered off to add the VMware virtual watchdog device.
The following guest operating systems are supported:

Windows Server 2003 onwards
Ubuntu 18.04 onwards, RHEL 7.6/CentOS 7.6 onwards (kernel must be 4.9+)

The watchdog package must be installed on Linux VMs.
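The kernel requirement above can be checked from inside the guest. The following is a minimal sketch, not an official tool: kernel_at_least is a hypothetical helper that compares only the major and minor numbers reported by uname -r. Distribution kernels that carry backported drivers may report an older version number, so treat a negative result as a hint to investigate rather than a verdict.

```shell
#!/bin/sh
# Sketch: compare the running kernel against the 4.9 minimum listed above.
# kernel_at_least is a hypothetical helper, not part of any package.

# Succeeds if kernel release $2 (e.g. "4.18.0-305.el8") is at least major.minor $1.
kernel_at_least() {
    required=$1
    release=$2
    req_major=${required%%.*}
    req_minor=${required#*.}
    major=${release%%.*}
    rest=${release#*.}
    minor=${rest%%[!0-9]*}   # keep only the leading digits of the minor part
    [ "$major" -gt "$req_major" ] ||
        { [ "$major" -eq "$req_major" ] && [ "$minor" -ge "$req_minor" ]; }
}

if kernel_at_least 4.9 "$(uname -r)"; then
    echo "kernel $(uname -r): meets the 4.9 minimum"
else
    echo "kernel $(uname -r): reports a version older than 4.9"
fi
```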

Adding VMware virtual watchdog to a Linux VM

Step 1: Add the virtual watchdog device from the Edit settings wizard. The device also supports a “Start with BIOS/EFI” mode, which starts the watchdog timer as soon as the VM is powered on. We recommend leaving this option unselected for more control over the watchdog timer: it can then be started after the boot process and application configuration are complete.

Step 2: Log in to the VM and install the watchdog package

For RHEL/CentOS, run the yum install watchdog command to start the installation.

For Ubuntu/Debian, run the apt-get install watchdog command to start the installation.
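When the same procedure has to run on a mixed set of nodes, the choice between the two commands can be scripted. This is a minimal sketch under stated assumptions: install_cmd_for is a hypothetical helper, the distribution is identified through the ID field of /etc/os-release, and the -y flag is added so the command does not prompt. The script only prints the command; it does not install anything.

```shell
#!/bin/sh
# Sketch: choose the install command from the distribution ID.
# install_cmd_for is a hypothetical helper; it only prints a command.

install_cmd_for() {
    case "$1" in
        rhel|centos)   echo "yum install -y watchdog" ;;
        ubuntu|debian) echo "apt-get install -y watchdog" ;;
        *)             echo "unsupported distribution: $1" >&2; return 1 ;;
    esac
}

# /etc/os-release defines ID (for example "rhel" or "ubuntu").
if [ -r /etc/os-release ]; then
    distro_id=$(. /etc/os-release && echo "${ID:-unknown}")
    install_cmd_for "$distro_id" || true
fi
```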

Step 3: After installation, run the following commands to start, enable, and check the status of the watchdog service:

# systemctl start watchdog
# systemctl enable watchdog
# systemctl status watchdog

Step 4: Verify that the watchdog module (wdat_wdt) has been successfully loaded by running the following command:

# lsmod | grep wdat_wdt
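For scripted checks across several nodes, the same verification can be wrapped so that it reports a clear result. The sketch below assumes the module name wdat_wdt and the device path /dev/watchdog used in this post; module_in_list is a hypothetical helper that matches the module name exactly instead of as a substring.

```shell
#!/bin/sh
# Sketch: confirm the watchdog driver is listed and the device node exists.

# module_in_list NAME: reads lsmod-style output on stdin, succeeds if NAME is listed.
module_in_list() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

if lsmod 2>/dev/null | module_in_list wdat_wdt; then
    echo "wdat_wdt module: loaded"
else
    echo "wdat_wdt module: not listed by lsmod"
fi

# The watchdog service talks to the driver through this character device.
if [ -c /dev/watchdog ]; then
    echo "/dev/watchdog: present"
else
    echo "/dev/watchdog: missing"
fi
```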

Step 5: Open the watchdog.conf file (located in the /etc directory) and add the following two lines:

watchdog-device = /dev/watchdog
watchdog-timeout = 10

The watchdog-timeout setting specifies the timeout of the watchdog timer, in seconds. If the application or OS does not ping the timer within this window, the watchdog treats it as an anomaly and restarts the VM.

After editing the watchdog.conf file, restart the watchdog service by running the following command:

# systemctl restart watchdog
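If the configuration change in Step 5 needs to be applied to many nodes, it can be made repeatable. The following is a minimal sketch, not part of the watchdog package: set_conf is a hypothetical helper that removes any active assignment of a key and appends the new one, so running it twice does not produce duplicate lines. Commented-out lines are left untouched.

```shell
#!/bin/sh
# Sketch: set a key in a "key = value" file such as /etc/watchdog.conf.
# set_conf is a hypothetical helper, not shipped with the watchdog package.

# set_conf FILE KEY VALUE
set_conf() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Drop any active (uncommented) assignment of the key, keep everything else.
    grep -v "^[[:space:]]*$key[[:space:]]*=" "$file" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root), followed by the service restart shown above:
#   set_conf /etc/watchdog.conf watchdog-device /dev/watchdog
#   set_conf /etc/watchdog.conf watchdog-timeout 10
```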

Step 6: In the Edit settings wizard of the VM, verify that the virtual watchdog device is shown as running.

Installation of virtual watchdog via the VM’s configuration file (.vmx file)

To add the virtual watchdog timer (VWDT), add vwdt.present = "TRUE" to the configuration file of the VM.
To remove it, set vwdt.present = "FALSE", or remove the vwdt.present setting from the configuration file of the VM.

The configuration settings above only add the virtual watchdog device to the VM. The steps for installing and configuring the watchdog in the guest operating system must still be performed inside the VM.
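To audit which VMs already have the device configured, the setting can be read back from the configuration file. This is a minimal sketch that assumes the setting is written on a single line in the usual key = "value" form; vwdt_enabled is a hypothetical helper and the path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Sketch: report whether a .vmx file enables the virtual watchdog device.
# vwdt_enabled is a hypothetical helper; it only reads the file.

# vwdt_enabled FILE: succeeds if FILE contains vwdt.present = "TRUE".
vwdt_enabled() {
    grep -Eiq '^[[:space:]]*vwdt\.present[[:space:]]*=[[:space:]]*"TRUE"' "$1"
}

# Usage (placeholder path):
#   if vwdt_enabled /path/to/vm.vmx; then echo "virtual watchdog present"; fi
```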

Adding VMware virtual watchdog to a Windows VM

Windows Server does not require any extra software installation or configuration to enable the virtual watchdog. The virtual watchdog can be added and enabled from the Edit settings wizard of the VM.

Logging

Whenever the virtual watchdog restarts a VM, the VM’s vmware.log file records the virtual watchdog activity with a “VWDT:” prefix.
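Filtering on that prefix is a quick way to find watchdog activity after an unexpected restart. The sketch below assumes only the “VWDT:” marker described above and makes no assumption about the rest of the log line; vwdt_events is a hypothetical helper and the path in the usage comment is a placeholder for the VM's directory on the datastore.

```shell
#!/bin/sh
# Sketch: list the virtual watchdog entries in a vmware.log file.
# vwdt_events is a hypothetical helper; it only reads the log.

# vwdt_events FILE: print every line of FILE that contains the "VWDT:" marker.
vwdt_events() {
    grep -F 'VWDT:' "$1"
}

# Usage (placeholder path):
#   vwdt_events /path/to/vm-directory/vmware.log
```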

Conclusion

Virtual watchdog comes in quite handy, especially during a maintenance window when the infrastructure experiences a planned reduction in availability. During such a window, an application crash or OS-related fault in the remaining VMs could be catastrophic. Customers should take advantage of the virtual watchdog feature to increase resilience to failure by ensuring that a hung or crashed VM is restarted once the watchdog timeout expires.