Automated Recovery for Workloads with Persistent Memory (PMEM)
Persistent Memory (PMEM), delivered through Non-Volatile DIMMs (NVDIMMs), combines the speed of system memory with the durability of persistent storage. PMEM is byte-addressable and sits on the memory bus alongside 'normal' memory modules. The 'persistent' part means that stored data survives power cycles of the host.
PMEM delivers strong performance for a wide variety of workloads, such as SAP HANA deployments. Previous vSphere versions already include extensive support for PMEM. However, one constraint was that PMEM was not supported with vSphere High Availability (HA), which recovers workloads when a host in a cluster fails.
Automated Recovery for PMEM Workloads
That has changed with vSphere 7 Update 2. Workloads that use PMEM through NVDIMM devices can now be protected by vSphere HA, which matters because workloads using PMEM are typically mission-critical applications.
By default, vSphere HA will not attempt to register and power on the virtual machine on a surviving host in the cluster. The vSphere HA option for PMEM needs to be explicitly enabled on a per-virtual-machine basis. When enabled, vSphere HA automatically restarts the virtual machine on another host that is equipped with PMEM, with a new, empty NVDIMM. The NVDIMM is empty because, in the event of host failure, the PMEM data on the failed host's NVDIMM cannot be recovered on another host.
vSphere HA admission control makes sure that sufficient PMEM capacity is available for failover. It's also good to note that vSphere HA is not supported with vPMEMDisk, an option where PMEM is used as a datastore. That option brings performance benefits, but it does not use PMEM in a byte-addressable way. More info is found here.
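To make the behavior described above concrete, here is a minimal Python sketch of the decision logic. It is a toy model, not how vSphere HA is implemented: the names Host, Vm, pick_failover_host, capacity_reserved and restart_on are all hypothetical, and the greedy placement is an assumption made only for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_pmem_mb: int  # unreserved PMEM capacity on this host (hypothetical field)


@dataclass
class Vm:
    name: str
    nvdimm_mb: int               # combined size of the VM's NVDIMM devices
    pmem_failover: bool = False  # per-VM opt-in; disabled by default


def pick_failover_host(vm: Vm, survivors: List[Host]) -> Optional[Host]:
    """Return a surviving host that can hold the VM's NVDIMMs, or None."""
    if not vm.pmem_failover:
        return None  # default behavior: HA does not restart the VM
    candidates = [h for h in survivors if h.free_pmem_mb >= vm.nvdimm_mb]
    if not candidates:
        return None
    # Assumption: prefer the host with the most free PMEM.
    return max(candidates, key=lambda h: h.free_pmem_mb)


def capacity_reserved(vms: List[Vm], survivors: List[Host]) -> bool:
    """Greedy stand-in for admission control: can every opted-in VM be placed?"""
    spare = [replace(h) for h in survivors]  # work on copies, keep inputs intact
    protected = [v for v in vms if v.pmem_failover]
    for vm in sorted(protected, key=lambda v: v.nvdimm_mb, reverse=True):
        host = pick_failover_host(vm, spare)
        if host is None:
            return False
        host.free_pmem_mb -= vm.nvdimm_mb
    return True


def restart_on(vm: Vm, host: Host) -> dict:
    """Model the restart: capacity is consumed and the NVDIMM starts empty."""
    host.free_pmem_mb -= vm.nvdimm_mb
    return {"vm": vm.name, "host": host.name, "nvdimm_contents": "empty"}
```

In this model, a virtual machine without the opt-in is never placed, and a restarted virtual machine always reports an empty NVDIMM, mirroring the fact that the data on the failed host cannot be carried over.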
How to Configure
To enable vSphere HA support for PMEM, virtual machines need to run VM hardware version 19. When an NVDIMM device is added or reconfigured, a new option is provided: 'Allow failover on another host for all NVDIMM devices'.
Once enabled, the vSphere Client states: 'On host failure, HA will restart virtual machine on another host with new, empty NVDIMMs.'
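The requirements above can be summarized as a small pre-flight check. The Python sketch below is only an illustration: failover_blockers and its parameters are hypothetical names, and the inputs are plain values you would gather yourself rather than the result of any vSphere API call. It assumes the hardware version is given in the "vmx-NN" form.

```python
import re
from typing import List

MIN_HW_VERSION = 19  # hardware version required for HA support with PMEM


def hw_version_number(version: str) -> int:
    """Extract the number from a hardware version string such as 'vmx-19'."""
    match = re.fullmatch(r"vmx-(\d+)", version)
    if match is None:
        raise ValueError(f"unexpected hardware version format: {version!r}")
    return int(match.group(1))


def failover_blockers(hw_version: str, nvdimm_count: int,
                      failover_enabled: bool) -> List[str]:
    """List the reasons why HA would not restart this VM with an empty NVDIMM."""
    blockers = []
    if hw_version_number(hw_version) < MIN_HW_VERSION:
        blockers.append(f"VM hardware version must be {MIN_HW_VERSION} or later")
    if nvdimm_count < 1:
        blockers.append("VM has no NVDIMM device (vPMEMDisk is not covered)")
    if not failover_enabled:
        blockers.append("'Allow failover on another host for all NVDIMM "
                        "devices' is not enabled")
    return blockers


# Example: an older VM that has not opted in yet.
for reason in failover_blockers("vmx-17", nvdimm_count=1, failover_enabled=False):
    print(reason)
```

An empty list means none of the conditions mentioned in this article stand in the way; it says nothing about cluster-level settings such as admission control.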