Host Upgrade Time Challenges
During an ESXi host upgrade, a large amount of time is spent live-migrating virtual machines (VMs) from the host being upgraded to another host. Once the upgrade is done, the evacuated VMs can be load balanced and migrated back to the host as needed. The total time for this operation can range from several minutes to several hours, depending on workload size, network bandwidth, and the spare capacity available for migrating the VMs off the host; a lack of either network bandwidth or spare capacity slows the evacuation. Suspending VMs to disk avoids the migration, but the time taken is still orders of magnitude greater than keeping the VMs in memory while the upgrade is performed. Powering off VMs is the fastest option, but the VM state is then lost, which is typically not what customers want.
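To get a feel for why evacuation can take this long, a back-of-the-envelope estimate helps. The sketch below is an illustration only: the function name, the efficiency factor, and the example numbers are assumptions, and it models a single pass over VM memory, ignoring re-copying of dirtied pages, limits on concurrent migrations, and the migration back afterwards. Treat the result as an optimistic lower bound.

```python
def estimate_evacuation_seconds(vm_memory_gib, link_gbps, efficiency=0.8):
    """Rough lower bound for copying all VM memory off a host once.

    vm_memory_gib: iterable of per-VM memory sizes in GiB (assumed values)
    link_gbps:     migration network speed in Gbit/s
    efficiency:    assumed fraction of the link that is usable (hypothetical)
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    total_bits = sum(vm_memory_gib) * 1024**3 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return total_bits / usable_bits_per_second


if __name__ == "__main__":
    # Example: 20 VMs with 64 GiB each over a 10 Gbit/s link.
    seconds = estimate_evacuation_seconds([64] * 20, link_gbps=10)
    print(f"about {seconds / 60:.0f} minutes for a single pass")
```

With these assumed numbers, one pass over the memory of 20 VMs with 64 GiB each takes roughly 23 minutes on a 10 Gbit/s link, before any real-world overhead is added.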
Helping to Reduce Host Upgrade Time
Suspend-to-Memory (STM) is a new option for desired state image management in vSphere Lifecycle Manager (vLCM), introduced with vSphere 7 Update 2. It suspends VM state to local host memory during upgrades and remediation. Because the VM state is held in system memory, the ESXi host cannot be power-cycled. That is why Suspend-to-Memory depends on Quick Boot, a feature released with vSphere 6.7 Update 1 that restarts ESXi without a full hardware reboot. Once the host is upgraded and the VMkernel has been restarted using Quick Boot, the VMs are resumed and continue to run as they did before.
Note: Quick Boot requires a supported host. Check KB Article 52477 for host compatibility. Quick Boot support is expanded to more systems with each ESXi release, including patch releases.
The default VM power state in the cluster image remediation settings is ‘Do not change power state’. In that scenario, when a host is put into maintenance mode for an upgrade, vSphere live-migrates workloads off the host if vMotion is configured. As discussed above, live migrations can take considerable time, depending strongly on workload characteristics and sizing.
Customers who decide that Suspend-to-Memory is the best option need to configure it explicitly in the Lifecycle Manager remediation settings for images. Suspend-to-Memory is only applicable to vLCM-enabled clusters that use desired state image management. When Suspend-to-Memory is enabled and host remediation starts, a check verifies that adequate memory is available to suspend the VM states in memory. If not, the Suspend-to-Memory operation fails and the host is not placed in maintenance mode.
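The pre-check described above can be pictured with a small sketch. This is not how vLCM implements it; the exact rule is not documented here. The sketch assumes a simple comparison of free host memory against the summed VM memory, plus the Quick Boot dependency mentioned earlier. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class HostState:
    """Hypothetical snapshot of a host before remediation."""
    quick_boot_supported: bool
    free_memory_gib: float
    vm_memory_gib: Sequence[float]  # memory of each running VM


def can_suspend_to_memory(host: HostState, headroom_gib: float = 0.0) -> Tuple[bool, str]:
    """Return (ok, reason); mirrors the idea of the pre-check, not its real logic."""
    # Suspend-to-Memory relies on Quick Boot, so an unsupported host fails early.
    if not host.quick_boot_supported:
        return False, "Quick Boot is not supported on this host"
    # Assumption: the suspended state needs about as much memory as the VMs use.
    required = sum(host.vm_memory_gib) + headroom_gib
    if host.free_memory_gib < required:
        return False, (
            f"need {required:.0f} GiB free, have {host.free_memory_gib:.0f} GiB"
        )
    return True, "ok"


if __name__ == "__main__":
    host = HostState(quick_boot_supported=True, free_memory_gib=96,
                     vm_memory_gib=[64, 64])
    ok, reason = can_suspend_to_memory(host)
    # On failure, remediation would stop before entering maintenance mode.
    print("proceed" if ok else f"abort: {reason}")
```

Returning a reason alongside the boolean keeps the failure actionable: an operator can see whether the host lacks Quick Boot support or simply needs more free memory.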
Good use cases for Suspend-to-Memory are environments where vMotion is not possible or not feasible, even though the vMotion logic has been greatly improved in vSphere 7. If customers benefit from avoiding workload evacuation altogether during host remediation (for example, because of large virtual machine footprints), and workloads can be ‘offline’ during ESXi host upgrades, it is a perfect option to reduce the overall ESXi host upgrade time.
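The trade-offs above can be condensed into a small decision helper. It is a rule-of-thumb sketch, not product behavior: the enum members only mirror the power state options discussed in this article, and the function name, parameters, and ordering of the rules are assumptions made for illustration.

```python
from enum import Enum


class PowerAction(Enum):
    """Power state options discussed in this article (labels are illustrative)."""
    DO_NOT_CHANGE = "Do not change power state"
    SUSPEND_TO_DISK = "Suspend to disk"
    SUSPEND_TO_MEMORY = "Suspend to memory"
    POWER_OFF = "Power off"


def suggest_power_action(vmotion_feasible: bool,
                         downtime_acceptable: bool,
                         quick_boot_supported: bool,
                         avoid_evacuation: bool = False) -> PowerAction:
    """Suggest a remediation power action from a few yes/no answers."""
    if not downtime_acceptable:
        # Workloads must stay online, so live migration is the only fit.
        return PowerAction.DO_NOT_CHANGE
    if quick_boot_supported and (avoid_evacuation or not vmotion_feasible):
        # Downtime is fine and evacuation is unwanted or impossible.
        return PowerAction.SUSPEND_TO_MEMORY
    if vmotion_feasible:
        return PowerAction.DO_NOT_CHANGE
    # No vMotion and no Quick Boot: keep VM state, accept the slower path.
    return PowerAction.SUSPEND_TO_DISK


if __name__ == "__main__":
    choice = suggest_power_action(vmotion_feasible=False,
                                  downtime_acceptable=True,
                                  quick_boot_supported=True)
    print(choice.value)
```

Power off is listed for completeness but never suggested, because the helper assumes VM state should be preserved.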
Suspend-to-Memory supports vSAN and works with vSphere with Tanzu and NSX-T.
Check out this quick demo that shows what Suspend-to-Memory looks like in the vSphere Client, and how to configure it.