Tips for Patching VMware vSphere & Cloud Infrastructure

Introduction

This is a collection of techniques and ideas that have been used to make patching and upgrades succeed. They are not only technical; they also cover people and process improvements. Patching is often as much a process and communications issue as it is a technical task!

Disclaimer

This document is intended to provide general guidance for organizations that are considering VMware solutions. The information contained in this document is for educational and informational purposes only. This document is not intended to provide advice and is provided “AS IS.”  VMware makes no claims, promises, or guarantees about the accuracy, completeness, or adequacy of the information contained herein. Organizations should engage appropriate legal, business, technical, and audit expertise within their specific organization for review of requirements and effectiveness of implementations.

Tips

Virtualization and cloud administrators understand that patching vCenter Server doesn't impact workloads, and that vMotion can move workloads seamlessly so that ESXi can be patched. Other people in your organization may not know this, though. Taking time to explain this to change managers, risk assessors, managers, and so on can pay dividends in having patching and upgrades approved.

Establish maintenance windows for virtualization infrastructure components so that downstream application administrators and others know when you will be doing maintenance and when their workloads may be moved to other hosts with vMotion.

Learn the ITIL concepts of a standard, normal, and emergency change, and use them where appropriate. For example, a standard change is a routine change. You likely make all sorts of adjustments to environments that qualify as a standard change, such as deploying a new VM, because they don’t impact operations. Emergency changes are things that have to be done as soon as possible, such as applying a patch for a critical security advisory. Normal changes are everything else and are often scheduled inside a regular maintenance window. Using these terms helps other parts of the organization understand the work you are doing and its importance. Do not be afraid to use the concept of emergency changes for critical security updates, as it helps organizations align, assess the risk collectively, and prioritize the work.

Ensure that the vCenter Server Appliance (VCSA) root & administrator@vsphere.local account passwords are stored correctly and that neither account is locked out. By default, the VCSA root account password expires after 90 days, which can leave you locked out and may be an unwanted surprise if you need the account in an emergency. Prior to patching, verify that these accounts work correctly, recovering the passwords if needed (which sometimes takes a restart of vCenter Server). Change them after the work is done.
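The 90-day default mentioned above lends itself to a simple pre-patching reminder. The following is a minimal sketch, not a VMware tool: the function name days_until_expiry and the idea of recording the last password change date yourself are assumptions made for illustration, and the 90-day figure should be replaced with whatever your appliance is actually configured to use.

```python
from datetime import date, timedelta


def days_until_expiry(last_changed, today, max_age_days=90):
    """Return the number of days left before a password ages out.

    last_changed and today are datetime.date values. max_age_days defaults
    to the 90 days mentioned in the text; adjust it to match your own policy.
    A negative result means the password is already past its maximum age.
    """
    expires_on = last_changed + timedelta(days=max_age_days)
    return (expires_on - today).days


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    remaining = days_until_expiry(date(2024, 1, 1), date(2024, 3, 1))
    print(f"root password ages out in {remaining} day(s)")
```

Running a check like this a week or two before a maintenance window leaves time to reset the password calmly rather than during the change itself.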

Ensure that time settings are correct on all components in your environment, including ESXi, vCenter Server, SDDC Manager, other appliances, storage arrays, and network switches. Many issues on systems can be traced to incorrect time synchronization. The maintainers of the open-source NTP software suggest configuring four NTP servers (but never two – if one is wrong you’ll never know which one!).
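The guidance on NTP server counts can be turned into a small sanity check for whatever list of servers you have configured. This is a sketch under assumptions: the function name check_ntp_server_count is hypothetical, and how you collect the server list from each component is left to you.

```python
def check_ntp_server_count(servers):
    """Return a list of warnings about a configured NTP server list.

    servers is an iterable of hostnames or IP addresses as strings.
    Duplicates and blank entries are ignored before counting.
    """
    unique = {s.strip().lower() for s in servers if s.strip()}
    count = len(unique)
    warnings = []
    if count == 0:
        warnings.append("no NTP servers configured")
    elif count == 2:
        # With two sources that disagree there is no way to tell which is wrong.
        warnings.append("exactly two NTP servers configured; use four instead")
    elif count < 4:
        warnings.append(f"{count} NTP server(s) configured; four are suggested")
    return warnings


if __name__ == "__main__":
    # Hypothetical server names, for illustration only.
    for warning in check_ntp_server_count(["ntp1.example.test", "ntp2.example.test"]):
        print("WARNING:", warning)
```

The check deduplicates entries first because the same server listed twice under different capitalization still counts as one time source.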

Ensure that there is an A (forward) and PTR (reverse) record for vCenter Server, ESXi, and other appliances in DNS, and that they resolve to each other. This means that the A record resolves to the correct IP address, and that IP address has a PTR record that resolves back to the same fully qualified name. This may seem like a basic check, but it only takes a few seconds and sometimes it reveals unwanted (and unsupported) changes. PTR records are required for vCenter Server and other appliances; omitting them is not an accepted security practice.
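The forward and reverse check described above can be scripted so that it runs against every component before a maintenance window. The sketch below uses only the Python standard library; the function name check_forward_reverse and the example hostname are assumptions for illustration. Note that socket.gethostbyname uses the system resolver (including any local hosts file) and returns IPv4 addresses only, so treat the result as a quick check rather than a full DNS audit.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Check that hostname -> IP -> hostname round-trips.

    Returns (ok, detail). The forward and reverse parameters default to the
    standard library resolvers and can be swapped out for testing.
    """
    try:
        ip = forward(hostname)
    except OSError as exc:
        return False, f"forward lookup failed for {hostname}: {exc}"
    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list).
        ptr_name = reverse(ip)[0]
    except OSError as exc:
        return False, f"{hostname} -> {ip}, but reverse lookup failed: {exc}"
    # Compare case-insensitively and ignore a trailing dot.
    if ptr_name.rstrip(".").lower() != hostname.rstrip(".").lower():
        return False, f"{hostname} -> {ip} -> {ptr_name} (mismatch)"
    return True, f"{hostname} -> {ip} -> {ptr_name}"


if __name__ == "__main__":
    # Hypothetical hostname; replace with your own vCenter Server and ESXi names.
    ok, detail = check_forward_reverse("vcenter.example.test")
    print("OK  " if ok else "FAIL", detail)
```

Pass fully qualified names, since a PTR record normally returns the fully qualified name and a short name would be reported as a mismatch.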

Ensure that vCenter Server’s file-based backup & restore is configured and generating scheduled output. You can configure this through the Virtual Appliance Management Interface (VAMI) on port 5480/tcp on the VCSA.

Take a snapshot of the VCSA prior to an update, preferably from the ESXi host client after the VCSA has been shut down gracefully & cleanly. Snapshots have performance impacts, so ensure that you delete the snapshot soon after the upgrade is verified.

An experienced sysadmin once suggested that if it has been a while since a system has been restarted, it is often a good idea to simply restart it in place, let it come back up again, and prove that it’s working well. Otherwise, you won’t be able to tell whether a problem was pre-existing or was caused by the work that just happened. An extra reboot does add management interface downtime, but if corrective action is needed it shortens the time to restore the service.

If vSphere HA has been configured with a custom isolation address (das.isolationaddress) ensure that it is NOT set to the vCenter Server itself, or that multiple addresses are specified (das.isolationaddress0 through das.isolationaddress9) so that one address becoming unavailable does not trigger HA failover.
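The isolation address guidance above can be reviewed mechanically once you have the cluster's HA advanced options in hand. The sketch below is an illustration only: it assumes you have already exported the advanced options into a plain dictionary of option name to value, and the function name check_isolation_addresses is hypothetical. It does not talk to vCenter Server.

```python
import re

# Matches das.isolationaddress and das.isolationaddress0 through das.isolationaddress9.
ISOLATION_KEY = re.compile(r"^das\.isolationaddress\d?$", re.IGNORECASE)


def check_isolation_addresses(advanced_options, vcenter_addresses):
    """Return a list of findings about custom HA isolation addresses.

    advanced_options: dict of HA advanced option name -> value.
    vcenter_addresses: names and IP addresses that belong to vCenter Server.
    An empty list means nothing was flagged for review.
    """
    vcenter = {a.strip().lower() for a in vcenter_addresses}
    configured = {
        key: str(value).strip()
        for key, value in advanced_options.items()
        if ISOLATION_KEY.match(key) and str(value).strip()
    }
    findings = []
    for key, value in sorted(configured.items()):
        if value.lower() in vcenter:
            findings.append(f"{key} points at vCenter Server ({value})")
    # A single custom address means one outage can look like host isolation.
    if len({value.lower() for value in configured.values()}) == 1:
        findings.append("only one custom isolation address is configured")
    return findings


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    options = {"das.isolationaddress0": "10.0.0.5"}
    for finding in check_isolation_addresses(options, ["10.0.0.5"]):
        print("REVIEW:", finding)
```

The two findings are reported separately so that you can decide which one matters in your environment: an address that is vCenter Server itself, or a lone address with no alternative.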

Where possible, minimize the number of plugins installed in vCenter Server. Modern zero-trust security architecture practices discourage connecting systems in these ways, as these types of interconnections allow attackers greater opportunity. Fewer things installed also means fewer things to worry about from a compatibility perspective, making upgrades and patching much less work.

Minimize additional installed VIBs on ESXi, and where possible, use "stock" VMware ESXi versions instead of OEM customized ones. This helps avoid issues with VIB version conflicts that can arise from vendor packages. vSphere Lifecycle Manager makes it easy to add OEM driver packages and additional OEM software if you truly need it, including on a piecemeal basis. Remember that, from a security perspective, less is more when it comes to installing extra software. Don't install things you don't absolutely need.

Use DRS groups & rules to keep vCenter Server and other important VMs (KMS, DNS, AD) on a particular ESXi host (use a “should” rule, see below). If there is an issue with the environment you will be able to find the VCSA easily, using the Host Client to repair or restart it. Ensure that a management workstation can get to that host client interface on that ESXi host, and that you can log in.

Use “should” DRS affinity rules where possible. By using “should” you enable VMs to be moved automatically by DRS for normal host patching.

vCenter Server should always be updated before ESXi when there are patches for both.

Seeing, or not seeing, weird things after you update vSphere? Clear your browser cache as part of your update process to ensure that the latest vSphere Client components download and display properly.
