
Type

  • Document

Level

  • Intermediate

Category

  • Operational Tutorial

Product

  • vSAN 6.7

vSAN Operations Guide

Introduction

VMware vSAN provides enterprise-class storage that is robust, flexible, powerful, and easy to use. vSAN aggregates locally attached storage devices to create a storage solution that can run at the edge, the core, or the cloud—all easily managed by vCenter. vSAN is integrated directly into the hypervisor, providing capabilities and integration unlike those of traditional three-tier architectures.

While vSAN-powered clusters share many similarities with vSphere clusters in a three-tier architecture, the unique abilities and architecture of vSAN mean that some operational practices and recommendations may differ from those of traditional environments.

This document provides concise, practical guidance for the day-to-day operations of vSAN-powered clusters. It augments the step-by-step instructions found in VMware Docs, KB articles, and the detailed guidance found on core.vmware.com. This operations guide is not intended to be "how-to" documentation. It offers general guidance and recommendations applicable to a large majority of environments. Requirements unique to a specific environment may dictate slightly different operational practices, which is why there is no single "best practice." New topics may be added periodically. Please check to ensure the latest copy is used.

The guidance provided in this document reflects recommendations in accordance with the latest version of vSAN at the time of this writing: vSAN 7 Update 1 (U1). New features in vSAN will often impact operational recommendations. When guidance differs based on recent changes introduced to vSAN, it will be noted.  The guidance will not retain an ongoing history of practices for previous versions of vSAN.

Section 1: Cluster

Create a vSAN Cluster

Since vSAN is a cluster-based solution, creating a cluster is the first logical step in the deployment of the solution. Unlike traditional three-tier architectures, vSAN storage is treated as a resource of the cluster, which offers unique capabilities in cluster design. More information on these concepts can be found at: vSAN Cluster Design – Large Clusters Versus Small Clusters.

Create a vSphere cluster

The first step of creating a vSAN cluster is creating a vSphere cluster.

  • Right-click a data center and select New Cluster.
  • Type a name for the cluster in the Name box.
  • Configure VMware Distributed Resource Scheduler (DRS), vSphere High Availability (HA), and vSAN for the cluster, and click OK.

FIGURE 1-1: Available configuration options when creating a new vSAN cluster

Adding hosts to a vSphere cluster

The second step is to add hosts to the newly created cluster. There are two methods available. The traditional method is to right-click on the cluster and select Add hosts. The new streamlined method is to use the Cluster Quickstart wizard. The Cluster Quickstart wizard can be found by clicking on the existing cluster and selecting: Configure → Configuration → Quickstart. Hosts can be added by using the Add hosts wizard.

  • On the Add hosts page, enter information for new hosts, or click Existing hosts and select from hosts listed in the inventory.
  • On the Host summary page, verify the host settings.
  • On the Ready to complete page, click Finish.

FIGURE 1-2: Adding more than one host at a time using the Add hosts wizard

The selected hosts are placed into maintenance mode and added to the cluster. When you complete the Quickstart configuration, the hosts exit maintenance mode. Note that if you are running vCenter Server on a host in the cluster, you do not need to place the host into maintenance mode as you add it to a cluster using the Quickstart workflow. The host that contains the vCenter Server virtual machine (VM) must be running VMware ESXi 6.5 EP2 or later. The same host can also be running a Platform Services Controller. All other VMs on the host must be powered off.

Recommendation: Take advantage of vSAN’s flexibility. The initial sizing of a cluster does not need to be perfect. The value of vSAN is that you have the flexibility to scale up, scale out, or reconfigure as needed.

Verify vSAN health checks

Once the hosts are added to the cluster, the vSAN health checks verify that the hosts have the necessary drivers and firmware. Note that if time synchronization fails, the next step allows you to bulk configure Network Time Protocol (NTP) on the hosts.

FIGURE 1-3: vSAN health checks performed during the time of adding hosts to a cluster

Cluster configuration

The third and final step to Quickstart is cluster configuration. On the Cluster configuration card, click Configure to open the Cluster configuration wizard.

  • On the Configure the distributed switches page, enter networking settings, including distributed switches, port groups, and physical adapters. Network I/O Control is automatically enabled on all distributed switches that are created. Make sure to upgrade existing switches if using a brownfield vDS.
    • In the port groups section, select a distributed switch to use for VMware vSphere vMotion and a distributed switch to use for the vSAN network.
    • In the physical adapters section, select a distributed switch for each physical network adapter. You must assign each distributed switch to at least one physical adapter. This mapping of physical network interface cards (NICs) to the distributed switches is applied to all hosts in the cluster. If you are using an existing distributed switch, the physical adapter selection can match the mapping of the distributed switch.
  • On the vMotion and vSAN traffic page, it is strongly encouraged to provide dedicated VLANs and broadcast domains for added security and isolation of these traffic classes.
  • On the Advanced options page, enter information for cluster settings, including DRS, HA, vSAN, host options, and Enhanced vMotion Compatibility (EVC). Setup is the ideal time to configure encryption, deduplication, and compression. Configuring these settings upfront reduces the need to enable them at a later time by moving or disrupting data.
  • Enable EVC for the most current generation of processors that the hosts in the cluster support. The EVC and CPU Compatibility FAQ contains more information on this topic.
    • On the Claim disks page, select disks on each host for cache and capacity.
    • (Optional) On the Create fault domains page, define fault domains for hosts that can fail together. For more information about fault domains, see “Managing Fault Domains in vSAN Clusters” in “Administering VMware vSAN.”
    • On the Ready to complete page, verify the cluster settings, and click Finish.

Summary

Creating vSAN clusters is not unlike the creation of a vSphere cluster in a three-tier architecture. Both use the “Cluster Quickstart” feature built into vCenter, which offers the ability to easily scale the cluster as needed.

Pre-Flight Check Prior to Introducing Cluster into Production

Introducing a new vSAN cluster into production is technically a very simple process. Features such as Cluster Quickstart and vSAN health checks help provide guidance to ensure proper configuration, while VM migrations to a production cluster can be transparent to the consumers of those VMs. Supplement the introduction of a new vSAN cluster into production with additional steps to ensure that, once the system is powering production workloads, you get the expected outcomes.

Preparation

Preparation helps reduce potential issues when VMs rely on the services provided by the cluster. It also helps establish a troubleshooting baseline. The following may be helpful in a cluster deployment workflow:

  • Have the steps in the “vSAN Performance Evaluation Checklist” in the Proof of Concept (PoC) guide been followed? While this document focuses on adhering to recommended practices during the evaluation of the performance of vSAN during a PoC, it provides valuable guidance for any cluster entering into production.
  • Will VMs in this vSAN cluster require different storage policies than are used in other clusters? See “Using Storage Policies in Environments with More Than One vSAN Cluster” for more information.
  • What is the intention of this cluster? And what data services reflect those intentions? Are its VMs primarily focused on performance or space efficiency? Do VMs need to be encrypted on this cluster? Generally, cluster-wide data services are best enabled or disabled at the time the cluster is provisioned.
  • Has the host count of the cluster been sufficiently considered? Perhaps you planned on introducing a new 24-node cluster to the environment. You may want to evaluate whether a single cluster or multiple clusters are the correct fit. While this can be changed later, evaluating at initial deployment is most efficient. See “vSAN Cluster Design—Large Clusters Versus Small Clusters” on core.vmware.com.

Recommendation: Always run a synthetic test (HCIBench) as described in the vSAN Performance Evaluation Checklist prior to introducing the system into production. This can verify that the cluster behaves as expected and can be used for future comparisons should an issue arise, such as network card firmware hampering performance. See step 1 in the “Troubleshooting vSAN Performance” document for more information.
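In addition to reviewing Skyline Health in the vSphere Client, the health checks can be swept from the command line of any host as part of a pre-flight checklist. The following is a minimal sketch; it assumes a vSAN release recent enough for the health service to run on the ESXi hosts, and the exact output format may vary by version.

# List every vSAN health check and its current status directly from an ESXi host
esxcli vsan health cluster list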

Summary

Design verification and vSAN cluster configuration can help reduce post-deployment issues or unnecessary changes. Follow the guidance found in the vSAN Performance Evaluation Checklist in the PoC guide for any cluster entering production. It contains the information necessary to deploy the cluster with confidence, and for potential troubleshooting needs.

Maintenance Work on L2/L3 Switching on Production Cluster

Redundant configuration

VMware recommends configuring redundant switches and either NIC teaming or failover so that the loss of one switch or path does not cause a storage network outage.

FIGURE 1-4: Virtual Switch and port group configuration

Health checks

Prior to performing maintenance, review the vSAN networking health checks. Health checks tied to connectivity, latency, or cluster partitions can help identify situations where one of the two paths is not configured correctly, or is experiencing a health issue.

FIGURE 1-5: Network-related health checks in the vSAN health UI in vCenter

Understanding the nature of the maintenance can also help you understand what health alarms to expect. Basic switch patching can sometimes be performed non-disruptively. Switch upgrades that can be performed as an in-service software upgrade (ISSU) may not be noticeable, while physically replacing a switch may lead to a number of connectivity alarms. Discuss the options with your networking vendor.

Testing failure impacts

It is a good idea to simulate a path failure on a single host (disable a single port) before taking a full switch offline. If VMs on that host become unresponsive, or if HA is triggered, this may imply an issue with pathing that should be resolved prior to switch removal or reboot.
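One way to simulate a path failure from the ESXi shell of a single host is to administratively bring down one physical uplink. This is a sketch only; vmnic1 is a placeholder for whichever uplink carries vSAN traffic on that host.

# Identify the uplinks and their current link state
esxcli network nic list

# Bring down one uplink to simulate a path failure (vmnic1 is a placeholder)
esxcli network nic down -n vmnic1

# After verifying that VMs remain responsive and vSAN traffic fails over, restore the link
esxcli network nic up -n vmnic1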

Controlled maintenance

If fault domains are used with multiple racks of hosts using different switches, consider limiting maintenance to a single fault domain and verify its health before continuing on. For stretched clusters, limit maintenance to one side at a time to reduce potential impacts.

Summary

In a vSAN environment, the configuration of virtual switches and their respective uplinks follows practices commonly recommended in traditional three-tier architectures. Because the network also carries the added responsibility of serving as the storage fabric, ensuring that the proper configuration is in place will help vSAN perform as expected.

Configuring Fault Domains

Each host in a vSAN cluster is an implicit fault domain by default. vSAN distributes data across fault domains (hosts) to provide resilience against drive and host failure. This is sufficient to provide the right combination of resilience and flexibility for data placement in a cluster in the majority of environments. There are use cases that call for fault domain definitions spanning across multiple hosts. Examples include protection against server rack failure, such as rack power supplies and top-of-rack networking switches.

vSAN includes the ability to configure explicit fault domains that include multiple hosts. vSAN distributes data across these fault domains to provide resilience against larger domain failure—an entire server rack, for example.

vSAN requires a minimum of three fault domains. At least one additional fault domain is recommended to ease data resynchronization in the event of unplanned downtime, or planned downtime such as host maintenance and upgrades. The diagram below shows a vSAN cluster with 24 hosts. These hosts are evenly distributed across six server racks.

FIGURE 1-6: A conceptual illustration of a vSAN cluster using 24 hosts and 6 explicit fault domains

With the example above, you would configure six fault domains—one for each rack—to help maintain access to data in the event of an entire server rack failure. This process takes only a few minutes using the vSphere Client. “Managing Fault Domains in vSAN Clusters” contains detailed steps for configuring fault domains in vSAN, and recommendations for "Designing and Sizing vSAN Fault Domains" are also available. The “Design and Operation Considerations When Using vSAN Fault Domains” post offers practical guidance for some of the most commonly asked questions when designing for vSAN fault domains.
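Once fault domains are configured, the assignment of each host can be spot-checked from the ESXi shell. This is a small sketch; the exact output fields vary by version.

# Display the fault domain ID and name this host is currently assigned to
esxcli vsan faultdomain get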

Recommendation: Prior to deploying a vSAN cluster using explicit fault domains, ensure that rack-level redundancy is a requirement of the organization. Fault domains add design and management considerations, so determining the actual requirement up front results in a design that reflects the real needs of the organization.

vSAN is also capable of delivering multi-level replication or “nested fault domains.” This is already fully supported with vSAN stretched cluster architectures. Nested fault domains provide an additional level of resilience at the expense of higher-capacity consumption. Redundant data is distributed across fault domains and within fault domains to provide this increased resilience to drive, host, and fault domain outages. Note that some features are not available when using vSAN's explicit fault domains.  For example, the new reserved capacity functionality in the UI of vSAN 7 U1 is not supported in a topology that uses fault domains such as a stretched cluster, or a standard vSAN cluster using explicit fault domains.

You can precisely manage the balance of resilience and capacity consumption based on application and business requirements using per-VM storage policies. Nested fault domains are currently supported through a Request for Product Qualification (RPQ) process for standard (non-stretched) clusters. 

Summary

Standard vSAN clusters using the explicit “Fault Domains” feature offer tremendous flexibility to meet the levels of resilience required by an organization. They can introduce different operational and design considerations than a standard vSAN cluster not using this feature. Becoming familiar with these considerations will help you determine if they are a good fit for your organization.

Migrate Hybrid Cluster to All-Flash vSAN Cluster

In some cases, an administrator may want to migrate a vSAN cluster built initially with spinning disks to an all-flash based vSAN cluster. The information below describes some of the considerations for an in-place migration.

Planning the process

Review the supported process steps to cover this action. Identify if the disk controllers and cache devices currently in use can be reused for all-flash. Note that there may be newer driver/firmware certified for the controller for all-flash usage. Check the VMware Compatibility Guide (VCG) for vSAN for more information.

Confirm that the cluster will have sufficient capacity once the migration is complete without relying on deduplication and compression (DD&C), "compression only," or RAID-5/6 space efficiency. If space efficiency features are required, consider migrating some VMs outside the cluster until the conversion is completed. It is recommended to replace disk groups with the same or more capacity as part of the migration process if done in place.

Identify whether you will be converting disk group by disk group, or host by host. If there is limited free capacity on the existing cluster, migrating disk group by disk group requires less slack space. If migrating host by host, other actions (such as patching controller firmware and patching ESXi) can be included in this workflow to reduce the number of host evacuations required. Review existing Storage Policy Based Management (SPBM) policies for cache reservation usage. This policy rule is not supported on all-flash configurations and leads to health alarms and failed provisioning. See “Unable to provision linked-clone pools on a vSAN all-flash cluster” for an example of this behavior.

FIGURE 1-7: An unsupported policy rule when transitioning from hybrid to all-flash vSAN
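Before converting disk group by disk group, it can help to confirm from each host which devices belong to which disk group and which are flash. A minimal sketch using the standard esxcli namespace:

# List every device claimed by vSAN, the disk group it belongs to, and whether it is a flash device
esxcli vsan storage list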

After the migration

Identify what new data services will be enabled. Complete the full migration before enabling any data services (deduplication and compression, RAID-5/6 erasure coding). The creation of new policies and migration of VMs is recommended over changing existing RAID-1 policies.

Summary

Migrating from hybrid to all-flash vSAN can yield significant performance improvements as well as unlock space efficiency capabilities.  It is critical to review the hardware requirements and plan out the process.

Section 2: Network

Configuring NIOC for vSAN Bandwidth Management Using Shared Uplinks

vSphere Network I/O Control (NIOC) version 3 introduces a mechanism to reserve bandwidth for system traffic based on the capacity of the physical adapters on a host. It enables fine-grained resource control at the VM network adapter level, similar to the model used for allocating CPU and memory resources. NIOC is only supported on the VMware Distributed Switch (vDS) and is enabled per switch.

Planning the process

It is recommended not to configure limits. Limits artificially restrict vSAN traffic even when bandwidth is available. Reservations should also be avoided because reservations do not yield unused bandwidth back for non-VMkernel port uses. On a 10Gbps uplink, a 9Gbps vSAN reservation would leave only 1Gbps of bandwidth available for VMs even when vSAN is not passing traffic.

FIGURE 2-1: Setting shares in NIOC to balance network resources under contention

Shares are the recommended way to prioritize traffic for VMware vSAN. Raise the vSAN shares to “High.”

FIGURE 2-2: An example of a configuration of shares for a vSAN-powered cluster

Other network quality of service (QoS) options

It is worth noting that NIOC only provides shaping services on the host’s physical interfaces. It does not provide prioritization on switch-to-switch links and has no awareness of contention caused by oversaturated leaf/spine uplinks, or data center–to–data center links for stretched clustering. Tagging a dedicated vSAN VLAN with class of service or DSCP can provide end-to-end prioritization. Discuss these options with your networking teams and switch vendors for optimal configuration guidance.

Summary

Storage traffic needs low-latency reliable transport end to end. NIOC can provide a simple setup and powerful protection for vSAN traffic.

Creating and Using Jumbo Frames in vSAN Clusters

Jumbo frames are Ethernet frames with more than 1,500 bytes of payload. The most common jumbo configuration is a payload size of 9,000 bytes, although modern switches can often go up to 9,216 bytes.

Planning the process

Consult with your switch vendor and identify if jumbo frames are supported and what maximum transmission units (MTUs) are available. If multiple switch vendors are involved in the configuration, be aware they measure payload overhead in different ways in their configuration. Also identify if a larger MTU is needed to handle encapsulation such as VxLAN. Identify all configuration points that must be changed to support jumbo frames end to end. If Witness Traffic Separation is in use, be aware that an MTU of 1,500 may be required for the connection to the witness.

Implementing the change

Start the changes with the physical switch and distributed switch. To avoid dropped packets, make the change last to the VMkernel port adapters used for vSAN.

FIGURE 2-3: Changing the MTU size of virtual distributed switch (VDS)
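Once the physical switches and the distributed switch carry the larger MTU, the VMkernel ports used for vSAN can be updated last. A small sketch, where vmk2 is a placeholder for the vSAN VMkernel interface:

# Verify the current MTU of the VMkernel interfaces
esxcli network ip interface list

# Raise the MTU on the vSAN VMkernel port last to avoid dropped packets (vmk2 is a placeholder)
esxcli network ip interface set -m 9000 -i vmk2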

Validation

The final step is to verify connectivity. To assist with this, the “vSAN: MTU check (ping with large packet size)” health check performs a ping test with large packet sizes from each host to all other hosts to verify connectivity end to end.

FIGURE 2-4: Verifying connectivity using the vSAN MTU check health check.
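The same test can be run manually from any host with vmkping. This is a sketch; vmk2 and 192.168.10.12 are placeholders for the local vSAN VMkernel interface and a peer host's vSAN IP address.

# Send a non-fragmented, jumbo-sized ping (8972 bytes of payload leaves room for IP/ICMP headers)
vmkping -I vmk2 -d -s 8972 192.168.10.12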

Summary

Jumbo frames can reduce processing overhead on NICs and switch application-specific integrated circuits (ASICs). While modern NIC offload technologies already reduce much of this overhead, jumbo frames can still lower the CPU overhead associated with high throughput and improve performance. The largest gains should be expected on older, more basic NICs with fewer offload capabilities.

Create and Manage Broadcast Domains for Multiple vSAN Clusters

It is recommended, when possible, to dedicate unique broadcast domains (or collections of routed broadcast domains for Layer 3 designs) for vSAN. Benefits to unique broadcast domains include:

  • Fault isolation—Spanning tree issues, configuration mistakes, duplicate IP addresses, and other failures can cause a broadcast domain to fail, or failures to propagate across a broadcast domain.
  • Security—While vSAN hosts have automatic firewall rules created to reduce attack surface, data over the vSAN network is not encrypted unless by higher-level solutions (VM encryption, for example). To reduce the attack surface, restrict the broadcast domain to only contain VMkernel ports dedicated to the vSAN cluster. Dedicating isolated broadcast domains per cluster helps ensure security barriers between clusters.

Planning the process

There are a number of ways to isolate broadcast domains:

  • The most basic is physically dedicated and isolated interfaces and switching.
  • The most commonly chosen is to tag VLANs on the port groups used by the vSAN VMkernel ports. Prior to this, configure the switches between the hosts to carry this VLAN for these ports.
  • Other encapsulation methods for carrying VLANs between routed segments (ECMP fabrics, VxLAN) are supported.
  • NSX-V may not be used for vSAN or storage VMkernel port encapsulation.
  • NSX-T may be used with VLAN-backed port groups (subject to version; NSX-T 2.2 offers notable improvements in support of vSAN environments).

Implementing the change

The first step is to configure the VLAN on the port group. This can also be set up when the VDS and port groups are created using the Cluster Quickstart.

FIGURE 2-5: Configuring a port group to use a new VLAN

Validation

A number of built-in health checks can help identify if a configuration problem exists, preventing the hosts from connecting. To ensure proper functionality, all vSAN hosts must be able to communicate. If they cannot, a vSAN cluster splits into multiple partitions (i.e., subgroups of hosts that can communicate with each other but not with other subgroups). When that happens, vSAN objects might become unavailable until the network misconfiguration is resolved. To help troubleshoot host isolation, the vSAN network health checks can detect these partitions and ping failures between hosts.
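Cluster membership can also be confirmed from the ESXi shell of any host. A minimal sketch:

# Confirm the host has joined the cluster and sees the expected number of members
# (a partitioned host reports a lower Sub-Cluster Member Count than its peers)
esxcli vsan cluster get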

Recommendation: VLAN design and management does require some level of discipline and structure. Discuss with your network team the importance of having discrete VLANs for your vSAN clusters up front, so that it lays the groundwork for future requests.

FIGURE 2-6: Validating that changes pass network health checks

Summary

Configuring discrete broadcast domains for each respective cluster is a recommended practice for vSAN deployment and management. This helps meet levels of fault isolation and security with no negative trade-off.

Change IP Addresses of Hosts Participating in vSAN Cluster

vSAN requires networking between all hosts in the cluster for VMs to access storage and maintain its availability. Migrating the IP addresses of storage networks requires extensive care to prevent loss of connectivity to storage or loss of quorum for objects.

Planning the process

Identify whether you will perform this as an online process or as a disruptive offline process (powering off all VMs). If disruptive, make sure to power off all VMs and follow the cluster shutdown guidance.

Implementing the change

If new VMkernel ports are created prior to removing the old ones, a number of techniques can be used to validate networking and test each host before the original VMkernel ports are removed.

  • Use vmkping to source pings between the new VMkernel ports.
  • Put hosts into maintenance mode, or evacuate VMs, before removing the original vSAN VMkernel port.
  • Check the vSAN object health alarms to confirm that the cluster is at full health once the original VMkernel port has been removed.
  • Once the host has left maintenance mode, vSphere vMotion® a test VM to the host and confirm that no health alarms are alerting before continuing to the next host.

Validation

Before restoring the host to service, confirm that networking and object health checks are returning to a normal, healthy state.
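Object health can also be checked from the ESXi shell. This is a sketch and assumes a vSAN release recent enough to include the debug namespace; output fields vary by version.

# Summarize the health of all vSAN objects; all objects should report healthy before
# moving on to the next host
esxcli vsan debug object health summary get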

Migrate vSAN traffic to different VMkernel port

There are cases where the vSAN network needs to be migrated to a different segment. For example, the implementation of a new network infrastructure, or the migration of a vSAN standard cluster (non-routed network) to a vSAN stretched cluster (routed network). Recommendations and guidance on this procedure are given below.

Prerequisites

Check vSAN Skyline Health to verify there are no issues. This is recommended before performing any planned maintenance operations on a vSAN cluster. Any issues discovered should be resolved before proceeding with the planned maintenance.

Set up the new network configuration on your vSAN hosts. This procedure will vary based on your environment. Consult "vSphere Networking" in the vSphere section of VMware Docs https://docs.vmware.com for the version of vSphere you are running.

Ensure that the new vSAN network subnet does not overlap with the existing one. vSphere will not allow the vSAN service to run simultaneously on two VMkernel ports on the same subnet. Attempting to do this using esxcli will produce an error like the one shown below.

esxcli vsan network ip add -i vmk2

Failed to add vmk2 to CMMDS: Unable to complete Sysinfo operation. Please see the VMkernel log file for more details.

Vob Stack: [vob.vsan.net.update.failed.badparam]: Failed to ADD vmknic vmk2 with vSAN because a parameter is incorrect.

Note that you might see warnings in vSAN Skyline Health as you add new VMkernel adapters with the vSAN service, specifically the "vSAN: Basic (unicast) connectivity check" and "vSAN: MTU check (ping with large packet size)" health checks, as shown below. This is expected if the vSAN service on one host is not able to communicate with other hosts in the vSAN cluster. These warnings should be resolved after the new VMkernel adapters for vSAN have been added and configured correctly on all hosts in the cluster. Use the "Retest" button in vSAN Skyline Health to refresh the health check status.

FIGURE 2-7: vSAN Health warnings

Use vmkping to verify the VMkernel adapter for the new vSAN network can ping the same VMkernel adapters on other hosts. This VMware Knowledge Base article provides guidance on using vmkping to test connectivity: https://kb.vmware.com/s/article/1003728

  1. Shut down all running virtual machines that are using the vSAN datastore. This will minimize traffic between vSAN nodes and ensure all changes are committed to the virtual disks before the migration occurs.
  2. After configuring the new vSAN network on every host in the vSAN cluster, verify the vSAN service is running on both VMkernel adapters. This can be seen by checking the Port Properties for both VMkernel adapters in the UI, or by running esxcli vsan network list. You should see output similar to the text below.

[root@host01:~] esxcli vsan network list
Interface
   VmkNic Name: vmk1
   ...
   Traffic Type: vsan
Interface
   VmkNic Name: vmk2
   ...
   Traffic Type: vsan

  3. Click the "Retest" button in vSAN Skyline Health to verify there are no warnings while the vSAN service is enabled on both VMkernel adapters on every host. If there are warnings, it is most likely because one or more hosts do not have the vSAN service enabled on both VMkernel adapters. Troubleshoot the issue and use the "Retest" option in vSAN Skyline Health until all issues are resolved.
  4. Disable the vSAN service on the old VMkernel adapters (a command-line alternative is sketched after this list).
  5. Click the "Retest" button in vSAN Skyline Health to verify there are no warnings.
  6. Power on the virtual machines.
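For reference, the tagging and untagging of the vSAN service on VMkernel adapters can also be performed from the ESXi shell. This is a sketch only; vmk1 and vmk2 are placeholders for the old and new adapters, and the UI workflow described above remains the primary method.

# Tag the new VMkernel adapter for vSAN traffic (vmk2 is a placeholder)
esxcli vsan network ip add -i vmk2

# Verify both adapters carry vSAN traffic before removing the old one
esxcli vsan network list

# Remove the vSAN traffic tag from the old adapter (vmk1 is a placeholder)
esxcli vsan network remove -i vmk1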

Recommendation: While it is possible to perform this migration when VMs on the vSAN datastore are powered on, it is NOT recommended and should only be considered in scenarios where shutting down the workloads running on vSAN is not possible.

Summary

Migrating the vSAN VMkernel port is a supported practice that, when done properly, can be accomplished quickly and with a predictable outcome.

Also see VMware Knowledge Base article 76162: How to Non-Disruptively Change the VLAN for vSAN Data in a Production Environment

Section 3: Storage Devices

Adding Capacity Devices to Existing Disk Groups

Expanding a vSAN cluster is a non-disruptive operation. Administrators can add new disks, replace capacity disks with larger disks, or simply replace failed drives without disrupting ongoing operations.

FIGURE 3-1: Adding a capacity device to an existing disk group

When you configure vSAN to claim disks in manual mode, you can add additional local devices to existing disk groups. Keep in mind vSAN only consumes local, empty disks. Remote disks, such as SAN LUNs, and local disks with partitions cannot be used and won’t be visible. If you add a used device that contains residual data or partition information, you must first clean the device. See the documentation on removing partition information from devices. You can also run the host_wipe_vsan_disks command in Ruby vSphere Console (RVC) to format the device.
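A quick way to check whether a local device is eligible to be claimed by vSAN is the vdq utility on the ESXi host. A minimal sketch:

# Query all local devices and their vSAN eligibility; devices with existing partitions
# are reported as ineligible until they are cleaned
vdq -q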

If performance is a primary concern, avoid adding capacity devices without increasing the cache, which reduces your cache-to-capacity ratio. Consider adding the new storage devices to a new disk group that includes an additional cache device.

Recommendation: For optimal results on all-flash vSAN clusters with DD&C enabled, remove the disk group first and then recreate to include the new storage devices.

After adding disks to an existing cluster, verify that the vSAN Disk Balance health check is green. If the Disk Balance health check issues a warning, perform a manual rebalance during off-peak hours.

Summary

Scale up a vSAN cluster by adding new storage devices either to a new disk group or to an existing disk group. Always verify storage devices are on the VMware Compatibility Guide. If adding to an existing disk group, consider the cache-to-capacity ratio, and always monitor the Disk Balance health check to ensure the cluster is balanced.

Adding Additional Devices in New Disk Group

vSAN architecture consists of two tiers:

  • A cache tier for read caching and write buffering
  • A capacity tier for persistent storage

This two-tier design offers supreme performance to VMs while ensuring data is written to devices in the most efficient way possible. vSAN uses a logical construct called disk groups to manage the relationship between capacity devices and their cache tier.

FIGURE 3-2: Disk groups in a single vSAN host

A few things to understand about disk groups:

  • Each host that contributes storage in a vSAN cluster contains at least 1 disk group.
  • Disk groups contain 1 cache device and between 1 and 7 capacity devices.
  • A vSAN host can have at most 5 disk groups, each containing up to 7 capacity devices, resulting in a maximum of 35 capacity devices for each host.
  • Whether the configuration is hybrid or all-flash, the cache device must be a flash device.
  • In a hybrid configuration, the cache device is used by vSAN as both a read cache (70%) and a write buffer (30%).
  • In an all-flash configuration, 100% of the cache device is dedicated as a write buffer.

When you create a disk group, consider the ratio of flash cache to consumed capacity. The ratio depends on the requirements and workload of the cluster. For a hybrid cluster, consider a flash cache to consumed capacity ratio of at least 10% (not including replicas, such as mirrors). For example, a cluster expecting 20TB of consumed capacity before replication would call for at least 2TB of flash cache. For guidance on determining the cache ratio for all-flash clusters, refer to the blog posts Designing vSAN Disk Groups - All Flash Cache Ratio Update and Write Buffer Sizing in vSAN when Using the Very Latest Hardware.

Recommendation: While vSAN requires at least one disk group per host contributing storage in a cluster, consider using more than one disk group per host.

Summary

Scale up a vSAN cluster by adding new storage devices either to a new disk group or to an existing disk group. Be sure to check the VMware Compatibility Guide for a list of supported PCIe flash devices, SSDs, and NVMe devices.

Recreating a Disk Group

Disk groups form the basic construct that is pooled together to create the vSAN datastore. They may need to be recreated in some situations. It is most commonly done to remove stale data from the existing disks or as part of a troubleshooting effort.

The Recreate Disk Group process can be invoked by traversing to Cluster → Configure → Disk Management, as shown in FIGURE 3-3.

FIGURE 3-3: Recreating a vSAN disk group in the vCenter UI

The detailed procedure is described here.

vSAN automates the backend workflow of recreating the disk groups. Nonetheless, it is useful to understand the steps involved. Recreating a disk group involves:

  • Evacuating data (wholly or partially) or deleting the existing data on disk
  • Removing disk group from the vSAN cluster
  • Rebuilding the disk groups and claiming the disks

The administrator can choose to migrate data from the disk group through Full data migration or the Ensure accessibility option. The third option, No data migration, simply purges the data and may cause some VMs to become inaccessible. For each of the chosen options, an assessment is performed to validate the impact on the objects’ compliance and determine if there is sufficient capacity to perform the intended migration.

Recommendation: “Ensure accessibility” validates that all objects are protected sufficiently and only moves components that are not protected to other disk groups in the cluster. This limits the migration to the minimal and “necessary” data to ensure VM availability elsewhere in the cluster.  Selecting "Full data migration" ensures that all data is removed from the host or disk group(s) in question.

Summary

Recreating a disk group simplifies a multi-step process of removing a disk group, creating a new disk group, and adding disks back into one automated workflow. It also has guardrails in place to safely migrate data elsewhere in the cluster prior to rebuild.

Remove a Capacity Device

vSAN architecture comprises a cache tier and a capacity tier to optimize performance. A combination of one cache device and up to seven capacity devices makes up a disk group. There are common scenarios, such as hardware upgrades or failures, where disks may need to be removed from a disk group for replacement. While replacing a device is relatively easy, exercising caution throughout will help avoid misunderstandings during the replacement process. In particular, ensure the following:

  • Ensure that the physical device to be removed is correctly identified in the host. Server vendors may have different methods for matching up a physical device with what is represented in the vCenter Server UI. This may even vary depending on the form factor of the server and/or chassis enclosure. This can be confusing, especially if different teams are responsible for hardware and software.
  • Enter the host into maintenance mode. This ensures that no data is being actively served by the host (a command-line sketch follows this list). If you wish to also ensure all VM objects meet their respective storage policy compliance, choose a full data evacuation when entering maintenance mode and wait until resynchronizations are complete. Selecting "Full data migration" will migrate all data to the other hosts in the cluster. Selecting "Ensure Accessibility" will also suffice if you wait for the resynchronizations that begin after the default 60-minute repair timer to complete.
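Maintenance mode with a vSAN data evacuation option can also be entered from the ESXi shell. This is a sketch and assumes the vSAN evacuation mode flag available in current ESXi builds; the vSphere Client remains the typical method.

# Enter maintenance mode and fully evacuate vSAN data from the host
# (use ensureObjectAccessibility instead of evacuateAllData for the lighter-weight option)
esxcli system maintenanceMode set -e true -m evacuateAllData

# Confirm the host's maintenance mode state
esxcli system maintenanceMode get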

vSAN also incorporates a mechanism to proactively detect unhealthy disks and unmount them. This can happen if a device exhibits some anomalies. It allows an administrator to validate the anomaly and remove or replace the affected device.

Remove a capacity device by going to Cluster → Configure → Disk Management. On clicking a disk group, the associated devices are listed in the bottom pane, as shown:

FIGURE 3-4: Removing a vSAN capacity disk in the vCenter UI

Recommendation: If the device is being removed permanently, perform Full data migration. This ensures that objects remain compliant with the respective storage policies. Use LED indicators to identify the appropriate device that needs to be removed from the physical hardware.

An abrupt device removal would cause an all-paths-down (APD) or permanent-device-loss (PDL) situation. In such cases, vSAN would trigger error-handling mechanisms to remediate the failure.

Recommendation:  Maintain a runbook procedure that reflects the steps based on your server vendor.  The guidance provided here does not include any step-by-step instructions for the replacement of devices based on the server hardware.

Summary

vSAN eases maintenance activities such as hardware upgrades by abstracting disk removal workflow in the UI. The guardrails to assess object impact and usage of LED indicators minimize the possibility of user errors. Effectively, the entire set of capacity devices can be removed, replaced, or upgraded in a cluster with zero downtime.

Remove a Disk Group

vSAN enables an administrator to granularly control the addition and removal of a disk group from a vSAN datastore. This allows for greater flexibility and agility to carry out maintenance tasks non-disruptively. Removing a disk group effectively reduces the corresponding capacity from the vSAN datastore. Prior to removing a disk group, ensure there is sufficient capacity in the cluster to accommodate the migrated data.

Initiate removing a disk group by traversing to Cluster → Configure → Disk Management. On clicking a disk group, the Remove this disk group option is enabled in the UI, as shown:

FIGURE 3-5: Removing a vSAN disk group in the vCenter UI

The administrator can choose from:

  • Full data migration
  • Ensure accessibility
  • No data migration

Full data migration would evacuate the disk group completely. Ensure accessibility moves unprotected components. No data migration would not migrate any data and removes the disk group directly.

Recommendation: Full data migration is recommended to evacuate the disk group. This ensures that objects remain compliant with the respective storage policies.

Modifying disk group composition or carrying out maintenance tasks would likely cause an imbalance in data distribution across the cluster. This is an interim condition because some hosts may contribute more capacity than others. To achieve optimal performance, restore the cluster to the identical hardware configuration across hosts.

Summary

The ability to manage a disk group individually provides a modular approach to sizing and capacity management in a vSAN cluster. The entire set of disk groups in a vSAN cluster can be removed, replaced, or upgraded without any intrusion to the workloads running on the cluster.

Section 4: vSAN Datastore

Maintaining Sufficient Free Space for Resynchronizations

vSAN requires free space set aside for operations such as host maintenance mode data evacuation, component rebuilds and rebalancing operations. This free space also accounts for capacity needed in the event of a host outage. Activities such as rebuilds and rebalancing can temporarily consume additional raw capacity. While a host is in maintenance mode, it reduces the total amount of raw capacity a cluster has. The local drives do not contribute to vSAN datastore capacity until the host exits maintenance mode.

The requirements and operational guidance for free space fall into two categories.

  • All vSAN versions prior to vSAN 7 U1. The free space required for these transient operations was referred to as "slack space." The limitations of vSAN in versions prior to vSAN 7 U1 meant that there was a generalized recommendation of free space as a percentage of the cluster (25-30%), regardless of the cluster size. See the post “Revisiting vSAN’s Free Capacity Recommendations” (in versions prior to vSAN 7 U1) for a more detailed understanding of slack space.
  • All vSAN versions including and after vSAN 7 U1. The free space required for these transient operations is now referred to as "Reserved Capacity." It consists of two elements: "Operations Reserve" and "Host Rebuild Reserve." (The term "slack space" is no longer applicable for vSAN 7 U1 and later.) Significant improvements included in vSAN 7 U1 reduce the capacity necessary for efficient vSAN operations. See the post "Effective Capacity Management with vSAN 7 U1" for a more detailed understanding of "Operations Reserve."

What is the recommended amount of free capacity needed for environments running vSAN 7 U1 and later?  The actual amount is highly dependent on the configuration of the cluster.  When sizing a new cluster, the  vSAN Sizer has this logic built in.  Do not use any manually created spreadsheets or calculators, as these will no longer accurately calculate the free capacity requirements for vSAN 7 U1 and later.  For existing environments, turning on the "Enable Capacity Reserve" option (found in the "Configure" screen of the vSAN cluster capacity view) will provide the actual capacity needed for a cluster.

Recommendation: The "reserved capacity" functionality is an optional toggle that is not enabled in a vSAN cluster by default for new or for existing clusters that were upgraded.  To ensure sufficient free capacity to meet your requirements, it is recommended to turn it on if your vSAN topology and configuration supports it.

Transient space for policy changes

Two cases where storage policy changes can temporarily consume more capacity:

  • When a new policy that requires a change in component number and/or layout is assigned to a VM
  • When an existing storage policy that is assigned to one or more VMs is modified

In both cases, vSAN uses the additional capacity to make the necessary changes to components to comply with the assigned storage policy. Consider the following example.

A 100GB virtual disk is assigned a storage policy that includes a rule of Failures to Tolerate (FTT) = 1 using RAID-1 mirroring. vSAN creates two full mirrors (“replicas”) of the virtual disk and places them on separate hosts. Each replica consists of one component. There is also a Witness component created, but Witness components are very small—typically around 2MB. The two replicas for the 100GB virtual disk objects consume up to 200GB of raw capacity. A new storage policy is created: FTT=1 using RAID-5/6 erasure coding. The new policy is assigned to that same 100GB virtual disk. vSAN copies the mirrored components to a new set distributed in a RAID-5 erasure coding configuration. Data integrity and availability are maintained as the mirrored components continue to serve reads and writes while the new RAID-5 set is built.

This naturally consumes additional raw capacity as the new components are built. Once the new components are built, I/O is transferred to them and the mirrored components are deleted. The new RAID-5 components consume up to 133GB of raw capacity. This means all components for this object could consume up to 333GB of raw capacity before the resynchronization is complete and the RAID-1 mirrored components are deleted. After the RAID-1 components are deleted, the capacity consumed by these components is automatically freed for other use.

FIGURE 4-1: Illustrating the temporary use of free space as an object’s storage policy is changed

As you can imagine, performing this storage policy change on multiple VMs concurrently could cause a considerable amount of additional raw capacity to be consumed. Likewise, if a storage policy assigned to many VMs is modified, more capacity could be needed to make the necessary changes. This is one more reason to maintain sufficient slack space in a vSAN cluster, especially if changes occur frequently or impact multiple VMs at the same time.

vSAN rebalancing

When one or more storage devices are more than 80% used, vSAN automatically initiates a reactive rebalance of the data across vSAN storage devices to bring it below 80%. This rebalancing generates additional I/O in the vSAN cluster. Maintaining the appropriate amount of free space minimizes the need for rebalancing while accommodating temporary fluctuations in use due to the activities mentioned above.

Summary

Running with a level of free space is not a new concept in infrastructure design. For all versions of vSAN up to and including vSAN 7, VMware recommends that disk capacity maintain a slack space of 25–30% to avoid excessive rebuild and rebalance operations. For all clusters running vSAN 7 U1 and later, VMware recommends using the vSAN sizer to accurately calculate the required capacity needed for transient operations and host failures.

Maintaining Sufficient Space for Host Failures

vSAN needs free space for operations such as host maintenance mode data evacuation, component rebuilds, rebalancing operations, and VM snapshots. Activities such as rebuilds and rebalancing can temporarily consume additional raw capacity.

The ability to restore an object to its desired level of compliance for protection is a primary vSAN duty. When an object is reported as absent (e.g., disk or host failure), the object remains available but not in a redundant state. vSAN identifies components that go absent and begins a repair process to satisfy the original protection policy. Having enough free space is important for rebuilding failed hosts and devices.

The requirements and operational guidance for free space fall into two categories.

  • All vSAN versions prior to vSAN 7 U1. The free space required for these transient operations was referred to as "slack space." The limitations of vSAN in versions prior to vSAN 7 U1 meant that there was a generalized recommendation of free space as a percentage of the cluster (25-30%), regardless of the cluster size. See the post “Revisiting vSAN’s Free Capacity Recommendations” (in versions prior to vSAN 7 U1) for a more detailed understanding of slack space.
  • All vSAN versions including and after vSAN 7 U1. The free space required for these transient operations is now referred to as "Reserved Capacity." It consists of two elements: "Operations Reserve" and "Host Rebuild Reserve." (The term "slack space" is no longer applicable for vSAN 7 U1 and later.) Significant improvements included in vSAN 7 U1 reduce the capacity necessary for efficient vSAN operations. See the post "Effective Capacity Management with vSAN 7 U1" for a more detailed understanding of "Operations Reserve."

With the "Reserved Capacity" function new in vSAN 7 U1, the "Host Rebuild Reserve" is responsible for ensuring the appropriate amount of N+1 free capacity should a sustained host failure occur.  Unlike previous editions of vSAN, the Host Rebuild Reserve is proportional to the size of the vSAN cluster.  Larger vSAN clusters will require proportionally less host rebuild reserve than smaller clusters.  Using the vSAN Sizer will calculate this value for new clusters, and enabling the feature (found in the "Configure" screen of the vSAN cluster capacity view) will provide the actual host rebuild reserve capacity needed for a cluster.

Recommendation: The "reserved capacity" functionality is an optional toggle that is not enabled in a vSAN cluster by default for new or for existing clusters that were upgraded.  To ensure sufficient free capacity to meet your requirements, it is recommended to turn it on if your vSAN topology and configuration supports it.

FIGURE 4-2: Illustrating how free space is critical for repairs, rebuilds, and other types of resynchronization traffic

Summary

Running with a level of free space is not a new concept in infrastructure design. For all versions of vSAN up to and including vSAN 7, VMware recommends that disk capacity maintain a slack space of 25–30% to avoid excessive rebuild and rebalance operations. For all clusters running vSAN 7 U1 and later, VMware recommends using the vSAN sizer to accurately calculate the required capacity needed for transient operations and host failures.

Automatic Rebalancing in a vSAN Cluster

vSAN 6.7 U3 introduced a new method for automatically rebalancing data in a vSAN cluster. Some customers have found it curious that this feature is disabled by default in 6.7 U3 as well as vSAN 7. Should it be enabled in a VCF or vSAN environment, and if so, why is it disabled by default? Let's explore what this feature is, how it works, and learn if it should be enabled.

Rebalancing in vSAN Explained

The nature of a distributed storage system means that data will be spread across participating nodes. vSAN manages all of this for you. Its cluster-level object manager is not only responsible for the initial placement of data, but ongoing adjustments to ensure that the data continues to adhere to the prescribed storage policy. Data can become imbalanced for many reasons: Storage policy changes, host or disk group evacuations, adding hosts, object repairs, or overall data growth.

vSAN's built-in logic is designed to take a conservative approach when it comes to rebalancing. It avoids moving data unnecessarily, which would consume resources during the resynchronization process and may result in no material improvement. Similar to DRS in vSphere, the goal of vSAN's rebalancing is not to strive for perfect symmetry of capacity or load across hosts, but to adjust data placement to reduce the potential for resource contention. Balanced data placement results in better performance because it reduces the likelihood of contention for resources.

vSAN offers two basic forms of rebalancing:

  • Reactive Rebalancing. This occurs when vSAN detects any storage device that is near or at 80% capacity utilization and will attempt to move some of the data to other devices that fall below this threshold. A more appropriate name for this might be "Capacity Constrained Rebalancing." This feature has always been an automated, non-adjustable capability.
  • Proactive Rebalancing. This occurs when vSAN detects any storage device consuming a disproportionate amount of its capacity in comparison to other devices. By default, vSAN looks for any device that shows a delta of 30% or greater capacity usage than any other device. A more suitable name for this might be "Capacity Symmetry Rebalancing." Prior to vSAN 6.7 U3, this feature was a manual operation but has since been introduced as an automated, adjustable capability.

Rebalancing activity only applies to the discrete devices (or disk groups) in question, and not the entire cluster. In other words, if vSAN detects a condition that is above the described thresholds, it will move the minimum amount of data from those disks or disk groups to achieve the desired result. It does not arbitrarily shuffle all of the data across the cluster. Both forms of rebalancing are based entirely off of capacity usage conditions, not load or activity of the devices.

The described data movement by vSAN will never violate the storage policies prescribed to the objects. vSAN's cluster-level object manager handles all of this so that you don't have to.

Manual Versus Automated Operations

Before vSAN 6.7 U3, Proactive Rebalancing was a manual operation. If vSAN detected a large variance, it would trigger a health alert condition in the UI, which would then present a "Rebalance Disks" button to remediate the condition. If clicked, a rebalance task would occur at an arbitrary time within the next 24 hours. Earlier editions of vSAN didn't have the proper controls in place to provide this as an automated feature. Clicking "Rebalance Disks" left some users uncertain if and when anything would occur. With the advancement of a new scheduler and Adaptive Resync introduced in 6.7, as well as all-new logic introduced in 6.7 U3 to calculate resynchronization completion times, VMware changed this feature to be an automated process.

The toggle for enabling or disabling this cluster-level feature can be found in vCenter, under Configure > vSAN > Services > Advanced options > "Automatic Rebalance" as shown in Figure 4-3.

FIGURE 4-3: Configuring "Automatic Rebalance" in the "Advanced Options" of the cluster

Recommendation: Keep the "Rebalancing Threshold %" entry to the default value of 30. Decreasing this value could increase the amount of resynchronization traffic and cause unnecessary rebalancing for no functional benefit.

The "vSAN Disk Balance" health check was also changed to accommodate this new capability. If vSAN detects an imbalance that meets or exceeds a threshold while automatic rebalance is disabled, it will provide the ability to enable the automatic rebalancing, as shown in Figure 4-4. The less-sophisticated manual rebalance operation is no longer available.

FIGURE 4-4: Remediating the health check condition when Automatic Rebalancing is disabled.

Once the Automatic Rebalance feature is enabled, the health check alarm for this balancing will no longer trigger, and rebalance activity will occur automatically.

Accommodating All Environments and Conditions

The primary objective of proactive rebalancing was to more evenly distribute the data across the discrete devices to achieve a balanced distribution of resources, and thus, improved performance. Whether the cluster is small or large, automatic rebalancing through the described hypervisor enhancements addresses the need for the balance of capacity devices in a scalable, sustainable way.

Other approaches are fraught with challenges that could easily cause the very issue that a user is trying to avoid. For example, implementing a time window for rebalancing tasks would assume that the associated resyncs always impact performance – which is untrue. It would also assume the scheduled window would always be sufficiently long to accommodate the resyncs, which would be difficult to guarantee. This type of approach would delay resyncs unnecessarily through artificial constraints, increase operational complexity, and potentially decrease performance.

Should Automatic Rebalancing Be Enabled?

Yes, it is recommended to enable the automatic rebalancing feature on your vSAN clusters. When the feature was added in 6.7 U3, VMware wanted to introduce the capability to customer environments gradually, and it remains disabled by default in vSAN 7. With the optimizations made to the scheduler and resynchronizations in recent editions, the feature will likely end up enabled by default at some point. There may be a few rare cases in which one might want to temporarily disable automatic rebalancing on the cluster. Adding a large number of additional hosts to an existing cluster in a short amount of time might be one of those possibilities, as well as perhaps nested lab environments that are used for basic testing. In most cases, automatic rebalancing should be enabled.

Viewing Rebalancing Activity

The design of vSAN's rebalancing logic emphasizes a minimal amount of data movement to achieve the desired result. How often are resynchronizations resulting from rebalancing occurring in your environment? The answer can be easily found in the disk group performance metrics of the host. Rebalance activity will show up under the "rebalance read" and "rebalance write" metrics. An administrator can easily view the VM performance during this time to determine if there was any impact on guest VM latency. Thanks to Adaptive Resync, even under the worst of circumstances, the impact on the VM will be minimal. In production environments, you may find that proactive rebalancing does not occur very often.

Summary

The automatic rebalancing feature found in environments powered by vSAN 6.7 U3 and later is a powerful way to ensure optimal performance through the proper balance of resources, and it can be enabled without hesitation.

Managing Orphaned Datastore Objects

vSAN is an object-based datastore. The objects typically represent entities such as virtual machines, the performance history database, iSCSI objects, persistent volumes, and vSphere Replication data. An object may inadvertently lose its association with a valid entity; objects in this state are termed orphaned or unassociated objects. While orphaned objects do not critically impact the environment, they contribute to unaccounted capacity and skew reporting.

Common causes for orphaned objects include, but are not limited to:

  • Objects that were created manually instead of through vCenter or an ESXi host
  • Improper deletion of a virtual machine, such as deleting files through a command-line interface (CLI)
  • Using the vSAN datastore to store non-standard entities such as ISO images
  • Managing files directly through the vSAN datastore browser
  • Residual objects caused by incorrect snapshot consolidation or removal by third-party utilities

Identification and Validation

Unassociated objects can be identified through command-line utilities such as the Ruby vSphere Console (RVC) and the Go-based vSphere CLI (GOVC). RVC is embedded as part of the vCenter Server Appliance (vCSA). GOVC is a single static binary that is available on GitHub and can be installed across different OS platforms.

Here are the steps to identify the specific objects:

RVC

Command Syntax: vsan.obj_status_report -t <pathToCluster>

Sample Command and Output:

>vsan.obj_status_report /localhost/vSAN-DC/computers/vSAN-Cluster/ -t
2020-03-19 06:05:29 +0000: Querying all VMs on vSAN .
Histogram of component health for possibly orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
+-------------------------------------+------------------------------+

Total orphans: 0

GOVC

Command Syntax: govc datastore.vsan.dom.ls -ds <datastorename> -l -o

Sample Command: govc datastore.vsan.dom.ls -ds vsanDatastore -l -o

<Command does not return an output if no unassociated objects are found>

Additional reference for this task can be found in KB 70726.
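Before engaging support, it can be helpful to gather more detail on a suspected unassociated object. The RVC command below is one way to do so; the object UUID and cluster path are placeholders taken from the output of the previous commands.

Command Syntax: vsan.object_info <pathToCluster> <objectUUID>

Sample Command: vsan.object_info /localhost/vSAN-DC/computers/vSAN-Cluster/ <objectUUID>

The output typically includes the object's applied policy, component layout, and object path, which helps confirm whether the object truly has no valid owner.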

Recommendation: Contact VMware Technical Support to help validate and delete unassociated objects. Incorrect detection and deletion of unassociated objects may lead to loss of data.

Summary

Multiple reasons can cause objects to become unassociated from a valid entity. The existence of unassociated objects does not critically affect the production workloads. However, these objects could gradually consume significant capacity leading to operational issues. Command-line utilities help identify such objects and, to a certain extent, also help in understanding the root cause. While the CLI utilities also enable the deletion of unassociated objects, it is recommended to contact VMware Technical Support to assist with the process.

Section 5: Storage Policy Operations

Operational Approaches of Using SPBM in an Environment

The flexibility of SPBM allows administrators to easily manage their data center in an outcome-oriented manner. The administrator determines the various storage requirements for the VM, and assigns them as rules inside a policy. vSAN takes care of the rest, ensuring compliance of the policy.

FIGURE 5-1: Multiple rules apply to a single policy, and a single policy applies to a group of VMs, a single VM, or a single VMDK

This form of management is quite different from what is commonly found with traditional storage. This level of flexibility introduces the ability to prescriptively address changing needs for applications. These new capabilities should be part of how IT meets the needs of the applications and the owners that request them.

When using SPBM for vSAN, the following guidance will help operationalize this new management technique in the best way possible.

  • Don’t hesitate to be prescriptive with storage policies if needed. If an SQL server—or perhaps just the virtual machine disk (VMDK) of the SQL server—serving transaction logs needs higher protection, create and assign a storage policy for this need. The storage policy model exists for this very reason.
  • Refrain from unnecessary complexity. Take an “as needed” approach for storage policy rules. Storage policy rules such as limits for input/output operations per second (IOPS) allow you to apply limits to a wide variety of systems quickly and easily, but may restrict performance unnecessarily. See “Using Workload Limiting Policies” in this section for more information.
  • Be mindful of the physical capabilities of your hosts and network in determining what policy settings should be used as a default starting point for VMs in a cluster. The capabilities of the hosts and network play a significant part in vSAN’s performance. In an on-premises environment where hardware specifications may be modest, a more performance-focused RAID-1-based policy might make sense. In a VMware Cloud (VMC) on Amazon Web Services (AWS) environment, where host and network specifications are top-tier but capacity comes at a premium, it might make more sense to have a RAID-5-based policy to take advantage of its space efficiency.
  • Be mindful of the physical capabilities of your cluster. Storage policies allow you to define various levels of protection and space efficiency for VMs. Some higher levels of resilience and space efficiency may require more hosts than are in the cluster. Review the cluster capabilities before assigning a storage policy that may not be achievable due to a limited number of hosts.
  • Monitor system behavior before and after storage policy changes. With the vSAN performance service, you can easily monitor VM performance before and after a storage policy change to see if it meets the requirements of the application owners. This is how to quantify how much of a performance impact may occur on a VM. See the section “Monitoring vSAN Performance” for more information.

Recommendation: Do not change the vSAN policy known as the “default storage policy.” It represents the default policy for all vSAN clusters managed by that vCenter server. If the default policy specifies a higher level of protection, smaller clusters may not be able to comply.

Storage policies can always be adjusted without interruption to the VM. Some storage policy changes will initiate resynchronization to adjust the data to adhere to the new policy settings. See the topic “Storage Policy Practices to Minimize Resynchronization Activities” for more information.

Storage policies are not additive. You cannot apply multiple policies to one object. Remember that a single storage policy is a collection of storage policy rules applied to a group of VMs, a single VM, or even a single VMDK.

Recommendation: Use some form of a naming convention for your storage policies. A single vCenter server houses storage policies for all clusters that it manages. As the usefulness of storage policies grows in an organization, naming conventions can help reduce potential confusion. See the topic “Managing a Large Number of Storage Policies.”

Summary

Become familiar with vSAN storage policies so administration teams can use them with confidence. Implement some of the recommended practices outlined here and in other storage policy-related topics for a more efficient, predictable outcome for changes made to an infrastructure and the VMs it powers.

Creating a vSAN Storage Policy

SPBM from VMware enables precise control of storage services. Like other storage solutions, vSAN provides services such as availability levels, capacity consumption, and stripe widths for performance.

Each VM deployed to a vSAN datastore is assigned at least one storage policy that defines VM storage requirements, such as performance and availability. If you do not assign a storage policy when provisioning a VM, vSAN assigns the Default Storage Policy. This policy has a level of FTT set to 1, a single disk stripe per object, and a thin-provisioned virtual disk.
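To confirm which policy rules the objects of an existing VM currently carry, RVC provides a per-VM view. This is a hedged example; the VM path is a placeholder that follows RVC's /localhost/<datacenter>/vms/<vmName> structure.

Command Syntax: vsan.vm_object_info <pathToVM>

Sample Command: vsan.vm_object_info /localhost/vSAN-DC/vms/app01

The output lists each object belonging to the VM along with the policy attributes applied to it, such as hostFailuresToTolerate and stripeWidth.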

FIGURE 5-2: Setting policy rules within a vSAN storage policy

A detailed list of all the possible vSAN storage policy rules can be found in VMware Docs.

When you know the storage requirements of your VMs, you can create a storage policy referencing capabilities the datastore advertises. Create several policies to capture different types or classes of requirements. When determining the use of RAID-1 versus RAID-5/6, consider the following:

  • RAID-1 mirroring requires fewer I/O operations to the storage devices, so it can provide better performance. For example, a cluster resynchronization takes less time to complete with RAID-1. It is, however, a full mirror copy of the object, meaning it requires twice the size of the virtual disk.
  • RAID-5 or RAID-6 erasure coding can provide the same level of data protection as RAID-1 mirroring while using less storage capacity.
  • RAID-5 or RAID-6 erasure coding does not support FTT = 3.

Consider these guidelines, along with the minimum host count required by each data placement scheme, when configuring RAID-5 or RAID-6 erasure coding in a vSAN cluster.

Summary

Before creating VM storage policies, it is important to understand how capabilities affect the consumption of storage in the vSAN cluster. Find more information about designing and sizing of storage policies on core.vmware.com.

Managing a Large Number of Storage Policies

The flexibility of SPBM allows administrators to easily manage their data center in an outcome-oriented manner. The administrator determines the storage requirements for the VM, assigns them as rules in a policy, and lets vSAN ensure compliance of the policy.

Depending on the need, an environment may require a few storage policies, or dozens. Before deciding what works best for your organization, let’s review a few characteristics of storage policies with SPBM.

  • A maximum of 1,024 SPBM policies can exist per vCenter server.
  • A storage policy is stored and managed per vCenter server but can be applied in one or more clusters.
  • A storage policy can define one or many rules (around performance, availability, and space efficiency, for example).
  • Storage policies are not additive. Apply only one policy (with one or more rules) per object.
  • A storage policy can be applied to a group of VMs, a single VM, or even a single VMDK.
  • A storage policy name can consist of up to 80 characters.
  • A storage policy name is not the true identifier. Storage policies use a unique identifier for system management.

With a high level of flexibility, users are often faced with the decision of how best to name policies and apply them to their environments.

Storage policy naming considerations

Policy names are most effective when they include two descriptors: intention and scope.

  • The intention is what the policy aims to achieve. Perhaps the intention is to apply high-performing mirroring using RAID-1, with an increased level of protection by using an FTT level of 2.
  • The scope is where the policy will be applied. Maybe the scope is a server farm hosting the company ERP solution, or perhaps it is just the respective VMDKs holding databases in a specific cluster.

Let’s examine the policies in FIGURE 5-3.

FIGURE 5-3: A listing of storage policies managed by vCenter

  • CL01-R1-FTT1: CL01 (Cluster 1) R1 (RAID-1 Mirror) FTT=1 (Failures to Tolerate = 1)
  • CL01-R1-FTT2-SW6: CL01 (Cluster 1) R1 (RAID-1 Mirror) FTT=2 (Failures to Tolerate = 2) SW6 (Stripe Width = 6)
  • CL01-R5-FTT2-SW6: CL01 (Cluster 1) R5 (RAID-5 Erasure Coding) FTT=2 (Failures to Tolerate = 2) SW6 (Stripe Width = 6)

Recommendation: Avoid using and changing the default vSAN storage policy. If a RAID-1 FTT=1 policy is desired, simply clone the default storage policy. Create and clone storage policies as needed.

Determine the realistic needs of the organization to find the best storage policy naming conventions for an environment. A few questions to ask yourself:

  • What is the size of the environment?
  • Are there multiple clusters? How many?
  • Are there stretched clusters?
  • Is the preference to indicate actual performance/protection settings within names, or to adopt a gold/silver/bronze approach?
  • Are application-specific storage policies needed?
  • Are VMDK-specific storage policies needed?
  • What type of delimiter works best (spaces, hyphens, periods)? What works best in conjunction with scripting?
  • Are there specific departments or business units that need representation in a storage policy name?
  • Who is the intended audience? Virtualization administrators? Application owners? Automation teams? This can impact the level of detail you provide in a policy name.

The answers to these questions will help determine how to name storage policies, and the level of sophistication used.

Summary

An administrator has tremendous flexibility in determining what policies are applied, where they are applied, and how they are named. Having an approach to naming conventions for policies that drive the infrastructure will allow you to make changes to your environment with confidence.

Storage Policy Practices to Improve Resynchronization Management in vSAN

SPBM allows administrators to change the desired requirements of VMs at any time without interrupting the VM. This is extremely powerful and allows IT to accommodate change more quickly.
Some vSAN storage policy rules change how data is placed across a vSAN datastore. This change in data placement temporarily creates resynchronization traffic so that the data complies with the new or adjusted storage policy. Storage policy rules that influence data placement include:

  • Site disaster tolerance (any changes to the options below)
    • None—Standard cluster
    • None—Standard cluster with nested fault domains
    • Dual site mirroring (stretched cluster)
    • None—Keep data on preferred (stretched cluster)
    • None—Keep data on non-preferred (stretched cluster)
    • None—Stretched cluster
  • FTT (any changes to the options below)
    • No data redundancy
    • 1 failure—RAID-1 (mirroring)
    • 1 failure—RAID-5 (erasure coding)
    • 2 failures—RAID-1 (mirroring)
    • 2 failures—RAID-6 (erasure coding)
    • 3 failures—RAID-1 (mirroring)
  • Number of disk stripes per object

This means that if a VM’s storage policy is changed, or a VM is assigned a new storage policy with one of the rules above different from the current policy rules used, it will generate resynchronization traffic so that the data can comply with the new policy definition. When a large number of objects have their storage policy adjusted, the selection order is arbitrary and cannot be controlled by the end user.

Recommendation: Use the VMs view in vCenter to view storage policy compliance. When a VM is assigned a new policy, or has its existing policy changed, vCenter will report it as “noncompliant” during the period it is resynchronizing. This is expected behavior.

Recommendations for policy changes for VMs

Since resynchronizations can be triggered by adjustments to existing storage policies, or by applying a new storage policy, the following are recommended.

  • To minimize the influence of backend resynchronization traffic on frontend VM traffic, ensure the environment is running vSAN 6.7 or later. vSAN 6.7 introduced Adaptive Resync, which improves the resource management and priority of these different traffic types and can alleviate some of the challenges of large quantities of resynchronization activities.
  • For changes to a large quantity of VMs, consider creating a new storage policy, and apply the VMs to that new storage policy in batches. This helps reduce a backlog of pending resynchronization activity, and helps the administrator provide order on which VMs have their policies changed first. Once the batches are complete, the old policy can be removed, and the new policy can be renamed to the previous policy name. Note that in vSAN 6.7 U3 and later, this recommendation is unnecessary, as vSAN will perform the resynchronizations in batches to better regulate the number of VMs resynchronizing, and the amount of free space temporarily used.
  • Avoid changing an existing policy, unless the administrator is very aware of what VMs it affects. Remember that a storage policy is a construct of a vCenter server, so that storage policy may be used by other VMs in other clusters. See the topic “Using Storage Policies in Environments with More Than One vSAN Cluster” for more information.
  • If there are host failures, or any other condition that may have generated resynchronization traffic, refrain from changing storage policies at that time.

Visibility of resynchronization activity can be found in vCenter or vRealize Operations. vCenter presents it in the form of resynchronization IOPS, throughput, and latency, and does so per disk group for each host. This can offer a precise level of detail but does not provide an overall view of resynchronization activity across the vSAN cluster.

FIGURE 5-4: Resynchronization IOPS, throughput, and latency of a disk group in vCenter, courtesy of the vSAN performance service
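A host-local view of the same activity is also available from the command line. The example below is a sketch that assumes an ESXi build offering the esxcli vsan debug namespace; output format may vary by version.

Command Syntax: esxcli vsan debug resync summary get

The command summarizes the objects and bytes remaining to resynchronize from the perspective of that host, which can be a quick spot check during a large policy change.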

vRealize Operations can offer a unique, cluster-wide view of resynchronization activity by showing a burn down rate—or, rather, the amount of resynchronization activity (by data and by object count) that remains to be completed. This is an extremely powerful view to better understand the magnitude of resynchronization events occurring.

FIGURE 5-5: Resynchronization burn down rates across the entire vSAN cluster using a vRealize Operations dashboard

For more information on the capabilities of vRealize Operations to show the total amount of resynchronization activity in the cluster, see the blog post “Better Visibility with New vSAN Metrics in vR Ops 7.0.”

Recommendation: Do not attempt to throttle resynchronizations using the manual slider bar provided in the vCenter UI. This feature predates Adaptive Resync and should only be used under the advisement of VMware Global Support Services (GSS) in selected corner cases.

Summary

Resynchronizations are a natural result of applying new storage policies or changing an existing storage policy for one or more VMs. While vSAN manages much of this for the administrator, the recommendations above provide a better operational understanding of how to manage policy changes.

Using Workload Limiting Policies (IOPS Limits) on vSAN-Powered Workloads

The IOPS limits storage policy rule found in vSAN is a simple and flexible way to limit the amount of resources that a VMDK can use. IOPS limits can be applied to a few select VMs, or applied broadly to VMs in a cluster. While easy to enable, there are specific considerations in how performance metrics will be rendered when IOPS-limit rules are enforced.

Note that for VMs running in a vSAN environment, IOPS limits are enforced exclusively through storage policies. VMDK-specific IOPS limits through Storage I/O Control (SIOC) have no effect.

Understanding how IOPS limits are enforced

The rationale behind capping one or more VMDKs within a VM with an artificial IOPS limit is simple. Since the busiest VMs aren’t always the most important, IOPS limits can curtail a “noisy neighbor” consuming disproportionate resources. This can free these resources and help ensure more predictable performance across the cluster.

Measuring and throttling I/O payload using just the IOPS metric has its challenges. I/O sizes can vary dramatically, typically ranging from 4KB to 1MB in size. This means that one I/O could be 256 times the size of another, with one taking much more effort to process. When enforcing IOPS limits, vSAN uses a weighted measurement of I/O.

When applying an IOPS-limit rule to an object within vSAN, the vSAN I/O scheduler “normalizes” the size in 32KB increments. This means that an I/O under 32KB is seen as one I/O, an I/O under 64KB is seen as two, and so on. This provides a better-weighted representation of various I/O sizes in the data stream and is the same normalization increment used when imposing limits for VMs running on non-vSAN-based storage (SIOC v1).
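As a simple illustration of this arithmetic: a workload issuing 1,000 IOPS at a 64KB I/O size is accounted for as roughly 2,000 normalized IOPS. If that VMDK were assigned an IOPS limit of 1,000, the workload would be throttled to approximately 500 of its 64KB I/Os per second.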

Note that vSAN uses its own scheduler for all I/O processing and control, and thus does not use SIOC for any I/O control. For vSAN-powered VMs, normalized IOPS can be viewed adjacent to vSCSI IOPS at the VMDK level, as shown in FIGURE 5-6. When workloads use large I/O sizes, the normalized IOPS metric may be significantly higher than the IOPS observed at the vSCSI layer.

FIGURE 5-6: Viewing normalized IOPS versus vSCSI IOPS on a VMDK

This normalization measurement occurs just as I/Os enter the top layer of the vSAN storage stack from the vSCSI layer. Because of this, I/Os coming from or going to the vSAN caching layer, the capacity tier, or the client cache on the host are accounted for in the same way. Enforcement of IOPS limits applies only to I/Os from guest VM activity. Traffic resulting from resynchronization and cloning is not subject to the IOPS-limit rule. Reads and writes are accounted for in an equal manner, which is why they are combined into a single normalized IOPS metric as shown in FIGURE 5-6.

When IOPS limits are applied to an object using a storage policy rule, there is no change in behavior if demand does not meet or exceed the limit defined. When the number of I/Os exceeds the defined threshold, vSAN enforces the rule by delaying the I/Os so the rate does not exceed the established threshold. Under these circumstances, the time to wait for completion (latency) of an I/O is longer.

Viewing enforced IOPS limits using the vSAN performance service

When a VM exceeds an applied IOPS-limit policy rule, any period that the IOPS limit is being enforced shows up as increased levels of latency on the guest VMDK. This is expected behavior. Figure 5-7 demonstrates the change in IOPS, and the associated latency under three conditions:

  • No IOPS-limits rule
  • IOPS limit of 200 enforced
  • IOPS limit of 400 enforced

FIGURE 5-7: Observing enforced IOPS limits on a single VMDK, and the associated vSCSI latency

Note that, in FIGURE 5-7, the latency introduced reflects the degree IOPS need to be suppressed to achieve the limit. Suppressing the workload less results in lower latency. For this workload, suppressing the maximum IOPS to 200 introduces two to three times the amount of latency when compared to capping the IOPS to 400.

Latency introduced by IOPS limits shows up elsewhere. Observed latencies increase at the VM level, the host level, the cluster level, and even with applications like vRealize Operations. This is important to consider, especially if the primary motivation for using IOPS limits was to reduce latency for other VMs. When rendering latency, the vSAN performance service does not distinguish whether latency came from contention in the storage stack or latency from enforcement of IOPS limits. This is consistent with other forms of limit-based flow control mechanisms.

IOPS limits applied to some VMs can affect VMs that do not use the storage policy rule. FIGURE 5-8 shows a VM with no IOPS limits applied, yet the overall I/O was reduced during the same period as the VM shown in FIGURE 5-7. How does this happen? In this case, the VM shown in FIGURE 5-8 is copying files to and from the VM shown in FIGURE 5-7. Since it is interacting with a VM using IOPS limits, it is being constrained by that VM. Note that, unlike the VM shown in FIGURE 5-7, the VM shown in FIGURE 5-8 does not have any significant increase in vSCSI latency because the reduction in I/O is forced by the other VM in this interaction, and not by a policy applied to this VM.

FIGURE 5-8: A VM not using an IOPS-limit rule being affected by a VM using an IOPS-limit rule

It is easy to see how IOPS limits could have secondary impacts on multi-tiered applications or systems that regularly interact with each other. Unfortunately, this reduction in performance could easily go undetected, as latency would not be the leading indicator of a performance issue.

Note that in vSAN 7 U1, latency as a result of enforced IOPS limits can be easily identified in the UI. The graphs will show a highlighted yellow region for the time periods in which latency is a result of the IOPS enforcement.

Recommendation: Avoid using the IOPS-limit rule simply because of its ease of use. Use it prescriptively, and test the results on the impact of the VM and any dependent VMs. Artificially imposing IOPS limits can introduce secondary impacts that may be difficult to monitor and troubleshoot.

Summary

IOPS limits can be applied across some or all VMs in a vSAN cluster as a way to cap resources and allow for growth in performance at a later date. However, VM objects controlled by IOPS-limit policy rules enforce the limit by introducing latency to the VM so it does not exceed the limit. This can be misleading to a user viewing performance metrics who is unaware that IOPS limits may be in use. It is recommended to fully understand the implications of an enforced IOPS-limit storage policy rule on the VMs, and weigh that against allowing VMs to complete tasks more quickly at the cost of temporarily using a higher level of IOPS.

Much of this information was originally posted under “Performance Metrics When Using IOPS Limits with vSAN—What You Need to Know” and is repeated here to assist in the effort of operationalizing vSAN. For more information on understanding and troubleshooting performance in vSAN, see the white paper “Troubleshooting vSAN Performance” on core.vmware.com.

Using Space-Efficient Storage Policies (Erasure Coding) with Clusters Running DD&C

VMware vSAN offers two types of space efficiency techniques for all-flash vSAN clusters. Both types can be used together or individually and have their own unique traits. Understanding the differences between behavior and operations helps administrators determine what settings may be the most appropriate for an environment.

DD&C is an opportunistic space efficiency feature enabled as a service at the cluster level. The amount of savings is based on the type of data and the physical makeup of the cluster. vSAN automatically looks for opportunities to deduplicate and compress the data per disk group as it is destaged from the write buffer to the capacity tier of the disk group. DD&C could be best summarized by the following:

  • Offers an easy “set it and forget it” option for additional space savings across the cluster
  • Small bursts of I/O do not see an increase in latency in the guest VM
  • No guaranteed level of space savings
  • Comes at the cost of additional processing effort to destage the data

Recommendation: Since DD&C is a cluster-based service, make the decision for its use per cluster. It may be suitable for some environments and not others.

RAID-5 and RAID-6 erasure coding is a data placement technique that stripes the data with parity across a series of nodes. It offers a guaranteed level of space efficiency while maintaining resilience when compared to the simpler RAID-1 mirroring. Unlike DD&C, RAID-5/6 can be assigned to a group of VMs, a single VM, or even a single VMDK through a storage policy. RAID-5/6 could be best summarized by the following:

  • Guaranteed level of space savings
  • Prescriptive assignment of space efficiency where it is needed most
  • I/O amplification for all writes, impacting latency
  • May strain network more than RAID-1 mirroring

Impacts when using the features together

The information below outlines considerations to be mindful of when determining the tradeoffs of using both space efficiency techniques together.

  • Reduced levels of advertised deduplication ratios. It is not uncommon to see vSAN’s advertised DD&C ratio reduced when combining RAID-5/6 with DD&C, versus combining RAID-1 with DD&C. This is because data placed in a RAID-5/6 stripe is inherently more space efficient, translating to fewer potential duplicate blocks. Note that while one might find the advertised DD&C ratios reduced when using RAID-5/6 erasure coding, many times the effective overall space saved may increase even more. This is described in detail in the “Analyzing Capacity Utilization with vRealize Operations” section of “vRealize Operations and Log Insight in vSAN Environments.” This is one reason the DD&C ratio alone should not be used to understand space efficiency savings.
  • Hardware specification changes the amount of impact. Whether the space efficiency options are used together or in isolation, each method places additional demands on the hardware. This is described in more detail below.
  • Workload changes the amount of impact. For any storage system, write operations are more resource intensive to process than read operations. The difference between running no space efficiency techniques and running both is highly dependent on how write intensive the workloads are. Sustained writes can be a performance challenge for any storage system. Space efficiency techniques add to this challenge, and the effect compounds when they are used in combination. Infrastructure elements most affected by the features are noted below.
  • DD&C. This process occurs during destaging from the buffer device to the capacity tier. vSAN consumes more CPU resources during the destaging process, slowing the destaging or drain rate. Slowing the rate for sustained write workloads effectively increases the buffer fill rate. When the buffer reaches certain fullness thresholds, vSAN slows the rate of write acknowledgments sent back to the VM, increasing latency. Large buffers, multiple disk groups, and faster capacity devices in the disk groups can help reduce the level of impact.
  • RAID-5/6 erasure coding. vSAN uses more CPU and network resources during the initial write from the guest VM to the buffer, as the I/O amplification increases significantly. The dependence on sufficient network performance increases when compared to RAID-1, as vSAN must wait for the completion of the writes to more nodes across the network prior to sending the write acknowledgment back to the guest. Physical network throughput and latency become critical. Fast storage devices for write buffers (NVMe-based), multiple disk groups, and high-quality switchgear offering higher throughput (25Gb or higher) and lower-latency networking can help alleviate this.

Since the underlying hardware can influence the degree of performance impact, the decision to use RAID-1 mirroring versus RAID-5/6 erasure coding should be evaluated case by case. For instance, clusters running a large number of VMs using RAID-5/6 may see a much less significant performance impact with 25Gb networking than the same VMs running with 10Gb networking. Clusters using disk groups composed of extremely fast storage devices can benefit all VMs, but in particular VMs using RAID-1 mirroring. See the topic “Operational Approaches of Using SPBM in an Environment” for more information.

Recommendation: Observe the guest read/write I/O latency as seen by the VM after a VM or VMDK has had a change to a storage policy to/from RAID-5/6. This provides a good “before vs. after” to see if the physical hardware can meet some of the performance requirements. Be sure to observe latencies when there is actual I/O activity occurring. Latency measurements during periods of no I/O activity are not meaningful.

vSAN 7 U1 introduces a new cluster-based "compression only" space efficiency feature. It offers a level of space efficiency that is suitable for a wider variety of workloads and minimizes performance impact. More information on this feature, and whether it is right for your environment, can be found in the blog post "Space Efficiency Using the New "Compression only" Option in vSAN 7 U1."

Testing

One or both space efficiency techniques can be tested with no interruption in uptime. Each imposes a different level of burden on the system when changed.

  • DD&C can be toggled off, but to do so, vSAN must perform a rolling evacuation of data from each disk group on a host to reformat the disk group. This is generally why it is recommended to decide on the use of DD&C prior to the deployment of a cluster.
  • RAID-5/6 erasure coding can be changed to a RAID-1 mirror by assigning a new policy to the VMs using the erasure coding scheme. This will, however, create resynchronization traffic for the changed systems and use slack space to achieve the change. See the topic “Storage Policy Practices to Improve Resynchronization Management in vSAN” for more information.

Recommendation: Any time you go from using space efficiency techniques to not using them, make sure there is sufficient free space in the cluster. Avoiding full-cluster scenarios is an important part of vSAN management.
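One way to spot check free capacity and component headroom before reverting a space efficiency setting is the RVC limits report. The cluster path below is a placeholder.

Command Syntax: vsan.check_limits <pathToCluster>

Sample Command: vsan.check_limits /localhost/vSAN-DC/computers/vSAN-Cluster/

The output shows, per host, the component counts and disk utilization, which helps confirm there is enough slack space for the reformat or resynchronization that follows.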

Summary

The cluster-level feature of DD&C, as well as RAID-5/6 erasure coding applied using storage policies, offers flexibility for the administrator. The decision process for determining the use of DD&C should be done per cluster, and the decision process for RAID-5/6 should be done per VM or VMDK.

Using Number of Disk Stripes Per Object on vSAN-Powered Workloads

The number of disk stripes per object storage policy rule aims to improve the performance of vSAN by distributing object data across more capacity devices. Sometimes referred to as “stripe width,” it breaks the object components into smaller chunks on the capacity devices so I/O can occur at a higher level of parallelism. When it should be used, and to what degree it improves performance, depend on a number of factors.

FIGURE 5-9: The “number of disk stripes per object” policy rule with an object using a RAID-1 mirror, and how it will impact object component placement

How many devices the object is spread across depends on the value given in the policy rule. A valid number is between 1 and 12. When an object component uses a stripe width of 1, it resides on at least one capacity device. When an object component uses a stripe width of 2, it is split into two components, residing on at least two devices.  When using stripe width with RAID-5/6 erasure codes, the behavior will depend on the version of vSAN used.  

  • In versions prior to vSAN 7 U1, the erasure coding stripe itself did not contribute to the stripe width. A RAID-5 object with a stripe width of 1 would have a total of 4 components spread across 4 hosts. A RAID-5 object with a stripe width of 2 would have a total of 8 components spread across 4 or more hosts. A RAID-5 object with a stripe width of 4 would have a total of 16 components spread across 4 or more hosts.
  • In vSAN 7 U1 and later, the erasure coding stripe itself contributes to the stripe width count. A RAID-5 object with a stripe width of 4 will have 4 components spread across 4 hosts. A RAID-5 object with a stripe width of 8 would have a total of 8 components spread across 4 or more hosts.

Up until vSAN 7 U1, components of the same stripe would strive to reside in the same disk group.  From vSAN 7 U1 forward, components of the same stripe will strive to reside on different disk groups to improve performance.

The storage policy rule simply defines the minimum. vSAN may choose to split the object components even further if the object exceeds 255GB (the maximum size of a component), uses a RAID-5/6 erasure coding policy, or vSAN needs to split the components for rebalancing objectives. New to vSAN 7 U1, there are also limitations on stripe width settings for objects beyond 2TB in size. The implemented maximum is 3 for the portion of an object greater than 2TB, meaning that the first 2TB is subject to the stripe width defined in the policy, with the remainder of the object using a stripe width of 3.

Setting the stripe width can improve reads and writes but in different ways. Performance improves if the following conditions exist:

  • Writes: Writes committed to the write buffer are constrained by the ability of the capacity devices to receive destaged data quickly enough. A slow rate of destaging eventually leads to full write buffers, increasing latency seen by the guest VMs.
  • Reads: Read requests of uncached data (“cache misses”) come directly from capacity devices—whether spinning disk on a hybrid system or flash devices on an all-flash vSAN cluster. All-flash clusters read data directly from the capacity devices unless the data still resides in the buffer. Hybrid systems have a dedicated allocation on the cache device for reads, but fetch data directly from the capacity devices if uncached. This increases latency seen by the guest VMs.

The degree of improvement associated with the stripe width value depends heavily on the underlying infrastructure, the application in use, and the type of workflow. To improve the performance of writes, vSAN hosts that use disk groups with a large performance delta (e.g., NVMe buffer and SATA SSD for capacity, or SAS buffer and spinning disk for capacity) see the most potential for improvement, while clusters running NVMe at both the buffer and capacity tiers would likely not see any improvement.

Depending on the constraints of the environment, the most improvement may come from increasing the stripe width from 1 to 2–4. Stripe width values beyond that generally offer diminishing returns and increase data placement challenges for vSAN. Note that stripe width increase improves performance only if it addresses the constraining element of the storage system.

Recommendation: Keep the number of disk stripes per object at the default value of 1 in all storage policies. To experiment with stripe width settings, create a new policy, apply it to a discrete workload, increase the stripe width incrementally by 1, and evaluate the results after each change. Note that changing the stripe width will rebuild the components, causing resynchronization traffic.
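To verify how a stripe width change affected component placement for the test workload, the per-VM object view in RVC can be used after the resynchronization completes. The VM path below is a placeholder.

Command Syntax: vsan.vm_object_info <pathToVM>

Sample Command: vsan.vm_object_info /localhost/vSAN-DC/vms/test-workload01

The component tree in the output shows the RAID-0 stripes created for each object, making it easy to confirm the new layout before deciding whether to keep the policy change.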

The impact on placement flexibility from increasing stripe width

Increasing the stripe width may make object component placement decisions more challenging. Storage policies define the levels of FTT, which can be set from 0 to 3. When FTT is greater than 0, the redundant data must be placed on different hosts to maintain redundancy in the event of a failure—a type of anti-affinity rule. Increasing the stripe width means the object components are forced onto another device in the same disk group, another device in a different disk group on the same host, or on another host. Spreading the data onto additional hosts can make it more challenging for vSAN to honor both the stripe width rule and the FTT. Keep the stripe width set to 1 to maximize flexibility in vSAN’s data placement options.

When to leave the stripe width setting to a default of 1

  • Initially for all cluster types and all conditions.
  • Clusters with DD&C enabled. Object data placed on disk groups using DD&C is sprinkled across the capacity devices in the disk group. This is effectively an implied stripe width (although the UI does not indicate it), so setting the stripe width value has no bearing on how the data is placed on the capacity devices.
  • Smaller clusters.
  • Hosts using fewer capacity devices.
  • All-flash vSAN clusters with no large performance delta between the write buffer and capacity devices, and are meeting performance expectations as seen by the guest VM.
  • You want to maximize the data placement options for vSAN.

When to explore the use of increasing stripe width

  • Hybrid-based vSAN clusters not meeting performance objectives.
  • Clusters with DD&C disabled and not meeting performance objectives.
  • Larger clusters not meeting performance objectives.
  • Clusters with a larger number of capacity devices that are not meeting performance objectives.
  • All-flash vSAN clusters not meeting performance objectives (for reasons such as sustained or streaming writes) and with a large performance delta between the write buffer and capacity devices.

Recommendation: The proper stripe width should not be based on a calculation of variables such as the number of hosts, capacity devices, and disk groups. While those are factors to consider, the proper stripe width (beyond 1) should always reflect the results of testing against a discrete workload and an understanding of the tradeoffs in data placement flexibility.

Summary

VMware recommends leaving the “Number of disk stripes per object” storage policy rule to the default value of 1. While increasing the stripe width may improve performance in very specific conditions, it should only be implemented after testing against discrete workloads.

Operational Impacts of Using Different Types of Storage Policies

Using different types of storage policies across a vSAN cluster is a great example of a simplified but tailored management approach to meet VM requirements and is highly encouraged. Understanding the operational impacts of different types of storage policies against other VMs and the requirements of the cluster is important and described in more detail below.

What to look for

VMs using one policy may have a performance impact over VMs using another policy. For example, imagine a 6-node cluster using 10GbE networking and powering 400 VMs, 390 of them using a RAID-1 mirroring policy and 10 of them running a RAID-5 policy.

At some point, an administrator decides to change 300 of those VMs to the more network-reliant RAID-5 policy. With 310 of the 400 VMs using the more network-intensive RAID-5 policy, this is more likely to impact the remaining 90 VMs running the less network-intensive RAID-1 policy, as they may run into higher contention on that 10GbE connection. This symptom and remediation steps are described in detail in the “Troubleshooting vSAN Performance” document under “Adjust storage policy settings on non-targeted VM(s) to reduce I/O amplification across cluster” on core.vmware.com.

The attributes of a storage policy can change the requirements of a cluster. For example, in the same 6-node cluster described above, an evaluation of business requirements has determined that 399 of the 400 VMs can be protected with an FTT of 1, and the remaining VM needs an FTT of 3. The cluster host count can easily comply with the minimum host count requirements associated with policies using FTT=1, but 6 hosts is not sufficient for a policy with FTT=3. In this case, the cluster would have to be increased to 7 hosts to meet the absolute minimum requirement, or 8 hosts to meet the preferred minimum requirement for VMs using storage policies with FTT=3. It only takes one object assignment to change the requirements of the cluster.

FIGURE 5-10: Minimum host requirements of storage policies (not including N+x sizing)

Other less frequently used storage policy rules can also impact data placement options for vSAN. Stripe width is one of those storage policy rules. Using a policy with an assigned stripe width rule of greater than 1 can make object component placement decisions more challenging, especially if it was designed to be used in another vSAN cluster with different physical characteristics. See the topic “Using Number of Disk Stripes Per Object on vSAN-Powered Workloads” for more information.

Summary

The flexibility of SPBM should be exploited in any vSAN cluster. Accommodating for the behaviors described above helps reduce any unexpected operational behaviors and streamlines management of the cluster and the VMs it powers.

Using Storage Policies in Environments with More Than One vSAN Cluster

The flexibility of SPBM allows administrators to easily manage their data center in an outcome-oriented manner. The administrator determines the various storage requirements for the VM, and assigns them as rules inside a policy. vSAN takes care of the rest, ensuring compliance with the policy.

While storage policies can be applied at a VM or even VMDK level, they are a construct of a vCenter server. vSAN storage policies are created and saved in vCenter, and a policy can be applied to any vSAN-powered VM or VMDK managed by that vCenter server. Since existing storage policies can be easily changed, the concern is that an administrator may be unaware of the potential impact of changing an existing policy used by VMs across multiple clusters.

Recommendation: If an administrator wants to change policy rules assigned to a VM or group of VMs, it is best to apply those VMs to a storage policy already created, or create a new policy if necessary. Changing the rules of an existing policy could have unintended consequences across one or more clusters. This might cause a large amount of resynchronizations as well as storage capacity concerns.

Improving operational practices with storage policies

In many cases, categorizing storage policies into one of three types is an effective way to manage VMs across larger environments:

  • Storage policies intended for all vSAN clusters. These might include simple, generic policies that could be used as an interim policy for the initial deployment of a VM.
  • Storage policies intended for a specific group of vSAN clusters. Storage policies related to VDI clusters, for example, or perhaps numerous branch offices that have very similar needs. Other clusters may have distinctly different intentions and should use their own storage policies.
  • Storage policies intended for a single cluster. These policies might be specially crafted for specific applications within a cluster—or tailored to the design and configuration of a specific cluster. This approach aligns well with the guidance found in the topic “Using Storage Policies in Environments with Both Stretched and Non-Stretched Clusters.” Since a stretched cluster is a cluster-based configuration, storage policies intended for standard vSAN clusters may not work with vSAN stretched clusters.

A blend of these offers the most flexibility while minimizing the number of storage policies created, simplifying ongoing operations.

This approach places additional emphasis on storage policy naming conventions. Applying some form of taxonomy to the storage policy names helps reduce potential issues where operational changes were made without an administrator being aware of the impact. Beginning the policy name with an identifying prefix is one way to address this issue.

Recommendation: When naming storage policies, find the best balance of descriptive, self-documenting storage policies, while not becoming too verbose or complex. This may take a little experimentation to determine what works best for your organization. See the topic, “Managing a Large Number of Storage Policies” for more information.

An example of using storage policies more effectively in a multi-cluster environment can be found in the illustration below. FIGURE 5-11 shows a mix of storage policies that fall under the three categories described earlier.

FIGURE 5-11: A mixture of shared and dedicated storage policies managed by a single vCenter server

These are only examples to demonstrate how storage policies can be applied across a single cluster, or several clusters, in a vSAN-powered environment. The topology and business requirements determine what approach makes the most sense for an organization.

Summary

For vSAN-powered environments consisting of more than one cluster, using a blend of storage policies that apply to all clusters as well as specific clusters provides the most flexibility for your environment while improving operational simplicity.

Using Storage Policies in Environments with Both Stretched and Non-Stretched Clusters

vSAN stretched clusters are an easy, fast, and flexible way to deliver cluster-level redundancy across sites using a capability built right into vSphere. Since it is enabled at the cluster level, a mix of stretched clusters and non-stretched clusters can easily coexist and all be managed by the same vCenter server. This flexibility can lead to operational decisions in the management of SPBM policies: The rules that govern the performance and protection requirements for your VMs. Storage policies created for standard (non-stretched) vSAN clusters adopt a different behavior when applied to VMs running in a vSAN stretched cluster.

Therefore, creating and using separate, purpose-built storage policies specifically for VMs in stretched clusters is recommended for single- and multi-cluster environments. Let’s go into more detail about this recommendation.

In the preliminary configuration of a storage policy, the wizard prompts for a “site disaster tolerance” type. In a standard vSAN cluster, the option chosen would be “None—standard cluster.” For stretched clusters, this setting would be one of the four available for stretched clusters, with the most common stretched cluster configuration being “Dual site mirroring (stretched cluster).” FIGURE 5-12 shows the options available.

FIGURE 5-12: Assigning a “site disaster tolerance” level in a storage policy

When the “site disaster tolerance” level is set to one of the stretched cluster settings, this defines whether objects are mirrored across sites. When “Dual site mirroring” is selected, this synchronously mirrors the object data using a RAID-1 data placement scheme across sites. RAID-1 mirroring is the only type of data placement scheme used across sites.

The next setting to select is the “Failures to tolerate.” This simply defines the number of failures an object can tolerate while still remaining accessible. As shown in FIGURE 5-12, the list on this wizard provides the FTT levels for the two types of data placement schemes that vSAN uses: RAID-1 mirroring of the data, or the more space-efficient RAID-5/6 erasure coding.

FIGURE 5-13: Assigning an FTT level in the storage policy definition

The FTT setting is conditional to the “site disaster tolerance” setting chosen.

  • When the “site disaster tolerance” setting of “None—standard cluster” is chosen, the FTT level and data placement scheme applies directly to the objects within that standard vSAN cluster. Since there is only one layer of protection in a standard vSAN cluster, this might be considered the primary level of protection.
  • When the “site disaster tolerance” setting is set to one of the stretched cluster options, the primary level of protection is implied (e.g., RAID-1 mirror across sites). The FTT option then refers to a secondary level of protection.

Recommendation: Adopt the terminology used in the most recent editions of vSAN. Older versions of vSAN used terms such as “Primary Level of Failures to Tolerate” (PFTT) and “Secondary Level of Failures to Tolerate” (SFTT) to describe the levels of protection. The most recent versions of vSAN have changed how the settings are presented to be more user-friendly.

If one attempts to use a storage policy intended for a standard vSAN cluster and apply it to a VM in a stretched cluster, vSAN may attempt to apply a data placement scheme such as RAID-5 across sites. This is not a valid scheme across sites, and vCenter shows the VM objects as not compliant to the policy. See the blog post: “Use separate SPBM policies for VMs in stretched clusters” for more information on how storage policies convert to and from standard and stretched vSAN clusters.

FIGURE 5-14. Two levels of protection in a stretched cluster, versus one level of protection in a standard cluster

SPBM policy recommendations for stretched clusters

The easiest way to accommodate a mix of stretched and non-stretched vSAN clusters is to have separate policies for stretched clusters. One could have policies exclusive to that specific vSAN stretched cluster, or build specific stretched cluster policies to be applied to multiple stretched clusters. Based on the topology, a blend of both strategies might be most fitting for your environment—perhaps cluster-specific policies for larger purpose-built clusters, along with a single set of policies for all smaller branch offices. Additional policies can easily be created by cloning existing SPBM policies, modifying them, then assigning to the appropriate VMs. Having multiple policies for VMs in stretched and non-stretched clusters is also good for a single cluster environment where you need to tear down and recreate the stretched cluster.

Adjusting existing policies impacts all VMs using the adjusted policy, whether in a stretched or non-stretched cluster. Adjustments in this scenario could introduce unnecessary resynchronization traffic when an administrator is trying to remediate an unexpected policy condition. This is another reason why dedicated SPBM policies for VMs running in stretched clusters are recommended.

Summary

vSAN stretched clusters use SPBM to provide extraordinary levels of flexibility and granularity for any vSAN environment. This is one of the staples behind vSAN’s ease of use. Using separate policies for VMs in stretched clusters is a simple operational practice that can help virtualization administrators become more comfortable with introducing and managing one or more stretched clusters in a vSAN-powered environment.

Section 6: Host and EMM Operations

When to Use Each of the Three Potential Options for EMM

All hosts in a vSAN cluster contribute to a single shared vSAN datastore for that specific cluster. If a host goes offline due to any planned or unplanned process, the overall storage capacity for the cluster is reduced. From the perspective of storage capacity, placing the host in maintenance mode is equivalent to its being offline. During the decommissioning period, the storage devices of the host in maintenance mode won’t be part of the vSAN cluster capacity.

Maintenance mode is mainly used when performing upgrades, patching, hardware maintenance such as replacing a drive, adding or replacing memory, or updating firmware. For network maintenance that has a significant level of disruption in connectivity to the vSAN cluster and other parts of the infrastructure, a cluster shutdown procedure may be most appropriate. Rebooting a host is another reason to use maintenance mode. For even a simple host restart, it is recommended to place the host in maintenance mode.

As stated earlier, placing a given host in maintenance mode impacts the overall storage capacity of the vSAN cluster. Here are some prerequisites that should be considered before placing a host in maintenance mode:

  • It is always better to decommission one host at a time.
  • Maintain sufficient free space for operations such as VM snapshots, component rebuilds, and maintenance mode. 
  • Verify the vSAN health condition of each host.
  • View information about the number of objects that are currently being synchronized in the cluster, the estimated time to finish the resynchronization, the time remaining for the storage objects to fully comply with the assigned storage policy, and so on.
  • Think about changing the settings of the vSAN repair timer if the maintenance is going to take longer than 60 minutes (see the example below).
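The repair timer is exposed in the UI of recent vSAN versions as the "Object Repair Timer" under the cluster's vSAN advanced options, which is the preferred place to change it. On a per-host basis, the same value is governed by the VSAN.ClomRepairDelay advanced setting. The example below is a sketch that assumes the default of 60 minutes is being raised to 120 for an extended maintenance window; if the host-level setting is used, keep the value consistent across all hosts in the cluster and return it to the default afterward.

Command Syntax: esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i <minutes>

Sample Command: esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i 120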

If the pre-check results show that a host can be seamlessly placed in maintenance mode, decide on the type of data migration. Take into account the storage policies that have been applied within the cluster. Some migration options might result in a reduced level of availability for some objects. Let’s look at the three potential options for data migration:

Full data migration—Evacuate all components to other hosts in the cluster.

This option maintains compliance with the FTT number but requires more time as all data is migrated from the host going into maintenance mode. It usually takes longer for a host to enter maintenance mode with Full data migration versus Ensure accessibility. Though this option assures the absolute availability of the objects within the cluster, it causes a heavy load of data transfer. This might cause additional latency if the environment is already busy. When it is recommended to use Full data migration:

  • If maintenance is going to take longer than the rebuild timer value.
  • If the host is going to be permanently decommissioned.
  • If you want to maintain the FTT method during the maintenance.

Ensure accessibility (default option)—Instructs vSAN to migrate just enough data to ensure every object is accessible after the host goes into maintenance mode.

vSAN searches for objects that have no redundancy (such as those using RAID-0) and moves or regenerates their components on a host other than the one entering maintenance mode. All other objects, using RAID-1 or higher levels of protection, should already have at least one copy residing on a different host within the cluster. Once the host returns to operation, the data components left on the host in maintenance mode are updated with the changes that were applied to the components on the hosts that remained available. Keep in mind the level of availability might be reduced for objects that have components on the host in maintenance mode.

FIGURE 6-1: Understanding the “Ensure accessibility” option when entering a host into maintenance mode

When to use Ensure accessibility:

  • This option is intended for software upgrades or host reboots. Ensure accessibility avoids needless Full data migration when the host will return to operation in a short time frame, making it the most versatile of the EMM options.

No data migration—No data is migrated when this option is selected.

A host will typically enter maintenance mode quickly with this option, but there is a risk if any objects are assigned a storage policy with PFTT=0. As seen in FIGURE 6-2, such objects become inaccessible. When to use No data migration:

  • This option is appropriate when cluster-wide network changes are being made. In that specific case, all hosts in the cluster are placed in maintenance mode with the “No data migration” option selected.
  • This option is best for short amounts of planned downtime where all objects are assigned a policy with PFTT=1 or higher, or where downtime of objects with PFTT=0 is acceptable.

Our recommendation is to always build a cluster with at least one host more than the minimum required (n + 1). This configuration allows vSAN to self-heal in the event of a host failure or a host entering maintenance mode.

FIGURE 6-2: Required and recommended hosts in a cluster when selecting the desired level of failure to tolerate

Summary

Placing a host in maintenance mode is a best practice when there is a need to perform upgrades, patching, hardware maintenance such as replacing a drive, adding or replacing memory, firmware updates, or network maintenance. There are a few pre-checks to be made before placing a host in maintenance mode because the storage capacity within the vSAN cluster will be reduced once the host is out of operation. The type of data migration should be selected with the storage policies applied within the cluster in mind, to assure data resilience.

Enter a Host into Maintenance Mode in a Standard vSAN Cluster

Since each vSAN host in a cluster contributes to the cluster storage capacity, entering a host into maintenance mode involves an additional set of tasks compared to a traditional architecture. For this reason, vSAN administrators are presented with three host maintenance mode options:

  • Full data migration—Evacuate all of the components to other hosts in the cluster.
  • Ensure accessibility—Evacuate enough components to ensure that VMs can continue to run, though some objects may become noncompliant with their respective storage policies.
  • No data migration—Evacuate no components from this host.

FIGURE 6-3: The vSAN data migration options when entering a host into maintenance mode
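For scripted maintenance windows, the same choice of data migration mode can be made with PowerCLI. A minimal sketch, assuming a hypothetical host named esx01.lab.local; the -VsanDataMigrationMode parameter accepts Full, EnsureAccessibility, or NoDataMigration.

# Enter maintenance mode using the "Ensure accessibility" option
$esx = Get-VMHost -Name "esx01.lab.local"
Set-VMHost -VMHost $esx -State Maintenance -VsanDataMigrationMode EnsureAccessibility

# Exit maintenance mode when the work is complete
Set-VMHost -VMHost $esx -State Connected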

EMM pre-check simulation

vSAN maintenance mode performs a full simulation of data movement to determine whether the enter maintenance mode (EMM) action will succeed or fail before it even starts. This will prevent unnecessary data movement, and provide a result more quickly to the administrator.

Canceling maintenance mode

The latest version of vSAN improves the ability to cancel all operations related to a previous EMM event. In previous versions of vSAN, customers who started an EMM, then canceled it and started again on another host, could introduce unnecessary resynchronization traffic. Previous vSAN versions would stop the management task, but not necessarily stop the queued resynchronization activities. Now, when the cancel operation is initiated, active resynchronizations will likely continue, but all resynchronizations related to that event that are pending in the queue will be canceled.

Note that standard vSAN clusters using the explicit Fault Domains feature may require different operational practices for EMM. In particular, clusters without at least one fault domain beyond what the storage policies require (N+1), and fault domains using a shallow host count (such as two hosts per fault domain), can increase the operational complexity of EMM practices when cluster capacity utilization is high, which is why these design decisions are not recommended. More information can be found in the blog post “Design and Operation Considerations When Using vSAN Fault Domains.”

Summary

When taking a host in a vSAN cluster offline, there are several things to consider, such as how long the host will be offline, and the storage policy rules assigned to the VMs that reside on the host. When entering a host into maintenance mode, the “Ensure accessibility” option should be viewed as the most flexible way to accommodate host updates and restarts, while ensuring that data will remain available—albeit at a potentially reduced level of redundancy.

Enter a Host into Maintenance Mode in a 2-Node Cluster

VMs deployed on vSAN 2-node clusters typically have mirrored data protection, with one copy of data on node 1, a second copy of the data on node 2, and the Witness component placed on the vSAN Witness Host. The vSAN Witness Host can be either a physical ESXi host or a vSAN Witness Appliance.

FIGURE 6-4: A typical topology of vSAN 2-node architecture

In a vSAN 2-node cluster, if a host must enter maintenance mode, there are no other hosts to evacuate data to. As a result, guest VMs are out of compliance and are exposed to potential failure or inaccessibility should an additional failure occur.

Maintenance mode on a data node

  • Full data migration. Not available for 2-node vSAN clusters using the default storage policy, as policy compliance requires two copies of the data plus the Witness object, each on a separate host.
  • Ensure accessibility. The preferred option for two-host or three-host vSAN clusters using the default storage policy. Ensure accessibility guarantees enough components of the vSAN object are available for the object to remain available. Though still accessible, vSAN objects on two- or three-host clusters are no longer policy compliant. When the host is no longer in maintenance mode, objects are rebuilt to ensure policy compliance. During this time, however, vSAN objects are at risk because they become inaccessible if another failure occurs. Any objects that have a non-standard, single-copy storage policy (FTT=0) are moved to an available host in the cluster. If there is insufficient capacity on any alternate hosts in the cluster, the host will not enter maintenance mode.
  • No data migration. This is not a recommended option for vSAN clusters. vSAN objects that use the default vSAN storage policy may continue to be accessible, but vSAN does not ensure their accessibility. Any objects that have a non-standard single-copy storage policy (FTT=0) become inaccessible until the host exits maintenance mode.

Maintenance mode on the vSAN Witness Host

Maintenance mode on the vSAN Witness Host is typically an infrequent event. Different considerations should be taken into account, depending on the type of vSAN Witness Host used.

  • vSAN Witness Appliance (recommended). No VM workloads may run here. The only purpose of this appliance is to provide quorum for the 2-node vSAN cluster. Maintenance mode should be brief, typically associated with updates and patching.
  • Physical ESXi host. While not typical, a physical host may be used as a vSAN Witness Host. This configuration supports VM workloads. Keep in mind that a vSAN Witness Host may not be a member of any vSphere cluster, and as a result, VMs have to be manually moved to an alternate host for them to continue to be available.

When maintenance mode is performed on the vSAN Witness Host, the Witness components cannot be moved to either site. When the Witness Host is put in maintenance mode, it behaves as the No data migration option would on site hosts. It is recommended to check that all VMs are in compliance and there is no ongoing failure, before doing maintenance on the Witness Host.

Note that prior to vLCM, found in vSphere 7 and later, VUM required HA to be disabled on 2-node clusters before a cluster remediation and re-enabled after the upgrade. vLCM does not require this step.

Recommendation: Before deploying a vSAN 2-node cluster, be sure to read the vSAN 2-Node Guide on core.vmware.com.

Summary

With a vSAN 2-node cluster, in the event of a node or device failure, a full copy of the VM data is still available on the alternate node. Because the alternate replica and Witness component are still available, the VM remains accessible on the vSAN datastore. If a host must enter maintenance mode, vSAN cannot evacuate data from the host to maintain policy compliance. While the host is in maintenance mode, data is exposed to a potential failure or inaccessibility should an additional failure occur.

Restarting a Host in Maintenance Mode

For typical host restarts with ESXi, most administrators get a feel for roughly how long a host takes to restart, and simply wait for the host to reappear as “connected” in vCenter. This may be one of the many reasons why out-of-band host management isn’t configured, available, or a part of operational practices. However, hosts in a vSAN cluster can take longer to reboot than non-vSAN hosts because they have additional actions to perform during the host reboot process. Many of these additional tasks simply ensure the safety and integrity of data. Incorporating out-of-band console visibility into your operational practices can play an important role for administering a vSAN environment.

Looking at the Direct Console User Interface (DCUI) during a host restart reveals a few vSAN-related activities. The most prominent message, and perhaps the one that may take the most time, is “vSAN: Initializing SSD… Please wait…” similar to what is shown in FIGURE 6-5.

FIGURE 6-5: DCUI showing the “Initializing SSD” status

During this step, vSAN is processing data and digesting the log entries in the buffer to generate all required metadata tables.  More detail on a variety of vSAN initialization activities can be exposed by hitting ALT + F11 or ALT + F12 in the DCUI. For detailed information, read the blog post on monitoring vSAN restarts using DCUI.

Recommendation: Use out-of-band management to view vSphere DCUI during host restarts.

Significant improvements in host restart times were introduced in vSAN 7 U1.  See the post "Performance Improvements in vSAN 7 U1" for more information about this enhancement.

Summary

When entering a host into maintenance mode, there are several things to consider, like how long the host will be in maintenance mode and the data placement scheme assigned by the respective storage policies. View the “Ensure accessibility” option as a flexible way to accommodate host updates and restarts. Planned events (such as maintenance mode activities) and unplanned events (such as host outages) may make the effective storage policy condition different than the assigned policy. vSAN constantly monitors this, and when resources become available to fulfill the rules of the policy, it adjusts the data accordingly. Lastly, incorporate DCUI accessibility via remote management into defined maintenance workflows such as host restarts.

Section 7: Guest VM Operations

Configuring and Using TRIM/UNMAP in vSAN

vSAN supports thin provisioning, which lets you use just as much storage capacity as currently needed in the beginning and then add the required amount of storage space at a later time. Using the vSAN thin provisioning feature, you can create virtual disks in a thin format. For a thin virtual disk, ESXi commits only as much storage space as the disk needs for its initial operations. To use vSAN thin provisioning, set the SPBM policy for Object Space Reservation (OSR) to its default of 0.

One challenge to thin provisioning is that VMDKs, once grown, will not shrink when files within the guest OS are deleted. This problem is amplified by the fact that many file systems always direct new writes into free space. A steady stream of writes to even a single small file can eventually consume significantly more space at the VMDK level. Previous solutions to this required manual intervention and migration with Storage vMotion to external storage, or powering off a VM. To solve this problem, automated TRIM/UNMAP space reclamation was introduced in vSAN 6.7 U1.

Additional information can be found on the “UNMAP/TRIM space reclamation on vSAN” technote.

Planning the process

If implementing this change on a cluster with existing VMs, identify the steps to clean previously non-reclaimed space. In Linux, this can include scheduling fstrim to run via a timer; in Windows, running the disk optimization tools or the Optimize-Volume PowerShell command. Identify any operating systems in use that may not natively support TRIM/UNMAP. Identify whether you meet the minimum vSAN version requirement (vSAN 6.7 U1).
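As a hedged example of the guest-side cleanup in Windows, the Optimize-Volume cmdlet below retrims a volume so that previously freed blocks are reported back to vSAN; the drive letter is illustrative, and on Linux guests the equivalent step is typically fstrim (often scheduled through the fstrim.timer systemd unit).

# Run inside the Windows guest OS: retrim the C: volume and report progress
Optimize-Volume -DriveLetter C -ReTrim -Verbose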

UNMAP commands do not process through the mirror driver. This means that snapshot consolidation will not commit reclamation to the base disk, and commands will not process when a VM is being migrated with VMware vSphere Storage vMotion. To compensate for this, run asynchronous reclamation after the snapshot or migration to reclaim these unused blocks. This may commonly be seen if using VADP-based backup tools that open a snapshot and coordinate log truncation prior to closing the snapshot. One method to clean up before a snapshot is to use the pre-freeze script.

Identify any VMs for which you do not want space reclamation. For these VMs, set the VMX flag disk.scsiUnmapAllowed to FALSE, as shown in the sketch below.
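A minimal PowerCLI sketch for opting a VM out of reclamation, assuming a hypothetical VM named app-vm-01; the setting name follows the guide's reference, and the VM typically requires a power cycle before a VMX change takes effect.

# Add the VMX flag that disables UNMAP processing for this VM
$vm = Get-VM -Name "app-vm-01"
New-AdvancedSetting -Entity $vm -Name "disk.scsiUnmapAllowed" -Value "FALSE" -Confirm:$false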

Implementation

#StorageMinute: vSAN Space Reclamation

FIGURE 7-1: Viewing the size of a virtual disk within the vSAN datastore view of vCenter

Validation

After making the change, reboot a VM and manually trigger space reclamation. Monitor the backend UNMAP throughput and verify that the total free capacity in the cluster increases.

FIGURE 7-2: Viewing TRIM/UNMAP throughput on the host-level vSAN performance metrics

 

Potential Tuning of Workloads After Migration to vSAN

In production environments, it is not uncommon to tune VMs to improve the efficiency or performance of the guest OS or applications running in the VM. Tuning generally comes in two forms:

  • VM tuning—Achieved by adjusting the VM’s virtual hardware settings, or vSAN storage policies.
  • OS/application tuning—Achieved by adjusting OS or application-specific settings inside the guest VM.

The following provides details on the tuning options available, and general recommendations on how and when to make adjustments.

VM Tuning

VM tuning is common in traditional three-tier architectures as well as vSAN. Ensuring sufficient but properly sized virtual resources of compute, memory, and storage has always been important. Additionally, vSAN provides the ability to tune storage performance and availability settings per VM or VMDK through the use of storage policies. VM tuning that is non-vSAN-specific includes, but is not limited to:

  • Virtual CPU
  • Amount of virtual memory
  • Virtual disks
  • Type and number of virtual SCSI controllers
  • Type and number of virtual NICs

FIGURE 7-3: Virtual hardware settings of a VM

Determining the optimal allocation of resources involves monitoring the VM’s performance metrics in vCenter, or augmenting this practice with other tools such as vRealize Operations to determine if there are any identified optimizations for VMs.

Recommendation: For VMs using more than one VMDK, use multiple virtual SCSI adapters. This provides improved parallelism and can achieve better performance. It also allows one to easily use the much more efficient and better performing Paravirtual SCSI controllers on these additional VMDKs assigned to a VM. See Page 38 of the “Troubleshooting vSAN Performance” document for more information.
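A minimal PowerCLI sketch of that recommendation, assuming a hypothetical VM named sql-vm-01; it adds a new VMDK and places it on its own Paravirtual SCSI controller (the guest OS must have the PVSCSI driver available, which is included with VMware Tools).

# Add a 200 GB VMDK and attach it to a new Paravirtual SCSI controller
$vm   = Get-VM -Name "sql-vm-01"
$disk = New-HardDisk -VM $vm -CapacityGB 200
New-ScsiController -HardDisk $disk -Type ParaVirtual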

VM tuning through the use of storage policies that are specific to vSAN performance and availability would include:

  • Level of FTT
  • Data placement scheme used (RAID-1 or RAID-5/6)
  • Number of disk stripes per object (AKA “stripe width”)
  • IOPS limit for object

FIGURE 7-4: Defining the FTT level in a vSAN storage policy

Since all data lives on vSAN as objects, and all objects must have an assigned storage policy, vSAN provides a “vSAN Default Storage Policy” on a vSAN cluster as a way to automatically associate data with a set of rules defining availability and space efficiency. This is set to a default level of FTT=1, using RAID-1 mirroring, and offers the most basic form of resilience for all types of cluster sizes. In practice, an environment should use several storage policies that define different levels of outcomes, and apply them to the VMs as the requirements dictate. See “Operational Approaches of Using SPBM in an Environment ” for more details.
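Storage policies can also be created and assigned with the PowerCLI SPBM cmdlets. The sketch below is illustrative only: the policy and VM names are assumptions, and the capability identifiers shown (VSAN.hostFailuresToTolerate, VSAN.stripeWidth, VSAN.iopsLimit) should be confirmed with Get-SpbmCapability in your own environment.

# Build a rule set: FTT=1, stripe width of 1, and a 1,000 IOPS limit per object
$ruleSet = New-SpbmRuleSet -AllOfRules @(
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1),
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.stripeWidth") -Value 1),
    (New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.iopsLimit") -Value 1000)
)

# Create the policy, then assign it to a VM as requirements dictate
$policy = New-SpbmStoragePolicy -Name "FTT1-SW1-IOPS1000" -AnyOfRuleSets $ruleSet
$vm = Get-VM -Name "app-vm-01"
Set-SpbmEntityConfiguration -Configuration (Get-SpbmEntityConfiguration -VM $vm) -StoragePolicy $policy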

Determining the appropriate level of resilience and space efficiency needed for a given workload is important, as these factors can affect results. Setting a higher level of resilience or a more space-efficient data placement method may reduce the level of performance the environment delivers to the VM. This trade-off is from the effects of I/O amplification and other overhead, described more in “Troubleshooting vSAN Performance.”

The recommendations for storage policy settings may differ based on your environment. For example, let us compare a vSAN cluster in a private cloud versus vSAN running on VMC on AWS.

  • Private cloud—The standard hardware specification for your hosts and network in a private cloud may be modest. The specification may not be able to meet performance expectations if one were to use the more space efficient but more taxing RAID-5/6 erasure coding. In those cases, it would be best to use a RAID-1 mirror as the default, and look for opportunities to apply RAID-5/6 case by case.
  • VMC on AWS—The standard hardware specification for this environment is high, but consuming large amounts of capacity can be cost-prohibitive. It may make more sense to always start by using VM storage policies that use the more space-efficient RAID-5/6 erasure coding over RAID-1. Then, apply RAID-1 to discrete systems where RAID-5/6 is not meeting the performance requirements. Other storage policy rules that can impact performance and resilience settings are “Number of disk stripes per object” (otherwise known as “stripe width”) and “IOPS limit for object.” More details on these storage policy rules can be found in the “Storage Policy Operations” section of this document.

OS/application tuning

OS/application tuning is generally performed to help the OS or application optimize its behavior to the existing environment and applications. Often you may find this tuning in deployment guides by an application manufacturer, or in a reference architecture. Note: Sometimes, if the recommendations come from a manufacturer, they may not take a virtualized OS or application into account and may have wildly optimistic recommendations.


Recommendation: Avoid over-ambitious OS/application tuning unless explicitly defined by a software manufacturer, or as outlined in a specific reference architecture. Making OS and application adjustments in a non-prescriptive way may add unnecessary complexity and lead to undesirable results. If there are optimizations to make in the OS and application, make the adjustments one at a time and with care. Once the optimizations are made, document their settings for future reference.

Summary

VM tuning, as well as OS/application tuning can sometimes stem from identified bottlenecks. The “Troubleshooting vSAN Performance” document on core.vmware.com provides details on how to isolate the largest contributors to an identified performance issue, and the recommended approach for remediation. This section details specific VM related optimizations that may be suitable for your environment.

Section 8: Data Services

DD&C: Enabling on a New Cluster

When designing a vSAN cluster, it is worth considering from the beginning if you will be using DD&C on the cluster. Enabling DD&C retrospectively is costly from an I/O perspective, as each bit needs to be read from disk, compressed, deduplicated, and written to disk again.

There are also a few design considerations that come into play when dealing with DD&C. As an example, DD&C is only supported on all-flash configurations. More details are available here: “Deduplication and Compression Design Considerations.” Additionally, all the objects provisioned to vSAN (VMs, disks, etc.) need to be thin provisioned, with their OSR in the SPBM policy set to 0%.

vSAN aligns its dedupe domain with a disk group. This means duplicate data must reside within the same disk group to be deduplicated. There are a number of reasons for this, but of utmost importance to vSAN is data integrity and redundancy. If for some reason the deduplication hash table becomes corrupted, it only affects a single copy of the data. (By default, vSAN data is mirrored across different disk groups, each viewed as its own failure domain.) So, deduplicating data this way means no data loss from hash table corruption. Enabling DD&C is easy and is documented in “Enable Deduplication and Compression on a New vSAN Cluster” as well as in FIGURE 8-1 below. Note that this process has slightly different considerations on an existing cluster, covered later in this section.

FIGURE 8-1: Using the Configure section to enable DD&C prior to enabling vSAN

Data is deduplicated and compressed on destage from the cache to the capacity tier. This avoids spending CPU cycles on data that may be transient, short-lived, or otherwise not efficient to dedupe if it were to just be deleted in short order. As such, DD&C savings are not immediate, but rather climb over time as data gets destaged to the capacity disks. More information on using DD&C with vSAN can be found here: “Using Deduplication and Compression.”
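If you prefer to script the change, recent PowerCLI releases expose the cluster-level vSAN data services through Set-VsanClusterConfiguration. A hedged sketch, assuming a cluster named vSAN-Cluster; verify the parameter name against your installed PowerCLI version.

# Enable deduplication and compression on an all-flash vSAN cluster
Get-Cluster -Name "vSAN-Cluster" |
    Set-VsanClusterConfiguration -SpaceEfficiencyEnabled $true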

Introduced in vSAN 7 U1 is the new "Compression only" space efficiency feature.  This may be a more suitable fit for your environment and is a good starting point if you wish to employ some levels of space efficiency, but are not sure of the effectiveness with deduplication in your environment.  See the post: "Space Efficiency using the New "Compression only" Option in vSAN 7 U1" for more details.

Recommendation: If enabling DD&C, consider upgrading to at least vSAN 6.7 U3, if not vSAN 7 U1. Significant improvements have been introduced in these recent versions to improve the performance of clusters running this space efficiency feature.

DD&C: Enabling on an Existing Cluster

Generally, it is recommended to enable DD&C on a vSAN cluster from the beginning, before workloads are placed on it. That said, it is possible to enable DD&C retrospectively with live workloads on the cluster.

The reason to enable DD&C from the start is that, on a cluster with live workloads, every data bit has to be read from the capacity tier, compressed, deduplicated against all other bits on the disk group, and then re-written to disk. This causes a lot of storage I/O on the backend that wouldn’t exist if enabled from the start. (Though it is mitigated to a large extent by our I/O scheduler.) If you do decide to enable DD&C on an existing cluster, the process is much the same as for a new cluster. See here for details: “Enable Deduplication and Compression on Existing vSAN Cluster.”

As noted in the previous section, the "Compression only" space efficiency option introduced in vSAN 7 U1 may be a more suitable fit if you are unsure of the effectiveness of deduplication in your environment. See the post: "Space Efficiency using the New "Compression only" Option in vSAN 7 U1" for more details.

When enabling DD&C on a cluster that has live workloads, the workloads are unaffected. All VMs stay up throughout the operation. However, due to the large amount of storage I/O that goes into recreating all the disk groups, it is advised to perform this operation during off-hours.

FIGURE 8-2: Enabling the Deduplication and Compression service in a vSAN cluster

It is also important to note that data is deduplicated and compressed upon destage from the cache to the capacity tier. This avoids spending CPU cycles on data that may be transient, short-lived, or otherwise inefficient to dedupe (if it were to just be deleted in short order, for example). As a result, DD&C savings are not immediate, but climb over time as data gets destaged to the capacity disks.

Recommendation: Unless your set of workloads can take advantage of deduplication, consider using the new "Compression only" space efficiency option introduced in vSAN 7 U1 instead of DD&C. The compression-only option is a more efficient and thus higher-performing space efficiency option.

DD&C: Disabling on an Existing Cluster

Disabling DD&C on an existing vSAN cluster requires careful consideration and planning. Disabling this space-saving technique will increase the total capacity used on your cluster by the amount shown in the Capacity UI. Ensure you have adequate space on the vSAN cluster to account for this space increase to avoid full-cluster scenarios. For example, if the vSAN Capacity view tells you the “Capacity needed if disabled” is 7.5TB, at least that amount needs to be available on your cluster. You also want to account for free space to allow for evacuations, resynchronizations, and some room for data growth on the cluster. 
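A quick, hedged way to sanity-check free capacity from PowerCLI before disabling the service is shown below; the cluster name is illustrative, and property names may vary slightly by PowerCLI version.

# Compare current free space against the "Capacity needed if disabled" value shown in the UI
Get-VsanSpaceUsage -Cluster "vSAN-Cluster" |
    Select-Object Cluster, CapacityGB, FreeSpaceGB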

Another thing to be aware of when disabling DD&C on a cluster is the backend storage I/O generated by the operation. Each disk group is destroyed and recreated one by one; data is read, rehydrated (i.e., decompressed and re-duplicated), and written to the new disk group.

All VMs remain up during this operation. Because of the large amount of storage I/O that results (the higher your DD&C ratio, the more data needs to be written back to the disks), it is advised that this operation is performed during off-hours. Information on disabling DD&C can be found here: “Disable Deduplication and Compression.” For a visual representation of disabling DD&C on an existing cluster, see FIGURE 8-3.

FIGURE 8-3: Enabling or disabling a vSAN service

DD&C: Review History of Cluster DD&C Savings

You can review the historical DD&C rates, as well as actual storage consumed and saved, in two ways: one is with vRealize Operations, and the other is right within the product itself. In vRealize Operations, you will find the bundled “vSAN operations” dashboard. This lists a number of metrics and trends, some of those being DD&C-related. You can view DD&C ratio, DD&C savings, and actual storage consumed as well as trends to predict time until cluster full and other useful metrics. See FIGURE 8-4 for an example.

FIGURE 8-4: Using vRealize Operations to view DD&C ratios over a long period of time

If you don’t have vRealize Operations, or you simply want to view some of this information in vCenter, navigate to the vSAN cluster in question and click Monitor → Capacity, then click the capacity history tab. This brings you to a UI that lets you change and view a number of things. The default is to look at the previous day’s capacity and DD&C usage and trends, but this can be customized. You have two options to view the historical DD&C ratios and space savings here. One is the default, to use the current day as a reference and view the previous X days—where you define the number of days in the UI. The other option is to click the drop-down and choose Custom.

From here you can choose the reference date and the time period. For example, if you want to view the 30 days from 31 March, you would simply choose 31 March as your reference date and insert 30 as the number of days of history you want to view. A full example of using the capacity history tab on a vSAN cluster can be seen in FIGURE 8-5.

FIGURE 8-5: Viewing the capacity utilization history of a vSAN cluster

D@RE: Enabling on a New Cluster

Enabling data at rest encryption (D@RE) on a vSAN cluster is relatively easy. When vSAN encryption is enabled, any new data written to the cluster is encrypted. Enabling vSAN encryption performs a rolling reformat that copies data to available capacity on another node, removes the now-empty disk group, and then encrypts each device in a newly recreated disk group.

While this process is relatively easy to accomplish, some requirements and considerations must be taken into account.

Requirements to use vSAN encryption include:

  • Licensing—When intending to use vSAN encryption, use vSAN Enterprise Edition. vSAN Enterprise is available in configurations based on per-CPU, per-virtual desktop, or per-ROBO (25-pack) licensing.
  • Key management—When used with vSAN Encryption, any KMIP 1.1-compliant key manager will work and be supported by VMware GSS. There are several key management servers (KMSs) that have been validated by VMware along with their respective vendors. These validated solutions have additional testing and workflows to help with the setup and troubleshooting process that nonvalidated solutions do not have. A list of currently supported KMSs can be found on the VMware Compatibility Guide. A key encryption key (KEK) and a host key are provided to vCenter and each vSAN node. The KEK is used to wrap and unwrap the data encryption key (DEK) on each storage device. The host key is used to encrypt logs on each ESXi host.
  • Connectivity—It is important to understand connectivity when using vSAN encryption. While a KMS “profile” is created in vCenter, each vSAN host must have its own connectivity to the KMS because hosts connect directly to the KMS using the client configuration created in vCenter. vSAN encryption was designed to work this way to ensure that hosts can boot and provide data access in the event that vCenter is not available.

Enabling vSAN encryption has some settings to be familiar with. It is important to understand how each of these contributes to the enabling process.

  • Erase disks before use—When vSAN encryption is configured, new data is encrypted. Residual data is not encrypted. The rolling reformat mentioned above moves (copies) data to available capacity on another node’s disk group(s). While data is no longer referenced on the now-evacuated disk group, it is not overwritten. This checkbox ensures that data is overwritten, preventing the possibility of using disk tools or other forensic devices to potentially recover unencrypted data. The process of enabling vSAN encryption is significantly longer when selecting this checkbox due to each device being written to.
  • KMS cluster—Selecting the KMS cluster is only possible if the KMS profile has already been created in vCenter. The process for adding a KMS profile can be found on core.vmware.com.
  • Allow reduced redundancy—Consider the rolling reformat process for vSAN encryption (as well as DD&C). Data is moved (copied) from a disk group, the disk group is removed and recreated, and then data may or may not be moved back, depending on parameters such as storage policy rules and availability or capacity.

FIGURE 8-6: Configuring D@RE on a vSAN cluster

vSAN attempts to keep items in compliance with their storage policy. For example, when vSAN mirrors components, those mirrored components must be on separate nodes. In a 2- or 3-node vSAN cluster, where components are already stored in three different locations, the rolling reformat process of enabling or disabling vSAN encryption has nowhere to put data when a disk group is removed and recreated. This setting allows for vSAN to violate the storage policy compliance to perform the rolling reformat. It is important to consider that data redundancy reduces until the process is complete, and all data has been resynchronized.
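Once the KMS profile exists in vCenter, encryption can also be enabled with the same cluster-configuration cmdlet used for other vSAN data services. This is a sketch under stated assumptions: the cluster and KMS names are illustrative, and the parameter names reflect current PowerCLI vSAN cmdlets, so confirm them against your installed version.

# Reference the KMS profile already registered in vCenter
$kms = Get-KmsCluster -Name "kms-cluster-01"

# Enable data at rest encryption, erase disks before use, and allow reduced redundancy
Get-Cluster -Name "vSAN-Cluster" |
    Set-VsanClusterConfiguration -EncryptionEnabled $true -KmsCluster $kms `
        -EraseDisksBeforeUse $true -AllowReducedRedundancy $true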

Recommendations:  Use “Erase disks before use” on clusters that have pre-existing data. This takes significantly longer but ensures no residual data. Use “Allow Reduced Redundancy” in general. This is a required setting for 2- or 3-node clusters, and it allows the process to complete when storage policies may prevent completion.  And finally, enable vSAN encryption at the same time as enabling DD&C. This is to prevent having to perform the rolling disk group reformat multiple times.

Summary

Data at rest encryption gives tremendous flexibility to encrypt all data in a vSAN cluster. Thanks to the architecture of vSAN, this decision can be made on a per-cluster basis. Administrators can tailor this capability to best align with the requirements of the organization.

D@RE: Performing a Shallow Rekey

Key rotation is a strategy often used to prevent long-term use of the same encryption keys. When encryption keys are not rotated on a defined interval, it can be difficult to determine their trustworthiness. Consider the following situation:

  • A contractor sets up and configures an encrypted cluster.
  • The contractor backs up the encryption keys.
  • At a later date, the contractor replaces a potentially failed storage device.

If the encryption keys have not been changed (or rotated), the contractor could possibly decrypt and recover data from the suspected failed storage device. VMware recommends a key rotation strategy that aligns with our customers’ typical security practices.

The two keys used in vSAN encryption include the KEK and the DEK. The KEK is used to encrypt the DEK. Rotating the KEK is quick and easy, without any requirement for data movement. This is referred to as a shallow rekey.

FIGURE 8-7: Performing a shallow rekey for a vSAN Cluster

Clicking “Generate New Encryption Keys” in the vSAN configuration UI, followed by clicking “Generate,” performs a shallow rekey. A request for a new KEK will be generated and sent to the KMS that the cluster is using. Each DEK will be rewrapped using the new KEK.

Shallow rekey operations can also be scripted, and possibly automated using an API call or PowerCLI script. VMware {code} has an example of a PowerCLI script.

Find more information about vSAN encryption on core.vmware.com.

Recommendation: Implement a KEK rotation strategy that aligns with organizational security and compliance requirements.

D@RE: Performing a Deep Rekey

Key rotation is a strategy often used to prevent long-term use of the same encryption keys. When encryption keys are not rotated on a defined interval, it can be difficult to determine their trustworthiness.  Consider the following situation:

  • A contractor has physical access to an encrypted cluster.
  • The contractor removes a physical device in an attempt to get data.
  • A deep rekey is performed before the contractor replaces the drive.
  • The drive with the incorrect encryption sequence throws an error.

If the encryption keys have not been changed (or rotated), the contractor could return the drive without being detected. A deep rekey updates the KEK and DEK, ensuring both keys used to encrypt and secure data have been changed. vSAN encryption also assigns a DEK generation ID, which ensures that all encrypted storage devices in the cluster have been rekeyed at the same time.

VMware recommends a key rotation strategy that aligns with our customers’ typical security practices. Rotating the DEK is easy, just like rotating the KEK. This is a more time-consuming process, though, as data is moved off devices as they receive a new DEK. This is referred to as a deep rekey.

FIGURE 8-8: Performing a deep rekey for a vSAN Cluster

Clicking “Generate New Encryption Keys” in the vSAN configuration UI and selecting “Also re-encrypt all data on the storage using the new keys,” followed by clicking “Generate” performs a deep rekey. Just as the process of enabling or disabling vSAN encryption, the “Allow Reduced Redundancy” option should be used for 2- or 3-node clusters.

A request for a new KEK is generated and sent to the KMS the cluster is using. Each disk group on a vSAN node evacuates to an alternate storage location (unless using reduced redundancy). When no data resides on the disk group, it will be removed and recreated using the new KEK, with each device receiving a new DEK. As this process cycles through the cluster, some data may be returned to the newly recreated disk group(s).

The UI provides warning that performance will be decreased. This is a result of the resynchronizations that have to occur when evacuating a disk group. Performance impact may not be significant, depending on the cluster configuration, the amount of data, and the workload type. Deep rekey operations can also be scripted, and possibly automated using an API call or PowerCLI script. VMware {code} has an example of a PowerCLI script.

Find more information about vSAN encryption on core.vmware.com.

Recommendations: Implement a DEK rotation strategy that aligns with organizational security and compliance requirements. Be sure to take into account that a deep rekey process requires a rolling reformat.  Finally, use “Allow Reduced Redundancy” in general. This is required for 2- or 3-node clusters, and it allows the process to complete when storage policies may prevent completion.

D@RE: Using D@RE and VM Encryption Together

vSphere provides FIPS 140-2 validated data-at-rest encryption when using per-VM encryption or vSAN datastore level encryption. These features are software-based, with the task of encryption being performed by the CPU. Most server CPUs released in the last decade include the AES-NI instruction set to minimize the additional overhead. More information on AES-NI can be found on Wikipedia.

These two features provide encryption at different points in the stack, and there are different pros and cons to using each. Detailed differences and similarities can be found in the Encryption FAQ. With VM encryption occurring at the VM level, and vSAN encryption occurring at a datastore level, enabling both results in encrypting and decrypting a VM twice. Having encryption performed multiple times is typically not desirable.

The vSAN health check reports when a VM has an encryption policy (for VM encryption) and also resides on an encrypted vSAN cluster. This alert is only cautionary, and both may be used if so desired.

FIGURE 8-9: The vSAN health check reporting the use of multiple encryption types used together

“Understanding vSAN Datastore Encryption vs. VMcrypt Encryption” provides additional detail, including performance and space efficiency considerations, as well as recommendations specific to which is the most desirable per use case. The above alert is common when migrating a VM encrypted by VM encryption to a vSAN datastore. It is typically recommended to disable VM encryption for the VM if it is to reside on an encrypted vSAN cluster. The VM must be powered off to remove VM encryption. Customers who want the VM to remain encrypted at all times will typically remove VM encryption only after it has been moved to an encrypted vSAN datastore.

Recommendation: With encryption being performed at multiple levels, only enable VM encryption on VMs residing on an encrypted vSAN cluster when there is an explicit requirement for it, such as while migrating an encrypted VM to a vSAN cluster, or before moving a non-encrypted VM from a vSAN cluster.

D@RE: Disabling on an Existing Cluster

vSAN data at rest encryption (D@RE) is a cluster-based data service, similar to deduplication and compression in that it is either enabled or disabled for the entire vSAN cluster. Disabling vSAN encryption on a vSAN cluster is as easy as enabling it. When vSAN encryption is disabled, a rolling reformat process occurs again, copying data to available capacity on another node, removing the now-empty disk group, and recreating the disk group without encryption. There is no compromise to the data during the process, as all evacuation that occurs on a disk group occurs within the cluster. In other words, there is no need to use swing storage for this process.

Recommendation: Evaluate your overall business requirements when making the decision to enable or disable a cluster-based service like D@RE. This will help reduce unnecessary cluster conversions.

Disabling vSAN Encryption

The process of disabling vSAN Encryption is the opposite of enabling vSAN Encryption, as shown in Figure 8-10.

FIGURE 8-10: Disabling vSAN data at rest encryption

As the disabling of the service occurs, data is moved (copied) from a disk group to another destination. The disk group is removed and recreated, and is ready to accept data in an unencrypted format. vSAN attempts to keep items in compliance with their storage policy. The nature of any type of rolling evacuation means that there is a significant amount of data that will need to move in order to enable or disable the service. Be mindful of this in any operational planning.

Recommendation: If you also plan to enable deduplication and compression, disable encryption at the same time. This prevents having to perform the rolling disk group reformat multiple times.

Some cluster configurations may have limited abilities to move data elsewhere while still maintaining full compliance with the storage policy. This is where the "Allow Reduced Redundancy" option can be helpful. It is an optional checkbox that appears when enabling or disabling any cluster-level data service that requires a rolling reformat. A good example of using this feature is a 2- or 3-node cluster, where there are inherently insufficient hosts to maintain full policy resilience during the transition. vSAN will ensure that the data maintains full availability, but at a reduced level of redundancy until the process is complete. Once complete, the data will regain the full resilience prescribed by the storage policy.

Recommendation: Use "Allow Reduced Redundancy" in general. While this is a required setting for 2- or 3-node clusters, it will allow the process to complete when storage policies may otherwise prevent completion.

Summary

Disabling data services in vSAN is as easy and as transparent as enabling them. In the case of data at rest encryption (and deduplication & compression), vSAN will need to perform a rolling reformat of the devices, a task that is automated, but does require significant levels of data movement.

iSCSI: Identification and Management of iSCSI Objects in vSAN Cluster

vSAN iSCSI objects can appear different than other vSAN objects in vSAN reports—usually reporting as “unassociated” because they aren’t mounted directly into a VM as a VMDK—but rather via a VM’s guest OS iSCSI initiator, into which vSAN has no visibility. If you use the vCenter RVC to query vSAN for certain operations, be aware that iSCSI LUN objects as well as vSAN performance service objects (both of which are not directly mounted into VMs) will be listed as “unassociated”—this does NOT mean they are unused or safe to be deleted.

So, how can you tell if objects are in use by the performance service or iSCSI? After logging in to the vCenter Server, iSCSI objects or performance management objects may be listed as unassociated when querying with the RVC command vsan.obj_status_report.

These objects are not associated with a VM, but they may be valid vSAN iSCSI objects on the vSAN datastore and should not be deleted. If the intention is to delete some other unassociated objects and save space, please contact the VMware GSS team for assistance. The following shows how to identify unassociated objects as vSAN iSCSI objects and verify from the vSphere web client.

Log in to vCenter via SSH and launch RVC, then navigate to the cluster:

root@sc-rdops-vm03-dhcp-93-66 [ ~ ]# rvc administrator@vsphere.local@localhost
Welcome to RVC. Try the ‘help’ command.
0 /
1 localhost/
> cd localhost/

Run the vSAN object status report in RVC:

/localhost> vsan.obj_status_report -t /localhost/VSAN-DC/computers/VSAN-Cluster/…

Histogram of component health for non-orphaned objects
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 3/3 (OK)                            | 10                           |
+-------------------------------------+------------------------------+
Total non-orphans: 10

Histogram of component health for possibly orphaned objects
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
+-------------------------------------+------------------------------+
Total orphans: 0
Total v9 objects: 10

+-----------------------------------------+---------+---------------------------+
| VM/Object                               | objects | num healthy / total comps |
+-----------------------------------------+---------+---------------------------+
| Unassociated objects                    |         |                           |
|    a29bad5c-1679-117e-6bee-02004504a3e7 |         | 3/3                       |
|    ce9fad5c-f7ff-9927-9f58-02004583eb69 |         | 3/3                       |
|    a39cad5c-008a-7b61-a630-02004583eb69 |         | 3/3                       |
|    d49fad5c-bace-8ba3-9c7a-02004583eb69 |         | 3/3                       |
|    d09fad5c-1650-1caa-d0f1-02004583eb69 |         | 3/3                       |
|    66bcad5c-a7b5-1ef9-0999-02004504a3e7 |         | 3/3                       |
|    169cad5c-6676-063b-f29e-020045bf20e0 |         | 3/3                       |
|    f39bad5c-5546-ff8d-14e1-020045bf20e0 |         | 3/3                       |
|    199cad5c-e22d-32d7-aede-020045bf20e0 |         | 3/3                       |
|    1d9cad5c-7202-90f4-0fbf-020045bf20e0 |         | 3/3                       |
+-----------------------------------------+---------+---------------------------+

Cross-reference the UUIDs in the “Unassociated objects” list from RVC with the vSAN iSCSI objects, the “iSCSI home object,” and the “performance management object” shown in the vSphere web client under vSAN Cluster → Monitor → vSAN → Virtual Objects, using the “vSAN UUID” column. If a UUID appears in both lists, the object is NOT safe to remove.

FIGURE 8-11: Enumerated objects, related storage policies, and vSAN UUIDs

Again, if in any doubt, please contact the VMware GSS team for assistance.

Section 9: Stretched Clusters

Convert Standard Cluster to Stretched Cluster

vSAN stretched clusters can be initially configured as stretched, or they can be converted from an existing standard vSAN cluster.

Pre-conversion tasks

  • vSAN Witness Host.  A vSAN Witness Host must be added to vCenter before attempting to convert a vSAN cluster to a vSAN stretched cluster. This can be a physical host or a vSAN Witness Appliance.
  • Configure networking.  Networking requirements for a vSAN stretched cluster must be in place as well. This could include creating static routes as well as configuring Witness Traffic Separation (WTS).
  • Host and policy considerations. When converting to a vSAN stretched cluster, the primary protection of data is across sites using mirroring. vSAN has the ability to protect data within a site, using a secondary protection rule. To meet the requirements of the secondary protection rule, each site must have enough hosts to satisfy the desired secondary protection rule. When choosing host counts that will be configured for each fault domain, consider that vSAN stretched clusters most often mirror data across sites. This is one of the primary benefits of a vSAN stretched cluster: a full copy of data in each site. When converting a standard vSAN cluster to a vSAN stretched cluster, understand that more hosts may be added to continue to use the same storage policies used in the standard vSAN cluster. When the standard vSAN cluster does not have enough hosts to provide the same per-site policy, different storage policies may need to be chosen.

Example: Consider the desire to convert an all-flash 6-node vSAN cluster using an erasure coding storage policy. A regular 6-node all-flash vSAN cluster can support either RAID-5 or RAID-6 storage policies. By splitting this cluster in two when converting it to a vSAN stretched cluster, the (now stretched) cluster cannot satisfy the requirements of a RAID-5 or RAID-6 storage policy. This is because each fault domain only has three member hosts. Mirroring is the only data placement scheme that can be satisfied. Adding an additional host to each fault domain would allow for a RAID-5 secondary rule and erasure coding. Adding three additional hosts to each site would meet the minimum requirements for using RAID-6 as a secondary level of protection in each site. While it isn’t necessary for vSAN stretched clusters to have the same number of hosts in each fault domain, or site, they typically do. Situations where a site locality rule is used could alter the typical symmetrical vSAN stretched cluster configuration.

Conversion

When hosts have been designated for each fault domain, the vSAN Witness Host has been configured, and networking has been validated, the vSAN cluster can be converted to a vSAN stretched cluster. The process for converting a vSAN cluster to a vSAN stretched cluster can be found on core.vmware.com. After converting the vSAN cluster to a vSAN stretched cluster, configure HA, DRS, and VM/host groups accordingly.

  • HA settings
  • DRS settings
  • VM/host groups

Recommendations: Be sure to run through the pre-conversion tasks, as well as deploying a vSAN Witness Host beforehand. Ensure the network is properly configured for vSAN stretched clusters. Determine the storage policy capabilities of the new stretched configuration.
 

Convert Stretched Cluster to Standard Cluster

Converting a vSAN stretched cluster back to a standard vSAN cluster is essentially the reverse of converting a standard vSAN cluster to a stretched cluster. Before performing the conversion process, the most important items to consider are:

  • Hosts only reside in two fault domains, but these could be spread across geographically separated locations.
  • A vSAN Witness Host is participating in the vSAN cluster.
  • Per-site DRS affinity rules are likely in place to keep workloads on one fault domain or the other.
  • vSAN stretched cluster-centric networking is in place.

Workloads should not be running during this conversion process.

If the vSAN stretched cluster is located in a single facility—such as each fault domain in a different room or across racks—this process can be completed easily. If the vSAN stretched cluster fault domains are located in geographically separate locations, hosts in the secondary fault domain need to be relocated to the same location as the hosts in the preferred fault domain, which makes the process a bit more complex.

The basic process can be found on the VMware Docs site: “Convert a Stretched Cluster to a Standard vSAN Cluster.” The basic process addresses the removal of the vSAN Witness Host and fault domains. VM/host affinity rules should be removed from the vSphere cluster UI. Any static routing configured to communicate with the vSAN Witness Host should be removed.

Recommendations: Use the vSAN health UI to repair any vSAN objects. Also be sure to remove any static routing configured to communicate with the vSAN Witness Host.

Replacing Witness Host Appliance on Stretched Cluster

vSAN storage policies with FTT=1 using mirroring require three available nodes or fault domains to meet storage policy compliance. This means that 2-node vSAN clusters require a third host to participate in the cluster. A vSAN Witness Host is used to provide quorum.

The vSAN Witness Host stores Witness components for each vSAN object. These Witness components are used as a tiebreaker to assist with ensuring accessibility to a vSAN object, such as a VMDK, VSWP, or namespace, in the event of a failure or network partition. One node in the preferred site of a stretched vSAN cluster is designated as the master node. A second node, residing in the non-preferred site, is designated as the backup node. When all the nodes in one site or the other cannot communicate with the hosts in the opposite site, that site is “partitioned” from the other.

Preferred and non-preferred sites isolated from each other: If both preferred and non-preferred sites can communicate with the vSAN Witness Host, the cluster will continue to operate VMs that have cross-site storage policies in the preferred site. vSAN objects that have a Site Affinity policy only will continue to operate in the isolated site. This is because Site Affinity policies have no components residing in the preferred or Witness Host sites.

Preferred site failure or complete isolation: If the preferred site is completely isolated from the non-preferred site and the vSAN Witness Host, the backup node in the non-preferred site becomes the master along with the vSAN Witness Host. Note that either site in a vSAN stretched cluster may be designated “preferred,” and this designation may be changed when required.

The vSAN Witness Host can be a physical ESXi host or a VMware-provided virtual appliance, which is referred to as a vSAN Witness Appliance. Replacing a vSAN Witness Host is not a typical or regular task, but one that could be required in some of the following scenarios:

  • Physical host is being decommissioned
    • Due to ESXi CPU requirements (e.g., upgrading from vSAN 6.6 to 6.7+)
    • Host is at the end of a lease lifecycle
  • vSAN Witness Appliance
    • Has been deleted
    • Has been corrupted
    • Does not meet the vSAN Witness Component count requirement in cases where a cluster has grown in component count.

Replacing a vSAN Witness Host is relatively simple when a suitable vSAN Witness Host has been configured. The fault domains section in the vSAN configuration UI provides a quick and easy mechanism to swap the currently configured vSAN Witness Host with an alternate host for this purpose.

FIGURE 9-1: Viewing the fault domains created as a result of a configured vSAN stretched cluster

More detailed information can be found in the vSAN Stretched Cluster Guide here: “Replacing a Failed Witness Host.”

Configure a Stretched Cluster

When deploying a vSAN stretched cluster, it is important to understand that it is a bit different than a traditional vSAN cluster.

Overview of stretched clusters

A stretched cluster architecture includes one or more nodes in two separate fault domains for availability and a third node, called the vSAN Witness Host, to assist in the event the fault domains are isolated from each other.
Traditional vSAN clusters are typically in a single location. Stretched cluster resources are typically located in two distinct locations, with the tiebreaker node in a third location. Because hosts are in different sites, a few items require additional consideration when compared to traditional vSAN clusters:

  • The vSAN Witness Host
  • Network bandwidth and topology
  • VM deployment and placement
  • Storage policy requirements

vSAN Witness Host

The vSAN Witness Host can be physical or the VMware-provided vSAN Witness Appliance. This “host” must have proper connectivity to vCenter and to each host in the vSAN cluster. Communication with the vSAN Witness Host is always unicast and requires additional ports to be open between it and the vSAN cluster. vSAN Witness Host deployment and configuration can be found on core.vmware.com: “Using a vSAN Witness Appliance.”

Network bandwidth

Inter-site bandwidth must be sized to allow proper bandwidth for the expected workload. By default, data is only written across fault domains in a vSAN stretched cluster. The write I/O profile should determine the required inter-site bandwidth. VMware recommends the inter-site bandwidth to be 1.75x the write bandwidth. This takes data, metadata, and resynchronization bandwidth requirements into account. More detail can be found on core.vmware.com as it relates to inter-site bandwidth: “Bandwidth Calculation.”
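As a worked example of the 1.75x guidance (treated here as an assumed 1.4x data multiplier combined with a 1.25x resynchronization multiplier, consistent with the referenced Bandwidth Calculation guidance), a workload generating 10,000 write IOPS at a 4KB I/O size sizes out as follows; the figures are illustrative only.

# 10,000 write IOPS x 4 KB x 8 bits = roughly 320 Mbps of write traffic
$writeIops    = 10000
$ioSizeKB     = 4
$writeMbps    = ($writeIops * $ioSizeKB * 8) / 1000   # ~320 Mbps
# Apply the 1.4 data and 1.25 resync multipliers (1.75x overall)
$requiredMbps = $writeMbps * 1.4 * 1.25               # ~560 Mbps of inter-site bandwidth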

Topology

While Layer 2 networking is often used for inter-node vSAN communication, Layer 3 is often used to address the vSAN Witness Host that typically resides in a different location. Traditional vSAN stretched clusters required additional, often complex routing to allow the vSAN nodes and vSAN Witness Host to communicate. Each vSAN interface needed the same MTU setting as every other vSAN interface. Metadata, such as vSAN component placement and fault domain participation, are shared to and from the vSAN Witness Host and vSAN data nodes.

In vSAN 6.7 the WTS feature was introduced for vSAN stretched clusters. This feature directs the communication with the vSAN Witness Host to a different VMkernel port on a vSAN data node, simplifying overall network configuration. vSAN 6.7 Update 1 introduced support for different MTU settings for the data and metadata networks. This benefits customers wishing to deploy a vSAN Witness Host over a slower network while still using jumbo frames for inter-node communication.

VM deployment and placement

Because a vSAN stretched cluster appears to vCenter as a single cluster, no specific intelligence will deploy or keep a VM in one site or another. After deploying a VM on a vSAN stretched cluster, it is important to associate the VM with the site it will normally run in. VM/host groups can be created to have VMs run in one site or the other by using VM/host rules.

FIGURE 9-2: Creating a VM/host rule for use with DRS site affinity settings

Storage policy requirements

In addition to the site-level protection on a per-VM or per-VMDK basis offered by a stretched cluster configuration, vSAN provides the ability, via the assigned storage policy, to prescribe an additional level of local protection. For this additional level of local protection, each site has its own minimum host requirements to meet the desired level of redundancy. This is detailed here: “Per Site Policies.”

Recommendations:  Deploy a vSAN Witness Host/Appliance before attempting to configure the vSAN stretched cluster. Do not configure management and vSAN traffic interfaces on the same network. The exception here is that the management interface can be tagged for vSAN traffic instead of the secondary interface. Choose the proper profile for the number of components expected when deploying a vSAN Witness Appliance. Ensure MTU sizes are uniform across all vSAN interfaces. If using WTS, be certain to configure VMkernel interfaces for Witness traffic to ensure connectivity to the vSAN Witness Host.  And finally, use vmkping to test connectivity when static routes are required.
 

Setting DRS Affinity Settings to Match Site Affinity Settings of Storage Policies

vSAN 6.6 introduced site affinity to allow for data to reside in a single site. The site disaster tolerance rule in the vSphere 6.7 Client, or data locality and primary failures (0) rules in the vSphere 6.5 Web Client, will allow data to reside only on one fault domain or the other.

FIGURE 9-3: Configuring a storage policy rule to be used with site affinity

Using this storage policy configuration pins only the data to the preferred or non-preferred site; it does not pin the VM to either site. VM/host groups as well as VM/host rules must be configured to ensure that the VM runs only on hosts in the same site as the data.

FIGURE 9-4: Configuring a VM/Host rule for use with DRS site affinity settings

If rules are not configured to prevent the VM from moving to the alternate site, the VM could possibly vMotion to the other site, or restart on the other site as a result of an HA event. If the VM runs in the opposite site from its data, it must traverse the inter-site link for all reads and writes, which is not ideal.

Recommendation: Create VM/host groups and VM/host rules with the “Must run on hosts in group” setting.
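A hedged PowerCLI sketch of this configuration is shown below; the cluster, host, and VM names are hypothetical and would need to match your environment. A "ShouldRunOn" rule type can be substituted where a softer preference is desired.

# Create a host group for the preferred site and a VM group for the VMs that should run there.
$cluster    = Get-Cluster -Name "Stretched-Cluster"                 # hypothetical cluster name
$siteAHosts = Get-VMHost -Name "esx0*.sitea.lab.local"              # hypothetical site A hosts
New-DrsClusterGroup -Cluster $cluster -Name "SiteA-Hosts" -VMHost $siteAHosts
New-DrsClusterGroup -Cluster $cluster -Name "SiteA-VMs" -VM (Get-VM -Name "app0*")

# Tie the two groups together with a "Must run on hosts in group" rule.
New-DrsVMHostRule -Cluster $cluster -Name "SiteA-Affinity" -VMGroup "SiteA-VMs" -VMHostGroup "SiteA-Hosts" -Type "MustRunOn"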

For the very latest recommended cluster settings for DRS in a vSAN stretched cluster, refer to Cluster Settings - DRS in the vSAN Stretched Cluster Guide on core.vmware.com.

HA Settings for Stretched Clusters

vSphere HA will restart VMs on alternate hosts in a vSphere cluster when a host is isolated or has failed. HA is used in conjunction with vSAN to ensure that VMs running on partitioned hosts are restarted on hosts actively participating in the cluster.

When configuring HA failures and responses, hosts should be monitored, with VMs being restarted upon a host failure and VMs powered off and restarted when a host is isolated. These settings ensure that VMs are restarted in the event a VM fails or a host is isolated.

FIGURE 9-5: Recommended HA settings for vSAN stretched clusters

Admission control takes available resources into account should one or more hosts fail. In a vSAN stretched cluster, host failover capacity should be set at 50%. The total CPU and memory use for workloads on a vSAN stretched cluster should never be more than a single site can accommodate. Adhering to this guideline ensures sufficient resources are available should a site become isolated.

FIGURE 9-6: Configuring admission control for a vSAN stretched cluster

Heartbeat datastores are not necessary in vSAN and should be disabled.

FIGURE 9-7: Disabling heartbeat datastores with a vSAN stretched cluster

Advanced options should disable using the default isolation address. An isolation address for each fault domain should be configured.

FIGURE 9-8: Configuring site based isolation addresses for a vSAN stretched cluster
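A hedged PowerCLI sketch for setting these advanced options is shown below; the isolation addresses are hypothetical and should be pingable addresses on the vSAN network in each site.

# Disable the default isolation address and define one isolation address per site.
$cluster = Get-Cluster -Name "Stretched-Cluster"     # hypothetical cluster name
New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name "das.usedefaultisolationaddress" -Value "false" -Force -Confirm:$false
New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name "das.isolationaddress0" -Value "192.168.110.1" -Force -Confirm:$false   # site A vSAN gateway (hypothetical)
New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name "das.isolationaddress1" -Value "192.168.120.1" -Force -Confirm:$false   # site B vSAN gateway (hypothetical)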

Recommendations: Host failure response should be set to "Restart VMs," with the response for host isolation set to "Power off and restart VMs."

For the very latest recommended cluster settings for HA in a vSAN stretched cluster, refer to Cluster Settings - vSphere HA in the vSAN Stretched Cluster Guide on core.vmware.com

Creating a Static Route for vSAN in Stretched Cluster Environments

Nodes in a vSAN cluster communicate with each other over a TCP/IP network. vSAN communication uses the default ESXi TCP/IP stack. A default gateway is the address of a designated Layer 3 device used to send TCP/IP traffic to an address outside a Layer 2 network. Standard vSAN cluster configurations often use a Layer 2 network configuration to communicate between nodes. This configuration has no requirement to communicate outside the Layer 2 network. In some situations, where it is required (or desired) to use a Layer 3 network for vSAN communication, static routes are required. The use cases include:

  • Stretched cluster vSAN
  • 2-node vSAN
  • vSAN nodes configured in different Layer 3 networks

In versions prior to vSAN 7 U1, static routes are required to ensure the vSAN-tagged VMkernel interface can properly route Layer 3 traffic. Without static routing in place, the vSAN-tagged VMkernel interface will attempt to use the default gateway for the management VMkernel interface. An example of using static routing can be seen in FIGURE 9-9.

FIGURE 9-9: Defining static routes in a vSAN stretched cluster
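For releases prior to vSAN 7 U1, a hedged PowerCLI sketch for adding such a route on the data nodes might look like the following. The network and gateway addresses are hypothetical, and in practice each site, as well as the vSAN Witness Host itself, needs routes appropriate to its own local gateway.

# Add a static route on each host so that vSAN-tagged traffic destined for the
# witness network uses the vSAN gateway rather than the management default gateway.
$witnessNetwork = "192.168.130.0"     # hypothetical witness vSAN network
$vsanGateway    = "192.168.110.1"     # hypothetical gateway on the local vSAN network
Get-Cluster -Name "Stretched-Cluster" | Get-VMHost | ForEach-Object {
    New-VMHostRoute -VMHost $_ -Destination $witnessNetwork -PrefixLength 24 -Gateway $vsanGateway -Confirm:$false
}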

The vSAN-tagged VMkernel interfaces must communicate with the vSAN Witness Host, which is only accessible over Layer 3. Static routes ensure data can properly flow from the vSAN “backend” to the vSAN Witness Host.  In vSAN 7 U1 and later, static routes are no longer necessary, as a default gateway override can be set for both stretched cluster and 2-node configurations.  This removes the manual, static routing configuration steps performed at the command line through ESXCLI or PowerCLI, and provides an improved level of visibility to the current host setting. 

Recommendation: Upgrade to vSAN 7 U1 to simplify the initial configuration and ongoing operation of stretched cluster environments.

More information specific to vSAN stretched cluster network design considerations can be found in the Stretched Cluster Guide on core.vmware.com.

Non-Disruptive Maintenance of One Site in a Stretched Cluster Environment

A vSAN stretched cluster uses a topology that provides an active-active environment courtesy of two site locations. It uses the vSAN data network to allow VM data to be stored synchronously across two sites, and vMotion to provide the ability for a VM instance to move from one site to the other without disruption. Maintenance on vSAN hosts in a stretched cluster can occur in the same manner as in a standard vSAN cluster: a discrete host is placed into maintenance mode, the maintenance activities are performed, and the host is returned to production after exiting maintenance mode. In a stretched cluster, there may be times when maintenance to an entire site is necessary: perhaps new switchgear is being introduced, or some other maintenance to the inter-site link is planned. If there are sufficient compute and storage capacity resources to accommodate all workloads, a stretched cluster environment can allow for site maintenance to occur on one site without interruption.

Site Maintenance procedure

The maintenance of a single site of a stretched cluster can be performed in a non-disruptive manner. All virtual workloads maintain uptime during the maintenance activity, assuming sufficient resources to do so.

Recommendation: Temporarily shutting down non-critical workloads will further reduce traffic across the participating inter-site links during the maintenance activity.

The guidance for site maintenance of a stretched cluster assumes the following conditions:

  • Sufficient resources to run all workloads in just one site. If this is a concern, some VMs could be temporarily powered down to reduce resource utilization.
  • The vSAN stretched cluster configuration is enabled and functioning without any error conditions.
  • All VMs use a storage policy that protects across sites. For example, the storage policy uses a "site disaster tolerance" setting of "Dual site mirroring (stretched cluster)" with FTT=1.
  • vMotion between sites is operating correctly.
  • The appropriate DRS host groups for each site are configured and contain the correct hosts in each site.
  • The appropriate DRS VM groups for each site are configured and contain the correct VMs to be balanced across sites.
  • DRS site affinity "should run" rules are in place to define the preferred location that the VMs should run in during a non-failure scenario.

Assuming the conditions described above, the general guidance for providing site maintenance is as follows. Note that any differences in the assumptions described may alter the steps accordingly.

  1. Document existing host groups, VM groups, and VM/Host rules in DRS.
  2. Update the DRS site affinity rules for all VM groups defined in DRS. Ensure that (a) they are changed from "should run" to "must run" and (b) the respective host group in each VM/Host rule is set to the site that will remain up during the maintenance. DRS migrations will begin shortly after these changes are made.
  3. Allow the DRS migrations of the workloads to complete, and confirm that all of the workloads are running in the correct site and are still compliant with their intended storage policy.
  4. In the site selected for maintenance, begin to place those hosts into maintenance mode. Either "Ensure Accessibility" or "No data migration" options can be chosen, as in this scenario they will behave the same if all data is in fact mirrored across sites.
  5. Perform the desired maintenance at the site.
  6. Once complete, take all hosts out of maintenance mode.
  7. Wait for resynchronizations across sites to begin and complete before proceeding to the next step. This helps minimize inefficient data paths caused by VMs moving back to their respective sites before data is fully resynchronized.
  8. Once resynchronizations are complete, change the settings that were modified in step #2 back to their original settings. VMs will migrate back to their respective sites based on DRS timing schedules. Allow for DRS migrations to complete and verify the effective result matches the intentions.

Recommendation: Double-check which hosts participate in each site of the stretched cluster. It can often be easy to accidentally select the wrong hosts when entering an entire site into maintenance mode.
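As a hedged example, PowerCLI can use the DRS host group of the site under maintenance to select exactly the right hosts; the cluster and group names are hypothetical.

# Place every host in the "SiteB-Hosts" DRS group into maintenance mode, keeping
# data accessible (the mirrored copies remain at the other site).
$cluster    = Get-Cluster -Name "Stretched-Cluster"
$siteBHosts = (Get-DrsClusterGroup -Cluster $cluster -Name "SiteB-Hosts").Member
foreach ($esx in $siteBHosts) {
    Set-VMHost -VMHost $esx -State Maintenance -VsanDataMigrationMode EnsureAccessibility -Confirm:$false
}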

Another optional step to control the timing of DRS activities is to temporarily change the DRS automation level from "fully automated" to "partially automated." This is a temporary step, and the setting must be returned to its original value after maintenance is complete.

If your stretched cluster is using site-mirroring storage policies and the organization is uncomfortable with reducing the level of resilience during this maintenance period, you may wish to consider introducing storage policies that use secondary levels of protection: e.g., dual-site mirroring with an additional level of FTT applied at the host level in each site. Resilience during site maintenance would still be reduced, but resilience at the host level would remain during this maintenance period. If this is of interest, it is recommended that these storage policies be adjusted well prior to any planned site maintenance activities so that vSAN has the opportunity to apply storage policies that use a secondary level of protection to all objects in the cluster.

Summary

vSAN stretched clusters allow administrators to perform site maintenance while providing full availability of data. The exact procedures may vary depending on environmental conditions, but the guidance provided here can serve as the foundation for site maintenance activities for a vSAN stretched cluster.

Decommission a vSAN Stretched Cluster

Decommissioning a vSAN stretched cluster is not unlike decommissioning a standard or 2-node vSAN cluster. The decommissioning process most often occurs when business changes require an adjustment to the underlying topology. Perhaps a few smaller vSAN clusters are being merged into a larger vSAN cluster, or maybe the cluster is simply being decommissioned because a replacement vSAN cluster has already been deployed into production.

Note that this task should not be confused with converting a vSAN stretched cluster to a standard vSAN cluster. That guidance is documented already in section 9 of this Operations Guide.

Key Considerations

Since this task involves the permanent shutdown of the hosts that make up a vSAN cluster, an assumption is that all VMs and data have been migrated off of this cluster at some point. It is up to the administrator to verify that this prerequisite has been met.

Once the hosts are no longer housing any VM data, the vSAN performance service should also be disabled. The vSAN performance service stores its performance data much like a VM does; thus, it must be disabled in order to properly decommission the respective disk groups in the hosts that comprise the cluster.

Hosts can be decommissioned from the cluster by first entering them into maintenance mode. From there, disk groups can be deleted, which will clear the vSAN metadata on all of the capacity devices in the disk group. Disk groups can also be deleted using the vSphere host client and PowerCLI. Examples of such cases can be found in the PowerCLI Cookbook for vSAN at: https://vmware.com/go/powercli4vsan 
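A hedged PowerCLI sketch of these decommissioning steps is shown below; the cluster name is hypothetical, and the disk group removal is destructive to any data still residing on those devices.

$cluster = Get-Cluster -Name "Old-vSAN-Cluster"      # hypothetical cluster being decommissioned

# Disable the vSAN performance service before removing disk groups.
Set-VsanClusterConfiguration -Configuration $cluster -PerformanceServiceEnabled:$false

# Place each host into maintenance mode (no data migration) and delete its disk groups.
foreach ($esx in ($cluster | Get-VMHost)) {
    Set-VMHost -VMHost $esx -State Maintenance -VsanDataMigrationMode NoDataMigration -Confirm:$false
    Get-VsanDiskGroup -VMHost $esx | Remove-VsanDiskGroup -DataMigrationMode NoDataMigration -Confirm:$false
}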

Recommendation: Ensure that all workloads and their respective data have been fully evacuated from the vSAN stretched cluster prior to decommissioning. With vSAN 7, this could include many other data types beyond VMs, including first-class disks for cloud native storage, NFS shares through vSAN file services, and iSCSI LUNs through the vSAN iSCSI service. The decommissioning process is inherently destructive to any data remaining on the disk groups in the hosts that are being decommissioned.

Summary

The exact steps for decommissioning a vSAN cluster remain similar across all types of vSAN topologies. The area of emphasis should be ensuring that all of the VMs and data housed on a vSAN cluster have been fully evacuated before any decommissioning begins.

Section 10: 2-Node

Replacing vSAN Witness Host in 2-Node Cluster

vSAN storage policies using FTT=1 with mirroring require 3 nodes/fault domains to be available to meet storage policy compliance. This means that 2-node vSAN clusters require a third host to participate in the cluster. A vSAN Witness Host is used to provide quorum. The vSAN Witness Host stores Witness components for each vSAN object. These Witness components are used as a tiebreaker to assist with ensuring accessibility to a vSAN object, such as a VMDK, VSWP, or namespace, in the event of a failure or network partition.

One of the two nodes in the cluster is designated as the preferred node. When the two nodes in the cluster cannot communicate with each other, they are considered partitioned from each other.

  • Data nodes isolated from each other: If both the preferred and non-preferred nodes can communicate with the vSAN Witness Host, the cluster will continue to operate with the preferred node and the vSAN Witness Host while the non-preferred node is isolated.
  • Preferred node failure or complete isolation: If the preferred node fails or is completely isolated from the non-preferred node and the vSAN Witness Host, the non-preferred node becomes the master along with the vSAN Witness Host. Note that either node in a 2-node cluster may be designated as the preferred node, and this designation may be changed when required.

This third host can be a physical ESXi host or a VMware-provided virtual appliance, which is referred to as a vSAN Witness Appliance.

Replacing a vSAN Witness Host is not a typical or regular task, but one that could be required in some of the following scenarios:

  • Physical host is being decommissioned
    • Due to ESXi CPU requirements (e.g., upgrading from vSAN 6.6 to 6.7+)
    • Host is at the end of a lease lifecycle
  • vSAN Witness Appliance
    • Has been deleted
    • Has been corrupted
    • Does not meet the vSAN Witness Component count requirement in cases where a cluster has grown in component count

The process of replacing a vSAN Witness Host is relatively simple when a suitable vSAN Witness Host has been configured. The fault domains section in the vSAN configuration UI provides a quick and easy mechanism to swap the currently configured vSAN Witness Host with an alternate host for this purpose.

FIGURE 10-1: Using the “Change Witness Host” option to replace the currently configured witness host in a 2-node cluster

If the vSAN Cluster Level Object Manager Daemon (CLOMD) repair delay has expired (typically 60 minutes), Witness components will be recreated on the new vSAN Witness Host. If the CLOMD repair delay has not expired, the Witness components will be recreated when the timer has expired. Witness components can be manually recreated in the vSAN health UI using the “Repair objects immediately” operation.

More detailed information can be found in the vSAN 2 Node Guide here: “Replacing a Failed vSAN Witness Host.”

Recommendations:  If you wish to quickly swap to a new vSAN Witness Host, it is important to have the deployment process documented with specific networking requirements for your environment. Deploying a secondary/alternate “offline” vSAN Witness Host could streamline this process.  Also, always ensure that the “Repair objects immediately” operation is performed after the vSAN Witness has been replaced/swapped.

Configure a 2-Node Cluster

2-node vSAN clusters share some similarities with vSAN stretched clusters, but with a few design and operational differences.

Similarity to stretched clusters

2-node vSAN clusters inherit the same architecture as a vSAN stretched cluster. The stretched cluster architecture includes nodes in two separate fault domains for availability and a third node, called the vSAN Witness Host, to assist in the event they are isolated from each other. Traditional vSAN clusters are typically located in a single location. Stretched cluster resources are typically located in two distinct locations, with the tiebreaker node in a third location. 2-node vSAN cluster resources are often in a single location, but the vSAN Witness Host often resides in a distinct alternate location.

The biggest challenges that come to mind when initially configuring 2-node or stretched vSAN clusters include having a vSAN Witness Host configured for use and proper networking.

vSAN Witness Host

The vSAN Witness Host can be physical or the VMware-provided vSAN Witness Appliance. This “host” must have proper connectivity to vCenter and to each host in the vSAN cluster. Communication with the vSAN Witness Host is always Unicast and requires additional ports open between itself and the vSAN cluster.

vSAN Witness Host networking

Connectivity to the vSAN Witness Host from the vSAN 2-node cluster is typically over Layer 3 networking. In vSAN 6.5, the WTS feature was introduced for 2-node clusters to allow for cluster and Witness communication when the data nodes are directly connected to each other. The vSAN Witness Host will only ever have a VMkernel interface tagged for vSAN traffic. vSAN uses the same TCP/IP stack as the management VMkernel interface. When using Layer 3 to communicate with a vSAN cluster, static routing is required. It is always best to isolate management traffic from workload traffic. Organizations may choose to have vSAN Witness Host communication with vSAN clusters performed over the same network. This is supported by VMware but should align with an organization’s security and risk guidance. If the management VMkernel interface is tagged for vSAN traffic, static routing is not required.

Additionally, when configuring a vSAN Witness Host to communicate with a vSAN cluster, if communication with the vSAN cluster is performed using a separate VMkernel interface, that interface cannot be on the same network as the management interface. A multi-homing issue occurs, causing vSAN traffic to use the untagged management interface. This is not a vSAN-centric issue and is detailed in-depth in “Multi-homing on ESXi/ESX.”

In vSAN 7 U1 and later, static routes are no longer necessary, as a default gateway override can be set for both stretched cluster and 2-node configurations.  This removes the manual, static routing configuration steps performed at the command line through ESXCLI or PowerCLI, and provides an improved level of visibility to the current host setting. 

Recommendation: Upgrade to vSAN 7 U1 to simplify the initial configuration and ongoing operation of stretched cluster and 2-node environments.

vSAN Witness Host sizing

The vSAN Witness Appliance can be deployed using one of three profiles: tiny, medium, and large. Each of these is sized according to the number of vSAN components it can support. The tiny profile is typically used with 2-node vSAN clusters, as they seldom have more than 750 components.

Networking configuration settings

Before configuring a 2-node vSAN cluster, it is important to determine what type of networking will be used. The vSAN Witness Host must communicate with each vSAN cluster node. Just as the vSAN Witness Host has a requirement for static routing, vSAN data nodes also require static routing to communicate with the vSAN Witness Host.

WTS

Configurations using WTS should have Witness traffic configured before using the vSAN wizard or Cluster Quickstart in the vSphere UI.
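A hedged sketch of tagging Witness traffic on a data node using PowerCLI's ESXCLI interface is shown below. The host and interface names are hypothetical, and the exact argument key names can be confirmed with CreateArgs().

# Equivalent ESXi shell command: esxcli vsan network ip add -i vmk1 -T=witness
$esx    = Get-VMHost -Name "esx01.lab.local"         # hypothetical data node
$esxcli = Get-EsxCli -VMHost $esx -V2
$cliArgs = $esxcli.vsan.network.ip.add.CreateArgs()
$cliArgs.interfacename = "vmk1"                      # VMkernel port that will carry Witness traffic
$cliArgs.traffictype   = "witness"
$esxcli.vsan.network.ip.add.Invoke($cliArgs)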

vSAN VMkernel interfaces are required to have the same MTU configuration. In use cases where the vSAN Witness Host is in a different location, technologies such as an IPSEC VPN may be used. Overhead, such as the  additional headers required by an IPSEC VPN, could reduce the MTU value across all nodes, as they are required to have the same MTU value. vSAN 6.7 Update 1 introduced support for mixed MTU sizes when using WTS. Witness-tagged VMkernel interface MTUs must match the MTU of the vSAN-tagged VMkernel interface on the vSAN Witness Host. Having the vSAN Witness Host and networking in place before creating a 2-node vSAN cluster will help ensure the configuration succeeds.
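A quick, hedged way to spot MTU mismatches across the VMkernel adapters in the cluster with PowerCLI (the cluster name is hypothetical):

# List MTU and vSAN traffic tagging for every VMkernel adapter in the cluster.
Get-Cluster -Name "2-Node-Cluster" | Get-VMHost | Get-VMHostNetworkAdapter -VMKernel |
    Select-Object VMHost, Name, IP, Mtu, VsanTrafficEnabled |
    Sort-Object VMHost, Name | Format-Table -AutoSize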

Recommendations:  Deploy a vSAN Witness Host/Appliance before attempting to configure the 2-node cluster, and do not configure management and vSAN traffic interfaces on the same network. The exception here is that the management interface can be tagged for vSAN traffic instead of the secondary interface.

Decommission a vSAN 2-Node Cluster

Decommissioning a 2-node vSAN cluster is not unlike a decommissioning of a standard or vSAN stretched cluster. The decommissioning process most often occurs when business changes require an adjustment to the underlying topology. Perhaps a few smaller vSAN clusters were going to be merged into a larger vSAN cluster, or maybe the cluster is simply being decommissioned due to replacement vSAN cluster already deployed into production. Note that this task should not be confused with converting a 2-node cluster to a standard vSAN cluster.

Key Considerations

Since this task involves permanent shutdown of the hosts that make up a vSAN cluster, an assumption is that all VMs and data has been migrated off of this cluster at some point. It will be up to the administrator to verify that this prerequisite has been performed. Once the hosts are no longer housing any VM data, the vSAN performance service should also be disabled. The vSAN performance service houses its performance data much like a VM does, thus, it must be disabled in order to properly decommission the respective disk groups in the hosts that comprise the cluster.

Hosts can be decommissioned from the cluster by first entering them into maintenance mode. From there, disk groups can be deleted, which will clear the vSAN metadata on all of the capacity devices in the disk group. Disk groups can also be deleted using the vSphere host client and PowerCLI. Examples of such cases can be found in the PowerCLI Cookbook for vSAN at: https://vmware.com/go/powercli4vsan 

Once the 2-node cluster is fully decommissioned, the vSAN Witness Host or Appliance that was responsible for providing quorum for the 2-node cluster can also be decommissioned.

Recommendation: Ensure that all workloads and their respective data have been fully evacuated from the 2-node vSAN cluster prior to decommissioning. With vSAN 7, this could include many other data types beyond VMs, including first-class disks for cloud native storage, NFS shares through vSAN file services, and iSCSI LUNs through the vSAN iSCSI service. The decommissioning process is inherently destructive to any data remaining on the disk groups in the hosts that are being decommissioned.

Summary

The exact steps for decommissioning a vSAN cluster remain similar across all types of vSAN topologies. The area of emphasis should be ensuring that all of the VMs and data housed on a vSAN cluster have been fully evacuated before any decommissioning begins.

Creating a Static Route for vSAN in a 2-Node Cluster

Standard vSAN cluster configurations most often use a single Layer 2 network to communicate between hosts. Thus, all hosts are able to communicate with each other without any Layer 3 routing. In other vSAN cluster configurations, such as stretched clusters and 2-node clusters, vSAN hosts must be able to communicate with other hosts in the same cluster that live in a different Layer 3 network. vSAN uses a dedicated VMkernel interface that cannot define its own default gateway. Therefore, in these topologies, static routes must be set to ensure that vSAN traffic can properly reach all hosts in the vSAN cluster, regardless of which Layer 3 network is used.

In vSAN 7 U1 and later, static routes are no longer necessary, as a default gateway override can be set for both stretched cluster and 2-node configurations.  This removes the manual, static routing configuration steps performed at the command line through ESXCLI or PowerCLI, and provides an improved level of visibility to the current host setting. 

Recommendation: Upgrade to vSAN 7 U1 to simplify the initial configuration and ongoing operation of stretched cluster and 2-node environments.


In a 2-node arrangement, these static routes ensure that vSAN-tagged VMkernel traffic can properly route to its intended destination. Without the static route, the vSAN-tagged traffic will attempt to use the default gateway of the management VMkernel interface, which may result in no communication, or reduced performance as a result of traversing a non-optimal path.

Configuring static routes

Static routing must be configured on each host in the participating vSAN cluster. It can be configured using the esxcfg-route command line utility or using PowerCLI with the Set-VMHostRoute cmdlet.

FIGURE 10-2: Using the esxcfg-route command line utility

After setting a static route, ensure the VMkernel interface can use vmkping to communicate with a destination address that requires the static route.

FIGURE 10-3: Verifying with vmkping
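The same verification can be scripted through PowerCLI's ESXCLI interface, which wraps the host's vmkping facility. The addresses and interface name below are hypothetical, and the payload size of 1472 bytes also exercises a standard 1500-byte MTU.

# Ping the witness vSAN IP from vmk1 on each data node.
$witnessVsanIp = "192.168.130.10"                    # hypothetical witness address
foreach ($esx in (Get-Cluster -Name "2-Node-Cluster" | Get-VMHost)) {
    $esxcli   = Get-EsxCli -VMHost $esx -V2
    $pingArgs = $esxcli.network.diag.ping.CreateArgs()
    $pingArgs.host      = $witnessVsanIp
    $pingArgs.interface = "vmk1"
    $pingArgs.size      = 1472
    $esxcli.network.diag.ping.Invoke($pingArgs)
}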

Recommendation: If you are running into network partition health check alerts, or any other issues discovered in the vSAN Health checks, review and verify all static route settings for all hosts in the cluster. A large majority of issues seen by VMware Global Support Services for stretched cluster and 2-node configurations are the result of improper or missing static routes on one or more hosts in the cluster.

Summary

The flexibility of vSAN allows for it to provide storage services using many different topologies. When a topology (such as stretched clusters, or 2-node configurations) is using more than one layer 3 network, the use of static routes on the ESXi hosts in the vSAN cluster is a critical step to ensure proper communication between hosts.

HA Settings for 2-Node vSAN Clusters

vSphere High Availability (HA) provides the failure handling of VMs running on a given host in a vSphere cluster. It will restart VMs on other hosts in a vSphere cluster when a host enters some form of failure or isolation condition.

vSAN and vSphere HA work together to ensure that VMs running on a partitioned host or hosts are restarted on the hosts that have quorum. vSAN 2-Node configurations use custom HA settings because of their unique topology. While this is typically an "initial configuration" item for a cluster, it is not uncommon to see this set incorrectly, and it should be a part of typical operating procedures and checks for a 2-Node cluster.

vSphere HA Settings

For 2-Node vSAN environments, the "Host Monitoring" toggle should be enabled, with VMs restarted upon a host failure and VMs powered off and restarted when a host is isolated. These settings ensure that VMs are restarted in the event a VM fails or a host is isolated.

FIGURE 10-4: Cluster HA settings

Admission Control accounts for the availability of resources should one or more hosts fail. In a 2-Node vSAN cluster, since only a single host can fail while still maintaining resources, the host failover capacity should be set to 50%. The total utilization of CPU and memory for workloads on a 2-Node vSAN cluster should never be more than a single node can accommodate. Adhering to this guideline ensures sufficient resources are available should a node fail.

FIGURE 10-5: Cluster HA Admission Control settings

When attempting to place a host into Maintenance Mode in a 2-Node vSAN cluster, the above Admission Control settings will prevent VMs from moving to the alternate host.

FIGURE 10-6: Admission Control warnings

To allow a host to go into Maintenance Mode, it is necessary to either disable HA completely or temporarily disable Admission Control while maintenance is being performed.

FIGURE 10-7: Temporarily disabling Admission Control
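A hedged PowerCLI sketch for toggling admission control around a maintenance window is shown below; the cluster name is hypothetical.

$cluster = Get-Cluster -Name "2-Node-Cluster"        # hypothetical cluster name

# Temporarily disable admission control so the remaining host can accept all VMs.
Set-Cluster -Cluster $cluster -HAAdmissionControlEnabled:$false -Confirm:$false

# ...perform host maintenance...

# Re-enable admission control once maintenance is complete.
Set-Cluster -Cluster $cluster -HAAdmissionControlEnabled:$true -Confirm:$false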

Recommendation: Do not attempt to set an isolation address for a 2-Node vSAN cluster. Setting of an isolation address is not applicable to this topology type.

Summary

2-Node vSAN topologies use HA configuration settings that may be unique in comparison to traditional and stretched cluster environments. In order to ensure that a 2-Node environment behaves as expected under a host failure condition, be sure that these settings are configured correctly. 

Advanced Options for 2-Node Clusters

vSAN stretched clusters are designed to service reads from the site in which the VM resides in order to deliver low read latency and reduce traffic over the inter-site link. 2-Node vSAN clusters are built on the same logic as a stretched cluster: one can think of each host as a single site, but residing in the same physical location. In these 2-Node configurations, servicing reads from just a single host would not reflect the capabilities of the topology (2 nodes, directly connected). Fortunately, there are settings available to ensure that reads are serviced optimally in this topology.

Advanced Options

When configuring 2-Node clusters, disabling "Site Read Locality" will allow reads to be serviced by both hosts. Disabling "Site Read Locality" is the preferred setting for 2-Node configurations, whereas enabling "Site Read Locality" is the preferred setting for stretched cluster configurations.

FIGURE 10-8: The Site Read Locality cluster setting

Both all-flash and hybrid-based 2-Node vSAN clusters will benefit from disabling "Site Read Locality." All-flash vSAN clusters use 100% of the cache devices as a write buffer, but read requests will also check the write buffer prior to the capacity devices. Allowing reads to be serviced from both hosts means there is a much higher likelihood that read requests can be served by recently written data in the buffer across both hosts, thereby improving performance.

Recommendation: For environments with dozens or even hundreds of 2-Node deployments, use PowerCLI to periodically check and validate recommended settings such as "Site Read Locality" and vSphere HA settings.
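A hedged PowerCLI sketch for such an audit is shown below. It assumes the read locality behavior is surfaced through the per-host advanced option VSAN.DOMOwnerForceWarmCache, where a value of 1 indicates that Site Read Locality is disabled (reads serviced by both hosts).

# Report the read locality setting for every host in every 2-node cluster.
Get-Cluster | Where-Object { ($_ | Get-VMHost).Count -eq 2 } | ForEach-Object {
    $clusterName = $_.Name
    $_ | Get-VMHost | ForEach-Object {
        $setting = Get-AdvancedSetting -Entity $_ -Name "VSAN.DOMOwnerForceWarmCache"
        [pscustomobject]@{
            Cluster                = $clusterName
            Host                   = $_.Name
            DOMOwnerForceWarmCache = $setting.Value
        }
    }
} | Format-Table -AutoSize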

Summary

vSAN 2-Node topologies borrow many of the same concepts used for vSAN stretched clusters. There are subtle differences in a 2-Node topology that necessitate the adjustment of the "Site Read Locality" setting for optimal performance.

Section 11: vCenter Maintenance and Event Handling

Upgrade Strategies for vCenter Server Powering One or More vSAN Clusters

It is common for vCenter to host multiple vSAN clusters; these could be at different ESXi versions from each other and, as such, different vSAN versions. This is a fully supported configuration, but it is a good idea to ensure that your vCenter and ESXi versions are compatible with one another.

The software compatibility matrix easily shows what versions of ESXi are compatible with your target vCenter. (As an example, ESXi 5.5 and vSAN 5.5 are not compatible with vCenter 6.7 U2—so those hosts would need to be upgraded to at least 6.0 first.)

FIGURE 11-1: Interoperability checks courtesy of VMware’s software compatibility matrix

Always update your vCenter before your ESXi hosts. If you intend to migrate to 6.7 U2—and your current hosts are on a mixture of 6.5 U3 and 6.7 U1—you would upgrade your vCenter to 6.7 U2 before upgrading your ESXi hosts to 6.7 U2. vSAN 6.7 U3 does provide some improved flexibility in supporting vCenter editions within the same release line of hosts, but this is geared more toward day-0 patching of hosts.

The recommendation to always update vCenter prior to vSAN hosts to the next major version still applies. The reason for this is that, as versions increase and API versions and compatibilities change, the only way to guarantee that vCenter can still make sense of its communications with the ESXi hosts after an upgrade is if they are on the same software version.

An example that would break compatibility: if you had a cluster at 6.7 U1 and vCenter at 6.7 U1, and you upgraded the hosts to ESXi 6.7 U2 while the vCenter was still at 6.7 U1, then vCenter would have no way of knowing how to talk to the upgraded hosts if API calls changed between versions, which is likely. Good upgrade hygiene for VMware solutions is threefold. First, verify that your vSAN hosts' components are supported on the target version of ESXi and vSAN by checking the vSAN Hardware Compatibility List (HCL), and if any firmware upgrades are required, apply those first.

Next, make sure that all your integrated components (such as vRealize Automation, NSX, vRealize Operations, and vCloud Director) are all compatible with the target version of vCenter and ESXi by checking the product interoperability matrix (for example, the interoperability matrices for vCenter 6.7 U2 and ESXi 6.7 U2). Finally, once you have established the product versions for each solution you have in your environment, you can follow the upgrade workflow/guide to ensure all components are upgraded in the right order.

Replacing vCenter Server on Existing vSAN Cluster

There may be instances in which you need to replace the vCenter server that hosts some vSAN clusters. While vCenter acts as the interaction point for vSAN and is used to set it up and manage it, it is not the only source of truth and is not required for steady-state operations of the cluster. If you replace the vCenter server, your workloads will continue to run without it in place.

FIGURE 11-2: Understanding the role of vCenter in a vSAN cluster

Replacing the vCenter server associated with a vSAN cluster can be done, but is not without its challenges or requisite planning. There is an excellent blog covering the operations behind this, and there was a Knowledge Base article recently released with the process listed in detail.  Below are the broad strokes of a vSAN cluster migration to a new vCenter server (taken from the Knowledge Base).

Scenario: An all-flash vSAN cluster on 6.7 U2 needs to be migrated to a new vCenter server—DD&C enabled.

  • Ensure the target vCenter server has the same vSphere version as the ESXi hosts, or higher (same is preferable).
  • Create a new cluster on the new vCenter server with the same settings as the source cluster (vSAN enabled, DD&C, encryption, HA, DRS) and ensure the Disk Claim Mode is set to Manual.
  • If you are using a Distributed Switch on the source vCenter server, export the vDS and import it into the new vCenter server, ensure “Preserve original distributed switch port group identifiers” is NOT checked upon import.
  • Recreate all SPBM policies on the target vCenter server to match the source vCenter server.
  • Run esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates on all hosts (prior to 6.6); a PowerCLI alternative for this step is shown after this list.
  • Disconnect all hosts from the source vCenter server.
  • Remove hosts from the source vCenter server inventory.
  • Add hosts into the new vCenter server.
  • Drag the hosts into the new cluster.
  • Verify hosts and VMs are contactable.
  • Run esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates on all hosts.
  • Configure hosts to use the imported vDS one by one, ensuring connectivity is maintained.
  • Reconfigure a VM with the same policy as source—ensuring no resynchronization when the VM is reconfigured.
  • For each SPBM policy, reconfigure one of each VM as a test to ensure no resynchronization is performed.
  • Once verified, reconfigure all VMs in batches with their respective SPBM policies.
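For the two esxcfg-advcfg steps above, a hedged PowerCLI alternative is to set the same host advanced option remotely; the cluster name is hypothetical. Use a value of 1 before disconnecting the hosts from the source vCenter server and a value of 0 once they are added to the new one.

# Set /VSAN/IgnoreClusterMemberListUpdates on every host in the cluster.
Get-Cluster -Name "vSAN-Cluster" | Get-VMHost | ForEach-Object {
    Get-AdvancedSetting -Entity $_ -Name "VSAN.IgnoreClusterMemberListUpdates" |
        Set-AdvancedSetting -Value 1 -Confirm:$false
}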

Recommendation: If you are not completely comfortable with the above procedure and doing this in a live environment, please open a ticket with GSS and have them guide you through the procedure.

Summary

Replacing a vCenter server for an existing vSAN cluster is an alternate method for restoring vCenter should a backup of vCenter not be available. With a little preparation, a vCenter server can be replaced with a clean installation and the vSAN management plane will continue to operate as expected.

Protecting vSAN Storage Policies

Storage policy based management (SPBM) is a key component of vSAN. All data stored on a vSAN cluster (VM data, file services shares, first-class disks, and iSCSI LUNs) is stored in the form of objects. Each object stored in a vSAN cluster is assigned a single storage policy that helps define the outcome governing how the data is placed.

Storage policies are a construct of a vCenter server. Similar to vSphere Distributed Switches (vDS), they are defined and stored on a vCenter server, and can be applied to any supporting cluster that the vCenter server is managing. Therefore, when replacing a vCenter server in an already existing cluster (as described in "Replace vCenter server on existing vSAN cluster" in this section of the operations guide), the storage policies will either need to be recreated, or imported from a previous time in which they were exported from vCenter.

Protecting storage policies in bulk form will simplify the restoration process, and will help prevent unnecessary resynchronization from occurring due to some unknown difference in the storage policy definition.

Procedures

The option of exporting and importing storage policies is not available in the UI, but a simple PowerCLI script will be able to achieve the desired result. Full details and additional options for importing and exporting policies using PowerCLI can be found in the PowerCLI Cookbook for vSAN.

Back up all storage policies managed by a vCenter server:

# Back up all storage policies

# Get all of the Storage Policies

$StoragePolicies = Get-SpbmStoragePolicy

# Loop through all of the Storage Policies

Foreach ($Policy in $StoragePolicies) {

# Create a path for the current Policy

$FilePath = "/Users/Admin/SPBM/"+$Policy.Name+".xml"

# Remove any spaces from the path

$FilePath = $FilePath -Replace (' ')

# Export (backup) the policy

Export-SpbmStoragePolicy -StoragePolicy $Policy -FilePath $FilePath

}

Importing or restoring all storage policy XML files that reside in a single directory:

# Recover the Policies in /Users/Admin/SPBM/

$PolicyFiles = Get-ChildItem "/Users/Admin/SPBM/" -Filter *.xml

# Enumerate each policy file found

Foreach ($PolicyFile in $PolicyFiles) {

# Get the Policy XML file path

$PolicyFilePath = $PolicyFile.FullName

# Read the contents of the policy file to set variables

$PolicyFileContents = [xml](Get-Content $PolicyFilePath)

# Get the Policy's name & description

$PolicyName = $PolicyFileContents.PbmCapabilityProfile.Name.'#text'

$PolicyDesc = $PolicyFileContents.PbmCapabilityProfile.Description.'#text'

# Import the policy

Import-SpbmStoragePolicy -Name $PolicyName -Description $PolicyDesc -FilePath $PolicyFilePath

}

When restoring a collection of storage policies to a newly built vCenter server, it makes the most sense to restore them at the earliest possible convenience so that vSAN has the ability to associate the objects with their respective storage policies.

Recommendation: Introduce scripts to automate the process of exporting your storage policies to a safe location on a regular basis. This practice is highly recommended for the vDS instances managed by vCenter and should be applied to storage policies as well.

Summary

Exporting storage policies is an optional safeguard that makes the process of introducing a new vCenter server to an existing vSAN cluster easier. The effort to streamline this protection up front will make the steps for replacing a vCenter server more predictable and easier to document in internal runbooks for such events.

Protecting vSphere Distributed Switches Powering vSAN

Virtual switches and the physical uplinks that are associated with them are the basis for connectivity in a vSAN powered cluster. Connectivity between hosts is essential for vSAN clusters since the network is the primary storage fabric, as opposed to three-tier architectures that may have a dedicated storage fabric.

VMware recommends the use of vSphere Distributed Switches (VDS) for vSAN. Not only do they provide additional capabilities to the hosts, they also provide a level of consistency, as the definition of the vSwitch and the associated port groups are applied to all hosts in the cluster. Since a vDS is a management construct of vCenter, it is recommended to ensure these are protected properly, in the event of unknown configuration changes, or if vCenter server is being recreated and introduced to an existing vSAN cluster.

Procedures

The specific procedures for exporting, importing, and restoring VDS configurations can be found at "Backing Up and Restoring a vSphere Distributed Switch Configuration" at docs.vmware.com. The process for each respective task is quite simple, but it is advised to become familiar with the process, and perhaps experiment with a simple export and restore in a lab environment to become more familiar with the task. This will help minimize potential confusion for when it is needed most. Inspecting the data.xml file included in the zip file of the backup can also provide a simple way to review the settings of the vDS.

Recommendation: The VDS export option provides the ability to export just the vDS, or the vDS and all port groups. You may find it helpful to perform the export twice, using both options. This will allow for maximum flexibility in any potential restoration activities.

Similar to the protection of vSAN storage policies, automating this task would be ideal. A sample import and export script can be found in the samples section of code.vmware.com at: https://code.vmware.com/samples/1120/import-export-vdswitch?h=vds 

A vDS can apply to more than one cluster. In any scenario in which the VDS is restored from a backup, it is important for the administrator to understand which clusters and respective hosts it may impact. Understanding this clearly will help minimize the potential for unintended consequences, and may also influence the naming/taxonomy of the vDS used by an organization as the number of clusters managed by vCenter continues to grow.

Summary

Much like VMware's guidance for protecting storage policies contained in vCenter, VMware recommends all vSphere Distributed Switches are protected in some form or another. Ideally, this should occur in an automated fashion at regular intervals to ensure the backups are up to date.

Section 12: Upgrade Operations

Upgrading and Patching vSAN Hosts

Upgrading and patching vSAN hosts is very similar to the process for vSphere hosts using traditional storage. The unique role that vSAN plays means there are additional considerations to be included in operational practices to ensure predictable results.

vSAN is a cluster-based storage solution. Each ESXi host in the vSAN cluster participates in a manner that provides a cohesive, single storage endpoint for the VMs running in the cluster. Since vSAN is built directly into the hypervisor, it depends heavily on the expected interaction between hosts to provide a unified storage system. This can be dependent on consistency of the following:

  • The version of ESXi installed on the host.
  • Firmware versions of key components, such as storage controllers, NICs, and BIOS.
  • Driver versions (VMware-provided “inbox” or vendor-provided “async”) for the respective devices.

Inconsistencies between any or all of these may change the expected behavior between hosts in a cluster. Therefore, avoid mixing vSAN/ESXi versions in the same cluster for general operation. Limit inconsistency of versions on hosts to the cluster upgrade process, where updates are applied one host at a time until complete. The Knowledge Base contains additional recommendations on vSAN upgrade best practices.

Method of updates

In versions prior to vSphere 7, the vSphere Update Manager (VUM) was the primary delivery method for updating vSphere and vSAN clusters. VUM centralizes the updating process, and vSAN’s integration with VUM allows for updates to be coordinated with the HCL and the vSphere Release Catalog so that it only applies the latest version of vSAN that is compatible with the hardware.  VUM also handles updates of firmware and drivers for a limited set of devices. For more information on the steps that VUM takes to update these devices, see the section “Using VUM to Update Firmware on Selected Storage Controllers.” Note that NIC drivers and firmware as well as BIOS firmware are not updated by VUM, nor monitored by the vSAN health UI, but they play a critical role in the functionality of vSAN.

While there have been recent additions to a very limited set of NIC driver checks (as described in “Health Checks—Delivering a More Intelligent vSAN and vSphere”), firmware and driver updates using VUM is a largely manual process. Ensure that the correct firmware and drivers are installed, remain current to the version recommended, and are a part of the cluster lifecycle management process.

vSphere 7 introduced the new VMware vSphere Lifecycle Manager (vLCM), an entirely new solution for unified software and firmware management that is native to vSphere. vLCM is the next-generation replacement for vSphere Update Manager (VUM). It is built on a desired-state, or declarative, model and provides lifecycle management for the hypervisor and the full stack of drivers and firmware for the servers powering your data center. vLCM is a powerful new approach to creating simplified, consistent server lifecycle management at scale, and it was built with the needs of vendors in mind. VUM and vLCM coexist in vSphere 7 and vSphere 7 U1, with the intention of vLCM eventually being the only method of software lifecycle management for vSphere and vSAN.

vLCM is capable of delivering updates beyond the scope of VUM, including NIC drivers, although as of vSphere 7 U1 these are not yet part of the validation and health check process.

Recommendation: Focus on efficient delivery of services during cluster updates as opposed to speed of update. vSAN restricts parallel host remediation. A well-designed and -operating cluster will seamlessly roll through updating all hosts in the cluster without interfering with expected service levels.

Viewing vSphere/vSAN version levels and consistency

Hypervisor versions and patch levels can be found in a number of different ways.

vCenter—Version information can be found by clicking on the respective hosts within the vSAN cluster and viewing the “Summary” or “Updates” tab

FIGURE 12-1: Viewing a hypervisor version of a host in vCenter

PowerCLI—PowerCLI can be used to fetch a wide variety of host state information, as well as provide a vehicle for host patch remediation and verification. The “PowerCLI Cookbook for vSAN” offers practical examples (on page 50) of patch management with VUM using PowerCLI.

A PowerCLI script called VCESXivSANBuildVersion.ps1 will provide version level information per cluster.

FIGURE 12-2: Viewing a hypervisor version of multiple hosts using PowerCLI
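As a minimal, hedged alternative to a full script, a short PowerCLI pipeline can report the build consistency of every host per cluster:

# List hypervisor version and build number for every host, grouped by cluster.
Get-Cluster | Get-VMHost |
    Select-Object @{Name="Cluster";Expression={$_.Parent.Name}}, Name, Version, Build |
    Sort-Object Cluster, Name | Format-Table -AutoSize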

vRealize Operations—vRealize Operations includes a “Troubleshoot vSAN” dashboard which enumerates the hosts participating in a selected vSAN cluster, and provides some basic version level information.

FIGURE 12-3: Viewing host details of a vSAN cluster in vRealize Operations

Upgrades and host restart times

The host upgrade process may consist of one or more host reboots, especially as firmware and driver updates become more common in the upgrade workflow. A host participating in a vSAN cluster typically takes longer to restart than a non-vSAN host, as vSAN digests log entries in the buffer to generate all required metadata tables; this restart time has been improved significantly in vSAN 7 U1. This activity is visible in the DCUI. The default vSAN data-migration option when placing a host into maintenance mode manually or using VUM is “Ensure accessibility.” This minimizes data movement during the maintenance process to ensure data remains accessible but less resilient, and is typically the most appropriate option for most maintenance scenarios.

vSAN will hold off on rebuilding any data to meet storage policy compliance for 60 minutes. Depending on the updates being performed, and the host characteristics, temporarily increasing the “Object Repair Timer” may reduce resynchronization activity and make the host update process more efficient.

FIGURE 12-4: Setting the Object Repair Timer at the vSAN cluster level in vCenter

It is recommended that the Object Repair Timer remain at the default of 60 minutes in most cases, but it can be changed to best meet the needs of the organization.
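If a temporary change is warranted, a hedged PowerCLI sketch using the per-host advanced option VSAN.ClomRepairDelay is shown below; the cluster name is hypothetical, newer releases also expose this value as the cluster-level Object Repair Timer in the UI, and the value should be returned to 60 after the maintenance window.

# Temporarily raise the repair delay from 60 to 120 minutes on every host in the cluster.
# Depending on the version, the clomd service may need a restart for the change to take effect.
Get-Cluster -Name "vSAN-Cluster" | Get-VMHost | ForEach-Object {
    Get-AdvancedSetting -Entity $_ -Name "VSAN.ClomRepairDelay" |
        Set-AdvancedSetting -Value 120 -Confirm:$false
}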

Health checks

The vSAN health service provides a number of health checks to ensure the consistency of the hypervisor across the cluster. This is an additional way to alert for potential inconsistencies. The vSAN health checks may also show alerts of storage controller firmware and driver inconsistencies. Notable health checks relating to updates include:

  • Customer Experience Improvement Program (necessary to send critical updates regarding drivers and firmware)
  • vCenter server up-to-date
  • vSAN build recommendation engine health
  • vSAN build recommendation
  • vSAN release catalog
  • vCenter state is authoritative
  • vSAN software version compatibility
  • vSAN HCL DB up-to-date
  • SCSI controller is VMware certified
  • Controller is VMware certified for ESXi release
  • Controller driver is VMware certified
  • Controller firmware is VMware certified
  • vSAN firmware provider health
  • vSAN firmware version recommendation

Recommendation: Keep the vCenter server running the very latest version, regardless of the version of the vSAN clusters it manages. vSphere and vSAN hosts require a version of vCenter that is equal to or newer than the hosts managed. Updating the hosts without updating the vCenter server may lead to unexpected operational behaviors. Running the latest version of vCenter also allows environments running multiple vSAN clusters to phase in the latest version per cluster, in a manner and time frame that works best for the organization.

Host updates may take a while to complete the full host restart and update process. Using the DCUI during host restarts can help provide better visibility during this stage. See the section “Restarting a Host in Maintenance Mode” for more details.

Summary

A vSAN cluster, not the individual hosts, should be viewed as the unit of management. In fact, this is how the new vLCM treats the upgrade process. During normal operations, all vSAN hosts that compose a cluster should have matching hypervisor, driver, and firmware versions. Host upgrades and patches should be performed per cluster to provide consistency of hypervisor, firmware, and driver versions across the hosts in the cluster.

Upgrade considerations when using HCI Mesh

In versions prior to vSAN 7 U1, updating hosts in a vSAN cluster typically had an impact domain of just the VMs powered by the given cluster. In vSAN 7 U1, HCI Mesh was introduced, which allows a VM using compute resources on one vSAN cluster to use the storage capacity resources of another cluster. The Skyline Health Check includes several checks to ensure that a cluster satisfies all prerequisites for HCI Mesh. One of the relevant checks is referred to as "vSAN format version supports remote vSAN."

Recommendation: While not required, it may be best to factor in clusters using HCI Mesh and coordinate the updates of clusters with an HCI Mesh relationship. Determining the timing and order of these cluster upgrades may improve the update experience.

Multi-Cluster Upgrading Strategies

While VMware continues to introduce new levels of performance, capability, robustness, and ease of use to vSAN, the respective vSAN clusters must be updated to benefit from these improvements. While the upgrading process continues to be streamlined, environments running multiple vSAN clusters can benefit from specific practices that will deliver a more efficient upgrade experience.

vCenter server compatibility

In a multi-cluster environment, the vCenter server must be running a version equal to or greater than the version to be installed on any of the hosts in the clusters it manages. Ensuring that the vCenter server is always running the very latest edition will guarantee compatibility among all potential host versions running in a multi-cluster arrangement, and introduce enhancements to vCenter that are independent of the clusters it is managing.

Recommendation: Periodically check that vCenter is running the latest edition. The vCenter Server Appliance Management Interface (VAMI) can be accessed using https://vCenterFQDN:5480.

Phasing in new versions of vSAN

As noted in the “Upgrading and Patching vSAN Hosts” section, vSAN is a cluster-based solution. Therefore, upgrades should be approached per cluster, not per host. With multi-cluster environments, IT teams can phase in a new version of vSAN per cluster to meet any of their own vetting, documentation, and change control practices. Similar to common practices in application maintenance, upgrades can be phased in on less critical clusters for testing and validation prior to rolling out the upgrade into more critical clusters.

FIGURE 12-5: Phasing in new versions of vSAN per cluster

Cluster update procedures are not just limited to hypervisor upgrades, but should also include firmware and drivers for NICs, storage controllers, and BIOS versions. See “Upgrading Firmware and Drivers for NICs and Storage Controllers” and “Using VUM to Update Firmware on Selected Storage Controllers” for more detail.

Recommendation: Update to the very latest version available. If a cluster is several versions behind, there is no need to update the versions one at a time. The latest edition has more testing and typically brings a greater level of intelligence to the product and the conditions it runs in.

Parallel upgrades

While vSAN limits the upgrade process to one host at a time within a vSAN cluster, cluster upgrades can be performed concurrently if desired. In fact, as of vSAN 7 U1, vLCM supports up to 64 concurrent cluster update activities.  This can speed up host updates across larger data centers. Whether to update one cluster or multiple clusters at a time is at your discretion based on understanding tradeoffs and your procedural limitations.

Updating more hosts simultaneously should be factored into the vSAN cluster sizing strategy. More clusters with fewer hosts allows for more parallel remediation than fewer clusters with more hosts. For example, an environment with 280 hosts could cut remediation time in half if the design was 20 clusters of 14 hosts each, as opposed to 10 clusters of 28 hosts each.

Since a vSAN cluster is its own discrete storage system, administrators may find greater agility in operations and troubleshooting. “vSAN Cluster Design—Large Clusters Versus Small Clusters” discusses the decision process of host counts and cluster sizing in great detail.

Larger environments with multiple vSAN clusters may have different generations of hardware. Since drivers and firmware can cause issues during an update process, concurrent cluster upgrades may introduce operational challenges to those managing and troubleshooting updates. Depending on the age and type of hardware, a new version of vSAN could be deployed as a pilot effort to a few clusters individually, then could be introduced to a larger number of clusters simultaneously. Determine what level of simultaneous updates is considered acceptable for your own organization.

Recommendation: Focus on efficient delivery of services during cluster updates, as opposed to speed of update. vSAN restricts remediation to one host at a time within a cluster. A well-designed and well-operated cluster will seamlessly roll through updating all hosts in the cluster without interfering with expected service levels.

Why are vSAN clusters restricted to updating one host at a time? Limiting updates to a single host per cluster helps reduce the complexity of removing not only compute resources but also storage capacity and performance from the cluster. Factoring in available storage capacity in addition to compute resources is unique to an HCI architecture. Total available host count can also be important for some data placement policies, such as FTT=3 using mirroring or FTT=2 using RAID-6 erasure coding. Limiting the update process to one host at a time per cluster helps manage this complexity while reducing the potential need for data movement due to resynchronization.

Summary

For data centers, the availability of services generally takes precedence over everything else. Environments consisting of multiple vSAN clusters can take advantage of this unique, modular topology by phasing in upgrades per cluster to the hypervisor, as well as any dependent hardware updates, including storage controllers, NICs, and BIOS versions.

Upgrading Large vSAN Clusters

Standard vSAN clusters can range from 3 to 64 hosts. Since vSAN provides storage services per cluster, a large cluster is treated in the same way as a small cluster: as a single unit of services and management. Maintenance should occur per cluster, which is why a cluster is sometimes referred to as a “maintenance domain.”

Upgrading vSAN clusters with a larger quantity of hosts is no different than upgrading vSAN clusters with a smaller quantity of hosts. In addition to the considerations described in “Upgrading and Patching vSAN Hosts,” there are a few additional host upgrade considerations to be mindful of during these update procedures.

FIGURE 12-6: Visualizing the “maintenance domain” of a single large vSAN cluster

vLCM and VUM are limited to updating one host at a time in a vSAN cluster. The length of time for the cluster to complete an update is proportional to the number of hosts in a cluster. To upgrade more than one host at a time, reduce the size of the maintenance domain by creating more clusters comprising fewer hosts. This smaller maintenance domain will allow for more hosts (one per cluster) to perform parallel upgrades.

Designing an environment that has a modest maintenance domain is one of the most effective ways to improve operations and maintenance of a vSAN-powered environment. For more information on this approach, see the topic “Multi-Cluster Upgrading Strategies.”

While no more than one host per vSAN cluster can be upgraded at a time, there are some steps that can be taken to potentially improve the upgrade speed.

  • Use hosts that support the Quick Boot feature. This can help reduce host restart times. Since hosts in a vSAN cluster are updated one after the other, reducing host restart times can significantly improve the completion time for larger clusters.
  • Update to vSAN 7 U1.  The host restart times in vSAN 7 U1 have improved dramatically over previous versions.
  • If a large cluster has relatively few resources used, an administrator may be able to place multiple hosts into maintenance mode safely without running short of compute or storage capacity resources. Updates will still occur one host at a time, but this may save some of the time spent placing the respective hosts into maintenance mode. This would only be possible in large clusters that are underused, and the actual time savings may be negligible. A sketch of entering maintenance mode from PowerCLI follows this list.
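
Below is a minimal, hedged PowerCLI sketch of placing a host into maintenance mode with the “Ensure accessibility” vSAN data migration option; the host name is a placeholder, and the -Evacuate switch assumes DRS is available to migrate running VMs.

# Enter maintenance mode using the "Ensure accessibility" vSAN data migration option
Set-VMHost -VMHost (Get-VMHost -Name "esx01.example.com") -State Maintenance -VsanDataMigrationMode EnsureAccessibility -Evacuate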

Recommendation: Focus on efficient delivery of services during cluster updates, as opposed to speed of update. vSAN restricts remediation to one host at a time within a cluster. A well-designed and well-operated cluster will seamlessly roll through updating all hosts in the cluster without interfering with expected service levels.

Larger vSAN clusters may better absorb reduced resources as a host enters maintenance mode for the update process. Proportionally, each host contributes a smaller percentage of resources to a cluster. Large clusters may also see slightly less data movement than much smaller clusters to comply with the “Ensure accessibility” data migration option when a host is placed into maintenance mode. For more information on the tradeoffs between larger and smaller vSAN clusters, see “vSAN Cluster Design—Large Clusters Versus Small Clusters” on core.vmware.com.

Summary

Upgrading a vSAN cluster with a larger quantity of hosts is no different than updating a vSAN cluster with a smaller quantity of hosts. Considering that the update process restricts updates to one host at a time within a cluster, an organization may want to revisit its current practices for cluster sizing, and how the hosts themselves can be optimized by using features such as Quick Boot while also running vSAN 7 U1 or newer.

Upgrading Firmware and Drivers for NICs and Storage Controllers

Outdated or mismatched firmware and drivers for NICs and storage controllers can impact VM and/or vSAN I/O handling. While VUM handles updates of firmware and drivers for a limited set of devices, firmware and driver updates remain a largely manual process.  Whether installed directly on an ESXi host from the command line or deployed using VUM, ensure the correct firmware and drivers are installed, remain current to the recommended version, and are part of the cluster lifecycle management process.

vLCM strives to simplify the coordination of firmware and driver updates for select hardware.  It has a framework to coordinate the fetching of this software from the respective vendors for the purpose of building a single desired-state image for the hosts.  As of vSAN 7 U1, three vendors provide the Hardware Support Manager (HSM) plugin that helps coordinate this activity: Dell, HPE, and Lenovo.  VUM was unable to provide this type of coordination.

For environments still running VUM, it is recommended to verify that vSAN supports the software and hardware components, drivers, firmware, and storage I/O controllers that you plan on using. Supported items are listed on the VMware Compatibility Guide.
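
For a scripted check, the vSAN cmdlets in recent PowerCLI releases can refresh the HCL database and run the cluster health checks, which include controller, driver, and firmware validation. The sketch below is a hedged example; the cluster name is a placeholder.

# Refresh the locally cached vSAN HCL database (requires internet access, or supply an offline copy)
Update-VsanHclDatabase
# Run the vSAN health checks, including hardware compatibility validation
Test-VsanClusterHealth -Cluster (Get-Cluster -Name "Clustername")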

Summary

When it comes to vSAN firmware and drivers, consistency and supportability are critical. Always reference the vSAN Compatibility Guide for guidance on specific devices. Also, whether updating via command line or using VUM, be sure to maintain version consistency across the cluster.  vLCM will make this operational procedure easier for eligible servers that support this new framework.

Update Firmware on Selected Storage Controllers

For versions of vSAN prior to vSAN 7, VUM was the primary delivery method for updating vSphere and vSAN clusters. VUM centralizes the updating process, and vSAN’s integration with VUM allows updates to be coordinated with the HCL and the vSphere Release Catalog so that it only applies the latest version of vSAN that is compatible with the hardware.

FIGURE 12-7: Using VUM to update firmware on selected storage controllers prior to vSAN 7.

VUM handled updates of firmware and drivers for a limited set of devices: “I/O Controllers that vSAN Supports Firmware Updating.” Note that this is no longer supported on vSphere 7 as the firmware tools are no longer supported by the provider.

Note that vSphere 7 introduced the new vSphere Lifecycle Manager (vLCM), which will be the eventual replacement for VUM. For vSphere/vSAN 7-based environments, both lifecycle management methods are available, but one method must be chosen on a per-cluster basis. More operational guidance on using vLCM in a production environment will be provided at a later date.

Recommendation: Once updated to vSAN 7 U1, consider using vLCM for a more comprehensive lifecycle management mechanism.

Summary

As the method of updating hosts moves to a more sophisticated solution (vLCM), legacy methods and tools may no longer be a supported workflow.

Section 13: vSAN Capacity Management

Observing Storage Capacity Consumption Over Time

An increase in capacity consumption is a typical trend that most data centers see, regardless of the underlying storage system used. vSAN offers a number of different ways to observe changes in capacity. This can help with understanding the day-to-day behavior of the storage system, and can also help with capacity forecasting and planning.

Options for visibility

Observing capacity changes over a period of time can be achieved through two tools: vCenter and vRealize Operations.

FIGURE 13-1: Displaying capacity history for a vSAN cluster in vCenter and in vRealize Operations (vR Ops)

Both provide the ability to see capacity usage statistics over time and to zoom into a specific time window. The two methods were designed with slightly different intentions and have different characteristics.

Capacity history in vCenter:

  • Natively built into the vCenter UI and easily accessible.
  • Can show a maximum of a 30-day window.
  • Data in the performance service is retained for up to 90 days, though this retention period is not guaranteed.
  • Data will not persist if the vSAN performance service is turned off and then back on.

Capacity history in vRealize Operations

  • Much longer capacity history retention periods, per configuration of vRealize Operations.
  • While vRealize Operations requires the vSAN performance service to run for data collection, the capacity history will persist if the vSAN performance service is turned off then back on.
  • Able to correlate with other relevant cluster capacity metrics, such as CPU and memory capacity.
  • Can view aggregate vSAN cluster capacity statistics.
  • Breakdowns of capacity usage with and without DD&C.
  • Requires vRealize Operations Advanced licensing or above.

The vSAN capacity history in vCenter renders the DD&C ratio using a slightly different unit of measurement than the vSAN capacity summary in vCenter and vRealize Operations. The capacity summary in vCenter and vRealize Operations displays the savings as a ratio (e.g., 1.96x), whereas the vSAN capacity history renders it as a percentage (e.g., 196%). Both are accurate.

Also note that vCenter’s UI simply states, “Deduplication Ratio.” The number presented actually represents the combined savings from DD&C.

Recommendation: Look at the overall capacity consumed after a storage policy change, rather than simply a DD&C ratio. Space efficiency techniques like erasure codes may result in a lower DD&C ratio, but actually increase the available free space. For more information on this topic, see “Analyzing Capacity Utilization with vRealize Operations” in the operations guidance for vRealize Operations and Log Insight in vSAN Environments on core.vmware.com.
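
For a quick point-in-time view of overall consumption to complement the history views described above, the following hedged PowerCLI sketch can be used; the cluster name is a placeholder.

# Point-in-time capacity summary for a vSAN cluster
Get-VsanSpaceUsage -Cluster (Get-Cluster -Name "Clustername")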

Summary

Capacity usage and history can be easily found in both vCenter and vRealize Operations. An administrator can use one or both tools to gain the necessary insight for day-to-day operations, as well as capacity planning and forecasting.

Observing Capacity Changes as a Result of Storage Policy Adjustments or EMM Activities

Some storage policy definitions will affect the amount of storage capacity consumed by the objects (VMs, VMDKs, etc.) that are assigned the policy. Let’s explore why this happens, and how to understand how storage capacity has changed due to a change of an existing policy, or assignment of VMs to a new policy.

Policies and their impact on consumed capacity

vSAN is unique when compared to other traditional storage systems: it allows configuring levels of resilience (e.g., FTT) and the data placement scheme (RAID-1 mirroring or RAID-5/6 erasure coding) used for space efficiency. These configurations are defined in a storage policy and assigned to a group of VMs, a single VM, or even a single VMDK.

FIGURE 13-2: Understanding how a change in storage policy will affect storage capacity

Changes in capacity as a result of storage policy adjustments can be temporary or permanent.

  • Temporary space is consumed when a policy changes from one data placement approach to another. vSAN builds a new copy of the data (a process known as resynchronization) to replace the old copy and comply with the newly assigned policy. (For example, changing a VM from a RAID-1 mirror to a RAID-5 erasure code results in space being used to create a new copy of the data using the RAID-5 scheme.) Once complete, the copy of the data under the RAID-1 mirror is deleted, reclaiming the temporary space used for the change. See the topic “Storage Policy Practices to Improve Resynchronization Management in vSAN” for a complete list of storage policy changes that impact data placement.
  • Permanent space is consumed when applying a storage policy using a higher FTT level (e.g., FTT=1 to FTT=2), or when moving from an erasure code to a mirror (e.g., RAID-5 to RAID-1). The additional capacity used takes effect after the change in policy has completed (using temporary storage capacity during the transition) and remains for as long as the object is assigned to the given storage policy. The amount of temporary and permanent space consumed by a storage policy change reflects how many objects are changed at the same time and the capacity used by those objects. The temporary space needed is the result of resynchronizations. See the topic “Storage Policy Practices to Improve Resynchronization Management in vSAN” for more information.

Due to the prescriptive nature of storage policies, vSAN presents the raw capacity provided by the datastore, as observed in vCenter, vRealize Operations, and PowerCLI.
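
Before and after adjusting a policy, it can be helpful to confirm which storage policy is assigned at the VM and VMDK level. The following is a hedged PowerCLI sketch; the VM name is a placeholder.

# List the available storage policies
Get-SpbmStoragePolicy | Select-Object Name, Description
# Review the policy currently assigned to a VM and to each of its virtual disks
Get-VM -Name "VMName" | Get-SpbmEntityConfiguration
Get-HardDisk -VM (Get-VM -Name "VMName") | Get-SpbmEntityConfiguration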

Estimating usage

The vSAN performance service provides an easy-to-use tool to help estimate free usable capacity for a selected policy. Simply select the desired storage policy, and the tool will estimate the amount of free usable capacity under that policy. Note that it does not account for the free space needed for slack space as recommended by VMware.

FIGURE 13-3: The free capacity with policy calculator in the vSAN UI found in vCenter

Observing changed usage as a result of storage policy changes

There are multiple options for providing visibility into storage capacity changes. See the topic “Observing Storage Capacity Consumption Over Time” for more information. The following illustrates how observing capacity changed via storage policy changes is achieved by using vCenter and vRealize Operations.

In this example, a group of VMs using a storage policy with FTT=1 via a RAID-1 mirror was changed to another storage policy with FTT=1 via a RAID-5 erasure coding scheme. In vCenter, highlighting a vSAN cluster and selecting Monitor → vSAN → Performance → Backend will reveal the resynchronization activity that occurred as a result of the policy change, as shown below.

FIGURE 13-4: Observing resynchronization I/O activity as a result of a change in storage policies

When looking at the capacity history in vCenter, the policy change created a temporary use of more space to build the new RAID-5 based objects. Once the resynchronization is complete, the old object data is removed. DD&C begins to take effect, and free capacity is reclaimed. FIGURE 13-5 below shows how this is presented in vCenter.

FIGURE 13-5: Using vCenter to observe cluster capacity use as a result of a resynchronization event

The Cluster Utilization widget in the vSAN capacity overview dashboard found in vRealize Operations shows the same results. vRealize Operations will offer additional details via context sensitive “sparklines” that will give precise breakdowns of DD&C savings and storage use with and without DD&C. FIGURE 13-6 below shows how this is presented in vRealize Operations.

FIGURE 13-6: Using vRealize Operations to observe cluster capacity use as a result of a resynchronization event

Note that different views may express the same data differently for three reasons:

  • Limits on the window presented on the X axis
  • Different values on the Y axis
  • Different scaling for X and Y values

This is the reason why the same data may visually look different, even though the metrics are consistent across the various applications and interfaces.

Recommendation: Look at the overall capacity consumed after a storage policy change, rather than simply a DD&C ratio. Space efficiency techniques like erasure codes may result in a lower DD&C ratio, but may actually improve space efficiency by reducing consumed space. 

Summary

Storage policies allow an administrator to establish various levels of protection and space efficiency across a selection of VMs, a single VM, or even a single VMDK. Assigning different storage policies to objects impacts the amount of effective space they consume across a vSAN datastore. Both vCenter and vRealize Operations provide methods to help the administrator better understand storage capacity consumption across the vSAN cluster.

Estimating Approximate “Effective” Free/Usable Space in vSAN Cluster

With the ability to prescriptively assign levels of protection and space efficiency through storage policies, the amount of capacity a given VM consumes in a vSAN cluster is subject to the attributes of the assigned policy. While this offers an impressive level of specificity for a VM, it can make estimating the free or usable capacity of the cluster more challenging. Recent editions of vSAN offer a built-in tool to assist with this effort.

The vSAN performance service provides an easy-to-use tool to help estimate available free usable capacity given the selection of a desired policy. Simply select the desired storage policy, and it will estimate the free amount of usable capacity with that given policy.

FIGURE 13-7: The free capacity with policy calculator in the vSAN UI found in vCenter

The tool provides a calculation based only on the free raw capacity remaining. Capacity already consumed is not accounted for in this estimate. The estimator looks at the raw capacity remaining, and then applies the traits of the selected policy to determine the effective amount of free space available. Note that it does not account for the free space needed for slack space as recommended by VMware.
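
As a hedged illustration of the math the estimator performs: with 30 TB of raw free capacity, an FTT=1 policy using RAID-1 mirroring would yield roughly 15 TB of effective free space (2x capacity overhead), while an FTT=1 policy using RAID-5 erasure coding would yield roughly 22.5 TB (1.33x overhead), before accounting for the recommended slack space.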

For more information, see the topic “Observing Capacity Changes as a Result of Storage Policy Adjustments.”

Recommendation: If you are trying to estimate the free usable space for a cluster knowing that multiple policies will be used, select the policy used that is the least space efficient. For example, if an environment will run a mix of FTT=1 protected VMs, but some use policies with RAID-1, while others use policies with the more space-efficient RAID-5, select the RAID-1 policy in the estimator to provide a more conservative number.

Summary

Due to the use of storage policies and the architecture of vSAN, understanding free usable capacity is different than a traditional architecture. The estimator provided in the vSAN capacity page helps provide clarity on the effective amount of capacity available under a given storage policy.

Section 14: Monitoring vSAN Health

Remediating vSAN Health Alerts

The vSAN health check UI provides an end-to-end approach to monitoring and managing the environment. Health check alerts indicate an unmet condition or a deviation from expected behavior.

The alerts typically stem from:

  • Configuration inconsistency
  • Exceeding software/hardware limits
  • Hardware incompatibility
  • Failure conditions

The ideal methodology to resolve a health check is to correct the underlying situation. An administrator can choose to suppress the alert in certain situations.

For instance, the build recommendation engine health check validates whether the build versions are the latest for the given hardware (per the VMware Compatibility Guide). Some environments are designed to stay one release behind the latest as a standard; the alert can be suppressed in this case. In general, you should determine the root cause and fix the issue for all transient conditions. Health check alerts that flag anomalies for intended conditions can be suppressed.

Each health check has two main sections:

  • Current state: Result of the health check validation against the current state of the environment
  • Info: Information about the health check and what it validates

FIGURE 14-1: The details available within a health check alert

The “Info” section explains the unmet condition and the ideal state. Clicking the “Ask VMware” button opens a Knowledge Base article that describes the specific health check in greater detail, along with the probable cause, troubleshooting, and remediation steps.

Recommendation: Focus remediation efforts on addressing the root cause. Ensure sustained network connectivity for up-to-date health checks.

Summary

vSAN health helps ensure optimal configuration and operation of your HCI environment to provide the highest levels of availability and performance.

Checking Object Status and Health When There Is a Failure in a Cluster

An object is a fundamental unit in vSAN around which availability and performance are defined. This is done by abstracting the storage services and features of vSAN and applying them at an object level through SPBM.

At a high level, an object’s compliance with the assigned storage policy is enough to validate its health. In certain scenarios, it may be necessary to inspect the specific state of the object, such as in a failure.

In the event of a failure, ensure all objects are in a healthy state or recovering to a healthy state. The vSAN object health check provides a cluster-wide overview of object health and the respective object states. This health check can be accessed by clicking on the vSAN cluster and viewing the Monitor tab. The data section contains information specific to the object health check.

FIGURE 14-2: Viewing object health with the vSAN health checks

On failure detection, vSAN natively initiates corrective action to restore a healthy state. This, in turn, reinstates the object’s compliance with the assigned policy. The health check helps quickly assess the impact and validates that restoration is in progress. In certain cases, based on the nature of failure and the estimated restoration time, an administrator may choose to override or expedite the restoration. More information is available on failure handling in vSAN.

Recommendation: SPBM governs how, where, and when an object is to be rebuilt. It is generally not required or recommended to override this unless warranted.

Summary

It is not uncommon for components such as disks, network cards, or server hardware to fail. vSAN has a robust and highly resilient architecture to tolerate such failures by distributing objects across the cluster. The vSAN object health check helps validate the state of objects and confirm that restoration is progressing as expected.

Viewing vSAN Cluster Partitions in the Health Service UI

vSAN inherently employs a highly resilient and distributed architecture. The network plays an important role in accommodating this distributed architecture.

Each host in the vSAN cluster is configured with a VMkernel port tagged for vSAN traffic and should be able to communicate with the other hosts in the cluster. If one or more hosts are isolated or not reachable over the network, objects in the cluster may become inaccessible. To restore access, resolve the underlying network issue.
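
To quickly confirm which VMkernel adapters are tagged for vSAN traffic across a cluster, a hedged PowerCLI sketch is shown below; the cluster name is a placeholder, and the property names may vary slightly between PowerCLI versions.

# List VMkernel adapters with vSAN traffic enabled for each host in the cluster
Get-Cluster -Name "Clustername" | Get-VMHost | Get-VMHostNetworkAdapter -VMKernel |
    Where-Object { $_.VsanTrafficEnabled } |
    Select-Object VMHost, Name, IP, SubnetMask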

There are multiple network-related validations embedded in the health service to detect and notify when there is an anomaly. These alerts should be treated with the highest priority, particularly the vSAN cluster partition check. The health service UI can provide key diagnostic information to help ascertain the cause.

Accessing the health service UI

The vSAN Skyline Health service UI provides a snapshot of the health of the vSAN cluster and highlights areas needing attention. Each health check validates whether a certain condition is met, and provides guidance on remediation when there is a deviation from expected behavior. The UI can be accessed by clicking on the vSAN cluster and viewing the Monitor tab. The specific “vSAN cluster partition” health check is a good starting point to determine the cluster state. A partition ID represents the cluster as a single unit; in an ideal state, all hosts reflect the same partition ID. Multiple subgroups within the cluster indicate a network partition requiring further investigation. At a micro level, this typically translates to an object not having access to all of its components.

FIGURE 14-3: Identifying unhealthy network partitions in a vSAN cluster

The network section in the health service UI has a number of network tests covering basic yet critical diagnostics, such as ping, MTU check, unicast connectivity, and host connectivity with vCenter. Each health check can systematically confirm or eliminate a layer in the network as the cause.

Recommendation: As with any network troubleshooting, a layered methodology is strongly recommended (top-down or bottom-up).

Summary

With vSAN-backed HCI, data availability is directly dependent on the network, unlike traditional storage. Built-in network-related health checks aid in early detection and diagnosis of network-related issues.

Verifying CEIP Activation

Customer Experience Improvement Program (CEIP) is a phone-home system that collects telemetry data and ships it to VMware’s Analytics Cloud (VAC) at regular intervals. The feature is enabled by default in recent releases. The benefits of joining CEIP are described here: “vSAN Support Insight.” There are a few validation steps required to ensure that the telemetry data is shipped and available in VAC.

The verification process is twofold:

  • Ensuring the feature is enabled in vCenter in the environment
  • Ensuring the external network allows communication from vCenter to VAC

The first step is fairly straightforward and within the purview of a vSphere admin, who can log in and check from vCenter. The second step has a dependency on the external network and security setup.

Validation in vCenter

The HTML5 client has an improved categorization of tabs in the UI. CEIP status is found under Monitor → vSAN → Support. The following screenshot shows the status when CEIP is enabled.

FIGURE 14-4: Checking the status of CEIP

Alternatively, this can also be verified by navigating to Menu → Deployment → Customer Experience Improvement Program.

External network and security validation

For CEIP to function as designed, the vCenter server needs to be able to reach VMware portals: vcsa.vmware.com and vmware.com. The network, proxy server (if applicable), and firewall should allow outbound traffic from vCenter to the portals above. The network validation is made easy with an embedded health check, “Online health connectivity,” which validates internet connectivity from vCenter to the VMware portal. Alternatively, this can also be verified manually through a secure shell from vCenter.

Sample command and output (truncated for readability):

vcsa671 [ ~ ]$ curl -v https://vcsa.vmware.com:443
* Rebuilt URL to: https://vcsa.vmware.com:443
* Connected to vcsa.vmware.com (10.113.62.242) port 443 (#0)

Recommendation: Ensure CEIP is enabled to benefit from early detection of issues, alignment with best practices, and faster resolution times.

Summary

CEIP aids in relaying critical information between the environment and the VMware Analytics Cloud that can help improve the product experience. It is enabled by default, and an embedded health check can be used to periodically monitor connectivity between vCenter and the VMware portals.

Section 15: Monitoring vSAN Performance

Navigating Across the Different Levels of Performance Metrics

The vSAN performance service provides storage-centric visibility into a vSAN cluster. It is responsible for collecting vSAN performance metrics and presenting them in vCenter. A user can set the selectable time window from 1 to 24 hours, and the data presented uses a 5-minute sampling rate. The data may be retained for up to 90 days, although the actual retention period may be shorter based on environmental conditions.
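
For scripted collection of this data, the Get-VsanStat cmdlet in PowerCLI can pull metrics for a chosen entity and time window. The sketch below is hedged: the metric name shown is an assumed example, and valid metric names vary by entity type, so verify them against the Get-VsanStat documentation or the “PowerCLI Cookbook for vSAN.”

# Pull the last hour of cluster-level read IOPS; the metric name is illustrative
$cluster = Get-Cluster -Name "Clustername"
Get-VsanStat -Entity $cluster -Name "Performance.ReadIops" -StartTime (Get-Date).AddHours(-1) -EndTime (Get-Date)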

Levels of navigation

The vSAN performance service presents metrics at multiple locations in the stack. As shown in FIGURE 15-1, vSAN-related data can be viewed at the VM level, the host level, the disk and disk group level, and the cluster level. Some metrics such as IOPS, throughput, and latency are common at all locations in the stack, while more specific metrics may only exist at a specific location, such as a host. The performance metrics can be viewed at each location simply by highlighting the entity (VM, host, or cluster) and clicking on
Monitor → vSAN → Performance.

FIGURE 15-1: Collects and renders performance data at multiple levels

The metrics are typically broken up into a series of categories, or tabs at each level. Below is a summary of the tabs that can be found at each level.

  • VM Level
    • VM: This tab presents metrics for the frontend VM traffic (I/Os to and from the VM) for the selected VM.
    • Virtual disk: This tab presents metrics for the VM broken down by individual VMDK, which is especially helpful for VMs with multiple VMDKs.
  • Host Level
    • VM: This tab presents metrics for the frontend VM traffic (I/Os to and from the VM) for all VMs on the selected host.
    • Backend: This tab presents metrics for all backend traffic, as a result of replica traffic and resynchronization data.
    • Disks: This tab presents performance metrics for the selected disk group, or the individual devices that compose the disk group(s) on a host.
    • Physical adapters: This tab presents metrics for the physical uplink for the selected host.
    • Host network: This tab presents metrics for the specific or aggregate VMkernel ports used on a host.
    • iSCSI: This tab presents metrics for objects containing data served up by the vSAN iSCSI service.
  • Cluster Level
    • VM: This tab presents metrics for the frontend VM traffic (I/Os to and from the VMs) for all VMs in the selected cluster.
    • Backend: This tab presents metrics for all backend traffic as a result of replica traffic and resynchronization data.
    • iSCSI: This tab presents metrics for objects containing data served up by the vSAN iSCSI service.

Typically, the cluster level is an aggregate of a limited set of metrics, and the VM level is a subset of metrics that pertain to only the selected VM. The host level is the location at which there will be the most metrics, especially as it pertains to the troubleshooting process. A visual mapping of each category can be found in FIGURE 15-2.

FIGURE 15-2: Provides vSAN-specific metrics and other vSphere-/ESXi-related metrics

Note that the performance service can only aggregate performance data up to the cluster level. It will not be able to provide aggregate statistics from multiple vSAN clusters; vRealize Operations can achieve that result.

Which metrics are most important? They all relate to each other in some form or another. The conditions of the environment and the root cause of a performance issue will dictate which metrics are more significant than others. For more general information on troubleshooting vSAN, see the topic “Troubleshooting vSAN Performance” in this document. For a more detailed understanding of troubleshooting performance, as well as definitions of specific metrics found in the vSAN performance service, see “Troubleshooting vSAN Performance” on core.vmware.com.

Recommendation: If you need longer retention of storage performance data, use vRealize Operations. The performance data collected by the performance service does not persist after the service has been turned off and back on. vRealize Operations fetches performance data directly from the vSAN performance service, so the data will be consistent, and it will remain intact in vRealize Operations even if the performance service needs to be disabled and re-enabled.

Summary

The vSAN performance service is an extremely powerful feature that, in an HCI architecture, takes the place of the storage array metrics typically found in a three-tier architecture. Since vSAN is integrated directly into the hypervisor, the performance service offers metrics at multiple levels in the stack and can provide outstanding levels of visibility for troubleshooting and further analysis.

Troubleshooting vSAN Performance

Troubleshooting performance issues is a common challenge for many administrators, regardless of the underlying infrastructure and topology. A distributed storage platform like vSAN also introduces other elements that can influence performance, and the practices for troubleshooting should accommodate those. Use the metrics in the vSAN performance service to isolate the sources of the performance issue.

The performance troubleshooting workflow

The basic framework for troubleshooting performance in a vSAN environment is outlined in FIGURE 15-3. Each of the five steps is critical to identifying the root cause properly and mitigating it systematically.

FIGURE 15-3: The troubleshooting framework

“Troubleshooting vSAN Performance” on core.vmware.com provides a more complete treatment of the performance troubleshooting process.

The order of review for metrics

Once steps 1–3 have been completed, begin using the performance metrics. The order in which the metrics are viewed can help decipher what level of contention may be occurring. FIGURE 15-4 shows the order in which to better understand and isolate the issue; it is the same order used in “Appendix C: Troubleshooting Example” in “Troubleshooting vSAN Performance.”

FIGURE 15-4: Viewing order of performance metrics

Here is a bit more context to each step:

  • View metrics at the VM level to confirm unusually high storage-related latency. Verify that there is in fact storage latency as seen by the guest VM.
  • View metrics at the cluster level to provide context and look for other anomalies. This helps identify potential “noise” coming from somewhere else in the cluster.
  • View metrics on the host to isolate the type of storage I/O associated with the latency.
  • View metrics on the host, looking at the disk group level to determine type and source of latency.
  • View metrics on the host, looking at the host network and VMkernel metrics to determine if the issue is network related.

Steps 3–5 assume that one has identified the hosts where the VM’s objects reside. Host-level metrics should look at only the hosts where the objects reside for the particular VM in question. For further information on the different levels of performance metrics in vSAN, see the topic “Navigating Across the Different Levels of Performance Metrics.”

Viewing metrics at the disk group level can provide some of the most significant insight of all the metrics. However, they shouldn’t be viewed in complete isolation, as there will be influencing factors that affect these metrics.

Recommendation: Be diligent and deliberate when changing your environment to improve performance. Changing multiple settings at once, overlooking a simple configuration issue, or not measuring the changes in performance can often make the situation worse, and more complex to resolve.

Summary

While tracking down the primary contributors to performance issues can be complex, there are practices to help simplify this process and improve the time to resolution. This information, paired with the “Troubleshooting vSAN Performance” guide on core.vmware.com, is a great start to better understanding how to diagnose and address performance issues in your own vSAN environment.

Monitoring Resynchronization Activity

Resynchronizations are a common activity that occur in a vSAN environment. They are simply the process of replicating the data across the vSAN cluster so it adheres to the conditions of the assigned storage policy that determines levels of resilience, space efficiency, and performance. Resynchronizations occur automatically and are the result of policy changes to an object, host or disk group evacuations, rebalancing of data across a cluster, and object repairs should vSAN detect a failure condition.

Methods of visibility

Resynchronization visibility is available in multiple ways: through vCenter, vRealize Operations, and PowerCLI. The best method depends on what you are attempting to view and your familiarity with the tools available.

Viewing resynchronizations in vCenter

Resynchronization activity can be found in vCenter in two different ways:

  • At the cluster level as an enumerated list of objects currently being resynchronized
  • At the host level as time-based resynchronization metrics for IOPS, throughput, and latency

Find the list of objects resynchronizing in the cluster by highlighting the cluster and clicking on Monitor → vSAN → Resyncing Objects, as shown in FIGURE 15-5.

FIGURE 15-5: Viewing the status of resynchronization activity at the cluster level

Find time-based resynchronization metrics by highlighting the desired host and clicking on Monitor → vSAN → Performance → Backend, as shown in FIGURE 15-6.

FIGURE 15-6: A breakdown of resynchronization types found in the host-level view of the vSAN performance metrics

Recommendation: Discrete I/O types can be “unticked” in these time-based graphs. This can provide additional clarity when deciphering the type of I/O activity occurring at a host level.

Viewing resynchronizations in vRealize Operations

vRealize Operations 7.0 and later offer new levels of visibility for resynchronizations in a vSAN cluster. It can be used to augment the information found in vCenter, as the resynchronization intelligence found in vRealize Operations is not readily available within the vSAN performance metrics found in vCenter.

vRealize Operations can provide an easy-to-read resynchronization status indicator for all vSAN clusters managed by the vCenter server. FIGURE 15-7 displays an enumerated list of all vSAN clusters managed by the vCenter server, and the current resynchronization status.

FIGURE 15-7: Resynchronization status of multiple vSAN clusters

vRealize Operations provides burn down rates for resynchronization activity over time. Measuring a burn down rate helps provide the context in a way that can be difficult to understand using simple resynchronization throughput statistics. A burn down graph for resynchronization activity provides an understanding of the extent of data queued for resynchronization, how far along the process is, and a trajectory toward completion. Most importantly, it measures this at the cluster level, eliminating the need to gather this data per host to determine the activity across the entire cluster.

vRealize Operations renders resynchronization activity in one of two ways:

  • Total objects left to resynchronize
  • Total bytes left to resynchronize

A good example of this is illustrated in a simple dashboard shown in FIGURE 15-8, where several VMs had their storage policy changed from using RAID-1 mirroring to RAID-5 erasure coding.

FIGURE 15-8: Resynchronization burn down rates for objects, and bytes remaining

When paired, the “objects remaining” and “bytes left” can help us understand the correlation between the number of objects to be resynchronized, and the rate at which the data is being synchronized. Observing rates of completion using these burn down graphs helps better understand how Adaptive Resync in vSAN dynamically manages resynchronization rates during periods of contention with VM traffic. These charts are easily combined with VM latency graphs to see how vSAN helps prioritize different types of traffic under these periods of contention.

Burn down graphs can provide insight when comparing resynchronization activities at other times, or in other clusters. For example, FIGURE 15-9 shows burn down activity over a larger time window. We can see that the amount of activity was very different during the periods that resynchronizations occurred.

FIGURE 15-9: Comparing resynchronization activity—viewing burn down rates across a larger time window

The two events highlighted in FIGURE 15-9 represent a different quantity of VMs that had their policies changed. This is the reason for the overall difference in the amount of data synchronized.

Note that, as of vRealize Operations 7.5, the visibility of resynchronization activity is not a part of any built-in dashboard. But you can easily add this information by adding widgets to a new custom dashboard or an existing dashboard.

Viewing resynchronizations in PowerCLI

Resynchronization information can be gathered at the cluster level using the following PowerCLI command:

Get-VsanResyncingComponent -Cluster (Get-Cluster -Name "Clustername")

Additional information will be shown with the following:

Get-VsanResyncingComponent -Cluster (Get-Cluster -Name "Clustername") | Format-List

See the “PowerCLI Cookbook for vSAN” for more PowerCLI commands and how to expose resynchronization data.

Summary

Resynchronizations ensure that data stored in a vSAN cluster meets all resilience, space efficiency, and performance requirements as prescribed by the assigned storage policy. They are a normal part of any properly functioning vSAN environment and can be easily viewed using multiple methods.

About the Authors

This documentation was a collaboration across the vSAN Technical Marketing team. The guidance provided is the result of extensive product knowledge and interaction with vSAN; discussions with vSAN product, engineering, and support teams; as well as scenarios commonly found in customer environments.
