Performance Recommendations for vSAN ESA

Introduction

The vSAN Express Storage Architecture (ESA), introduced in vSAN 8 and enhanced in vSAN 8 U1, delivers new capabilities that are unachievable in the vSAN Original Storage Architecture (OSA).  As a result, guidance for achieving optimal performance in the ESA differs from the recommendations for the OSA.  The following is a collection of recommendations for those who wish to achieve the highest levels of performance for workloads in a vSAN ESA cluster.  Be sure to revisit this resource as our recommendations for the ESA continue to evolve.

ReadyNode Host Specifications and Sizing

Use the vSAN ReadyNode Sizer, paired with imported usage data, to determine the correct ESA ReadyNode class and final specification.  ReadyNodes are categorized into classes that reflect their performance capabilities: AF-2, AF-4, AF-6 and AF-8 represent increasing levels of potential performance, where a higher number indicates higher potential performance.  Higher ReadyNode classes specify higher network connectivity minimums as the number of storage devices increases.  Since vSAN ESA performance scales nearly linearly as the quantity of storage devices increases, the increase in network throughput simply ensures that the network will not become an overwhelming bottleneck.  See "vSAN ESA ReadyNode Hardware Guidance" for more information before using the vSAN ReadyNode Sizer.

Use higher-performance hardware to achieve faster storage performance.  With vSAN, the performance of a VM is derived from the host hardware and the network used to interconnect the hosts in the vSAN cluster, not the cluster host count.  While increasing the host count of a cluster will increase the aggregate IOPS and bandwidth achieved by the cluster, in most cases it will not improve the discrete performance capabilities observed by the VM.  VM performance will be a function of the host hardware and network connectivity.  See the post "Performance Capabilities in Relation to vSAN Cluster Size" for more information.

Ensure that the “vSAN ESA” button has been selected in the vSAN ReadyNode Sizer.  The calculations for sizing and specifying ReadyNodes for the ESA in the vSAN ReadyNode Sizer are different from those for the OSA.  This reflects vSAN ESA’s ability to drive better performance using fewer hosts, while achieving a lower Total Cost of Ownership.

Know what can and cannot be changed in a ReadyNode certified for use with ESA.  The document "What you Can (and Cannot) Change in a vSAN ESA ReadyNode" describes what can be changed in a vSAN ReadyNode certified for use with ESA.  Note that these ReadyNode specifications are prescriptive about the CPU manufacturer and CPU generation used, just as they are about the storage devices.

Keep in general alignment with hardware recommendations for each ReadyNode profile.  vSAN ReadyNodes are designed with a proportionally balanced set of CPU, memory, network and storage device specifications to provide optimum performance.  If you want to increase one resource type substantially (equal to or greater than the next higher profile) from a vSAN ReadyNode profile, consider looking at the next higher ReadyNode profile available.  This will help keep your server resources proportional and deliver better performance and resource utilization.  Selecting the next higher ReadyNode profile available will also allow the cluster to more easily accommodate more demanding workloads in the future.  This applies to ReadyNodes certified for vSAN HCI, as well as ReadyNodes certified for vSAN Max.

Use recommended host BIOS power management settings.  Hosts may ship with default power settings from the manufacturer that are not optimal for maximum performance.  See "Performance Best Practices for VMware vSphere 8.0" for more information on recommended power management settings to achieve maximum performance.
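
As a quick way to audit this across hosts, here is a minimal pyVmomi sketch, assuming a reachable vCenter Server; all connection details below are illustrative placeholders.  It reports the active ESXi power management policy on each host, making hosts left on a power-saving default easy to spot:

    # Minimal pyVmomi sketch: report each host's active power management policy.
    # All connection details are illustrative placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    context = ssl._create_unverified_context()  # lab use only; verify certificates in production
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="changeme", sslContext=context)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            info = host.config.powerSystemInfo
            if info:
                # shortName "static" generally maps to High Performance, "dynamic" to Balanced
                print(f"{host.name}: {info.currentPolicy.shortName}")
    finally:
        Disconnect(si)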

Networking

Use fast networking.  For more demanding workloads, 100GbE networking allows workloads running on vSAN ESA to exploit the capabilities of the high-performing NVMe devices used by the ESA under maximum load.  The use of 100GbE networking can be even more important in topologies that use vSAN’s disaggregated storage capabilities.  Dell Technologies recently published "100GbE Networking - Harness the Performance of vSAN Express Storage Architecture," which details the significant performance improvements when using the ESA on 100GbE networking versus 25GbE networking.  A more extensive analysis can be found in the Dell VxRail Performance Analysis white paper.  Note that this testing used the ESA in vSAN 8 and does not include the performance improvements found in vSAN 8 U1.

Don't let these minimum requirements, or competing solutions, mislead you.  Even though the networking requirements for the ESA are higher, if you are migrating production workloads from a vSAN OSA cluster to a vSAN ESA cluster, on average you will see fewer CPU and network resources used for those same workloads.  This is because the vSAN ESA uses fewer CPU cycles and fewer network resources to process and store I/O than the vSAN OSA.  The higher requirements exist only because the ESA can deliver near-device-level NVMe performance if the workloads demand it.  A distributed storage solution that uses NVMe devices but does not require high-speed networking is simply unable to exploit the full capabilities of NVMe storage devices.  For more information on bandwidth requirements, see the post “Adaptive Network Traffic Shaping for the vSAN Express Storage Architecture.”

Use vSAN over RDMA if possible.  vSAN ESA supports RDMA over Converged Ethernet version 2 (RoCE v2).  When deployed properly on supporting switches and NICs, vSAN over RDMA can reduce host CPU utilization and improve performance for certain workloads.  Using a more efficient protocol like RoCE v2 can be especially beneficial for the ESA, where the storage stack inside the host has been designed to deliver near-device-level rates of storage performance, shifting the potential bottleneck to the network.  This translates to higher IOPS and improved throughput, while reducing latency and host CPU utilization.  See “vSAN RDMA Support” for more information.

Use higher-bandwidth connections versus bonding.  Bonding multiple uplinks from a host will yield only an incremental increase in performance.  In other words, a single 100GbE uplink will demonstrate much higher performance than multiple bonded 25GbE links.  More information can be found in "Designing vSAN Networks - 2022 Edition - vSAN ESA."

If using ESA in a stretched cluster topology, revisit your inter-site link (ISL) capabilities.  Workloads on a vSAN ESA cluster can potentially drive more traffic than an OSA cluster if the workloads demand it.  Once an ESA stretched cluster is deployed, monitor the ISL to ensure it can meet the workload requirements.  See the post "Using the vSAN ESA in a Stretched Cluster Topology" for more information.

Monitor network connectivity and performance.  Packet loss and retransmits can have a profound effect on performance.  vSAN now has advanced, network-related performance metrics to help determine issues in network connectivity.  These can be found in vCenter Server by highlighting a vSAN host, clicking Monitor > vSAN > Performance, and then selecting Physical Adapters.  Counters such as packet loss, CRC errors, flow control, and buffer overflow rates can all be tracked and configured for alerting.  The built-in "Network Performance Test" (found by highlighting the cluster and clicking Monitor > vSAN > Proactive Tests), as well as HCIBench, can be a good way to verify the consistency of each host's network connectivity prior to introducing the cluster into production.  Using a network monitoring solution for your switchgear is also advised.
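
As a supplement to these vCenter views, a short pyVmomi sketch like the one below (assuming a connected service instance si, obtained as in the earlier power policy example, and a hypothetical cluster name) can confirm that every host reports the expected negotiated link speed on its physical adapters:

    # Sketch: list each host's physical NICs and negotiated link speeds,
    # a quick pre-production consistency check across a cluster.
    from pyVmomi import vim

    def report_pnic_speeds(si, cluster_name="vSAN-ESA-Cluster"):  # name is illustrative
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.ClusterComputeResource], True)
        for cluster in view.view:
            if cluster.name != cluster_name:
                continue
            for host in cluster.host:
                for pnic in host.config.network.pnic:
                    # linkSpeed is None when the link is down
                    speed = pnic.linkSpeed.speedMb if pnic.linkSpeed else 0
                    print(f"{host.name} {pnic.device}: {speed} Mb/s")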

Be mindful of your spine-leaf network design.  vSAN transmits its storage traffic using a common (and often shared) network fabric.  Depending on the topology and size of a cluster, vSAN traffic may remain on the top-of-rack (ToR) leaf switches or end up traversing the spine switches.  The latter may occur with larger clusters, nested fault domains, or disaggregated vSAN topologies.  See the post "How vSAN Cluster Topologies Change Network Traffic" for more information.

Ensure switch or NIC flow control is not enabled without your knowledge.  Flow control on switches, or NPAR on host NICs, may limit the amount of bandwidth offered to an uplink or a port.  If enabled, these may unknowingly limit the capabilities of the network, potentially hindering performance.  See the section "Non-vSAN Environment Health Checks" in the Troubleshooting vSAN Performance document for more information.

Storage Policies and Data Services

Use the Auto-Policy Management feature included in vSAN 8 U1 and later.  Introduced in vSAN 8 U1, and enhanced in vSAN 8 U2, this feature ensures that VMs using the default storage policy will be stored with optimal resilience and space efficiency that is compatible with the host count, topology and configuration of the cluster. 

Use RAID-5/6 erasure coding whenever possible.  Unlike the OSA, the vSAN ESA can deliver performance equal to or better than RAID-1 mirroring when using RAID-5/6 erasure coding.  Customers can achieve guaranteed levels of space efficiency without any compromise in performance.  And with the ESA, erasure coding can be used on as few as three hosts.  Thus, if the cluster topology supports it, all test and production workloads should use erasure coding.  A stretched cluster topology will still require a RAID-1 mirror to protect data across sites, and a 2-node topology will still require a RAID-1 mirror to protect data across hosts.  If either of these topologies uses an optional secondary level of resilience, that storage policy should use erasure coding for maximum efficiency and performance.  See the post "RAID-5/6 with the Performance of RAID-1 using the vSAN Express Storage Architecture" for more information.  Note that the Auto-Policy Management capabilities introduced with the ESA in vSAN 8 U1 may not always use erasure coding on extremely small clusters, but a user can easily assign a custom storage policy in these conditions.
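
To make the space-efficiency argument concrete, here is a short illustrative calculation (the 2+1, 4+1, and 4+2 schemes reflect the ESA's adaptive RAID-5 and RAID-6 layouts) comparing the raw capacity consumed per unit of usable data under each policy:

    # Raw-capacity multiplier for each data placement scheme: (data + parity) / data
    def capacity_multiplier(data_blocks, parity_blocks):
        return (data_blocks + parity_blocks) / data_blocks

    print(f"RAID-1 (FTT=1): {capacity_multiplier(1, 1):.2f}x")  # 2.00x
    print(f"RAID-5 (2+1):   {capacity_multiplier(2, 1):.2f}x")  # 1.50x on small ESA clusters
    print(f"RAID-5 (4+1):   {capacity_multiplier(4, 1):.2f}x")  # 1.25x on 6+ hosts: the 25% overhead noted later
    print(f"RAID-6 (4+2):   {capacity_multiplier(4, 2):.2f}x")  # 1.50x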

Leave data compression enabled in the ESA.  Data compression in the ESA is controllable using storage policies.  Due to the new architecture, data compression is extremely efficient and is enabled by default.  Since the ESA compresses data at the top of the storage stack, all data being written to other hosts in the cluster is transmitted across the network in a compressed state.  This can provide higher effective write throughput than the same conditions with compression disabled.  See the section "Compression (ESA)" in the vSAN Space Efficiency Technologies guide for more information.

Leave the "Number of Disk Stripes per Object" storage policy rule at its default value.  This policy rule, sometimes known as "stripe width," attempted to improve performance by dispersing object data across more devices and hosts.  This storage policy rule has limited relevance in vSAN's new architecture.  See the post "Stripe Width Storage Policy Rule in the vSAN ESA" for more information.

For some distributed applications that used vSAN’s special host affinity feature, use RAID-5 instead.  Some distributed applications are responsible for their own data resilience and prefer an affinity between a VM instance and its data to assist with performance.  A "Host Affinity" feature was available in past editions of the vSAN OSA, under special request, to achieve this colocation of a VM instance and its non-redundant VM data.  It introduced additional, non-standard operational procedures that complicated management.  With the vSAN ESA, one can simply use RAID-5 erasure coding.  This offers supreme levels of performance, and in clusters of 6 hosts or larger it will consume only an additional 25% of capacity to ensure resilience, which may be offset entirely by the data compression that is enabled by default.  For these applications, running in a cluster powered by the ESA with this approach will deliver better performance, optimal space efficiency, and operational practices consistent with your other applications.

VM Virtual Hardware and Configuration

Historically, VMware recommended deploying multiple paravirtual SCSI adapters to work around the limitation of SCSI being a single serial I/O queue.  For performance, we now recommend the use of the NVMe virtual controller.  Starting with vSAN 8 U2, significant improvements have been made to single-VMDK performance that, combined with the multiple-queue capabilities of NVMe, largely remove the need for "wide striping" of VMDKs and controllers.  This will provide optimal performance, with the potential of simplifying the virtual hardware configuration of the VM.  Note that the use of NVMe virtual controllers does introduce some additional considerations, as noted at:  https://knowledge.broadcom.com/external/article/311931/hot-addremove-disk-on-vnvme-controller-d.html
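
As a sketch of what this change looks like programmatically, the hypothetical pyVmomi helper below (assuming an existing connection and a vim.VirtualMachine object, as in the earlier examples) adds a virtual NVMe controller to a VM:

    # Sketch: add a virtual NVMe controller to an existing VM.
    from pyVmomi import vim

    def add_nvme_controller(vm, bus_number=0):
        controller = vim.vm.device.VirtualNVMEController()
        controller.busNumber = bus_number
        controller.key = -1  # negative key; vSphere assigns the real key on reconfigure

        change = vim.vm.device.VirtualDeviceSpec()
        change.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
        change.device = controller

        return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))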

For VMs continuing to use SCSI virtual controllers, you may wish to follow the traditional practice of configuring multiple paravirtual SCSI adapters in a VM's virtual hardware configuration.  This helps the guest operating system's ability to queue additional I/O.  The use of multiple VMDKs across multiple paravirtual SCSI adapters in a VM’s virtual hardware configuration has been a common recommendation by Independent Software Vendors (ISVs) for running their applications optimally in VMs on most storage systems.  See the "Applications" section in the Troubleshooting vSAN Performance guide for more information.
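
For guests that remain on SCSI, a companion sketch to the NVMe example above (same hypothetical assumptions) adds extra paravirtual SCSI adapters on separate buses so that VMDKs can be spread across them:

    # Sketch: add additional ParaVirtual SCSI controllers (buses 1..count) to a VM.
    from pyVmomi import vim

    def add_pvscsi_controllers(vm, count=3):
        changes = []
        for bus in range(1, count + 1):  # bus 0 usually holds the boot disk's controller
            controller = vim.vm.device.ParaVirtualSCSIController()
            controller.busNumber = bus
            controller.key = -bus  # temporary negative keys, unique per device
            controller.sharedBus = vim.vm.device.VirtualSCSIController.Sharing.noSharing
            change = vim.vm.device.VirtualDeviceSpec()
            change.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
            change.device = controller
            changes.append(change)
        return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=changes))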

System Monitoring and Updates

Run the very latest version of vSAN ESA.  vSAN 8 U1 introduced two new capabilities that improve performance significantly over the ESA in vSAN 8.  A new adaptive write path improves the performance of all workloads issuing writes with large I/O sizes or many outstanding I/Os.  The ESA in vSAN 8 U1 also improves the parallel processing of I/Os, which can improve both read and write performance for resource-intensive VMs placing high demands on a single virtual disk.  See the post "Performance Improvements with the Express Storage Architecture in vSAN 8 U1" for more information.

Use Skyline Health and resolve any identified issues.  Skyline Health for vSAN offers dozens of automated health checks that help ensure the health and well-being of a vSAN cluster.  It can be your first and best indicator that there is a health condition that should be addressed.  See the post "Skyline Health Scoring, Diagnostics and Remediation in vSAN 8 U1" for more information.

Maintain sufficient free capacity.  While capacity is often seen as a responsibility independent of storage performance requirements, storage systems often need a minimum amount of free capacity to perform their underlying operations.  Having insufficient free capacity may impact storage performance.  Using the vSAN ReadyNode Sizer can assist with this effort.  Capacity overheads and free space recommendations for the ESA are very similar to those of the OSA.  See the post "Capacity Overheads for the ESA in vSAN 8" for more information.  If you are refreshing an existing vSAN cluster, note the special guidance provided in the post "Calculating Capacity Needs when Refreshing Existing vSAN Clusters."
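
As a rough back-of-the-envelope illustration only (the 10% operations reserve below is an assumed placeholder, not vSAN's exact formula), the following estimates how much raw capacity remains for workloads after setting aside a host rebuild reserve of roughly one host's worth of capacity plus an operations reserve:

    # Illustrative estimate only; reserve fractions are assumptions, not vSAN's internal math.
    def estimated_usable_tb(raw_tb_per_host, hosts, operations_reserve=0.10):
        raw_total = raw_tb_per_host * hosts
        host_rebuild_reserve = raw_total / hosts  # roughly one host's worth of capacity
        return raw_total - host_rebuild_reserve - raw_total * operations_reserve

    # Example: 8 hosts with 20 TB of raw capacity each
    print(f"{estimated_usable_tb(20, 8):.1f} TB of {20 * 8} TB raw remains for workloads")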
