December 21, 2022

Using the vSAN ESA in a Stretched Cluster Topology

Why is the ESA so much better in a stretched cluster topology?  Read on to find out!

With performance being one of the key benefits of the new vSAN Express Storage Architecture (ESA), will the ESA provide any benefit to stretched clusters? 

Stretched clusters are a popular topology for our customers in Europe, so it makes sense that this question came up often at VMware Explore Europe earlier this year.  Let's look at why the ESA in vSAN 8 can be a great fit for a stretched cluster topology.

More Efficient Network Traffic across the ISL

To ensure data resilience across sites, a stretched cluster must send writes across an inter-site link (ISL).  But it must do so efficiently as the ISL is limited by bandwidth and latency.  To address this, vSAN uses a "proxy owner" to minimize the use of the ISL.  It lives on the site opposite the object owner and will receive a cloned write from the object owner so that it can process subsequent I/O within the site. 

In the vSAN ESA, the method described above remains the same, but with a twist.  The vSAN ESA compresses the data once before cloning that write across the ISL.  Because less data crosses the ISL for the same workload, the ESA can lower the VM write latency that ISL contention previously caused.  This approach can also provide higher effective write throughput across the ISL, since reducing the amount of data transmitted can have the same result as increasing the bandwidth of the ISL.
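To make the "same result as increasing the bandwidth" point concrete, here is a minimal back-of-the-envelope sketch.  The function names and the compression ratios are assumptions for illustration only; real ratios depend entirely on your data, and the model ignores protocol overhead and latency.

```python
def effective_isl_throughput_gbps(link_gbps: float, compression_ratio: float) -> float:
    """Logical (pre-compression) write throughput a link can carry.

    compression_ratio is original size / compressed size, so 2.0 means
    the data shrinks to half its size. Both inputs are assumed values.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if compression_ratio < 1.0:
        raise ValueError("compression_ratio must be >= 1.0")
    return link_gbps * compression_ratio


def isl_transfer_seconds(payload_gb: float, link_gbps: float, compression_ratio: float) -> float:
    """Seconds needed to push payload_gb gigabytes of guest writes across the link."""
    compressed_gigabits = (payload_gb * 8) / compression_ratio
    return compressed_gigabits / link_gbps


if __name__ == "__main__":
    # Hypothetical 10 Gbps ISL and a hypothetical 2:1 compression ratio.
    print(effective_isl_throughput_gbps(10, 2.0))
    print(isl_transfer_seconds(10, 10, 1.0), isl_transfer_seconds(10, 10, 2.0))
```

In this simplified model, data that compresses 2:1 crosses the ISL in half the time, which is the same outcome you would get from doubling the link speed for uncompressed data.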

Figure 1.  More efficient network traffic across the ISL when using the vSAN ESA in a stretched cluster.

The ESA also prepares the data differently than the OSA.  It will coalesce many small writes in memory before persisting the data to disk.  Issuing fewer, larger I/Os reduces the number of write operations and provides a more uniform method of data delivery across the ISL.  This more efficient approach reduces resource utilization and can improve the performance consistency of VMs in a stretched cluster.
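The general idea of coalescing can be sketched in a few lines.  This is a simplified illustration and not how vSAN implements it; the function name and the 32 KB target size are hypothetical.

```python
from typing import List


def coalesce_writes(write_sizes_kb: List[int], target_io_kb: int) -> List[int]:
    """Pack small writes, in arrival order, into I/Os of at most target_io_kb.

    Writes larger than the target are passed through unchanged.
    Returns the size of each I/O that would actually be issued.
    """
    if target_io_kb <= 0:
        raise ValueError("target_io_kb must be positive")
    issued: List[int] = []
    current = 0
    for size in write_sizes_kb:
        if size > target_io_kb:
            # Flush whatever is buffered, then issue the large write as-is.
            if current:
                issued.append(current)
                current = 0
            issued.append(size)
            continue
        if current + size > target_io_kb:
            issued.append(current)
            current = 0
        current += size
    if current:
        issued.append(current)
    return issued


if __name__ == "__main__":
    # Sixteen hypothetical 4 KB guest writes and a hypothetical 32 KB target.
    print(coalesce_writes([4] * 16, 32))
```

With these assumed numbers, sixteen small writes become two larger I/Os, so far fewer operations need to be cloned across the ISL.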

Improved Performance when using Secondary Levels of Resilience

Secondary levels of resilience in a stretched cluster allow data to remain available in the event of a site outage, and one or more subsequent failures within a site.  Customers can apply this to VMs easily through a storage policy.

When using the vSAN Original Storage Architecture (OSA), VM performance could be affected if the VM used a storage policy that applied a secondary level of resilience to the VM - especially RAID-5/6 erasure coding.  This was primarily due to the read-modify-write step (shown in Figure 4 of the post:  "Performance with vSAN Stretched Clusters") that occurred when committing that RAID-5/6 stripe with parity at each site before it could send the write acknowledgment back to the guest VM.  Even if the ISL was sized sufficiently, the read-modify-write steps increased I/O amplification, network round-trips, and serialization, which could affect the latency of the guest VM.

The ESA in vSAN 8 writes data using fewer resources.  It eliminates the read-modify-write step when writing data resiliently using erasure coding.  As described in the post:  "RAID-5/6 with the Performance of RAID-1 using the vSAN Express Storage Architecture", incoming write I/Os from the guest will be coalesced and briefly written as a 2-way or 3-way mirror (depending on the storage policy) before sending the write acknowledgment to the VM.  This approach is also used in a stretched cluster, where the write operation is cloned to the proxy owner, and the 2-way or 3-way mirror occurs on each site before vSAN writes the data as an efficient, fully aligned, full stripe write.  Avoiding the read-modify-write sequence found in the OSA can lower the write latency for your VMs while freeing up additional host resources.  It also simplifies storage policy management since there is no performance penalty when using RAID-5/6 erasure codes.
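For readers who want to see why a full stripe write is cheaper than read-modify-write, here is a minimal sketch using textbook single-parity (XOR) RAID-5.  It is a generic model rather than vSAN's internal implementation, and all function names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(data_blocks: List[bytes]) -> bytes:
    """Parity computed from data already in memory: no reads required."""
    return reduce(xor_blocks, data_blocks)


def rmw_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Parity for a partial update: old data and old parity must be read first."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


def rmw_io_count(small_writes: int) -> int:
    """Each independent small write: read data, read parity, write data, write parity."""
    return 4 * small_writes


def full_stripe_io_count(data_width: int) -> int:
    """One write per data block plus one parity write, and no reads."""
    return data_width + 1


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = full_stripe_parity(stripe)
    new_block = b"\xaa\xbb"
    # Both paths produce the same parity; only the I/O cost differs.
    same = rmw_parity(stripe[1], parity, new_block) == full_stripe_parity(
        [stripe[0], new_block, stripe[2]]
    )
    print(same, rmw_io_count(4), full_stripe_io_count(4))
```

In this model, four independent small updates to a 4+1 stripe cost 16 I/Os with read-modify-write, while writing the same data as one full stripe costs 5 writes and no reads.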

Figure 2.  Site mirroring with high performance secondary level of resilience in a stretched cluster.

Improved Space Efficiency in Small to Medium-sized Stretched Clusters

As noted above, secondary levels of resilience in stretched clusters offer data resilience across sites, and within each site.  For smaller stretched clusters where perhaps there are just 3 hosts in each site (6 data hosts total), customers were limited to using a RAID-1 mirror for the secondary level of resilience because using RAID-5 required 4 hosts in each site.  In these situations, capacity utilization could be strained because a RAID-1 mirror consumes more capacity than erasure coding.

As described in the post:  "Adaptive RAID-5 Erasure Coding with the Express Storage Architecture in vSAN 8", customers can now use RAID-5 with as few as 3 hosts.  This means that for a stretched cluster comprising just 3 hosts in each of the two data sites, customers can store data using this secondary level of resilience while gaining significant space efficiency.  For stretched clusters that use 6 hosts or more at each site, vSAN will change this RAID-5 erasure coding to use a highly space-efficient 4+1 scheme.  See Figure 3 for space efficiency and host count comparisons.
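The capacity math can be sketched as follows.  The host-count thresholds come from the text above; the 2+1 scheme for sites with fewer than 6 hosts is taken from the referenced adaptive RAID-5 post, and the function names are hypothetical.

```python
from typing import Tuple


def adaptive_raid5_scheme(hosts_per_site: int) -> Tuple[int, int]:
    """Return (data, parity) for RAID-5 at a site with the given host count."""
    if hosts_per_site < 3:
        raise ValueError("RAID-5 in the ESA needs at least 3 hosts per site")
    # 6 or more hosts per site: 4+1. Fewer: 2+1 (per the referenced post).
    return (4, 1) if hosts_per_site >= 6 else (2, 1)


def capacity_multiplier(data: int, parity: int, site_mirrored: bool = True) -> float:
    """Raw capacity consumed per unit of VM data.

    A RAID-1 mirror within a site is modeled as data=1, parity=1.
    Site mirroring stores the whole layout once at each data site.
    """
    within_site = (data + parity) / data
    return within_site * (2 if site_mirrored else 1)


if __name__ == "__main__":
    print("RAID-1 per site:", capacity_multiplier(1, 1))
    for hosts in (3, 6):
        scheme = adaptive_raid5_scheme(hosts)
        print(hosts, "hosts per site:", scheme, capacity_multiplier(*scheme))
```

Under these assumptions, a site-mirrored VM consumes 4x its size with a RAID-1 secondary level of resilience, 3x with RAID-5 in a 2+1 scheme, and 2.5x with RAID-5 in a 4+1 scheme.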

Figure 3.  Comparing how adaptive RAID-5 in the vSAN ESA can benefit stretched clusters.

Improved Performance when using Encryption

While the vSAN OSA did an admirable job of maintaining levels of performance when using vSAN Encryption Services, there were opportunities to improve performance and reduce the overhead associated with encryption services - especially in stretched clusters.

The vSAN ESA does encryption better.  While the ESA still offers at-rest and in-transit encryption as two independent services, the ESA was designed to process more data using fewer resources, and this is exactly the case with encryption.  Unlike the OSA, the ESA encrypts the data once at the top of the vSAN stack and does not perform any decrypt and re-encrypt steps.  This can reduce resource overhead on the hosts and potentially lower latency for VMs.

Improved Resilience for Planned or Unplanned Maintenance Events

The OSA in vSAN used the construct of a disk group to provide storage resources.  This served as an effective way to provide reasonable levels of performance for vSAN hosts largely populated by value-based SAS or SATA devices, but given that the failure of a disk group caching device meant that the data in the entire disk group would need to be resynchronized elsewhere, this was not ideal for many of our customers' environments.

The ESA takes a hardware-optimized approach in its design.  By pairing high-performance NVMe devices with architectural changes that exploit the full power of these devices, the vSAN ESA can do away with the limitations of disk groups.  Not only can this make the ESA simpler to administer, but it also reduces the failure/maintenance domain down to a discrete storage device.  This benefits every type of vSAN topology, but can significantly reduce resynchronization that occurs between sites through the ISL when secondary levels of resilience are not used.
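The difference in failure domain can be illustrated with some simple arithmetic.  This is a rough sketch with hypothetical function names and capacity figures; it only models the two cases described above (an OSA caching device failure versus a single ESA device failure).

```python
from typing import List


def osa_cache_failure_resync_gb(used_gb_per_capacity_device: List[float]) -> float:
    """OSA: a caching device failure affects the entire disk group,
    so the data on every capacity device in that group is resynchronized."""
    return sum(used_gb_per_capacity_device)


def esa_device_failure_resync_gb(used_gb_per_device: List[float], failed_index: int) -> float:
    """ESA: the failure domain is a single device, so only its data is resynchronized."""
    return used_gb_per_device[failed_index]


if __name__ == "__main__":
    # Hypothetical host with five devices, each holding 800 GB of data.
    used = [800.0] * 5
    print(osa_cache_failure_resync_gb(used))
    print(esa_device_failure_resync_gb(used, failed_index=0))
```

With these assumed figures, the disk group failure requires five times as much data to be resynchronized as the single device failure.  Without a secondary level of resilience, that resynchronization traffic would come from the other site across the ISL.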

Since the new architecture only supports high-performing NVMe storage devices certified for use with the ESA and does not impart any performance penalty when using RAID-5/6 erasure codes, this can make stretched cluster design and sizing easier.  With the OSA it was not uncommon to see a configuration that used a very capable ISL connecting hosts in a stretched cluster, but the hosts were using value-based, low-performing SATA flash devices and RAID-5/6 erasure coding for secondary levels of resilience.  Now, much of that focus can be centered on network connectivity and cluster capacity.

Recommendation:  Use the vSAN ReadyNode Sizer for all of your sizing needs, as it now supports the sizing of both the OSA and ESA.

Network Requirements for the ESA in Stretched Clusters

While the vSAN ESA has new networking requirements for the connectivity of hosts in a standard vSAN cluster, the network requirements for the ISL in a stretched cluster remain the same.  The data-site to data-site network Round Trip Time (RTT) should be no greater than 5 milliseconds (2.5ms each way) and data-site to data-site ISL bandwidth remains at no less than 10Gbps.  As the performance capabilities of vSAN increase - especially a cluster running the vSAN ESA - the demands on the ISL, as well as performance expectations, may also increase.  Your workloads will ultimately determine the demand.  The vSAN Stretched Cluster Bandwidth Sizing document can be used for general guidance in sizing the ISL for the OSA and ESA, but further tuning may be added to help accommodate the different ways the vSAN ESA handles data across the ISL.
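The two ISL limits stated above are easy to encode as a quick sanity check.  This is a minimal sketch: the class and function names are hypothetical, and the measurements would need to come from your own network tooling.

```python
from dataclasses import dataclass
from typing import List

MAX_RTT_MS = 5.0            # data-site to data-site round trip time limit
MIN_BANDWIDTH_GBPS = 10.0   # data-site to data-site ISL bandwidth minimum


@dataclass
class IslMeasurement:
    rtt_ms: float
    bandwidth_gbps: float


def isl_violations(m: IslMeasurement) -> List[str]:
    """Return a description of each stated ISL requirement that is not met."""
    problems: List[str] = []
    if m.rtt_ms > MAX_RTT_MS:
        problems.append(f"RTT {m.rtt_ms} ms exceeds {MAX_RTT_MS} ms")
    if m.bandwidth_gbps < MIN_BANDWIDTH_GBPS:
        problems.append(
            f"Bandwidth {m.bandwidth_gbps} Gbps is below {MIN_BANDWIDTH_GBPS} Gbps"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical measurements for two candidate links.
    print(isl_violations(IslMeasurement(rtt_ms=4.0, bandwidth_gbps=10.0)))
    print(isl_violations(IslMeasurement(rtt_ms=6.0, bandwidth_gbps=1.0)))
```

Meeting these minimums only confirms the link is supported.  Whether it is sized correctly still depends on the demands of your workloads.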

Recommendation:  When running the ESA in a stretched cluster, monitor the effective bandwidth utilized across the ISL to ensure it is sized correctly.  Depending on the demands of the workloads, the bandwidth and latency capabilities of the ISL may need to be revisited to meet expectations and ensure it is not the primary bottleneck.


vSAN stretched clusters offer unique resilience capabilities for environments that span across geographic locations.  Using the ESA in a stretched cluster environment sets customers up to meet the ever-growing demands of an active-active data center.


