Flexible Topologies with vSAN Max
vSAN's distributed architecture has always been a natural fit for alternative topologies like stretched clusters, 2-Node clusters, and clusters using fault domains. But what about vSAN Max? Let's look at how vSAN Max can help provide centralized shared storage for your vSphere clusters using these alternative topologies.
The Flexibility of Distributed Object Storage
A vSAN HCI cluster aggregates compute and storage resources into the same hosts that comprise the cluster, which makes for an incredibly easy and powerful way to provide a stretched cluster. Simply place vSAN hosts in a cluster across two geographic sites, along with a virtual witness host appliance at a third site, and configure the cluster as a stretched cluster. Both compute and storage resources are distributed across sites in an all-in-one, cohesive manner that ensures availability of virtual machine instances and virtual machine data in the event of a partial or full site outage. The witness host appliance has been omitted for clarity from all stretched cluster illustrations in this post.
Figure 1. Site-level resilience of a VM in a vSAN HCI stretched cluster across two data sites.
The data is stored in a resilient way across the sites, meaning there are two paths from the compute resource to the data. Since compute resources and storage resources are aggregated into the same hosts that comprise the vSAN cluster, there is an inherent awareness of the availability of both resource types and preferred data path, which is one of the reasons why a vSAN HCI cluster that is stretched can automatically account for failure scenarios and other non-optimal conditions.
Stretched Topologies that Use Disaggregated Storage and Compute Resources
Conceptually, a stretched topology implies that data is resiliently stored across two defined fault domains, typically (but not always) two geographic sites. It is this consideration that must be accounted for by the compute and storage resources in a disaggregated environment.
When storage and compute resources are disaggregated from each other, the system must understand the characteristics of the two network paths from the compute resource to the site-resilient data. In most cases, one of the network paths (the inter-site link, or ISL) will be slower than the other. This is known as an asymmetric network topology and is shown in Figure 2. While it is the most common configuration found for a stretched cluster, it presents an interesting challenge because the system must correctly choose the optimal network path over the suboptimal network path for the best performance.
Figure 2. Asymmetric Network Topologies for Stretched Environments.
A much less common symmetric network topology is shown in Figure 3. This represents a topology where the bandwidth and latency are the same regardless of the data path taken to complete the request. One might see this where the two defined fault domains, or "sites," are simply server racks sitting adjacent to each other on the same network spine, delivering less than 1ms latency between the client cluster and the server cluster whether the request stays within the same fault domain or crosses fault domains.
Figure 3. Symmetric Network Topologies for Stretched Environments.
To help vSAN Max understand the correct network path to take in a stretched cluster topology, the configuration wizard for vSAN Max will allow you to select a network topology that represents your environment.
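To make the difference between the two topologies concrete, here is a minimal conceptual sketch in Python. It is not how vSAN Max implements path selection; the `Topology`, `Replica`, and `choose_read_replica` names and the latency figures are hypothetical and exist only to illustrate why the system needs to know which topology it is running in.

```python
from dataclasses import dataclass
from enum import Enum


class Topology(Enum):
    ASYMMETRIC = "asymmetric"  # the inter-site link is slower than the local path
    SYMMETRIC = "symmetric"    # both paths offer the same bandwidth and latency


@dataclass(frozen=True)
class Replica:
    site: str          # fault domain ("site") holding this copy of the data
    latency_ms: float  # illustrative latency from the client host to this copy


def choose_read_replica(client_site: str, replicas: list[Replica],
                        topology: Topology) -> Replica:
    """Pick the copy a client host should read from (conceptual model only)."""
    if not replicas:
        raise ValueError("at least one replica is required")
    if topology is Topology.ASYMMETRIC:
        # Prefer a copy in the client's own site so reads stay off the ISL.
        local = [r for r in replicas if r.site == client_site]
        if local:
            return min(local, key=lambda r: r.latency_ms)
    # Symmetric topology, or no local copy is available (for example after a
    # site failure): fall back to whichever copy has the lowest latency.
    return min(replicas, key=lambda r: r.latency_ms)


copies = [Replica("site-a", 0.4), Replica("site-b", 5.0)]
print(choose_read_replica("site-a", copies, Topology.ASYMMETRIC).site)
```

In the asymmetric case the site of the client host drives the decision, which is the behavior described as site affinity later in this post. In the symmetric case either copy is equally good, so the choice carries no performance penalty.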
vSAN Max Stretched Across Geographic Sites
A vSAN Max cluster can be configured as a single site cluster, or in a stretched configuration. vSAN Max can provide site-level resilience of data by mirroring data across sites, and secondary levels of resilience through space-efficient RAID-6 erasure coding within each site. The latter delivers high levels of resilience in a space-efficient manner, and ensures data rebuilds are accomplished locally should there be a discrete host failure within a site. Since it is built using the ESA, vSAN Max provides many of the same efficiency, performance, and availability benefits described in the post: "Using the vSAN ESA in a Stretched Cluster Topology."
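As a rough illustration of what this combination means for capacity, the sketch below multiplies the two levels of resilience together. It assumes a 4+2 layout for RAID-6 (four data fragments plus two parity fragments) and ignores metadata, compression, and free-space reserves, so treat it as back-of-the-envelope arithmetic rather than a sizing tool. The function name and parameters are hypothetical.

```python
def raw_capacity_needed(usable_tb: float,
                        site_copies: int = 2,
                        data_fragments: int = 4,
                        parity_fragments: int = 2) -> float:
    """Estimate raw capacity for site mirroring plus RAID-6 within each site.

    Assumes a 4+2 RAID-6 stripe; excludes metadata, compression savings,
    and any capacity held back for rebuilds or operations.
    """
    if usable_tb < 0 or site_copies < 1 or data_fragments < 1 or parity_fragments < 0:
        raise ValueError("invalid sizing input")
    # Overhead of the erasure code inside one site: (4 + 2) / 4 = 1.5x
    per_site_overhead = (data_fragments + parity_fragments) / data_fragments
    # Each site holds a full, independently protected copy of the data.
    return usable_tb * site_copies * per_site_overhead


print(raw_capacity_needed(100))  # 100 TB usable -> 300.0 TB raw across both sites
```

For comparison, the same function with `site_copies=1` gives the single-site figure of 1.5x, which shows that the extra cost of the stretched configuration comes from the mirror across sites, not from the erasure code within each site.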
Figure 4 illustrates a stretched vSAN HCI cluster that is mounting the datastore of a vSAN Max cluster that is also stretched. In this type of asymmetric configuration, the vSAN HCI cluster and the vSAN Max cluster will maintain site affinity of I/O processing and data between the client cluster and the server cluster.
Figure 4. vSAN Max Stretched cluster providing resilient storage across two data sites for a vSAN HCI cluster that is also stretched.
Recommendation: Use ReadyNode profiles certified for vSAN Max for all vSAN Max deployments.
Supported Client Clusters when using vSAN Max in a Stretched Topology
The following table summarizes the types of client clusters supported when using a vSAN Max cluster in a stretched cluster configuration. These limitations also apply to server clusters running vSAN HCI with datastore sharing (previously known as HCI Mesh). The scenarios below assume the latency requirement between the client cluster and the vSAN Max cluster is met, and assume all client clusters are using vSphere 8.
Client Cluster Type | Server Cluster Type | Supported | Notes |
vSAN HCI clusters (ESA) in a stretched cluster configuration. | vSAN Max cluster or vSAN HCI cluster (ESA) in a stretched cluster configuration | Yes | Provides resilience of data and high availability of running VM instances. |
vSAN HCI clusters (ESA) when they reside in one of the data sites where the vSAN Max cluster resides. | vSAN Max cluster or vSAN HCI cluster (ESA) in a stretched cluster configuration | Yes | Provides resilience of data but no high availability of running VM instances. |
vSphere clusters stretched across two sites using asymmetrical* network connectivity. | vSAN Max cluster or vSAN HCI cluster (ESA) in a stretched cluster configuration | No | Not supported at this time. |
vSphere clusters stretched across two sites using symmetrical* network connectivity. | vSAN Max cluster or vSAN HCI cluster (ESA) in a stretched cluster configuration | Yes | Supported, but less common, as it would require the same network capabilities (bandwidth and latency) between fault domains defining each site. |
vSphere clusters when they reside in one of the data sites where the vSAN Max cluster resides. | vSAN Max cluster or vSAN HCI cluster (ESA) in a stretched cluster configuration | Yes | Provides resilience of data but no high availability of running VM instances. |
Any client cluster running vSAN OSA | vSAN Max cluster or vSAN HCI cluster (ESA) in a single site or stretched cluster configuration | No | Not supported at this time. |
As noted above, when a vSAN Max cluster is configured as a stretched cluster using an asymmetrical network topology, a vSphere cluster mounting the vSAN Max datastore and stretched across the same two sites is not currently supported. If site-level resilience of both data and VM instances is required, a vSAN HCI cluster as a client cluster in a stretched configuration may be the better option at this time. This ensures that both the VM instances and their data remain highly available.
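If you want to check a design against the table programmatically, the support matrix can be captured as a simple lookup. The sketch below is only a transcription of the table above for a stretched vSAN Max server cluster; the `Client` enum, `SUPPORT` mapping, and `check_client` function are hypothetical names, not part of any VMware tooling, and the table itself remains the reference.

```python
from enum import Enum


class Client(Enum):
    VSAN_HCI_ESA_STRETCHED = "vSAN HCI (ESA), stretched"
    VSAN_HCI_ESA_ONE_SITE = "vSAN HCI (ESA), in one data site"
    VSPHERE_STRETCHED_ASYMMETRIC = "vSphere, stretched, asymmetrical network"
    VSPHERE_STRETCHED_SYMMETRIC = "vSphere, stretched, symmetrical network"
    VSPHERE_ONE_SITE = "vSphere, in one data site"
    VSAN_OSA = "any client cluster running vSAN OSA"


# (supported, note) for each client type, copied from the table above.
SUPPORT: dict[Client, tuple[bool, str]] = {
    Client.VSAN_HCI_ESA_STRETCHED:
        (True, "Resilience of data and high availability of running VM instances."),
    Client.VSAN_HCI_ESA_ONE_SITE:
        (True, "Resilience of data but no high availability of running VM instances."),
    Client.VSPHERE_STRETCHED_ASYMMETRIC:
        (False, "Not supported at this time."),
    Client.VSPHERE_STRETCHED_SYMMETRIC:
        (True, "Supported, but less common."),
    Client.VSPHERE_ONE_SITE:
        (True, "Resilience of data but no high availability of running VM instances."),
    Client.VSAN_OSA:
        (False, "Not supported at this time."),
}


def check_client(client: Client) -> tuple[bool, str]:
    """Return (supported, note) for a client cluster mounting a stretched vSAN Max datastore."""
    return SUPPORT[client]


for client in Client:
    supported, note = check_client(client)
    print(f"{client.value}: {'Yes' if supported else 'No'} ({note})")
```

Like the table, this lookup assumes the latency requirement between the client cluster and the vSAN Max cluster is met and that all client clusters run vSphere 8.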
When used in a stretched cluster configuration, vSAN Max clusters will have the same network bandwidth and latency requirements between sites as traditional vSAN HCI clusters of the same size. See the vSAN Stretched Cluster Bandwidth Sizing guide for more information.
Recommendation: Size your inter-site link (ISL) based on your workload demands. Given that the vSAN Max cluster is able to offer high performance storage, ensure that the ISL is able to deliver the bandwidth and latency necessary for your workloads. This means your environment may need more than the 10 Gbps of bandwidth stated as the minimum necessary for this type of topology.
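As a simple first-pass check, you can compare your expected write throughput against the ISL. The sketch below rests on one fact and one assumption: data mirrored across sites means every write crosses the ISL, and the `headroom` multiplier of 1.5 is a hypothetical placeholder for resynchronization and metadata traffic, not a figure from VMware. Use the multipliers in the bandwidth sizing guide for real designs.

```python
MIN_ISL_GBPS = 10.0  # minimum stated in the text for this type of topology


def required_isl_gbps(peak_write_mb_per_s: float, headroom: float = 1.5) -> float:
    """Rough ISL bandwidth estimate in Gbps (illustrative only).

    peak_write_mb_per_s: expected peak write throughput in megabytes per second.
    headroom: hypothetical multiplier for resync and other overhead traffic.
    """
    if peak_write_mb_per_s < 0 or headroom < 1:
        raise ValueError("throughput must be >= 0 and headroom must be >= 1")
    write_gbps = peak_write_mb_per_s * 8 / 1000  # megabytes/s -> gigabits/s
    # Never size below the stated minimum, even for light workloads.
    return max(MIN_ISL_GBPS, write_gbps * headroom)


print(required_isl_gbps(500))   # 4 Gbps of writes -> 10.0 (the minimum applies)
print(required_isl_gbps(2500))  # 20 Gbps of writes -> 30.0
```

The second example shows the point of the recommendation: a workload that pushes 2,500 MB/s of writes already exceeds what a 10 Gbps link can carry before any overhead is considered.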
vSAN Max using vSAN's Fault Domains Feature
vSAN Max can also be configured using vSAN's Fault Domains feature, which is most commonly used to provide rack level awareness and resilience for larger clusters. The Fault Domains feature became much more efficient with the ESA, and since vSAN Max is built on that architecture, it delivers all of the enhanced levels of performance, efficiency, and data availability associated with the ESA.
Figure 5. vSAN Max providing rack level resilience using the Fault Domains feature.
When configured as recommended, the Fault Domains feature is generally practical only for larger clusters. This is because, as shown in Figure 5 above, a RAID-6 erasure code spreads data and parity across a minimum of six fault domains, and we recommend at least 3 hosts per fault domain. To achieve this same rack level resilience using a relatively smaller cluster, one can simply place one (and no more than one) host in a vSAN Max cluster per rack, without enabling the Fault Domains feature, as shown in Figure 6. This configuration provides rack-level resilience in the same manner.
Figure 6. vSAN Max providing rack level resilience without using the Fault Domains feature.
This type of strategy changes how vSAN traffic traverses the network spine, and should be part of your consideration when designing your vSAN Max cluster.
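The host counts behind these two approaches are easy to work out. The sketch below uses the two numbers given above (six fault domains for RAID-6 and three hosts per fault domain) and, for the one-host-per-rack approach, assumes each host acts as its own fault domain so that at least six racks are needed. The function names are hypothetical.

```python
# Values taken from the text above.
RAID6_MIN_FAULT_DOMAINS = 6
RECOMMENDED_HOSTS_PER_FAULT_DOMAIN = 3


def min_hosts_with_fault_domains(
    fault_domains: int = RAID6_MIN_FAULT_DOMAINS,
    hosts_per_fault_domain: int = RECOMMENDED_HOSTS_PER_FAULT_DOMAIN,
) -> int:
    """Smallest cluster that follows the Fault Domains recommendation."""
    return fault_domains * hosts_per_fault_domain


def one_host_per_rack(hosts_by_rack: dict[str, int]) -> bool:
    """True if a layout gives rack-level resilience without Fault Domains.

    Requires exactly one host in every rack and, as an assumption, at
    least six racks so a RAID-6 stripe can land on six separate hosts.
    """
    enough_racks = len(hosts_by_rack) >= RAID6_MIN_FAULT_DOMAINS
    return enough_racks and all(count == 1 for count in hosts_by_rack.values())


print(min_hosts_with_fault_domains())                          # 18
print(one_host_per_rack({f"rack-{i}": 1 for i in range(6)}))   # True
print(one_host_per_rack({"rack-0": 2, "rack-1": 1, "rack-2": 1,
                         "rack-3": 1, "rack-4": 1, "rack-5": 1}))  # False
```

The gap between 18 hosts and 6 hosts is why the one-host-per-rack approach is attractive for smaller clusters. The last example fails because placing two hosts in a rack means a single rack failure could take out two fragments of the same stripe.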
While our typical recommendation is to enable the "Host Rebuild Reserve" toggles for vSAN Max clusters, note that these toggles cannot be enabled when configuring vSAN Max in a stretched topology, or when using the vSAN Fault Domains feature.
Summary
Customers can enjoy many of the same topology options in vSAN Max that are found with vSAN HCI clusters, but should also be aware of supported configurations and other design considerations when using these topologies.