VMware vSAN HCI Mesh Tech Note
Why HCI Mesh
The scale-out architecture of VMware vSAN enables non-disruptive scale-up and scale-out. You can expand capacity and performance by adding hosts to a cluster (scale-out), or grow capacity alone by adding disks to a host (scale-up). As application workloads grow organically, performance can be right-sized at each expansion interval, and over time the ratio of storage to compute can be right-sized through vSAN. Despite this, inorganic scale events can prove challenging for any architecture. Examples include:
- An application refactor requires significant storage to be added for logs.
- A new line of business for analytics may consume excessive compute, potentially stranding storage assets.
- Mergers and acquisitions (M&A) may result in new business units bringing unforeseen storage requirements to a cluster.
- The deployment of network virtualization, powered by NSX, has enabled the migration and consolidation of legacy stranded cluster resources.
Historically, when these scaling events happened, they could cause an existing cluster to run out of storage or compute and potentially strand the less-demanded resource. While vMotion enables “shared-nothing migration” between clusters, it still forces storage and compute to move together between clusters.
Traditional storage has had methods of solving this challenge, but the results were often underwhelming. Array virtualization addressed the problem by proxying storage through an additional array. This allowed stranded storage resources to be consolidated, but it added latency, complexity, and support costs, as administrators were forced to use multiple user interfaces and tools for basic operational tasks and troubleshooting. For these reasons, a different approach was chosen.
While vSAN can export iSCSI or NFS, the native vSAN protocol was chosen to export storage to another cluster for a number of reasons:
1. SPBM management is preserved end to end.
2. Compute and I/O overhead stays low because the native vSAN RDT protocol is used end to end.
3. The vSAN performance service allows for end-to-end monitoring of I/O.
4. There is no need to manage islands of storage within LUNs or NFS exports, and no need for datastore clustering or VAAI to try to work around issues that would come from adding another layer of abstraction.
5. Storage is still managed and maintained as a cluster resource.
Deploy and manage HCI Mesh
After you select a remote vSAN cluster managed by the same vCenter Server, a set of compatibility checks runs automatically to verify that the remote cluster can be mounted.
Finally, for clusters mounting a remote datastore, the vSphere HA "Datastore with APD" response should be changed.
This setting can be found within vCenter Server by browsing to: Cluster --> Configure --> vSphere Availability --> Edit
Note: either the aggressive or the conservative restart policy can be used. For most customers, conservative will be preferred.
Migrating Storage to HCI Mesh
Once a mesh relationship has been established, a simple Storage vMotion is all that is required to migrate a virtual machine's storage to a remote vSAN cluster's datastore. Note that you can change the storage policy (for instance, changing from RAID 1 to RAID 5) as part of this migration.
Splitting the VMDKs of a given VM across multiple datastores is not supported at this time. To migrate, right-click the virtual machine and select "Migrate" followed by "Storage only". After you select the storage policy and a compatible cluster, a Storage vMotion process migrates the storage non-disruptively.
Monitor HCI Mesh
VMware vSAN HCI Mesh includes a number of default health checks that verify at setup that a solution will be supportable. In addition, when the vSAN performance service runs on both clusters, it provides end-to-end visibility of the I/O path across the cluster providing compute and the cluster providing storage to the virtual machine. A new "Remote VM" tab appears on clusters consuming remote storage to enable this performance visibility. It provides metrics about clusters from the perspective of remote vSAN VM consumption.
When reviewing capacity usage from the vSAN capacity monitoring dashboard, a tooltip appears with a quick link to any remotely mounted datastores.
HCI Mesh Design Considerations
HCI Mesh Limits
Client cluster: Can mount up to a maximum of 5 remote vSAN datastores
Server cluster: Can only serve its datastore to a maximum of 5 client clusters
Connections per datastore: A single vSAN datastore in an HCI Mesh topology can support no more than 64 hosts, including both local and remote hosts connected.
Mesh cluster and hosts count totals: The number of clusters and hosts participating in the overall HCI Mesh (Any cluster connected in some form or another to the overall mesh) is limited to the total available clusters and hosts within a single datacenter object in a single vCenter.
Storage Policy Support: Whether a policy is supported depends on the cluster storing the data, not the client cluster. (e.g. A VM using FTT=2 via RAID-6 must be using capacity from a cluster that is 6 hosts or larger.)
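The numeric limits above can be checked against a planned topology before any datastore is mounted. The following is a minimal sketch, not a VMware tool: the Cluster record, the check_mesh_limits function, and the example cluster names are all hypothetical, and only the three constants come from the limits listed above.

```python
from dataclasses import dataclass, field

# Limits quoted in the HCI Mesh Limits list above.
MAX_REMOTE_DATASTORES_PER_CLIENT = 5
MAX_CLIENT_CLUSTERS_PER_SERVER = 5
MAX_HOSTS_PER_DATASTORE = 64


@dataclass
class Cluster:
    """Hypothetical record for one vSAN cluster in a planned mesh."""
    name: str
    host_count: int
    # Names of the server clusters whose datastores this cluster mounts.
    mounts: list = field(default_factory=list)


def check_mesh_limits(clusters):
    """Return human-readable limit violations (empty list if none)."""
    problems = []

    # Client-side limit: remote datastores mounted per cluster.
    for client in clusters:
        if len(client.mounts) > MAX_REMOTE_DATASTORES_PER_CLIENT:
            problems.append(
                f"{client.name}: mounts {len(client.mounts)} remote datastores "
                f"(limit {MAX_REMOTE_DATASTORES_PER_CLIENT})"
            )

    # Server-side limits: client cluster count, and local plus remote hosts.
    for server in clusters:
        clients = [c for c in clusters if server.name in c.mounts]
        if len(clients) > MAX_CLIENT_CLUSTERS_PER_SERVER:
            problems.append(
                f"{server.name}: serves {len(clients)} client clusters "
                f"(limit {MAX_CLIENT_CLUSTERS_PER_SERVER})"
            )
        total_hosts = server.host_count + sum(c.host_count for c in clients)
        if total_hosts > MAX_HOSTS_PER_DATASTORE:
            problems.append(
                f"{server.name}: datastore would have {total_hosts} connected "
                f"hosts (limit {MAX_HOSTS_PER_DATASTORE})"
            )
    return problems


# Example plan: two compute clusters mounting one storage cluster.
plan = [
    Cluster("storage-a", host_count=16),
    Cluster("compute-b", host_count=32, mounts=["storage-a"]),
    Cluster("compute-c", host_count=24, mounts=["storage-a"]),
]
for problem in check_mesh_limits(plan):
    print(problem)
```

In this example the plan stays within the per-client and per-server cluster limits, but the combined host count on the storage-a datastore (16 local plus 56 remote) exceeds the 64-host connection limit, so one violation is reported.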
Networking Design Considerations
The cross-cluster traffic associated with HCI Mesh uses the very same protocol stack (RDT over TCP/IP) that exists in a traditional vSAN cluster. Connections are made directly from the host running the virtual machine to the hosts supplying the backing storage.
Since vSAN and HA communication share the same VMkernel port, HA is dependent on any links that provide communication between clusters. The same principles of HA apply, with the recognition that compute may be provided by one cluster while storage is provided by another. In the event of a cross-cluster communication issue, an APD will occur 60 seconds after the isolation event, and VM restarts will be attempted after the delay determined by the HA settings (e.g. 180 seconds).
In an HCI Mesh architecture, since VMs running in one cluster may be using storage resources in another cluster, network communication must meet adequate levels of performance so that it does not hinder the workloads. Latency between the clusters adds to the effective latency seen by VMs using resources across clusters. The recommendations are as follows:
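The timing described above can be worked through as simple arithmetic. This is a sketch only: the function and field names are hypothetical, and it assumes the HA restart delay is counted from the moment the APD is declared, which the text above does not state explicitly. If the delay is instead measured from the isolation event, the restart time would simply equal the configured delay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApdTimeline:
    """Seconds measured from the start of the timeline (t = 0)."""
    apd_declared_s: float
    restart_attempt_s: float


def apd_timeline(isolation_s=0.0, apd_timeout_s=60.0, restart_delay_s=180.0):
    """Estimate when the APD is declared and when HA attempts VM restarts.

    ASSUMPTION: restart_delay_s starts counting once the APD is declared.
    The 60 s and 180 s defaults are the example values from the text.
    """
    apd_declared = isolation_s + apd_timeout_s
    return ApdTimeline(apd_declared, apd_declared + restart_delay_s)


timeline = apd_timeline()
print(f"APD declared at t+{timeline.apd_declared_s:.0f}s")
print(f"VM restart attempted at t+{timeline.restart_attempt_s:.0f}s")
```

Under that assumption, and with the example values, VMs on the client cluster would be restarted roughly four minutes after cross-cluster connectivity is lost.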
- A network topology that ensures the highest level of availability, including redundant NICs, switching across clusters, etc.
- Network performance sufficient to reduce the likelihood that the network becomes the performance bottleneck. 25Gbps end-to-end using storage-class gear is recommended.
- Sub-millisecond latency between meshed clusters is the recommended target. The data path may be inherently more complex as it passes across east-west cluster boundaries, which may be reflected in different network topologies. Datastore mounting prechecks warn the administrator if latency is too high (the alert triggers at 5,000 µs (5 ms) or greater), but they do not prevent the mounting of the datastore.
- Use of vSphere Distributed Switches (vDS) is needed to allow for proper bandwidth sharing via NIOC.
- Both L2 and L3 topologies are supported. For L3, routing must be configured for vSAN VMkernel port traffic.
To more easily support layer 3 configurations, vSAN 7 U1 supports overriding the default gateway for a VMkernel port from within the UI.
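The two latency figures above (the sub-millisecond recommendation and the 5 ms precheck alert) can be expressed as a small classification. This is an illustrative sketch and not the actual precheck logic: the function name and the returned labels are hypothetical, and only the two thresholds are taken from the text.

```python
# Thresholds from the recommendations above, in microseconds.
RECOMMENDED_MAX_US = 1_000  # "sub-millisecond" recommendation
PRECHECK_WARN_US = 5_000    # the mount precheck alerts at 5 ms or greater


def classify_latency(latency_us):
    """Classify a measured cross-cluster latency (hypothetical labels)."""
    if latency_us >= PRECHECK_WARN_US:
        # The precheck warns here, but mounting is still allowed.
        return "warning"
    if latency_us >= RECOMMENDED_MAX_US:
        # No alert is raised, but the recommendation is not met.
        return "above-recommendation"
    return "ok"


for sample_us in (400, 1_800, 6_200):
    print(sample_us, classify_latency(sample_us))
```

The middle band matters in practice: a link measuring between 1 ms and 5 ms passes the precheck without an alert, yet still falls short of the recommendation.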
HCI Mesh Requirements
The following use cases are not currently supported:
- Remote provisioning workflows for File Services, iSCSI, or CNS-based block volume workloads (they can exist locally, but cannot be served remotely).
- Air-gapped vSAN networks, or clusters using multiple vSAN VMkernel ports. LACP is supported as an alternative means of aggregating throughput.
- Objects of a VM spanning multiple datastores (e.g. one VMDK in one datastore, and another VMDK for the same VM in another datastore).
- vSAN clusters without local capacity (3 disk groups in 3 hosts are required in both clusters).
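Several of the restrictions above can be screened with a simple check during planning. The sketch below is illustrative only: the ClusterFacts record, its fields, and both functions are hypothetical, and the facts would have to be gathered by the administrator. It reads "3 disk groups in 3 hosts" as at least three hosts each contributing a disk group, which is an interpretation of the text.

```python
from dataclasses import dataclass

MIN_HOSTS_WITH_DISK_GROUPS = 3  # "3 disk groups in 3 hosts"


@dataclass
class ClusterFacts:
    """Hypothetical, manually gathered facts about one cluster."""
    name: str
    vsan_vmk_ports_per_host: int
    hosts_with_disk_groups: int


def unsupported_reasons(cluster):
    """Return the reasons a cluster cannot take part in HCI Mesh."""
    reasons = []
    if cluster.vsan_vmk_ports_per_host > 1:
        reasons.append("multiple vSAN VMkernel ports per host")
    if cluster.hosts_with_disk_groups < MIN_HOSTS_WITH_DISK_GROUPS:
        reasons.append("fewer than 3 hosts with local disk groups")
    return reasons


def spans_multiple_datastores(vmdk_to_datastore):
    """True if one VM's VMDKs are placed on more than one datastore."""
    return len(set(vmdk_to_datastore.values())) > 1


compute_only = ClusterFacts("compute-x", vsan_vmk_ports_per_host=1,
                            hosts_with_disk_groups=0)
print(unsupported_reasons(compute_only))
print(spans_multiple_datastores({"disk0.vmdk": "ds-local",
                                 "disk1.vmdk": "ds-remote"}))
```

A cluster that passes both checks may still be unsupported for other reasons listed above, such as an air-gapped vSAN network, which this sketch does not model.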