Running Tier-1 Apps with HCI Mesh Solution Overview

Solution Overview

vSAN HCI Mesh Delivers Efficiency for Changing Application Requirements

VMware® vSAN™ is the industry-leading software powering hyperconverged infrastructure (HCI) solutions. vSAN is optimized for VMware vSphere® virtual machines and natively integrated with vSphere. Since drives internal to the vSphere hosts are used to create a vSAN datastore, there is no dependency on expensive, difficult-to-manage external shared storage.

The scale-out architecture of VMware vSAN enables powerful non-disruptive scale-up and scale-out capabilities. You can non-disruptively expand capacity and performance by adding hosts to a cluster (scale-out), or grow capacity alone by adding disks to existing hosts (scale-up). As application workloads grow organically, this allows performance to be right-sized at each expansion interval, and over time the ratio of storage to compute can be tuned through vSAN (a small sizing sketch follows the list below). Inorganic scale events, however, can prove challenging to any architecture. Examples include:

• An application refactor requires that significant storage be added for logs.

• A new line of business for analytics may consume excessive compute, potentially stranding storage assets.

• M&A may result in new business units bringing unforeseen storage requirements to a cluster.

• The deployment of network virtualization, powered by NSX, enables the migration and consolidation of legacy clusters, bringing their previously stranded resources along.
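
To make the scale-up/scale-out distinction concrete, here is a minimal sketch. The host counts, core counts, and capacities are purely illustrative assumptions, not VMware sizing guidance:

    # Illustrative only: scale-out grows compute and capacity together,
    # while scale-up grows capacity alone, shifting the TB-per-core ratio.
    def cluster_totals(hosts, cores_per_host, tb_per_host):
        """Aggregate compute and raw capacity for a uniform vSAN cluster."""
        return hosts * cores_per_host, hosts * tb_per_host

    cores, tb = cluster_totals(hosts=4, cores_per_host=32, tb_per_host=20)
    print(f"Baseline:  {cores} cores, {tb} TB ({tb / cores:.2f} TB/core)")

    # Scale-out: one more host adds compute and capacity in lockstep.
    cores, tb = cluster_totals(hosts=5, cores_per_host=32, tb_per_host=20)
    print(f"Scale-out: {cores} cores, {tb} TB ({tb / cores:.2f} TB/core)")

    # Scale-up: 10 TB more per host adds capacity but no compute.
    cores, tb = cluster_totals(hosts=4, cores_per_host=32, tb_per_host=30)
    print(f"Scale-up:  {cores} cores, {tb} TB ({tb / cores:.2f} TB/core)")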

vSAN HCI Mesh Architecture

VMware vSAN HCI Mesh™ can be implemented with as few as two clusters: a local cluster that provides the storage, and a remote cluster that provides compute for the workload. All clusters are deployed within the same datacenter. The diagram below shows what the high-level architecture might look like for an organization with three clusters. In this example, one cluster provides storage to two remote clusters. It is also possible for a cluster to both receive storage from and share storage with another cluster. For more technical information and requirements, see the vSAN HCI Mesh Tech Note.

[Figure: High-level vSAN HCI Mesh architecture, with one cluster providing storage to two remote compute clusters]

Since the cluster providing storage must handle external storage traffic as well as intra-cluster replication traffic, network connectivity requirements are critical. 25Gbps networking is strongly recommended for clusters that will share storage with remote clusters.
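
As a back-of-the-envelope illustration (the per-host traffic figures below are assumptions, not measurements), a host in the storage-providing cluster can saturate a 10Gbps link under the combined load, while 25Gbps retains headroom:

    # Illustrative arithmetic: a host serving storage carries replication
    # traffic for its own cluster plus front-end traffic for remote clusters.
    GBPS_TO_MB_S = 1000 / 8  # 1 Gbps = 125 MB/s (decimal units)

    replication_mb_s = 600    # assumed intra-cluster replication/resync load
    remote_client_mb_s = 900  # assumed front-end I/O served to remote clusters
    demand = replication_mb_s + remote_client_mb_s

    for gbps in (10, 25):
        ceiling = gbps * GBPS_TO_MB_S
        state = "saturated" if demand > ceiling else "headroom remains"
        print(f"{gbps:>2} Gbps link: {ceiling:,.0f} MB/s vs {demand:,} MB/s demand -> {state}")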

Hosts at the remote clusters should be in the same datacenter and ideally be directly connected to the cluster hosting the storage. This reduces the opportunity for network contention, minimizes complexity, and improves reliability.

vSAN is built on the concept of fault domains. The use of HCI Mesh expands the fault domain for a virtual machine from its local cluster to potentially two clusters. HCI Mesh does, however, allow vSphere updates for the cluster running a virtual machine's compute to proceed separately from updates to the cluster powering its storage.

Configuring vSAN HCI Mesh is simple: the vSphere Web Client is used to mount a remote datastore.
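
For those who prefer automation, the sketch below shows the general shape in Python with pyVmomi. The object lookup is standard pyVmomi; the mount step itself is left as a clearly hypothetical placeholder, because the exact vSAN Management SDK reconfigure spec should be taken from VMware's HCI Mesh SDK samples rather than from this overview. All names (vCenter address, cluster, datastore) are illustrative.

    # Sketch only. Object lookup uses standard pyVmomi; the mount itself is a
    # hypothetical placeholder for the vSAN Management SDK call.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    def find_by_name(content, vimtype, name):
        """Return the first managed object of the given type with the given name."""
        view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
        try:
            return next(obj for obj in view.view if obj.name == name)
        finally:
            view.DestroyView()

    def mount_remote_vsan_datastore(si, cluster, datastore):
        """Hypothetical placeholder: the real operation is a vSAN cluster
        reconfigure (via the vSAN Management SDK) that adds `datastore` to the
        client cluster's remote datastore list. See VMware's HCI Mesh samples."""
        raise NotImplementedError("wire this up with the vSAN Management SDK")

    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="...", sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        client_cluster = find_by_name(content, vim.ClusterComputeResource, "compute-cluster-01")
        remote_ds = find_by_name(content, vim.Datastore, "vsanDatastore-server-01")
        mount_remote_vsan_datastore(si, client_cluster, remote_ds)
    finally:
        Disconnect(si)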

HCI Mesh Tier 1 Application Testing

Today, VMware vSAN is commonly used for traditional tier 1 applications. As part of validating VMware HCI Mesh, several tier 1 applications were tested to better understand the scaling and performance characteristics of HCI Mesh. The following applications were tested:

    SAP HANA
    Oracle
    Microsoft SQL Server
    CockroachDB
    Kafka

Several key observations were made as a result of this testing.

Large Block Benchmarks - Network Performance Is Key

The SAP HANA benchmarks focus on log read and write performance. Some of the observations were:

  • Transaction rates and latency were significantly better on the local cluster than on the remote cluster.
  • The 10Gbps network used in the test environment was found to be the bottleneck.
  • Large-throughput reads did not benefit significantly from the client cache on the remote hosts.

Storage-light workloads that rely on small-block reads can scale effectively on 10Gbps. Write-heavy and large-block read workloads can expose 10Gbps networking as a bottleneck for the clusters providing storage, since those hosts must carry both backend replication traffic and front-end storage traffic to the remote compute clusters.
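
A quick, purely arithmetic way to see this: a single network link caps throughput-limited IOPS in inverse proportion to block size, so 10Gbps leaves ample room for small-block reads but saturates quickly under large blocks:

    # Illustrative arithmetic: IOPS a link can carry before bandwidth, not the
    # storage itself, becomes the ceiling.
    def link_limited_iops(link_gbps, block_kb):
        bytes_per_s = link_gbps * 1e9 / 8
        return bytes_per_s / (block_kb * 1024)

    for gbps in (10, 25):
        for kb in (4, 256):
            print(f"{gbps:>2} Gbps, {kb:>3} KB blocks: "
                  f"~{link_limited_iops(gbps, kb):,.0f} IOPS before the wire saturates")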

VMware Recommends - Deploy 25Gbps networking (or faster) for clusters needing to deliver high performance. LACP or larger interfaces are particularly important for the cluster providing storage to the remote clusters.

Small Block Read-Heavy Benchmarks - Could the Remote Cluster Be "Faster"?

In Microsoft SQL Server testing, the remote cluster was in some cases found to be slightly (10-15%) faster.

Why is this? In this case, it was a result of:

  • A complete lack of contention with storage data services, on a workload that was otherwise configured and deployed to consume 100% of the host's CPU.
  • A workload that was not taxing the network or exposing cluster-to-cluster connectivity as a bottleneck.
  • In these cases, the cluster hosting the storage was otherwise idle; in effect, twice as much compute was being provided to the entire solution.
  • Read workloads that fit within the DRAM-based client cache do not need to be read from the cluster providing storage (a rough sketch of this follows the list).
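
A rough way to reason about that last point: the client cache is small, so only compact, read-hot working sets avoid the network hop. The cache sizing below is an assumption to verify against your vSAN version (it is commonly described as a small fraction of host DRAM with a per-host cap):

    # Rough sketch: only compact, read-hot working sets fit in the small
    # DRAM-based client cache; everything else traverses the network to the
    # cluster providing storage. Cache sizing (0.4% of DRAM, 1 GB cap) is an
    # assumption -- verify for your vSAN version.
    def fits_client_cache(working_set_gb, host_dram_gb, fraction=0.004, cap_gb=1.0):
        cache_gb = min(host_dram_gb * fraction, cap_gb)
        return working_set_gb <= cache_gb, cache_gb

    for ws_gb in (0.5, 8.0):
        fits, cache_gb = fits_client_cache(ws_gb, host_dram_gb=512)
        verdict = "served from local DRAM" if fits else "read over the network"
        print(f"{ws_gb:>4} GB working set vs {cache_gb:.2f} GB cache: {verdict}")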

Performance for applications running in the remote cluster was largely comparable to the local cluster when compute was the primary bottleneck. For workloads that fit within the storage network's ability to deliver, there is no significant advantage to placing the workload locally versus remotely.

 

Application Testing Conclusions

In general, for workloads that do not put hosts under extreme CPU load or push the limits of the storage network, latency and transaction rates were found to be comparable (within 2-15%) between the local and remote clusters. VMware CPU efficiency features (such as DRS) and workload tuning (designing around NUMA node placement, for example) are likely to have a greater influence on application performance than the choice of local or remote placement for these non-outlier virtual machines. Simple decisions, such as deploying higher-clocked and more core-rich CPUs along with 25/100Gbps NICs and switching, will likely outweigh any overhead from the feature itself.

For tier 1 application best practices, see the application-specific VMware vSAN solution briefs and reference architectures.

Filter Tags

  • Storage
  • Overview
  • Deployment Considerations
  • Technical Overview
  • Document
  • vSAN
  • vSAN 6.7
  • vSAN 7