A distributed storage system like vSAN relies heavily on the network connecting its hosts for resilient connectivity, performance, and efficiency. Traditionally, vSAN has used TCP to move data between hosts reliably and consistently. vSAN 7 Update 2 adds support for clusters configured for RDMA-based (Remote Direct Memory Access) networking, specifically RoCE v2 (RDMA over Converged Ethernet version 2). Transmitting native vSAN protocols directly over RDMA can offer a level of efficiency that is difficult to achieve with traditional TCP-based connectivity over Ethernet. With vSAN 7 Update 2, vSAN communications and payload can use RDMA, which on its own offers low latency, high throughput, and high IOPS, all with reduced CPU utilization per I/O. While the network is just one factor that contributes to performance and efficiency, RDMA can help keep it from becoming the primary bottleneck.
RDMA provides direct memory access from the memory of one computer to the memory of another without involving the operating system or CPU. This results in lower CPU overhead and improved performance for sequential reads and random mixed read/write workloads.
Consult your switch vendor to identify switches that support RoCE v2. vSAN RDMA Ready network adapters can be found on the vSAN VCG (VMware Compatibility Guide).
vSAN RDMA is enabled with a single toggle. If any host cannot establish an RDMA connection, vSAN automatically reverts the cluster configuration to TCP networking.
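The all-or-nothing nature of this fallback can be illustrated with a small sketch. This is not vSAN code or a VMware API: the `Host` class, its `rdma_capable` flag, and the `select_cluster_transport` function are hypothetical names used only to model the rule that the cluster uses RDMA when the toggle is on and every host can establish an RDMA connection, and TCP otherwise.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    """Hypothetical model of a cluster host (not a vSAN object)."""
    name: str
    rdma_capable: bool  # assumed flag: the host can establish an RDMA connection


def select_cluster_transport(hosts: List[Host], rdma_enabled: bool) -> str:
    """Return "RDMA" only if the toggle is on and every host qualifies, else "TCP"."""
    if rdma_enabled and hosts and all(h.rdma_capable for h in hosts):
        return "RDMA"
    # A single host without RDMA connectivity moves the whole cluster to TCP.
    return "TCP"


if __name__ == "__main__":
    cluster = [Host("esx01", True), Host("esx02", True), Host("esx03", False)]
    print(select_cluster_transport(cluster, rdma_enabled=True))
```

In this model the transport is a cluster-wide decision rather than a per-host one, which mirrors the behavior described above: one host that cannot use RDMA is enough to put every host on TCP.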
NIC Teaming with RDMA
vSAN with RDMA supports NIC failover, but does not support LACP or IP-hash-based NIC teaming.
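As a rough illustration, a pre-flight check for a planned teaming policy might look like the sketch below. The policy labels (`"failover"`, `"lacp"`, `"ip_hash"`) and the function name are hypothetical and do not correspond to vSphere setting names. The sketch covers only the three policies named above and refuses to guess about any other.

```python
# Hypothetical labels for the teaming policies mentioned in the text.
SUPPORTED_WITH_RDMA = {"failover"}
UNSUPPORTED_WITH_RDMA = {"lacp", "ip_hash"}


def teaming_supported_with_rdma(policy: str) -> bool:
    """Return True if the policy is listed as supported with vSAN RDMA."""
    key = policy.strip().lower()
    if key in SUPPORTED_WITH_RDMA:
        return True
    if key in UNSUPPORTED_WITH_RDMA:
        return False
    # The text does not cover other policies, so this sketch does not guess.
    raise ValueError(f"No guidance for teaming policy: {policy!r}")


if __name__ == "__main__":
    for name in ("failover", "lacp", "ip_hash"):
        print(name, teaming_supported_with_rdma(name))
```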
RDMA Health Checks
Additional health checks provide visibility into RDMA when it is enabled in the cluster. These checks verify that the network is configured for lossless traffic using Data Center Bridging (DCB) and that Priority Flow Control (PFC) is configured for a value of 3.
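The logic of such a check can be sketched as follows. This is an illustrative model only, not the vSAN health service: `NicConfig`, its fields, and `check_rdma_nic` are hypothetical names, and the only values taken from the text are the DCB requirement and the expected PFC value of 3.

```python
from dataclasses import dataclass
from typing import List, Optional

EXPECTED_PFC_VALUE = 3  # value named in the text above


@dataclass
class NicConfig:
    """Hypothetical snapshot of an RDMA NIC's lossless-network settings."""
    name: str
    dcb_enabled: bool
    pfc_value: Optional[int]  # None means PFC is not configured


def check_rdma_nic(cfg: NicConfig) -> List[str]:
    """Return a list of findings; an empty list means the NIC passes."""
    findings = []
    if not cfg.dcb_enabled:
        findings.append(f"{cfg.name}: DCB is not enabled")
    if cfg.pfc_value is None:
        findings.append(f"{cfg.name}: PFC is not configured")
    elif cfg.pfc_value != EXPECTED_PFC_VALUE:
        findings.append(
            f"{cfg.name}: PFC value is {cfg.pfc_value}, expected {EXPECTED_PFC_VALUE}"
        )
    return findings


if __name__ == "__main__":
    for nic in (NicConfig("vmnic4", True, 3), NicConfig("vmnic5", False, 4)):
        print(nic.name, check_rdma_nic(nic) or "OK")
```

Returning a list of findings rather than a single pass/fail flag lets one run report every misconfiguration on a NIC at once.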
Host-level network monitoring has also been expanded. Within the vSAN performance service, new metrics and default alarms have been added in vSAN 7 Update 2.
For additional questions about this topic, feel free to reach out to the author John Nicholson on Twitter: @Lost_Signal