Performance Monitoring Enhancements in vSAN 7 U2
Time-based performance metrics found in vCenter Server can be one of the most effective ways to monitor and troubleshoot the performance of individual VMs and of the vSAN cluster that provides their resources. These metrics give important insight and context in a way that is quantifiable and consistent. They help you answer common questions such as "what level of resources is a given workload demanding?" and "what is the response time (latency) of storage I/O over a given time period?"
The bursty nature of workloads means that spikes in resource usage will inevitably occur across a cluster. Spikes may be arbitrary or follow a pattern in their intervals and amplitude. Some of those spikes may come from VMs that regularly demand much higher levels of resources than other VMs. Such VMs are sometimes referred to as noisy neighbors, and they can affect the performance of other systems. vCenter Server, and more specifically the vSAN performance service, provides VM-level and aggregated cluster-level metrics such as IOPS, throughput, and latency for all of the workloads in the vSAN cluster. These aggregate views are helpful in many ways, but in a cluster of hundreds or thousands of VMs, they raise a natural question: "Can I easily identify the top contributors to resource utilization across a cluster at any given point in time?"
With vSAN 7 U2, the answer is, "yes!"
The new "Top Contributors" view
vSAN 7 U2 introduces the new "Top Contributors" view to help administrators quickly see which VMs and disk groups place the most demand on the resources provided by the vSAN cluster. To find the view, select the desired cluster, then choose "Monitor" > "vSAN Performance" > "VM", as shown in Figure 1.
Figure 1. The "Top Contributors" view in the vSAN performance views of vCenter Server
By selecting the desired cluster-level metric, such as IOPS, throughput, or latency, the view will provide an enumerated list of top contributors, sorted by reads or writes of the selected metric.
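The ranking behind this list can be pictured with a small sketch. It assumes per-VM samples for one selected point in time have already been collected; the `Sample` record, its field names, and the `top_contributors` function are hypothetical illustrations, not part of any vSAN or vCenter API.

```python
from dataclasses import dataclass
from typing import List, Literal, Tuple

# Hypothetical per-VM metrics at a single sample point.
# Field names are illustrative only; they do not come from the vSAN API.
@dataclass
class Sample:
    name: str
    read_iops: float
    write_iops: float
    read_mbps: float
    write_mbps: float
    read_latency_ms: float
    write_latency_ms: float

Metric = Literal["iops", "mbps", "latency_ms"]
Direction = Literal["read", "write"]

def top_contributors(
    samples: List[Sample], metric: Metric, direction: Direction, n: int = 5
) -> List[Tuple[str, float]]:
    """Return the n VMs with the highest value for the chosen metric and direction."""
    field = f"{direction}_{metric}"  # e.g. "write_mbps"
    ranked = sorted(samples, key=lambda s: getattr(s, field), reverse=True)
    return [(s.name, getattr(s, field)) for s in ranked[:n]]

samples = [
    Sample("db01", 1200, 8500, 40, 310, 1.2, 4.8),
    Sample("web01", 300, 150, 10, 5, 0.6, 0.9),
    Sample("batch01", 900, 6200, 35, 240, 1.0, 3.9),
]
print(top_contributors(samples, "mbps", "write", n=2))
# → [('db01', 310), ('batch01', 240)]
```

Switching the `direction` argument between "read" and "write" mirrors choosing whether the list is sorted by reads or by writes of the selected metric.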
Imagine that you periodically check the load on a cluster and discover a new series of large throughput spikes that occur every day at 1:00 am and last for several hours. You could use the "Top Contributors" view to identify the VMs contributing to that burst in load, along with the relevant details: when it started, which systems are contributing to the load, and how much strain they are placing on the system. This can help determine the root cause of an issue, such as inefficient batch processes that a database administrator recently deployed on OLTP database servers.
One interesting aspect of this list of contributors is that it is not limited to viewing VMs. This view can present the list of top disk groups that are contributing to the use of resources across the cluster, with full drill-down capabilities to view the performance of that given disk group, as shown in Figure 2.
Figure 2. Viewing the Top Contributors by disk group
When you choose to list disk groups, the view sorts all disk groups by load, based on reads or writes of the selected IOPS, throughput, or latency metric. Expanding an entry lets you drill into the full performance view of that disk group or view the individual disks that make up the disk group.
This view is useful in several scenarios, such as determining whether one disk group is underperforming compared with the other disk groups in the cluster. You can quickly see whether a disk group on one host is experiencing much higher latency than disk groups on other hosts across the cluster, and if so, drill into the full performance attributes of that disk group.
Note that the list of VMs or disk groups reflects a single point in time, selected by clicking on the time-based chart. Selecting a windowed area of the chart will ONLY zoom in to that window of time; it will not show the top contributors for the selected period.
The Top Contributors view pairs nicely with the VM Consolidated Performance View introduced in vSAN 7 U1. For example, you could first identify the heavy-hitting VMs in the "Top Contributors" view, then overlay those VMs on a single chart using the "Show specific VMs" view. Adding a critical workload or two to that chart can help you determine whether the spikes in resource utilization are affecting an important tier-one application.
Performance metrics viewed through vCenter Server represent a key strength of vSphere: they let you measure the right data, in the right way, from the right location. The vSAN metrics in vCenter Server are the first place customers should look when they suspect performance issues in their vSAN-powered cluster. The new "Top Contributors" view adds another tool to help determine what is demanding the most resources at any point in time across a vSAN cluster.