Write Buffer Sizing in vSAN when Using the Very Latest Hardware
One of the privileges of presenting at VMworld each year is the opportunity to hear and answer the questions that are top of mind for vSAN customers. This year was certainly no different. A question that came up a few more times than I expected went something like this: "Does the speed of my buffer device used in a disk group change how I should size the buffer devices used for my vSAN cluster?"
The answer is "yes." But let's explore this answer in more detail to properly explain why this is the case. For simplicity, this post will be focused on the performance of write I/Os in an all-flash based vSAN cluster.
The purpose of a two-tier storage system
Caching and buffering exist throughout the data center, from chipsets to network and storage systems. Conceptually, the reason behind a two-tier storage system is simple: the architecture provides a higher level of storage performance while keeping the cost per gigabyte/terabyte of capacity reasonable.
It achieves this through the following, as illustrated in Figure 1:
- Ingesting incoming write I/O through a relatively small, high-performance buffer tier. This sends the write acknowledgments back to the VM as quickly as possible.
- Providing capacity through more value-based, high-density storage. Data in the buffer tier is destaged to the capacity tier at a time and frequency determined by the system.
Figure 1. A two-tier storage system
Depending on the approach, a two-tier architecture also can minimize data manipulation performed at the buffer tier. Processing is reduced, and the write is absorbed into the buffer more quickly, lowering the effective latency seen by the VM. With a two-tier design, data services such as deduplication and compression can be performed as the data is destaged to the capacity tier without an immediate performance penalty to the VM - yet another reason why VMware chose this design.
A two-tier storage system has two theoretical maximum speeds: the speed of the buffer tier and the speed of the capacity tier. It is the combination of those two tiers that helps define the overall performance delivered to the VMs. If there were no buffer tier to burst to, latency would increase, and the I/Os would be spread out over a longer period of time, as illustrated in Figure 2.
Figure 2. Visualizing the bursts of I/O activity in a two-tier system
The duration of burst performance provided by a buffering tier does not last indefinitely. This is because the buffer tier is unable to move the data to the slower performing capacity tier as fast as it can absorb the incoming writes. How long the burst performance lasts is dependent on the amount of incoming I/O to the buffer, the size of the buffer, and the drain rate/performance to the capacity tier. The drain rate is largely dictated by the ingest rate of the more value-based devices that make up the capacity tier.
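The relationship between these three factors can be sketched with simple arithmetic. This is a rough model, not vSAN's actual destaging logic: it assumes constant ingest and drain rates and a buffer that starts empty. The function name and the example rates are hypothetical, chosen only for illustration.

```python
import math


def burst_duration_seconds(buffer_gb: float, ingest_mb_s: float, drain_mb_s: float) -> float:
    """Estimate how long a buffer can absorb writes before it fills.

    Simplified model: constant rates, buffer starts empty, 1 GB = 1000 MB.
    """
    if buffer_gb <= 0 or ingest_mb_s < 0 or drain_mb_s < 0:
        raise ValueError("buffer size must be positive and rates non-negative")
    net_fill_mb_s = ingest_mb_s - drain_mb_s
    if net_fill_mb_s <= 0:
        # The capacity tier keeps up with incoming writes, so the buffer never fills.
        return math.inf
    return buffer_gb * 1000 / net_fill_mb_s


# Hypothetical rates: 600 GB buffer, 2000 MB/s incoming writes, 500 MB/s drain.
print(burst_duration_seconds(600, 2000, 500))  # 400.0
```

In this model, anything that widens the gap between the ingest rate and the drain rate shortens the burst window, and anything that enlarges the buffer lengthens it.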
The good news is that production workloads tend to be bursty, oscillating in I/O activity. This allows a two-tier approach to "catch up" on writes to destage during times of less activity, and achieves the desired result of a two-tier system: top levels of performance for a better cost per terabyte. If there are enough workloads that have this type of bursty behavior, or there are some workloads that have a more sustained write pattern, then the steady-state rate being requested by the VMs may be beyond the steady-state I/O capabilities of the hardware used for the capacity tier.
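The difference between a bursty and a sustained write pattern can be seen in a small simulation. This is a simplified sketch under stated assumptions, not a model of vSAN internals: time advances in one-second steps, the drain rate is constant, and the workload profiles and sizes are hypothetical.

```python
def simulate_buffer_fill(write_rates_mb_s, buffer_mb, drain_mb_s):
    """Return the buffer fill level (MB) after each one-second step.

    Simplified model: fill changes by (incoming - drain) each second and is
    clamped between empty (0) and full (buffer_mb).
    """
    fill = 0.0
    history = []
    for rate in write_rates_mb_s:
        fill += rate - drain_mb_s
        fill = max(0.0, min(fill, float(buffer_mb)))
        history.append(fill)
    return history


# Hypothetical workloads against a 10,000 MB buffer draining at 500 MB/s.
bursty = [2000] * 5 + [100] * 20   # 5 s burst, then 20 s of light activity
sustained = [2000] * 25            # writes never let up

bursty_fill = simulate_buffer_fill(bursty, 10_000, 500)
sustained_fill = simulate_buffer_fill(sustained, 10_000, 500)

print(max(bursty_fill), bursty_fill[-1])        # 7500.0 0.0
print(max(sustained_fill), sustained_fill[-1])  # 10000.0 10000.0
```

The bursty profile peaks below the buffer size and drains back to empty during the quiet period. The sustained profile fills the buffer and keeps it full, at which point performance is governed by the capacity tier.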
Upgrading a cluster with the very latest buffer devices will allow for writes to be absorbed more quickly (reducing latency) while the destaging rate remains the same. This increases the delta in performance between the two tiers, as shown in Figure 3. Assuming the workloads are demanding the resources, this means that a faster buffer device may fill up more quickly than a slower buffer device. It may offer better performance to the VMs, but for a shorter duration of time.
Figure 3. The performance delta between the buffer tier and the capacity tier
As the delta in performance between the buffer tier and the capacity tier increases, so does the fill rate of the buffer.
vSAN's implementation of a two-tier storage system
The two-tier architecture used by vSAN allows customers to take advantage of the very latest, high performing technologies in an affordable way, as described above. vSAN's two-tier architecture uses the concept of disk groups, a logical construct of one device used for caching/buffering, and one to seven devices used for capacity. vSAN allows for one to five disk groups per host, as shown in Figure 4.
Figure 4. Disk groups of a host in a vSAN cluster
vSAN has a logical limit of 600GB for a buffering device in a disk group. In an all-flash vSAN cluster, the entire capacity (up to 600GB) is reserved for write buffering. The buffering tier also acts as a cache for reads of data that has not yet been destaged to the capacity tier. While the logical limit of 600GB for a buffer device exists, devices larger than 600GB are readily available and commonly used. Using a larger device improves its write endurance.
Data is destaged in accordance with a variety of conditions determined by vSAN. This asynchronous task does not destage data immediately, which allows for same-block overwrites to occur on frequently written data without amplifying those writes to the capacity tier. With real workloads, a large percentage of writes can be same-block overwrites. This behavior, paired with vSAN's logic and large buffers, will further reduce the burden on the capacity tier. This is yet another benefit of configurations that maximize buffer capacity.
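The effect of same-block overwrites can be illustrated with a toy buffer that holds only the newest data for each block until it is destaged. This is a conceptual sketch only: the class, its methods, and the dictionary-based layout are hypothetical and do not represent how vSAN stores or destages data.

```python
class WriteBuffer:
    """Toy write buffer that keeps only the newest data for each block."""

    def __init__(self):
        self.pending = {}          # block address -> latest data not yet destaged
        self.writes_received = 0

    def write(self, block, data):
        # A same-block overwrite replaces the pending entry in place.
        self.pending[block] = data
        self.writes_received += 1

    def destage(self):
        """Flush pending blocks and return how many writes reached capacity."""
        destaged = len(self.pending)
        self.pending.clear()
        return destaged


buf = WriteBuffer()
for block, data in [(7, "a"), (9, "b"), (7, "c"), (7, "d"), (3, "e")]:
    buf.write(block, data)

received = buf.writes_received
destaged = buf.destage()
print(received, destaged)  # 5 3
```

Five writes arrive, but block 7 is overwritten twice before destaging, so only three writes reach the capacity tier. The longer data can remain in the buffer, the more overwrites it has a chance to absorb.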
One of the influencing factors in destaging is the percentage of buffer capacity consumed by incoming writes that have not yet been destaged. The smaller the buffer device, or the faster the buffer device, the more quickly it will fill and the sooner it will begin to destage. The sooner destaging begins, the sooner performance may drop below the peak level offered by the buffer.
Yes, you read that correctly. In a two-tier architecture where nothing else has changed, introducing an even faster buffer device is functionally similar to using a smaller, slightly slower buffer device. Therefore, when introducing higher performance buffering devices to your environment, you will want to understand what options are available to you to fully realize the benefits of that faster tier.
The buffer tier's only purpose is to provide improved performance to the VMs powered by the vSAN cluster. If there is a need to improve performance even more by moving to an even faster buffer device, revisiting the design of the hosts to ensure that the newer buffer devices are not unnecessarily constrained by other hardware components or configurations is a wise step. An easy checklist of items would include:
- Strive for larger buffer devices when the performance delta between the buffer tier and the capacity tier is significant. As buffer devices improve in performance and increase the performance delta between the buffer and capacity tiers, a larger buffer will be able to absorb a larger burst of writes, maintaining optimal performance. Sizing according to vSAN's logical limits will provide optimal buffering capacity.
- When using faster and/or potentially smaller high-performance buffer devices, consider adding higher-performing capacity devices. This will improve the rate at which data can be destaged, and the overall steady-state performance of the cluster during long sustained periods of writes. Improving the destage rate to the capacity tier will in effect decrease the fill rate of the buffer, should a more sustained write workload exist. Higher-performing capacity tiers pair nicely with high-performance buffer devices that are smaller in capacity, and provide overall higher vSAN performance.
- Consider adding more disk groups. Since each disk group requires a dedicated buffer device, adding disk groups adds buffer capacity to absorb more bursts of writes (up to 3TB total buffer capacity per host using 5 disk groups). It also increases the parallelism of both the write I/O received by the buffer and the destaging tasks. Larger buffer capacities will also improve the opportunity for reads to be fetched from the buffer tier, as well as for same-block overwrites.
- Consider increasing the number of capacity devices within a disk group. This can potentially improve the rate at which data can be destaged, and the overall steady-state performance of the cluster during long sustained periods of writes.
- Observe your workloads using the vSAN Performance Service. This can provide the insight necessary to determine if performance is being hindered by a buffer tier being overwhelmed. See "Troubleshooting vSAN Performance" for more information.
- Run the very latest version of vSAN. vSAN 6.7 U3 made significant performance improvements for clusters running deduplication & compression. Not only did it improve the consistency of latency provided to VMs, but it also increased the destage rate through software optimizations.
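The buffer capacity figures in the checklist above follow directly from the limits described earlier: a 600GB logical buffer limit per disk group and one to five disk groups per host. A minimal sketch of that arithmetic for an all-flash host is shown below; the function and constant names are hypothetical, and it assumes every disk group uses the same size buffer device.

```python
BUFFER_LIMIT_GB = 600   # logical write buffer limit per disk group (from the text)
MAX_DISK_GROUPS = 5     # maximum disk groups per host (from the text)


def host_buffer_capacity_gb(device_gb: float, disk_groups: int) -> float:
    """Usable write buffer per host, assuming identical buffer devices."""
    if device_gb <= 0:
        raise ValueError("device size must be positive")
    if not 1 <= disk_groups <= MAX_DISK_GROUPS:
        raise ValueError(f"disk groups must be between 1 and {MAX_DISK_GROUPS}")
    # Capacity beyond the logical limit is not used for buffering.
    return min(device_gb, BUFFER_LIMIT_GB) * disk_groups


print(host_buffer_capacity_gb(800, 5))  # 3000
print(host_buffer_capacity_gb(400, 2))  # 800
```

An 800GB device contributes only 600GB of buffer per disk group, so five disk groups reach the 3TB per-host figure, while two disk groups with 400GB devices provide 800GB.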
Note that the performance of a specific hardware configuration is dependent on the demands of your applications, and the configuration of your cluster. Your applications may have very short periods of write bursts, or your performance of the capacity tier may be quite good. In those cases, no further adjustments may be necessary in order to realize the full benefits of the faster buffer devices.
Write buffering is an extremely effective way to deliver higher performance while keeping capacity costs reasonable. With a two-tier storage system like vSAN, introducing the very latest, fastest storage device at the buffer tier allows for writes to be acknowledged more quickly, delivering lower latency to a VM. Ensuring that you have enough buffer capacity with these higher-performing devices is a great step toward ensuring their performance benefits can be fully realized.