VMware vVols – Performance Evolution and Future
VMware Virtual Volumes (vVols) is an external storage solution for vSphere. It is designed to scale and speed up virtual disk operations such as creation, snapshotting, and cloning. vVols improves performance by offloading these workflows to the storage array through the VASA API, which each storage vendor implements in a VASA Provider.
This article discusses how vVols has evolved over the past few releases with respect to the scalability and performance of VM workflows.
The workflows described below are optimized on the vSphere side only. Depending on the implementation of the VASA Provider (VP), an environment may see further improvements or regressions. Overall, these figures show that with a good VP implementation, vVols datastores achieve scalable performance comparable to traditional storage.
Please note, actual results may vary based on numerous variables in the environment.
Performance Benchmark Setup
vVols are storage objects that serve as virtual disks for virtual machines in VMware vSphere. The setup needs both vCenter and ESXi to measure end-to-end provisioning performance for a virtual machine backed by vVols storage. The benchmark is run against multiple releases of vSphere and ESXi.
The tested workflows measure common VM operations (snapshot, register, power-on, power-off, etc.). Operations are measured on a single host, with concurrent operations of the same kind running on different VMs; this measures the vertical scalability of VM operations on a single host. The host runs at most 120 VMs, and up to 16 operations execute concurrently on 16 VMs. Because an operation is triggered on every VM, 120 operations are queued at once, of which 16 are actively served by the host at any time.
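The queueing model above can be sketched in a few lines of Python. This is only an illustration of "120 queued, 16 in flight", not the actual benchmark harness: the function names and the sleep-based stand-in operation are hypothetical, and a thread pool is simply one convenient way to cap concurrency.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

TOTAL_VMS = 120       # VMs on the single host (from the setup above)
MAX_CONCURRENT = 16   # operations actively served at once (from the setup above)


def run_benchmark(vm_operation, total_vms=TOTAL_VMS, max_concurrent=MAX_CONCURRENT):
    """Queue one operation per VM; return (per-VM latencies, peak concurrency)."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def timed(vm_id):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        start = time.perf_counter()
        try:
            vm_operation(vm_id)
        finally:
            elapsed = time.perf_counter() - start
            with lock:
                active -= 1
        return elapsed

    # All operations are submitted up front; the pool serves at most
    # max_concurrent of them at any time and queues the rest.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        latencies = list(pool.map(timed, range(total_vms)))
    return latencies, peak


def fake_power_on(vm_id):
    """Hypothetical stand-in for a real VM operation."""
    time.sleep(0.001)


latencies, peak = run_benchmark(fake_power_on)
print(f"{len(latencies)} operations completed, peak concurrency {peak}")
```

The per-operation latency collected here excludes time spent waiting in the queue; a harness that measures end-to-end time as seen by the caller would start the clock at submission instead.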
For snapshot benchmarking, we use VMs with different numbers of disks and multiple levels of snapshots. In addition, there are benchmark workflows measuring a couple of optimizations that directly or indirectly affect the performance of other user-triggered workflows, e.g., Datastore Browser or AllocatedBitmapVirtualVolume (the latter obtains the set of blocks actually written on a vVol).
We have chosen vSphere 7.0 U3 and 7.0 P06 releases to measure the performance of the different workflows. All the optimizations should be available in the next update of vSphere 8 along with some additional performance improvements not measured in this blog.
When using vVols, vSphere offloads a large share of each workflow to the storage array over the VASA control path, while part of the workflow is still done in ESXi. The ESXi part is often just bookkeeping and issuing calls to the VASA Provider, but together these steps dictate the overall time taken by any provisioning operation. This makes the efficiency of both the VASA Provider and ESXi crucial for the scalability and performance of VM operations.
VM Power-related workflows
VM power operations are frequent in the datacenter and hence become critical when running at scale. The amount of resources a power-on cycle consumes on the external storage affects the scalability of the datacenter overall.
Previously, vSphere allocated storage for the swap disk on every power-on operation. Swap disks are backed by a special vVol type called “Swap vVol”. vSphere 7.0 P06 keeps the swap vVol across reboots, which reduces the overall power-on time by up to 50% and allows for ultra-fast power-off by reducing the power-off time by up to 80%. It also benefits storage arrays, since the VASA Provider no longer needs to ask the storage array to create and destroy a swap vVol for each power cycle. The trade-off in keeping the swap vVol across VM reboots is an increase in consumed storage space. VMware chose this trade-off in favor of VM power-on/off performance, but in future releases it may be eliminated by more efficient storage management for the swap vVol at power-off.
Other optimizations, like opening the disks attached to a VM in batches (called batch bind) while powering it on, make the control path less busy than sequential open operations that ask the VASA Provider to open each disk individually. Opening all related disks in a single batch (limited by the batch size supported by the VASA Provider) leaves the VASA Provider free to serve a greater number of operations.
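To make the effect of batch bind concrete, here is a small Python sketch that counts control-path round trips for sequential versus batched opens. The function names are hypothetical and the model is deliberately simple: it assumes one round trip per call and ignores per-call payload size.

```python
from math import ceil


def sequential_bind_calls(num_disks):
    """One VASA round trip per disk when disks are opened one at a time."""
    return num_disks


def batch_bind_calls(num_disks, batch_size):
    """Round trips needed when up to batch_size disks are opened per call."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return ceil(num_disks / batch_size)


def make_batches(disk_ids, batch_size):
    """Split disk IDs into batches no larger than the provider's batch size."""
    return [disk_ids[i:i + batch_size] for i in range(0, len(disk_ids), batch_size)]


disks = [f"disk-{n}" for n in range(16)]   # illustrative 16-disk VM
for size in (1, 4, 16):                    # illustrative provider batch sizes
    print(size, batch_bind_calls(len(disks), size), "calls")
```

With a provider batch size that covers all disks of the VM, the power-on needs a single bind call; with a smaller batch size the saving shrinks accordingly, which is why the supported batch size matters.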
The first power-on operation shows the improvement gained from the batch open operation alone, while the second power-on shows the additional benefit of keeping the swap vVol across power cycles.
Overall, second and subsequent power-on operations are 2x faster and power-off operations are 8x faster than in the 7.0 U3 release. These benchmarks represent the VM operation time when 120 VMs are concurrently being powered on or powered off on a single ESXi host.
VM Registration/Un-registration related workflow
Registration and un-registration workflows are important both when adding an existing VM to the vCenter inventory and when migrating a VM (compute, storage, or both). Hence the performance of these operations is critical, and they should be as fast as possible for the best user experience. In 7.0 P06 they have been improved significantly by re-engineering the steps required to make a vVol-backed VM available in the vCenter inventory. These changes benefit all workflows that trigger VM register/unregister operations, for example migration, replication, and deletion of a VM.
With these changes, register and unregister operations are more scalable and performant. In our benchmarks, compared to the 7.0 U3 release, the register operation improved by 15% and the unregister workflow improved by a full 87% on vSphere 7.0 P06.
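This article quotes some gains as percentages and others as "Nx faster". Assuming a percentage here means a reduction in elapsed time (an assumption on our part, since the article does not define it), the two forms are related by speedup = 1 / (1 - reduction). A minimal Python sketch of the conversion, with hypothetical helper names:

```python
def reduction_to_speedup(reduction):
    """Convert a fractional time reduction (0.5 = 50% less time) to an Nx speedup."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)


def speedup_to_reduction(speedup):
    """Convert an Nx speedup to the fractional reduction in elapsed time."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return 1 - 1 / speedup


# Figures quoted for register (15%) and unregister (87%).
for label, reduction in (("register", 0.15), ("unregister", 0.87)):
    print(f"{label}: {reduction_to_speedup(reduction):.2f}x")
```

Under that assumption, a 15% reduction corresponds to roughly 1.2x and an 87% reduction to roughly 7.7x.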
VM Snapshot workflow
Snapshots are performance-critical operations in a virtualized environment. Since they are frequently executed by backup applications, they need to be performant, and the penalty of stun (the period where the VM is not executing because it is quiesced) should be minimal. VM stun time is a visible marker of snapshot performance and directly affects the user experience, which makes it a critical parameter of any snapshot operation. At the same time, the overall time taken to complete the operation needs to be minimized to keep it scalable for backup tools and applications.
Workflow: Snapshot performance depends on the depth of the snapshot tree and the number of disks on the VM. Hence, the benchmark setup covers scenarios with 1, 2, and 4 levels of snapshots on VMs that have either 4 disks or 16 disks. The disks are 40% used, and moderate IO is running against them while the benchmark is performed.
The snapshot workflow used to slow down with a large number of disks, because the number of VASA calls needed to complete the snapshot grows with each disk. The optimizations discussed below address this, so an increase in the number of disks now has much less impact on overall snapshot performance.
Note: All the times measured are in seconds.
Before looking at the benchmark results, we should understand the snapshot workflow with vVols. In vVols, a snapshot is a two-phase process. In the first phase, ESXi asks the VASA Provider to prepare for a snapshot and does the bookkeeping. In the second (commit) phase, ESXi stuns the VM and asks the VASA Provider to complete (commit) the snapshot; once that is done, it updates the virtual disk descriptors before resuming the VM.
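The two phases can be sketched as follows. This is a simplified illustration, not ESXi code: the provider class, its method names, and the descriptor representation are all hypothetical stand-ins, chosen only to show which steps fall inside the stun window.

```python
import time


class FakeVasaProvider:
    """Hypothetical in-memory provider; method names are illustrative only."""

    def __init__(self):
        self.calls = []

    def prepare_snapshot(self, disk_id):
        self.calls.append(("prepare", disk_id))
        return f"snap-of-{disk_id}"

    def commit_snapshot(self, snapshot_id):
        self.calls.append(("commit", snapshot_id))


class FakeVm:
    """Hypothetical VM model: a stun flag and one descriptor list per disk."""

    def __init__(self, disk_ids):
        self.disk_ids = list(disk_ids)
        self.stunned = False
        self.descriptors = {d: [] for d in self.disk_ids}


def snapshot_vm(vm, provider):
    """Run the two-phase snapshot and return the stun time in seconds."""
    # Phase 1 (prepare): the VM keeps running while the provider gets ready.
    prepared = {d: provider.prepare_snapshot(d) for d in vm.disk_ids}

    # Phase 2 (commit): the VM is stunned only for commit + descriptor update.
    vm.stunned = True
    stun_start = time.perf_counter()
    try:
        for snapshot_id in prepared.values():
            provider.commit_snapshot(snapshot_id)
        for disk_id, snapshot_id in prepared.items():
            vm.descriptors[disk_id].append(snapshot_id)
    finally:
        vm.stunned = False  # resume the VM even if a commit fails
    return time.perf_counter() - stun_start


vm = FakeVm(["disk-0", "disk-1"])
provider = FakeVasaProvider()
stun_seconds = snapshot_vm(vm, provider)
print(provider.calls)
```

Everything between setting and clearing the stun flag counts toward stun time, which is why trimming the work done in the commit phase matters more for the user experience than trimming the prepare phase.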
As explained above, most of the heavy lifting for a snapshot operation is done by the VASA Provider, but there was still moderate scope to improve the overall stun time and elapsed time on ESXi. In vSphere 7.0 P06 we now use batched commits for snapshots (aka batched/vectored snapshots in VASA) to reduce the elapsed time and make the operation more scalable. In the benchmarks, this reduced the snapshot creation time by up to 15%.
The optimization comes from batching the snapshot calls made while the VM is stunned and from removing all non-essential VASA calls to save time spent on the control path. One thing to keep in mind is that each vendor may support a different batch size, which can also change the performance characteristics of the snapshot.
There are also stun time improvements when creating a snapshot: a snapshot of a VM with up to 16 disks can now be taken with sub-second stun time. We reduced the stun time by re-engineering the bookkeeping workflow with host-side caching of metadata and improved virtual disk descriptor handling. This yielded up to a 35% reduction in stun time in the best case (16 disks) and a 15% reduction on average for the benchmarks above.
vVols Datastore Browser
In vVols, the datastore browser remains a critical component, since most of the time it is not possible to explore the datastore in any other way. It needs to be performant and scalable when a storage container backs a large number of VMs. vSphere 7.0 P06 includes optimizations for the datastore browser when exploring a vVols container using the vSphere UI.
Let us first understand the workflow before looking at the numbers. When a user browses the datastore via vSphere, ESXi queries the VASA Provider using the “queryVirtualVolume” VASA call to get the unique IDs of all namespace objects, also known as “Config vVols”, and then inspects the metadata of each of these config vVols. This requires sending a large number of arguments to the VASA Provider. If the VASA Provider does not support a large number of arguments (batch size), the calls become sequential and require multiple round trips to the VASA Provider.
Below we can see the results of the improvements made in 7.0 P06. The benchmark measures the overall time taken to browse the datastore for up to 1024 Config vVol objects. Previously, browser performance was dictated by the performance of the VASA Provider. With the improved implementation, browsing has become a constant time operation that is independent of VASA Provider performance. Hence the datastore browser can respond in a few milliseconds, compared to earlier releases where it took many seconds.
In 7.0 P06, ESXi caches the metadata of the Config vVols (up to a maximum of 1024), so that after the first browse of the datastore, a subsequent browse only causes VASA calls to find new or deleted config vVols and their metadata. Such changes happen only with VM creation or deletion, which makes the cache reliable and makes browsing a constant time operation, compared to earlier versions that had to make multiple round trips to the VASA Provider.
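A minimal sketch of this caching idea is shown below. It is not the ESXi implementation: the provider class and its methods are hypothetical, and it assumes the provider offers an inexpensive way to list current config vVol IDs so that only new entries need a metadata lookup.

```python
class FakeVasaProvider:
    """Hypothetical provider holding config vVol metadata; counts lookups."""

    def __init__(self, config_vvols):
        self.config_vvols = dict(config_vvols)   # vVol ID -> metadata
        self.metadata_lookups = 0

    def list_config_vvol_ids(self):
        return set(self.config_vvols)

    def get_metadata(self, vvol_id):
        self.metadata_lookups += 1
        return self.config_vvols[vvol_id]


class DatastoreBrowserCache:
    """Host-side metadata cache, capped at 1024 entries as described above."""

    MAX_ENTRIES = 1024

    def __init__(self, provider):
        self.provider = provider
        self.cache = {}

    def browse(self):
        current_ids = self.provider.list_config_vvol_ids()
        # Drop entries for config vVols that no longer exist (deleted VMs).
        for stale_id in set(self.cache) - current_ids:
            del self.cache[stale_id]
        listing = {}
        for vvol_id in sorted(current_ids):
            if vvol_id not in self.cache:
                # Only new config vVols cost a metadata lookup.
                metadata = self.provider.get_metadata(vvol_id)
                if len(self.cache) >= self.MAX_ENTRIES:
                    listing[vvol_id] = metadata   # cache full: serve uncached
                    continue
                self.cache[vvol_id] = metadata
            listing[vvol_id] = self.cache[vvol_id]
        return listing


provider = FakeVasaProvider({"cfg-1": "vm-a", "cfg-2": "vm-b", "cfg-3": "vm-c"})
browser = DatastoreBrowserCache(provider)
browser.browse()                      # first browse populates the cache
first = provider.metadata_lookups
browser.browse()                      # second browse is served from the cache
second = provider.metadata_lookups
print(first, second)
```

After the first browse, the number of metadata lookups depends only on how many config vVols were created since the last browse, not on the total number in the container.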
The performance benchmark represented below shows up to a 60x performance improvement while exploring the datastore browser, which has now become a constant time operation.
Note: All the times measured are in milliseconds.
VASA Workflow Optimization
The changes described above were made in ESXi and have yielded stable and scalable VM and datastore operations. A separate effort has gone into improving the utilization of VASA Provider resources. The following section discusses an improvement in the VASA API that speeds up VM operations.
The allocatedBitmapVirtualVolume VASA API is critical for sVMotion and backup-related workflows. It is triggered by any workflow that needs the block allocation information of a vVol object. The call returns an allocation bitmap for a given range of bytes on the disk.
With VASA 3.5, the call also returns additional information for thin-provisioned virtual disks, which is useful for sparsely allocated disks. VASA 3.5 allows vendors to provide a hint, “nextAllocatedChunk”, which scans ahead and reports where the next allocated block is. Once ESXi has this information, it can fill in the non-allocated blocks itself (unwritten blocks just return zeroes) without invoking the VASA Provider again for that range. This reduces the time needed to get the allocated bitmap for the entire disk. Hence the overall elapsed time for Storage vMotion and backup operations is reduced if the VASA Provider implements the hint.
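The effect of the hint on the number of provider calls can be illustrated with a toy model. This sketch is not the VASA wire format: the disk is modeled as a list of per-chunk booleans, and the provider class, its method, and the shape of the hint (index of the next allocated chunk, or None) are assumptions made for illustration.

```python
class FakeVasaProvider:
    """Hypothetical provider: allocation is one boolean per chunk."""

    def __init__(self, allocation):
        self.allocation = list(allocation)
        self.calls = 0

    def allocated_bitmap(self, start, length):
        """Return (bits for [start, start+length), index of next allocated chunk)."""
        self.calls += 1
        end = min(start + length, len(self.allocation))
        bits = self.allocation[start:end]
        next_allocated = next(
            (i for i in range(end, len(self.allocation)) if self.allocation[i]),
            None,
        )
        return bits, next_allocated


def scan_disk(provider, total_chunks, segment, use_hint):
    """Build the full allocation bitmap, optionally skipping ahead via the hint."""
    bitmap = [False] * total_chunks          # unallocated until proven otherwise
    pos = 0
    while pos < total_chunks:
        bits, next_allocated = provider.allocated_bitmap(pos, segment)
        bitmap[pos:pos + len(bits)] = bits
        pos += len(bits)
        if use_hint:
            # Chunks between pos and the hint are known to be unallocated,
            # so no provider call is needed for them.
            pos = total_chunks if next_allocated is None else next_allocated
    return bitmap


# Sparse disk: allocated at the start and at the end, empty in between.
layout = [True] * 4 + [False] * 92 + [True] * 4

plain = FakeVasaProvider(layout)
plain_bitmap = scan_disk(plain, len(layout), segment=4, use_hint=False)

hinted = FakeVasaProvider(layout)
hinted_bitmap = scan_disk(hinted, len(layout), segment=4, use_hint=True)

print(plain.calls, hinted.calls)
```

Both scans produce the same bitmap; the hinted scan simply reaches it with fewer provider calls, and the saving grows with how sparse the disk is.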
Note that this hint is not implemented by all vendors, so the performance improvement in backup solutions and sVMotion is seen only with vendors who have implemented this part of the VASA 3.5 specification.
The benchmark tests the effectiveness of the allocated bitmap hint when different allocation patterns are applied to the disk. It uses a 500 GB thin-provisioned disk and scans the disk with the VASA API mentioned earlier in the two patterns described below.
Storage vMotion – Here we scan the disk allocation using a 2GB segment at a time, which improves the effectiveness of the hint since we have information for the next few 2GB segments and for each trip to the VASA Provider we would get the additional hint for the next few 2GB segments.
Full Scan – Here the disk is being scanned for the entire disk bitmap based on the chunk size or block size supported by the array. This operation gets broken down into multiple allocatedBitmapVirtualVolume VASA calls, so based on the block size of the VASA Provider and the 4 MB max bitmap restriction of the VASA API, we get different
The largest gain is seen when the disk is sparsely allocated: allocatedBitmapVirtualVolume takes 3x less time to scan the disk during storage migration of the VM, which allows faster migration. The optimization also reduces the scan time by up to 35% when the disk is densely filled at the beginning, and by up to 15% when the disk has an allocation at every 512 MB offset.
We continue to work towards larger-scale operations and making the control path more performant for vVols. By scaling the VASA Provider, large clusters with shared datastores may have HA Clusters and disaggregated storage that is not feasible with traditional storage.
Yogender Solanki is one of our vVols engineers, and he wrote this document to help customers understand the performance aspects of vVols that our engineers have been working on and continue to improve.