VMware Horizon 7 on VMware vSAN 6.2 All-Flash
This section covers the Business Case, Solution Overview and Key Results of the VMware Horizon 7 on VMware vSAN 6.2 All-Flash document.
Customers today who want to deploy a virtual desktop infrastructure on All-Flash require a cost-effective, highly scalable, and easy-to-manage solution. Applications need to be refreshed and published at will and should not require multiple levels of IT administration. Most importantly, the infrastructure itself must be able to scale with minimal cost impact yet still provide enterprise-class performance.
All-Flash storage products have traditionally been regarded as too expensive. VMware vSAN™ changes this by supporting All-Flash configurations with extreme performance and radically simple management at a cost that is lower than many competing hybrid solutions.
VMware Horizon 7 on VMware vSAN 6.2 delivers the required solution. Combined with VMware App Volumes™ 2.11, the just-in-time desktop provisioning of Instant Clone Technology, and the vSAN space efficiency features, it enables the virtual desktop infrastructure to be rapidly deployed and managed.
This document provides recommendations and details on the deployment and performance of 200 virtual desktops per VMware vSphere® host using VMware Horizon and vSAN All-Flash.
Table 1 summarizes the results of All-Flash vSAN with Horizon 7.
Table 1. Key Results Overview
This section provides the purpose, scope and the intended audience of this document.
This reference architecture provides a standard, repeatable, and highly scalable design that can be easily adapted to specific environments and customer requirements. It aims to establish a common customer virtual desktop infrastructure (VDI) environment using Horizon 7 on All-Flash vSAN 6.2 with substantial storage cost savings.
This reference architecture:
- Demonstrates the storage performance, scalability, and resilience of Horizon-based VDI deployments using All-Flash vSAN.
- Validates that instant clones, introduced in Horizon 7, work well with App Volumes 2.11 on vSAN to manage desktops and applications.
- Proves vSAN with space efficiency features enabled can easily support sustainable workloads with minimal resource overhead and impact on desktop application performance.
This reference architecture is intended for customers—IT architects, consultants, and administrators—involved in the early phases of planning, design, and deployment of VDI solutions using VMware Horizon running on All-Flash vSAN. It is assumed that the reader is familiar with the concepts and operations of Horizon technologies and VMware vSphere products.
This section provides an overview of the technologies that are used in this solution:
- VMware vSphere 6.0 Update 2
- VMware vSAN 6.2
  - All-Flash architecture
  - Deduplication and compression
  - Erasure coding
  - Client cache
  - Sparse swap
- VMware Horizon 7
  - Instant Clone Technology
- VMware App Volumes 2.11
VMware vSphere 6.0 Update 2
VMware vSphere is the industry-leading virtualization platform for building cloud infrastructures. It enables users to run business-critical applications with confidence and respond quickly to business needs. vSphere accelerates the shift to cloud computing for existing data centers and underpins compatible public cloud offerings, forming the foundation for the industry’s best hybrid cloud model. VMware vSphere 6.0 Update 2 supports the following new features that can benefit the solution:
- High Ethernet link speed: VMware ESXi™ 6.0 Update 2 supports 25G and 50G Ethernet link speeds.
- VMware host client: the VMware host client is an HTML5 client used to connect to and manage single ESXi servers.
- VMware vSAN 6.2: the new VMware vSAN 6.2 is an integral part of ESXi 6.0 Update 2.
VMware vSAN 6.2
VMware vSAN is VMware’s software-defined storage solution for hyperconverged infrastructure, a software-driven architecture that delivers tightly integrated compute, networking, and shared storage from a single virtualized x86 server.
With the major enhancements in vSAN 6.2, vSAN provides enterprise-class scale and performance as well as new capabilities that broaden the applicability of the proven vSAN architecture to business-critical environments. The new features of vSAN 6.2 include:
- Deduplication and compression: software-based deduplication and compression optimizes All-Flash storage capacity, providing as much as 7x data reduction with minimal CPU and memory overhead.
- Erasure coding: erasure coding increases usable storage capacity by up to 100 percent while keeping data resiliency unchanged. It is capable of tolerating one or two failures with single parity or double parity protection.
- Software checksum: end-to-end data checksum detects and resolves silent errors to ensure data integrity. This feature is policy-driven.
- Client cache: this feature uses local dynamic random access memory (DRAM) on each host to accelerate read performance for virtual machines. The cache is sized at 0.4 percent of total host memory, up to a maximum of 1GB per host. Client cache extends the DRAM caching of CBRC to linked clones, App Volumes, and other non-replica components.
With these new features, vSAN 6.2 provides the following advantages:
- VMware HyperConverged Software (HCS)-powered All-Flash solutions available at up to 50 percent less than the cost of other competing hybrid solutions in the market.
- Increased storage utilization by as much as 10x through new data efficiency features including deduplication and compression, and erasure coding.
- Future-proof IT environments with a single platform supporting business-critical applications, OpenStack, and containers with up to 100K IOPS per node at sub-millisecond latencies.
All-Flash Architecture
All-Flash vSAN aims to deliver extremely high IOPS with predictable low latencies. An All-Flash vSAN configuration commonly uses two grades of flash devices: lower-capacity, higher-endurance devices for the cache layer, and more cost-effective, higher-capacity, lower-endurance devices for the capacity layer. Writes are performed at the cache layer and then destaged to the capacity layer only as needed. This extends the usable life of the lower-endurance flash devices in the capacity layer and lowers the overall cost of the solution.
Figure 1. vSAN All-Flash Datastore
Deduplication and Compression
Near-line deduplication and compression happen during destaging from the caching tier to the capacity tier. Deduplication and compression is a cluster-wide setting that is deactivated by default and can be enabled using a simple drop-down menu. The deduplication algorithm uses a fixed 4KB block size and operates within each disk group: redundant copies of a block within the same disk group are reduced to one copy, but redundant blocks across multiple disk groups are not deduplicated. As a result, larger disk groups might achieve a higher deduplication ratio. Blocks are compressed after they are deduplicated.
Figure 2. Deduplication and Compression for Space Efficiency
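The per-disk-group behavior described above can be illustrated with a minimal Python sketch. It is a toy model, not vSAN's implementation: the SHA-256 hash, the use of zlib as the compressor, and the rule that a compressed copy is kept only if it fits in 2KB are all assumptions made for illustration. What it does show faithfully is the ordering (deduplicate, then compress) and the fact that duplicates are only removed within a single disk group.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096        # vSAN 6.2 deduplicates in fixed 4KB blocks
COMPRESSED_LIMIT = 2048  # assumption: a compressed copy is kept only if it fits in 2KB


def split_blocks(data: bytes) -> list[bytes]:
    """Split a byte stream into fixed 4KB blocks, zero-padding the last one."""
    return [
        data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        for offset in range(0, len(data), BLOCK_SIZE)
    ]


def destage_to_disk_group(blocks: list[bytes]) -> dict[str, int]:
    """Deduplicate, then compress, the blocks destaged to ONE disk group.

    Returns logical bytes written and physical bytes stored in the capacity tier.
    """
    stored: dict[bytes, int] = {}  # block hash -> physical size of its single copy
    for block in blocks:
        digest = hashlib.sha256(block).digest()  # hash choice is an assumption
        if digest in stored:
            continue  # duplicate inside this disk group: no new copy is stored
        compressed = zlib.compress(block)  # zlib stands in for the real compressor
        stored[digest] = len(compressed) if len(compressed) <= COMPRESSED_LIMIT else BLOCK_SIZE
    return {"logical": len(blocks) * BLOCK_SIZE, "physical": sum(stored.values())}


if __name__ == "__main__":
    common = bytes(range(256)) * 16            # one highly compressible 4KB block
    unique = random.Random(0).randbytes(4096)  # one incompressible 4KB block
    blocks = split_blocks(common * 10 + unique)

    one_group = destage_to_disk_group(blocks)
    # The same blocks split across two disk groups: duplicates are not shared.
    two_groups = [destage_to_disk_group(blocks[:5]), destage_to_disk_group(blocks[5:])]
    print(one_group, sum(g["physical"] for g in two_groups))
```

Splitting the same data across two disk groups stores the repeated block twice, which is why larger disk groups can yield a higher deduplication ratio.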
Erasure Coding
Erasure coding provides the same levels of redundancy as mirroring, but with a reduced capacity requirement. In general, erasure coding is a method of taking data, breaking it into multiple pieces and spreading it across multiple devices, while adding parity data so it may be recreated in the event that one or more pieces are corrupted or lost.
In vSAN 6.2, two modes of erasure coding are supported:
- RAID 5 in 3+1 configuration, which means 3 data blocks and 1 parity block per stripe.
- RAID 6 in 4+2 configuration, which means 4 data blocks and 2 parity blocks per stripe.
RAID 5 requires a minimum of four hosts because it uses a 3+1 layout; with four hosts, one host can fail without data loss. Compared with mirroring, this significantly reduces the required disk capacity: a 20GB disk requires 40GB of capacity with RAID 1 mirroring (FTT=1), but only about 27GB with RAID 5 (20GB × 4/3).
Figure 3. RAID 5 Data and Parity Placement
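To make the parity idea concrete, here is a minimal Python sketch of single-parity protection for a 3+1 stripe, using byte-wise XOR. It is illustrative only: each list element stands for a component placed on a different host, the function names are hypothetical, and the sketch does not claim to reproduce vSAN's on-disk format. RAID 6 needs a second, independent parity calculation and is not modeled here.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Build a 3+1 stripe: three data components plus one XOR parity component."""
    if len(data_blocks) != 3:
        raise ValueError("RAID 5 in vSAN 6.2 uses 3 data blocks per stripe")
    return list(data_blocks) + [xor_blocks(*data_blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recreate the single missing component (marked None) from the other three."""
    missing = [i for i, component in enumerate(stripe) if component is None]
    if len(missing) != 1:
        raise ValueError("single parity recovers exactly one lost component")
    survivors = [component for component in stripe if component is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(*survivors)
    return repaired


if __name__ == "__main__":
    stripe = write_stripe([b"AAAA", b"BBBB", b"CCCC"])
    damaged = stripe[:1] + [None] + stripe[2:]  # the host holding component 1 fails
    print(rebuild(damaged) == stripe)
```

Because the XOR of all four components is zero, any one missing component equals the XOR of the other three; losing two components exceeds what single parity can repair.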
RAID 6 tolerates two host failures. Tolerating two failures with RAID 1 mirroring requires three copies, so a 20GB disk needs 60GB of capacity; with RAID 6 it needs only 30GB (20GB × 6/4). Note that parity is distributed across all hosts; there is no dedicated parity host. Because RAID 6 uses a 4+2 layout, at least six hosts are required.
Figure 4. RAID 6 Data and Parity Placement
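The capacity figures quoted for RAID 1, RAID 5, and RAID 6 follow from the ratio of total components to data components. The short Python sketch below reproduces them for a 20GB disk; it deliberately ignores witness components, metadata, and other vSAN overheads, so treat it as back-of-the-envelope arithmetic rather than a sizing tool.

```python
from fractions import Fraction

# Raw capacity consumed per unit of VM data (ignores witness and metadata overhead).
OVERHEAD = {
    "RAID 1, FTT=1": Fraction(2),           # two full mirror copies
    "RAID 5 (3+1), FTT=1": Fraction(4, 3),  # 3 data + 1 parity component
    "RAID 1, FTT=2": Fraction(3),           # three full mirror copies
    "RAID 6 (4+2), FTT=2": Fraction(6, 4),  # 4 data + 2 parity components
}


def required_gb(vmdk_gb: float, scheme: str) -> float:
    """Raw vSAN capacity needed to protect a disk of vmdk_gb under a scheme."""
    return float(vmdk_gb * OVERHEAD[scheme])


if __name__ == "__main__":
    for scheme in OVERHEAD:
        print(f"{scheme:22} 20GB disk -> {required_gb(20, scheme):.1f}GB")
```

The output matches the text: 40GB versus about 27GB at FTT=1, and 60GB versus 30GB at FTT=2.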
Space efficiency features (including deduplication, compression, and erasure coding) work together to provide up to 10x reduction in dataset size.
Client Cache
vSAN 6.2 has a small in-memory read cache: 0.4 percent of a host’s memory capacity, up to a maximum of 1GB. This is a client-side cache, meaning that a VM’s blocks are cached on the host where the VM is running. This feature is enabled by default.
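The sizing rule above (0.4 percent of host memory, capped at 1GB) is simple enough to express directly. The Python sketch below assumes 1GB = 1024MB; it shows that any host with more than about 250GB of memory hits the cap, so the 512GB hosts used in this solution each get the full 1GB.

```python
MB_PER_GB = 1024


def client_cache_mb(host_memory_gb: float) -> float:
    """Client cache size: 0.4 percent of host memory, capped at 1GB per host."""
    return min(0.004 * host_memory_gb * MB_PER_GB, 1 * MB_PER_GB)


if __name__ == "__main__":
    for memory_gb in (64, 128, 256, 512):
        print(f"{memory_gb}GB host -> {client_cache_mb(memory_gb):.0f}MB client cache")
```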
Sparse Swap
Sparse swap is one of the vSAN 6.2 space efficiency features and is available for both All-Flash and hybrid configurations. By default, swap (.vswp) files on vSAN are created with 100 percent of their space reserved. By setting the advanced host setting “SwapThickProvisionDisabled”, the swap file is provisioned thin, and disk space is claimed only as the swap file is consumed.
Recommendation: Enable swap file thin provisioning only in environments where physical memory is not overcommitted. If physical memory is overcommitted, keep the default setting of thick-provisioned swap files.
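To see why sparse swap matters at VDI scale, the Python sketch below estimates the capacity claimed by swap objects across a desktop pool. Several inputs are hypothetical: Table 5 does not list desktop memory, so 2GB per desktop is an illustrative value, and swap objects are assumed to be mirrored with two copies (RAID 1, FTT=1). The swap file size is configured memory minus the memory reservation.

```python
def swap_claimed_gb(vm_count: int, vm_memory_gb: float, reservation_gb: float = 0.0,
                    thin: bool = False, swapped_fraction: float = 0.0,
                    copies: int = 2) -> float:
    """vSAN capacity claimed by the .vswp objects of a desktop pool.

    A swap file is sized at configured memory minus the memory reservation.
    Thick (default) swap reserves the whole file up front; sparse (thin) swap
    claims only the fraction actually swapped to. `copies` is the number of
    mirror copies (assumed 2, i.e. RAID 1 with FTT=1).
    """
    swap_file_gb = vm_memory_gb - reservation_gb
    per_vm = swap_file_gb * swapped_fraction if thin else swap_file_gb
    return vm_count * per_vm * copies


if __name__ == "__main__":
    # Hypothetical pool: 1,000 desktops with 2GB each and no memory reservation.
    print("thick swap:", swap_claimed_gb(1000, 2), "GB")
    print("sparse swap, no swapping:", swap_claimed_gb(1000, 2, thin=True), "GB")
```

The trade-off matches the recommendation above: thin swap only saves capacity safely when memory is not overcommitted, because heavy swapping would claim that space anyway.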
VMware Horizon 7
VMware Horizon desktop and application virtualization solutions provide organizations with a streamlined approach to delivering, protecting, and managing desktops and applications while containing costs and ensuring that end users can work anytime, anywhere, and across any device.
With the introduction of Horizon 7, VMware is drawing on the best of mobile and cloud, offering greater simplicity, security, speed, and scale in delivering on-premises virtual desktops and applications with cloud-like economics and elasticity of scale. With this release, customers can enjoy:
- Just-in-time desktops—leverage Instant Clone Technology coupled with App Volumes to dramatically accelerate the delivery of user-customized and fully personalized desktops. Dramatically reduce infrastructure requirements while enhancing security by delivering a brand new personalized desktop and application services to end users every time they log in.
- VMware App Volumes—provides real-time application delivery and management.
- VMware User Environment Manager™—offers personalization and dynamic policy configuration across any virtual, physical, and cloud-based environment.
- Horizon Smart Policies—deliver a real-time, policy-based system that provides contextual, fine-grained control. IT can now intelligently enable or deactivate client features based on user device, location, and more.
- Blast Extreme—purpose-built and optimized for the mobile cloud, this new additional display technology is built on industry-standard H.264, delivering a high-performance graphics experience accessible on billions of devices including ultra-low-cost PCs.
Instant Clone Technology
VMware introduced Instant Clone Technology with the release of Horizon 7. It rapidly clones a running parent virtual machine in memory and uses copy-on-write to deploy the clones quickly.
- Administrators can quickly provision desktops from a parent virtual machine whenever they are needed, just in time for a user to log in. Because provisioning is this fast, fewer spare desktops need to be kept ready in advance.
- Instant clones do not need to be refreshed, recomposed, or rebalanced. After a user logs out, the desktop is always deleted and recreated as a fresh image from the latest patched parent image.
- Desktop-pool image changes can be scheduled during the day with no downtime for users or for the desktop pool, giving the View administrator full control over when users receive the latest image.
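The copy-on-write idea behind instant clones can be sketched as a toy Python model. This is not VMware's implementation: it models shared disk blocks only (not the in-memory fork of the running parent), and the class names `ParentImage` and `InstantClone` are hypothetical. Each clone reads from the shared parent until it writes, and a logoff simply discards the clone's private changes.

```python
from typing import Dict


class ParentImage:
    """Read-only state (disk blocks) of a running, quiesced parent VM."""

    def __init__(self, blocks: Dict[int, bytes]):
        self._blocks = dict(blocks)

    def read(self, lba: int) -> bytes:
        return self._blocks.get(lba, b"\0")


class InstantClone:
    """Child that shares every block with its parent until it writes."""

    def __init__(self, parent: ParentImage):
        self.parent = parent
        self.delta: Dict[int, bytes] = {}  # only the blocks this clone has changed

    def read(self, lba: int) -> bytes:
        # Prefer the clone's private copy; otherwise fall through to the shared parent.
        if lba in self.delta:
            return self.delta[lba]
        return self.parent.read(lba)

    def write(self, lba: int, data: bytes) -> None:
        self.delta[lba] = data  # copy-on-write: the parent is never modified

    def logoff(self) -> "InstantClone":
        # Discarding the delta and recreating the clone yields a fresh desktop.
        return InstantClone(self.parent)


if __name__ == "__main__":
    parent = ParentImage({0: b"os", 1: b"app"})
    desktop = InstantClone(parent)
    desktop.write(1, b"user-data")
    print(desktop.read(1), desktop.logoff().read(1))
```

Because clones hold only their deltas, creating one is cheap, which is what makes just-in-time provisioning and delete-on-logoff practical.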
VMware App Volumes 2.11
VMware App Volumes 2.11 is an integrated and unified application delivery and end-user management system for Horizon and virtual environments:
- Quickly provision applications at scale.
- Dynamically attach applications to users, groups, or devices, even when users are logged into their desktop.
- Provision, deliver, update, and retire applications in real time.
- Provide a user-writable volume allowing users to install applications.
App Volumes makes it easy to deliver, update, manage, and monitor applications and users across VDI and published application environments. It uniquely provides applications and user environment settings to desktop and published application environments and reduces management costs by efficiently delivering applications from one virtual disk to many desktops or published application servers. Provisioning applications requires no packaging, no modification, and no streaming.
App Volumes 2.11 is designed to support Instant Clone Technology in conjunction with VMware Horizon 7.
- App Volumes works by binding applications and data into specialized virtual containers called AppStacks that are attached to each Windows user session at login or reboot, ensuring the most current applications and data are delivered to the user.
- App Volumes integrates a simple agent-server-database architecture into an existing View deployment. Centralized management servers are configured to connect to deployed virtual desktops that run an App Volumes Agent. An administrator can grant application access to shared storage volumes for users, virtual machines, or both.
Figure 5 shows the major components of a View environment where App Volumes is deployed.
Figure 5. App Volumes High-Level Architecture
This section introduces the resources and configurations for the solution including:
- Architecture diagram
- Hardware resources
- Software resources
- Virtual Machine test image build
- Network configuration
- vSAN configuration
- VMware View Storage Accelerator
- VMware Cluster configuration
- VMware ESXi Server: storage controller mode
- Horizon 7 installation
Figure 6 shows the architectural design of the vSphere cluster.
Figure 6. vSphere Cluster Design
Two vSAN clusters were used in the environment (see Table 2 for hardware resources):
- An 8-node All-Flash vSAN cluster was deployed to support 1,000 virtual desktops.
- A 4-node hybrid vSAN management cluster was deployed to support the infrastructure, management, and Login VSI launcher virtual machines used for scalability testing.
Table 2. Hardware Resources
|Server||8 x rack server|
|CPU||2 sockets, 10 cores per socket, Intel(R) Xeon(R) CPU at 3 GHz|
|Network adapter||2 x Intel 10 Gigabit SFI/SFP|
|Storage adapter||2 x 12Gbps SAS PCI-Express|
|Disks||SSD: 2 x 800GB class-D 6Gbps SAS drive as cache SSD; 8 x 400GB class-D 6Gbps SAS drive as capacity SSD|
Table 3 shows the software resources used in this solution, and Table 4 lists the system configurations for the different server roles.
Table 3. Software Resources
|VMware vCenter and ESXi||6.0 Update 2||ESXi Cluster to host virtual machines and provide vSAN Cluster. VMware vCenter Server provides a centralized platform for managing VMware vSphere environments|
|VMware vSAN||6.2||Software-defined storage solution for hyperconverged infrastructure|
|VMware Horizon||7||Horizon 7 offers greater simplicity, security, speed, and scale in delivering on-premises virtual desktops and applications while offering cloud-like economics and elasticity of scale.|
Table 4. System Configuration
|Infrastructure VM Role||vCPU||RAM (GB)||Storage (GB)||OS|
|Domain Controller and DNS||2||6||40||Windows Server 2012 R2 64-bit|
|SQL Server (Composer DB)||4||8||140||Windows Server 2012 R2 64-bit|
|Horizon View 7 Composer||4||8||100||Windows Server 2012 R2 64-bit|
|Horizon View 7 Connection Server||4||10||60||Windows Server 2012 R2 64-bit|
|App Volumes 2.11||4||8||100||Windows Server 2012 R2 64-bit|
Virtual Machine Test Image Build
Two different virtual machine images were used to provision desktop sessions in the View environment: one for instant clones and the other for linked clones, both used with App Volumes and Login VSI. The images were optimized with the VMware OS Optimization Tool. The two test images have the same configuration except for the View Agent component: the Horizon View Composer Agent is selected for the linked-clone image, and the Horizon Instant Clone Agent is selected for the instant-clone image.
Table 5. Virtual Machine Test Images
|ATTRIBUTE||LOGIN VSI IMAGE|
|Desktop OS||Windows 7 Enterprise SP1 (64-bit)|
|Hardware||VMware Virtual Hardware version 11|
|Virtual network adapter 1||VMXNet3 Adapter|
|Virtual SCSI controller 0||Paravirtual|
|Virtual disk—VMDK 1||30GB|
|Applications||Adobe Acrobat 11, Adobe Flash Player 16, Doro PDF 1.82, Internet Explorer 11, Microsoft Office 2010|
|VMware View Agent®||7.00-3634043|
A VMware vSphere Distributed Switch™ (VDS) acts as a single virtual switch across all associated hosts in the data cluster. This setup allows virtual machines to maintain a consistent network configuration as they migrate across multiple hosts.
Figure 7. vSphere Distributed Switch
Network I/O Control was enabled on the distributed switch, and the resource allocation settings and share values listed in Table 6 were applied.
Table 6. Resource Allocations for Network Resources in vSphere Distributed Switch
|NETWORK RESOURCE POOL||HOST LIMIT (MBPS)||PNIC SHARES||SHARES|
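Shares in Network I/O Control only take effect when an uplink is congested: each traffic type that is actively sending receives bandwidth in proportion to its shares. The Python sketch below illustrates that proportional split on one of the 10 Gigabit adapters from Table 2. The share assignment is hypothetical (Table 6's values are not reproduced here); 25, 50, and 100 are the vSphere Low, Normal, and High presets, and host limits and reservations are ignored for simplicity.

```python
SHARE_LEVELS = {"low": 25, "normal": 50, "high": 100}  # vSphere share presets


def split_bandwidth(link_gbps: float, active_pools: dict[str, int]) -> dict[str, float]:
    """Divide a congested uplink among the traffic types currently sending,
    in proportion to their shares. Host limits and reservations are ignored."""
    total_shares = sum(active_pools.values())
    return {pool: link_gbps * shares / total_shares
            for pool, shares in active_pools.items()}


if __name__ == "__main__":
    # Hypothetical share assignment, for illustration only.
    pools = {
        "vSAN": SHARE_LEVELS["high"],
        "virtual machine": SHARE_LEVELS["normal"],
        "vMotion": SHARE_LEVELS["low"],
    }
    for pool, gbps in split_bandwidth(10, pools).items():
        print(f"{pool:16} {gbps:.2f} Gbps")
```

When a traffic type goes idle, its portion is redistributed among the remaining active pools, which is why shares are a relative rather than absolute guarantee.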
Linked clones and instant clones use vSAN for storage. Each ESXi server has the same configuration of two disk groups, each consisting of one 800GB cache-tier SSD and four 400 GB capacity-tier SSDs.
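The disk group layout above determines the raw size of the datastore. In an All-Flash configuration only the capacity-tier SSDs count toward datastore capacity, so the Python sketch below multiplies out the capacity tier and applies the RAID 1 and RAID 5 ratios. It ignores slack space, metadata overhead, and any deduplication and compression savings, so actual usable capacity will differ.

```python
HOSTS = 8
DISK_GROUPS_PER_HOST = 2
CAPACITY_SSDS_PER_GROUP = 4
CAPACITY_SSD_GB = 400  # cache-tier SSDs (800GB) are not counted toward datastore capacity


def raw_capacity_gb() -> int:
    """Raw capacity-tier size of the All-Flash cluster."""
    return HOSTS * DISK_GROUPS_PER_HOST * CAPACITY_SSDS_PER_GROUP * CAPACITY_SSD_GB


def usable_gb(raw_gb: int, data: int, redundancy: int) -> int:
    """Capacity left for VM data when each `data` units need `redundancy` extra units."""
    return raw_gb * data // (data + redundancy)


if __name__ == "__main__":
    raw = raw_capacity_gb()
    print(f"raw capacity tier:   {raw}GB")                  # 25600GB
    print(f"RAID 1, FTT=1:       {usable_gb(raw, 1, 1)}GB")  # 12800GB
    print(f"RAID 5 (3+1), FTT=1: {usable_gb(raw, 3, 1)}GB")  # 19200GB
```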
vSAN Storage Policy
vSAN can apply availability, capacity, and performance policies to each virtual machine deployed on the vSAN datastore. Horizon creates default storage policies automatically, and we modified them to enable or deactivate specific vSAN features. Table 7 shows the storage policy settings for RAID 1 with software checksum deactivated, and Table 8 shows the settings for RAID 5.
Table 7. vSAN Storage Settings with RAID 1 and Software Checksum Disabled
|Number of Failures to Tolerate (FTT)||1|
|Number of disk stripes per object||1|
|Flash read cache reservation||0%|
|Object space reservation||0%|
|Disable object checksum||Yes|
Table 8. vSAN Storage Settings with RAID 5
|Number of FTT||1|
|Number of disk stripes per object||1|
|Flash read cache reservation||0%|
|Object space reservation||0%|
|Failure tolerance method||RAID 5 (erasure coding)-capacity|
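The policies in Tables 7 and 8 also imply a minimum cluster size. The Python sketch below models a subset of these rules and computes that minimum: mirroring places FTT+1 replicas plus FTT witnesses on separate hosts (2 × FTT + 1 hosts), while erasure coding needs one host per component (4 for RAID 5, 6 for RAID 6). The `StoragePolicy` class and its field names are hypothetical and are not the vSphere storage policy API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Subset of the vSAN 6.2 rules in Tables 7 and 8 (field names are illustrative)."""
    failures_to_tolerate: int = 1
    method: str = "RAID1"  # "RAID1" (mirroring) or "RAID5/6" (erasure coding)
    stripes_per_object: int = 1
    flash_read_cache_reservation_pct: int = 0
    object_space_reservation_pct: int = 0
    checksum_disabled: bool = False


def min_hosts(policy: StoragePolicy) -> int:
    """Smallest cluster that can satisfy the policy's failure tolerance."""
    ftt = policy.failures_to_tolerate
    if policy.method == "RAID1":
        return 2 * ftt + 1  # ftt + 1 replicas plus ftt witnesses
    if policy.method == "RAID5/6":
        if ftt == 1:
            return 4  # RAID 5, 3+1
        if ftt == 2:
            return 6  # RAID 6, 4+2
        raise ValueError("erasure coding supports FTT=1 or FTT=2 only")
    raise ValueError(f"unknown failure tolerance method: {policy.method}")


if __name__ == "__main__":
    table7 = StoragePolicy(checksum_disabled=True)
    table8 = StoragePolicy(method="RAID5/6")
    print(min_hosts(table7), min_hosts(table8))
```

Both policies used here fit comfortably in the 8-node All-Flash cluster.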
Several vSAN 6.2 feature combinations are used in this solution. Table 9 lists the abbreviations used in this reference architecture to represent the feature configuration. We set FTT to 1 in all settings.
Table 9. Feature Configurations and Abbreviations
|NAME||CHECKSUM||RAID LEVEL||DEDUPLICATION AND COMPRESSION||CLIENT CACHE||SPARSE SWAP|
VMware View Storage Accelerator
View Storage Accelerator is an in-memory host caching capability that uses the Content-Based Read Cache (CBRC) feature in ESXi hosts. CBRC provides a per-host RAM-based solution for View desktops, considerably reducing the read I/O requests that are issued to the storage layer. It also improves performance during boot storms when multiple virtual desktops are booted at once, which causes a large number of reads. CBRC is beneficial when administrators or users load applications or data frequently.
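The key property of a content-based read cache is that entries are keyed by block content rather than by VM or disk, so identical blocks read by many desktops on a host are cached once. The Python sketch below illustrates that idea. It is not VMware's CBRC implementation: the class and function names are hypothetical, and the SHA-1 digest and LRU eviction are assumptions chosen for illustration.

```python
import hashlib
from collections import OrderedDict


def digest_map(disk: dict[int, bytes]) -> dict[int, bytes]:
    """Precompute a content digest for every block of a virtual disk."""
    return {lba: hashlib.sha1(block).digest() for lba, block in disk.items()}


class ContentReadCache:
    """Host-wide RAM cache keyed by block content digest, not by VM or disk."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.entries: "OrderedDict[bytes, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, disk: dict[int, bytes], digests: dict[int, bytes], lba: int) -> bytes:
        digest = digests[lba]
        if digest in self.entries:
            self.entries.move_to_end(digest)  # LRU bookkeeping
            self.hits += 1
            return self.entries[digest]
        self.misses += 1
        block = disk[lba]  # this read I/O reaches the storage layer
        self.entries[digest] = block
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return block


if __name__ == "__main__":
    cache = ContentReadCache(capacity_blocks=8)
    base = {0: b"windows-boot", 1: b"office"}
    for _ in range(100):  # a boot storm: 100 desktops read the same boot block
        desktop = dict(base)
        cache.read(desktop, digest_map(desktop), 0)
    print(cache.misses, cache.hits)
```

In this toy boot storm only the first read reaches storage, which is the effect the accelerator aims for when many desktops boot at once.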
VMware Cluster Configuration
Table 10 lists the VMware cluster configuration. VMware vSphere High Availability (vSphere HA) and DRS should be enabled on the cluster.
Table 10. ESXi Cluster Configuration
|Cluster features||vSphere HA||–||Enabled|
VMware ESXi Server: Storage Controller Mode
The storage controller supports both pass-through and RAID mode. It is recommended to use controllers that support the pass-through mode with vSAN to lower complexity and ensure performance.
Horizon 7 Installation
The installation of View in Horizon includes the following core systems:
- Two connection servers
- One vCenter Server (vCenter Appliance) with the following roles:
  - vCenter single sign-on (SSO)
  - vCenter Inventory Service
- View in Horizon Composer
- App Volumes Manager
App Volumes delivers native applications to VMware Horizon virtual desktops on-demand through VMDKs. App Volumes Manager plays two roles:
- Administrator—Provisions new AppStacks, assigns AppStacks with applications to VMs, and monitors processes and usage.
- Service provider—Brokers the assignment of applications to end users, groups of users, and computers.
Note: We did not use security servers during this testing.
vCenter Server Settings
View Connection Server uses vCenter Server to provision and manage View desktops. vCenter Server is configured in View Manager as shown in Table 11.
Table 11. View Manager—vCenter Server Configuration
|Description||View vCenter Server|
|Connect using SSL||Yes|
|View Composer Port||18443|
|Enable View Composer||Yes|
Max Concurrent vCenter Provisioning Operations
Max Concurrent Power Operations
Max Concurrent View Composer Maintenance Operations
Max Concurrent View Composer Provisioning Operations
Max Concurrent Instant Clone Engine Provisioning Operations
Enable View Storage Accelerator
Default Host Cache Size
App Volumes Settings
All applications were delivered in a single AppStack, except Internet Explorer (IE), which is installed in the OS by default.
Table 12. App Volumes—AppStack Configuration
|Location||[vsanDatastore] cloudvolumes/apps/newapp.vmdk (6,536 MB)|
|Template||[vsanDatastore] cloudvolumes/apps_templates/template.vmdk (2.11.0)|
|Applications||Adobe_Flash_Player_16_ActiveX, Adobe_Reader_XI_-11.0.10, Doro_1.82, FreeMind, Microsoft_Office_Professional_Plus_2010|
In this section, we present the test methodologies and processes used in this reference architecture.
The solution validates that the All-Flash vSAN storage platform can deliver the required performance for 1,000 desktops with App Volumes 2.11 on vSAN 6.2 with its new features enabled. The solution includes the following tests:
- Performance benchmarking testing: to measure the VDI performance using Login VSI (Knowledge Worker).
- View operation testing: to validate that the new vSAN features reduce the total storage required while delivering excellent performance on Horizon 7.
- Resiliency testing: to ensure that vSAN can support sustained workloads under predictable failure scenarios with limited impact on application performance.
- App Volumes backup: to verify the backup utility of App Volumes allows VMware App Volumes AppStacks and Writable Volumes to be backed up and recovered.
We used the following monitoring and benchmark tools in the solution:
- Monitoring tools
- vSAN Performance Service
The performance service collects and analyzes performance statistics and displays the data in a graphical format. vSAN administrators can use the performance charts to manage the workload and determine the root cause of problems. When the vSAN Performance Service is turned on, the cluster summary displays an overview of vSAN performance statistics, including IOPS, throughput, and latency. vSAN administrators can view detailed performance statistics for the cluster, for each host, disk group, and disk in the vSAN Cluster.
- esxtop
esxtop is a command-line tool that collects data and provides real-time information about resource usage in a vSphere environment, such as CPU, disk, memory, and network usage. We used this tool to measure ESXi server performance.
- VMware vRealize™ Operations Manager
VMware vRealize Operations Manager delivers intelligent operation management with application-to-storage visibility across physical, virtual, and cloud infrastructures. Using policy-based automation, operation teams automate key processes and improve IT efficiency.
- Workload testing tool
- Login VSI 4.1.5
We used Login VSI in Benchmark mode with 20 sessions to measure VDI performance in terms of the Login VSI baseline performance score (also called VSIbase or Login VSI index average score). The baseline performance score is based on the response time to the Login VSI workloads; a lower score is better because it indicates that the desktops respond faster. The workload type in the tests was ‘Knowledge Worker * 2vCPU’. For the various Login VSI notations, see VSImax.
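The phrase "VSImax was not reached" used throughout the results can be read against a threshold derived from VSIbase. The Python sketch below is a simplified illustration, assuming the Login VSI 4.1 threshold is VSIbase + 1,000 ms and assuming VSImax is the last session count before the VSI index average crosses it; Login VSI's actual sampling and averaging rules are more involved, and the response-time curves in the usage are hypothetical.

```python
from typing import Optional, Sequence

THRESHOLD_OFFSET_MS = 1000  # assumption: VSImax v4.1 threshold = VSIbase + 1,000 ms


def vsimax(vsibase_ms: float, index_avg_ms: Sequence[float]) -> Optional[int]:
    """Simplified VSImax: the last session count before the VSI index average
    exceeds the threshold, or None if the threshold is never crossed.

    index_avg_ms[i] is the VSI index average with i + 1 active sessions.
    """
    threshold = vsibase_ms + THRESHOLD_OFFSET_MS
    for sessions, value in enumerate(index_avg_ms, start=1):
        if value > threshold:
            return sessions - 1
    return None


if __name__ == "__main__":
    # With the Test 1 baseline of 892, the assumed threshold would be 1,892 ms.
    print(vsimax(892, [900, 1200, 1800]))        # hypothetical curve: never crosses
    print(vsimax(892, [900, 1500, 1900, 2000]))  # hypothetical curve: crosses at 3
```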
We took the following parameters into consideration to measure the testing performance:
- Test running time
- Benchmark VSImax
- CPU and memory usage
- vSAN I/O latency and IOPS
Login VSI Benchmarking
We used the Login VSI tool to load the target environment with simulated user workloads and activities. Common applications like Microsoft Office, Internet Explorer, and Adobe PDF Reader were utilized during the testing.
Login VSI 4.1 has several workload templates, depending on the type of user to be simulated. Each workload differs in application operations and in the number of operations executed simultaneously. The medium-level Knowledge Worker workload was used in the testing because it most closely represents the average desktop user in our customer deployments.
This test was based on Login VSI in benchmark mode, which is a locked-down workload based on the Knowledge Worker template that does not allow any workload parameters to be modified. This makes it an accurate way to compare VSImax results side by side across different configurations and platforms.
Note: Some test results state that “VSImax was not reached.” This is because the servers had more capacity than the Login VSI load consumed; we had previously determined the number of concurrent sessions needed to achieve optimal results.
VDI workloads are generally very CPU intensive. From a storage perspective, vSAN can support up to 200 desktops per host if host CPU is sized properly. In this test environment, the CPU configuration of each host was the limiting factor: host CPU became completely saturated under the Login VSI Knowledge Worker workload once the number of desktops per host reached a certain level. Therefore, we focused our tests on 1,000 desktops to observe vSAN performance.
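The density arithmetic behind the 1,000-desktop focus can be worked out from figures already in this document: 8 hosts, 2 sockets of 10 cores each (Table 2), and 2 vCPUs per desktop. The Python sketch below computes desktops per host, the density after a single host failure, and the vCPU-to-physical-core ratio; counting physical cores only (no hyperthreads) is an assumption.

```python
HOSTS = 8
CORES_PER_HOST = 2 * 10  # 2 sockets x 10 cores (Table 2); hyperthreads not counted
DESKTOPS = 1_000
VCPUS_PER_DESKTOP = 2    # "Knowledge Worker * 2vCPU" workload

desktops_per_host = DESKTOPS / HOSTS                                    # 125.0
after_host_failure = DESKTOPS / (HOSTS - 1)                             # about 142.9
vcpu_per_core = desktops_per_host * VCPUS_PER_DESKTOP / CORES_PER_HOST  # 12.5
storage_ceiling = 200 * HOSTS  # desktops the cluster could hold at 200 per host

if __name__ == "__main__":
    print(desktops_per_host, round(after_host_failure, 1), vcpu_per_core, storage_ceiling)
```

At 125 desktops per host the cluster already runs 12.5 vCPUs per physical core, which is consistent with CPU, rather than vSAN, being the constraint well below the 200-desktop storage ceiling.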
The AppStack “newapp” was created based on the default App Volumes AppStack template, which included the applications used by Login VSI. The “newapp” size was 6,536MB.
Figure 8. AppStack
Login VSI tests were run on 1,000 instant clones with one AppStack in the R1 and R5 configurations (see Table 9) to validate that the RAID 5 space efficiency feature has minimal impact on performance.
Login VSI tests were also run on 1,000 linked clones with one AppStack in the R1B, D+R5, and D+R5+NC configurations (see Table 9) to validate that vSAN client cache greatly improves performance and that the space efficiency features (deduplication and compression, and RAID 5) affect performance only slightly.
Test 1: Instant Clone in R1 Configuration
Figure 9 shows that VSImax Knowledge Worker v4.1 was not reached, with a baseline performance score of 892. We ran 1,000 sessions in total, and 971 sessions ran successfully. This was equivalent to 971 desktop users reading documents, sending emails, printing documents, and browsing the Internet.
Figure 9. VSImax on Login VSI Knowledge Worker Workload, 1,000 R1 Desktops
Figure 10 illustrates the CPU usage: the peak average CPU usage was 84 percent. Although we had additional CPU headroom, it would not be realistic to push the host CPU to 100 percent since this would have a negative impact on other services.
Figure 10. CPU Usage during Login VSI Knowledge Worker Workload, R1
Figure 11. Memory Usage during Login VSI Knowledge Worker Workload, R1
As the vSAN Performance Service data in Figure 12 shows, IOPS increased steadily as the number of active sessions increased. Peak write IOPS was 7,264 and peak read IOPS was 7,280.
Figure 12. vSAN IOPS during Login VSI Knowledge Worker Workload, R1
As shown in Figure 13, peak write latency was 3.845ms. Read latency was very low during the Login VSI testing, peaking at 1.831ms. There were 971 concurrent users on two-vCPU Windows 7 (64-bit) desktops running the Knowledge Worker workload.
Figure 13. vSAN Latency during Login VSI Knowledge Worker Workload, R1
Test 2: Instant Clone in R5 configuration
With 971 sessions, VSImax (v4.1) Knowledge Worker was not reached, with a baseline performance score of 943, as shown in Figure 14.
Figure 14. VSImax on Login VSI Knowledge Worker Workload, 1,000 R5 Desktops
From the average ESXi CPU usage as shown in Figure 15, the peak average CPU usage was 85 percent.
Figure 15. CPU Usage during Login VSI Knowledge Worker Workload, R5
Memory consumption increased slightly during the test: peak average memory consumed was 257,060MB (251GB). ESXi memory was 512GB, which was about 49 percent usage. The average kernel memory usage was 33,080MB (around 32GB).
Figure 16. Memory Usage during Login VSI Knowledge Worker Workload, R5
As the vSAN Performance Service data in Figure 17 shows, IOPS increased nearly linearly as the number of active sessions increased. Peak write IOPS was 6,899 and peak read IOPS was 7,096.
Figure 17. vSAN IOPS during Login VSI Knowledge Worker Workload, R5
vSAN latency remained low, as shown in Figure 18, increasing slightly as the number of active sessions increased. Peak write latency was 8.597ms and peak read latency was 1.493ms during the test. The yellow line in the diagram marks the warning threshold; latency above this line indicates degraded performance.
Figure 18. vSAN Latency during Login VSI Knowledge Worker Workload, R5
Test 3: Linked Clone in R1B Configuration (Baseline)
As shown in Figure 19, the Windows 7 linked-clone pool with AppStack passed the Knowledge Worker workload easily without reaching VSImax v4.1, with a baseline score of 852.
Figure 19. VSImax on Login VSI Knowledge Worker workload, 1,000 Baseline Desktops
As the average ESXi CPU usage in Figure 20 shows, CPU usage increased steadily as the number of active sessions increased. Peak average CPU usage was 75 percent.
Figure 20. CPU Usage during Login VSI Knowledge Worker Workload, R1B
Figure 21 shows that peak average memory consumed was 231,070MB (around 226GB). ESXi memory was 512GB, so this was about 44 percent usage. The average kernel memory usage was 32,704MB (around 32GB).
Figure 21. Memory Usage during Login VSI Knowledge Worker Workload, R1B
As the vSAN Performance Service data in Figure 22 shows, peak write IOPS was 5,724 and peak read IOPS was 2,338. IOPS increased almost linearly as the number of active sessions increased.
Figure 22. vSAN IOPS during Login VSI Knowledge Worker Workload, R1B
As shown in Figure 23, vSAN peak write latency was 6.690ms and peak read latency was 2.384ms. There was a sharp fluctuation in latency when sessions became active, then it increased steadily. This correlates to the behavior in Figure 22.
Figure 23. vSAN Latency during Login VSI Knowledge Worker Workload, R1B
Test 4: Linked Clone in D+R5 Configuration
As shown in Figure 24, with 971 sessions, VSImax (v4.1) Knowledge Worker was not reached, with a Login VSI baseline performance score of 850.
Figure 24. VSImax on Login VSI Knowledge Worker Workload, 1,000 D+R5 Desktops
As the average ESXi CPU usage in Figure 25 shows, CPU usage increased as the number of active sessions increased. Peak CPU usage was high, at about 86 percent, and CPU usage went down when the sessions logged off.
Figure 25. CPU Usage during Login VSI Knowledge Worker Workload, D+R5
Figure 26 illustrates that memory consumption increased slightly during the test. Peak average memory consumed was 253,324MB (around 247GB), which was about 48 percent of the 512GB of ESXi memory. The average kernel memory usage was 33,571MB (around 33GB).
Figure 26. Memory Usage during Login VSI Knowledge Worker Workload, D+R5
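The GB and percentage figures in these memory results follow from simple unit conversion. A minimal sketch using the D+R5 values above, assuming 1GB = 1,024MB (which reproduces the rounded GB values quoted in the text); the function names are illustrative only.

```python
# Convert ESXi memory counters (reported in MB) to GB and to a share of host RAM.
# Assumption: 1GB = 1,024MB, which matches the rounded GB values quoted in the text.
HOST_MEMORY_GB = 512  # physical memory per ESXi host in this test bed


def mb_to_gb(mb: float) -> float:
    """Convert megabytes to gigabytes using binary (1,024) units."""
    return mb / 1024


def host_memory_share(consumed_mb: float, host_gb: float = HOST_MEMORY_GB) -> float:
    """Return consumed memory as a percentage of total host memory."""
    return 100 * mb_to_gb(consumed_mb) / host_gb


# D+R5 values reported for Figure 26
peak_consumed_mb = 253_324
kernel_mb = 33_571

print(f"peak consumed: {mb_to_gb(peak_consumed_mb):.1f}GB "
      f"({host_memory_share(peak_consumed_mb):.0f}% of {HOST_MEMORY_GB}GB)")
print(f"kernel memory: {mb_to_gb(kernel_mb):.1f}GB")
```

The same two functions apply to the R1B and D+R5+NC memory figures.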
From vSAN Performance Service as shown in Figure 27, peak write IOPS was 5,876 and peak read IOPS was 1,342.
Figure 27. vSAN IOPS during Login VSI Knowledge Worker Workload, D+R5
As shown in Figure 28, vSAN peak write latency was 6.658ms and peak read latency was 1.712ms during the test.
Figure 28. vSAN Latency during Login VSI Knowledge Worker Workload, D+R5
Test 5: Linked Clone in D+R5+NC Configuration
Figure 29 shows that 967 sessions ran successfully. VSImax v4.1 was not reached on 1,000 linked clones in the D+R5+NC configuration, with a baseline score of 1,169.
See Table 9 for the full configuration name.
Figure 29. VSImax on Login VSI Knowledge Worker Workload, 1,000 D+R5+NC Desktops
CPU usage was high and peak average CPU usage was 95 percent as shown in Figure 30.
Figure 30. CPU Usage during Login VSI Knowledge Worker Workload, D+R5+NC
Memory usage increased less than 1 percent during the Login VSI tests, as shown in Figure 31. Peak average memory consumed was 282,900MB (around 276GB), which was about 54 percent of the 512GB of ESXi memory. The average kernel memory usage was 32,700MB (around 32GB).
Figure 31. Memory Usage during Login VSI Knowledge Worker Workload, D+R5+NC
As shown by vSAN Performance Service in Figure 32, IOPS increased steadily as the number of active sessions increased. Peak write IOPS was 6,880 and peak read IOPS was 4,970.
Figure 32. vSAN IOPS during Login VSI Knowledge Worker Workload, D+R5+NC
As shown in Figure 33, vSAN latency increased because the number of active sessions increased. Peak write latency was 13.950ms and peak read latency was 8.713ms.
Figure 33. vSAN Latency during Login VSI Knowledge Worker Workload, D+R5+NC
Summary of Instant Clone Login VSI Testing Results
971 sessions ran successfully in both tests. As shown in Figure 34, the Login VSI baseline performance score was 943 in the RAID 5 configuration and 892 in the RAID 1 configuration.
The Login VSI score was only slightly affected in the RAID 5 configuration.
Figure 34. Instant Clone Login VSI Results Comparison
As shown in Figure 35, the RAID 5 configuration consumed 1 percent more peak average CPU, 1 percent more peak average memory, and 1GB more kernel memory than the RAID 1 configuration. RAID 5 overhead was very limited.
Figure 35. 1,000 Instant Clone Login VSI Results Resource Usage Comparison
Storage performance was good in both R1 and R5 configurations as shown in Figure 36. There was no storage bottleneck.
Figure 36. 1,000 Instant Clones Login VSI IOPS and Latency Comparison
The instant clone Login VSI results show that although the RAID 5 configuration consumed slightly more resources, it had minimal impact on the Login VSI score. Overall, vSAN 6.2 performed well in the RAID 5 configuration with the Login VSI Knowledge Worker workload.
Summary of Linked Clone Login VSI Testing Results
971 sessions ran successfully in both the R1B and D+R5 tests, and 967 sessions ran successfully in the D+R5+NC configuration, which has client cache disabled. Figure 37 shows that client cache improved VSIbase from 1,169 to 852, an improvement of over 25 percent in application response time. vSAN 6.2 space efficiency features had little performance impact in terms of VSImax score.
Figure 37. 1,000 Linked Clone Login VSI Results Comparison
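The "over 25 percent" figure is a relative reduction in VSIbase, where a lower value means faster application response. A minimal sketch of that arithmetic using the two VSIbase values quoted above; `percent_reduction` is an illustrative helper, not part of Login VSI.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return 100 * (before - after) / before


# Login VSI VSIbase values quoted in the text (lower is better)
vsibase_client_cache_off = 1_169  # D+R5+NC
vsibase_client_cache_on = 852

improvement = percent_reduction(vsibase_client_cache_off, vsibase_client_cache_on)
print(f"VSIbase improved by {improvement:.1f}%")
```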
vSAN space efficiency features consumed more CPU and memory resources. Deduplication and compression with RAID 5 used 11 percent more peak average CPU, 5 percent more peak average memory, and 1GB more peak average kernel memory. This overhead is acceptable considering the space savings.
Figure 38. 1,000 Linked Clone Login VSI Resource Usage Comparison
As shown in Figure 39, overall storage performance was good. With client cache, peak write IOPS dropped from 6,880 to 5,876 (about 15 percent decrease) and peak read IOPS dropped from 4,970 to 1,342 (over 70 percent decrease). Peak write and read latency were also reduced substantially. With the space efficiency features, peak write IOPS increased slightly, but peak read IOPS decreased because more data was in the cache layer. In summary, deduplication and compression with RAID 5 on linked clone pools had limited storage performance impact during the Login VSI tests.
Figure 39. 1,000 Linked Clones Login VSI IOPS and Latency Comparison
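The client cache percentages above compare the D+R5+NC run (client cache off, Figure 32) with the D+R5 run (client cache on, Figure 27). A minimal sketch of that comparison; the dictionary layout is only an illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return 100 * (before - after) / before


# Peak vSAN IOPS: (client cache off, D+R5+NC; client cache on, D+R5)
peak_iops = {
    "write": (6_880, 5_876),
    "read": (4_970, 1_342),
}

for kind, (cache_off, cache_on) in peak_iops.items():
    print(f"peak {kind} IOPS: {percent_reduction(cache_off, cache_on):.1f}% lower with client cache")
```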
The 1,000 linked clone Login VSI results show that client cache substantially reduced IOPS and latency and cut application response time by more than 25 percent. With deduplication and compression, RAID 5, and checksum enabled, the performance impact on the VSImax score was small compared with the baseline configuration, and the resource overhead was relatively small. For linked clone pools, we recommend enabling deduplication and compression with RAID 5 and software checksum.
View Operations Testing
Instant Clone Desktops
Provision 1,600 Desktops
In this test, a new pool of 1,600 instant clone virtual desktops (floating pool) was provisioned on the vSAN datastore, with about 200 desktops per ESXi host. To complete this task, Horizon:
- Creates internal VMs such as the internal template, replica VMs, and parent VMs. This is called the priming phase.
- Uses VMware Instant Clone Technology to create the desktops, and prepares the operating system by using the Clone Prep feature.
We conducted the test in both the R1 and R5 configurations.
In the R1 configuration, priming took 10 minutes and it took 27 minutes for the 1,600 desktops to become “available”. In the R5 configuration, priming took 8 minutes and provisioning took 28 minutes.
Figure 40 shows detailed capacity information for the R1 configuration. Total used capacity was 10.31TB, including 3.95TB physically written space, 6.28TB VM over-reserved space, and 88.65GB vSAN system overhead.
Figure 40. Capacity Information for 1,600 Instant Clones, R1
After sparse swap was enabled, the total capacity used decreased to 4.04TB including 3.94TB physically written space and 101.63GB vSAN system overhead. VM over-reserved space turned to zero as shown in Figure 41.
Figure 41. Capacity Information for 1,600 Instant Clones, R1+S
See Table 9 for the full configuration name.
Figure 42 shows detailed capacity information for the R5 configuration. Total used capacity was 9.63TB, including 3.24TB physically written space, 6.28TB VM over-reserved space, and 117.71GB vSAN system overhead.
Figure 42. Capacity Information for 1,600 Instant Clones, R5
After sparse swap was enabled, the total capacity used was 3.39TB including 3.28TB physically written space and 119.06GB vSAN system overhead.
Figure 43. Capacity Information for 1,600 Instant Clones, R5+S
Figure 44 shows the resource usage during the provisioning of 1,600 instant clones. Compared with the R1 configuration, average CPU usage in the R5 configuration increased 11 percent and average memory usage increased 6 percent. Kernel memory consumption increased slightly.
Figure 44. 1,600 Instant Clones Provision Resource Usage
We summarized the storage performance of instant clone provisioning in Figure 45. Compared with the RAID 1 configuration, peak write IOPS in RAID 5 increased from 8,134 to 9,538 and peak read IOPS increased from 3,731 to 4,798. Peak write and read latency also increased slightly, but overall storage performance was good.
Figure 45. 1,600 Instant Clones Provision IOPS and Latency
Push Image 1,600 Desktops
You can change the image of an instant clone desktop pool to push out changes or to revert to a previous image. You can select any snapshot from any virtual machine to be the new image.
Instant clones do not need to be refreshed, recomposed, or rebalanced. When a user logs out, the desktop is deleted and re-created as a fresh desktop from the latest image. This staggers patching across user logouts and eliminates boot storms.
It took only 38 minutes to push a new image to the 1,600-desktop instant clone pool in the default R1 configuration, and 39 minutes in the R5 configuration.
Figure 46 shows the resource usage during the new image push operation. The average CPU consumption was 58 percent in R5 configuration, which was 19 percent more than that in R1 configuration. The average memory usage was 75 percent, which was 5 percent more than that in R1 configuration. Kernel memory used 0.81GB more than that in R1 configuration.
Figure 46. 1,600 Instant Clones Push Image Resource Usage
As shown in Figure 47, the overall vSAN performance was good. In RAID 5 configuration, peak write IOPS, peak read IOPS, and peak latency were slightly higher.
Figure 47. 1,600 Instant Clones Push Image IOPS and Latency
Summary of Instant Clone Operation Results
Figure 48 illustrates that RAID 5 saved 0.68TB of storage capacity compared with RAID 1, a 7 percent space saving. The saving was small because VM over-reserved space made up a large share of the used capacity and was the same in both the RAID 1 and RAID 5 configurations. Enabling sparse swap with RAID 5 saved a further 6.24TB (from 9.63TB to 3.39TB), for a total saving of 67 percent compared with the RAID 1 configuration.
Figure 48. 1,600 Instant Clones Capacity Usage
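Each total in Figures 40 to 43 is physically written space plus VM over-reserved space plus vSAN system overhead, and sparse swap removes the over-reserved part. A minimal sketch that reproduces the savings above from the reported totals; the `saving` helper and configuration keys are illustrative.

```python
# Total used capacity (TB) for 1,600 instant clones, from Figures 40-43.
capacity_tb = {
    "R1": 10.31,
    "R1+S": 4.04,
    "R5": 9.63,
    "R5+S": 3.39,
}


def saving(baseline: str, candidate: str) -> tuple[float, float]:
    """Return (TB saved, percent saved) of `candidate` relative to `baseline`."""
    saved = capacity_tb[baseline] - capacity_tb[candidate]
    return saved, 100 * saved / capacity_tb[baseline]


for base, cand in [("R1", "R5"), ("R5", "R5+S"), ("R1", "R5+S")]:
    tb, pct = saving(base, cand)
    print(f"{cand} vs {base}: {tb:.2f}TB saved ({pct:.0f}%)")
```

The small R1-to-R5 saving and the large R5-to-R5+S saving both follow from the fixed 6.28TB of over-reserved space.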
Instant clones dramatically accelerated provisioning for fully featured and customized virtual desktops. With the default setting, it took only 37 minutes in total for 1,600 desktops to become “Available” in RAID 1 configuration, and it took 36 minutes in RAID 5 configuration. For push image, it took 38 minutes in RAID 1 and 39 minutes in RAID 5 configuration.
Figure 49. View Operation Tests Execution Time for 1,600 Instant Clones
More CPU and memory resources were consumed during instant clone operations in the RAID 5 configuration.
As expected, IOPS and latency were higher in RAID 5 than in RAID 1, but overall performance was good.
The RAID 5 configuration consumed less storage space, with only slightly higher resource usage and good storage performance. We recommend using RAID 5 for instant clone pools. If memory is not overcommitted, enabling the sparse swap feature saves substantial capacity.
Linked Clone Desktops
Provision Desktop Pool
A new pool of 1,600 Windows 7 (64-bit) linked-clone virtual desktops was provisioned on the vSAN datastore. To complete this task, View Composer:
- Created a replica copy of the 30GB base image on the vSAN datastore
- Created and customized the desktops
- Added the replica and desktops to the Active Directory domain
- Took a snapshot of the virtual desktop
The desktops then entered the Available state.
We provisioned a dedicated linked-clone pool with 1,600 virtual desktops in R1B configuration as baseline, and then provisioned the pool in D+R5 configuration. We also compared the capacity usage when enabling sparse swap in D+R5+S configuration.
It took 65 minutes for the 1,600 baseline Windows 7 linked-clone virtual desktops to be provisioned and reach the Available state in the Horizon 7 Administrator console.
The total capacity used was 13.59TB as shown in Figure 50 including 7.22TB physically written space, 6.25TB VM over-reserved space, and 124.07GB vSAN system overhead.
Figure 50. Capacity Information for 1,600 Linked Clones, R1B
It took 81 minutes to provision 1,600 desktops in D+R5 configuration.
As shown in Figure 51, used space was 6.71TB plus 1.23TB of deduplication and compression overhead (5 percent of the vSAN datastore capacity), for a total used capacity of 7.94TB. The deduplication and compression ratio was 1.91x.
Figure 51. Capacity Information for 1,600 Linked Clones, D+R5
As shown in Figure 52, the total used capacity was 1.81TB including 593.99GB used capacity and 1.23TB deduplication and compression overhead after sparse swap was enabled. Deduplication and compression ratio was 11.60x.
Figure 52. Capacity Information for 1,600 Linked Clones, D+R5+S
Figure 53 shows that CPU usage was 31 percent in the D+R5 configuration, slightly higher (3 percent) than in the baseline configuration. Kernel memory usage was 0.4GB higher in the D+R5 configuration than in the R1B configuration.
Figure 53. 1,600 Linked Clones Provision Resource Usage
Figure 54 demonstrates that vSAN performance was fairly good. In the D+R5 configuration, write IOPS decreased because deduplication occurred when data was destaged from the cache tier to the capacity tier of the All-Flash vSAN datastore. Peak write latency was higher, at 10.862ms in the D+R5 configuration.
Figure 54. 1,600 Linked Clone Provision IOPS and Latency
Refresh Desktop Pool
A Horizon View refresh operation reverted a pool of linked-clone desktops to their original state. Any changes made to a desktop since it was last provisioned, recomposed, or refreshed were discarded. When the refresh operation was initiated, desktops in the pool were refreshed in a rolling fashion, several at a time.
It took 50 minutes to refresh the 1,600 baseline Windows 7 linked-clone virtual desktops until they were in the Available state in the View Administrator console, and 60 minutes to refresh the virtual desktops in the D+R5 configuration.
As shown in Figure 55, the average CPU usage was 32 percent in D+R5, 2 percent higher than in the baseline configuration. The average memory usage was 75 percent, and kernel memory usage was 0.6GB more than in the baseline configuration.
Figure 55. 1,600 Linked Clones Refresh Resource Usage
Figure 56 shows vSAN performance data during the refresh process. In D+R5 configuration, peak IOPS was lower due to the deduplication and compression. The overall performance was good.
Figure 56. 1,600 Linked Clone Refresh IOPS and Latency
Recompose Desktop Pool
A Horizon View recompose operation changed the linked clones to a new parent base image. To complete this task, View Composer:
- Created a replica of the new base image on the vSAN datastore
- Created a new OS disk for each virtual desktop
- Deleted the old OS disk
Each desktop was then customized and a new snapshot was created.
It took 108 minutes to recompose 1,600 baseline desktops while it took 122 minutes in D+R5 configuration.
Figure 57 shows the resource usage during the recompose process. In the D+R5 configuration, the average CPU usage was 32 percent, 2 percent higher than in the baseline configuration. The average memory usage was 79 percent, and the average kernel memory usage was about 0.6GB more than in the baseline configuration.
Figure 57. 1,600 Linked Clones Recompose Resource Usage
Figure 58 shows vSAN performance during the recompose process. In the D+R5 configuration, peak IOPS was lower and peak latency was higher than in the baseline configuration, but average performance was good.
Figure 58. 1,600 Linked Clones Recompose IOPS and Latency
Boot Storm
A boot storm was simulated for a pool of 1,600 Horizon Windows 7 (64-bit) linked clones. The desktops were all booted at the same time from vCenter.
It took less than eight minutes for all 1,600 desktops to boot and become available in both the R1B and D+R5 configurations.
Figure 59. Boot Storm Available Desktops over Time
Summary of Linked Clone Operation Results
After deduplication and compression with RAID 5 were enabled, capacity usage for the 1,600 linked clones decreased significantly, by 48 percent. Enabling sparse swap saved space substantially as well: capacity usage was then 87 percent lower than in the baseline configuration.
Figure 60. 1,600 Linked Clones Capacity Usage
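The 87 percent figure can be checked directly against the totals reported for Figure 50 (R1B) and Figure 52 (D+R5+S). A minimal sketch of that check; the variable names are illustrative.

```python
# Total used capacity (TB) for 1,600 linked clones, as reported in the text.
r1b_total_tb = 13.59        # R1B baseline (Figure 50)
dr5_sparse_total_tb = 1.81  # D+R5+S: dedup/compression, RAID 5, sparse swap (Figure 52)

saving_pct = 100 * (r1b_total_tb - dr5_sparse_total_tb) / r1b_total_tb
print(f"D+R5+S uses {saving_pct:.0f}% less capacity than R1B")
```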
Horizon operations took longer when space efficiency features were enabled (between about 13 and 25 percent longer, depending on the operation). Provisioning 1,600 linked clones in the D+R5 configuration took 81 minutes, 16 minutes more than in the baseline configuration. The refresh took 60 minutes, 10 minutes more than in the baseline configuration. Recomposing took 14 minutes more, and the boot storm took about the same time.
Figure 61. View Operation Execution Time for 1,600 Linked Clones
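The relative slowdown of each operation can be computed from the minutes reported above for the R1B and D+R5 configurations. A minimal sketch; `slowdown` is an illustrative helper.

```python
# Operation times in minutes for 1,600 linked clones: (R1B baseline, D+R5)
op_minutes = {
    "provision": (65, 81),
    "refresh": (50, 60),
    "recompose": (108, 122),
}


def slowdown(baseline_min: int, candidate_min: int) -> tuple[int, float]:
    """Return (extra minutes, percent longer) relative to the baseline."""
    extra = candidate_min - baseline_min
    return extra, 100 * extra / baseline_min


for op, (r1b, dr5) in op_minutes.items():
    extra, pct = slowdown(r1b, dr5)
    print(f"{op}: +{extra} min ({pct:.0f}% longer)")
```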
In the D+R5 configuration, operations consumed slightly more CPU and memory resources than in the R1B configuration, but total resource usage was acceptable.
Latency was higher in D+R5 configuration, but the overall performance was good.
In brief, with deduplication and compression, RAID 5, and checksum enabled, linked clone pool operations took longer and consumed slightly more resources (about 3 percent or less); however, the large space saving outweighs the small performance impact.
Resiliency Testing—One Node Failure
A single vSAN node hardware failure was simulated for an 8-host vSAN Cluster running 1,600 linked-clone virtual desktops with FTT=1, all under a simulated virtual desktop workload.
We tested linked clones in R1B and D+R5 configurations.
An ESXi host with 200 running virtual desktops was rebooted, and all the virtual machines became unavailable in the View Manager and thus inaccessible to the users. VMware vSphere High Availability restarted the virtual desktops on the other vSAN Cluster nodes. The 200 desktops were restarted and all desktops were ready for user login.
The server came back up and rejoined the vSAN Cluster in about 10 minutes. VMware vSphere Distributed Resource Scheduler (DRS) rebalanced the load across all ESXi hosts in the cluster.
Note: A single-node failure does not trigger an immediate rebuild when the host failure is detected. If vSAN detects a failure that returns an I/O error, such as a failed magnetic disk or SSD, it immediately responds by rebuilding the disk object. However, for host failures that do not return an I/O error, vSAN has a configurable repair delay (60 minutes by default), and components are rebuilt across the cluster after that delay. During a rebuild, vSAN prioritizes the current workload to minimize the impact on cluster performance.
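The rebuild rule in this note can be expressed as a small decision function. A minimal sketch of the described behavior, not vSAN code: `rebuild_start` is hypothetical, and it assumes (beyond what the note states) that a host rejoining before the repair delay expires has its components resynchronized rather than rebuilt, so no rebuild start time is returned in that case.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default vSAN repair delay for failures that do not return an I/O error.
DEFAULT_REPAIR_DELAY = timedelta(minutes=60)


def rebuild_start(
    failed_at: datetime,
    io_error: bool,
    returned_at: Optional[datetime] = None,
    repair_delay: timedelta = DEFAULT_REPAIR_DELAY,
) -> Optional[datetime]:
    """Return when component rebuild begins, or None if no rebuild is triggered.

    Simplified model: I/O-error failures (e.g. a failed SSD) rebuild at once;
    other failures (e.g. an unresponsive host) wait for the repair delay.
    Assumption: a host that rejoins before the delay expires is resynchronized
    rather than rebuilt, so this returns None in that case.
    """
    if io_error:
        return failed_at
    deadline = failed_at + repair_delay
    if returned_at is not None and returned_at <= deadline:
        return None  # host came back in time; no rebuild across the cluster
    return deadline


# The rebooted host in this test rejoined after about 10 minutes.
t0 = datetime(2016, 1, 1, 9, 0)  # arbitrary illustrative timestamp
print(rebuild_start(t0, io_error=False, returned_at=t0 + timedelta(minutes=10)))
```

Under this model the 10-minute reboot in this test stays well inside the default 60-minute delay.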
In both configurations, vSphere HA and DRS behaved as expected; the whole failover and rebalance took 20 minutes.
Figure 62. Failover and Rebalance over Time for One-Node Failure
- vSphere HA behaved as expected to restart desktops on the remaining hosts.
- DRS behaved as expected to distribute the desktops relatively evenly among the remaining hosts.
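When sizing for a host failure, it helps to know how many extra desktops each surviving host absorbs. A minimal sketch for this 8-host, 1,600-desktop cluster, assuming a perfectly even spread (the test reports DRS distributing desktops "relatively evenly", so real placement may differ slightly).

```python
import math

HOSTS = 8
DESKTOPS = 1_600
FAILED_HOSTS = 1

desktops_per_host = DESKTOPS // HOSTS   # 200 before the failure
surviving_hosts = HOSTS - FAILED_HOSTS  # 7 hosts absorb the restarts
# Restarted desktops landing on each surviving host under an even spread
restarted_per_host = desktops_per_host * FAILED_HOSTS / surviving_hosts
# Upper bound on desktops per surviving host after failover
max_per_host_after = math.ceil(DESKTOPS / surviving_hosts)

print(desktops_per_host, surviving_hosts, round(restarted_per_host, 1), max_per_host_after)
```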
App Volumes Backup
This test validates that VMware App Volumes AppStacks and Writable Volumes can be backed up and recovered using the App Volumes Backup Utility.
Note: App Volumes Backup Utility is a VMware Fling. The utility is NOT supported by VMware.
Verification Procedures and Result
Follow the instructions to run the backup utility:
- Create a backup VM by entering the App Volumes Manager URL and the vCenter URL.
- Click "Attach Selected AppStacks to Backup VM". The underlying VMDK files for each AppStack and Writable Volume are attached to the backup VM.
Figure 63. App Volumes Backup Utility Usage
At this point, you can use a VMDK-aware backup solution to back up the AppStacks and Writable Volumes. In our test, we used vSphere Data Protection 6.1.2.
- Follow the vSphere Data Protection Administration Guide to deploy and configure the vSphere Data Protection appliance. Then use the web client plug-in to create a backup job.
- Select the VM created by the backup utility. You can choose to back up only the AppStack VMDKs, as shown in Figure 64.
Figure 64. vSphere Data Protection vCenter Integrated UI Backup Job Submission
We validated that App Volumes 2.11 can be backed up with App Volumes Backup Utility and vSphere Data Protection.
This section provides the best practices to be followed, based on the solution validation.
Based on our solution validation, we provide best practices in the following areas:
- View operation parameters
- AppStack storage policy
- vSAN sizing
View Operation Parameters
The running time of linked clone and instant clone operations might improve when the maximum concurrent operations settings are changed. If backend storage performance is good, you can increase these values.
For instant clones, the default value of Max Concurrent Instant Clone Engine Provisioning Operations (refer to Table 11) is 20, which gave good provisioning times in our tests. You can increase the value if storage latency does not cause contention during provisioning; otherwise, a larger value will not make provisioning faster.
For linked clones, we increased the default values of Max Concurrent View Composer Provisioning Operations, Max Concurrent View Composer Maintenance Operations, and Max Concurrent vCenter Provisioning Operations, as described in Table 11, because the desktop cluster has eight hosts and All-Flash vSAN performance is good.
Note: The linked clone provisioning process includes not only cloning but also other operations, such as domain join and customization, which further limit provisioning speed. Therefore, execution time might stop improving once the concurrency value reaches a certain point.
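The tuning advice above amounts to a simple stopping rule. This is a hypothetical sketch, not a Horizon API: `tune_concurrency` assumes you have run trial provisioning jobs at increasing values of one Max Concurrent setting and recorded the elapsed minutes and peak vSAN latency for each, and the latency limit is whatever threshold you treat as storage contention (this document does not define one).

```python
from typing import Iterable, Optional, Tuple

# (concurrency setting, provisioning minutes, peak vSAN latency in ms)
Trial = Tuple[int, float, float]


def tune_concurrency(trials: Iterable[Trial], latency_limit_ms: float) -> Optional[int]:
    """Return the highest tested concurrency that kept improving provisioning time.

    Hypothetical helper: stops at the first trial whose latency exceeds the limit
    (storage contention) or whose run time no longer improves (for linked clones,
    customization and domain join can cap the speed-up).
    """
    best: Optional[Tuple[int, float]] = None
    for concurrency, minutes, latency_ms in sorted(trials):
        if latency_ms > latency_limit_ms:
            break  # storage contention: a larger value will not provision faster
        if best is not None and minutes >= best[1]:
            break  # no further gain from more concurrent operations
        best = (concurrency, minutes)
    return best[0] if best is not None else None
```

Stopping at the first non-improving trial mirrors the note above: once cloning is no longer the bottleneck, higher concurrency only adds load.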
AppStack Storage Policy
When placing AppStacks on the vSAN datastore, make sure their FTT is not less than the FTT of the desktop policy. Because AppStacks are shared by users and desktops, they must not become the availability bottleneck.
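The rule reduces to a single comparison between two policies. `StoragePolicy` and `check_appstack_policy` below are hypothetical stand-ins, not a vSphere or App Volumes API; in practice the FTT values would come from the storage policies assigned to the AppStack and desktop objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Minimal stand-in for a vSAN storage policy (hypothetical type)."""
    name: str
    failures_to_tolerate: int  # FTT


def check_appstack_policy(appstack: StoragePolicy, desktop: StoragePolicy) -> None:
    """Raise if the shared AppStack policy tolerates fewer failures than desktops."""
    if appstack.failures_to_tolerate < desktop.failures_to_tolerate:
        raise ValueError(
            f"{appstack.name} FTT={appstack.failures_to_tolerate} is below "
            f"{desktop.name} FTT={desktop.failures_to_tolerate}"
        )


# Example policy names are hypothetical.
desktop_policy = StoragePolicy("desktops", failures_to_tolerate=1)
appstack_policy = StoragePolicy("appstacks", failures_to_tolerate=1)
check_appstack_policy(appstack_policy, desktop_policy)  # passes: FTT 1 >= 1
```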
vSAN Sizing
Acceptable performance of a virtual desktop is the ability to complete any desktop operation in a reasonable amount of time from a user’s perspective. This means the backend storage that supports the virtual desktop must be able to deliver data (read or write operations) quickly. Therefore, the storage configuration should be sized to meet the IOPS requirements at a reasonable response time. The required capacity also differs depending on which space efficiency features are enabled. Refer to the vSAN TCO and Sizing Calculator for virtual desktop sizing on vSAN.
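For a first-pass capacity estimate before using the calculator, the protection scheme's raw-capacity multiplier dominates. A minimal sketch, assuming the standard vSAN overheads (RAID 1 with FTT=n keeps n+1 full copies; RAID 5 is 3 data + 1 parity for FTT=1; RAID 6 is 4 data + 2 parity for FTT=2) and the minimum host count for each. `raw_capacity_gb` and the per-desktop size are hypothetical inputs, and deduplication, compression, swap reservations, and slack space are deliberately ignored.

```python
from fractions import Fraction

# (scheme, FTT) -> raw capacity consumed per unit of usable data
RAID_OVERHEAD = {
    ("RAID-1", 1): Fraction(2),     # two full copies
    ("RAID-1", 2): Fraction(3),     # three full copies
    ("RAID-5", 1): Fraction(4, 3),  # 3 data + 1 parity
    ("RAID-6", 2): Fraction(3, 2),  # 4 data + 2 parity
}
# Minimum number of hosts each scheme requires
MIN_HOSTS = {("RAID-1", 1): 3, ("RAID-1", 2): 5, ("RAID-5", 1): 4, ("RAID-6", 2): 6}


def raw_capacity_gb(desktops: int, gb_per_desktop: float, scheme: str, ftt: int, hosts: int) -> float:
    """Estimate raw vSAN capacity (GB) before any space efficiency savings."""
    key = (scheme, ftt)
    if key not in RAID_OVERHEAD:
        raise ValueError(f"unsupported combination: {scheme} with FTT={ftt}")
    if hosts < MIN_HOSTS[key]:
        raise ValueError(f"{scheme} with FTT={ftt} needs at least {MIN_HOSTS[key]} hosts")
    return float(desktops * gb_per_desktop * RAID_OVERHEAD[key])


# Hypothetical 10GB of written data per desktop on an 8-host cluster
for scheme in ("RAID-1", "RAID-5"):
    print(scheme, raw_capacity_gb(1_600, 10, scheme, ftt=1, hosts=8))
```

Exact fractions keep the 4/3 RAID 5 multiplier from accumulating rounding error before the final conversion to float.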
This section provides recommendations based on our test findings.
Based on the solution testing, the optimal balance of performance and cost per desktop for linked clone pools is the D+R5 configuration, which enables deduplication and compression, RAID 5, and checksum, as shown in Table 9. For instant clones, we recommend the R5 configuration shown in Table 9. Enable sparse swap only in environments where physical memory is not overcommitted; if physical memory is overcommitted, keep the default setting of thick-provisioned swap files.
This section summarizes All-Flash vSAN capabilities in a Horizon 7 based virtual desktop environment.
VMware vSAN is a low-cost, high-performance storage platform for virtual desktop infrastructure that is rapidly deployed and easy to manage. Moreover, it is fully integrated into the industry-leading VMware vSphere Cloud Suite. Using SSDs in All-Flash vSAN with space efficiency features delivers enterprise performance while substantially reducing capacity costs and other operating expense (OPEX) costs, such as IT maintenance, power consumption, and cooling.
Extensive workload, operations, and resiliency testing shows that Horizon 7 with App Volumes 2.11 on All-Flash vSAN delivers exceptional performance, a consistent end-user experience, and a resilient architecture, all at a relatively low price.
All-Flash vSAN provides an easily scalable Horizon 7 based virtual desktop environment together with App Volumes 2.11, which provides superior performance and manageability.
This section lists the relevant references used for this operation guide.
For additional information, see the following white papers:
For additional information, see the following product documents:
- Documentation for VMware Horizon 7 version 7.0
- VMware App Volumes 2.10
- vSphere Data Protection Administration Guide
For additional information, see the following documents:
About the Author
This section provides a brief background on the author and contributors of this operation guide.
- Sophie Yin, solution architect in the Storage and Availability Product Enablement team, wrote the original version of this paper.
- Catherine Xu, technical writer in the Product Enablement team, edited this paper to ensure that the contents conform to the VMware writing style.