]

Category

  • Reference Architecture

Product

  • vSAN

SQL Server 2016 Database Performance and Improvements on vSAN 6.7™

Executive Summary

This section covers the executive summary of this guide.

Business Case

As most CIOs are seeking for an appropriate Hyperconverged Infrastructure (HCI) solution to deploy their business-critical applications like Microsoft SQL Server, performance of OLTP workload for daily transactions, performance impact of changing environments like resynchronization, and the database operation such as batch/bulk insertion are most concerned. The former two things are related with the daily workload performance in the changing environment, and the latter is related with data loading and extraction, which is concerned by DBAs for further data analysis and batch processing purposes.

While VMware vSAN™ has been widely adopted as an HCI, it is more important to demonstrate the continued level of performances improvements across more recent versions of vSAN. Thus, we developed this solution to demonstrate the performance of SQL Server running on vSAN 6.7, including the OLTP workload performance, the performance with resynchronization in the backend and the DB insertion operation performance. We verify that running SQL Server 2016 on Windows Server 2016 Enterprise version using all-flash vSAN as the storage platform for application workloads. This solution also identifies the potential differences in behavior and propose recommendations to ensure a smooth transition from external storage to a hyperconverged and scale-out architecture like vSAN.

Solution Overview

This technical white paper is a showcase of using vSAN all-flash as an HCI for operating and managing SQL Server in a VMware vSphere® environment:

  • The SQL Server databases on a vSAN all-flash cluster
  • The vSAN 6.7 adaptive resynchronization feature with SQL Server database running workloads
  • DB batch insertion on a vSAN all-flash cluster

Key Highlights

This technical white paper demonstrates:

  • OLTP performance scale-up on a 4-node cluster, with NVMe enabled as the cache tier.
  • Consolidated performance of different-sized databases with high-throughput and low-disk latency.
  • Adaptive resynchronization queue adjustment feature of vSAN 6.7 fairly balances the VM I/O with running workload and the I/O for the backend resynchronization. 
  • DB batch insertion performance and provides tuning option to improve the performance.

Audience

This solution is intended for SQL Server database administrators, virtualization and storage architects involved in planning, architecting, and administering a virtualized instance SQL Server 2016 on VMware vSAN 6.7.

 

Technology Overview

This section provides an overview of the technologies used in this solution:

  • VMware vSphere 6.7
  • VMware vSAN 6.7
  • Microsoft SQL Server 2016
  • Samsung NVMe SSD
  • HCIBench

VMware vSphere 6.7

VMware vSphere 6.7 is the next-generation infrastructure for next-generation applications. It provides a powerful, flexible, and secure foundation for the business agility that accelerates the digital transformation to cloud computing and promotes the success in the digital economy.

vSphere 6.7 supports both existing and next-generation applications through its:

  • Simplified customer experience for automation and management at scale
  • Comprehensive built-in security for protecting data, infrastructure, and access
  • Universal application platform for running any application anywhere

With vSphere 6.7, customers can run, manage, connect, and secure their applications in a common operating environment, across clouds and devices.

VMware vSAN 6.7

VMware vSAN, the market leader HCI, enables low-cost and high-performance next-generation HCI solutions, converges traditional IT infrastructure silos onto industry-standard servers and virtualizes physical infrastructure to help customers easily evolve their infrastructure without risk, improve TCO over traditional resource silos, and scale to tomorrow with support for new hardware, applications, and cloud strategies. The natively integrated VMware infrastructure combines radically simple VMware vSAN storage, the market-leading VMware vSphere Hypervisor, and the VMware vCenter Server® unified management solution all on the broadest and deepest set of HCI deployment options.

vSAN 6.7 introduces further performance and space efficiencies. Adaptive resync ensures the fair-share of resources are available for VM I/O and resynchronization I/O during dynamic changes in load on the system providing optimal use of resources. Optimization of the destaging mechanism has resulted in data that drains more quickly from the write buffer to the capacity tier. The swap object for each VM is now thinned provisioned by default, and also matches the storage policy attributes assigned to the VM introducing the potential for significant space efficiency.

Microsoft SQL Server 2016

SQL Server consistently leads in performance benchmarks, such as TPC-E and TPC-H, and in real-world application performance. Gartner recently rated SQL Server as having the most complete vision of any operational database management system. With SQL Server 2016, performance is enhanced with a few new technologies, including new features and enhancements regarding enterprise-grade performance, security, availability, and scalability.

Samsung NVMe SSD

Samsung is well equipped to offer enterprise environments superb solid-state drives (SSDs) that deliver exceptional performance in multi-thread applications, such as compute and virtualization, relational databases and storage. These high-performing SSDs also deliver outstanding reliability for continual operation regardless of unanticipated power loss. Using their proven expertise and wealth of experience in cutting-edge SSD technology, Samsung memory solutions helps data centers operate continually at the highest performance levels. Samsung has the added advantage of being the sole manufacturer of all its SSD components, ensuring end-to-end integration, quality assurance, and the utmost compatibility.

Samsung PM1725a SSD delivers:

  • Extreme performance: The highest levels with unsurpassed random read speeds and an ultra-low latency rate using Samsung’s highly innovative 3D vertical-NAND (V-NAND) flash memory and an optimized controller.
  • Outstanding reliability: Features five DWPDs (drive writes per day) for five years, which translates to writing a total of 32 TB per day during that time. This means users can write 6,400 files of 5 GB-equivalent data or video every day, which represents a level of reliability that is more than sufficient for enterprise storage systems that have to perform ultrafast transmission of large amounts of data.
  • High capacities: Depending on your storage requirements and applications, 800 GB, 1.6 TB, 3.2 TB and 6.4 TB capacities are available.

This solution chooses the 1.6 TB Samsung PM1725 SSD as the cache tier for the vSAN cluster.

 See Samsung PM1725a NVMe SSD for more information.

HCIBench

HCIBench is a benchmark tool of HCI solution that automates the use of the popular VDbench testing tool in larger clusters. Users simply specify the testing parameters they plan to run, and HCIBench instructs VDbench what to do on each node in the cluster.

HCIBench aims to simplify and accelerate customer Proof of Concept (POC) performance testing in a consistent and controlled way. This tool fully automates the end-to-end process of deploying test VMs, coordinating workload runs, aggregating test results, and collecting necessary data for troubleshooting purposes.

This solution uses HCIBench to verify the storage performance capacity of the vSAN cluster with NVMe enabled as the cache tier and using SAS SSD as the capacity tier.

Solution Configuration

This section describes the solution architecture, hardware and software resources, network configuration, test tools, and the SQL Server and virtual machine configuration used in this solution.

Architecture Diagram

As shown in Figure 1, the architecture shows four SQL Server standalone instances are hosted on four servers with vSAN 6.7 all-flash running on vSphere 6.7. The management components are hosted in the same environment as the production applications.

Figure 1. Solution Architecture

Hardware Resources

Each vSAN ESXi Server in the vSAN Cluster has the configuration as shown in Server Configuration

Property

SPECIFICATION

Server

Dell PowerEdge R630

CPU

2 sockets, 24 cores each of 2.6GHz

RAM

256 GB DDR4 RDIMM

Network adapter

2 x Intel 10 Gigabit X540-AT2, + I350 1Gb Ethernet

Storage adapter

2 x 12 Gbps SAS PCI-Express (Dell PowerEdge RAID H730 mini)

Disks

NVMe: Samsung  2 x 1.6 TB drives as cache SSD

SSD: 6 x 400 GB drives as capacity SSD

(ATA Intel SSD SC2BX40)

Server

Dell PowerEdge R630

Software Resources

Table 2 shows the software resources used in this solution.

Table 2.  Software Resources

Software

Version

Purpose

VMware vCenter Server® and VMware ESXi™

6.7

ESXi Cluster to host virtual machines and provide the vSAN Cluster. VMware vCenter Server provides a centralized platform for managing VMware vSphere environments

VMware vSAN

6.7

Software-defined storage solution for hyper-converged infrastructure

Microsoft SQL Server

2016 Enterprise Edition, SP1

Database software

Microsoft Windows Server

2016, 64, Standard Edition

Operating System for the VMS:

  • SQL Server database virtual machines
  • Load generation virtual machines
  • Domain controller

DVD Store

Version 3

OLTP database and workload generator

HCIBench

1.6.7.1

HCI performance benchmark tool

Network Configuration

We created a VMware vSphere Distributed Switch™ to act as a single virtual switch across all associated hosts in the cluster.

The vSphere Distributed Switch used two 10GbE adapters for the teaming and failover. A port group defines properties regarding security, traffic shaping, and NIC teaming. To isolate traffic, we used two port groups, one for vSAN, and the other for VM and VMware vSphere vMotion® traffic. We set the default value for both of the port groups.

SQL Server Virtual Machine Configuration

This section introduced the configuration of the VM, including DB size used for the multiple OLTP databases, VM CPU and memory setting for each VM, and OS and SQL Server instance settings, disk layout, and configuration. We highly recommend you following the Best Practice of SQL Server on vSphere before reading this section to achieve the optimal performance of SQL Server on vSphere and vSAN cluster.

Database Space Usage

We configured two sized DB servers for the performance tests. According to the definition of DVD Store 3, a 100 GB sized DB contains 200 million customers, an order history of 10 million DVDs per month, and a catalog of 1 million DVDs. And a 200 GB sized DB contains 400 million customers, an order history of 20 million DVDs per month, and a catalog of 2 million DVDs. The 200 GB database consumed approximately 192 GB of disk space. The 200 GB database consumed approximately 306 GB of disk space.

VM Disk Layout

For the OLTP workload, the database size is based on the real disk space requirement. For the real-case production, additional space should be reserved according to DB growth rate.

  • The virtual disk configuration for the 100 GB SQL Server database is:

100 GB OS, 1 x 250 GB data file disks, 1 x 60 GB log file disk, and 1 x 40 GB tempdb data disk.

  • The virtual disk configuration for the 200 GB SQL Server database is:

100 GB OS, 2 x 250 GB data file disks, 1 x 100 GB log file disk, and 1 x 80 GB tempdb data disk.

We used the default vSAN SPBM policy for the virtual disks of VMs of SQL Server with the optimal performance and availability considerations.

VM Compute Resource Configuration

We assigned 12/24/48 vCPUs and 64 GB memory to the VM hosting 100 GB DVD Store 3 databases. Setting different vCPUs is to evaluate the scale-up performance on one host. We set the maximum server memory to 48 GB for the two SQL Server instances of 100 GB DB. And for the VM hosting 200 GB DVD Store 3 databases, we assigned 48 vCPUs and 128 GB memory, and set the maximum server memory to 96 GB. Table 3 lists the SQL Server 2016 VM configurations.

Table 3. SQL Server 2016 VM Configurations

VM Role

vCPU

memory (GB)

VM1 and VM2 for the 100 GB DBs

12/24/48

64

VM3 and VM4 for the 200 GB DBs

48

128

OS and SQL Server Instance Settings

Windows Server 2016 was used as the guest operating system (OS) for the SQL Server 2016 virtual machines. Each SQL Server virtual machine in the cluster had VMware Tools installed, and each virtual hard disk for data and log was connected to separate VMware Paravirtual SCSI (PVSCSI) adapter to ensure the optimal throughput and lower CPU utilization[1]. The virtual network adapters used the VMXNET 3 adapter, which was designed for the optimal performance.[2]

The settings that can benefit from the DVD Store workload are summarized as follows:

  • Operating system settings:
    • Set the power plan to high performance in the guest OS.
    • Enable SQL Server to use large pages by enabling the Lock pages in the memory user right assignment for the account that runs the SQL Server in Group Policy.
  • SQL Server startup parameters:
    • -x: Disable SQL Server performance monitor counters.
    • -T661: Disable the ghost record removal process.

Solution Validation

This section describes the solution validation: - Test tools and configurations - HCIBench performance - SQL Server OLTP performance - Adaptive resync - DB batch insertion performance and improvement

Test Tools and Configurations

We used the following test tools for solution validation.

DVD Store Version 3 

DVD Store 3 (DS3) is an open-source benchmark tool that simulates an online store that sells DVDs. Customers can log in, browse DVDs, browse reviews of DVDs, create reviews, rate reviews, become premium members, and purchase DVDs. Everything needed to create, load, and stress this online store is included in the DVD Store project. 

Performance Metrics Data Collection Tools

We measured three critical metrics in the performance tests:

  • The OLTP workload throughput measured in orders per minutes (OPM)
  • The average latency of each I/O operation on the guest OS

We also used the vSAN Performance Service and the esxtop utility on the ESXi host to monitor the vSAN performance.

  • vSAN Performance Service: it is used to monitor the performance of the vSAN environment, using the web client. The performance service collects and analyzes performance statistics and displays the data in a graphical format. You can use the performance charts to manage your workload and determine the cause of problems.
  • esxtop utility: esxtop is a command-line utility that provides a detailed view of the ESXi resource usage. Refer to the VMware Knowledge Base Article 1008205 for more information.

HCIBench Performance

To verify the performance of the vSAN with NVMe as the cache tier, we tested HCIBench by using the SQL Server preferred I/O pattern listed in Table 4. Also, we designed three configurations for emulating the SQL Server workload I/O patterns.

As shown in Table 4, we have three configurations to test the vSAN all-flash with NVMe as the cache tier. Each configuration has eight virtual machines, with each having four 5 GB VMDKs. 5GB VMDK for each virtual disk means all writes can be done in write buffer of vSAN and avoid de-staging. The thread number is 8 for all tests. Configuration 1 and 2 emulated the random I/O pattern of the user data file, and Configuration 3 emulated the sequential I/O pattern.

Table 4. Three Configurations for HCIBench Performance Validation

Configuration

Read Rate

I/O Access Pattern

VM Number

VMDK per VM

Threads per VMDK

Disk Size (GB)

I/O Size

Configuration 1

70%

random

8

4

8

5

8KB

Configuration 2

90%

random

8

4

8

5

8KB

Configuration 3

90%

sequential

8

4

8

5

64KB

Table 5 shows the result of the validation of HCIBench:

As shown in Figure 2 and Figure 3, vSAN all-flash with NVMe as the cache tier can support 206k IOPS for 70% read and 8K random I/O pattern and can support 250k IOPS for 90% random read I/O pattern with less than 1.6 milliseconds (ms) latency for both read and write. And it can support 58k IOPS for 90% read and 64K sequential I/O pattern with less than 4.3ms read latency and 6.5ms write latency. In sum, the performance of the vSAN all-flash with NVMe as the cache tier can provide good IOPS with low disk latency.

Table 5.           HCIBench Performance Result

Configuration

IOPS

Bandwidth (MB)

Latency (ms)

Read Latency (ms)

Write Latency (ms)

Configuration 1

206,778

1,615

1.25

1.25

1.24

Configuration 2

250,207

1,954

1.04

0.99

1.52

Configuration 3

58,619

3,663

4.48

4.25

6.48


 

Figure 2.                        HCIBench IOPS and Bandwidth

 

Figure 3. HCIBench Latency in Milliseconds

This validation demonstrated that vSAN all-flash with NVMe as the cache tier can be a good start point for us to run SQL Server OLTP workload.

SQL Server OLTP Performance

Overview

The purpose of this test was to verify that the vSAN can support scale-up performance and consolidated workloads for multiple databases with OLTP workloads.

Test Scenarios

This test measured the impact of stressing an SQL Server 2016 on a Windows Server 2016 guest VM using DVD Store 3 to generate the OLTP workloads. The number of driver threads, which generate OLTP activity, were adjusted as needed to saturate the host. Repeatability was ensured by restoring a backup of the database before each run. Steady state was achieved by ensuring gradually increased throughput during the 10-minute warm-up period, and reached and maintained steady state during the 20-minute run time for each test. 

We measured the key performance indicators include aggregate DVD OPM, and the total IOPS, bandwidth, and latency on the vSAN level were also measured.  The performance tests include the scale-up performance, that is, we ran DVD Store 3 OLTP workloads on the four virtual machines on four ESXi hosts simultaneously, with different vCPU numbers (12, 24 and 48), to measure the performance on different configurations. The consolidate test was running the OLTP workloads on the six virtual machines simultaneously to achieve the test performance as a whole.

Scale-Up Performance

We verified the performance when enabling the hyper-threading on the hosts. Figure 4 shows the OPM on 12, 24, and 48 vCPUs with the same memory configuration (64 GB) and same maximum memory setting for SQL Server instance (48 GB). Figure 4 illustrates the different vCPU configurations for the scale-up performance test.

Figure 4. Scale-up Performance Validation Architecture and vCPU Configuration

Our performance results showed that the database performance scaled up when increasing the vCPU number on each virtual machine, that is 12 vCPUs achieved around 85k(85,984) aggregated OPMs, 24 vCPUs got 157k OPMs and 48 vCPUs got 181k OPMs. The increment of the OPM was near linear as shown in Figure 5.

 

Figure 5.        Scale-up Performance Result on OPM

While the OPM increased, we monitored that the average physical CPU core utilization increased from 46 percent on 12 vCPUs to 93.5 percent on 48 vCPUs, which almost saturated the physical CPU usage, as shown in Figure 6. This demonstrated that this OLTP workload was CPU intensive, and better CPU with more vCPUs could achieve better performance.

Figure 6.  Average Physical CPU Core Utilization (%) in Scale-up Performance Validation

We also compared the performance with hyper-threading disabled and enabled. As shown in Figure 7, the results indicated that enabling hyper-threading could achieve better performance at OPM, with the same 24 vCPUs configured. Figure 5 shows that when we enabled the hyper-threading, the OPM could be at 157k, and it downgraded to around 133k when it was disabled. Considering this is the optimal result after trying different thread number of the test tool, that means disabling hyper-threading cannot benefit the performance, while enabling it can better use the CPU resources.

Figure 7. OPM and Physical CPU Core Utilization (%) with and without Hyper-Threading Enabled

OPM and Physical CPU Core Utilization (%) with and without Hyper-Threading Consolidated Performance

We ran six workloads against six SQL Server DB simultaneously to measure the performance consolidation of different-sized DBs on the same vSAN datastore, emulating the real-world workloads for various DBs. To keep the same OPM for larger database is not easy, which needs excellent storage performance and well-designed configuration SQL Server instance. The following results show the performance of the six DBs from application, vSAN, and esxtop perspectives:

  • Application Performance: The overall OPM was 172,816 in the DVD Store 3 output report and the average OPM on each VM was around 39,100 as shown in Table 6. Running more DB workloads, especially with medium-sized DB, can impact the overall OPM. However, this performance is CPU bound. Better CPU can benefit the performance.

Table 6. OPM of Each VM and Total OPM

VM1

VM2

VM3

VM4

VM5

VM6

Total OPM

23,632

23,457

23,908

24,095

38,600

39,124

172,816

 

  • vSAN performance: When the OLTP workloads were running, the overall IOPS on vSAN was around 12K, with around 7K write IOPS and 5K read IOPS. The average write latency was less than 0.5 ms and the read latency was less than 0.4 ms.
  • esxtop CPU utilization: The average physical CPU core utilization of the four ESXi hosts was more than 92% for the 100 GB DB, and was more than 98% for the 200 GB DB. The physical core CPU utilization was almost saturated on each host as shown in Figure 8.

 

Figure 8.  Average Physical CPU Core Utilization

  • Virtual machine performance: The average data disk latency of all VMs was equal or less than 1 ms for read, equal or less than 2 ms for write. For the log write, all virtual disk latency was equal or less than 4 ms. The average CPU utilization was around 85 percent, and 76 percent of the 100 GB and 200 GB VM respectively. 

Test Result

In conclusion, the consolidated performance was comparable with the four DBs on four VMs on four hosts separately (172k OPM vs 181k OPM, with around 4.7 percent downgrade on the OPM).

From the test result you can find that the overall performance was ultimately CPU bound instead of vSAN as the storage subsystem. Specifically, the physical CPU core utilization of the large DB can be even higher (98%) comparing with the smaller DB on one host, and the OPM decreased accordingly. That means the larger DB needs more powerful CPU to achieve the same performance of the smaller DB.

Recommendations

We recommend you enable the hyper-threading function to achieve better performance at OPM.

Generally, you might run a few SQL Servers in a vSphere virtualized environment:

  • For the single VM on one host, allocate more CPU resources to the VM to achieve better performance.
  • For multiple VMs on one host, allocatie reasonable CPU resources to each VM to achieve better performance consolidations.

Adaptive Resync

Overview

vSAN 6.7 introduces a brand new Adaptive Resync feature to ensure the fair-share of resources are available for VM I/Os and Resync I/Os during the dynamic changes loading on the system. When I/O activity exceeds the capabilities of the bandwidth provided, the Adaptive Resync feature guarantees a level of bandwidth to ensure one type of traffic is not starved for resources. Yet, Adaptive Resync is intelligent enough to allow for the maximum bandwidth to be used during periods in which VM I/O and resync I/O are not contending for resources. This provides an optimal use of resources. 

Test Scenario

In this validation, we enabled the resynchronization in the backend in the wake of adding two flash disks to two disk groups with one disk to one disk group. The resynchronization was triggered automatically because the vSAN datastore usage was more than 80 percent and adding more disks to the capacity tier would enable the space rebalance automatically. We started the 4 x 100 GB DB workloads simultaneously after the resynchronization started, making sure the contention between the VM I/O and resynchronization happened on that host and then monitored the OLTP workload performance during the contention, and the resynchronization I/O on the system.

Test Result

The design of Adaptive Resync feature in vSAN 6.7 is that the I/O contention can be adjusted by adaptive resync mechanism of vSAN 6.7 automatically. That is no resync traffic exists, VM I/O may consume 100% of bandwidth, and under contention, resync I/O is guaranteed at least 20% of the bandwidth. 

We proved the process through triggering the resynchronization automatically by adding more capacity to the system when the capacity reached the threshold (80%) with running DVD Store 3 workloads on the 4 x 100 GB databases simultaneously and measured the OPM influence shown in Figure 9. We find that the peak resynchronization IOPS was suppressed from 5k to 1.5k when VM I/O kicked in while VM I/O increased dramatically during the SQL Server buffer pool filling and was stabilized after that period. During the entire period when the VM workload was running, the resynchronization IOPS was suppressed at around the level of 1.5k. And the OPM downgrade of DVD Store 3 was around 2.6 percent comparing with no resynchronization activities in the backend.  

 

Figure 9.  OPM Comparison with and without Resync

There are four stages monitored from the vSAN performance service using the vSAN 6.7 HTML GUI:

  1. Stage1: There was no VM workload, and the resynchronization IO consumed all the bandwidth.
  2. Stage2: VM workloads kicked in and contention between resynchronization and VM IOs. vSAN queue adjustment limited the I/O of resynchronization and left 80 percent I/Os to VM workload. The SQL Server was filling the buffer pool from the disk therefore the VM read IOPS dominated the total IOPS.
  3. Stage3: SQL Server filled in the buffer pool and the write IOPS dominated the total IOPS; the test was in the stable stage.
  4. Stage4: VM stopped the workload and the resynchronization recovered the bandwidth until the resynchronization finished.

Figure 10 demonstrated the whole process of the validation.

Figure 10. Adaptive Resync on vSAN 6.7

Recommendations

Adaptive resync discovers the sustainable bandwidth of the disk group as a whole. Different number of the disk groups may have different results.

Note that Adaptive Resync in vSAN 6.7 will require an on disk format upgrade to version 6. This is the only feature in vSAN 6.7 that requires an on disk format upgrade. Other than this requirement, the fair-share of resources is completely handled in vSAN backend without manual interferences.

DB Batch Insertion Performance and Improvement Option

Overview

The database batch insertion operation or bulk copy operations are normal to extract data from the plain text or other sources into SQL Server database. Basically, the batch insertion operation to SQL Server is a pure write operation. Thus the write performance of the I/O subsystem is the key factor to the test result. If checking the running process of SQL Server when running the insertion-intensive operations, you can find the wait type is WRITELOG of the user database.

We verified the performance of the batch insertion on the 4-node vSAN all-flash with the two different SPBM policies, and provided an additional method to tune the batch insertion performance from the SQL Server configuration and transaction perspectives. We used the script in the Appendix to create a test table, and to run the batch insertion operation to insert 1 million records to the table. The space usage of the data file was around 136MB for data after batch insertion. We created a database with one data and 1 log virtual disk from vSAN. And the allocated space was 75 GB and 25 GB under different paravirtual SCSI controllers.

vSAN SPBM Policy

We created two SPBM policies to test the DB insertion operation performance, using the vSAN native methods.

  • The first is to use the default vSAN SPBM policy for the data and log virtual disks, as the baseline.
  • The other is to set the FTT=0 to change the double writes to a single write on vSAN to improve the write operation, with the reduced protection level, compared to the default protection level:

These two scenarios are to verify the insertion performance using the default vSAN setting in most situations that provide high protection to the database. And FTT=0 is an experimental validation of the policy from vSAN to demonstrate the result.

Explicit Transaction

This scenario changes the transaction behavior of insertion from the implicit transaction per insertion to the explicit transaction for the whole insertion batch. We used the default vSAN policy for the disks of the test DB.

By default, SQL Server runs each statement in an atomic transaction implicitly, which means every insertion in a batch operation is a transaction.

Using the explicit transaction to enclose the insertion/bulk copy operations can reduce the log flush operation. The following example shows how to enclose the insertion with BEGIN TRANSACTION and COMMIT TRANSACTION.
 

BEGIN TRAN

     {BATCH INSERT/BULKCOPY DATA INTO SQL Server DB}

COMMIT TRAN

 The explicit transaction provides the capability of automatic rollback for the whole transaction, which means the entire batch insertion is taken as one transaction, with succeeding the batch insertion or failure and the rollback of the entire batch insertion.

Test Result

The test results showed that the 1,000,000 DB insertion operations could finish in 11 minutes and 20 seconds in the baseline test scenario. While reducing the protection from FTT=1 to FTT=0, the insertion could finish in 10 minutes and 30 seconds. Looking into the virtual disks of data and log, we found the average disk latency (write only) was around 1 ms, except for few spikes of the write by DB checkpoint. This indicated that the disk performance was good in both scenarios.

Adjusting the setting from the DB side can dramatically reduce the insertion duration; that is, when using the explicit transaction,  the insertion durations decreased to 6 seconds.

Figure 11. DB Batch Insertion Performance in Duration (Minutes:Seconds)

Recommendations

For DB insertions, you can expect the excellent performance with NVMe as the cache tier on an all-flash vSAN. You may take FTT=0 as a reasonable and legitimate option from the vSAN perspective for better performance. However,  the performance benefit was only around 7.3 percent compared to the performance of FTT=0 (10 minutes and 30 seconds) and the default vSAN policy (11 minutes and 20 seconds). Considering the performance difference was not significant and you will lose the redundancy of the data protection, there is no compelling reason to run this configuration.

For DBA, you may want to change the setting on the T-SQL to achieve much better batch insertion performance. And the settings can be performed from the application level without the incurring reconfiguration or resynchronization of the data in the vSAN backend like the SBPM policy change.

Conclusion

This solution architected and demonstrated running SQL Server on vSAN 6.7 all-flash with NVMe as the cache tier:

  • Through HCIBench, we demonstrated the HCI environment can provide high IOPS with low disk latency, giving the credit to the Samsung NVMe SSD.
  • Through the SQL Server performance validations, we demonstrated the high throughput of the OPM and low latency from the virtual disk and from the vSAN backend perspectives, with multiple of DVD Store 3 OLTP workloads consolidated on a 4-node vSAN cluster. 
  • We demonstrated that the OLTP (DVD Store 3) performance was CPU bound. We showcased that enabling hyper-threading, giving more vCPUs to VM could achieve an optimal OLTP performance, that means, better CPU can benefit the performance.  
  • The Adaptive resync feature in vSAN 6.7 manages the different types of traffic in different queues to control the classes of I/O in various ways, which can ensure data is compliant with the assigned storage protection and performance policies. 
  • We validated the DB batch insertion on vSAN 6.7 all-flash with NVMe as the cache tier. We recommend using FTT=1 over FTT=0 for the redundancy consideration. We also verified the additional option to dramatically improve the batch insertion performance.

 

References

[1] VMware, Inc. (2014, June) Configuring disks to use VMware Paravirtual SCSI (PVSCSI) adapters (1010398). http://kb.vmware.com/kb/1010398

[2] VMware, Inc. (2014, July) Choosing a network adapter for your virtual machine (1001805). http://kb.vmware.com/kb/1001805

[3] Performance characterization of Microsoft SQL Server on VMware vSphere 6.5

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/sql-server-vsphere65-perf.pdf

[4.] Using SQL Server Failover Clustering on a vSphere 6.7 vSAN Datastore with vSAN iSCSI Target Service: Guidelines for supported configurations (54461)

https://kb.vmware.com/s/article/54461

[5] SQL Server FCI and File Server on VMware vSAN 6.7 using iSCSI Service

https://core.vmware.com/resource/sql-server-fci-and-file-server-vmware-vsan-67-using-iscsi-service

[6] Architecting Microsoft SQL Server on VMware vSphere

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-best-practices-guide.pdf

 

 

 

 

 

Appendix - Batch Insertion Script

The following scripts are for the database batch insertion test purpose which contains the test table creation, batch insertion operations, and select the count from the table.

--DELETE EXISTS TEST TABLE
IF EXISTS (SELECT name FROM sys.tables WHERE name = 'test')
DROP TABLE test
GO
--CREATE TEST TABLE
CREATE TABLE [dbo].test(
       [rec_num] [int] NOT NULL,
       [date] [char](8) NULL,
       [time] [char](8) NULL,
       [reff] [char](10) NULL,
       [acc] [char](10) NULL,
       [stock] [char](6) NULL,
       [bs] [char](1) NULL,
       [price] [char](8) NULL,
       [qty] [char](8) NULL,
       [status] [char](1) NULL,
       [owflag] [char](3) NULL,
       [ordrec] [char](8) NULL,
       [firmid] [char](5) NULL,
       [checkord] [binary](16) NULL,
       [branchid] [char](5) NULL,
       sj datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
       CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
       (
       [rec_num] ASC
))
GO
--PRINT OUT THE TIME STAMP BEFORE THE TEST
PRINT convert(varchar(24),GETDATE(),121)
--INSERT ROW DATA IN LOOP
--begin tran
DECLARE @i INT
SET @i=1
SET NOCOUNT ON
DELETE FROM test
WHILE (@i<=1000000)
       BEGIN
       INSERT INTO test
       (rec_num,date,time,reff,acc,stock,bs,price,qty,status,owflag,ordrec,firmid,checkord,branchid)
       VALUES (@i,'20160316','00:00:00','1234567890','A123456789','600000','B','12.00','3000','0','1','1','111',0,'111')
       SET @i=@i+1
END
--commit tran
--PRINT OUT THE TIME STAMP AFTER THE TEST
PRINT convert(varchar(24),GETDATE(),121)
--VERIFY THE COUNT OF THE INSERTION
select convert (char(20),sj,108),count (*) from dbo.test group by convert (char(20),sj,108) order by 2 desc;
--DROP THE TABLE
DROP TABLE dbo.test;

About the Author and Contributors

Tony Wu, Senior Solution Architect in the Product Enablement team of the Storage and Availability Business Unit, wrote the original version of this paper.

Catherine Xu, Senior Technical Writer in the Product Enablement team of the Storage and Availability Business Unit, edited this paper to ensure that the contents conform to the VMware writing style.

Filter Tags

  • Reference Architecture
  • vSAN