]

Category

  • Reference Architecture

Product

  • Cloud Foundation

Microsoft SQL Server 2017 Database on VMware vSAN 6.7 using VMware Cloud Foundation

Executive Summary

This section covers the executive summary of this guide.

Business Case

Customers today wanting to deploy a virtualized SQL Server infrastructure require a cost-effective, highly scalable, and an easy-to-manage solution. Most importantly, the infrastructure itself must be able to scale with minimal cost impact yet still provide enterprise-class performance. Hyperconverged infrastructure (HCI) makes it easier to plan for multiple enterprise workloads consolidations within several clusters. It helps combine hardware silos, reduce deployment complexity and management difficulty, and save capital and operational expenditures. VMware vSAN™ is VMware’s premier storage solution for HCI, which provides the broadest set of HCI deployment choices for enterprise mixed workload deployment and provisioning. VMware vSAN is also an ideal platform to migrate your current mixed workload environment from traditional infrastructure, associated with high costs and operational compromise for mission-critical applications.

VMware Cloud Foundation™ is an integrated cloud infrastructure that combines compute, storage, networking, security, and cloud management services, VMware Cloud Foundation provides an ideal platform on which to run enterprise workloads and containerized applications across both private and public environments. VMware Cloud Foundation makes it easy to deploy and run a hybrid cloud by delivering common infrastructure that is fully compatible, stretched, and distributed along with a consistent cloud operational model for your on- and off-premises data centers and public cloud environments. 

VMware vSAN, as the key storage component of VMware Cloud Foundation, provides simpler operations, lower costs, and greater agility to customers with infrastructure scalability and without data center complexity. VMware vSAN solves the problems of storage cost and complexity by giving you a high-performance, flash-accelerated datastore you can enable with just a few clicks and grow affordably without large capital investments.

  • vSAN simplifies managing and automating storage for desktops and apps by eliminating traditional, purpose-built storage systems and by letting IT use familiar vCenter tools rather that proprietary storage-management interfaces. vSAN integrates storage policies into the VM-creation workflow, ensuring each virtual desktop automatically has the type of storage it needs. 
  • vSAN delivers the storage performance critical to ensuring virtual desktops and apps meet the expectations of users accustomed to physical devices.
  • vSAN provides a distributed architecture that allows for elastic, non-disruptive scaling. Capacity and performance can be scaled at the same time. This “grow-as-you-go” model provides predictable, linear scaling with affordable investments spread out over time. 

Based on the workload domains automatically deployed by VMware Cloud Foundation, we developed this solution to demonstrate the performance of SQL Server 2017 running on vSAN 6.7 for various requirements of CIOs, SQL Server DBAs, and system administrators of enterprise data centers, including the OLTP workload performance of large database, the performance on Windows and Linux platforms and their performance comparison, the performance scalability on on-premises HCI platform. Meanwhile, Machine learning capabilities of SQL Server 2017 expand, extend, and deepen the integration between the data platform, advanced analytics, and data science for SQL Server users. We explored a use case to demonstrate how to deploy machine learning for virtual disk usage prediction.

Solution Overview

This reference architecture is a showcase of using vSAN all-flash and VMware Cloud Foundation for operating and managing SQL Server in a VMware vSphere® environment. 

This reference architecture explains the reasons for running SQL Server on vSAN and how to choose correctly configured SQL Server in a vSAN environment.

In this reference architecture, the infrastructure for running SQL Server 2017 workloads is automatically deployed and maintained with VMware Cloud Foundation. VMware Cloud Foundation is an integrated software stack that bundles compute virtualization (VMware vSphere), storage virtualization (VMware vSAN), network virtualization (VMware NSX), and cloud management and monitoring (VMware vRealize Suite) into a single platform that can be deployed on premises as a private cloud or run as a service within a public cloud. VMware Cloud Foundation provides the benefits for SQL Server 2017 workloads.

OLTP performance of SQL Server 2017 on Windows and Linux platforms, large databases, and changing Storage Policy Based Management (SPBM) for different service level VMs to meet business, are demonstrated. Also, SQL Server native backup performance on vSAN, and machine learning of SQL Server 2017 are showcased as well.

Key Results

The key highlights for the technical white paper are as follows.

  • Demonstrate that VMware Cloud Foundation can efficiently deploy multiple workload domains and provide an integrated software stack with capability of automation and integration for a data center. 
  • Address the business benefits of running SQL Server in a vSAN and vSphere environment, and clarified the reasons for running SQL Server on vSAN.
  • Provide the high-level guide to configure and deploy SQL Server on vSAN and the recommendations for the performance evaluation.
  • Showcase well performed SQL Server OLTP workloads on vSAN, in Windows, and specially a Linux platform.
  • Demonstrate the performance scale-up in an on-premises vSAN and vSphere cluster.
  • Prove the capability of vSAN supporting large databases (2TB, 4TB) at predictable performance. We evaluated the requirement of the monster VM for equivalent or similar performance as smaller DB like 300GB or less.
  • Demonstrate the backup performance for multiple databases log maintenance.
  • Demonstrate how to use machine learning model for data analytics purpose to predict virtual disk usage using SQL Server 2017.

Modernizing Microsoft SQL Server with VMware Cloud Foundation

Customers wanting to modernize their existing complex Business Critical Applications (BCA) like Microsoft SQL Server environments need a reliable application delivery model to meet business requirements. Basically, they need simplified processes creating a single infrastructure component, and they need process efficiency to reduce the time to manage the infrastructure to hand version changes in infrastructure or applications, and furthermore they need keep track of all infrastructure components to reduce the manage complexity. 

VMware Cloud Foundation is an integrated software stack that bundles compute virtualization (VMware vSphere), storage virtualization (VMware vSAN), network virtualization (VMware NSX® for vSphere), and cloud management (VMware vRealize® Suite) into a single platform that can be deployed on premises as a private cloud or run as a service within a public cloud. 

Figure 1. Full Stack Hyperconverged Infrastructure by VMware Cloud Foundation

Cloud Foundation helps to break down the traditional administrative silos in data centers, merging compute, storage, network provisioning, and cloud management to facilitate end-to-end support for application deployment. VMware Cloud Foundation has the following benefits to modernizing SQL Server using this unified SDDC platform. 

  • Built on standardized and validated infrastructures through the Workload Domain construct
  • Simple to deploy and operate by automated lifecycle management
  • Scale-up and scale-down with ease 
  • Integrated software-defined cloud platform for management and operational simplicity
  • Built-in intrinsic security for databases

Workload Domain

VMware Cloud Foundation abstracts the individual building blocks of the Software-Defined Data Center—compute, storage, networking, and cloud management—through the Workload Domain construct. The Workload Domain is the unit of consumption in the private cloud. It aggregates the physical servers created on the composable infrastructure into logical pools of capacity on top of which the SDDC building blocks of compute, network, and storage virtualization are deployed. Each Workload Domain is automatically deployed based on an architecture designed and validated by VMware architects, providing unparalleled reliability, consistency, and supportability. Workload domain is a proven and rock-solid infrastructure for running Microsoft SQL Server workloads. 

Automated Lifecycle Management 

Cloud Foundation offers Automated Lifecycle Management on a per-Workload Domain basis. Available updates for all components are tested for interoperability and then bundled with the necessary logic for proper installation order. Update bundles are then scheduled for automatic installation on a per-workload domain basis. This allows the cloud administrator to target specific SQL Server workloads or environments (development vs. production, for example) for updates independently of the rest of the environment. It also removes any confusion and risk associated with updating many components in a complex system.

Elastic and Scalable Infrastructure

Customers can deploy new SQL Server workloads, scale up or down capacity for the existing SQL Server workloads easily with Cloud Foundation. A Cloud Foundation solution can support a maximum of 15 workload domains, and each domain supports VMware ESXi™ hosts in accordance with vCenter maximums. As the demand for SQL Server goes up and down, ESXi hosts can be easily added or removed from the Workload Domain.

Management and Operational Simplicity  

VMware Cloud Foundation automatically deploys and maintains vRealize Suite with vRealize Suite Lifecycle Manager. Customers can easily enable and maintain the entire vRealize Suite for their SQL Server deployments and leverage vRealize Suite components to simplify management and operations and provide superior performance.

VMware vRealize Operations (vROps) Management Pack for SQL Server by Blue Medora, collects metrics from SQL Server virtual machines and vSphere and presents aggregated data to the vRealize Operations Manager for monitoring, trending and predictive analysis. Its smart alerts with dynamic thresholds allow customers to easily isolate root causes of issues and proactively optimize performance of the entire stack.  

VMware vRealize Log Insight collects, imports, and analyzes logs to provide real-time answers to problems related to systems, services, and applications, and derive important insights. It is a particularly useful tool in virtualized SQL Server environment where many components generate logs and troubleshooting can prove challenging. vRealize Log Insight and vRealize Operations together provide the most comprehensive environment insight and the best troubleshooting capabilities for SQL Server customers. 

VMware vRealize Automation extends SQL Server as a service by allowing the users to request and provision SQL Server on demand.

Secure Databases

Database is often targeted by cyber-attacks because it is where sensitive information is stored, hence database security is one of the most important considerations in application design. Cloud Foundation automatically installs and configures VMware NSX in each workload domain. NSX-based micro-segmentation secures SQL Server databases through centralized policy controls and distributed stateful firewalling. Customers can use VMware NSX to associate firewall rules at the router or at the virtual machine level to achieve greater granularity.

Reasons for Running Microsoft SQL Server on vSAN

  • Modular scale-out expansion: The ability to buy vSAN in chunks and scale linearly enables customers to expand as needed, one node at a time. The performance of a database VM increases as additional nodes are added, while the underlying clustered storage can provide increased IOPS and throughput with low latency.
  • Enterprise class database performance and application response time: Tightly integrated with the hypervisor, vSAN sits directly in the IO data path which is the position to make rapid data placement decisions. As a result, vSAN can deliver the highest levels of performance without over taxing CPU or memory resources.
  • Multiple architectures to meet the business requirement. vSAN supports regular architecture and stretched cluster architecture; meanwhile you may choose vSAN iSCSI service to access the LUNs provisioned by vSAN through software initiators in the guest OS. That means you can use local vSAN datastore, stretched vSAN datastore with active-active access from two data centers, and remote access from another vSphere cluster for your SQL Server databases.
  • vSAN is a distributed object-based storage system that leverages the SPBM vSphere feature to deliver centrally managed, application-centric storage services, and capabilities. Administrators can specify storage attributes, such as capacity, performance, and availability as a policy on a per-object level, such as a VMDK, or on a virtual machine level. Policy-based management to meet different business requirement: with SPBM, vSAN allows administrators to adapt to ongoing changes of specific application requirements. Unlike feature of traditional storage mainly at LUN or pool level, vSAN can assign a specific storage policy to each application instead of using a one-size-fits-all policy or make sure that each VM in on the right Container/volume/LUN. vSAN allows customers to manage storage SLAs—performance, availability, and security policies per VM and per VMDK. Rapid, simplified provisioning via policy automation non-disruptive policy changes. vSAN has the capability to use SPBM for granular control to assign high-performance policy for production databases and space-efficiency policy for other non-production databases like development and testing.
  • Storage efficiency: customer can choose Erasure coding feature on vSAN for their tier-2 workload for capacity instead of performance dominant usage.  For the SQL Server data disks, using RAID 5/6 erasure coding to reduce space usage might be a choice if space/cost savings are desired and performance is not a primary goal.  
  • Ease of deployment: vSAN leverages internal storage pools that aid in eliminating storage implementation and management complexities such as deploying separate LUN or RAID groups for each data or log file. Traditional networking constructs such as switches, zones, LUN masking, multi-pathing are likely unnecessary on vSAN, as vSAN drastically simplifies the networking infrastructure and improves database access speeds, as a result, improving performance, leveraging high-speed interconnects between nodes.
  • Ease of management: Customers wanting to modernize their existing complex BCA environments can leverage VSAN, to eliminate traditional IT silos of compute, storage, and networking. All intelligence and management move into a single software stack, allowing a VM-and application-centric policy-based control and automation. Customers are able to manage both their vSphere and HCI environment through vCenter from a single pane of glass. In-depth performance monitoring (at disks, disk group, host, cluster, network, read/write cache level) is provided with “Performance Monitoring” tool that was introduced as part of Health Monitoring, and is observable through vCenter. vRealize Operations Native Management Pack, together with 3th party VMware vRealize Operations Management Pack like Blue Medora for Microsoft SQL Server, provides centralized management, monitoring and troubleshooting tool allow users to optimize their SQL Server resources, avoid possible issues for Day 2 Operations.
  • Business Continuity and Data Protection: a combination of the vSAN native Snapshot and Cloning capability along with third-party tools to perform Backup and Recovery, Cloning, data refresh of Always On Availability Groups and Non Always On Availability Groups SQL Server. vSAN Stretch cluster provides active/active continuous availability at a metro distance, coupled with vSphere SRM offering a multisite DR solution. DR solution coupled with SQL Server replication, log-shipping provide replication and recovery at a global distance with a minimal five minutes RTO—near zero data loss.  

Choose Correctly Configured vSAN for SQL Server

Guideline Overview

vSAN is distributed object store datastore formed from locally attached devices from ESXi host. It uses disk groups to pool together flash devices and magnetic disks (in a case of a hybrid configuration) as single management constructs. The following is the general guideline to choose vSAN as the HCI solution for your mission critical SQL Server deployment:

  • It is recommended to use similarly configured and sized ESXi hosts for vSAN cluster to avoid imbalance. This ensures a balance of virtual machine storage components across the cluster of disks and hosts.
  • It is recommended to use 10 GE or higher speed network NIC for vSAN traffic.
  • It is recommended to have multiple VMDKs in each SQL Server VM for large user database. Adding multiple VMDKs one by one to spread database components across disk groups in vSAN Cluster.
  • Smaller sized VMDK less than 255GB will not cause the vSAN internal striping of the virtual disk, however you need to configure multiple identical virtual disks for large database. Using smaller sized VMDK can bring benefits for more granular disk management, you need to balance the management tradeoff between the granularity of the disk size.
  • Use Paravirtual SCSI (PVSCSI) adapter for disks and spread disks between controllers based on IOPS. 
  • Large-scale workloads with intensive I/O patterns might require queue depths significantly greater than Paravirtual SCSI default values. You can increase the number of pages used by the PVSCSI adapter for the request ring from the default value of 8 pages to 32 pages and set the PVSCSI device queue depth to 254. See KB2053145 for more information.

Workflow for Configuration of SQL Server on vSAN

This section described how to configure a SQL Server virtual machine on vSAN with disk configuration, and settings from a scratch.

Figure 2. Workflow of SQL Server Configuration on vSAN

Step 1: Design Compute Resource

We recommend designing rightsized CPU and memory for a SQL Server virtual machine, and allocate adequate memory to SQL Server instance. Do not oversize CPU for SQL Server virtual machine. Set memory reservations on the VM level to reserve all guest memory for tier-1 (high performance or mission critical) SQL Server. Configure MIN/MAX memory on the SQL instance level. See Architecting Microsoft SQL Server on VMware vSphere Best Practices Guide for more details. 

Step2: Design Virtual Disks

When designing your SQL Server drive configuration, use several virtual disks to get the optimal performance, spread multiple virtual disks across the four available PVSCSI controllers. Spread disks between controllers based on IOPS. 

The following is an example for the data, log and tempdb distribution on different PVSCSI controllers. 

Table 1. Example of Virtual SCSI Controller Setting for SQL Server VM

SQL data files (1)

PVSCSI 1

SQL data files (2)

PVSCSI 1

TempDB

PVSCSI 2

TempDB log

PVSCSI 3

Database log files

PVSCSI 3

Step 3: Design Desired SPBM for SQL Server Virtual Machines

You need to understand the SPBM requirement before choosing the right policy for your storage. Consider the following policies as our recommendations: 

  • Failures to tolerate: Ensure that an option with at least “1 (one) failure to tolerate” is selected for mission critical application, and consider using RAID-5 for DATA if storage capacity outweighs performance considerations (for an example, for a tier-2 application). Do not use the option: “No data redundancy”.
  • Number of disk stripes per object: Use default one (1) and consider spreading the data between multiple VMDKs attached to multiple PVSCSI controllers.
  • Set object space reservation—“Thick provisioning” for all VMDKs hosting SQL Server data and log files from SPBM setting. 

RAID-5 for Data Disk and RAID-1 for Log Disk

For the read intensive OLTP database such as TPC-E-like or Data Warehouse workload, the most space requirement is from data including the table and index, and the space requirement for LOG can be maintained in a small portion comparing with the size of the data. In all-flash configurations where saving space is the priority, usage of RAID-5/6 for data VMDKs may be a choice. Perform thoughtful testing as RAID-1 configuration will generally provide better performance. 

Step 4: Create Multiple Data Files for Large Database

For databases with high-performance requirements, split the database into multiple files across multiple virtual hard disks. vSAN provides optimal performance when leveraging more hardware resource to serve I/O by having multiple virtual disks. For example, you can split a VM with four vCPUs hosting a 500 GB database into two 250 GB database files spread across two virtual disks. In general, one database file per vCPU is ideal. In addition, size all database files equally.

To avoid unnecessary complexity, add files to databases only if there is database page latch contention. To look for contention, monitor the PAGEIOLATCH_XX values and spread the data across more files as needed. A number of other factors, such as memory pressure, can cause PAGEIOLATCH_XX latency as well, so investigate the situation thoroughly before adding files.

For databases with high-performance requirements, split the database into multiple files across multiple virtual hard disks. This is intended to avoid database page latch contention. Enable autogrow on the databases and use a value that is reasonable for the projected growth of the database. 

Step 5: Evaluate the Capability of vSAN

The general performance expectation is RAID-1> RAID-5.   We highly recommend you leveraging the latest HCI benchmark tool to evaluate the capability of your vSAN. And consider the following benchmark tools to evaluate your database performance before moving your SQL Server into production environment backed by vSAN.

HCI benchmark—Evaluate the Performance from the Cluster Level

HCIBench is a benchmark tool of HCI solution that automates the use of the popular VDbench or FIO testing tool in larger clusters. Users simply specify the testing parameters they plan to run, and HCIBench instructs Vdbench or FIO what to do on each node in the cluster.

HCIBench aims to simplify and accelerate customer Proof of Concept (POC) performance testing in a consistent and controlled way. This tool fully automates the end-to-end process of deploying test VMs, coordinating workload runs, aggregating test results, and collecting necessary data for troubleshooting purposes.

Evaluate the Performance from the Database Level

After making sure the performance of HCI Benchmark is acceptable, consider to use SQL Server benchmark toolkits to evaluate the SQL Server performance. 

There are a couple of tools for you to test SQL performance. For general free TPC-C like workloads, you may use HammerDB or DVD Store since they are free. But you have multiple choices to do so, depending on what Key Performance Indicator(KPI) you want to drive, such as TPS, IOPS, latency, response time. We briefly introduce the tools for evaluation of the SQL Server performance at the database level in the following section.

Benchmark Factory

TPC-E is the latest TPC benchmark for modern OLTP benchmark, if you want to test the performance, consider Benchmark Factory for Database (BMF). BMF simulates real database application workloads using out-of-the-box, industry-standard benchmarks (TPC-H, TPC-C, TPC-D, TPC-E, ASP3AP and scalable hardware). BMF has rich visualization features for test monitoring and summary (however it is not free to use). 

HammerDB

The HammerDB tool is an open-sourced tool that can run these benchmarking tests against SQL Server, Oracle, MySQL, and PostgreSQL installations. It is open-sourced and freely available at http://hammerdb.com. It generates a benchmark value in the form of ‘transactions placed per minute’ and ‘orders placed per minute’ per test cycle, and tests the performance of the instance configuration and the infrastructure underneath it.

DVD Store Version 3 

DVD Store 3 (DS3) is an open source test / benchmark tool that simulates an online store that sells DVDs. Customers can log in, browse DVDs, browse reviews of DVDs, create new reviews, rate reviews, become premium members, and purchase DVDs. Everything needed to create, load, and stress this online store is included in the DVD Store project.

SQL I/O stress 

For SQL Server I/O stress test, you may consider using diskspd from Microsoft.  Diskspd is a feature-rich and versatile storage testing tool. Diskspd combines robust and granular IO workload definition with flexible runtime and output options, creating an ideal tool for synthetic storage subsystem testing and validation.

Although above tools can be used for database performance evaluation, the best testing is the one with your own data. Consider restoring a copy of your DB and run trace profile and/or job/SPs and compare performance.

Solution Technology Overview

This section provides an overview of the technologies used in this reference architecture, besides VMware Cloud Foundation.

  • VMware vSphere 6.7 Update 1
  • VMware vSAN 6.7 Update 1
  • Microsoft SQL Server 2017

VMware vSphere 6.7 Update 1

VMware vSphere 6.7 is the next-generation infrastructure for next-generation applications. It provides a powerful, flexible and secure foundation for business agility that accelerates the digital transformation to cloud computing and promotes success in the digital economy. vSphere 6.7 supports both existing and next-generation applications through its:

  • Simplified customer experience for automation and management at scale
  • Comprehensive built-in security for protecting data, infrastructure and access
  • Universal application platform for running any application anywhere

With vSphere 6.7, customers can run, manage, connect, and secure their applications in a common operating environment, across clouds and devices.

VMware vSAN 6.7 Update 1

VMware vSAN 6.7 Update 1 is bundled components in VMware Cloud Foundation. VMware vSAN is the industry-leading software powering VMware’s software defined storage and HCI solution. vSAN helps customers evolve their data center without risk, control IT costs and scale to tomorrow’s business needs. vSAN, native to the market-leading hypervisor, delivers flash-optimized, secure storage for all of your critical vSphere workloads. vSAN is built on industry-standard x86 servers and components that help lower TCO in comparison to traditional storage. It delivers the agility to easily scale IT and offers the industry’s first native HCI encryption. vSAN 6.7 U1 simplifies day-1 and day-2 operations, and customers can quickly deploy and extend cloud infrastructure and minimize maintenance disruptions. Secondly, vSAN 6.7 U1 lowers the total cost of ownership with more efficient infrastructure.   New version of vSAN ReadyCare rapidly resolves support requests. vSAN ReadyCare is a marketing name introduced to capture the significant investments VMware has made to support vSAN customers. VMware continues to invest in ReadyCare support, and new ReadyCare simplifies support request resolution and expedites diagnosis of issues.

vSAN 6.7 U1 can automatically reclaims capacity, using less storage at the capacity tier for popular workloads. It has full awareness of TRIM/UNMAP command sent from the guest OS and can reclaim the previously allocated storage as free space.  This is an opportunistic space efficiency feature that can deliver much better storage capacity utilization in vSAN environments.

Microsoft SQL Server 2017

Microsoft SQL Server 2017 enable users to build modern applications using the language of your choice, on-premises and in the cloud, now on Windows, Linux and Docker containers. SQL Server 2017 focuses heavily on developer enablement with Microsoft support for SQL Server on Linux—it can be deployed in container platforms such as Docker, Open Shift, and Kubernetes. Microsoft has added adaptive query processing, which introduces new techniques for adapting SQL Server to specific application workload characteristics. Additionally, organizations gain flexibility when deploying the database, and supporting mixed workload environments. 

With the release of SQL Server 2017, Microsoft changed the name of R Services to Machine Learning Services (MLS) and added support for Python, a widely implemented programming language known for its straightforward syntax and code readability. Having Machine Learning in SQL Server can let you use the same security model to access data. And you can keep the data in the database without movement or refreshing as the data volumes grow. Furthermore, you can take advantage of the multi-threaded architecture of SQL Server.

Solution Configuration

This section introduces the resources and configurations:

  • Architectures
  • Hardware resources
  • Software resources
  • Network configuration
  • SQL Server configuration including:
  • Database space usage
  • vSAN configuration and SPBM policy
  • SQL Serv

Architectures

Figure 3 is the architecture of the infrastructure created by VMware Cloud Foundation. There are three workload domains in total, one management domain, one infrastructure domain for SQL Server, and one VMware Horizon View domain. In this solution, we only used the two former domains. 

Architecture using VMware Cloud Foundation to Manage Workload Domains

Figure 3. Architecture using VMware Cloud Foundation to Manage Workload Domains

Figure 4 is the architecture of the SQL Server workload domain environment. 

We built 11 SQL Servers to host different-purposed SQL Server databases and applications on the same vSAN and vSphere cluster. We built four medium-sized SQL Servers on Linux platform, and four medium-sized SQL Servers on Windows platform. We built two virtual machine to host large databases. One SQL Server machine learning server was built for demonstrating the functionality of Machine Leaning in SQL Server 2017.  

Architecture of SQL Server Workload Domain Environment

Figure 4. Architecture of SQL Server Workload Domain Environment

Hardware Resources

In this solution, we used VMware Cloud Foundation to provision two clusters, one is for the SQL Server workload domain, and the other is for the management working domain. For SQL Server workload domain, we used four DELL PowerEdge R640 servers, a 1U platform for density, performance and scalability, and with optimized the application performance. 

Each VMware ESXi™ host contains two disk groups, and each disk group consists of one cache-tier NVMe SSD and four capacity-tier SAS SSDs. We configured pass-through mode for the capacity-tier storage controller, which is a preferred mode for vSAN with complete control of the local SSDs attached to the storage controller.

Each ESXi server in the vSAN cluster for SQL Server workload domain has the configuration shown in Table 2.

Table 2. Hardware Configuration for vSAN Cluster for SQL Server Workload Domain

PROPERTY

SPECIFICATION

Server model name

4 x DELL PowerEdge

R640

CPU

2 x Intel(R) Xeon(R) Gold

6132 CPU @ 2.60GHz,

14 core each

RAM

288GB

Network adapter

2 x Intel(R) Ethernet Controller 10G X550T port

2 x Intel Corporation

I350 Gigabit Network

Connection port

Storage adapter

1 x Dell HBA330 Mini

2 x NVMe SSD

Controller 172Xa/172Xb

PROPERTY

SPECIFICATION

Disks

Cache—2 x 1.6TB NVMe

Samsung PM1725a

Capacity—8 x 1.92TB

Samsung PM1633a SAS

SSD

For management workload domain, we used four QuantaPlex servers. Each VMware ESXi™ host contains one disk groups, and every disk group consists of one cache-tier SAS SSD and four capacity-tier SAS HDDs. We configured pass-through mode for the capacity-tier storage controller, which is a preferred mode for vSAN with complete control of the local SSDs attached to the storage controller.

Each ESXi server in the vSAN cluster for management workload domain has the configuration shown in Table 3.

Table 3. Hardware Configuration for vSAN Cluster for Management Workload Domain

PROPERTY

SPECIFICATION

Server model name

4 x QuantaPlex T41S-2U

CPU

2 x Intel(R) Xeon(R) CPU

E5-2630 v3 @ 2.40GHz ,

8 core each

RAM

256GB

Network adapter

2 x Intel Corporation

82599EB 10-Gigabit

SFI/SFP+

Storage adapter

1 x Fusion-MPT 12GSAS

SAS3008 PCI-Express

2 x Wellsburg AHCI

Controller

PROPERTY

SPECIFICATION

Disks

Cache—1 x 370GB

HITACHI SAS flash

Capacity—4 x 1.09TB

HITACHI Serial Attached

SCSI Disk

Software Resources

Table 4 shows the software resources used in this solution.

Table 4. Software Resources

Software

Version

Purpose

VMware Cloud Foundation

3.7

Unified SDDC platform

VMware vCenter Server®and VMware ESXi™

6.7 U1

ESXi Cluster to host virtual machines and provide the vSAN Cluster. VMware vCenter Server provides a centralized platform for managing VMware vSphere environments

VMware vSAN

6.7 U1

Software-defined storage solution for hyperconverged infrastructure

Microsoft SQL Server

2017 Enterprise

Edition, RTM-CU13*

Database software

Microsoft Windows Server

2016, x64, Standard

Edition

Operating System for the VMS:

  • SQL Server database virtual machines
  • Load generation virtual machines
  • Test clients

Benchmark Factory for Databases

8.1

OLTP database and workload generator

Red Hat Enterprise Linux

7.4

Operating System for the SQL Server in Linux

* Note that we did the tests on the SQL Server CU13.  Cumulative Update for SQL Server 2017 on Linux provides bug fix and better performance.

Network Configuration

We created a VMware vSphere Distributed Switch™ to act as a single virtual switch across all associated hosts in the data cluster for the SQL Server workload domain.

The vSphere Distributed Switch uses two 10GbE adapters for the teaming and failover. A port group defines properties regarding security, traffic shaping, and NIC teaming. To isolate vSAN, virtual machine, and VMware vSphere vMotion® traffic, we used the default port group settings except for the uplink failover order. We assigned one dedicated NIC as the active link and assigned another NIC as the standby link. vSAN and vSphere vMotion are on separate VLANs and the uplink order is reversed. See the Table 5 for the configurations.  We set MTU to 9000 for optimal network performance for the servers in both SQL Server and management workload domains.

Table 5. Network Configuration for SQL Server Workload Domain

DISTRIBUTED PORT GROUP

ACTIVE

UPLINK

STANDBY UPLINIK

VMware vSAN

Uplink2

Uplink1

Virtual machine and vSphere vMotion

Uplink1

Uplink2

SQL Server Configuration

This section introduced the configuration of the VM, including DB size used for the multiple OLTP databases, VM CPU and memory setting for each VM, and OS and SQL Server instance settings, disk layout, and configuration. We highly recommend you following the Best Practice of SQL Server on vSphere before reading this section to achieve the optimal performance of SQL Server on vSphere and vSAN cluster. 

Database Space Usage

Configuring the Scale Factor decides the size of a database size. We configured three sized DB servers for the performance tests. WE set SF=32 for medium-sized database which created 300GB database, or a TPC-E like OLTP user database with customer number 32,000. And we use SF=210 for 2TB database, or a TPC-E like OLTP user database with customer number 210,000, and SF=420 for 4TB database, or a TPC-E like OLTP user database with customer number 420,000. 

For the TPC-E like OLTP workload, the database size is based on the real disk space requirement. For the real-case production, additional space should be reserved according to DB growth rate.

vSAN Configuration and SPBM Policy

We disable the deduplication and compression, encryption, iSCSI Target service but enable the Performance Service on the vSAN configuration. 

Two SPBM policies were created to test DB performance, following the Considerations for running Microsoft SQL Server workloads on VMware vSAN.

  • The first is to use RAID-1 with 100 percentage space reservation policy for the data and log virtual disks (we used RAID-1 in the whole document for the simplicity sake).
  • The second is to set Erasure Coding (RAID-5) on the data file disks, but keep RAID-1, with 100 percentage space reservation for both data and log disks. 

SQL Server Virtual Machine and Disk Layout

We tested various sized databases on vSAN, Windows and Linux platforms. The virtual machine configuration and disk layouts are as follows.

SQL Server on Windows Platform

We configured four SQL Server virtual machines for the performance tests. Databases were created by the Benchmark Factory for Databases. The database and index files consumed approximately 300GB space. 24 vCPUs and 64GB RAM with memory reservation were assigned to the four VMs.  We set maximum and minimum memory of SQL Server instance to 51GB, this is to align with the memory limit of the Linux platform, and see the explanation in the section – SQL Server on Linux Platform. 

For the TPC-E-like OLTP workload, the database size is based on the actual disk space requirement and additional space for the database growth. Based on this, the virtual disk configuration for the 300GB database is: 1 x 280GB OS, 2 x 250GB data disks, 1 x 80GB log disk, 1 x 60GB tempdb disk and 1 x 160GB backup disk.

Table 6. 4 x 300GB SQL Server Disk Layout

Purpose

Number x Size (GB)

Operating system

1 x 100

Log disk

1 x 80

Data disks

2 x 250

Tempdb

1 x 60

Backup disk

1 x 160

SQL Server on Linux Platform

Starting with SQL Server 2017, SQL Server it is now available on Linux platform.  Following the Quickstart guide to install SQL Server on Linux, and follow the Best Practice guidelines.

The default RAM allocation for a SQL Server virtual machine on Linux platform is 80% by default. Microsoft does not recommend changing the default value, that means for a virtual machine with 64GB memory, the default setting of the SQL Server will be around 51GB. 

We use the same configuration of as the Windows platform but allocating more space for the operating system, from 100GB to 280GB.

SQL Server Large Databases

28 vCPUs and 128GB RAM with memory reservation were assigned to the 2TB database VM. The maximum and minimum memory of SQL Server instance were set to 120GB.  

56 vCPUs and 256GB RAM with memory reservation were assigned to the 4TB database VM. The maximum and minimum memory of SQL Server instance were set to 224GB.  

For large databases, we used the different virtual disk sizes for data and log files. We designed 2TB virtual disk for large databases like 2TB and 4TB for two reasons: one is too many small-sized virtual disks (like less than 255GB) will cause the management complexity, and another is vSAN have the capability to split large VMDK into small pieces to spread across hosts.  Table 7 and Table 8 showed the disk layout and size for the two large databases. 

Table 7. 2TB SQL Server Disk Layout

2TB Database

Number and Capacity in GB

Operating system

100

Log disk

1 x 400

Data disks

2 x 2000

TempDB

1 x 60

Table 8. 4TB SQL Server Disk Layout

4TB Database

Number and Capacity in GB

Operating system

100

Log disk

1 x 800

Data disks

4 x 2000

TempDB

1 x 60

OS and SQL Server Instance Settings

Windows Server 2016 and Red Hat Enterprise Linux 7.4 were used as the guest operating system (OS) for the SQL Server 2017 virtual machines. Each SQL Server virtual machine in the cluster had VMware Tools installed, and each virtual hard disk for data and log was connected to separate VMware Paravirtual SCSI (PVSCSI) adapter to ensure the optimal throughput and lower CPU utilization. The virtual network adapters used the VMXNET 3 adapter, which was designed for the optimal performance.

The settings that can benefit from the Benchmark Factory for Databases workload are summarized as follows: 

  • Operating system settings:
    • Set the power plan to high performance in the guest OS on Windows platform and set high performance and throughput in the kernel setting in the guest OS on Linux platform.
    • Enable SQL Server to use large pages by enabling the Lock pages in the memory user right assignment for the account that runs the SQL Server in Group Policy on Windows platform.
    • Grant Instant File Initialization rights to the SQL Server service during installation on Windows platform.
  • SQL Server MAXDOP 
    • Set the MAXDOP to 1 in the instance level for this OLTP workload can achieve best performance on both Linux and Windows platforms.
  • Disable forced flush on Linux platform
    • Because vSAN can guarantee durable writes across power loss, follow the performance recommendation of running SQL Server on Linux by Microsoft. In this solution, we enabled trace flag (TF) 3979 to disable the forced flush behavior and set the alternatewritethrough and writethrough options in mssql.conf to ZERO.

See Configuring disks to use VMware Paravirtual SCSI (PVSCSI) adapters (1010398)

See Choosing a network adapter for your virtual machine (1001805).

Solution Validation

Test Overview

We validated this solution through running Microsoft SQL Server on the vSAN:

  • Performance test tools and configuration
  • VMware Cloud Foundation configuration
  • Performance test results:
    • SQL Server OLTP performance on Windows platform (including the Erasure Coding for database data virtual disks)
    • SQL Server OLTP performance on Linux platform
    • Performance scale-up 
    • SQL Server OLTP performance of large databases 
    • SQL Server concurrent backup and tail-log backup
  • Machine learning service in SQL Server 2017

Performance Test Tools and Configuration

We used the Benchmark Factory for Databases to run tests with the desired parameters. Benchmark Factory for Databases is a database performance testing tool that enables you to conduct database industry-standard benchmark testing and scalability testing.

Note: Each user simulates the same TPC-E-like workload against the system under test (SUT), so one user performs the workload once, but running 10 users would run the workload 10 times in parallel. The number of users is related to Benchmark Factory load and does not correlate to actual users connecting to a database server. In the performance test, we used 80 users to generate the workload without any transaction delay. 

We focused on the vSAN aggregate performance of the 4-node cluster. Each test duration was set to one hour with 15-minute preconditioning and 45-minute sample period. 

We also used the vSAN Performance Service to monitor the vSAN performance. vSAN Performance Service is used to monitor the performance of the vSAN environment, using the web client. The performance service collects and analyzes performance statistics and displays the data in a graphical format. You can use the performance charts to manage your workload and determine the cause of problems.

Table 9 lists the key performance counters used in the testing.

Table 9. Key Matrix of Benchmark Factory for Databases

Monitor Counter

Description

Transaction per second (TPS)

Measures the transactions in the user database.

The ideal value is as large as possible for a designed SQL Server.

Transaction time

The summation of response time and retrieve time. The retrieve time is the time taken from the server responds to a SQL statement until the last bytes of the data are obtained.

Deploy VMware Cloud Foundation for SQL Server

VMware Cloud Foundation makes it easy to deploy and run a hybrid cloud. VMware Cloud Foundation provides integrated cloud infrastructure (compute, storage, networking, and security) and cloud management services to run enterprise applications in both private and public environments. Figure 5 demonstrated the VMware Cloud Foundation deployment workflow and required components.

Workflow to Deploy VMware Cloud Foundation and Components

Figure 5. Workflow to Deploy VMware Cloud Foundation and Components

Cloud Foundation Builder 

Cloud Foundation Builder is a Photon OS VM that is delivered as an OVA file. It contains all code and product bits to deploy the full SDDC stack for the management domain for your VMware Cloud Foundation instance. The VM can be deployed on any physical device that has connectivity with the ESXi hosts, including personal laptops and external hosts. The VM can be re-used for additional bring-ups or can be deleted after use. Follow the bring-up UI on the VM to deploy the SDDC stack. Input parameters are passed in via a file import.

SDDC Manager

SDDC Manager is the centralized management software in Cloud Foundation used to automate the lifecycle of components, from bring-up, to configuration, to infrastructure provisioning to upgrades/patches. SDDC Manager complements vCenter Server and vRealize Suite products by delivering new functionality that helps cloud admins build and maintain the SDDC.

We continued to use vCenter Server as the primary management interface for the virtualized environment after the deployment.

Figure 6 is the screenshot of the VMware Cloud Foundation—SDDC Manager. Screenshot below shows three domains being available, with vSAN enabled and in active status. 

Note that vSAN is the required primary storage for the management domain. You can deploy Virtual Infrastructure workload domains without vSAN and use external NFS storage instead.

Snapshot of Virtual Infrastructure of SDDC Manager

Figure 6. Snapshot of Virtual Infrastructure of SDDC Manager

Figure 7 is the screenshot of the SDDC dashboard. From the dashboard you can take a quick glimpse of the three domains and their management statuses, such as the cluster type—hybrid or all-flash, the top CPU, memory and storage usage, and the most recent tasks. 

Snapshot of the SDDC Manager Dashboard

Figure 7. Snapshot of the SDDC Manager Dashboard

After the deployment of VMware Cloud Foundation, we can manage the entire infrastructure of SQL Server separately using the vCenter.   

Performance Test Results

SQL Server on Windows Platform

This test measured the impact of stressing a SQL Server 2017 on a Windows Server 2016 guest VM using Benchmark Factory for Databases to generate the OLTP workloads. We performed the performance test on the four virtual machines with each hosting one SQL Server instance with one user database. The number of users, which generate OLTP activity, were adjusted as needed to saturate the host. Repeatability was ensured by restoring a backup of the database before each run.  We kicked off the test concurrently from the test clients on the management cluster remotely, to avoid the side effect of the clients. The test duration was one hour with 15-minutes warm-up duration. Table 10 provides test result, which shows that the TPS on each virtual machine was at the same level; the total TPS was 10,074 and the average transaction time was at 0.031 seconds.  

Table 10. 4 x 100GB SQL Server Performance on Windows Platform with RAID-1

4 x 300GB on Windows (RAID-1)

TPS

Average Transaction Time (ms)

VM-1

2,464

31.9

VM-2

2,470

31.6

VM-3

2,565

30.6

VM-4

2,575

30.6

Total/Average

10,074

31.2

We monitored the vSAN VM performance using vSAN Performance Service when the four SQL Servers were handling the TPCE-like OLTP workload on the Windows platform with RAID-1. When the OLTP workloads were running, the overall read IOPS ranged from 147K to 337K, and the write IOPS ranged from 2.5K to 4.5K. The average write latency was less than 1.7ms (0.67ms-1.69ms) and the read latency was less than 0.9ms (0.36ms-0.84ms).  

Table 11 is the vSAN VM Performance of four 300GB databases on Windows platform with RAID-1.

Table 11. vSAN VM Performance running 4 x 300GB OLTP Workload on Windows Platform

vSAN VM Performance

Read IOPS

Write IOPS

Read Latency

Write Latency

14,700-33,700

2,500-4,500

0.36ms-0.84ms

0.67ms-1.69ms

Erasure Coding (RAID-5) for Data Virtual Disks

Table 12 shows the test results which occurred after changing the SPBM from the RAID-1 to Erasure Coding (RAID-5).  You may find that the overall TPS was kept at the same level of 100,58 and the average transaction time was also kept as 0.031 seconds. 

Table 12. 4 x 100GB SQL Server Performance on Windows Platform with RAID-5 for Data Virtual Disks

4 x 300GB on Windows (RC5)

TPS

Average Transaction Time (ms)

VM-1

2,502

31.5

VM-2

2,494

31.7

VM-3

2,498

31.4

VM-4

2,564

31.0

Total/Average

10,058

31.4

We monitored the vSAN VM performance using vSAN Performance Service when the four SQL Servers were handling the OLTP workload on the Windows platform with the Erasure Coding vSAN policy applied to the data disks (RAID-5). As shown in Table 13, when the OLTP workloads were running, the overall read IOPS ranged from 195K to 381K, and the write IOPS ranged from 2.8K to 4.6K. The average write latency was less than 1.9ms (0.97ms-1.84ms) and the read latency was less than 0.7ms (0.37ms-0.63ms). In comparison with RAID-1, more IOPS caused a slightly higher write latency because of the tradeoff of the RAID-5.

Table 13. vSAN VM Performance of 4 x300GB Databases on Windows Platform with RAID-5 Policy for Data Virtual Disks

vSAN VM Performance

Read IOPS

Write IOPS

Read Latency

Write Latency

19,500-38,100

2,860-4,670

0.37ms-0.63ms

0.97ms-1.84ms

SQL Server on Linux Platform

We tested the performance of using the same-sized database with the same test clients against the instances running in a RedHat Linux, aiming at the performance comparison between the platform of Windows and Linux. From the test result, shown in Table 14, you may find that the average TPS was 9,393 using RAID-1 policy, in comparison with the results on Windows platform the average TPS dropped slightly and the average transaction time increased a bit, under the same workload and configuration of the test client. 

Table 14. 4 x 100GB SQL Server Performance on Linux Platform with RAID-1

4 x 300GB on Linux (RAID-1)

TPS

Average Transaction Time (ms)

VM-1

2,315

34.0

VM-2

2,448

32.1

VM-3

2,304

34.1

VM-4

2,326

33.8

Total/Average

9,393

33.5

We monitored the vSAN VM performance using vSAN Performance Service when the four SQL Servers were handling the OLTP workload on the Linux platform with the default vSAN policy (RAID-1). Shown in Table 15, when the OLTP workloads were running, the overall read IOPS ranged from 193K to 356K, and the write IOPS ranged from 2.6K to 3.9K. The average write latency was less than 0.7ms (0.46ms-0.69ms) and the read latency was less than 0.8ms(0.38ms-0.71ms).  Comparing with the results on the Windows platform, disk performance was similar, and the application performance delivered slightly less at TPS and the Average Transaction Time increased a bit. 

Table 15. vSAN VM Performance of 4 x300GB Databases on Linux Platform with RAID-1

vSAN VM Performance

Read IOPS

Write IOPS

Read Latency

Write Latency

19300-35600

2680-3930

0.38ms-0.71ms

0.46ms-0.69ms

Performance Comparison

As shown in Figure 8, comparing with the baseline—4 x 300GB aggregated performance on Windows platform with RAID-1, enabling Erasure Coding (RAID-5) had less than one percent decline on the TPS (0.16%) and average transaction time in milliseconds (0.63%). Moving to Linux platform with the same workload (user number in BMF) got around 6.8 percentage performance decline on TPS (6.76%), and around 7.4 percentage performance decline on the average transaction time in milliseconds (7.37%).

4 x 300GB SQL Server Aggregated Performance Comparison

Figure 8. 4 x 300GB SQL Server Aggregated Performance Comparison

Note that we did the tests on the SQL Server CU13, and Microsoft continuously works on the updates for SQL Server 2017. Cumulative Update 16 for SQL Server 2017 on Linux provides better benefits on the performance.

Performance Scale-up

We verified the performance scale-up, including from vCPU and memory perspectives. Table 16 shows the performance scale-up result when doubling the vCPU number from 24 to 48. Table 17 shows the performance scale-up result when doubling both the vCPU number and the memory size. 

Table 16. Performance Scale-up Result when Doubling the vCPU Number

4 x 300GB (Windows)—48 vCPU

TPS

Average Transaction Time (ms)

VM-1

2,826

28.0

VM-2

2,927

27.0

VM-3

2,914

27.0

VM-4

2,960

27.0

Total/Average

11,627

27.3

Table 17. Performance Scale-up Result when Doubling the vCPU Number and Memory Size

4 x 300GB (Windows) - 48 vCPU+100GB memory

TPS

Average Transaction Time (ms)

VM-1

2,963

27.0

VM-2

2,883

27.0

VM-3

2,903

27.0

VM-4

2,939

27.0

Total/Average

11,688

27.0

As shown in Figure 9, comparing with the baseline that used 24 vCPU and 51GB memory for SQL Server instance, doubling the vCPU number could increase 15 percentages (15.42%) on the TPS, and improve12 percentages (12.66%) on the average transaction time in milliseconds. However, doubling the memory usage for the SQL Server instance could not gain obvious performance improvement. The performance improvement was 0.52% on TPS and 0.92 on average transaction time. This result demonstrates that the performance relies much more on the CPU resource than the memory for the TPC-E like workload. This conclusion was also verified by increasing the memory size for SQL Server instance only from 51GB to 100GB without increasing the virtual CPU number, which was not shown in Figure 9. In that validation, the TPS and average transaction time in milliseconds had no observable changes. 

Performance Scale-up 

Figure 9. Performance Scale-up 

SQL Server OLTP Performance of Large Databases

This validation demonstrated the performance results of a 2TB and a 4TB large-sized databases. We ran the same workload (user number in BMF) against the large database, and the test was running one single database exclusively, which means no other workloads or VM was actively running when one large DB test started. 

Table 18 showed the performance results on TPS and average transaction time in milliseconds. You may find that the TPS of the large databases can be kept above 2,000 (2,248 and 2,117 respectively), but the average transaction time increased to 0.035 seconds on the 2TB database, and to 0.037 seconds on the 4TB database.  

Table 18. Large Database Performance Results

Database Size

TPS

Average Transaction Time (ms)

2TB

2,248

34.9

4TB

2,117

37.3

We monitored the vSAN VM performance using vSAN Performance Service when the large SQL Server databases were handling the OLTP workload. As shown in Table 19, we found that the single but large database had higher read and write latency, especially when the database size was 4TB. The range of read latency was from 1.1ms to 9.3ms and the range of write latency was from 0.9ms to 6.4ms. 

Figure 19. Large Database vSAN VM Performance

vSAN VM Performance 

Database Size

Read IOPS

Write IOPS

Read Latency

Write Latency

2TB

8,770-12,300

1,050-1,640

0.67ms-1.2ms

1.0ms-1.6ms

4TB

9,000-11,800

890-1,730

1.1ms-9.3ms

0.9ms-6.4ms

Summary

  • Large databases (2TB and 4TB) can have predictable performance on the vSAN. We demonstrated the vSAN can support 2TB or 4TB large database with similar TPS level as smaller database like 300GB, while a slight increment on the average transaction time in milliseconds. Basically, larger databases have lower cache and working set (or the active DB size) compared with smaller database so the latency is considerably related with more outstanding I/O in the backend. 
  • Using large sized disk like 2TB or even larger sized disks for SQL Server databases worked fine and the performance is predicable. While you need to balance the management tradeoff between the granularity of the disk size, that is smaller sized VMDK (size less than 255GB) will not cause the vSAN internal striping of the virtual disk. However, you need configure multiple identical virtual disks for large user databases. Using smaller sized VMDK can bring benefits for more granular disk management, especially when you want to release partial space when a few large VMDKs configured.

SQL Server Concurrent Backup and Tail-Log Backup

This test was to evaluate the performance of backup for the four 300GB databases, using sequential backup order and concurrent backup. Also we practiced the tail-log backup for the database maintenance and provide the options for database administrators to restore database to point in time. 

The test results were shown in Figure 10. You may find that the sequential backup for the four 300GB databases on vSAN took around 1,946 seconds to finish, which was 2.75 time of the concurrent backup that took around 708 seconds to finish. And the throughput of the concurrent was almost 2.75 times of the sequential backup, which was around 1.8GB/sec. The concurrent backup was affected by other backup workloads on the same vSAN cluster that is why the concurrent backup could not achieve one-fourth of the backup time of the sequential backup.

Performance Comparison between Sequential Backup and Concurrent Backup

Figure 10. Performance Comparison between Sequential Backup and Concurrent Backup

Tail-Log Backup

This section is about tail-log backup. A tail-log backup captures any log records that have not yet been backed up (the tail of the log) to prevent work loss and to keep the log chain intact. For more details, refer to the scenario introduction that requires a Tail-Log backup.

When your database is online and operational after taking the backup, there may be more transactional operations happening especially for OLTP workloads. This means when we try to restore the database, it may be possible that we may lose some transactions.  You may also see the error message that your database should take the tail-log backup before restoring the database. To avoid losing the last few transactions, it is always a good suggestion to take tail-log backup with a NORECOVERY option. The following is the sample script to take the tail log backups. Note the option “with compression” is not necessary for backup but it is recommended for space saving efficiency from SQL Server perspective.

--back up database

backup database DB_NAME to disk='C:\DB_NAME.bak' with compression

go

--back up log of the db

backup log bmf300gb to disk ='C:\DB_NAME_log.bak' with compression

go

--back up the tail log of the db

backup log bmf300gb to disk = 'C:\DB_NAME_log_tail.bak' with norecovery, compression

go

-- set the database online and operational

restore database bmf300gb with recovery

go 

After you take your tail log backup, your database moves to restoring state, and no further transactions will be possible on it.  You may use the last script above to restore the database into normal or operational state. Or you want to restore the database to point of time. Restore your database per the following procedures: 

  1. Restore your full backup
  2. Restore any differential backups if available (if not, move to next step)
  3. Restore all the transactional log in the order since last full backup
  4. Restore the Tail-Log Backup with recovery option

Machine Learning Service in SQL Server 2017

Overview

The machine Learning requirement is rapidly growing in the market, where enterprise customers are doing more data analysis and AI. Machine learning takes a lot of computational power and storage for data repository so the costs to support ML can be expensive. Running Machine Learning Service in SQL Server 2017 on VMware vSAN has the following benefits. 

  • vSAN offers a best-in-class, cost effective HCI solution for daily workload supportability and can be better platform if self-data management, analysis and prediction can be done within the platform in a Machine Learning service fashion. 
  • VMware provides rich vSphere and vSAN automation SDK to manage, monitor virtual machine and for data analytics.  
  • SQL Server 2017 Machine Learning Services add the capability to run code to where the data resides, eliminating the need to pull data across the network. And SQL Server provides the space to store the raw and processed in the database on vSAN directly.

SQL Server 2017 Machine Learning Services

SQL Server 2017 Machine Learning Services is an add-on to a database engine instance, used for running R and Python code on SQL Server. The feature includes Microsoft R and Python packages for high-performance predictive analytics and machine learning. Code runs in an extensibility framework, isolated from core engine processes, but fully available to relational data as stored procedures, as T-SQL script containing R or Python statements, or as R or Python code containing T-SQL. 

With SQL Server 2017, we can execute R and Python code remotely in SQL Server from any IDE like Jupyter Notebook. This eliminates the need to move data around. Instead of transferring large and sensitive data over the network or losing accuracy with sample csv files, you can have your R/Python code execute within your database, and then send function execution to SQL Server bringing intelligence to where your data lives.

Figure 11  is an example using Jupyter Notebook to execute the sample code in the Microsoft blog and display the Iris scatter data in image.

Iris Scatter Plot using Jupyter Notebook

Figure 11. Iris Scatter Plot using Jupyter Notebook

And you may also use SQL Server Report Service to show the table, or correlation between features in the same dataset. Figure 12 shows examples using SQL Server 2017 Report Server to display the scatter plot with Iris dataset (relationship between Sepal Length and Sepal Width), and the scatter plot with Iris dataset (relationship between petal length and petal width).  Report Server combining with Python data science library provide multiple choices to exhibit, integrate and analyze the data stored in the database.

Scatter Plot—Relationship between Petal Length and Width, and Relationship between Sepal Length and Width

Figure 12. Scatter Plot—Relationship between Petal Length and Width, and Relationship between Sepal Length and Width

In this solution, we demonstrated how to use machine learning to monitor and predict a virtual disk usage in a virtual machine on vSAN, to provide an example of a practical use scenario. 

Architecture of an Example of Machine Learning in SQL Server 2017 

Figure 13 is an architecture using Machine Learning in SQL Server 2017 to predict the disk usage in a virtual machine in a vSAN environment. We designed the architecture to predict space consumption of a targeted or monitored VM on vSAN. 

We installed the Python client (Jupyter Notebook) with required Python library on the target virtual machine to emulate the disk consumption.  We monitored the disk usage in a given time slot interval from the client and then wrote the disk usage into a SQL Server table for analysis later. 

Architecture of Machine Learning to Predict Virtual Disk Usage

Figure 13. Architecture of Machine Learning to Predict Virtual Disk Usage

Build Model

Figure 14 is the workflow for the prediction of the space usage of a virtual disk in a virtual machine. First we collect the data (space usage of a virtual disk). Two cycles of the data were gathered, and after the first cycle, the disk was cleaned. We used the first cycle of the data as the test data to train for the parameters of the Polynomial model, then used the second cycle data to calculate the prediction accuracy.  

We gathered three columns, and they are time stamp, free space and total space using Python and insert into SQL Server table(see the sample Python code for getting disk information from VM. In the Prefix data stage, we use computed columns in T-SQL to pick up the interested column—timestamp, and used space. 

We used the linear regression model is to find a relationship between the time stamp and used space which is a continuous target variable for the prediction. 

Machine Learning Model Building Workflow

Figure 14. Machine Learning Model Building Workflow

Data Consumption Distribution

To emulate the data usage of a virtual disk, we used the Gaussian distribution using numpy.random.normal function to carve a given sized disk into slice, with the slice size subjected to the Gaussian distribution.  We set the target disk consumption to 100GB, then used the Gaussian distribution to cut off the disk into 240 pieces following the order of the probability density of the normal distribution. Figure 15 showed the slice size in percentage after normalizing the data distribution to the range [0…1]. 

Virtual Disk (space usage) Consumption Distribution

Figure 15. Virtual Disk (space usage) Consumption Distribution

Monitor Disk Usage

We monitored the disk usage periodically and wrote the space usage changes into SQL Server table with time stamp column, used space and total space. In total, we had 240 unique changes recorded in the table. We cleaned up the written files to the virtual disk after 240 unique data gathered and started a new written cycle. Thus, we have two cycles recorded in the table. One is for curve fit and parameter learning, and another is to verify the prediction.  Figure 16 showed the capacity usage of the virtual disk in two cycles. 

Virtual Disk (space usage) Consumption in Two Cycles

Figure 16. Virtual Disk (space usage) Consumption in Two Cycles

We used the polynomial function to perform the curve fit for the samples gathered from the periodically written data which emulated the data written to the monitored VM. Then we used Python application to write data into the virtual disk. We created another application by Python to monitor the space usage and free space of the VM and wrote the data into SQL Server table. Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y). Returns a vector of coefficients that minimizes the squared error in the order degdeg-1, … 0.

Test Result

We tested different degrees of the polynomial for the best fit result with the minimal summation of squares of the fit errors. From Table 20 you can find degree=3 can fit the test data points very well already, although higher degree can have better summation of squares of the fit errors. And you may also find the odd degree can be better comparing with the even degree for curve fit. Higher degrees can help on the accuracy. 

As shown in Figure 17, the predicted function curve with degree=3 went through the collected data, and described the distribution quite well.

Curve Fit Result in Degree 3

Figure 17. Curve Fit Result in Degree 3

Table 20. Sum of Squares of the Fit Errors on Different Degrees in Polynomial Curve Fit

Degree

2

3

4

5

6

7

8

9

Sum of squares of the fit errors

10169.78

896.49

896.49

312.62

313.16

170.86

168.21

115.67

Summary

Using SQL Server 2017 combining with the Python library can be used for the machine learning model building and data analysis, and prediction. Meanwhile VMware provides rich APIs for virtual machine operations, which can be much helpful for using machine learning to do data analytics for the managed virtual machine on vSAN.  

Conclusion

VMware vSAN is optimized for modern all-flash storage with efficient near-line deduplication, compression, and erasure coding capabilities that lower TCO while delivering incredible performance.  

VMware Cloud Foundation makes it easy for us to deploy and run a hybrid cloud. In this solution, we deployed and managed a vSAN environment with ease using VMware Cloud Foundation.

We addressed the common questions, the reasons for running SQL Server on vSAN, and how to start from scratch to build your SQL Server on vSAN. We verified various-sized databases and different scenarios including changing SPBM, scale-up.  vSAN provides great performance in our demonstrations. We substantiated that vSAN can support large databases with large but single VMDK for OLTP workload of SQL Server.  

We proved that OLTP workload was comparable on Linux platform running SQL Server 2017 with the Windows platform, backed by vSAN. We also compared the concurrent and sequential backup performance, and showed log maintenance using Tail-Log backup. 

We also leveraged machine learning to build disk usage prediction model of SQL Server 2017 and this can be widely used for future data analytics works.

Reference

About the Author

Tony Wu, Senior Solution Architect in the Solution Architecture team of the HCI Business Unit wrote the original version of this paper. 

The follow colleagues also contributed to the VMware Cloud Foundation parts of the paper:

  • Charles Lee, Solutions Architect in the Solutions Architecture team of the HCI Business Unit 
  • Jim Senicka, Director of Technical Marketing of the HCI Business Unit
  • Kevin Tebear, Staff Technical Marketing Architect of the HCI Business Unit
  • Kyle Gleed, Manager of technical marketing of the HCI Business Unit

Filter Tags

  • Reference Architecture
  • Cloud Foundation