Microsoft SQL Server Workloads and VMware Cloud on AWS: Design, Migration, and Configuration

Overview

Introduction

This document is dedicated to the design, migration, and configuration of Microsoft SQL Server (hereafter referenced as SQL Server) workloads on VMware Cloud™ on AWS. The Architectural Guidelines and Operational Considerations section covers the necessary technical prerequisites, along with the architectural guidelines and operational considerations, to plan a migration of SQL Server workloads to VMware Cloud on AWS. The Standalone SQL Server Migration section demonstrates how to migrate a virtual machine (VM) hosting a standalone SQL Server instance from an on-premises data center to VMware Cloud on AWS. The Always On Availability Group Migration and Failover Cluster Instance with Shared Disk Migration sections detail the steps to move both predominant types of clustered SQL Server workloads to VMware Cloud on AWS. The Post-Migration Configuration section provides the steps necessary to optimize the migrated SQL Server workloads. The examples and considerations in this document provide guidance only, as varying application requirements can result in many valid configuration possibilities.

Users should consider reviewing Architecting Microsoft SQL Server on VMware vSphere® before continuing with migration planning to ensure that all best practices for hosting SQL Server workloads on VMware vSphere® are met.

Audience

This document assumes a knowledge and understanding of VMware vSphere and SQL Server. Architectural staff can use this document to gain an understanding of how the system will work as they design and implement various components. Engineers and administrators can use this document as a catalog of technical capabilities. Database administrator (DBA) staff can use this document to gain an understanding of how SQL Server might fit into a virtual infrastructure. Management staff and process owners can use this document to help model business processes to take advantage of the savings and operational efficiencies achieved with virtualization and hybrid cloud.

Architectural Guidelines and Operational Considerations for Moving SQL Server to VMware Cloud on AWS

VMware Cloud on AWS allows users to create vSphere-based software-defined data centers (SDDCs) on AWS. Each deployed SDDC includes VMware ESXi™ hosts, VMware vCenter Server®, VMware vSAN™, VMware NSX® components, and other software. The same HTML5-based vSphere Client is used to manage an SDDC once it is deployed. As shown previously for Oracle and Exchange workloads, this approach allows seamless migration of demanding application workloads without the need to adopt a new toolset or refactor application components.

The planning and design phase is critical to ensure that migrations of mission-critical application workloads to VMware Cloud on AWS complete without negatively impacting application SLAs or affecting the performance, availability, manageability, and scalability of the workloads. This document summarizes architectural guidelines that will help enterprises when planning a migration.

Architecture Guidelines

This section provides a summary of guidelines and approaches to consider when planning a migration of SQL Server workloads to VMware Cloud on AWS, including use cases to guide requirements gathering and technical prerequisites to facilitate migration.

Use Case Definition

The following use cases have been identified as the most frequently employed for SQL Server workloads on VMware Cloud on AWS:

  • Data center extension
  • Cloud migration
  • Disaster recovery with site recovery

Each particular use case, or combination of use cases, influences general solution design and necessitates appropriate requirements gathering. This document focuses on use cases specific to data center extension and evacuation.

Rightsizing

Before considering where to place SQL Server workloads on a cloud platform, ensure your VM container is rightsized. A workload's performance profile should be collected over a period of time long enough to reflect application spikes in resource utilization. When defining the required time range for collecting time series data, consult with DBAs and application owners to understand the workload profile. Collecting at least a full month of non-rolled-up time series data is recommended prior to executing the performance analysis.

Utilizing the Blue Medora SQL Server management pack with vRealize® Operations Manager™ has proven very helpful in this preparation phase. While analyzing captured data, make sure your rightsizing approach has been agreed upon by administrators, application owners, and business owners, and that it accounts for both spikes (high performance) and average utilization (higher density), as sketched below.
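If vRealize Operations is not available, a rough performance profile can also be pulled with PowerCLI. The following is a minimal sketch only; the vCenter address, VM name, and sampling interval are placeholder assumptions, and your vCenter statistics level must retain samples at the requested interval for the full range:

# Placeholder names: adjust the vCenter address and VM name to your environment
Connect-VIServer -Server 'vcenter.onprem.example.com'
$vm = Get-VM -Name 'SQL-Prod-01'
# Pull 30 days of CPU and memory samples and summarize average versus peak
$stats = Get-Stat -Entity $vm -Stat 'cpu.usage.average','mem.usage.average' `
    -Start (Get-Date).AddDays(-30) -Finish (Get-Date) -IntervalMins 5
$stats | Group-Object -Property MetricId | ForEach-Object {
    [pscustomobject]@{
        Metric  = $_.Name
        Average = [math]::Round(($_.Group | Measure-Object -Property Value -Average).Average, 2)
        Peak    = [math]::Round(($_.Group | Measure-Object -Property Value -Maximum).Maximum, 2)
    }
}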

The following should be considered while sizing SQL Server workloads:

  • For CPU and memory resources allocation, check the available host configurations for VMware Cloud on AWS to verify the workload will fit and not overcommit host resources.
  • Account for differences in physical CPU architectures between your current environment and the host instances used in VMware Cloud on AWS.
  • Always size the CPU resource based on the actual workload; vCPUs can easily be added later.
  • The storage layer in VMware Cloud on AWS is provided by VMware vSAN, a hyperconverged infrastructure (HCI) solution, when using Amazon EC2 I3.metal instances; adding storage capacity therefore requires adding compute resources (hosts) as well. As an alternative, for workloads with primarily capacity requirements, use Amazon Elastic Block Store (EBS) backed storage with Amazon EC2 R5.metal hosts. An I3 instance should still be your primary choice for performance-sensitive OLTP workloads.

Requirements

A crucial part of a successful migration is collecting business and technical requirements, allowing you to properly design a cloud platform. For guidance, review Preparing for VMware Cloud on AWS before beginning your requirements gathering.

Business requirements are an important part of the requirements gathering process. Input examples include:

  • RTO/RPO targets
  • Business SLAs for the application workloads based on SQL Server databases
  • Licensing considerations
  • Security and data-management considerations

Technical requirements will directly influence logical design and should be collected and validated with care. Pay special attention to the following bullet points:

  • Performance requirements of the workload (for example, transactions per second, number of user connections, expected future workload changes).
  • Capacity requirements (for example, future growth, other projects to be served).
  • Manageability requirements (for example, providing access to an SDDC to appropriate user groups, reconfiguring monitoring tools, backup solution in use, modifying scripting, vRealize Operations workflows).
  • Scalability requirements (for example, method for increasing capacity of an SDDC, scale-out versus scale-in approach).
  • Availability requirements (for example, SQL Server high-availability solutions in use, DRS groups, host-isolation response, number of availability zones required).
  • Application requirements (for example, type of workload [OLTP or data warehouse], dependencies between on-premises components and the network flow between them).

Risks, Assumptions, and Constraints

Ensure that risks, assumptions, and constraints are identified and documented, and that the risk-mitigation plan has been agreed upon by all groups involved.

An example of an assumption: is Software Assurance present for the SQL Server licenses?

An example of a constraint: the available network bandwidth between on-premises and the SDDC (is AWS Direct Connect (DX) available?).

An example of a risk: different CPU generations between on-premises ESXi hosts and hosts in an SDDC.

High-Level Architecture

A high-level solution architecture should include enough information to capture the on-premises environment hosting the SQL Server workloads and planned SDDC(s) with multiple availability zones (AZ) while also providing enough details to work on logical design.


High Availability Options

VMware Cloud on AWS supports multiple availability options for SQL Server workloads, both platform-based and SQL Server native.

Platform-based options include:

  • vSphere High Availability (HA) – turned on by default on all clusters in a SDDC on VMware Cloud on AWS.
  • Multi-AZ deployment (Stretched Cluster) – provides resiliency for an AZ failure if selected.

SQL Server options are:

Always On Availability Groups (AGs) – provide resiliency at the database level. Fully supported without restrictions for all types of AG deployments and availability modes. Do not require shared storage between VMs.

Always On Failover Cluster Instances (FCIs) – provide resiliency at the instance level and require shared storage. Supported for 2/4/8-node deployments with up to 64 clustered disk resources. Require setting SCSI Bus Sharing for a vSCSI controller to Physical. More details can be found in this article and in the dedicated section of this document.

The platform-based and SQL Server native options can be combined to achieve the availability requirements of your SQL Server workloads.

Logical Design

A logical design describes all technical decisions made and addresses identified technical requirements while minimizing risk. The level of detail included should be sufficient to create an implementation guide for the solution. While specifics of each logical design are unique, it is important to ensure all technical prerequisites are met. The following prerequisites have been identified as crucial to the successful migration of SQL Server workloads to VMware Cloud on AWS:

For on-premises VMs, ESXi hosts, and/or vSphere clusters hosting SQL Server workloads, check and document all configured advanced settings. Ensure corresponding options are available in VMware Cloud on AWS. For example, DRS anti-affinity groups and rules must be re-created in an SDDC, as they cannot be migrated.

Check the Hybrid Migration with vMotion Checklist and ensure all requirements are met.

If VMware HCX will be used to migrate the workload, review the HCX in the VMware Cloud on AWS document and ensure that all requirements are met.

Configure an L2 VPN between the on-premises data center and the SDDC and ensure all required VM networks are stretched. VMware NSX® Edge™ must be deployed on-premises to serve as the client for the L2 VPN setup. HCX can be used as an alternative way to provide L2 network extension.

Ensure that Hybrid Linked Mode is configured to allow managing both the on-premises environment and the cloud SDDC within a single vSphere Client interface.

Operational Considerations

Post-implementation maintenance and operation guidelines are a key component of any well-prepared infrastructure architecture. While incorporating VMware Cloud on AWS SDDCs in an existing infrastructure, it is critical that Day 2 operational routines are updated accordingly, including:

  • Backup configurations
  • Monitoring configurations
  • Operational documentations

If vRealize Operations Manager is used to monitor the hybrid environment, confirm that all SDDCs are added to vRealize Operations-managed resources and configured using vCenter Adapter with the Public Cloud option.

Migrating Standalone SQL Server Workloads to VMware Cloud on AWS

The purpose of this section is to demonstrate the migration of a VM hosting a standalone SQL Server instance to VMware Cloud on AWS using migration with vMotion.

Migration Preparation

Verify that your infrastructure is ready to start the migration (for example, check technical prerequisites listed in Architectural Guidelines and Operational Considerations and make sure the appropriate design document and implementation and operational guides are created).

Hybrid Linked Mode

Hybrid Linked Mode allows you to link your VMware Cloud on AWS vCenter Server instance with an on-premises vCenter Single Sign-On domain.

Using Hybrid Linked Mode, you can:

  • View and manage the inventories of both your on-premises and VMware Cloud on AWS data centers from a single vSphere Client interface, accessed using your on-premises credentials.
  • Migrate workloads between your on-premises data center and cloud SDDC.
  • Share tags and tag categories from your vCenter Server instance to your cloud SDDC.

You have two options for configuring Hybrid Linked Mode. You can use only one of these options at a time:

  • You can install the vCenter Cloud Gateway Appliance and use it to link from your on-premises data center to your cloud SDDC. In this case, SSO users and groups are mapped from your on-premises environment to the SDDC, and you do not need to add an identity source to the SDDC LDAP domain.
  • You can link your VMware Cloud on AWS SDDC to your on-premises vCenter Server. In this case, you must add an identity source to the SDDC LDAP domain.

NOTE: HCX can also be used to migrate standalone SQL Server workloads.

Network Configuration

The following network configurations are recommended before migrating SQL Server workloads:

  • L2 extended networks for all on-premises VLANs used in the SQL Server instances' network configurations. The L2 VPN service provided by NSX-T or HCX can be used to extend networks.


  • HCX provides detailed information on the source port group and destination network. Native L2 VPN requires explicitly mapping tunnel IDs to on-premises port groups.
  • AWS Direct Connect (DX) to ensure a low-latency, reliable network connection from on-premises to VMware Cloud on AWS, enabling live vMotion.

Configuration Tested

A VM hosting a SQL Server workload was prepared for this test migration using the following settings:

  • 8 vCPU and 32 GB of RAM (within a pNUMA node of the hosting ESXi host)
  • Database disk placed on a separate PVSCSI controller
  • Windows Server 2016 operating system
  • Microsoft SQL Server 2016 SP1 RDBMS


Migration of Standalone SQL Server Workload Using Hybrid Linked Mode: Step-By-Step Guide

Perform the following steps for the migration process:

  1. Log in using an account with access rights to both the on-premises environment and the SDDC (check your Hybrid Linked Mode configuration for more details). Verify that both the on-premises and SDDC vCenter Server instances are visible in the inventory:

  2. Initiate a migration of the VM in the vSphere Client using the Migrate action from the VM Actions menu. This starts the VMware vSphere® vMotion® wizard, which will guide you through the settings necessary to prepare for moving the workload:

  3. Select Change both compute resource and storage (vMotion Without Shared Storage):

  4. Select a compute resource. Expand the vCenter Server managing your SDDC in VMware Cloud on AWS and select the Compute-ResourcePool resource pool as the destination (selecting a cluster or an individual host is not supported):

  5. Select a storage: choose WorkloadDatastore from the list of available datastores (this datastore is backed by VMware vSAN storage):

  6. Select a folder in which to place the migrated VM. As a best practice, create a folder in advance to store VMs (based on the account or the workload type). You must select the Workloads or Templates folder to continue:

  7. Select a network: use the information collected before starting the vSphere vMotion wizard to identify the right port group. Use a port group of the Extended network type if changes to the IP address of the VM are not desirable. Select the required Destination Network from the drop-down list:

  8. Finish the wizard:


The migration has started, and its progress is shown in the Recent Tasks pane of the vSphere Client:

  9. Check that the vSphere vMotion operation has completed and that the VM is now managed by the vCenter Server in VMware Cloud on AWS:


  10. Ensure that you can connect to the SQL Server instance running on the migrated VM using your favorite SQL Server management tool (SQL Server Management Studio [SSMS] in this case).
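The wizard steps above can also be scripted. The following PowerCLI sketch is a minimal, non-authoritative equivalent of steps 2 through 8, assuming both vCenter Server instances are connected in one PowerCLI session; the VM, folder, port group, and server addresses are placeholders:

# Connect to both vCenter Server instances (placeholder addresses)
Connect-VIServer -Server 'vcenter.onprem.example.com'
Connect-VIServer -Server 'vcenter.sddc-xx-xx-xx-xx.vmwarevmc.com'

$sddc   = 'vcenter.sddc-xx-xx-xx-xx.vmwarevmc.com'
$vm     = Get-VM -Name 'SQL-Standalone-01' -Server 'vcenter.onprem.example.com'   # placeholder VM
$destRP = Get-ResourcePool -Name 'Compute-ResourcePool' -Server $sddc
$destDS = Get-Datastore -Name 'WorkloadDatastore' -Server $sddc
$destPG = Get-VDPortgroup -Name 'L2E-SQL-Network' -Server $sddc                   # placeholder extended network
$folder = Get-Folder -Name 'Workloads' -Server $sddc

# Cross-vCenter vMotion: compute, storage, folder, and network mapping in one call
Move-VM -VM $vm -Destination $destRP -Datastore $destDS -InventoryLocation $folder `
    -NetworkAdapter (Get-NetworkAdapter -VM $vm) -PortGroup $destPG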

VMware HCX

VMware HCX, an application mobility platform, simplifies application migration, workload rebalancing, and business continuity across data centers and clouds. VMware HCX enables high-performance, large-scale app mobility across vSphere and non-vSphere cloud and on-premises environments to accelerate data center modernization and cloud transformation.

Key capabilities of HCX include:

  • Perform a bulk migration of live VMs – Simply schedule the movement of hundreds of VMs in parallel.
  • Utilize simple migration planning tools – Easily identify application and workload relationships, and logically group VMs for efficient migration.
  • Enable mobility across data centers and clouds – Move VMs within your data center—from your local data center to the cloud, or across cloud regions or providers—to optimize resource utilization.
  • Migrate with zero downtime – Don’t worry about IP re-architecting.
  • Migrate across any vSphere versions (6.0+) – Eliminate the need to invest in bringing both sites up to parity, enabling you to modernize your data centers with the full software-defined data center (SDDC)/VMware Cloud Foundation™ stack, managed service, or infrastructure as a service (IaaS).
  • Migrate non-vSphere workloads – Migrate KVM and Hyper-V workloads to current vSphere versions compatible with full VMware Cloud platforms.
  • Work across the WAN and LAN – Enable a unique model of infrastructure with a mix of private, public and hybrid clouds, based on workload requirements.
  • Seamlessly extend your network and IP space – Extensively reduce complexity and ensure your IP addressing policies, security policies and administrative boundaries are not broken.

Configuration Tested

A VM hosting a SQL Server workload was prepared for this test migration using the following settings:

  • 8 vCPU and 32 GB of RAM (within a pNUMA node of the hosting ESXi host)
  • Database disk placed on a separate PVSCSI controller
  • Windows Server 2022 Operating System
  • Microsoft SQL Server 2022 RTM RDBMS

For more details about HCX configuration between your on-premises data center and VMware Cloud on AWS, please refer to VMware HCX in the VMware Cloud on AWS documentation.

Migration of Standalone SQL Server Workload using VMware HCX: Step-By-Step Guide

Perform the following steps for the migration process:

  1. Log in using an account with the access rights to the on-premises vCenter Server and open the VMware HCX plugin.


  2. Under the Services > Migration tab, select Migrate to start migrating your on-premises SQL Server virtual machine to the cloud.



  3. Select the Microsoft SQL Server virtual machine and click ADD to add it to the migration list.


  4. In Destination Compute Container, select a cluster or host in VMware Cloud on AWS to which you want to migrate the SQL Server virtual machine.


  5. In Destination Folder Container, select a destination folder for the SQL Server virtual machine. As a best practice, it is recommended to separate the different workloads in your VMware Cloud on AWS SDDC.


  6. In Destination Storage, select WorkloadDatastore from the list of available datastores in VMware Cloud on AWS.


  7. In Disk Provisioning, keep Same format as source.


  8. In Migration Profile, select vMotion.


  9. (Optional) In Extended Options, select Migrate Custom Attributes.


  10. Make sure the validation completes successfully without errors.


  11. Click GO to start the migration. The migration may take several minutes depending on the network connection between on-premises and the cloud.


  12. After the migration has completed, access the virtual machine using VMware Remote Console from the vSphere Client in VMware Cloud on AWS. Open SQL Server Management Studio and verify the connection to the standalone SQL Server instance.



Standalone Server Migration Procedure Summary

VMware has successfully tested the live migration of a VM hosting an instance of SQL Server, without impact to the workload, using Hybrid Linked Mode or HCX. Customer-specific experiences may vary due to different VM sizes, workload patterns, and connection-dependent bandwidth. It is recommended that customers test their own live migrations with non-production systems first, or with systems with less strict SLAs. Some workloads may be candidates for live migration while others may be cold migrated (requiring some downtime). A demo of the migration of a SQL Server workload can be found here.

Migrating SQL Server Always On Availability Group from an On-Premises Data Center to VMware Cloud on AWS

This section outlines the technical steps required to migrate SQL Server instances hosting databases configured for Always On Availability Groups (AGs). The following pages demonstrate the migration of two VMs, each hosting a SQL Server instance with a database configured for AGs.

To prepare for the migration of VMs hosting SQL Server instances with a database configured for AGs, check first the prerequisites outlined in Architectural Guidelines and Operational Considerations and Standalone SQL Server Migration, as all of these apply. In addition, the steps discussed below must be taken to successfully migrate the workload.

On-Premises DRS Configuration

DRS anti-affinity groups and rules must be configured in the on-premises vSphere cluster to separate VMs hosting a SQL Server Always On workload. Creation of such rules is required to align with the supported cluster configuration on the vSphere platform and to avoid situations where both VMs reside on a single host, creating a single point of failure. Migrating a VM with DRS overrides is not supported and will fail on attempt, returning the following error message:

A VM must be removed from all DRS rules and groups before migration. Once the workload is migrated, the appropriate rules and groups must be re-created in the SDDC where the workload is now hosted. As a best practice, make sure the configured DRS rules and groups are documented before removing the VMs; a PowerCLI sketch of this follows.
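As one way to document them, the following PowerCLI sketch exports the cluster's DRS rules to a CSV file before removal; the cluster name and output path are placeholder assumptions:

# Record the name, state, type, and member VMs of every DRS rule in the cluster
Get-DrsRule -Cluster 'OnPrem-Cluster01' |
    Select-Object Name, Enabled, KeepTogether,
        @{N='VMs';E={(Get-VM -Id $_.VMIds).Name -join ', '}} |
    Export-Csv -Path 'C:\migration\drs-rules.csv' -NoTypeInformation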

Sequence of Migration

VMware suggests first moving all VMs hosting SQL Server instances with a secondary replica of the database in an Availability Group. If the replica is configured as a readable secondary, check with your DBA staff, application team, and any consumers of the replica (e.g., backup, reporting suites) to make sure they are prepared to connect to the database at its new location. Transfer of the identified applications, as well as the VMs hosting those applications, to VMware Cloud on AWS may be required:


Availability Modes (Always On Availability Group)

A database configured in an Availability Group can use different availability modes (synchronous-commit or asynchronous-commit). Databases using synchronous-commit mode require additional investigation: in this mode, "the secondary replica writes incoming log records to disk (hardens the log) and sends a confirmation message to the primary replica."


The migration of the AG infrastructure is a step-by-step process. When the migration of one VM to VMware Cloud on AWS has completed while the second VM remains on-premises, it is expected that the latency reflected in the Primary Replica Commit Time and Harden Time metrics will increase.

The figure below represents time series data for the referenced metrics when both VMs are located within a single datacenter:

[Figure: Primary Replica Commit Time and Harden Time with both VMs in a single data center]

The figure below represents time series data when the VM hosting a secondary replica in synchronous-commit mode is migrated to the SDDC:

[Figure: Primary Replica Commit Time and Harden Time after migrating the synchronous-commit secondary replica to the SDDC]

To avoid potentially increased application latency, changing the database availability mode to asynchronous-commit during the migration may be required. This change also requires switching the failover mode to manual. If changing the availability mode is not possible, plan the migration to occur during a maintenance window or when the load on the database is minimal.

NOTE: Under asynchronous-commit mode, the secondary replica never becomes synchronized with the primary replica.

Stretched Cluster for an SDDC

Creating an SDDC with a stretched cluster spanning ESXi hosts between two or more AZs in the AWS infrastructure provides additional protection for mission-critical workloads on VMware Cloud on AWS. This setup also enables hosting of clustered applications with demanding availability requirements. Note that the minimum number of hosts in a stretched-cluster-based SDDC is six. VM anti-affinity rules must be created to enforce placement of VMs on hosts in different AZs.

NOTE: For VMs hosting Availability Groups workloads and utilizing the synchronous-commit mode, placement within different AZs may increase Commit Time and Harden Time for the remote replica, thus introducing latency into the workload.

High-Level Solution Architecture and Configuration Tested

Both VMs hosting a SQL Server instance with a database in Availability Group are initially located within the same on-premises datacenter and vSphere HA Cluster, and are configured as follows:

  • 4 vCPU and 8 GB of RAM (within a pNUMA node of the hosting ESXi host)
  • Database disk placed on a separate PVSCSI controller
  • Windows Server 2016 operating system
  • Microsoft SQL Server 2016 SP1 RDBMS
  • A database with NVDIMM configured for HA using the synchronous-commit availability mode

The target SDDC in VMware Cloud on AWS (the host for the migrated workload) is configured as a stretched cluster spread between two AZs.

All best practices for hosting SQL Server workloads on VMware SDDC were followed as outlined in Architecting Microsoft SQL Server on VMware vSphere.

The High-Level Solution Architecture is depicted below:

[Figure: high-level solution architecture]

Migration Procedure

The steps outlined in this section are an addition to those described in Standalone SQL Server Migration for standalone workloads. Notations are included to indicate when the procedure from the Standalone SQL Server Migration should be followed.

Identifying VMs Hosting SQL Server Workloads

Locate all SQL Server instances hosting the availability group. A given database in an availability group may have as many as eight secondary replicas, and each instance of SQL Server may host a number of databases acting as primary or secondary replicas. Use SQL Server Management Studio (SSMS) or consult your DBA team to accomplish this task.

Here, both highlighted SQL Server instances host the single availability group SQL-VMC-AG1-DB1: server 10.128.137.151 hosts the primary replica and server 10.128.137.201 hosts the secondary replica:


Locate all VMs hosting the identified SQL Server instances. Use the DNS name and the IP address to identify the VMs in question, or consult your configuration management database (CMDB). As a best practice, and if you have not done so already, place the identified VMs in a separate vSphere folder.


In this instance, the VM Oleg-VMC-SQLAAG01 is hosting the primary replica and the VM Oleg-VMC-SQLAAG02 is hosting the secondary replica. Oleg-VMC-SQLAAG02 will be migrated first.

Checking DRS Configuration

Verify whether the VMs to be migrated are included in DRS anti-affinity rules or groups. Checking the cluster settings menu via the vSphere Client or using the following PowerCLI code (PowerCLI version 6.5.2 or later is required) is recommended.

The following PowerCLI code can be used to locate configured DRS groups:

# List each VM along with the DRS cluster group(s) it belongs to
Get-VM |
  Select-Object Name,
    @{N='DRSGroup' ;E={$script:group = Get-DrsClusterGroup -VM $_; $script:group.Name}},
    @{N='GroupType';E={$script:group.GroupType}}

The following PowerCLI code can be used to locate configured DRS rules:

# List DRS rules in the cluster that reference the given VM
Get-DrsRule -Cluster 'TSA-WDC-65-Clus01' -VM 'Oleg-VMC-SQLAAG02'


Remove all VMs to be migrated from all DRS overrides (including rules and groups) using the vSphere Client or PowerCLI (see the sketch after the note below).

NOTE: A DRS anti-affinity rule containing just two VMs must be removed completely. Removing one VM at a time is not supported.
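A minimal PowerCLI sketch of the removal, reusing the VM and cluster names shown earlier; per the note above, the two-VM anti-affinity rule is removed as a whole:

$vm = Get-VM -Name 'Oleg-VMC-SQLAAG02'
# Remove the VM from any DRS cluster groups it belongs to
Get-DrsClusterGroup -VM $vm | Set-DrsClusterGroup -VM $vm -Remove
# Remove the anti-affinity rule(s) referencing the VM (a two-VM rule must be dropped entirely)
Get-DrsRule -Cluster 'TSA-WDC-65-Clus01' -VM $vm | Remove-DrsRule -Confirm:$false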


Ensure that all required networks are available in the target SDDC. It can be helpful to move a test database workload first, to verify that the network connectivity functions as expected. Refer to Standalone SQL Server Migration for detailed information about obtaining the network configuration.

Preparing Target SDDC

Create a new folder in your SDDC for placement of migrated VMs:


Reconfiguring Database

Change the availability mode to asynchronous-commit and the failover mode to manual (or schedule the migration during off-peak hours). A T-SQL sketch of this change follows:
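The same change can be made in T-SQL against the primary replica. The sketch below runs it via Invoke-Sqlcmd (SqlServer PowerShell module) with placeholder AG, replica, and instance names; note that the failover mode must be switched to manual before the availability mode can be set to asynchronous-commit:

# Placeholder AG, replica, and instance names; run against the primary replica
$tsql = @"
ALTER AVAILABILITY GROUP [SQL-VMC-AG1-DB1]
  MODIFY REPLICA ON 'SQLAAG02'
  WITH (FAILOVER_MODE = MANUAL);
ALTER AVAILABILITY GROUP [SQL-VMC-AG1-DB1]
  MODIFY REPLICA ON 'SQLAAG02'
  WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
"@
Invoke-Sqlcmd -ServerInstance '10.128.137.151' -Query $tsql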


Workload Migration – Hybrid Linked Mode

It is recommended to first migrate all VMs hosting secondary replica(s), saving the VM hosting the primary replica for last. Refer to the detailed step-by-step guide in Standalone SQL Server Migration Using Hybrid Linked Mode to migrate the workload. It is also useful to do the following while working with the vMotion wizard:

Step 1: Select the folder created within your target SDDC for workload hosting.

Step 2: Select the network port group, which is based on an L2 VPN network. If such a network is not available, the database must use asynchronous-commit mode and the VM IP address must be changed after migration.


Ensure that the vMotion operation completes successfully:


Following the steps outlined above, proceed with the migration of the VMs hosting the other secondary replicas. Then proceed with the migration of the VM hosting the primary replica, again following the steps outlined above. Verify that all VMs are migrated, located in the correct vSphere folder, and in the running state:


Workload Migration – VMware HCX

High Level Overview

  1. Make sure the Layer 2 VPN tunnel and extended network segment are configured properly on VMware Cloud on AWS so that the AG VM node IP addresses do not need to change.
  2. It is recommended to first migrate one of the VMs hosting a secondary replica in asynchronous-commit mode. This avoids potential workload impact on the primary replica, which may still be serving the production workload.
  3. After the first VM hosting a secondary replica has been successfully migrated to VMware Cloud on AWS, reconfigure the replica with synchronous-commit mode to resume AG replication until it reaches the synchronized state (a sketch for checking the synchronization state follows this list).
  4. Once synchronization has completed, perform a failover and switch the secondary replica on the VMware Cloud on AWS side to the primary role.
  5. Continue migrating all of the VMs hosting the secondary replicas from on-premises to VMware Cloud on AWS (using asynchronous-commit mode during migration).
  6. Do NOT use HCX bulk migration for SQL Server VMs with AGs configured.
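Before performing the failover in step 4, the synchronization state can be checked with a query such as the following sketch (the instance name is a placeholder; a secondary replica should report SYNCHRONIZED before a planned failover without data loss):

$query = @"
SELECT ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id;
"@
Invoke-Sqlcmd -ServerInstance 'SQL-Node1' -Query $query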

Migration Steps
 

  1. In SQL Server Management Studio, open the availability group dashboard. Identify the primary replica and secondary replica(s). In this example, we set up an availability group with three replicas.


 

  2. Select one of the secondary replicas, in this example SQL-Node3. Change Availability Mode to Asynchronous commit for the secondary replica to be migrated. Modify Failover Mode to Manual for all the replicas before starting the migration.

After applying the changes, the availability replica to be migrated should be in the "Synchronizing" state.


  3. Migrate the secondary replica to VMware Cloud on AWS using HCX, following the steps described in Standalone SQL Server Migration Using HCX. Make sure the migration of the secondary replica succeeds.


  4. In the availability group wizard, change the Availability Mode back to Synchronous commit. The AG will continue synchronizing data to the migrated secondary replica, which is now in VMware Cloud on AWS.


  5. Perform a failover of the primary replica role to the SQL Server instance already running in VMware Cloud on AWS. In this example, SQL-Node3 becomes the new primary replica.


  6. Repeat steps 1 through 4 for the rest of the replicas in the availability group. After all the replicas are migrated to VMware Cloud on AWS, open SQL Server Management Studio and check the availability group dashboard. Change the database Failover Mode to Automatic for all the replicas.


  7. (Optional) Fail the primary replica role back to the original SQL Server instance, in this example SQL-Node1.


Post-Migration: Reconfiguring Availability Mode

After the migration, recheck the availability mode of the availability replica(s); the availability mode of the secondary replica(s) can be changed back to synchronous-commit.

Using SSMS, navigate to the Availability Group menu, select the secondary replica(s) line, and change the Availability Mode to synchronous-commit and the Failover Mode to automatic:


Migrating Always On Availability Groups Summary

While the migration of tier-two and tier-three workloads can be executed live during off-peak hours, the migration of tier-one workloads and heavily loaded databases may require downtime, or even the use of a backup and restore operation, to ensure a consistent user experience.

Review the instructions in Post-Migration Configuration, to ensure operation and maintenance of the migrated workload is consistent.

Migrating SQL Server FCI Cluster with Shared Disks to VMware Cloud on AWS

This section outlines the technical steps required to migrate a SQL Server Failover Cluster Instance (FCI) with shared disks. To prepare for the migration of VMs hosting a SQL Server FCI, first check the prerequisites outlined in the Architectural Guidelines and Operational Considerations and Standalone SQL Server Migration sections of this document, as all of these apply.

The SCSI-3 Persistent Reservations (SCSI3-PR) native support recently introduced in VMware Cloud on AWS makes it possible to host SQL Server Failover Cluster Instances on VMDKs on VMware Cloud on AWS. SCSI3-PR native support enables customers to create a new, or move an existing, Windows Server Failover Cluster (WSFC) with up to 6 nodes and 64 shared disks to VMware Cloud on AWS. SCSI3-PR commands, used by WSFC to claim ownership of a shared disk, are transparently passed to a VMDK and arbitrated at the vSAN layer to simulate a dedicated LUN. To facilitate the ability to use direct SCSI commands, the SCSI bus sharing on the vSCSI controller hosting clustered VMDK(s) must be set to Physical on each VM acting as a node of the cluster.

Note: To illustrate the migration process, we will use a three-node SQL Server FCI cluster (SQL-DB1) with shared disks, hosted on the following VMs:

  • SQL-DB1Node01
  • SQL-DB1Node02
  • SQL-DB1Node03

Preparing the Environment for the Migration

To prepare the environment for the migration (including the on-premises vSphere environment, the target SDDC on VMware Cloud on AWS, and the SQL Server instance configured as an FCI), use the following steps. Some of them might require a separate action plan, which is outside the scope of this publication.

  1. Identify VMs
  2. Record the shared disk configuration
  3. Record the network configuration
  4. Prepare SQL Server for maintenance
  5. Take a backup
  6. Record cluster configuration
  7. Configure Hybrid Linked Mode
  8. Remove DRS and HA overrides
  9. Check virtual hardware compatibility
  10. Prepare the SDDC
  11. Execute a test vMotion

Detailed steps:

  1. Identify VMs to be migrated and place them into a separate vSphere folder:


  2. Take notes of the VMs' shared disk configuration:


  3. Take notes of the VMs' network configuration. Ensure that all routed cluster networks used by cluster nodes are stretched (L2 VPN or HCX) and available in the target SDDC. Non-routed networks (if present) should not be stretched and should be pre-created in your SDDC.


  4. Prepare all applications using databases on the cluster and the SQL Server instance(s) for maintenance. Ensure that the cluster has no errors or warnings in the cluster.log or the Windows Event Log.
  5. Take a full backup of the SQL Server data located on disk drives hosted on shared disks (pRDMs). Store the backup outside of the virtual machine(s) to be migrated.
  6. Take notes of the cluster configuration, including the quorum configuration and the network configuration, using the Failover Cluster Manager MMC. A PowerShell sketch of recording this configuration follows.
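A minimal sketch of recording this configuration with the FailoverClusters PowerShell module, run in the Guest OS on one of the cluster nodes (the output path is a placeholder; note that this Get-Cluster is the failover-cluster cmdlet, not the PowerCLI one):

# Capture cluster, quorum, network, and resource configuration to a file
Get-Cluster         | Format-List *                                             | Out-File 'C:\migration\cluster-config.txt'
Get-ClusterQuorum   | Format-List *                                             | Out-File 'C:\migration\cluster-config.txt' -Append
Get-ClusterNetwork  | Format-Table Name, Role, Address, AddressMask -AutoSize   | Out-File 'C:\migration\cluster-config.txt' -Append
Get-ClusterResource | Format-Table Name, ResourceType, OwnerGroup, State -AutoSize | Out-File 'C:\migration\cluster-config.txt' -Append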

  7. Ensure that Hybrid Linked Mode is configured between your on-premises and VMware Cloud on AWS environments and that you are able to migrate VMs using vMotion from on-premises to the cloud.

NOTE: HCX cannot be used to move a VM with pRDMs attached to VMware Cloud on AWS at this time. The first node of a cluster should be migrated manually using the vSphere Client. All other nodes can be migrated with HCX (in this case, for all VMs to be migrated with HCX, the pRDM pointers should be detached and the vSCSI controller bus sharing should be changed to None before the migration). If migrating the first node without HCX is not an option, engage VMware PSO for a possible workaround.


  8. Remove all VMs to be migrated from all DRS rules and groups (when all VMs that are members of a DRS group are removed, the DRS group itself must be removed as well). Take notes of the DRS configuration.

Attempting to migrate a VM that is part of DRS rules or groups will fail with the error "permission to perform this operation was denied":


  9. The virtual hardware compatibility (vHardware version) of the VMs should be at least v11. Upgrade if required.


  10. Prepare the target SDDC:
     a. Create a separate virtual machine folder in which to place migrated VMs.


     b. Pre-create a vSAN storage policy for the pRDMs to be migrated. The following settings are recommended:
        • RAID-1 for performance
        • FTT=1 or higher to ensure data redundancy
        • Object space reservation: thick provisioning, to avoid out-of-disk-space situations

     c. Check that all required cluster networks are stretched to the cloud and available. Stretch network segments via L2 VPN or HCX if required.


  11. Execute a test vMotion of a test VM to check the relocation time and the network connectivity. Take notes of the time needed to relocate the data and use it to estimate the migration of the production workload.

Migrating SQL Server Failover Cluster Instance with Shared/Clustered VMDKs by VMware HCX

High Level Overview

Migrating the first/active node of a SQL Server FCI using HCX only applies to FCI disks configured as clustered ("shared") VMDKs (available since vSphere 7.0, see KB79616) or shared disks on vSAN (available since vSAN 6.7 Update 3, see KB74786). HCX does NOT support virtual machines configured with a shared SCSI bus: the SCSI bus sharing type must be set to None before migration. The supported HCX migration type for a SQL Server FCI is cold migration.

Important note: this methodology does not support shared disks configured with pRDMs. Refer to Migrating SQL Server Failover Cluster Instance with Hybrid Linked Mode and the related documentation if pRDMs are present.

Migration Steps

  1. Identify the first/active node (SQL-Node1 in this example) and the remaining second node(s). Shut down the Windows cluster using the Failover Cluster Manager Microsoft Management Console and make sure that all nodes are shown in the offline state. A PowerShell sketch of stopping the cluster follows:
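A minimal sketch using the FailoverClusters PowerShell module, run on one of the nodes:

# Gracefully stop the cluster service on all nodes; Failover Cluster Manager
# should then show every node as offline
Stop-Cluster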


  2. Shut down all VMs (within the Guest OS) hosting nodes of your cluster.
  3. In the on-premises vSphere Client, edit the configuration of the second node(s) to be migrated. Remove all clustered/shared VMDKs from the second node(s), making sure the Delete files from datastore checkbox is NOT selected. Reconfigure the SCSI Bus Sharing type from Physical to None.


  4. Edit the settings of the first (active) node of the FCI and reconfigure SCSI Bus Sharing from Physical to None.
  5. Migrate the second node(s) one by one to VMware Cloud on AWS using HCX, following the steps described in Standalone SQL Server Migration Using HCX. Make sure the migration type of the second node(s) is set to Cold Migration and that it completes successfully.


  6. Repeat step 5 for the first node.
  7. After the first node has been migrated to VMware Cloud on AWS, edit the virtual machine settings and reconfigure the SCSI bus sharing type of the SCSI controller for the shared disks from None back to Physical.
  8. For the second node(s) of the failover cluster instance, reconfigure the SCSI bus sharing type of the SCSI controller for the shared disks from None back to Physical. Add all the shared disks to the second node(s) and assign them to that SCSI controller. Make sure all the shared disks have the same configuration as before the migration. A PowerCLI sketch of the bus-sharing change follows:


  9. Power on the virtual machine of the first node. Start the failover cluster and make sure all the failover cluster resources are online.


  10. Power on the virtual machine(s) of the second node(s). Make sure the second node(s) are online.


Migrating SQL Server Failover Cluster Instance with Hybrid Linked Mode

This methodology supports migration of shared disks using pRDMs for a SQL Server Failover Cluster Instance. The data located on shared pRDMs is copied over the network to VMDKs located on vSAN storage in the target SDDC as part of the cold storage vMotion. Any cluster node can be used as the first migrated node. This is an offline operation and requires downtime.

NOTE: The VM SQL-DB1Node01 is used to illustrate the migration of the first node of the cluster.

  1. Shut down the cluster
  2. Shut down VMs
  3. Execute vMotion
  4. Configure independent-persistent mode

Migrating the First Node of the Cluster

  1. Shut down the Windows cluster using the Failover Cluster Manager Microsoft Management Console. Ensure that all nodes are shown in the Offline state.


  2. Shut down all VMs (within the Guest OS) hosting nodes of your cluster.
  3. Select the VM SQL-DB1Node01 in the vSphere Client and start the vMotion wizard. Check the section Migration of Standalone SQL Server Workload: Step-By-Step Guide earlier in this document for more details, and follow the additional instructions below:
     a. Assign vSAN policies, using per-disk assignment if required.


     b. Use the extended destination port groups (L2 VPN based) to avoid re-IP of the nodes/cluster network.


     c. Wait for the cold vMotion/relocation to finish. Depending on the network connectivity, the available bandwidth, and the pRDM capacity, the process might take a long time.
  4. After completion of the vMotion, check the VM virtual disk and SCSI controller configuration sections and ensure that all pRDMs have been converted to VMDKs and retain the same configuration as on-premises. Ensure that the SCSI bus sharing for the controller(s) used to connect the shared pRDMs is set to Physical.
  5. Set independent-persistent mode for all converted VMDKs, as sketched below.
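A PowerCLI sketch of step 5, assuming a placeholder VM name and that the OS disk is "Hard disk 1" (adjust the filter to your disk layout):

Get-HardDisk -VM 'SQL-DB1Node01' |
    Where-Object { $_.Name -ne 'Hard disk 1' } |       # skip the OS/boot disk
    Set-HardDisk -Persistence IndependentPersistent -Confirm:$false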


Validating the Cluster Configuration of the First Migrated Node

In this step, we validate the cluster configuration and start the SQL Server instance. If the SQL Server service fails to start, use the rollback section later in this document to move back to on-premises.

  1. Power on the migrated VM
  2. Connect to the cluster
  3. Start the windows cluster
  4. Check shared disks
  5. Check SQL Server clustered Role
  6. Check the application connection to the database

Detailed steps:

  1. Power on the VM migrated to the SDDC (SQL-DB1Node01). The second and all consecutive VMs (SQL-DB1Node02 and 03, located on-premises) *must be* powered off.
  2. Log in to the Windows OS and start the Failover Cluster Manager console. Connect to the cluster. For clusters having more than two nodes, it is expected that the cluster status will be down (due to the loss of cluster quorum votes).


  3. Start the cluster. Use the force-start procedure if required (depending on the number of nodes and the cluster voting configuration); a sketch follows. If it is undesirable to force the cluster start, you can migrate the remaining nodes of the cluster first, but doing so makes the rollback (if required) more complicated.
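A sketch of force-starting the migrated node from PowerShell within the Guest OS (FailoverClusters module; the node name is a placeholder). Use this only when quorum cannot be achieved normally:

# Force quorum on this node so the cluster can start with insufficient votes
Start-ClusterNode -Name 'SQL-DB1Node01' -FixQuorum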


  4. Navigate to the Storage > Disks section. Check the status of the shared disks, now located on a vSAN datastore on VMware Cloud on AWS.


The following warnings for the Physical Disk resource are expected, as the underlying physical disk has been changed:


  5. Navigate to the Roles section. Check whether the SQL Server role is online, and investigate any failures if the role is in the offline state. Prevent users from connecting to the SQL Server instance at this point.


  6. Check application consistency. If any issues are found, do not proceed with the migration of the other nodes of the cluster; use the rollback procedure to move back to on-premises.


Migrating All Remaining VMs and Nodes of the Cluster

In this step, we move all remaining VMs hosting nodes of the cluster. As the shared disks have already been migrated, we migrate non-shared disks only. Do not move a VM if pRDMs are still attached to it.

  1. Remove pRDMs
  2. Migrate VMs
  3. Edit VMs configuration after migration

Detailed steps:

  1. Edit the VM configuration. Remove the pRDM disk(s), selecting the checkbox to remove the pRDM disk pointer file (Delete files from datastore) when removing it from the last node of the cluster. The data located on the pRDMs will not be affected. After removal from all VMs, the pRDMs will be available as raw LUNs and can be mounted to other VMs on-premises if required.

You may leave the vSCSI controller(s) used to host the shared disks in place; we will reuse them to mount the shared disks back after the migration to VMware Cloud on AWS is completed. If HCX will be used to migrate the VMs, the vSCSI controllers should be removed or their SCSI bus sharing set to None.


  2. Migrate the VM to the target SDDC using the previously described steps. You can use either the vSphere Client (CGA) or HCX (if the HCX method is used, the vSCSI controllers should either be removed or have their SCSI bus sharing property set to None).


NOTE: Migration of the remaining nodes will take less time compared to the migration of the first node: no pRDMs are copied, only the non-shared local VM disk(s) are migrated.

  3. Wait for the migration to complete. Multiple nodes can be migrated at the same time.
  4. Edit the configuration of the migrated VMs. Add the shared disk resources back by selecting Add New Device > Existing Hard Disk and using the "shared" disk(s) from the first migrated node (SQL-DB1Node01 in our example). The following requirements should be met (see the sketch after this list):
  • Use the same SCSI IDs and the same vSCSI controller while adding shared disks.
  • vSCSI controllers used for shared disks must be configured with SCSI bus sharing set to Physical.
  • All shared disks must be set to independent-persistent mode.
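A PowerCLI sketch of re-attaching a shared VMDK to a second node; the VM name, datastore, and disk path are placeholder assumptions, with the path pointing to the VMDK now owned by the first migrated node:

$vm   = Get-VM -Name 'SQL-DB1Node02'
# Attach the existing shared VMDK to the controller configured for Physical bus sharing
$ctrl = Get-ScsiController -VM $vm | Where-Object { $_.BusSharingMode -eq 'Physical' }
New-HardDisk -VM $vm -DiskPath '[WorkloadDatastore] SQL-DB1Node01/SQL-DB1Node01_1.vmdk' -Controller $ctrl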


NOTE: If the SCSI bus sharing is not configured as Physical, the VM cannot be powered on and the following error will be observed: "File system specific implementation of OpenFile (file) failed". Recheck that the SCSI bus sharing is configured as Physical for all controllers hosting shared disks.


Validating the Cluster Configuration after Migrating all Nodes of the Cluster

We will add the migrated nodes back to the cluster and validate the cluster functionality. The operation includes several steps:

  1. Re-add nodes back to the cluster
  2. Validate the configuration of the cluster
  3. Resume user access

Detailed steps:

  1. Power on a migrated VM hosting a node of the cluster (one VM at a time if the cluster has more than two nodes) and ensure that the node status is Up in the Failover Cluster Manager console.


  2. Proceed with all remaining nodes, one at a time. After all nodes have been re-added to the cluster, validate the Windows failover cluster configuration, as sketched below. Ensure that the cluster.log and the Windows Event Log report no errors. Move the SQL Server role to a newly re-added node and ensure that the SQL Server instance is accessible.
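A PowerShell sketch of the validation, run on a cluster node (FailoverClusters module); the role and node names are placeholders:

# Confirm every node is up, run cluster validation, and test a role move
Get-ClusterNode | Format-Table Name, State -AutoSize
Test-Cluster
Move-ClusterGroup -Name 'SQL Server (MSSQLSERVER)' -Node 'SQL-DB1Node02'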


  3. Resume user access. Ensure that the application(s) using the SQL Server instance(s) are migrated as well. Consider using vRNI to check the application dependencies if required.

With this operation finished, you have completed the migration of a SQL Server FCI with shared disks to the VMware Cloud on AWS SDDC.

Migrating SQL Server FCI – Considerations

Once the cluster is online and in use, with new data written to the shared disks, the reverse migration to on-premises is supported if:

On-premises environment has storage provisioned by VMware vSAN version 6.7 Update 3; or

On-premises environment has VMware vSphere 7.x and the storage array in use is connected over Fibre Channel (see this document for more details).

Rollback Plan

The rollback plan should be applied if verification of the application functionality on the first migrated node fails. Do not proceed with the migration of the other nodes of the cluster at this point. All nodes still located on-premises must remain powered off.

  1. Log in to the Windows OS on the VM migrated to the SDDC (SQL-DB1Node01 in our example).
  2. Shut down the cluster using the Failover Cluster Manager console.
  3. Shut down the VM (within the Guest OS).
  4. On the VM SQL-DB1Node01: remove all shared VMDKs (converted to VMDKs from pRDMs). Only non-shared disks should be retained.
  5. Use a reverse migration to move the VM back to on-premises. Use the required network port group and datastore(s) for the non-shared disks.
  6. Edit the configuration of the migrated VM. Add the shared disk resources back by selecting Existing Hard Disk and using the pRDM pointer files. Ensure the same SCSI IDs are used while re-adding the shared disks.
  7. Start the VMs one at a time, beginning with the non-migrated VMs.
  8. Check the windows failover cluster configuration and the application functionality.
  9. Reconfigure DRS Anti-affinity rules and groups.

You have completed the rollback plan.

Post-Migration: Configuration of SQL Server Workloads

Following successful migration of the workload from on-premises to an SDDC on VMware Cloud on AWS, it’s important to perform the post-migration steps outlined here to ensure Day 2 operations of the SQL Server are not negatively affected.

Creating Compute Policies to Force Anti-Affinity

To avoid creating a single point of failure, it's important to ensure that VMs hosting a clustered SQL Server workload are not running on the same host. If your SDDC is configured as a stretched cluster spanning multiple AZs in the AWS infrastructure, Compute Policies should also enforce the placement of VMs into different AZs.

NOTE: Affinity policies in a VMware Cloud on AWS SDDC are not the same as vSphere DRS affinity rules created on-premises. They can be used in many of the same ways but have significant operational differences. A compute policy applies to all hosts in an SDDC and cannot typically be enforced in the same way that a DRS "must" rule is enforced.

The following high-level steps are required to implement the described compute policies:

  • vSphere tags should be applied to all hosts in the SDDC, one tag per AZ (if a stretched cluster is used)
  • vSphere tags should be applied to all VMs: one tag per SQL Server availability group and one tag per node

NOTE: For SDDCs residing in a single AZ, creating host-based tags is not required.

Detailed steps:

  1. Create vSphere Categories and Tags.

To create vSphere tags for each AZ:

Navigate to the Menu, then Tags & Custom Attributes.

 

Select Tags, then Categories:


Proceed with creating new categories as indicated below:

     a. Fault Domain, to be used with host tags for AZs


     b. SQL Server Always On, to be used with VM tags for SQL Server Always On deployments


NOTE: Assuming a given VM hosts databases residing in different AGs, selecting Many tags for the category allows assignment of multiple tags within the same category to one VM.

Continue creating the required vSphere tags using the previously created categories. Create a tag for each AZ using, as a best practice, the AZ name that can be found on the Host Summary page. In this case, the following tags have been created:

  • us-west-2b
  • us-west-2c


NOTE: This step should be executed only once. Host-based tags can be reused for each subsequent deployment.

Continue adding tags for each Always On AG. It is recommended to use the availability group name for your tag. Each VM must be tagged individually with a unique node tag, along with a separate tag for the Always On AG. Here, the following tags have been created:

  • SQL-VMC-AG1-DB1-Node1
  • SQL-VMC-AG1-DB1-Node2
  • SQL-VMC-AG1-DB1

  2. Assign the created tags to the required objects.

First, assign tags to all hosts in the SDDC to reflect their AZ membership. Here, the tag us-west-2b is assigned to all hosts located in the AZ us-west-2b and the tag us-west-2c to all hosts located in the AZ us-west-2c.

Navigate to your vSphere HA cluster in the SDDC on VMware Cloud on AWS, highlight the cluster name, and navigate to the Hosts tab. Use the Show/Hide Columns picker to add the Fault Domain column:

Highlight a host, select the Tags & Custom Attributes menu, choose Assign Tag:

Select the tag corresponding to the host’s fault domain:

Continue assigning tags to all hosts in the cluster.

Confirm all tags are assigned using the Summary page of the host view:

Now assign tags to all VMs hosting databases in the AG. In total, two tags should be assigned to each VM: a node identifier used to map the VM to an AZ, and a cluster identifier used for the VM anti-affinity policy (to separate VMs between hosts within one AZ).

Select the vSphere folder in which all VMs hosting SQL Server workloads are located:

Highlight and right-click a VM to assign tags. Alternatively, navigate to the Summary page, locate the Tags widget, and use Assign Tag:

NOTE: Tag assignment can be automated with PowerCLI.
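
For example, the following minimal PowerCLI sketch creates the category and tags and assigns them to the first node. The tag and VM names match the examples above, while the vCenter Server address is a placeholder; the sketch assumes the VMware.PowerCLI module is installed and cloudadmin credentials are used.

  # Connect to the vCenter Server instance of the SDDC (address is a placeholder)
  Connect-VIServer -Server "vcenter.sddc-x-x.vmwarevmc.com" -User "cloudadmin@vmc.local"

  # Create the category and tags (run once; Cardinality Multiple allows many tags per VM)
  New-TagCategory -Name "SQL Server Always On" -Cardinality Multiple -EntityType VirtualMachine
  New-Tag -Name "SQL-VMC-AG1-DB1" -Category "SQL Server Always On"
  New-Tag -Name "SQL-VMC-AG1-DB1-Node1" -Category "SQL Server Always On"

  # Assign the node tag and the shared AG tag to the first node
  $vm = Get-VM -Name "SQL-DB1Node01"
  New-TagAssignment -Entity $vm -Tag (Get-Tag -Name "SQL-VMC-AG1-DB1-Node1")
  New-TagAssignment -Entity $vm -Tag (Get-Tag -Name "SQL-VMC-AG1-DB1")

Repeat the tag assignment for each remaining node, substituting the appropriate node tag.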

  3. Create Compute Policies.

Once tags are assigned to all required objects, move forward with the creation of Compute Policies.

Navigate to the Menu, then Policies and Profiles. Select Compute Policies:

Start with creating VM-Host affinity policies. Two are required.

Click Add and populate the required fields:

  • Name: SQL Always on Node 1 – AZ 1
  • Policy type: VM – Host affinity (VM should run on the specified hosts)
  • VM tag: Specify Category (SQL Server Always On) and Tag (SQL-VMC-AG1-DB1-Node1)
  • Host tag: Specify Category (Fault Domain) and Tag (us-west-2b)

Click Create to add the new policy. Proceed with creating the policy for Node2:

NOTE: If an availability group has more than two nodes (and respective VMs), distribute the VMs evenly between AZs.

In addition, it is necessary to create a policy separating VMs hosting availability groups within one AZ, to ensure that in each AZ all SQL Server nodes run on separate hosts. This can be achieved using the previously added tag with the cluster name.

Add the new policy and specify the following settings:

  • Name: name of the policy (SQL Always on Separate VM)
  • Policy type: VM – VM anti-affinity
  • VM tag: Category SQL Server Always On, Tag SQL-VMC-AG1-DB1

  4. Verify the configuration.

Once all policies have been created, check that they are assigned to VMs. To do this, highlight a VM and navigate to the Summary page. Scroll to the Compute Policies widget and verify that the required policies are assigned. Each VM should have two policies assigned:

  • SQL Always on Node 1 – AZ1: the VM will preferably run only on the hosts located in the specified AZ.
  • SQL Always on Separate VMs: VMs hosting SQL Server workloads and running within the same AZ will preferably run on different hosts.

NOTE: If an SDDC hosting SQL Server AGs is deployed in a single AZ, only a VM anti-affinity policy needs to be created to distribute VMs between hosts.

NOTE: Distributing nodes of SQL Server AGs between AZs may increase replication latency and the execution time of queries.

Networking Considerations

Once all nodes are moved to the same L2 VPN-based extended network port group, no change of IP addresses is required. It’s important to note that in this configuration, VMs will continue to use the default gateway located on-premises to send traffic to any consumers located outside of the home network. It may be effective to also move applications and reporting solutions using the database to VMware Cloud on AWS, to decrease network latency and the amount of traffic traversing the VPN/DX connection.

Post-migration: Operational Considerations

After migrating your SQL Server workloads to VMware Cloud on AWS, ensure that the following operational tools are reconfigured accordingly:

  • Monitoring
  • Backup: moving backups to VMware Cloud on AWS should be prioritized to avoid streaming backup traffic through the VPN/DX connection.
  • Management tools, scripts, and scheduled jobs

Optimizing the Configuration of a VM Hosting SQL Server Workload

After migrating a virtual machine (VM) hosting SQL Server workloads to VMware Cloud on AWS, make sure to check the VM configuration settings to ensure proper operation and performance of your workload. The list below is not a complete set of configuration recommendations; rather, it covers the configuration items that might be affected by the migration to a new environment on VMware Cloud on AWS.

CPU Compute Resource

The correct assignment of CPU resources is vital for CPU-intensive SQL Server workloads. Note the CPU capabilities of the AWS server hardware in your SDDC and the physical NUMA node configuration of your new servers, as they might differ from your on-premises physical servers.

  • i3.metal: 36 physical cores, no hyperthreading, Intel Xeon E5-2686 v4 (Broadwell) at 2.3 GHz; 512 GB RAM; two pNUMA nodes with 18 cores/256 GB RAM each.
  • i3en.metal: 48 physical cores with hyperthreading enabled (96 logical cores in total), Intel Xeon Platinum 8200 series (Cascade Lake); 768 GiB RAM; approximately 45.84 TiB raw storage capacity per host; two pNUMA nodes with 24 cores/384 GB RAM each.

NOTE: The list above is subject to change once new instances are made available on VMware Cloud on AWS. Recheck the documentation for your SDDC for the server model in use.

Rightsizing

Control the number of vCPUs assigned to a VM hosting SQL Server workloads. You might need to remove or add vCPUs based on the hardware differences between your on-premises environment and VMware Cloud on AWS. Related rightsizing guidance appears elsewhere in this document.

NOTE: At the time of writing, the CPU limit per VM on VMware Cloud on AWS is 36 vCPUs on i3 and 96 vCPUs on i3en instances, based on the server hardware capabilities.

Checking the VM vNUMA Configuration

Familiarize yourself with the NUMA concept and the VMware implementation of vNUMA. The vNUMA blog series by Frank Denneman is highly recommended, as is his VMworld session “60 Minutes of Non-Uniform Memory Architecture (HBI2278BE)” and the information provided in the SQL Server on VMware document.

SQL Server is a NUMA-aware application capable of performance optimizations based on the underlying NUMA configuration. VMware exposes the NUMA topology to the guest with the help of vNUMA, the virtual topology configuration stored in the VM’s vmx file. By default, the vNUMA configuration for a VM is created only once: when the VM is powered on for the first time. During the VM lifecycle, vNUMA changes only when the vCPU count is modified (using default settings) or when vNUMA-related advanced options are changed. If the configuration of the on-premises physical servers differs from the one used in VMware Cloud on AWS, the vNUMA configuration of your VM may not be aligned with the new hardware after the migration. This might introduce performance penalties for SQL Server workloads, noticeable on VMs with more than 8 vCPUs (9 vCPUs is the default threshold at which vNUMA is exposed).

To check the vNUMA configuration of your VM, we recommend using the PowerShell script (all credits go to Valentin Bondzio), which provides a framework to execute grep against the vmware.log file without directly connecting to a host.

If the vNUMA configuration does not match the new hardware, you can reconfigure vNUMA either through the VM advanced settings (by setting numa.autosize.once = "FALSE" and powering on the VM) or by changing the number of vCPUs assigned. You can find more information about SQL Server workloads and vNUMA in the relevant sections of the SQL Server Recommendation Guide and the performance engineering team recommendations resource.
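
As an illustration, the same advanced setting can be applied with PowerCLI. This is a sketch only, assuming the VM name from the earlier examples and that the VM is powered off before the change:

  # Force vNUMA to be recalculated at the next power-on
  $vm = Get-VM -Name "SQL-DB1Node01"
  New-AdvancedSetting -Entity $vm -Name "numa.autosize.once" -Value "FALSE" -Force -Confirm:$false
  Start-VM -VM $vm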

Recheck the cores/socket ratio configured for your VM

As per the best practices, always reflect the physical server CPU configuration when configuring the cores per socket assignment. The following examples illustrate this:

  • For a VM with 24 vCPUs running on an i3.metal server, the recommended configuration is 2 sockets with 12 cores each (24 vCPUs > 18 physical cores on one socket).
  • For a VM with 24 vCPUs running on an i3en.metal server, the recommended configuration is 1 socket with 24 cores (24 vCPUs == 24 physical cores on one socket).

On VMware Cloud on AWS, avoid configurations with more than two sockets – such a configuration might negatively impact SQL Server performance by preventing effective use of the L3 processor cache.
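
A hedged PowerCLI sketch for reviewing and adjusting the topology follows; it assumes the tags created earlier and a recent PowerCLI version in which Set-VM exposes the -CoresPerSocket parameter, and the VM should be powered off before reconfiguration:

  # Report the current vCPU count and cores-per-socket layout of the AG VMs
  Get-VM -Tag (Get-Tag -Name "SQL-VMC-AG1-DB1") |
      Select-Object Name, NumCpu, CoresPerSocket

  # Example: configure 24 vCPUs as 2 sockets x 12 cores for an i3.metal host
  Set-VM -VM "SQL-DB1Node01" -NumCpu 24 -CoresPerSocket 12 -Confirm:$false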

CPU Hot Add

The CPU hot add feature, when enabled, disables the vNUMA configuration for a VM. Hence, if exposing the vNUMA topology is desirable, CPU hot add should be disabled for all VMs hosting SQL Server workloads with high performance demands. The performance impact of CPU hot add is most visible on VMs with a wide vNUMA configuration.

Per-VM EVC Mode

Per-VM EVC mode might need to be set before executing a live migration from on-premises to VMware Cloud on AWS or in the case of a reverse migration. If EVC mode is configured but a reverse migration is not expected, consider removing the EVC configuration to expose new CPU features to your VM (a VM compatibility upgrade might also be required).

Memory Compute Resource

Memory is a resource of the highest importance for SQL Server workloads. Modern versions of SQL Server benefit from large amounts of memory, which they use for various caches to effectively reduce disk IOPS.

Rightsizing

Check with your DBA team to get memory usage metrics directly from the SQL Server instance (Distributed Management Views (DMVs), particularly sys.dm_os_sys_memory, provide a lot of useful information to accomplish this task). Avoid using OS- or vSphere-based memory metrics for the memory rightsizing task. You can find more details about rightsizing SQL Server memory resources here.
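
For example, SQL Server’s own view of OS memory can be pulled with the Invoke-Sqlcmd cmdlet from the SqlServer PowerShell module; this is a sketch with a placeholder instance name:

  # Query SQL Server's view of OS memory (values reported in KB)
  Invoke-Sqlcmd -ServerInstance "SQL-DB1Node01" -Query "SELECT total_physical_memory_kb, available_physical_memory_kb, system_memory_state_desc FROM sys.dm_os_sys_memory;"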

vNUMA Configuration

From a performance perspective, it’s recommended to stay within one physical NUMA node when allocating memory. As both physical server models used in VMware Cloud on AWS have a relatively high memory density, this should not be a problem. However, if you allocate more than 256 GB/384 GB on i3/i3en instances, respectively, ensure the vNUMA configuration is aligned to accommodate the extra memory.

Memory Overcommitment

Do not overcommit memory for any VM hosting a SQL Server workload with high performance requirements. ESXi memory overcommitment heavily impacts SQL Server performance. Therefore, the following metrics on the VM and host objects should be monitored, and an alert created if a value rises above 0 (see the sketch after this list):

  • Memory\Balloon (%)
  • Memory\Swapped (KB)
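
A possible PowerCLI sketch for spot-checking these counters via realtime statistics (VM name as in the earlier examples):

  # Ballooned and swapped memory should both report 0 for a healthy SQL Server VM
  $vm = Get-VM -Name "SQL-DB1Node01"
  Get-Stat -Entity $vm -Stat "mem.vmmemctl.average","mem.swapped.average" -Realtime -MaxSamples 5 |
      Select-Object MetricId, Timestamp, Value, Unit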

VM Networking Configuration

Migration to another vCenter Server instance requires re-assigning the network port group as part of the migration procedure. Ensure that the correct port group(s) have been selected for all traffic types (including backup and SQL Server Always On Availability Group replication).

It is expected that many L2 extended (stretched) port groups will be in use for SQL Server workloads. Take care when designing the traffic flow for VMs residing on an extended port group: the default gateway will still be located on-premises. This causes even traffic destined for another routed and/or extended port group in VMware Cloud on AWS to first be sent to the on-premises device hosting the gateway and then routed back to VMware Cloud on AWS. The introduction of the proximity routing feature (in limited preview at the time of writing) will help overcome this situation.

MTU

If a custom MTU size has been configured on the source port group, it is important to recheck the MTU and packet fragmentation behavior after migrating to VMware Cloud on AWS. At the time of writing, the default MTU size for all inter-SDDC networks is 8,950 bytes, and from on-premises to VMware Cloud on AWS (for both VPN-based and Direct Connect-based connections) it is 1,500 bytes. On a Windows operating system, use the ping -f -l <packet size> command to check the largest packet size that can be transmitted without fragmentation.
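
For example, because the IP and ICMP headers consume 28 bytes, the largest unfragmented payload is the MTU minus 28. Run the following from a PowerShell session in the guest (the destination address is a placeholder):

  # 1,500-byte MTU: 1500 - 20 (IP header) - 8 (ICMP header) = 1472-byte payload
  ping -f -l 1472 10.10.10.10

  # 8,950-byte MTU: 8950 - 28 = 8922-byte payload
  ping -f -l 8922 10.10.10.10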

NOTE: As VMware Cloud on AWS is evolving quickly, consider checking the release notes to determine whether a configurable MTU for the DX connection has been made available.

Network Configuration

The following configuration items should be rechecked after the migration:

  • Virtual network adapter: consider using VMXNET3 for all VMs hosting SQL Server workloads.
  • RSS/TSO: check the RSS and TSO settings inside the Windows OS and ensure they are turned on. Starting with VMware Tools 10.5, RSS is enabled by default on all VMXNET3 adapters but might still require OS-level configuration (see the sketch below).
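
From inside the guest, the Windows NetAdapter PowerShell module offers a quick check; this is a sketch with a placeholder adapter name:

  # Verify whether RSS is currently enabled on the vmxnet3 adapter
  Get-NetAdapterRss -Name "Ethernet0"

  # Enable RSS if it is reported as disabled
  Enable-NetAdapterRss -Name "Ethernet0"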

Storage Resources

VMware vSAN is the technology providing storage resources in VMware Cloud on AWS. Therefore, migrating to VMware Cloud on AWS might require revising the current virtual disk design of VMs hosting SQL Server workloads to achieve the best performance on vSAN.

Note: You can use the set of recommendations created by the VMware vSAN and SQL Server experts for most optimization tasks. Bear in mind that these recommendations were created for on-premises deployments, and not all of them apply to a managed service like VMware Cloud on AWS due to the nature of the environment. The bullet points below supersede the recommendations in that article.

The following configuration items should be considered:

  • Use the PVSCSI virtual controller type to attach virtual disks hosting SQL Server data (including logs and tempdb) to achieve the best throughput. Do not use the LSI Logic SAS controller type.
  • Use multiple PVSCSI controllers (up to four) to balance disk throughput between controllers (see the sketch after this list).
  • Consider a multiple-VMDK disk layout to redistribute load between vSAN nodes. This is especially important as vSAN is much more efficient with smaller disks: a VM with multiple small VMDKs distributed between multiple vSCSI controllers is expected to perform better than a VM with the same workload using a single VMDK on a single vSCSI controller.
  • We strongly advise using RAID 1 for SQL Server transaction log and tempdb disks.
  • RAID 1 should be your primary choice for SQL Server database files if SQL Server performance is the main goal of your design.
  • Consider setting the Object Space Reservation (OSR) advanced policy setting to “Thin provisioning”. OSR controls only the space reservation and has no performance advantages. While capacity control is still very important for on-premises solutions, on VMware Cloud on AWS, Elastic DRS (eDRS) ensures the cluster will not run out of free capacity. You can check this blog article for more details.
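
As an illustration of the multi-controller layout, the following PowerCLI sketch adds a new data disk and places it on its own PVSCSI controller. The VM name and capacity are placeholders; repeat per data disk up to the four-controller limit:

  # Add a new VMDK to the VM and attach it to a dedicated ParaVirtual SCSI controller
  $vm   = Get-VM -Name "SQL-DB1Node01"
  $disk = New-HardDisk -VM $vm -CapacityGB 200
  New-ScsiController -HardDisk $disk -Type ParaVirtual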

VM Management

To maintain the best performance and be able to fully utilize features of physical hardware on VMware Cloud on AWS:

  • Check and upgrade VMware Tools. VMware Tools version 10.3.x or higher is recommended.
  • Check and upgrade the VM compatibility (aka “virtual hardware”) if required. We recommend maintaining the VM compatibility at least at version 11 (ESXi 6.0 or later).

SDDC/VM Configuration Considerations

Make sure to understand the applicable VMware Cloud on AWS Configuration Maximums while planning, sizing, and running your SDDC hosting mission-critical applications. While many configuration maximums are the same as on-premises, some of them might influence how your SDDC should be designed.

Make sure to consult the list of unsupported VM configurations to ensure that a VM can be started on/moved to VMware Cloud on AWS.

SQL Server Configuration

Common SQL Server configuration recommendations are listed in the SQL Server on VMware document.

The following additional configuration settings are strongly advised for all SQL Server deployments on VMware Cloud on AWS:

  • Set the T1800 trace flag. T1800 forces 4K IO alignment for the SQL Server transaction log; vSAN efficiency greatly improves with 4K-aligned IO. We recommend enabling global trace flags at startup by using the -T command-line startup option; this ensures the trace flag remains active after a server restart. Restart SQL Server for the trace flag to take effect. You can use the Procmon system utility to inspect the IO and make sure the trace flag is properly enabled on your SQL Server (see the sketch after this list).
  • Dedicate separate disks to the SQL Server transaction log. Use multiple disks spread between multiple SCSI controllers if you have multiple databases.
  • Dedicate separate disks to tempdb. We recommend using four VMDKs spread between four SCSI controllers, with each VMDK hosting two tempdb files (for a total of eight tempdb files per SQL Server instance).
  • Use database file groups with multiple files. Depending on your database design, you can either use multiple file groups or create multiple files inside the single primary file group. SQL Server writes in parallel to all files within a file group.
  • Avoid cross-region and hybrid (on-premises to SDDC) traffic flows. Ensure that the applications and all components using the database are located within the same cluster in your SDDC. Pay attention to your SSIS deployment: the SSIS server executing packages should be located within the same SDDC as the source and target SQL Server databases.
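
To confirm the trace flag survived the restart, DBCC TRACESTATUS can also be queried, for example via Invoke-Sqlcmd (the instance name is a placeholder):

  # Global = 1 in the output confirms T1800 is active instance-wide
  Invoke-Sqlcmd -ServerInstance "SQL-DB1Node01" -Query "DBCC TRACESTATUS(1800, -1);"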

Summary

The list of configuration items above highlights the most important VM settings. Differences between your on-premises environment and an SDDC on VMware Cloud on AWS might directly influence application performance. For the full list of recommended VM configuration settings, check the SQL Server on vSphere Recommendation Guide.

Conclusion

While this document outlines recommendations for planning and architecture and provides in-depth technical details and step-by-step guidance for migrating SQL Server workloads, both standalone and highly available, it should not be considered a replacement for Architecting Microsoft SQL Server on VMware vSphere -- Best Practices Guide. VMware strongly recommends that all guidance contained in the best practices guide be considered.

As each infrastructure is unique in terms of requirements, implementation and operation, care should be exercised while developing workload migration plans. Explore the variety of high-availability and performance features VMware Cloud on AWS offers to make migration seamless and beneficial to SQL Server workloads.

Acknowledgments

Author: Oleg Ulyanov – Sr. Solutions Architect, Microsoft Applications, VMware

  • Charu Chaubal – Director, Cloud Platform Technical Marketing, VMware
  • Don Sullivan – Product Line Marketing Manager for Business-Critical Applications, VMware
  • Vas Mitra – Staff Solutions Architect, VMware
  • Alex Zou – Sr. Business Development Manager, VMware
  • Deji Akomolafe – Staff Solutions Architect, Microsoft Applications, VMware
  • Sudhir Balasubramanian – Staff Solution Architect – Data Platforms, VMware
  • Valentin Bondzio – Sr. Staff Technical Support Engineer
  • Mark Xu – Sr. Technical Marketing Architect, VMware
  • Catherine Xu – Senior Manager of Workload Technical Marketing team, VMware
