Reference Architecture - Healthcare - Best Practices for VMware vSAN with Epic on Dell EMC VxRail
Executive Summary
Business Case
A typical EHR environment consists of several different storage arrays coupled with dozens of hosts. Healthcare IT departments must not only administer and maintain this environment, but also publish new applications, provision new hardware, and investigate new technologies. Given these mandates, many healthcare IT CTOs are looking to lower costs. Expanding IT departments to host large-scale data centers is not a goal for most hospitals. This presents a challenge to most healthcare providers today: how does a hospital provide its clinicians and patients cutting-edge technology on a robust and responsive infrastructure without increasing costs or impacting patient care?
Dell Technologies can answer the challenge by hosting these core business functions on Dell EMC VxRail. Adopting VxRail minimizes the need to maintain legacy infrastructure and is capable of delivering mission-critical response times. VxRail is an enterprise-class, hyperconverged appliance that allows administrators to manage compute and storage with a single platform. There is no need to deploy or maintain separate arrays and storage networking hardware with VxRail. Its policy-based management removes the burden of provisioning and modifying numerous LUNs and data services. With its consistency and flexibility, VxRail provides the simplest path from server virtualization to hyperconverged infrastructure and true hybrid cloud architecture.
This document showcases VMware best practices and design guidelines for the Epic Operational and Analytical databases on Dell EMC VxRail.
Overview and Purpose
Initial testing conducted by VMware and Epic indicates that VxRail provides acceptable performance for small to medium-size environments.
This best practice guide is focused on deploying and managing Epic Operational and Analytical Databases on VxRail. These best practices are determined by:
- Following Epic Operational Database and Cogito storage best practices.
- Using Epic-provided test tools to simulate ODB and Cogito production workloads.
- Determining the optimal performance and configuration for Operational Database and Cogito in a single 6-node VxRail cluster.
For the most recent status of VMware vSAN™ on Dell EMC VxRail with Epic, contact your Epic representative for the most current Storage Products and Technology Status Guide.
Key Takeaways
- Dell EMC VxRail can deliver predictable and consistent performance for Epic Operational and Analytical Databases in well prescribed architectures and scenarios.
- A solution configuration of one 6-node VxRail cluster, with two five-disk disk groups per node, is capable of hosting both the Epic Operational and Analytical databases.
- vSAN storage policy-based management provided the key to consistent performance during run-away report testing.
- A non-blocking, high-buffer-count switch of 10/25GbE or greater is required. If you have questions, contact your Dell EMC representative.
- Hosting both databases on a single 6-node cluster provides cost savings in both administration and acquisition of the infrastructure.
- Consult with your Dell EMC and Epic representatives before procuring any hardware or starting your VxRail and Epic project. Selecting the wrong hardware may result in a sub-optimal configuration and possibly impact performance.
- Refer to your Epic Hardware Configuration Guide for hardware and storage specifications tailored to your organization.
Business Values
Here are the top five benefits of deploying Epic EHR on VxRail:
- Rapid deployment and configuration: The native VxRail deployment process sets up the HCI infrastructure and VxRail Manager in a single workflow that covers the vCenter, ESXi, and vSAN layers of VxRail.
- High performance and scalable hyperconverged infrastructure ensures consistent performance and predictable scalability for mission-critical EHR workloads, which makes administration and monitoring easier for Epic and infrastructure administrators.
- Automated Lifecycle Management: Minimize Epic workload impact and downtime during necessary patching and upgrading of the full private cloud stack using automated and self-managed services.
- Encryption: Native to vSAN, vSAN Encryption provides data-at-rest security at the cluster level and supports all vSAN features, including space efficiency features like deduplication and compression. Enabled with a few clicks, vSAN Encryption is built for compliance requirements and offers simple key management with support for all KMIP compliant key managers, such as CloudLink, Hytrust, SafeNet, Thales, and Vormetric. vSAN Encryption is FIPS 140-2 validated, meeting stringent U.S. federal government standards.
- Predictable hardware management from procurement to day-to-day administration. Familiarity with consistent VxRail models allows IT groups to get new hardware up and running with minimal effort, which helps bridge the knowledge gap between a database cluster and a presentation layer cluster.
Key Results
This reference architecture showcases Dell EMC VxRail for operating and managing Epic EHR operational and analytical database workloads in a fully integrated environment. Key results can be summarized as follows:
- Dell EMC VxRail simplifies and accelerates deployment of the virtual infrastructure required for Epic EHR workloads through a single workflow that contains all individual sub-tasks.
- The HCI platform, specifically VxRail in this solution, provides linear scalability and predictable performance capability for EHR mission-critical workloads.
- Using the GenIO test tool, Dell EMC VxRail performs at expected levels for both small and medium-size Epic customer environments.
Note: The performance results in this solution were validated on the Dell EMC VxRail HCI platform and also apply to general VMware vSAN deployments with similar configurations.
Audience
This document is intended for architects, application developers, and CTOs in the healthcare ecosystem who are involved in the early phases of planning, design, and deployment of Epic in their environment or upgrading their existing Epic infrastructure. It is assumed that the reader is familiar with the concepts and operations of running Epic software.
Technology Overview
This section introduces the solution technology components:
• Dell EMC VxRail E560N
• VMware vSphere 6.7 U3
• VMware vSAN 6.7 U3
• InterSystems Caché and IRIS
• Microsoft SQL Server 2016
Dell EMC VxRail
VxRail systems are jointly developed by Dell EMC and VMware and are the only fully integrated, preconfigured, and tested HCI system optimized for VMware vSAN technology for software-defined storage. Managed through the ubiquitous VMware vCenter Server interface, VxRail provides a familiar vSphere experience that enables streamlined deployment and the ability to extend the use of existing IT tools and processes.
VxRail systems are fully loaded with integrated, mission-critical data services from Dell EMC and VMware including compression, deduplication, replication, and backup. VxRail delivers resiliency and centralized-management functionality enabling faster, better, and simpler management of consolidated workloads, virtual desktops, business-critical applications, and remote office infrastructure. As the exclusive hyperconverged infrastructure system from Dell EMC and VMware
VxRail systems are optimized for VMware vSAN software, which is fully integrated in the kernel of vSphere and provides full-featured and cost-effective software-defined storage. vSAN implements an efficient architecture, built directly into the hypervisor. This distinguishes vSAN from solutions that typically install a virtual storage appliance (VSA) that runs as a guest VM on each host. Embedding vSAN into the ESXi kernel layer has advantages in performance and memory requirements. It presents storage as a familiar datastore construct and works seamlessly with other vSphere features such as VMware vSphere vMotion.
The VxRail software layers use VMware technology for server virtualization and software-defined storage. VxRail nodes are configured as ESXi hosts, and VMs and services communicate using the virtual switches for logical networking. VMware vSAN technology, implemented at the ESXi-kernel level, pools storage resources. This highly efficient SDS layer consumes minimal system resources, making more resources available to support user workloads. The kernel level integration also dramatically reduces the complexities involved in infrastructure management. vSAN presents a familiar datastore to the nodes in the cluster and Storage Policy Based Management provides the flexibility to easily configure the appropriate level of service for each VM.
VxRail HCI System Software, the VxRail management platform, is a strategic advantage for VxRail and further reduces operational complexity. VxRail HCI System Software provides out-of-the-box automation and orchestration for day 0 to day 2 system-based operational tasks, which reduces the overall IT OpEx required to manage the stack. No build-it-yourself HCI solution provides this level of lifecycle management, automation, and operational simplicity. With VxRail HCI System Software, upgrades are simple and automated with a single click. You can sit back and relax knowing you are going from one known good state to the next, inclusive of all the managed software and hardware component firmware. No longer do you need to verify hardware compatibility lists, run test and development scenarios, sequence and trial upgrades, and so on. The heavy lifting of sustaining and lifecycle management is already done for you.
Figure 1. VxRail HCI System Software
This best practice guide was developed using vSphere and vSAN 6.7 U3. However, the guidance also applies to later versions of vSphere and vSAN.
VMware vSphere 6.7 U3
VMware vSphere 6.7 provides a powerful, flexible, and secure foundation for business agility that accelerates the digital transformation to cloud computing and promotes success in the digital economy. vSphere 6.7 supports both existing and next-generation applications through its:
- Simplified customer experience for automation and management at scale
- Comprehensive built-in security for protecting data, infrastructure, and access
- Universal application platform for running any application anywhere
With vSphere 6.7, customers can run, manage, connect, and secure their applications in a common operating environment, across clouds and devices.
VMware vSAN 6.7 U3
VMware vSAN is the industry-leading software powering VMware’s software-defined storage and HCI solution. vSAN helps customers evolve their data center without risk, control IT costs, and scale to tomorrow’s business needs. vSAN, native to the market-leading hypervisor, delivers flash-optimized, secure storage for all of your critical vSphere workloads. vSAN is built on industry-standard x86 servers and components that help lower TCO in comparison to traditional storage. It delivers the agility to easily scale IT and offers the industry’s first native HCI encryption.
The vSAN 6.7 U3 release provides performance improvements and better availability SLAs on all-flash configurations with deduplication enabled. Latency-sensitive applications see better performance in terms of predictable I/O latencies and increased sequential I/O throughput. Rebuild times after disk and node failures are shorter, which provides better availability SLAs.
The 6.7 U3 release also supports Cloud Native Storage, which provides comprehensive data management for stateful applications. With Cloud Native Storage, vSphere persistent storage integrates with Kubernetes.
vSAN 6.7 U3 simplifies day-1 and day-2 operations, and customers can quickly deploy and extend cloud infrastructure and minimize maintenance disruptions. Stateful containers orchestrated by Kubernetes can leverage storage exposed by vSphere (vSAN, VMFS, NFS) while using standard Kubernetes volume, persistent volume, and dynamic provisioning primitives.
InterSystems Caché and IRIS
Epic licenses and uses InterSystems Caché and IRIS for the operational database to store and manage patient records, IRIS being the newer database. Within Caché and IRIS, data can be modeled and stored as tables, objects, or multidimensional arrays. Different models can seamlessly access data without the need for performance-killing mapping between models. All three access methods can be used simultaneously on the same data with full concurrency. This makes the platform well suited to hospital environments, where thousands of clinicians may access and manipulate the same data.
Epic leverages many features and functions of Caché and IRIS. Of particular note are the two main deployment architectures that Epic uses:
- Symmetric Multiprocessing (SMP): The most commonly deployed architecture for Epic customers. The data server is accessed directly.
- Enterprise Cache Protocol (ECP): A tiered architecture in which users access the data server via a pool of application servers. All data still resides on the data server. ECP is used by some of Epic’s largest customers.
Microsoft SQL Server 2016
Epic Cogito can use either Oracle or Microsoft SQL Server for the underlying database. For our testing we used Microsoft SQL Server 2016, which enables users to build modern applications either on-premises or in the cloud. Microsoft has added Always Encrypted, which protects data both in use and at rest, and has enhanced SQL Server auditing capabilities. Those capabilities, along with many others, are among the reasons so many organizations choose to deploy Microsoft SQL Server.
Solution Configuration
This section introduces the resources and configurations:
• Hardware resources
• Network configuration
• Architecture diagram
• vSAN storage policy configuration
• Storage policies and Epic EHR workloads
• Software resources
Hardware Resources
In this solution, we used a total of six VxRail E560N platforms each configured with two disk groups, and each disk group consists of one cache-tier mixed-use NVMe and four capacity-tier read-intensive NVMe.
Each VxRail node in the cluster had the following configuration:
Table 1. Hardware Configuration for VxRail
PROPERTY | SPECIFICATION |
---|---|
Node model name | VxRail E560N |
CPU | 2 x Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz, 28 cores each |
RAM | 512GB |
Network adapter | 2 x Mellanox CX-4 Lx SFP28 |
Disks | Cache: 2 x 1.6TB Mixed Use NVMe (Intel); Capacity: 8 x 4TB Read Intensive NVMe (Intel) |
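As a rough worked example of what this hardware provides, the sketch below computes the raw capacity-tier size of the six-node cluster and the space that remains once RAID-1 mirroring keeps multiple full copies of each object. The function names are illustrative, and the arithmetic deliberately ignores vSAN overheads such as metadata and slack space, so the results are upper bounds rather than sizing guidance.

```python
def raw_capacity_tb(nodes: int, capacity_drives_per_node: int, drive_tb: float) -> float:
    """Raw capacity-tier size; cache-tier devices do not add usable capacity."""
    return nodes * capacity_drives_per_node * drive_tb


def mirrored_capacity_tb(raw_tb: float, ftt: int) -> float:
    """RAID-1 stores ftt + 1 full copies of every object."""
    return raw_tb / (ftt + 1)


# Values taken from Table 1: 6 nodes, 8 x 4TB capacity drives per node.
raw = raw_capacity_tb(nodes=6, capacity_drives_per_node=8, drive_tb=4.0)
print(f"Raw capacity tier: {raw:.0f} TB")
print(f"Upper bound with RAID-1, FTT=2: {mirrored_capacity_tb(raw, 2):.0f} TB")
print(f"Upper bound with RAID-1, FTT=1: {mirrored_capacity_tb(raw, 1):.0f} TB")
```

Always size production clusters from the Epic Hardware Configuration Guide rather than from this kind of back-of-the-envelope estimate.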
Network Configuration
As shown in Figure 2, we created a VMware vSphere Distributed Switch™ (VDS) for each VxRail cluster to act as a single virtual switch across all associated nodes in the cluster. One dual 25GbE NIC on each of the nodes was configured for vSAN traffic. The other dual 25GbE NIC was configured on the VDS for VM, vMotion, and Management traffic. All the networks are configured on different VLAN IDs.
Figure 2. Network Configuration
Table 2. Virtual Distributed Switch Teaming Policy for 2x25 GbE Profile
Port Group | Teaming Policy | VMNIC0 | VMNIC1 |
---|---|---|---|
Management network | Route based on Physical NIC load | Active | Standby |
VxRail Management | Route based on the originating virtual port | Active | Standby |
VM network | Route based on Physical NIC load | Active | Standby |
vSphere vMotion | Route based on Physical NIC load | Active | Standby |
vSAN | Route based on Physical NIC load | Standby | Active |
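The intent of the teaming policy in Table 2 is that vSAN traffic has its own active uplink while all other port groups share the other one. A minimal sketch of that check follows; the dictionary simply transcribes Table 2, and the lowercase uplink names and the helper function are illustrative rather than part of any VMware API.

```python
from typing import Dict, List, Tuple

# Port group -> (active uplink, standby uplink), transcribed from Table 2.
TEAMING: Dict[str, Tuple[str, str]] = {
    "Management network": ("vmnic0", "vmnic1"),
    "VxRail Management": ("vmnic0", "vmnic1"),
    "VM network": ("vmnic0", "vmnic1"),
    "vSphere vMotion": ("vmnic0", "vmnic1"),
    "vSAN": ("vmnic1", "vmnic0"),
}


def port_groups_sharing_active_uplink(
    teaming: Dict[str, Tuple[str, str]], port_group: str
) -> List[str]:
    """Return the other port groups whose active uplink matches port_group's."""
    active = teaming[port_group][0]
    return sorted(
        name
        for name, (other_active, _standby) in teaming.items()
        if other_active == active and name != port_group
    )


# An empty list means vSAN does not share its active uplink under normal operation.
print(port_groups_sharing_active_uplink(TEAMING, "vSAN"))
```

If an uplink fails, the standby assignment takes over and vSAN shares the remaining uplink with the other traffic types.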
Architecture Diagram
In this solution, we use a 6-node VxRail cluster of E560N nodes to validate the Epic database workloads. We consulted with Epic's lead architects and followed Epic best practices. Epic's testing methodology and recommendations are quite strict, and with the advent of hyperconverged infrastructure, Epic asked VMware to test additional workloads and scenarios.
Table 3. VM Type and Purpose
Database Type | Database Purpose | VM Count |
---|---|---|
ODB | Production | 1 |
ODB | Non-Production | 1 |
Report | Production | 1 |
Support Release | Production | 1 |
Clarity | Production | 1 |
Clarity | Non-Production | 1 |
Caboodle | Production | 1 |
Caboodle | Non-Production | 1 |
Cube | Production | 1 |
Cube | Non-Production | 1 |
MST ACE | Production | 1 |
vCenter | Production | 1 |
VxRail Manager | Production | 1 |
Platform Services Controller | Production | 1 |
Figure 3. Epic ODB and Analytical Database VxRail Cluster
For our Database VxRail cluster, performance and availability are the driving factors; we therefore use six nodes, which allows us to set Failures to Tolerate to 2 (FTT=2). The architecture illustrated above delivers the required performance for Epic ODB. We found that with RAID1 for the ODB VMDKs and an aggressive IO test running, vSAN maintained the required IO response time. This configuration is designed for a small Epic customer. The cluster can be scaled to meet the requirements in Epic’s Hardware Configuration Guide.
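The link between node count and FTT can be made concrete. With RAID-1 mirroring, vSAN needs at least 2 x FTT + 1 hosts (the data copies plus witness components), so FTT=2 requires five hosts and a six-node cluster leaves one host of headroom for rebuilds. The helper names below are illustrative.

```python
def min_hosts_raid1(ftt: int) -> int:
    """Minimum hosts for RAID-1 mirroring: ftt + 1 data copies plus ftt witnesses."""
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return 2 * ftt + 1


def spare_hosts(cluster_hosts: int, ftt: int) -> int:
    """Hosts beyond the minimum; a negative value means the policy cannot be satisfied."""
    return cluster_hosts - min_hosts_raid1(ftt)


print(min_hosts_raid1(2))   # hosts required for FTT=2 with RAID-1
print(spare_hosts(6, 2))    # headroom in the six-node cluster used here
```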
VxRail Best Practice:
- Designate a VxRail cluster for the Database workloads
- Leverage SPBM (Storage Policy Based Management) for performance and availability
- Use RAID1 and FTT=2 for all ODB VMDKs
- Use RAID1 and an IOPS Limiter set to 3,000 IOPS for all other VMs on the Database VxRail Cluster
- Enable Checksum
- Disable Dedupe and Compression
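When reasoning about the 3,000 IOPS limit, keep in mind that vSAN documentation describes the limit as counted in weighted units with a default base size of 32 KB, so a 64 KB I/O counts as two operations. The sketch below, with illustrative function names, shows how the effective guest-visible I/O rate under the limit could shrink as I/O size grows. It assumes that I/Os at or below the base size count as one operation.

```python
import math

BASE_IO_KB = 32  # default base size described in vSAN documentation


def weighted_io_count(io_size_kb: float) -> int:
    """Number of limit units one I/O consumes (assumed minimum of one)."""
    return max(1, math.ceil(io_size_kb / BASE_IO_KB))


def max_guest_iops(iops_limit: int, io_size_kb: float) -> float:
    """Approximate I/Os per second a VMDK can issue at a given I/O size."""
    return iops_limit / weighted_io_count(io_size_kb)


for size_kb in (8, 64, 256):
    print(f"{size_kb} KB I/O: about {max_guest_iops(3000, size_kb):.0f} IOPS")
```

This matters most for analytical and reporting VMs, which tend to issue larger I/Os than the operational database.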
vSAN Storage Policy Configuration
In our design, we use different storage policies for the Epic workloads. Table 4 shows the detailed configuration.
Table 4. vSAN Storage Policy Configuration for Epic Workloads
Feature | Value | Applied to VMs | Description |
---|---|---|---|
Failures to Tolerate | 2 failures - RAID-1 (Mirroring) | ODB | Defines the number of disk, host, or fault domain failures a storage object can tolerate. This is set for both PROD and NON-PROD ODB VMs. |
Failures to Tolerate | 1 failure - RAID-1 (Mirroring) | vCenter, MST ACE# | Defines the number of disk, host, or fault domain failures a storage object can tolerate. This is set for all other VMs in the Database cluster. |
IOPS Limiter | 3,000 | SUP REL, Report, Clarity, Caboodle, Cube, Non-Prod ODB, Non-Prod Caboodle, Non-Prod Cube, Non-Prod Clarity | Sets the maximum number of IOPS per VMDK. This is set on all VMs' VMDKs in the Database Cluster except for PROD and NON-PROD ODB. |
Dedupe and Compression | Disabled | | Block-level deduplication and compression for storage efficiency. |
Checksum | Enabled | | Checksum enabled in each VxRail cluster. |
SPBM ODB, RAID-1 with FTT=2 example:
SPBM IOPS Limiter set at 3000 IOPS per VMDK with RAID-1 example:
SPBM RAID-1 with FTT=1 example:
Storage Policies and Epic EHR Workloads
During our test cycles we concluded, with Epic’s guidance, that the IOPS Limiter storage policy helps prevent performance impact to the ODB environment in the event of an unplanned change in IO. As noted in Table 4, we recommend making the IOPS Limiter policy, with 3,000 IOPS per VMDK and FTT=1, the vSAN Default Storage Policy. We also recommend using FTT=2 for both PROD ODB and NON-PROD ODB. This gives the ODB environment the ability to handle two failures within the VxRail cluster before the ODB fails over to the DR site. Lastly, we recommend using RAID1 for all VMs and VMDKs in the Database cluster. RAID1 delivers the performance that the Operational and Analytical databases require.
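The recommendations above amount to a simple selection rule: ODB VMDKs get RAID-1 with FTT=2 and no IOPS limit, and everything else falls through to a default policy of RAID-1, FTT=1, and a 3,000 IOPS limit. The sketch below models that rule as plain data. The policy names and the `policy_for` helper are hypothetical and are not SPBM API calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoragePolicy:
    name: str                  # hypothetical policy name
    ftt: int                   # failures to tolerate
    raid: str                  # RAID-1 everywhere in this design
    iops_limit: Optional[int]  # None means no IOPS limit


ODB_POLICY = StoragePolicy("epic-odb-raid1-ftt2", ftt=2, raid="RAID-1", iops_limit=None)
DEFAULT_POLICY = StoragePolicy("epic-default-raid1-ftt1", ftt=1, raid="RAID-1", iops_limit=3000)


def policy_for(database_type: str) -> StoragePolicy:
    """ODB VMDKs get the FTT=2 policy; all other VMs use the default policy."""
    if database_type.strip().upper() == "ODB":
        return ODB_POLICY
    return DEFAULT_POLICY


for db in ("ODB", "Clarity", "Caboodle"):
    print(db, policy_for(db))
```

Making the limited policy the default means a newly created VM cannot starve the ODB of IO before someone assigns it a policy explicitly.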
With mission-critical workloads such as an EHR, we recommend using the default setting of Checksum. vSAN uses an end-to-end checksum to ensure the integrity of data by confirming that each copy of a file is exactly the same as the source file. The system checks the validity of the data during read/write operations, and if an error is detected, vSAN repairs the data or reports the error. If a checksum mismatch is detected, vSAN automatically repairs the data by overwriting the incorrect data with the correct data. We also recommend disabling Dedupe and Compression because they may impact application response time. The Operational database requires the lowest possible latency, and because it is a flat file, dedupe would not be of benefit.
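The verify-and-repair behavior described above can be illustrated with a toy model. This is a conceptual sketch only: it uses `zlib.crc32` as a stand-in checksum and in-memory byte arrays as replicas, and it does not represent how vSAN implements checksums internally.

```python
import zlib
from typing import List


def read_with_repair(replicas: List[bytearray], stored_checksum: int) -> bytes:
    """Return verified data, overwriting any replica that fails the checksum."""
    good = next(
        (bytes(r) for r in replicas if zlib.crc32(r) == stored_checksum), None
    )
    if good is None:
        # No healthy copy is left, so the error can only be reported.
        raise OSError("checksum mismatch on every replica")
    for replica in replicas:
        if zlib.crc32(replica) != stored_checksum:
            replica[:] = good  # repair the bad copy from a verified one
    return good


original = b"patient-record"
checksum = zlib.crc32(original)
copies = [bytearray(original), bytearray(b"patient-rec0rd"), bytearray(original)]
print(read_with_repair(copies, checksum))
print(copies[1] == original)  # the corrupted copy has been repaired
```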
Software Resources
Table 5. Software Resources
SOFTWARE | VERSION | PURPOSE |
---|---|---|
VxRail HCI System Software | 4.7.300 | VxRail HCI System Software provides out-of-the-box automation and orchestration for day 0 to day 2 system-based operational tasks |
VMware vSphere and vSAN | 6.7 U3 - 14320388 | vSphere Cluster to host virtual machines and provide vSAN Cluster. vSAN is a software-defined storage solution for hyperconverged infrastructure. |
VMware vCenter Server | 6.7 U3 - 14367737 | Centralized platform for managing VMware vSphere environments |
CentOS | 7.6.1810 | Operating system for Operational database |
InterSystems Caché | 2018.1 | Operational database platform |
InterSystems IRIS | 2019.1 | Operational database platform |
Microsoft Windows Server | 2016, x64, Standard Edition | Operating system for Analytical database |
Microsoft SQL Server 2016 | 2016 Enterprise Edition | Analytical database platform |
GenIO | 1.10.3 | Epic Test Tool |
DiskSpd | 2.0.17 | Microsoft test tool for SQL |
VxRail Best Practice:
- Ensure the ESXi and vCenter build versions are on Epic’s Target Platform
- Ensure the Linux and Windows versions are on Epic’s Target Platform and supported by VMware
- Ensure the Oracle and/or Microsoft SQL Server versions are on Epic’s Target Platform
- Contact epic@vmware.com prior to conducting any testing or procuring hardware to ensure the success of your project
Operational Database Caché Linux VM Best Practices and Layout
One of the key differences between VxRail and a traditional three-tier architecture with Fibre Channel SAN storage is how disk is presented to the OS. With SAN storage, a Raw Device Mapping (RDM) is presented; with VxRail, a VMDK is presented. This greatly reduces the complexity of both administration and troubleshooting. Follow the Epic Storage Quick Reference Guide for the storage layout, starting from the VM configuration, which remains the same with VxRail.
- Use Multiple PVSCSI Controllers
- Use VMXNET3 NIC
- Distribute the disks across the PVSCSI controllers
  - PVSCSI 0: OS/database VMDKs
  - PVSCSI 1: Database, /epic/prd, /epic VMDKs
  - PVSCSI 2: Database, /epic/prd, /epic VMDKs
  - PVSCSI 3: Database, Journal VMDKs
- Configure the IO Scheduler
- Create Volume Groups
- Create Logical Volumes
- Create File Systems
- Mount File Systems
The VM CPU, RAM, and storage layout will be documented in the Hardware Configuration Guide.
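The goal of spreading VMDKs across the four PVSCSI controllers is to avoid queuing all database IO behind a single virtual adapter. A minimal sketch of one way to plan that is shown below. The round-robin rule and the disk names are assumptions made for illustration; the actual placement should follow the Epic Storage Quick Reference Guide.

```python
from itertools import cycle
from typing import Dict, List


def distribute_database_disks(disks: List[str], controllers: int = 4) -> Dict[int, List[str]]:
    """Assign database VMDKs to PVSCSI controllers in round-robin order."""
    if controllers < 1:
        raise ValueError("at least one controller is required")
    layout: Dict[int, List[str]] = {c: [] for c in range(controllers)}
    for disk, controller in zip(disks, cycle(range(controllers))):
        layout[controller].append(disk)
    return layout


# Hypothetical database VMDK names, for illustration only.
database_disks = [f"prd-db-{n:02d}.vmdk" for n in range(1, 9)]
for controller, assigned in distribute_database_disks(database_disks).items():
    print(f"PVSCSI {controller}: {', '.join(assigned)}")
```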
Below is our Prod-ODB VM layout as an example.
SQL Cogito VM Best Practices and Layout
As mentioned previously, the disk type presented to the OS is a key difference with VxRail. This greatly reduces the complexity of administration, troubleshooting, and configuration. We follow the Epic Cogito on VMware Architecture document.
- Use Multiple PVSCSI Controllers
- Use VMXNET3 NIC
- Distribute the disks across the PVSCSI controllers
- Configure Windows Disks, except OS, for 64K Block Size
Solution Validation
There are three typical profiles of Epic customers: small, medium, and large.
- Small customers generate up to 5M global references and fewer than 25K IOPS.
- Medium customers generate between 5M and 10M global references and between 25K and 50K IOPS.
- Large customers generate more than 10M global references and more than 50K IOPS.
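The three profiles can be expressed as a small classification function. The thresholds come from the list above; treating a customer as the larger of the two tiers when global references and IOPS disagree is an assumption of this sketch, not an Epic rule.

```python
PROFILES = ("small", "medium", "large")


def refs_tier(global_refs_millions: float) -> int:
    """Tier index by global references (in millions)."""
    if global_refs_millions <= 5:
        return 0
    if global_refs_millions <= 10:
        return 1
    return 2


def iops_tier(iops: float) -> int:
    """Tier index by IOPS."""
    if iops < 25_000:
        return 0
    if iops <= 50_000:
        return 1
    return 2


def classify_customer(global_refs_millions: float, iops: float) -> str:
    """Return the profile name, taking the larger tier if the two metrics differ."""
    return PROFILES[max(refs_tier(global_refs_millions), iops_tier(iops))]


print(classify_customer(3, 20_000))
print(classify_customer(7, 40_000))
print(classify_customer(12, 60_000))
```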
Epic requires consistent and predictable response times for their applications. Notably, the ODB has the following requirements:
- Random reads to the ODB, using the file system response time:
  - Average read latencies must be 2ms or less
  - 99% of read latencies must be below 60ms
  - 99.9% of read latencies must be below 200ms
  - 99.99% of read latencies must be below 600ms
- Random writes to the ODB, using the file system response time:
  - Average write latencies must be 1ms or less
  - Average write cycle must complete in less than 45 seconds
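To show how these requirements could be evaluated against measured data, the sketch below checks a list of read latency samples and a pair of write measurements against the thresholds above. It is not GenIO and does not reproduce Epic's methodology; the function names and the sample data are illustrative.

```python
from statistics import mean
from typing import Dict, Sequence

# (label, minimum fraction of samples, latency limit in ms) from the list above.
READ_PERCENTILE_LIMITS = (
    ("p99", 0.99, 60.0),
    ("p99.9", 0.999, 200.0),
    ("p99.99", 0.9999, 600.0),
)


def check_read_latencies(samples_ms: Sequence[float]) -> Dict[str, bool]:
    """Check the average and the share of reads below each latency limit."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    total = len(samples_ms)
    results = {"average": mean(samples_ms) <= 2.0}
    for label, fraction, limit_ms in READ_PERCENTILE_LIMITS:
        below = sum(1 for s in samples_ms if s < limit_ms)
        results[label] = below / total >= fraction
    return results


def check_writes(average_write_ms: float, write_cycle_s: float) -> Dict[str, bool]:
    """Check the average write latency and the write cycle duration."""
    return {"average": average_write_ms <= 1.0, "write_cycle": write_cycle_s < 45.0}


# Synthetic samples: mostly 1 ms reads with a few 100 ms outliers.
samples = [1.0] * 9990 + [100.0] * 10
print(check_read_latencies(samples))
print(check_writes(average_write_ms=0.8, write_cycle_s=30.0))
```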
Using the GenIO test tool, VxRail performs at expected levels for both small and medium-size Epic customer environments. While VxRail also performs well for large customer profiles, Epic has restricted HCI to small and medium customers for initial support.
Conclusion
Performance is only one factor for such a demanding and dynamic ecosystem; vSAN also delivers both cost-effectiveness and agility. vSAN-powered HCI organically changes the EHR ecosystem and exposes business application owners to architectural efficiency and simplicity by giving them deployment choices without sacrificing performance.
By adopting vSAN on Dell EMC VxRail hyperconverged infrastructure, owners can more readily collaborate and architect for the business problems of today and tomorrow.
About the Author
Christian Rauber, a staff solutions architect in the Hyperconverged Infrastructure Product Enablement team, wrote the original version of this paper.
The following reviewers also contributed to the paper contents:
- Victor Dery, Senior Principal Engineer of VxRail Technical Marketing in Dell EMC
- David Glynn, Senior Principal Engineer of VxRail Technical Marketing in Dell EMC
- George O'Toole III, Senior Advisor, Product Marketing in Dell EMC