Oracle Database 12c on VMware vSAN 6.2 All-Flash

vSAN 6.2
Oracle 12c

Executive Summary

This section covers the business case, solution overview, and key results of the Oracle Database 12c on VMware vSAN 6.2 All-Flash solution.

Business Case

Customers deploying Oracle databases have stringent requirements for SLAs, sustained high performance, and application availability. Managing data storage in these environments is a major challenge because of these stringent business requirements. Common issues with traditional storage solutions for business-critical applications include limited scale-up and scale-out capability, storage inefficiency, complex management, and high deployment and operating costs.

With more and more production servers being virtualized, the demand for highly converged, server-based storage is surging. VMware® vSAN™ aims to provide highly scalable, available, reliable, and high-performance storage using cost-effective hardware, specifically direct-attached disks in VMware ESXi™ hosts. vSAN adheres to a policy-based storage management paradigm, which simplifies and automates the complex configuration and clustering workflows that exist in traditional enterprise storage systems.

Solution Overview

This solution addresses the common business challenges that CIOs face today in an online transaction processing (OLTP) and decision support system (DSS) environment that requires predictable performance and cost-effective storage. The solution helps customers design and implement optimal configurations specifically for Oracle database on All-Flash vSAN.

Key Results

The following highlights validate that vSAN is an enterprise-class storage solution suitable for running heavy Oracle workloads:

  • Predictable Oracle OLTP and DSS performance on vSAN with high availability.
  • Significant capacity savings with RAID 5 (erasure coding), deduplication and compression with negligible resource overhead and minimal impact on performance, lowering the total cost of ownership (TCO) and increasing price-performance.
  • Storage Policy Based Management (SPBM) to administer storage resources combined with simple design methodology that eliminates operational and maintenance complexity of traditional SAN.
  • Sustainable solution for Tier-1 Database Management System (DBMS) application platform.
  • Validated architecture that reduces implementation and operational risks.

Oracle on All-Flash vSAN Reference Architecture

This section covers the reference architecture that validates the ability of vSAN All-Flash 6.2 to support industry-standard OLTP and DSS workloads in an Oracle environment.

Purpose

This reference architecture validates the ability of vSAN All-Flash 6.2 to support industry-standard OLTP and DSS workloads in an Oracle environment. Oracle on vSAN All-Flash ensures a desired level of storage performance for mission-critical OLTP and DSS workloads.

Scope

This reference architecture:

  1. Demonstrates storage performance and scalability for Oracle OLTP and DSS workloads in a vSAN All-Flash environment.
  2. Illustrates storage efficiency provided by erasure coding, deduplication and compression features of vSAN in an Oracle database environment.
  3. Validates resiliency of vSAN for an Oracle database environment.
  4. Shows vSAN supporting VMware vSphere® vMotion® with workload running against an Oracle database to achieve workload mobility.
  5. Uses VMware vRealize® Operations Manager™ with Management Pack for Storage Devices and Management Pack for Oracle Database to provide a centralized storage and database view for proactive performance management and troubleshooting.

Audience

This reference architecture is intended for Oracle database administrators, virtualization and storage architects involved in planning, architecting, and administering a virtualized Oracle environment with vSAN.

Terminology

This paper uses the following terms.

TERM | DEFINITION
Oracle Automatic Storage Management (Oracle ASM) | Oracle ASM is a volume manager and a file system for Oracle database files.
Oracle Single Instance | An Oracle Single Instance database consists of a set of memory structures, background processes, and physical database files that serve the database users.

 Table 1. Terminology

Technology Overview

This section provides an overview of the technologies used in this solution.

Overview

This section provides an overview of the technologies used in this solution:

  • VMware vSphere 6.0 Update 2
  • VMware vSAN 6.2
  • All-flash architecture
  • Deduplication and compression for space efficiency
  • Erasure coding
  • VMware vSAN Stretched Cluster
  • Oracle Database 12c

VMware vSphere 6.0 Update 2

VMware vSphere is the industry-leading virtualization platform for building cloud infrastructures. It enables users to run business-critical applications with confidence and respond quickly to business needs. vSphere accelerates the shift to cloud computing for existing data centers and underpins compatible public cloud offerings, forming the foundation for the industry’s best hybrid cloud model.

VMware vSAN 6.2

VMware vSAN is VMware’s software-defined storage solution for hyperconverged infrastructure, a software-driven architecture that delivers tightly integrated compute, networking, and shared storage from a single virtualized x86 server.

With the major enhancements in vSAN 6.2, vSAN provides enterprise-class scale and performance as well as new capabilities that broaden the applicability of the proven vSAN architecture to business-critical environments. The new features of vSAN 6.2 include:

  • Deduplication and compression: software-based deduplication and compression optimizes all-flash storage capacity, providing as much as 7x data reduction with minimal CPU and memory overhead.
  • Erasure coding: erasure coding increases usable storage capacity by up to 100 percent while keeping data resiliency unchanged. It is capable of tolerating one or two failures with single parity or double parity protection.
  • QoS (Quality of Service) with IOPS limit: policy-driven QoS limits and monitoring of IOPS consumed by specific virtual machines, eliminating noisy neighbor issues and managing performance SLAs. 
  • Software checksum: end-to-end data checksum detects and resolves silent errors to ensure data integrity; this feature is also policy-driven.
  • Client Cache: leverages dynamic random access memory (DRAM) local to the virtual machine's host to accelerate read performance. The amount of memory allocated is 0.4 percent of total host memory, up to 1GB per host.

With these new features, vSAN 6.2 provides the following advantages:

  • VMware HyperConverged Software (HCS)-powered all-flash solutions available at up to 50 percent less than the costs of other competing hybrid solutions in the market.
  • Increased storage utilization by as much as 10x through new data efficiency features including deduplication and compression, and erasure coding.
  • Future-proof IT environments with a single platform supporting business-critical applications, OpenStack, and containers with up to 100K IOPS per node at sub-millisecond latencies.

All-Flash Architecture

All-Flash vSAN aims at delivering extremely high IOPS with predictably low latencies. Two different grades of flash devices are commonly used in an All-Flash vSAN configuration: lower-capacity, higher-endurance devices for the cache layer, and more cost-effective, higher-capacity, lower-endurance devices for the capacity layer. Writes are performed at the cache layer and then destaged to the capacity layer only as needed. This helps extend the usable life of the lower-endurance flash devices in the capacity layer.


Figure 1. vSAN All-Flash Datastore

Deduplication and Compression for Space Efficiency

Near-line deduplication and compression happen during destaging from the caching tier to the capacity tier. Customers enable “space efficiency” at the cluster level, and deduplication and compression operate on a per-disk-group basis. Larger disk groups typically result in a higher deduplication ratio. Blocks are compressed after they are deduplicated.


Figure 2. Deduplication and Compression for Space Efficiency

Erasure Coding

Erasure coding provides the same levels of redundancy as mirroring, but with a reduced capacity requirement. In general, erasure coding is a method of taking data, breaking it into multiple pieces and spreading it across multiple devices, while adding parity data so it may be recreated in the event that one or more pieces are corrupted or lost.

In vSAN 6.2, two modes of erasure coding are supported:

  • RAID 5 in 3+1 configuration, which means 3 data blocks and 1 parity block per stripe.
  • RAID 6 in 4+2 configuration, which means 4 data blocks and 2 parity blocks per stripe.

RAID 5

RAID 5 requires a minimum of four hosts because it uses 3+1 logic. With four hosts, one can fail without data loss. This results in a significant reduction of required disk capacity. With RAID 1 mirroring, a 20GB disk would require 40GB of disk capacity, but with RAID 5 the requirement is only around 27GB.


Figure 3. RAID 5 Data and Parity Placement

RAID 6

With RAID 6, two host failures can be tolerated. In the equivalent RAID 1 scenario (FTT=2), a 20GB disk would require 60GB of disk capacity; with RAID 6, this is just 30GB. Note that the parity is distributed across all hosts and there is no dedicated parity host. RAID 6 uses a 4+2 configuration, so at least six hosts are required.
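The capacity figures above follow directly from the protection overhead of each scheme. The following quick arithmetic sketch (an illustration only; it ignores witness components and vSAN metadata overhead) reproduces the numbers for a 20GB virtual disk:

# Raw capacity required for a 20 GB object under each vSAN protection scheme
usable=20   # GB of usable (logical) capacity
echo "RAID 1, FTT=1 (2 full copies): $(echo "$usable * 2" | bc) GB raw"
echo "RAID 1, FTT=2 (3 full copies): $(echo "$usable * 3" | bc) GB raw"
echo "RAID 5, 3+1 (4/3 overhead):    $(echo "scale=1; $usable * 4 / 3" | bc) GB raw"
echo "RAID 6, 4+2 (6/4 overhead):    $(echo "scale=1; $usable * 6 / 4" | bc) GB raw"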


Figure 4. RAID 6 Data and Parity Placement

Space efficiency features (including deduplication, compression, and erasure coding) work together to provide up to 10x reduction in dataset size.

VMware vSAN Stretched Cluster

vSAN 6.1 introduced the Stretched Cluster feature. vSAN Stretched Cluster provides customers with the ability to deploy a single vSAN Cluster across multiple data centers. vSAN Stretched Cluster is a specific configuration implemented in environments where disaster or downtime avoidance is a key requirement.

vSAN Stretched Cluster builds on the foundation of fault domains. The fault domain feature introduced rack awareness in vSAN 6.0. It allows customers to group multiple hosts into failure zones across server racks to ensure that replicas of virtual machine objects are not provisioned onto the same logical failure zone or server rack. vSAN Stretched Cluster requires three failure domains based on three sites (two active sites and one witness site). The witness site only hosts the witness virtual appliance, which stores witness objects and cluster metadata and provides cluster quorum services during failure events.

vSAN Stretched Cluster differs from a standard vSAN Cluster in the following respects:

Write latency: In a standard vSAN Cluster, mirrored writes incur similar latency because all replicas are local. In a vSAN Stretched Cluster, write operations must be prepared at both sites, so every write traverses the inter-site link and incurs the inter-site latency. The higher the latency, the longer it takes for write operations to complete.

Read locality: A standard vSAN Cluster performs read operations in a round-robin pattern across the mirrored copies of an object. A Stretched Cluster performs all reads from the copy of the object available at the local site.

Failure: In the event of any failure, recovery traffic needs to originate from the remote site, which has the only mirrored copy of the object. Thus, all recovery traffic traverses the inter-site link. In addition, since the local copy of the object on a failed node is degraded, all reads to that object are redirected to the remote copy across the inter-site link.

See more information in the VMware vSAN 6.2 Stretched Cluster Guide.

Oracle Database 12c

Oracle database is a relational database management system deployed as a single instance or as RAC (Real Application Clusters), ensuring high availability, scalability, and agility for any application.

Oracle Database 12c provides many new features including multi-tenant architecture that simplifies the process of consolidating databases in the cloud, enabling customers to manage many databases as one without changing their application.

Oracle database accommodates all system types, from data warehouse systems to update-intensive OLTP systems.

Solution Configuration

This section introduces the resources and configurations for the solution including architecture diagram and hardware & software resources.

Overview

This section introduces the resources and configurations for the solution including:

  • Architecture diagram
  • Hardware resources
  • Software resources
  • Network configuration
  • Oracle database VM and database storage configuration

Architecture Diagram

This solution had two architectures: one was a vSAN Cluster as shown in Figure 5, and the other was a vSAN Stretched Cluster as shown in Figure 25. In the vSAN Stretched Cluster configuration, the same servers used for the vSAN Cluster were used but split evenly into two sites.

The key designs for the vSAN Cluster solution for Oracle database were:

  • A 4-node vSAN Cluster with two vSAN disk groups on each ESXi host.
  • Each disk group was created from 1 x 800GB SSD (cache) and 3 x 800GB SSDs (capacity).
  • For vSAN Policy used, see vSAN Configurations Used in this Solution.
  • Two different VM sizes were used:
  • Medium VM—4 vCPU and 64GB memory with Oracle SGA set to 53GB and PGA set to 10GB
  • Large VM—8 vCPU and 96GB memory with Oracle SGA set to 77GB and PGA set to 10GB
  • Oracle Linux 7.0 operating system was used for database VMs.
  • Oracle VM configurations used in the tests are listed in the following table.

TEST | VM CONFIGURATION
OLTP tests | 2 x medium VM, 2 x large VM
DSS test | 1 x large VM, 1 x medium VM
Mixed workload test (OLTP and DSS) | 2 x medium VM, 2 x large VM
vSAN resiliency and vSphere vMotion test | 1 x large VM
vSAN Stretched Cluster test | 2 x large VM

 


Figure 5. vSAN Cluster for Oracle Database

 

Hardware Resources

Table 2 shows the hardware resources used in this solution.

Table 2. Hardware Resources

DESCRIPTION | SPECIFICATION
Server | 4 x ESXi server
Server model | Cisco UCS-C240-M4S
CPU | 2 sockets with 16 cores each, Intel E5-2698 v3 at 2.30GHz, hyper-threading enabled
RAM | 384GB DDR4 RDIMM
Storage controller | 1 x 12G SAS Modular RAID Controller
Disks | 8 x 800GB SAS SSD, 6Gbps
Network | 2 x 10Gb ports

The storage controller used in the reference architecture supports the pass-through mode. The pass-through mode is the preferred mode for vSAN and it gives vSAN complete control of the local SSDs attached to the storage controller.

Software Resources

Table 3 shows the software resources used in this solution.

Table 3. Software Resources

SOFTWARE | VERSION | PURPOSE
VMware vCenter Server® and ESXi | 6.0 Update 2 | ESXi cluster to host virtual machines and provide the vSAN Cluster; VMware vCenter Server provides a centralized platform for managing VMware vSphere environments
VMware vSAN | 6.2 | Software-defined storage solution for hyperconverged infrastructure
Oracle Linux | 7.0 | Oracle database server nodes
Oracle Database 12c | 12.1.0.2.0 | Oracle database
Oracle workload generator for DSS | Swingbench 2.5 (Sales History) | To generate DSS workload
Oracle workload generator for OLTP | SLOB 2.3 | To generate OLTP workload
XORP (open source routing platform) | 1.8.5 | To enable multicast traffic routing between vSAN Stretched Cluster hosts in different VLANs (between two sites)

Network Configuration

A VMware vSphere Distributed Switch™ (VDS) acts as a single virtual switch across all associated hosts in the data cluster. This setup allows virtual machines to maintain a consistent network configuration as they migrate across multiple hosts. The vSphere Distributed Switch uses two 10GbE adapters per host as shown in Figure 6.


 

Figure 6. vSphere Distributed Switch Port Group Configuration in All-Flash vSAN

A port group defines properties regarding security, traffic shaping, and NIC teaming. Jumbo frames (MTU=9,000 bytes) were enabled on the vSAN and vSphere vMotion interfaces, and the default port group settings were used; a quick way to verify the jumbo frame configuration is sketched after the port group list below. Figure 6 shows the distributed switch port groups created for different functions and the respective active and standby uplinks used to balance traffic across the available uplinks. Three port groups were created:

  • VM Management port group for VMs
  • vSAN port group for kernel port used by vSAN traffic
  • vSphere vMotion port group for kernel port used by vSphere vMotion traffic
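For reference, a jumbo frame configuration such as this can be verified end to end from an ESXi host by pinging the remote vSAN VMkernel interface with an oversized, non-fragmentable payload. This is a minimal sketch; the VMkernel interface name and target IP address are examples and depend on your environment:

# Verify jumbo frames on the vSAN VMkernel network (vmk2 and the target IP are examples)
# 8972 bytes = 9000-byte MTU minus 28 bytes of ICMP/IP headers; -d sets "do not fragment"
vmkping -I vmk2 -d -s 8972 172.16.10.12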

In the case of vSAN Stretched Cluster, a metropolitan area network was simulated in the lab environment. Figure 7 shows a diagrammatic layout of the setup. The three sites were placed in different VLANs. A Linux VM was configured with three network interfaces, one in each of the three VLANs, acting as the gateway for inter-VLAN routing between sites. Static routes were configured on the ESXi vSAN VMkernel ports for routing between the different VLANs (sites). The Linux VM leveraged the netem functionality built into Linux to simulate network latency between the sites. Furthermore, XORP installed on the Linux VM provided support for multicast traffic between the two vSAN fault domains.
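The following sketch illustrates the two pieces of this setup: adding netem delay on the router VM's interfaces and adding a static route for the vSAN VMkernel network on an ESXi host. Interface names, subnets, and gateway addresses are examples only and will differ in your environment:

# On the Linux router VM: add one-way egress delay toward each data site with netem
# (two 1 ms delays give roughly a 2 ms round-trip time between the data sites)
tc qdisc add dev eth1 root netem delay 1ms    # interface toward site A
tc qdisc add dev eth2 root netem delay 1ms    # interface toward site B

# On each ESXi host: add a static route so vSAN VMkernel traffic destined for
# the remote site's VLAN goes through the router VM
esxcli network ip route ipv4 add --network 172.16.20.0/24 --gateway 172.16.10.1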


Figure 7. Network Simulation for 2+2+1 Stretched Cluster in a Lab Environment

Figure 8 shows the VMware vSphere Distributed Switch port group configuration. Additional port groups 4024 and 4025, for site B and site C respectively, were created to separate the vSAN traffic of the different sites.

For vSAN network design best practices, see VMware vSAN 6.2 Network Design Guide.



Figure 8. VDS Port Group Configuration in the All-Flash vSAN Stretched Cluster

Oracle Database VM and DB Storage Configuration

The Oracle Single Instance database VMs were installed with Oracle Linux 7.0 and configured as follows:

  • Medium VM—4 vCPU and 64GB memory
  • Large VM—8 vCPU and 96GB memory
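The Oracle memory settings for these VM sizes (SGA 53GB and PGA 10GB for the medium VM, SGA 77GB and PGA 10GB for the large VM, as listed in the Architecture Diagram section) map to the standard initialization parameters. The following is an illustrative sketch for the large VM, not a prescriptive setting; a database restart is required for the new SGA size to take effect:

# Set SGA/PGA targets in the spfile (example values from this solution; adjust per VM size)
sqlplus / as sysdba <<'SQL'
ALTER SYSTEM SET sga_target = 77G SCOPE = SPFILE;
ALTER SYSTEM SET pga_aggregate_target = 10G SCOPE = SPFILE;
SQL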

The Oracle ASM data disk group was configured with external redundancy and an allocation unit size of 1M. Data and redo ASM disk groups were presented on different PVSCSI controllers.
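For illustration, an ASM disk group matching this configuration can be created as shown below. This is a minimal sketch assuming the four candidate data disks from Table 4 are already provisioned and labeled; the /dev/oracleasm/disks/* paths are hypothetical and depend on how ASM disks are presented in your environment:

# Create the DATA disk group with external redundancy and a 1M allocation unit
sqlplus / as sysasm <<'SQL'
CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
  DISK '/dev/oracleasm/disks/DATA1',
       '/dev/oracleasm/disks/DATA2',
       '/dev/oracleasm/disks/DATA3',
       '/dev/oracleasm/disks/DATA4'
  ATTRIBUTE 'au_size' = '1M';
SQL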

See Oracle best practices in the Best Practices of Oracle Database on All-Flash vSAN chapter.

Table 4 provides Oracle VM disk layout and ASM disk group configuration.

Table 4. Oracle Database VM Disk Layout

NAME | SCSI TYPE | SCSI ID (CONTROLLER, LUN) | SIZE (GB) | ASM DISK GROUP
Operating system (OS) and Oracle binary disk | LSI Logic | SCSI (0:0) | 50 | Not applicable
Database data disk 1 | Paravirtual | SCSI (1:0) | 100 | DATA
Database data disk 2 | Paravirtual | SCSI (1:1) | 100 | DATA
Database data disk 3 | Paravirtual | SCSI (2:0) | 100 | DATA
Database data disk 4 | Paravirtual | SCSI (2:1) | 100 | DATA
Online redo disk 1 | Paravirtual | SCSI (3:0) | 20 | REDO
Online redo disk 2 | Paravirtual | SCSI (3:1) | 20 | REDO

Solution Validation

In this section, we present the test methodologies and processes used in this reference architecture.

Test Overview

The solution designed and deployed Oracle Single Instance databases on a vSAN Cluster, focusing on ease of use, performance, resiliency, and availability. In this section, we present the test methodologies and processes used in this reference architecture.

The solution validates the performance and functionality of Oracle database running in a vSAN environment.

The solution tests include:

  • vSAN performance with OLTP and DSS workloads using the new features in vSAN 6.2:
  • Oracle workload testing using SLOB to generate an OLTP-like workload.
  • Oracle workload testing using Swingbench to generate a DSS-like workload.
  • vSAN resiliency during failures.
  • vSphere vMotion on vSAN.
  • vSAN Stretched Cluster for continuous data availability during a site failure.
  • vSAN and Oracle health and performance management using VMware vRealize Operations Manager™.

 

Test and Performance Data Collection Tools

Test Tools and Configuration

Oracle OLTP Workload

SLOB is an Oracle workload generator designed to stress test storage I/O capability, specifically for Oracle database using OLTP workload. SLOB is not a traditional transactional benchmark tool. We used it to validate performance of the storage subsystem without application contention.

SLOB and Database Configuration

  • Two medium VMs (4 vCPU and 64GB memory) and two large VMs (8 vCPU and 96GB memory).
  • Each VM was on a separate ESXi host of the 4-node cluster.
  • Each VM hosted a 300GB database.
  • The SLOB multiple schema model was used for testing. This method of SLOB testing is a form of multitenant testing.
  • Two different SLOB configurations were used (see the configuration sketch after this list):
  • “SLOB Medium workload”—number of users set to 48 and think time frequency set to 5, driving each database concurrently to generate an OLTP workload
  • “SLOB Heavy workload”—number of users set to 128 and think time frequency set to 0, hitting each database with maximum concurrent requests to generate an extremely intensive OLTP workload
  • The detailed SLOB configuration is included in Appendix A SLOB Configuration.
  • The workload is a mix of 75 percent reads and 25 percent writes to mimic a transactional database workload.
  • Except for the OLTP “SLOB Medium workload” baseline test, the “SLOB Heavy workload” configuration was used for all SLOB tests.
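The sketch below shows how the two profiles differ in practice. The slob.conf parameter names and the runit.sh invocation reflect our understanding of the SLOB 2.3 interface and should be treated as an approximation; the full configuration file used in the tests is in Appendix A:

# "SLOB Medium workload": moderate concurrency with think time
# slob.conf:  UPDATE_PCT=25  THINK_TM_FREQUENCY=5
./runit.sh 48      # 48 SLOB schemas/users

# "SLOB Heavy workload": maximum concurrency with no think time
# slob.conf:  UPDATE_PCT=25  THINK_TM_FREQUENCY=0
./runit.sh 128     # 128 SLOB schemas/users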

Oracle DSS Workload 

In this reference architecture, we used Swingbench to generate an I/O throughput-intensive DSS workload.

Swingbench DSS Sales History is an Oracle workload generator designed to test the throughput of the underlying storage.

Swingbench and Database Configuration

  • One medium VM (4 vCPU and 64GB memory) and one large VM (8 vCPU and 96GB memory).
  • Each VM was on a separate ESXi host.
  • A 350GB Swingbench Sales History schema was created in each database VM.
  • Sales History workload using Swingbench is by default 100 percent read IO throughput intensive.
  • The default Sales History configuration file with 24 users was used with the following transactions:
  • Sales Rollup by Month and Channel
  • Sales Cube by Month and Channel
  • Sales Cube by Week and Channel
  • Product Sales Cube and Rollup by Month
  • Sales within Quarter by Country
  • Sales within Week by Country

Oracle Mixed Workload (OLTP and DSS)

We used SLOB to generate Oracle OLTP workload and Swingbench to generate DSS workload concurrently.

During this test, four Oracle database VMs were online with one on each ESXi host. OLTP workloads were on two VMs and DSS workloads were on the other two VMs.

Mixed Configuration 

  • One medium VM (4 vCPU and 64GB memory) and one large VM (8 vCPU and 96GB memory) running SLOB with a 300GB database.
  • One medium VM (4 vCPU and 64GB memory) and one large VM (8 vCPU and 96GB memory) running Swingbench with a 350GB Sales History schema.
  • Each VM was on a separate ESXi host of a 4-node cluster.
  • Same “SLOB Heavy workload” and Swingbench configurations described in the previous sections applied.

Performance Metrics Data Collection Tools

We measured three important workload metrics in all tests: I/Os per second (IOPS), average latency of each IO operation (ms), and IO throughput (MB/s). IOPS and average latency metrics are important for OLTP workload. IO throughput is a key metric in DSS workload.

 We used the following testing and monitoring tools in this solution:

  • vSAN Observer

vSAN Observer is designed to capture performance statistics and bandwidth for a VMware vSAN Cluster. It provides an in-depth snapshot of IOPS, bandwidth and latencies at different layers of vSAN, read cache hits and misses ratio, outstanding I/Os and congestion. This information is provided at different layers in the vSAN stack to help troubleshoot storage performance. For more information about the VMware vSAN Observer, see the Monitoring VMware vSAN with vSAN Observer documentation.

  • esxtop utility

esxtop is a command-line utility that provides a detailed view on the ESXi resource usage. Refer to the VMware Knowledge Base Article 1008205 for more information.

  • Oracle AWR reports with Automatic Database Diagnostic Monitor (ADDM)

Automatic Workload Repository (AWR) collects, processes, and maintains performance statistics for problem detection and self-tuning purposes for Oracle database. This tool can generate reports for analyzing Oracle performance. The Automatic Database Diagnostic Monitor (ADDM) analyzes data in AWR to identify potential performance bottlenecks. For each of the identified issues, it locates the root cause and provides recommendations for correcting the problem.
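For reference, the following sketch shows typical ways these tools are invoked; the RVC cluster path and output locations are examples and depend on your environment:

# vSAN Observer: start the live web UI from the Ruby vSphere Console (RVC)
# (the cluster path below is an example)
vsan.observer /localhost/Datacenter/computers/vSAN-Cluster --run-webserver --force

# esxtop: capture ESXi resource usage in batch mode (15-second samples, 240 iterations)
esxtop -b -d 15 -n 240 > /tmp/esxtop_oltp_run.csv

# AWR: generate a workload report from SQL*Plus inside the database VM
sqlplus / as sysdba '@?/rdbms/admin/awrrpt.sql'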

vSAN Configurations Used in this Solution

Several vSAN 6.2 feature combinations were used during the tests. Table 5 shows the abbreviations used to represent the feature configurations.

Table 5. Feature Configurations and Abbreviations

NAME | RAID LEVEL | CHECKSUM | DEDUPLICATION AND COMPRESSION (SPACE EFFICIENCY)
R1 | 1 | Deactivated | No
R1+C | 1 | Enabled | No
R1+C+SE | 1 | Enabled | Yes
R15+C+SE[4] | Data disks—5; OS disks—5; Redo disks—1 | Enabled | Yes

Unless otherwise specified in the test, the vSAN Cluster was designed with the following common configuration parameters:

  • Default Failures to Tolerate (FTT) of 1
  • Default Object Space Reservation of 0 percent
  • Stripe width of 1
  • Default cache policies were used and no cache reservation was set

Oracle OLTP Performance on vSAN

Test Overview

This test focused on extremely heavy Oracle OLTP workload on vSAN. SLOB was used to stress four Oracle databases concurrently in the vSAN Cluster. A 300GB SLOB database was loaded on each of the four Oracle VMs (two large VMs and two medium VMs). A VM was placed on every ESXi host in the 4-node vSAN Cluster as shown in Figure 5. The same workload was running on all four database VMs for 60 minutes. The baseline tests used R1 vSAN configuration.

Two different SLOB workload tests were run as part of the baseline test:

  • SLOB Medium workload

SLOB Medium workload sets the number of users to 48 with think time frequency set to 5, driving each database concurrently to generate an OLTP workload.

  • SLOB Heavy workload

SLOB Heavy workload sets the number of users to 128 with think time frequency set to 0, hitting each database with maximum concurrent requests to generate an extremely intensive OLTP workload.

While users can use SLOB to simulate a realistic database workload, we chose to stress the vSAN Cluster by setting the number of users to 128 without any think time to hit each database with the most intensive database requests.

To understand the performance and resource utilization impact introduced by the new vSAN features under extremely intensive OLTP workload, SLOB Heavy workload was used to test different vSAN configurations as shown in Table 5.

Baseline Test Results and Observations

SLOB Medium Workload Results

We measured the key metrics for the OLTP workload. Figure 9 shows the IOPS generated by four Oracle database VMs during the test. The IOPS reached a peak of over 65 thousand with an average IOPS of 64 thousand. There was a 20-minute warm-up period and the performance reached a steady state after that. Notice the workload was a mix of 75 percent read and 25 percent write IOPS, which mimicked a transactional database workload.

This shows that vSAN provides reliable performance for a business-critical application such as Oracle database for IO intensive workload.


Figure 9. vSAN IOPS in R1 Configuration Test with SLOB Medium Workload

Latency in an OLTP test is a critical metric of how well the workload is running. Lower IO latency reduces the time CPU waits for IO completion and improves application performance. Figure 10 shows the average read latency was 1.3ms and the average write latency was 6.1ms during IO intensive workload.


Figure 10. vSAN Average Latency during R1 Configuration Test with SLOB Medium Workload

SLOB Heavy Workload Results

We measured the key metrics for the extremely intensive Oracle OLTP workload. Figure 11 shows the IOPS generated by four Oracle database VMs during the test. The IOPS reached a peak of over 100 thousand with an average of 95 thousand. There was a 20-minute warm-up period and the performance reached a steady state after that. Notice the workload was a mix of 75 percent read and 25 percent write IOPS, which mimicked a transactional database workload.

This shows that vSAN provides reliable performance for a business-critical application such as Oracle database despite high intensity of the workload.


Figure 11. vSAN IOPS in R1 Configuration Test with SLOB Heavy Workload

Latency was relatively low for this solution considering the extremely heavy IO generated concurrently by four Oracle databases. Figure 12 shows the average read latency was 4ms and the average write latency was 17ms during a peak workload scenario; more realistic real-world database environments running at steady state will see much lower latencies.

Figure 12. vSAN Average Latency during R1 Configuration Test with SLOB Heavy Workload

 

Compare Baseline Configuration with Other vSAN Configurations

vSAN 6.2 introduced a host of new features like checksum and built-in data reduction technologies including erasure coding, deduplication and compression.

To understand the performance and resource utilization impact introduced by the new features, SLOB Heavy workload was used and we compared the baseline configuration (R1) with the other three vSAN configurations shown in Table 5.

Figure 13 shows the IO metrics comparison, while the space efficiency and CPU utilization comparisons are shown in Figure 14 and Figure 15 respectively.

  • In the R1+C configuration, the software-based checksum feature was enabled to ensure data integrity. Average IOPS reduced from 95 thousand to 84 thousand, a reduction of 11 percent. In terms of capacity overhead, checksum adds a small amount of metadata, so the capacity used by R1+C was almost the same as the baseline R1 configuration.
  • In the R1+C+SE configuration, deduplication and compression were enabled along with checksum. Average IOPS reduced from 95 thousand to 65 thousand, a reduction of 31 percent. However, this configuration provided a space saving of 45 percent.
  • In the R15+C+SE configuration, erasure coding (RAID 5) was also used along with deduplication and compression. In this test, the observed IOPS was 72 thousand, a reduction of 24 percent compared to the baseline (R1). This configuration provided the best space efficiency with a 50 percent space saving.

Figure 13 also shows the latency under the different vSAN configurations. These were tested under a peak IO utilization scenario with concurrent workloads from four OLTP database VMs. Typical real-world environments should have lower IO latencies.

For latency-sensitive applications, use RAID 1 (mirroring) for data and redo disks; otherwise, use RAID 5 (erasure coding) for data and RAID 1 for redo to provide space efficiency with a reasonable performance trade-off.


Figure 13. OLTP Workload IO Metrics Comparison with Different vSAN 6.2 Features

Overall, while erasure coding provides a predictable amount of space savings, deduplication and compression provide a varying amount of capacity reduction depending on the workload and the vSAN disk group configuration. Because the domain for deduplication is the disk group, a smaller number of large disk groups typically yields higher overall deduplication ratios than a larger number of small disk groups. The disadvantage of a smaller number of large disk groups is less write-buffer capacity relative to disk group size and more data migration and resync traffic during maintenance operations (disk replacement, failure). If database native compression is used, vSAN compression may provide reduced benefits. The space saving obtained from deduplication and compression is highly dependent on the application workload and data set composition.


Figure 14. Space Efficiency with Different vSAN 6.2 Features

Under the Oracle OLTP workload, we observed that the resource overhead (CPU and memory) caused by the space efficiency and checksum features was minimal. Figure 15 shows the average ESXi CPU utilization across the tests: it was between 20 percent and 25.6 percent. In the case of memory, the overhead was negligible.


Figure 15. Average ESXi Host CPU Utilization

Summary

The figures above show various heavy OLTP workload tests with different vSAN configurations. Table 6 summarizes all the test results.

The IOPS and latency data in the table are from vSAN Observer. Matching IOPS and latency data were observed from the Linux operating system iostat command in each database VM.

The performance data is a result of the combination of hardware configuration, software configuration, test methodology, test tool, and workload profile used in the testing. Our test applied an extreme stress workload to demonstrate what a 4-node vSAN Cluster is capable of in terms of storage performance with four databases running concurrently. We set the SLOB configuration to generate the heaviest load possible by using 128 users with zero think time against each of the four databases. The objective was to measure the maximum IOPS with reasonable latency during a peak utilization scenario; typical real-world database environments running at steady state should have much lower latencies. In fact, as the mixed workload test results later show, the average latency of the two OLTP databases drops to single-digit milliseconds even with two DSS databases running concurrently.

Table 6. Summary of OLTP Workload Tests and Key Metrics 

vSAN CONFIGURATION | AVERAGE TOTAL IOPS | AVERAGE READ IOPS | AVERAGE READ LATENCY (MS) | AVERAGE WRITE IOPS | AVERAGE WRITE LATENCY (MS) | SPACE EFFICIENCY (%) | AVERAGE ESXI CPU UTILIZATION (%)
R1 | 94,627 | 72,320 | 3.9 | 22,307 | 16.9 | None | 22.5
R1+C | 84,108 | 64,266 | 4 | 19,842 | 18.3 | None | 23.7
R1+C+SE | 64,853 | 49,608 | 9.7 | 15,245 | 13.4 | 44 | 20.0
R15+C+SE | 72,033 | 55,082 | 6.3 | 16,951 | 23.6 | 52 | 25.6

Oracle DSS Workload on vSAN

Test Overview

This test focused on Oracle DSS workload performance on vSAN. Swingbench was used to generate DSS workload. A 350GB database schema was loaded on two Oracle VMs (large VMs). Each VM resided on a separate ESXi host of a 4-node vSAN Cluster.

Based on the OLTP testing, the R15+C+SE vSAN configuration was chosen to provide the required performance as well as space efficiency in the DSS testing. The same DSS workload was run on two database VMs concurrently for 60 minutes. The objective of this test was to measure the IO throughput on vSAN during this workload.

Test Results and Observations

Figure 16 shows the aggregate IO throughput in the Oracle database VMs during this run. This is a 100 percent read-intensive workload. We observed the peak IO throughput was 935 MB/s and the average IO throughput was 636 MB/s.

At the database level, as observed from the AWR reports of both databases, the combined “physical read total bytes” was 644MB/s. This test proves that vSAN is a feasible solution not only for OLTP systems but also for IO throughput-intensive DSS workloads.


Figure 16. IO Throughput during DSS Test

Oracle OLTP and DSS Mixed Workload on vSAN

Test Overview

A well-performing enterprise storage system should be able to handle OLTP and DSS workloads at the same time with predictable IOPS, throughput, and latency. In this test, OLTP and DSS workloads were applied concurrently on vSAN using SLOB and Swingbench respectively.

We used four Oracle database VMs with one VM on each ESXi host. We ran OLTP workload on a medium VM and a large VM; similarly, we ran DSS workload on a medium VM and a large VM. Both workloads were running concurrently for 60 minutes. We used the same configuration as used in the previous DSS test: R15+C+SE vSAN configuration for the best combination of performance and storage efficiency. This configuration included all the key features such as checksum, RAID 5, deduplication and compression.

Test Results and Observations

With both workloads running concurrently, the IO throughput reached a peak throughput of 1,526 MB/s and an average throughput of 1,000 MB/s as shown in Figure 17. The peak IOPS during this test was 64,036 and the average was 49,774 as shown in Figure 18.  

Another key metric for OLTP performance is predictable latency so that transactions are processed quickly. Figure 19 shows the average IO latency of the OLTP VMs. The read latency was 2.6ms and the write latency was 9ms. Even though the DSS workload was also running, the impact on OLTP VM latency was minimal or none. These results demonstrate that All-Flash vSAN is an ideal HCI platform for mixed workloads.


Figure 17. IO Throughput during Mixed Workload

 


Figure 18. Total IOPS during Mixed Workload

 


Figure 19. OLTP DB VM Average Latency during Mixed Workload

vSAN Resiliency

Test Overview

This section validates vSAN resiliency and impact on Oracle database when handling disk and host failures. We designed the following scenarios to emulate potential real-world component failures during OLTP workload.

During this test, an Oracle database on a large VM was used. We ran a baseline test with a steady OLTP workload for 30 minutes without any failure to compare with the failure test results. The vSAN configuration used for this testing was R15+C+SE. This policy provides the benefits of space efficiency while maintaining the performance level required for business-critical Oracle database. The same heavy SLOB workload configuration was applied.

We tested a disk failure scenario and a host failure scenario. With the introduction of deduplication and compression in vSAN 6.2, there is a behavior change in how a disk failure affects a cluster. In a cluster without deduplication and compression enabled, a capacity disk failure only affects the components on that disk. If deduplication and compression are enabled, the whole disk group is affected and a disk group failure occurs: a single disk failure at the capacity layer is treated as a complete disk group failure, just like a cache-layer disk failure.

Disk group failure

In this test, vSAN was enabled with deduplication and compression, so a disk failure in either the cache or capacity tier results in a complete disk group failure. We used a disk-fault-injection script to generate a permanent disk failure on a capacity-tier SSD to simulate a disk group failure. The host selected for this disk failure did not host the Oracle database VM; however, the failed SSD held components of the data and redo disk objects. The disk failure was introduced at the 15th minute of the 30-minute test run, and the impact on IO performance was measured.

Host failure

In this test, one of the ESXi hosts was shut down abruptly using Cisco UCS Manager during the workload to simulate a host failure. The host that was powered down did not run the Oracle database VM. The host failure was introduced at the 15th minute of the 30-minute test run, and the impact on IO performance was measured.

Test Results and Observations

In the case of the disk group failure, as soon as the permanent disk error was injected on a disk, the disk group failed. After the failure occurred, vSAN rebuilt the failed objects to be compliant with the protection policy. An average IOPS drop of 14 percent was recorded compared to the scenario without failure, as shown in Figure 20. However, the virtual machine objects and components remained accessible and transactions continued.

This disk group failure was caused by a permanent disk error injection, so the rebuild traffic started immediately. Figure 21 shows the rebuild traffic after the error was injected: the background rebuild started immediately and completed after 36 minutes. While the SLOB workload was running, the background rebuild rate was low at an average of 50.6 MB/s; once the SLOB workload finished, vSAN increased the rebuild rate to 89.8 MB/s. vSAN automatically uses this intelligent prioritization and resync/rebuild throttling mechanism to reduce the impact on production workloads. The rebuild and resynchronization time depends on the amount of data that needs to be rebuilt or resynchronized as well as other factors, including the production workload level and the cluster capacity in terms of compute and disk group configuration.

In the case of the host failure, after the host was abruptly powered down, the Oracle database continued to serve transactions. However, IO performance was affected more because both disk groups on the host failed. As shown in Figure 20, the average IOPS dropped by 28 percent compared to the scenario without failure. Because a host may also reboot for maintenance or upgrade, the rebuild start time is governed by the default repair delay time, which is 60 minutes. This helps avoid unnecessary data rebuild and resynchronization during planned host maintenance. The default repair delay value can be modified; see VMware Knowledge Base Article 2075456 for steps to change it.
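As a sketch of what that change looks like (run on each ESXi host in the cluster and keep the value consistent; 90 minutes is only an example value):

# Check the current vSAN object repair delay (in minutes)
esxcli system settings advanced list -o /VSAN/ClomRepairDelay
# Change it, for example to 90 minutes
esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i 90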

Figure 22 shows the IO latency by Oracle disk type. The write latency of the redo disk during the disk group failure was slightly higher due to the rebuild traffic in the background. In the case of the host failure, there was no immediate rebuild and thus no impact on write latency; however, the data disk read latency increased due to the failure of two disk groups. The variation in IO latency with and without failure was minimal during these failure tests. None of the tests reported IO errors in the Linux VM or Oracle user-session disconnections, which demonstrates the resiliency of vSAN during component failures.


Figure 20. IOPS Comparison with and without Failure

 


Figure 21. vSAN Traffic Resync during Disk Group Failure

 


Figure 22. Oracle Disk Latency Comparison with and without Failure

vSphere vMotion on vSAN

Test Overview

vSphere vMotion live migration allows moving an entire virtual machine from one physical server to another without downtime. We used this feature to migrate an Oracle database instance seamlessly between the ESXi hosts in the vSAN Cluster.

In this test, we performed vSphere vMotion to migrate one of the Oracle database VMs (a large VM) from one ESXi host to another (ESXi 2 to ESXi 3) in the vSAN Cluster, as shown in Figure 23. The migration was initiated while a SLOB OLTP workload was running against the database. We used the R15+C+SE vSAN configuration and the test ran for 30 minutes. The same heavy SLOB workload configuration was applied.


Figure 23. vMotion Migration of Oracle Database VM on vSAN

Test Results and Observations

While SLOB was generating the OLTP workload, vMotion was initiated to migrate the Oracle database VM from ESXi 2 to ESXi 3 in the vSAN Cluster. The migration took around 201 seconds. There was a momentary reduction in IOPS during the last phase of the migration, and IOPS returned to the normal level afterwards. As shown in Figure 24, we compared the result with a similar test without any vMotion operation over the same duration. The average IOPS without vMotion was 54,708 and it was 49,563 with one vMotion operation, a drop of 9 percent. This test demonstrated the mobility of Oracle database VMs deployed in a vSAN Cluster using vMotion.


Figure 24. IOPS Comparison with and without vMotion

 

Oracle Database 12c on vSAN Stretched Cluster

Test Configuration and Overview

We set up vSAN Stretched Cluster using five ESXi hosts: two at site A, two at site B, and a Witness ESXi Appliance at Site C. We used two Oracle database large VMs: one VM at site A and the other VM at site B as shown in Figure 25. We used R1+C+SE configuration in this test, which had checksum, deduplication and compression enabled. Erasure coding is not supported on vSAN Stretched Cluster because it needs four fault domains for RAID 5 while there are only three fault domains in the vSAN Stretched Cluster configuration.

We used SLOB to generate an OLTP workload in both database VMs concurrently for 30 minutes. The same heavy SLOB workload configuration was applied. To understand the performance impact of inter-site latency, the round-trip latency between the data sites was varied from 2ms to 4ms, while the round-trip latency from the witness to the data sites was kept constant at 200ms. In the tests, we only modified the latency between the data sites.
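Varying the data-site latency amounts to adjusting the netem delay on the router VM described in the Network Configuration section. A minimal sketch follows; interface names are examples, and the one-way egress delays on the two data-site interfaces add up to the quoted round-trip value:

# 2 ms RTT test point: 1 ms of egress delay on each data-site interface
tc qdisc change dev eth1 root netem delay 1ms
tc qdisc change dev eth2 root netem delay 1ms
# 4 ms RTT test point: raise each delay to 2 ms
tc qdisc change dev eth1 root netem delay 2ms
tc qdisc change dev eth2 root netem delay 2ms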


Figure 25. Stretched Cluster Configuration with Oracle Database

 

Stretched Cluster Inter-Site Latency Test Results and Observations

During these tests, we observed the IO metrics on the database VMs. A vSAN Stretched Cluster without simulated WAN latency between the data sites was used as the baseline for performance comparison. With the increase in inter-site latency, we observed a small decrease in IOPS and an increase in IO latency. As shown in Figure 26, the IOPS dropped 6 percent and 7 percent with 2ms and 4ms inter-site latency respectively. We also observed increased IO latency, which was more visible for write IO than for read IO. By default, a vSAN Stretched Cluster reads locally within the site, but write operations must be mirrored at both sites before being acknowledged. This explains why the VM write latency increases as the inter-site latency increases. Read behavior in a stretched cluster is governed by the advanced setting “DOMOwnerForceWarmCache”, which defaults to ‘0’ (False), meaning reads stay local. This behavior is recommended for stretched clusters because it removes the additional overhead of reads traversing the inter-site link. For more information about this setting, see Read Locality in vSAN. vSAN supports inter-site latency of up to 5ms; the distance between data sites is primarily dictated by the sensitivity of the application to IO latency. For guidance on calculating the network bandwidth requirements between sites, see VMware vSAN Stretched Cluster Bandwidth Sizing Guidance.
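For reference, the read-locality setting can be inspected on an ESXi host as shown below; this is a read-only check, and leaving the default of 0 is recommended for stretched clusters:

# Inspect the vSAN read-locality setting (0 = read from the local site, the default)
esxcfg-advcfg -g /VSAN/DOMOwnerForceWarmCache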


Figure 26. Stretched Cluster IO Metrics under Different Inter-site Latency

 

vSAN Stretched Cluster Resiliency during Site Failure

Site Failure Test Overview

This test demonstrated one of the powerful features of the vSAN Stretched Cluster: maintaining data availability even under the impact of a complete site failure.

In this test, we used two Oracle database large VMs: VM1 at site A and VM2 at site B. We used SLOB to generate OLTP workload in both database VMs concurrently for a period of 60 minutes. After 20 minutes, site A was failed by powering off both ESXi hosts at the site as shown in Figure 27.


Figure 27. Stretched Cluster Site Failure

Site Failure Test Results and Continuous Data Availability

After site A went down, the database VM at site A (VM1) was temporarily offline before being restarted at site B, while the database VM at site B (VM2) continued processing transactions. The site outage did not affect data availability because a copy of all the data at site A existed at site B. The IOPS of the vSAN Stretched Cluster during this test is shown in Figure 28. At the beginning, there was no failure and the Oracle workload was running in both VMs. After the test had run for 20 minutes, the site A failure took VM1 offline temporarily. From the 20th to the 29th minute, the workload in the cluster came only from VM2 at site B, during which time VM1 was restarted at site B by vSphere HA. VMware vSphere Distributed Resource Scheduler™ (DRS) placed VM1 on the ESXi host at site B that did not have any workload. Subsequently, the SLOB workload was resumed in VM1 and both VMs continued with their workloads.

Figure 29 shows the average IOPS on the cluster before (from 0 to the 20th minute) and after (from the 30th to the 60th minute) the site failure while the OLTP workload was running. As shown in Figure 28, IOPS decreased and read latency increased after the site failure because fewer vSAN disk groups were available due to the two failed hosts at site A. However, the write latency improved, because data was no longer being mirrored across sites during write operations, at the cost of reduced protection.

After the test was completed, site A was brought back by powering on both ESXi hosts. The vSAN Stretched Cluster then started resynchronizing site A with the components that had changed at site B after the failure. The results demonstrate how vSAN Stretched Cluster provides storage high availability during a site failure by automating the failover and failback process, leveraging vSphere HA and vSphere DRS. This proves vSAN Stretched Cluster's ability to survive a complete site failure in an Oracle database environment.

Furthermore, in an Oracle Real Application Clusters (RAC) environment, near-zero RPO and RTO can be provided for the application by combining the application-level high availability of Oracle RAC with the storage-based availability of vSAN Stretched Cluster.

For information about this Oracle Extended RAC on vSAN Stretched Cluster solution, see Oracle Real Application Clusters on VMware vSAN.   


Figure 28. IOPS of the Stretched Cluster Site Failure Test        


Figure 29. IO Metrics Comparison before and after Site Failure in the Stretched Cluster

Best Practices for Site Failure and Recovery

Performance after a site failure depends on adequate resources, such as CPU and memory, being available on the surviving site to accommodate the virtual machines that are restarted there by vSphere HA.

In the event of a site failure and subsequent recovery, vSAN waits for some additional time for all hosts at the failed site to become ready before it starts to synchronize components. This avoids repeatedly resynchronizing a large amount of data across the sites. Therefore, instead of bringing up the failed vSAN hosts in a staggered fashion, it is recommended to bring all hosts online at approximately the same time. After the site is recovered, it is also recommended to wait for the recovery traffic to complete before migrating virtual machines back to the recovered site. For this reason, consider changing the vSphere DRS policy from fully automated to partially automated in the event of a site failure.

Oracle Database 12c on vSAN Centralized Management

Test Overview and Setup

We used vRealize Operations Manager to monitor vSAN Cluster and an Oracle database VM. vRealize Operations Management Pack for Storage Devices and Blue Medora Management Pack for Oracle Database were installed to provide insights into vSAN and Oracle database.

  • vRealize Operations Management Pack for Storage Devices: provides visibility into vSAN storage environment for performance, capacity, and health monitoring. Predefined dashboards allow you to follow the path from a virtual machine to vSAN and identify any problems that might exist along that path.
  • Blue Medora Management Pack for Oracle Database: extends VMware vRealize Operations by enabling customers to gain comprehensive visibility and insights into the performance, capacity, and health of their Oracle database workloads. With the Management Pack for Oracle Database, administrators are able to manage, monitor, and troubleshoot their entire Oracle database environment within a single console.

Leveraging these management packs extends the capability of vRealize Operations Manager to provide an end-to-end view of the solution, to take informed decisions, and to avoid costly application downtime.

To generate performance alerts in the vRealize Operations Manager, we ran an OLTP workload on an Oracle database using SLOB for two hours.

Test Results and Observations

This section shows some of the ready-to-use dashboards available for health, performance monitoring, and troubleshooting for vSAN and Oracle database. We recorded the dashboard views after OLTP workload was run on an Oracle database backed by an All-Flash vSAN Cluster.

Centralized Dashboard View

This view provides global visibility across vSAN Clusters for monitoring, with proactive alerts and notifications on an ongoing basis. As shown in Figure 30, it displays the alerts generated from vSAN and Oracle Database 12c. During a heavy OLTP workload, notice the warning alerts from Oracle for “High executions” and “High Redo Generated”.


Figure 30. Dashboard View Showing the Health and Performance Alerts in a Centralized Pane

IO Metrics at VM and Oracle Database Level with OLTP Workload

vRealize Operations Manager with the Management Pack for Storage Devices provides IO metrics at the VM level and at the VMDK (virtual disk) level. The Blue Medora Management Pack for Oracle Database provides workload statistics at the database level, equivalent to what DBAs read in AWR reports.

This helps with collecting performance and capacity reports and with troubleshooting at both the storage and database levels.

Figure 31 shows the VM level IO metrics during the two-hour OLTP run. During the same period, Figure 32 shows the database-level IOPS. The average IOPS was 60 thousand at the database level and VM level during this period of time. This end-to-end view and correlation can help identify key trends and troubleshoot bottlenecks.


Figure 31. VM Level Read and Write IOPS


Figure 32. Oracle Database Read and Write IOPS

Figure 33 illustrates the virtual machine level IO latency during the OLTP workload. The average write latency was 8ms and the average read latency was 1.5ms.


Figure 33. Virtual Machine Level IO Latency

Best Practices of Oracle Database on All-Flash vSAN

This section highlights the best practices to be followed for Oracle Database on All-Flash vSAN.

Best Practices of Oracle Database on All-Flash vSAN

A well designed and deployed vSAN is key to a successful implementation of mission-critical Oracle databases. The focus of this reference architecture is vSAN best practices for Oracle database. For information about setting up Oracle database on VMware vSphere, refer to the Oracle Databases on VMware Best Practices Guide along with the vSphere Performance Best Practices Guide for the specific version of vSphere.

vSAN All-Flash Configuration Guidelines

The vSAN 6.2 Design and Sizing Guide provides a comprehensive set of guidelines for designing vSAN. A few key guidelines relevant to Oracle database are provided below:

  • vSAN is distributed object-store datastore formed from locally attached devices from the ESXi host. It uses disk groups to pool together flash devices as single management constructs. Therefore, it is recommended to use similarly configured and sized ESXi hosts for vSAN Cluster to avoid imbalance. For scale-ups, consider an initial deployment with enough cache tier to accommodate future requirements. For future capacity addition, create disk groups with similar configuration and sizing. This ensures a balance of virtual machine storage components across the cluster of disks and hosts.
  • Design for availability. Depending on the Protection (FTT) setting, design with additional host and capacity that enable the cluster to be automatically recovered in the event of a failure and to be able to maintain a desired level of performance.
  • vSAN SPBM provides storage policy management at the virtual machine object level. Leverage it to enable specific features such as checksum, erasure coding, and IOPS limits (QoS) only for the objects that require them.
  • For very latency-sensitive applications, use RAID 1 (mirroring) for the data and redo disks. Otherwise, use RAID 5 (erasure coding) for the data disks and RAID 1 for the redo disks to balance space efficiency and performance. Erasure coding can be applied independently to different virtual machine objects, which provides simplicity and flexibility when configuring database workloads.
  • Deduplication and compression in vSAN provide space efficiency and can be used when application-level compression is not in use. The space saving from deduplication and compression is specific to the application workload and data set composition. Because the deduplication domain is the disk group, a smaller number of large disk groups typically yields a higher overall deduplication ratio than a larger number of small disk groups.
  • Increasing the stripe width may improve IO performance because objects are spread across more vSAN disk groups and disks. However, in a solution like this one where we recommend multiple VMDKs for the database, the database is already spread across the vSAN Cluster even with the default stripe width of 1 per VMDK, essentially achieving the same objective as a larger stripe width on a single large VMDK holding the entire database; Oracle ASM also stripes across the VMDKs inside the guest. Therefore, increasing the vSAN stripe width might not provide tangible benefits, and we recommend keeping the default stripe width of 1 unless performance issues are observed during destaging.
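
The following is a minimal sketch, run on each ESXi host (for example, over SSH), of how to review vSAN cluster membership and the disk-group layout so hosts and disk groups can be confirmed to be configured consistently. The commands are standard esxcli vsan namespaces; the comments describe the intended, assumed usage.

# Minimal sketch (run on each ESXi host, for example over SSH):
# confirm that hosts and disk groups are configured consistently.

# Show this host's vSAN cluster membership and state.
esxcli vsan cluster get

# List the devices claimed by vSAN on this host, including the disk group
# each device belongs to and whether it serves the cache or capacity tier.
esxcli vsan storage list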

Conclusion

This section provides a summary of how the reference architecture validates vSAN as an HCI platform capable of delivering scalability and high performance to Oracle database environments.

Conclusion

vSAN is a cost-effective and high-performance HCI platform that is rapidly deployed, easy to manage, and fully integrated into the industry-leading VMware vSphere platform.

In this solution, we ran an extremely heavy Oracle workload with space efficiency and data integrity features enabled and demonstrated that vSAN provided excellent performance with minor resource overhead, while significantly lowering the TCO of the solution. The mixed-workload test results validated vSAN as a viable platform for running OLTP and DSS Oracle workloads together.

We also verified vSAN’s resiliency and high availability capabilities under heavy workloads, which shows that vSAN is a strong platform for running business-critical Oracle databases.

In summary, this reference architecture validates vSAN as an HCI platform capable of delivering scalability, resiliency, availability, and high performance to Oracle database environments.

Appendix A SLOB Configuration

This section provides information about the SLOB configuration files we used in our testing.

Appendix A SLOB Configuration

The following are the SLOB configuration files we used in our testing. The medium and heavy workload profiles differ mainly in think time (THINK_TM_FREQUENCY=5 versus 0) and in the number of user sessions used to drive the workload (48 versus 128):

SLOB Medium Workload

UPDATE_PCT=25
RUN_TIME=3600
WORK_LOOP=0
SCALE=210000
WORK_UNIT=64
REDO_STRESS=LITE
LOAD_PARALLEL_DEGREE=4
THREADS_PER_SCHEMA=1
# Settings for SQL*Net connectivity:
#ADMIN_SQLNET_SERVICE=slob
#SQLNET_SERVICE_BASE=slob
#SQLNET_SERVICE_MAX=2
#SYSDBA_PASSWD=change_on_install
#########################
#### Advanced settings:
#
# The following are Hot Spot related parameters.
# By default Hot Spot functionality is disabled (DO_HOTSPOT=FALSE).
#
DO_HOTSPOT=FALSE
HOTSPOT_MB=8
HOTSPOT_OFFSET_MB=16
HOTSPOT_FREQUENCY=3
#
# The following controls operations on Hot Schema
# Default Value: 0. Default setting disables Hot Schema
#
HOT_SCHEMA_FREQUENCY=0
# The following parameters control think time between SLOB
# operations (SQL Executions).
# Setting the frequency to 0 disables think time.
#
THINK_TM_FREQUENCY=5
THINK_TM_MIN=.1
THINK_TM_MAX=.5
#########################

The following is the command we used to start SLOB workload with 48 users:

/home/oracle/SLOB/runit.sh 48
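
As an illustration only (this wrapper is an assumption on our part and not part of the original test harness; only the runit.sh path and session count come from this document), the run can be started in the background so it survives an SSH disconnect and keeps a timestamped log:

# Illustrative wrapper (assumption, not part of the original test setup):
# start the 48-session SLOB run in the background with a timestamped log.
cd /home/oracle/SLOB
nohup ./runit.sh 48 > runit_48_$(date +%Y%m%d_%H%M%S).log 2>&1 &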

SLOB Heavy Workload

UPDATE_PCT=25
RUN_TIME=3600
WORK_LOOP=0
SCALE=210000
WORK_UNIT=64
REDO_STRESS=LITE
LOAD_PARALLEL_DEGREE=4
THREADS_PER_SCHEMA=1
# Settings for SQL*Net connectivity:
#ADMIN_SQLNET_SERVICE=slob
#SQLNET_SERVICE_BASE=slob
#SQLNET_SERVICE_MAX=2
#SYSDBA_PASSWD=change_on_install
#########################
#### Advanced settings:
#
# The following are Hot Spot related parameters.
# By default Hot Spot functionality is disabled (DO_HOTSPOT=FALSE).
#
DO_HOTSPOT=FALSE
HOTSPOT_MB=8
HOTSPOT_OFFSET_MB=16
HOTSPOT_FREQUENCY=3
#
# The following controls operations on Hot Schema
# Default Value: 0. Default setting disables Hot Schema
#
HOT_SCHEMA_FREQUENCY=0
# The following parameters control think time between SLOB
# operations (SQL Executions).
# Setting the frequency to 0 disables think time.
#
THINK_TM_FREQUENCY=0
THINK_TM_MIN=.1
THINK_TM_MAX=.5
#########################
export UPDATE_PCT RUN_TIME WORK_LOOP SCALE WORK_UNIT LOAD_PARALLEL_DEGREE REDO_STRESS
export DO_HOTSPOT HOTSPOT_MB HOTSPOT_OFFSET_MB HOTSPOT_FREQUENCY HOT_SCHEMA_FREQUENCY THINK_TM_FREQUENCY THINK_TM_MIN THINK_TM_MAX

The following is the command we used to start SLOB workload with 128 users:

/home/oracle/SLOB/runit.sh 128

About the Author and Contributors

This section provides a brief background on the author and contributors of this document.

About the Author and Contributors

  • Palanivenkatesan Murugan, Solution Architect, works in the Product Enablement team of the Storage and Availability Business Unit. Palani specializes in solution design and implementation for business-critical applications on VMware vSAN. He has more than 11 years of experience in enterprise storage solution design and implementation for mission-critical workloads. Palani has worked with large system and storage product organizations, where he delivered storage availability and performance assessments, complex data migrations across storage platforms, proofs of concept, and performance benchmarking.
  • Sudhir Balasubramanian, Staff Solution Architect, works in the Global Field and Partner Readiness team. Sudhir specializes in the virtualization of Oracle business-critical applications. Sudhir has more than 20 years of experience in IT infrastructure and databases, working as the Principal Oracle DBA and Architect for large enterprises focusing on Oracle, EMC storage, and Unix/Linux technologies. Sudhir holds a Master’s degree in Computer Science from San Diego State University. Sudhir is one of the authors of the “Virtualize Oracle Business Critical Databases” book, which is a comprehensive authority for Oracle DBAs on the subject of Oracle and Linux on vSphere. Sudhir is a VMware vExpert.
  • Catherine Xu, Technical Writer in the Product Enablement team, edited this paper to ensure that the contents conform to the VMware writing style.

 
