Running Microsoft SQL Server Big Data Clusters on VMware Tanzu Kubernetes Grid

Executive Summary

Note: This reference architecture provides general design and deployment guidelines for running Microsoft SQL Server Big Data Clusters on VMware Tanzu™ Kubernetes Grid™ on Dell EMC VxRail. The reference architecture also applies to any compatible hardware platform running VMware Tanzu Kubernetes Grid on vSAN™.

Business Case

Today's distributed systems are built from multiple microservices, usually running across a large number of Kubernetes Pods and VMs. Traditional applications, even stateful ones like databases, are being containerized on Kubernetes to embrace the modern architecture for fast deployment and simplified management. Microsoft SQL Server Big Data Clusters is the latest big data platform based on Kubernetes. It provides the flexibility to query external data sources, store big data in Hadoop Distributed File System (HDFS) managed by SQL Server, and use data for Artificial Intelligence/Machine Learning through a unified administration portal with a consistent experience.

VMware Tanzu Kubernetes Grid is a multi-cloud Kubernetes footprint that is tested, signed, and supported by VMware. It includes signed and supported versions of open-source applications to provide the registry, networking, monitoring, authentication, ingress control, and logging services that a production Kubernetes environment requires. Running SQL Server Big Data Clusters on Tanzu brings high-value relational data and high-volume big data together on a unified, scalable data platform managed by the VMware Tanzu portfolio.

Dell EMC VxRail™, powered by Dell EMC PowerEdge server platforms and VxRail HCI System Software, features next-generation technology to future proof your infrastructure and enables deep integration across the VMware ecosystem. Advanced VMware hybrid cloud integration and automation simplifies the deployment of a secure VxRail cloud infrastructure.

In this solution, we provide deployment procedures, design and sizing guidance, scalability performance validation, and best practices for enterprise infrastructure administrators and application owners to run SQL Server Big Data Clusters on Tanzu Kubernetes Grid powered by VxRail.

Business Values

Here are the top 5 benefits for deploying SQL Server Big Data Clusters on Tanzu Kubernetes Grid:

  • Simplified installation of Kubernetes: Tanzu Kubernetes Grid is engineered to include the tools and open source technologies needed to deploy and consistently operate a scalable Kubernetes environment wherever you need it to run—in your data center and VMware private cloud, in the public cloud, at the edge, or across multiple clouds.
  • Automated multi-cluster operations: With declarative, multi-cluster lifecycle management, a CLI tool, and streamlined upgrades and patching, Tanzu Kubernetes Grid helps you more easily manage large-scale, multi-cluster Kubernetes deployments and automate manual tasks to reduce business risk and focus on more strategic work.
  • Integrated platform services: Tanzu Kubernetes Grid streamlines the deployment of local and in-cluster services to simplify the configuration of container image registry policies, monitoring, logging, ingress, networking, and storage, and ready your Kubernetes environment for production workloads.
  • Open source alignment: Run your containerized applications on an upstream-aligned Kubernetes distribution and key open-source technologies like Cluster API, Fluentbit, and Contour, so that you can enable portability and benefit from the support and innovation of the global Kubernetes community.
  • Production Ready: Tanzu Kubernetes Grid is tuned for running production workloads. A developer can run SQL Server Big Data Clusters workloads without the need to perform any additional configuration.

Key Results

This reference architecture showcases Tanzu Kubernetes Grid on VxRail for operating and managing SQL Server Big Data Clusters in a fully integrated SDDC environment. The key results can be summarized as follows:

  • Tanzu Kubernetes Grid on VxRail simplifies and accelerates the necessary Kubernetes infrastructure deployment desired for SQL Server Big Data Clusters and provides streamlined development, agile operations, and self-service lifecycle management.
  • Tanzu Kubernetes Grid is upstream Kubernetes compatible for SQL Server Big Data Cluster deployment with both flexibility and resiliency.
  • On a 4-node VxRail cluster with Intel® Optane™ NVMe as the cache tier, a SQL Server Big Data Cluster deployed on Tanzu Kubernetes Grid can service complex TPC-DS-like decision support workloads with predictable query completion times and compelling throughput. The validation results demonstrated linearly scalable performance on the Spark SQL TPC-DS-like benchmark across different dataset scale factors.

Note: The performance results in this solution were validated on the HCI platform of Tanzu Kubernetes Grid on VxRail and also apply to general VMware vSAN deployments with similar configurations.

Audience

This solution is intended for IT administrators, DevOps engineers, SQL Server DBAs, and storage experts involved in the early phases of planning, design, and deployment of SQL Server Big Data Clusters on Tanzu Kubernetes Grid on VxRail. It is assumed that the reader is familiar with the concepts and operations of SQL Server Big Data Cluster, Tanzu Kubernetes Grid related components, and VxRail.

Technology Overview

Solution technology components are listed below:

  • VMware Tanzu Kubernetes Grid
  • Dell EMC VxRail
    • VxRail HCI System Software
  • Microsoft SQL Server Big Data Clusters

VMware Tanzu Kubernetes Grid

VMware Tanzu Kubernetes Grid is a multi-cloud Kubernetes footprint that you can run both on-premises in VMware vSphere® and in the public cloud on Amazon EC2 and Microsoft Azure. In addition to Kubernetes binaries that are tested, signed, and supported by VMware, Tanzu Kubernetes Grid includes signed and supported versions of open source applications to provide the registry, networking, monitoring, authentication, ingress control, and logging services that a production Kubernetes environment requires. If you are using Tanzu Kubernetes Grid, see the VMware Tanzu Kubernetes Grid Documentation.

Dell EMC VxRail

The only fully integrated, pre-configured, and pre-tested VMware hyperconverged integrated system optimized for VMware vSAN and VMware Cloud Foundation™, VxRail transforms HCI and simplifies VMware cloud adoption while meeting any HCI use case—including support for many of the most demanding workloads and applications. Powered by Dell EMC PowerEdge server platforms and VxRail HCI System Software, VxRail features next-generation technology to future proof your infrastructure and enables deep integration across the VMware ecosystem. The advanced VMware hybrid cloud integration and automation simplifies the deployment of a secure VxRail cloud infrastructure.

VxRail HCI System Software

VxRail HCI system software is integrated software that delivers a seamless and automated operational experience, offering 100% native integration between VxRail Manager and vCenter®. Intelligent lifecycle management automates non-disruptive upgrades, patching, and node addition or retirement while keeping VxRail infrastructure in a continuously validated state to ensure that workloads are always available. The HCI System Software includes SaaS multi-cluster management and orchestration for centralized data collection and analytics that uses machine learning and AI to help customers keep their HCI stack operating at peak performance and ready for future workloads. IT teams can benefit from the actionable insights to optimize infrastructure performance, improve serviceability, and foster operational freedom.

Microsoft SQL Server Big Data Clusters

SQL Server Big Data Clusters is a data platform that helps deploy scalable clusters of SQL Server, Apache Spark, and Hadoop HDFS containers running on Kubernetes. It lets you read, write, and process big data from Transact-SQL or Spark, and provides the flexibility to query data from external SQL Server, Oracle, Teradata, MongoDB, and ODBC data sources through external tables. It also simplifies the process of combining and analyzing high-value relational data with high-volume big data.

SQL Server Big Data Clusters contains the following key components as described in Figure 1.


Figure 1. SQL Server Big Data Clusters Components

  • Controller: Provides management and security for the cluster. It contains the control service, the configuration store, and other cluster-level services such as Kibana, Grafana, and Elasticsearch.
  • Compute pool: Provides computational resources to the cluster. It contains nodes running SQL Server on Linux Pods. The pods in the compute pool are divided into SQL Compute instances for specific processing tasks.
  • Data pool: Used for data persistence and caching. The data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs. SQL Server Big Data Clusters data marts are persisted in the data pool.
  • Storage pool: Consists of storage pool pods, including SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server Big Data Cluster are members of an HDFS cluster.

Solution Configuration

This section introduces the resources and configurations:

  • Architecture diagram
  • Hardware resources
  • Software resources
  • Virtual machine configuration
  • Deploying Tanzu Kubernetes Grid Management Cluster
  • Deploying Tanzu Kubernetes Cluster
  • Deploying SQL Server Big Data Cluster

Architecture Diagram

In this solution, we deployed a SQL Server Big Data Clusters test environment using Tanzu Kubernetes Grid on a 4-node VxRail P570F cluster. There are two supported options to deploy a Tanzu Kubernetes Cluster for Big Data Cluster workloads. In both cases, the Ubuntu 20.04 OVA is required as the Tanzu Kubernetes Grid base image.

  • Tanzu Kubernetes Grid (TKGm) on vSphere: This option creates the Tanzu management cluster with Tanzu CLI if vSphere with Tanzu is not enabled.
  • Tanzu Kubernetes Grid Service (TKGS) for vSphere with Tanzu: This option uses the Supervisor Cluster as the management cluster for Tanzu Kubernetes Grid if vSphere with Tanzu is enabled.

Figure 2 shows the architecture used in this solution to run a SQL Server Big Data Cluster on Tanzu Kubernetes Grid.


 Figure 2. Architectural Diagram

Hardware Resources

Table 1. Hardware Configuration

PROPERTY | SPECIFICATION
Server model name | 4 x VxRail P570F
CPU | 2 x Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz, 28 cores each
RAM | 512GB
Network adapter | 2 x Broadcom BCM57414 NetXtreme-E 25Gb RDMA Ethernet Controller
Storage adapter | 1 x Dell HBA330 Adapter; 2 x Dell Express Flash NVMe ColdStream P4800x 375GB PCIe U.2 SSD Controller
Disks | Cache - 2 x 375GB Intel Optane P4800X Series NVMe Disks; Capacity - 8 x 3.84TB Read Intensive SAS SSDs

Software Resources

Table 2 shows the software resources used in this solution.

Table 2. Software Resources

Software | Version | Purpose
VMware vSphere | 7.0 Update 2 | Virtualization platform that provides the capability to run Kubernetes workloads directly on ESXi hosts and create upstream Kubernetes clusters within the dedicated resource pools.
Dell EMC VxRail | 7.0.200 | Turnkey Hyperconverged Infrastructure for hybrid cloud.
VMware Tanzu CLI | 1.3.1 | Command line interface that allows deploying CNCF conformant Kubernetes clusters to vSphere and other cloud infrastructure.
Kubectl cluster CLI | v1.20.5+vmware.1 | Kubernetes command line interface that is compatible with Tanzu CLI.
Kubernetes OVA for Tanzu Kubernetes Grid | Ubuntu 20.04 Kubernetes v1.20.5 OVA | A base image template for the Kubernetes Operating System of Tanzu Kubernetes Grid management and workload clusters.
SQL Server Big Data Clusters | 2019-CU12-ubuntu-20.04 | A SQL Server cluster of Linux containers orchestrated by Kubernetes.
Azure Data Studio | 1.31.1 | Cross-platform graphical tool for querying SQL Server.
Azure Data CLI | 20.3.7 | Command-line tool for installing and managing a big data cluster.
Databricks Spark SQL Perf benchmark | N/A | Performance testing framework of TPC-DS Benchmark for Spark SQL.

Virtual Machine Configuration

The virtual machine configuration in this solution is described in Table 3.

Table 3. Management Domain Virtual Machine Configuration

VM Role | vCPU | Memory (GB) | VM Count
VxRail Manager | 4 | 16 | 1
vCenter Server | 2 | 12 | 1
Tanzu Kubernetes Grid management cluster - control plane VM | 2 | 8 | 1
Tanzu Kubernetes Grid management cluster - worker node VM | 2 | 8 | 1
Tanzu Kubernetes Cluster (workload cluster) - control plane VM | 4 | 16 | 1
Tanzu Kubernetes Cluster (workload cluster) - worker node VM | 56 | 384 | 4

Deploying Tanzu Kubernetes Grid Management Cluster

A Tanzu Kubernetes Grid management cluster for vSphere 7 supports two deployment options, as described in the Architecture Diagram section: creating the management cluster with the Tanzu CLI when vSphere with Tanzu is not enabled, or using the Supervisor Cluster as the management cluster when vSphere with Tanzu is enabled.

In this solution, we adopted the first option and deployed the management cluster with the Tanzu CLI, which can be driven either by an installer UI or by a configuration file. For a first deployment, we highly recommend using the installer interface. Refer to Deploy Management Clusters with the Installer Interface for more details. Figure 3 shows the installer UI after it has successfully deployed a management cluster.


Figure 3. Deploy Tanzu Kubernetes Grid Management Cluster with installer UI

Deploying Tanzu Kubernetes Cluster

After you deploy a management cluster to vSphere or you connect the Tanzu CLI to a vSphere with Tanzu Supervisor Cluster, use the Tanzu CLI to deploy Tanzu Kubernetes clusters. The Tanzu Kubernetes Cluster is deployed through a configuration file with customized variables for cluster settings and scalability. Refer to Deploy Tanzu Kubernetes Clusters to vSphere for more details.

To support SQL Server Big Data Cluster workloads, it is important to configure the Tanzu Kubernetes Cluster worker node variables in the configuration file:

  • VSPHERE_WORKER_NUM_CPUS / VSPHERE_WORKER_MEM_MIB: A minimum of 8 vCPUs and 64GB of memory is required.
  • VSPHERE_WORKER_DISK_GIB: A minimum value of 100GB is required for pulling all big data cluster Docker images.

We deployed four Tanzu Kubernetes Cluster worker nodes with 56 vCPUs and 384GB of memory each to support the TPC-DS-like scalability performance validation. Refer to Tanzu CLI Configuration File Variable Reference for more details about configuration file variables.
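The minimums above can be checked before a cluster is created. The following is a minimal sketch in Python, not part of the Tanzu CLI: the function name check_worker_sizing is hypothetical, the configuration is assumed to be already loaded into a dictionary of strings, and 64GB is treated as 64 GiB (65,536 MiB) for the comparison.

```python
# Minimums quoted in the text above, expressed in the units the variables use.
MIN_WORKER_CPUS = 8
MIN_WORKER_MEM_MIB = 64 * 1024  # 64GB, treated here as 64 GiB (assumption)
MIN_WORKER_DISK_GIB = 100


def check_worker_sizing(config):
    """Return a list of problems; an empty list means the sizing passes."""
    minimums = {
        "VSPHERE_WORKER_NUM_CPUS": MIN_WORKER_CPUS,
        "VSPHERE_WORKER_MEM_MIB": MIN_WORKER_MEM_MIB,
        "VSPHERE_WORKER_DISK_GIB": MIN_WORKER_DISK_GIB,
    }
    problems = []
    for name, minimum in minimums.items():
        if name not in config:
            problems.append(f"{name} is not set")
        elif int(config[name]) < minimum:
            problems.append(f"{name}={config[name]} is below the minimum of {minimum}")
    return problems


# Worker sizing used in this solution: 56 vCPU and 384GB of memory per node.
# The disk value is a hypothetical example; the text only states the 100GB minimum.
validated = {
    "VSPHERE_WORKER_NUM_CPUS": "56",
    "VSPHERE_WORKER_MEM_MIB": str(384 * 1024),
    "VSPHERE_WORKER_DISK_GIB": "100",
}
print(check_worker_sizing(validated))
```

A check like this is only a convenience; the Tanzu CLI Configuration File Variable Reference remains the authority on variable names and units.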

Once the deployment is successful, you can connect to the Tanzu Kubernetes Cluster by using the kubectl command and start deploying the workloads. We created a sample storage class for SQL Server Big Data Clusters with the default vSAN storage policy as shown below:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: sql-bdc-storage-class
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "vSAN Default Storage Policy"

Deploying SQL Server Big Data Cluster

SQL Server Big Data Clusters can be deployed through Azure Data CLI with a configuration script. We recommend generating the configuration script with Azure Data Studio. In this solution, we used the “kubeadm-dev-test” profile with customized values to deploy a SQL Server Big Data Cluster. The sample deployment profile used for the scalability performance validation is shown in Table 4. We separated the Spark pool from the storage pool and used the sql-bdc-storage-class storage class, which maps to the vSAN default storage policy with FTT=1 - RAID 1 Mirroring.

Table 4. Sample SQL Server Big Data Clusters Deployment Profile

SQL Server Big Data Cluster Role | Count | Storage Class | Claim size for data (GB) | Claim size for log (GB)
SQL Server master instances | 1 | sql-bdc-storage-class | 2000 | 200
Compute pool instances | 1 | sql-bdc-storage-class | 2000 | 200
Data pool instances | 2 | sql-bdc-storage-class | 2000 | 200
Spark pool instances | 4 | sql-bdc-storage-class | 5000 | 200
Storage pool (HDFS) instances | 4 | sql-bdc-storage-class | 10000 | 200
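The claim sizes in Table 4 add up quickly once instance counts and mirroring are taken into account. The sketch below, written in Python purely for the arithmetic, totals the claims from Table 4, doubles them for the two copies kept by FTT=1 RAID-1 mirroring, and compares the result with the raw capacity tier from Table 1. It assumes 1TB = 1000GB and ignores vSAN metadata and witness overhead; the function names are illustrative.

```python
# (role, instance count, data claim GB, log claim GB) copied from Table 4.
PROFILE = [
    ("SQL Server master", 1, 2000, 200),
    ("Compute pool", 1, 2000, 200),
    ("Data pool", 2, 2000, 200),
    ("Spark pool", 4, 5000, 200),
    ("Storage pool (HDFS)", 4, 10000, 200),
]


def total_claim_gb(profile):
    """Sum of data and log claims across all instances of every role."""
    return sum(count * (data + log) for _, count, data, log in profile)


def mirrored_footprint_gb(claim_gb, copies=2):
    """Upper bound on consumed capacity when each object is stored `copies` times."""
    return claim_gb * copies


# Raw capacity tier from Table 1: 4 nodes x 8 disks x 3.84TB (3840GB) each.
raw_capacity_gb = 4 * 8 * 3840

claimed = total_claim_gb(PROFILE)
print(claimed, mirrored_footprint_gb(claimed), raw_capacity_gb)
```

With these inputs the claims total 70,400GB, the mirrored upper bound is 140,800GB, and the raw capacity tier is 122,880GB. The upper bound exceeds raw capacity, so the profile only fits if volumes are not fully written or fully reserved up front. The source does not state the space reservation setting, so treat that reading as an assumption to confirm against the storage policy in use.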

We created a SQL Server Big Data Cluster with the Azure Data CLI command shown below:

# azdata bdc create --accept-eula yes --config-profile path-to-script

It might take some time to deploy the SQL Server Big Data Cluster depending on the network bandwidth to fetch the image from the Microsoft Registry and the deployment profile. After the deployment is successful, you can obtain the service endpoint addresses of the SQL Server Big Data Cluster as shown in Figure 4.


Figure 4. SQL Server Big Data Cluster Service Endpoints

Solution Validation

Test Overview

We created the SQL Server Big Data Cluster using the deployment profile as described in Appendix A. In the scalability performance test, we simulated a Spark SQL TPC-DS-like benchmark workload to demonstrate the performance scalability of the SQL Server Big Data Cluster deployed on the Tanzu Kubernetes Grid.

Workload Generation and Benchmark Tools

Spark SQL TPC-DS-like benchmark

In this solution, we adopted a Spark SQL TPC-DS-like benchmark to simulate a database system that models several generally applicable aspects of a decision support system, including queries and data maintenance. The benchmark is representative of decision support systems that run on big data solutions, such as RDBMS and Hadoop/Spark-based systems, and that execute queries of varying operational requirements and complexity under high CPU and I/O load. The benchmark result is typically measured by query response time and query throughput.

Spark SQL performance test tool

The Spark SQL performance tool is a testing framework for Spark SQL and Apache Spark. We used this tool to generate datasets with various scale factors to validate the scalability of SQL Server Big Data Clusters on Tanzu Kubernetes Grid. We customized the tool to generate a TPC-DS-like dataset in HDFS in the storage pool and to execute the TPC-DS-like queries within the Spark pool of SQL Server Big Data Clusters.

Monitoring tool - vSAN Performance Service

vSAN Performance Service is used to monitor the performance of the vSAN environment using the vSphere web client. The performance service collects and analyzes performance statistics and displays the data in a graphical format. You can use the performance charts to manage your workload and determine the root cause of problems.

vSAN Trim/Unmap

We enabled vSAN Trim/Unmap so that previously allocated storage can be reclaimed as free space. During both data generation and TPC-DS-like workload testing, temporary data is generated on the Spark nodes. This temporary data is deleted automatically only at the operating system level after query execution completes. With vSAN Trim/Unmap enabled, we issued the fstrim command from the Ubuntu worker nodes to reclaim vSAN storage space during the 30TB scale factor validation. For more details about the vSAN Trim/Unmap feature, refer to UNMAP/TRIM Space Reclamation on vSAN.

Spark Parameter Settings

The Spark parameter settings are the key performance factor for running Spark SQL TPC-DS-like benchmark on SQL Server Big Data Clusters. To view a complete list of supported and unsupported parameters, see the Apache Spark & Apache Hadoop (HDFS) configuration properties.

We optimized the Spark parameters listed in Table 5 to improve the overall query performance. We tried different Spark executor resource allocations and found that 8 CPU cores per Spark executor instance gave the best performance. We also enabled cost-based optimization (CBO) with joinReorder, which yielded up to a 60x improvement for certain queries such as Query 72. Note that these Spark tuning recommendations apply only to the Spark SQL TPC-DS-like benchmark validation described in this solution.

Table 5. Spark Parameter Settings

Parameters and optimized value | Description
spark-defaults-conf.spark.driver.cores | Number of cores to use for the driver process, only in cluster mode.
spark-defaults-conf.spark.driver.memory | Amount of memory to use for the driver process.
spark-defaults-conf.spark.driver.memoryOverhead | Amount of non-heap memory to be allocated per driver process.
spark-defaults-conf.spark.driver.maxResultSize | Limit of the total size of serialized results of all partitions for each Spark action (such as collect) in bytes.
spark-defaults-conf.spark.executor.instances | Number of executors to launch for the application.
spark-defaults-conf.spark.executor.cores | The number of cores to use on each executor.
spark-defaults-conf.spark.executor.memory | Amount of memory to use per executor process.
spark-defaults-conf.spark.executor.memoryOverhead | The amount of off-heap memory to be allocated per executor.
spark-defaults-conf.spark.sql.cbo.enabled | Enables cost-based optimization (CBO) for estimation of plan statistics when set to true.
spark-defaults-conf.spark.sql.cbo.joinReorder.enabled | Enables join reorder in CBO.
spark-defaults-conf.spark.scheduler.listenerbus.eventqueue.capacity | The default capacity for event queues.
yarn-site.yarn.nodemanager.resource.memory-mb | Amount of physical memory, in MB, that can be allocated for containers.
yarn-site.yarn.nodemanager.resource.cpu-vcores | Number of CPU cores that can be allocated for containers.
yarn-site.yarn.scheduler.maximum-allocation-mb | The maximum allocation for every container request at the resource manager, in MB.
yarn-site.yarn.scheduler.maximum-allocation-vcores | The maximum allocation for every container request at the resource manager, in terms of virtual CPU cores.
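The executor and YARN parameters above interact: each executor occupies a YARN container sized at roughly spark.executor.memory plus spark.executor.memoryOverhead and spark.executor.cores vcores, so the number of executors a node can host is bounded by whichever resource runs out first. The following Python sketch illustrates that reasoning only. Apart from the 8 cores per executor reported above, every number in it is hypothetical and is not the tuned value from this solution, and the model ignores the driver container and scheduler rounding.

```python
def executors_per_node(node_vcores, node_mem_mb, executor_cores,
                       executor_mem_mb, executor_overhead_mb):
    """Estimate how many executors fit on one node manager.

    node_vcores / node_mem_mb stand in for yarn.nodemanager.resource.cpu-vcores
    and yarn.nodemanager.resource.memory-mb; the executor arguments stand in for
    spark.executor.cores, spark.executor.memory and spark.executor.memoryOverhead.
    """
    by_cpu = node_vcores // executor_cores
    by_mem = node_mem_mb // (executor_mem_mb + executor_overhead_mb)
    return min(by_cpu, by_mem)


# Hypothetical node manager: 48 vcores and 320 GB offered to containers.
# Hypothetical executor: 8 cores (from the text), 36 GB heap, 4 GB overhead.
fit = executors_per_node(48, 320 * 1024, 8, 36 * 1024, 4 * 1024)
print(fit)
```

In this hypothetical case the node is CPU-bound: six executors fit by vcores while eight would fit by memory. Running the same calculation with real values shows whether raising executor memory or cores reduces the executor count per node.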

Scalability Test Result

We set the TPC-DS-like dataset with the different scale factors and populated it directly onto the HDFS storage pool of the SQL Server Big Data Cluster. Table 6 shows the time consumed for data generation of different scale factor settings. The data generation time also includes the post data analysis process that calculates the table statistics.

Table 6. Scalability Test Results

Scale Factor | Dataset size | Data Generation time
1,000 | 1TB | 45 minutes
3,000 | 3TB | 2 hours and 15 minutes
10,000 | 10TB | 8 hours and 10 minutes
30,000 | 30TB | 18 hours and 30 minutes
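One way to read Table 6 is to convert each row into an effective generation rate. The Python sketch below does that arithmetic, assuming 1TB = 1000GB; the rate includes the post-generation statistics step because the reported times do.

```python
# (dataset size in GB, data generation time in minutes) from Table 6.
RESULTS = [
    (1000, 45),
    (3000, 2 * 60 + 15),
    (10000, 8 * 60 + 10),
    (30000, 18 * 60 + 30),
]


def gb_per_minute(size_gb, minutes):
    """Effective data generation rate for one scale factor."""
    return size_gb / minutes


for size_gb, minutes in RESULTS:
    rate = gb_per_minute(size_gb, minutes)
    print(f"{size_gb // 1000}TB: {rate:.1f} GB/min")
```

The rates work out to about 22.2, 22.2, 20.4, and 27.0 GB/min for the 1TB, 3TB, 10TB, and 30TB datasets. They stay within a narrow band as the dataset grows by a factor of 30, which is what near-linear scaling of generation time looks like.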

We then ran the TPC-DS-like benchmark to validate Spark SQL performance scalability with a total of 101 pre-defined user queries that are characterized by different user patterns. The query sets can be found here. We excluded three queries (q14a, q14b, and q64) from the results because of memory limitations at the largest scale factor (30TB). Figure 5 shows the performance scalability test result. The result demonstrated that running a SQL Server Big Data Cluster on Tanzu Kubernetes Grid scales linearly across dataset sizes, which provides consistent and predictable performance for Spark SQL TPC-DS-like workloads. The detailed test result for each query is listed in Appendix B.


Figure 5. 1TB/3TB/10TB/30TB Spark SQL TPC-DS-like Benchmark—SQL Server Big Data Cluster on Tanzu Kubernetes Grid (4 worker nodes)


Figure 6. VxRail Cluster CPU Utilization (30TB scale factor)

Figure 6 shows the VxRail cluster CPU utilization during the 30TB scalability test. The average cluster CPU utilization was around 40 percent while the peak CPU utilization was up to 90 percent.

Figure 7 and Figure 8 show the throughput results within vSAN performance service at the virtual machine level and the vSAN backend level, respectively. Depending on the query type, the throughput result may differ during the test execution. The peak read throughput reached 4.56 GB/s from the virtual machine level, and the peak write throughput was up to 5.26 GB/s at the vSAN backend level.


Figure 7. Throughput Results at the Virtual Machine Level


Figure 8. Throughput Result at the vSAN Backend Level

Best Practices

In this solution, we demonstrated the deployment procedures of running SQL Server Big Data Clusters on Tanzu Kubernetes Clusters managed by vSphere with Tanzu on VxRail and validated the Spark SQL performance with TPC-DS-like benchmark.

The following recommendations provide the best practices and sizing guidance to run SQL Server Big Data Clusters on vSphere with Tanzu on VxRail.

  • Tanzu Kubernetes Grid:
    • Use Ubuntu 20.04 image to create Tanzu Kubernetes Grid management and workload cluster that runs SQL Server Big Data Cluster.
    • Create the Tanzu Kubernetes Cluster with a minimum disk size of 100GB for each worker node so that the Big Data Cluster images can be pulled successfully.
    • Customize and pre-allocate enough CPU and memory resources for the Tanzu Kubernetes Cluster. Refer to Performance Best Practices for Kubernetes with VMware Tanzu for sizing guidance for Tanzu Kubernetes Grid.
  • vSAN Storage:
    • vSAN supports the dynamic persistent volume provisioning with Storage Policy Based Management (SPBM). Create a dedicated storage policy for persistent volumes desired for SQL Server Big Data Clusters.
    • Set Failures to Tolerate (FTT) to 1 failure - RAID-1 (Mirroring).
    • Enable vSAN Trim/Unmap to allow space reclamation for persistent volumes associated with Big Data Cluster Spark and Storage pool.
  • SQL Server Big Data Clusters:
    • Choose the appropriate deployment profile:
      • kubeadm-dev-test: Use this profile for a testing environment starting with the minimum requirements.
      • kubeadm-prod: Use this profile for SQL Server Big Data Clusters production environment with a high availability configuration.
    • Generate the customized deployment scripts within Azure Data Studio. You can also use this GUI portal to connect to SQL Server Big Data Clusters, manage and submit Spark jobs, and monitor running tasks.
    • Pre-allocate sufficient persistent volume capacity for each Big Data Cluster component, because post-deployment resizing is not supported.
    • Optimize the relevant configuration settings for Big Data Clusters to meet your Spark workload requirements. Refer to the Spark parameter settings section and the Spark Configuration page for more details. Starting with CU9, SQL Server Big Data Clusters support customizing settings in the pre-deployment scripts or after deployment.

Conclusion

Running SQL Server Big Data Clusters on Tanzu Kubernetes Grid on VxRail is a simple and fast way to get started with modernized big data workloads running on Kubernetes. It allows modern containerized workloads to run on the existing IT infrastructure and processes: data scientists innovate and build with the agility of Kubernetes, while IT administrators manage the secure workloads in their familiar vSphere environment.

In this solution, we deployed SQL Server Big Data Clusters on Tanzu Kubernetes Grid, which simplifies the operation of cloud native workloads and can scale without compromise. IT administrators can implement policies for namespaces and manage access and quota allocation for application-focused management. This helps build a developer-ready infrastructure on enterprise-grade Kubernetes with advanced governance, reliability, and security. We also validated SQL Server Big Data Clusters with the Spark SQL TPC-DS-like benchmark using optimized parameters. The test results showed that Tanzu Kubernetes Grid on VxRail provides linear scalability for complex TPC-DS-like decision support workloads consisting of different query types, with predictable query response times and high throughput.

About the Author

Mark Xu, Solutions Architect in the Solutions Architecture team of the Cloud Platform Business Unit (CPBU), wrote the original version of this paper.

The following reviewers also contributed to the paper contents:

 

Appendix A: Sample Deployment Profile for SQL Server Big Data Clusters on Tanzu Kubernetes Grid

 

bdc.json

control.json

 

Appendix B: Spark SQL TPC-DS like Benchmark Results

Figure 9. 1TB/3TB/10TB/30TB Spark SQL TPC-DS-like Benchmark Results (101 queries) for SQL Server Big Data Cluster on Tanzu Kubernetes Grid (4 worker nodes) on VxRail


 

 
