Splunk SmartStore on Cloudian and VMware

Business Case

Splunk software helps organizations uncover hidden value in ever-growing machine data. Splunk has emerged as one of the top solutions for helping organizations index and use machine data, and is listed as a leader in the Gartner Magic Quadrant for Security Information and Event Management (SIEM) for the eighth consecutive time. As data volumes increase and more decisions are made based on this data, retention periods are getting longer. As a result, organizations are looking for more cost-effective storage solutions to manage their growing Splunk data stores. To address this, Splunk introduced a feature called SmartStore, which offers enhanced storage management functionality. SmartStore allows moving warm data buckets to S3-compatible object stores. Moving this data off expensive indexer storage achieves the following benefits:

  • Decouple storage and compute layers
  • Elastically scale compute on-demand for search and indexing workloads
  • Grow storage independently to accommodate retention requirements
  • Cost savings with more flexible storage options

Cloudian’s enterprise-grade, fully native, S3-compatible object storage software integrated with the VMware vSAN™ Data Persistence platform (DPp) enables new efficiencies and savings by allowing enterprises to run both modern, cloud-native applications and traditional applications on a single, shared storage environment at any scale, on-premises and in the private cloud.

The solution extends Cloudian HyperStore’s simple-to-deploy and easy-to-manage, exabyte-scalable, highly secure, multi-tenant storage to any application while reducing the total cost of ownership (TCO). The integration supports Cloudian HyperStore with VMware Cloud Foundation™ with VMware Tanzu™. It combines the industry-leading VMware hyperconverged infrastructure (HCI) and Cloudian HyperStore into a single, shared-nothing data platform.

In this solution, Splunk SmartStore and Cloudian HyperStore with VMware Cloud Foundation with Tanzu let you decouple the compute and storage layers so you can independently scale those resources to serve workload demands best.

Audience

This solution is intended for IT administrators, Splunk architects, and virtualization and storage architects involved in planning, architecting, and administering a virtualized Splunk workload on VMware.

Technology Overview

Solution technology components are listed below:

  • VMware Cloud Foundation with Tanzu:
    • VMware vSphere®
    • VMware vSAN
    • VMware vSAN Data Persistence platform
  • Cloudian HyperStore Object Storage
  • Splunk Enterprise

VMware Cloud Foundation with Tanzu

VMware Cloud Foundation with Tanzu is the best way to deploy Kubernetes at scale. VMware Cloud Foundation is a ubiquitous hybrid cloud platform for traditional enterprise and modern apps, providing a complete set of secure software-defined services for compute, storage, network security, Kubernetes management, and cloud management. VMware Cloud Foundation with Tanzu automates full-stack deployment and operation of Kubernetes clusters through integration with VMware Tanzu Kubernetes Grid. This helps eliminate manual steps for configuring hosts, creating logical relationships, and managing hypervisors for faster deployment of applications at scale. The most significant feature added to the VMware Cloud Foundation architecture is the integration of Kubernetes directly into the vSphere hypervisor. This delivers an entirely new set of VMware Cloud Foundation Services: a Kubernetes and RESTful API surface that gives developers self-service access to Kubernetes clusters, vSphere Pods, virtual machines, persistent volumes, stateful services, and networking resources. The result is an agile, reliable, and efficient hybrid cloud platform that bridges the gap between app developers and IT administrators.

VMware vSphere

VMware vSphere is the next-generation infrastructure for next-generation applications, which provides a powerful, flexible, and secure foundation for business agility that accelerates the digital transformation to cloud computing and promotes success in the digital economy. VMware vSphere embeds containers and Kubernetes into vSphere, unifying them with virtual machines as first-class citizens. This enables all VI admins to become Kubernetes admins and easily deliver new services to their developers. VMware vSphere addresses key challenges faced by the IT admins in areas of lifecycle management, security, and performance and resiliency needed by business-critical applications, AI/ML applications, and latency-sensitive applications. With VMware vSphere, customers can run, manage, connect, and secure both traditional and cloud-native applications in a common operating environment, across clouds and devices.

VMware vSAN

VMware vSAN is the industry-leading software powering VMware’s software-defined storage and HCI solution. vSAN helps customers evolve their data center with reduced risk, control IT costs, and scale to tomorrow’s business needs. vSAN, native to the market-leading hypervisor, delivers flash-optimized, secure storage for all of your critical vSphere workloads, and is built on industry-standard x86 servers and components that help lower TCO in comparison to traditional storage.

vSAN simplifies Day 1 and Day 2 operations, and customers can quickly deploy and extend cloud infrastructure and minimize maintenance disruptions. Together with Cloudian HyperStore, vSAN modernizes HCI by providing admins with a unified storage control plane for all block, file, and object protocols, and provides significant enhancements that make it a great solution for VMs as well as cloud-native applications. vSAN helps reduce the complexity of monitoring and maintaining infrastructure and enables admins to rapidly provision storage for Kubernetes-orchestrated cloud-native applications.

VMware vSAN Data Persistence platform

The vSAN Data Persistence platform (DPp) provides an as-a-service framework for VMware partners that offer modern stateful services to integrate with the underlying virtual infrastructure, allowing you to run stateful services with high velocity scaling, simplified IT operations, and optimized TCO. You can deploy a stateful service alongside traditional applications on a regular vSAN cluster with vSAN-SNA (vSAN support for Shared Nothing Architecture) policy, or deploy it on a dedicated vSAN cluster with VMware vSAN Direct Configuration, a technology enabling direct access to the underlying direct-attached hardware which can be optimized for the application needs. Both options benefit from optimal storage efficiency for stateful services by leveraging service-level replication, as well as unified management of services in VMware vCenter®.

The platform offers a way for the IT admin to enable, manage, and monitor all aspects of the stateful service from vSphere Interfaces (API/UI) while the developers get public cloud-like simple self-service consumption experience.

The regular vSAN cluster with the vSAN-SNA policy, or a dedicated vSAN cluster with vSAN Direct Configuration, makes it easy for cloud-native services integrated into the vSAN Data Persistence platform, such as Cloudian, to co-locate their compute and storage on the same physical ESXi host. This host-local placement then allows replication to be performed only at the service layer and not at the storage layer.


Figure 1. Cloudian Deployment on vSAN DPp

  • vSAN with SNA Storage Policy

With this technology, you can use a distributed, replicated vSAN datastore with the vSAN host-local SNA policy. The technology makes it easy for the stateful service to co-locate its compute instance and a storage object on the same physical ESXi host. The compute instance, such as a pod, comes up first on one of the nodes in the vSAN cluster, and the vSAN object created with the vSAN-SNA policy (vsan-sna storage class) then automatically has all of its data placed on the same node where the pod is running.

Although the vSAN SNA storage policy can place data local to the compute, the I/O path between the application and the raw physical disk still goes through the distributed vSAN data path. Also, the application can only specify affinity at node granularity and cannot use different disks attached to the same node as independent fault domains. (A sketch of a volume claim against the vsan-sna storage class appears after this list.)

  • vSAN Direct Configuration

vSAN Direct Configuration provides optimal host-local storage for shared-nothing cloud-native services. It creates an independent datastore on every disk attached to a physical host and makes it available as a placement choice to modern applications. vSAN Direct Configuration extends the simplicity of HCI management to host-local VMFS disks: it manages and monitors VMFS-L disks and provides insight into the health, performance, and capacity of these disks. The VMFS datastores that vSAN Direct Configuration manages are exposed as storage pools in Kubernetes. vSAN Direct Configuration allows modern applications direct access to the disks while preserving ease of management for the VI admin; these disks are consumed just as with vSAN, with minimal overhead.
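For illustration, a persistent volume claim requesting the vsan-sna storage class mentioned above could look like the following sketch. The claim name, namespace, and size are placeholders, and in this solution such claims are created by the HyperStore operator rather than by hand:

# Hypothetical PersistentVolumeClaim against the vsan-sna storage class
# (claim name and size are placeholders, not values from this solution).
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-sna-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: vsan-sna
  resources:
    requests:
      storage: 100Gi
EOF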


Figure 2. vSAN Direct

Cloudian HyperStore

Cloudian HyperStore is highly secure, enterprise-grade, fully native, S3-compatible object storage software. HyperStore for containers is a containerized version of HyperStore, designed to run on the VMware vSAN Data Persistence platform. It enables new efficiencies and savings by allowing enterprises to run both modern, cloud-native applications and traditional applications on a single, shared storage environment, at any scale, on-premises and in both private and public clouds. The solution extends HyperStore’s simple-to-deploy and easy-to-manage, exabyte-scalable, highly secure, multi-tenant storage to any application, while also reducing costs by 60% and more. The integration supports Cloudian HyperStore with VMware Cloud Foundation with Tanzu, and combines the industry-leading VMware HCI platform and Cloudian HyperStore into a single, shared-nothing data platform.

HyperStore supports modular growth with flexible deployment options and scales non-disruptively across multiple locations to exabytes. HyperStore stays simple with a single storage fabric and a single, global namespace, enabling a unified view and control of all data across locations. HyperStore supports this geo-distribution of data with flexible storage-level data protection using erasure coding and replication to ensure the desired level of data durability. With the industry’s highest S3 API compliance, interoperability with S3-compatible applications is assured. For hybrid and multi-cloud needs, data tiering and replication to remote locations, including the public cloud, are available as feature options. Secure multi-tenancy features are native to HyperStore and include QoS and billing. HyperStore data security certifications include Common Criteria, FIPS 140-2, and SEC 17a-4(f), providing data immutability and encryption for data-at-rest and data-in-flight, among other capabilities. HyperStore is also hardened storage with access controls, secure shell, and an integrated firewall.

Splunk Enterprise

The Splunk platform uses machine data—the digital exhaust created by the systems, technologies, and infrastructure powering modern businesses—to address big data, IT operations, security, and analytics use cases. The insights gained from machine data can support any number of use cases across an organization and can also be enriched with data from other sources.

Splunk SmartStore

Splunk SmartStore is the latest evolution of the distributed scale-out model: a data management approach that brings data closer to the compute on demand. It provides a high degree of compute/storage elasticity and makes it cost-efficient to achieve longer data retention at scale. SmartStore dynamically places data in local storage, remote storage, or both, depending on access patterns, data age, and data/index priority. SmartStore uses the AWS S3 API to plug into the remote storage tier.

Furthermore, SmartStore decouples storage from the indexers so that each can scale independently. SmartStore stores indexes and raw data externally on an S3-compatible object store while repurposing local storage as a cache to accelerate searches.

Validation Strategy

We validated that VMware Cloud Foundation with Tanzu can support the Splunk SmartStore solution by deploying a Splunk distributed deployment in a VMware Cloud Foundation workload domain together with Cloudian HyperStore for containers (an S3-compatible object storage system). We ran representative workloads such as data ingestion, SmartStore migration, and searches against the SmartStore-enabled environment. This solution validation uses Dell vSAN ReadyNodes; however, it also applies to other vSAN ReadyNode partners and Dell VxRail. The testing ensures that VMware Cloud Foundation with Tanzu and Cloudian can meet Splunk infrastructure requirements and validates design assumptions about the infrastructure.

Validation Environment Configuration

This section introduces the solution, resources, and configurations:

  • Solution Architecture
  • Hardware resources
  • Software resources
  • vSAN configuration
  • Cloudian HyperStore configuration
  • Splunk VM configuration
  • Splunk roles and Application configuration
  • Monitoring tools

Solution Architecture

Splunk’s deployment model differs based on the size of the deployment. Some of the standard Splunk deployment models are:

  • Departmental: A single instance that combines indexing and search management functions.
  • Small enterprise: One search head with two or three indexers.
  • Medium enterprise: A small search head cluster, with several indexers.
  • Large enterprise: A large search head cluster, with large numbers of indexers.

Splunk Traditional Architecture

In traditional Splunk architecture, depending on the data ingestion rate and retention period, the storage provisioned to the Splunk instance can be one of the following two models:

  • vSAN storage (hot, warm, and cold paths)
  • vSAN storage (hot and warm paths, high performance) + NFS storage (cold path, large capacity)

For departmental, small enterprise, and some medium enterprise deployments, use vSAN storage without additional NFS storage, which provides good TCO and ease of management. In these deployments, the retention period is typically shorter, so less cold storage is consumed.

For large enterprise deployments and those that require multiple years of data retention, use the second option: vSAN storage for hot and warm buckets and external NFS storage for cold buckets. A vSphere VM can flexibly consume both vSAN storage and external NFS storage. This choice of options helps achieve a balance of TCO and ease of management.

Figure 3 shows the traditional Splunk Enterprise on VMware vSphere and vSAN.

See Reference architecture Splunk on VMware vSAN for more details.


Figure 3. Traditional Splunk Enterprise Environment

Splunk SmartStore Architecture

Figure 4 shows the Splunk SmartStore on Cloudian and VMware stack. In this solution, high-performance vSAN storage is used to deploy hot data buckets, and the warm buckets are moved to an S3-Compatible object store. This solution helps by moving the data from expensive indexer storage to a less expensive S3 object store.


Figure 4. Splunk SmartStore on Cloudian and VMware

Solution Test Environment

In this solution, the Splunk SmartStore architecture is deployed on VMware Cloud Foundation. The VMware Cloud Foundation test environment is composed of a management domain and a workload domain, as shown in Figure 5. The infrastructure VMs are deployed in the management domain. In the workload domain, there are two vSphere clusters: vSphere Cluster A hosts the Splunk VMs, and vSphere Cluster B hosts Cloudian HyperStore.


Figure 5. Splunk on VMware Cloud Foundation and Cloudian HyperStore

Hardware Resources

We used eight PowerEdge R640 servers for the workload domain in this solution. Two vSphere clusters were created with four ESXi hosts each. Each node in the workload domain had the configuration shown in Table 1.

Table 1. Hardware Configuration

  • Number of servers: 8
  • Server: PowerEdge R640
  • CPU: 2 x Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz, 14 cores each
  • Logical processors (including hyperthreads): 56
  • Memory: 512 GB
  • Cache devices: 2 x NVMe PM1725a 1.6TB SFF
  • Capacity devices: 8 x 1.75TB Samsung SSDs
  • Network: 2 x 10Gbps Intel(R) Ethernet Controller 10G X550

Note: Table 1 does not include the hardware details of the VMware Cloud Foundation management domain, since it hosts only management components such as the vCenter Server Appliance and is not directly related to the Splunk workload.

Software Resources

The software resources used in this solution are shown in Table 2.

Table 2. Software Resources

  • VMware Cloud Foundation 4.3: A unified SDDC platform that brings together VMware ESXi, vSAN, NSX, and optionally vRealize Suite components into a natively integrated stack to deliver enterprise-ready cloud infrastructure for the private and public cloud. See the BOM of VMware Cloud Foundation 4.3 for details.
  • Guest Operating System (Ubuntu): Operating system.
  • Cloudian HyperStore for containers 1.3.0: Cloudian HyperStore is an S3-compatible object storage system. HyperStore for containers is a containerized deployment of HyperStore, designed to run on the VMware vSAN Data Persistence platform.
  • Splunk Enterprise 8.1.2: Splunk Enterprise is the fastest way to aggregate, analyze, and get answers from your data with the help of machine learning and real-time visibility.

 

vSAN Configuration

We configured vSAN on both vSphere clusters per the following configurations.

vSphere Cluster A: In this cluster, the Splunk VMs (indexers and search heads) and the storage for Splunk hot buckets were provisioned:

  • The vSAN datastore uses a mix of NVMe devices (vSAN cache tier) and Samsung SSDs (vSAN capacity tier).
  • Two vSAN disk groups were configured per host. Each disk group used one NVMe device for the cache tier and four SSDs for the capacity tier, resulting in a datastore capacity of 55.89 TB (4 hosts x 8 x 1.75TB capacity devices). vSAN deduplication and compression were deactivated.
  • Storage Policy Based Management (SPBM) allows you to align storage with the application demands of the virtual machines. Below are some of the key SPBM parameters set for disks provisioned from the vSAN datastore.
  • vSAN FTT (Failures to Tolerate): With vSAN FTT, availability is provided by maintaining replica copies of data to mitigate the risk of a host failure resulting in lost connectivity to data or potential data loss. For instance, FTT=1 supports n+1 availability by providing a second copy of data on a separate host in the cluster; the resulting capacity consumption is doubled (for example, a 600GB virtual disk consumes about 1.2TB of raw vSAN capacity).
  • Hot bucket storage is provisioned from the vSAN datastore using the vSAN mirror policy (Failures to Tolerate = 1), as shown in Figure 6 and Figure 7.


Figure 6. vSAN Storage Policy Availability Settings—Mirror


Figure 7. vSAN Storage Policy Advanced Settings—Mirror

vSphere Cluster B: VMware Cloud Foundation with Tanzu is enabled in this Cluster, which is a prerequisite for deploying Cloudian HyperStore for containers. Cloudian HyperStore was deployed in this cluster to provide the S3 Object storage service exclusively.

The vSAN DPp supports two deployment options, as discussed in the VMware vSAN Data Persistence platform section. In this solution, vSAN Direct Configuration was used. vSAN Direct allows modern stateful services to use the availability, efficiency, and security features built into the stateful service layer while having direct access to the underlying direct-attached hardware.

There are four ESXi hosts in total; each host provides the following disks for vSAN Direct and regular vSAN datastore.

vSAN Direct: 8 x 1.75TB SSD disks per host were configured for vSAN Direct and used for object data. With vSAN DPp, Cloudian HyperStore takes care of the data protection using the default HyperStore RF (Replication Factor) = 3 setting.

Standard vSAN datastore: 2 x 1.6TB NVMe disks per host were used to create a vSAN disk group with one cache device and one capacity device. This standard vSAN datastore was created and used for HyperStore metadata.

In VMware Cloud Foundation deployments, vSAN automatically claims all local storage devices on your ESXi host. Using the procedure mentioned here, you can make the devices ineligible for regular vSAN and available for vSAN Direct.
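As a sketch of that procedure, devices are typically tagged for vSAN Direct from the ESXi command line; the device name below is a placeholder, and the linked documentation remains the authoritative reference:

# Hypothetical example: tag a local device for vSAN Direct so regular vSAN
# does not claim it (device name is a placeholder).
esxcli vsan storage tag add -d naa.xxxxxxxxxxxxxxxx -t vsanDirect

# The tag can later be removed if the device should return to regular vSAN.
esxcli vsan storage tag remove -d naa.xxxxxxxxxxxxxxxx -t vsanDirect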

Figure 8 and Figure 9 show the vSAN Direct storage policy used with the namespace where Cloudian object data was deployed. Notice that the storage policy creation wizard has an option to select vSAN Direct as a storage placement type.


Figure 8. VM Storage Policy—Enable Tag Based Placement Rules


Figure 9. VM Storage Policy—vSAN Direct Tagging

Cloudian HyperStore Configuration

Cloudian object storage was deployed on vSphere Cluster B. The prerequisites for enabling HyperStore for containers on vSAN DPp are discussed here. HyperStore manages persistent storage and persistent volume claims: HyperStore issues commands to vSAN to create persistent volumes that map back to HyperStore pods. Each persistent volume exists on a single drive, which allows HyperStore to manage its own data protection schemes. HyperStore storage policies are set with a replication factor (RF) or erasure coding (EC); the default setting in HyperStore for containers is RF3 or EC4+2, depending upon the number of nodes.

A four-node HyperStore instance was deployed with 16 vCPUs and 64GB memory per node. The vSAN Direct disks were provisioned to HyperStore for object data. Based on the number of nodes, HyperStore by default sets the replication factor to three. Each node is assigned eight disks of 1,776GB, providing a total raw capacity of approximately 55TB. Figure 10 and Figure 11 from the VMware vCenter UI show this configuration.

The regular vSAN datastore provides the storage for HyperStore metadata. As shown in Figure 10, it was provisioned using the default vSAN mirror policy, named “vsan-default-policy-cl05” in this environment.

Failure and maintenance scenarios: Cloudian HyperStore responds automatically to events associated with the nodes on which a HyperStore instance is running. Details of HyperStore failure behavior can be found in the Cloudian HyperStore Quick Start Guide.

Table 3 shows the summary of the Cloudian HyperStore configuration.


Figure 10. HyperStore Node Configuration Parameters


Figure 11. HyperStore Configuration Summary

Table 3. HyperStore Nodes—Resource Configuration

  • Number of Cloudian nodes: 4
  • CPU per node: 16 vCPUs
  • Memory per node: 64 GB
  • Metadata storage (vSAN mirror), total: 1 TB
  • Object storage (vSAN Direct), total: 55.5 TB (1,776 GB x 8 disks x 4 hosts)

Figure 12 shows the managed capacity and health status from the Cloudian Management Console (CMC) dashboard. Figure 13 shows the Buckets and Objects tab in the console; an S3 storage bucket called “migration” was created and used by Splunk SmartStore.
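For reference, the same bucket can also be created or listed directly through the S3 API; a minimal sketch with the AWS CLI is shown below. The endpoint URL and credential profile are placeholders for the HyperStore S3 service address and the access keys created in the CMC:

# Hypothetical example: create and list the "migration" bucket via the
# HyperStore S3 endpoint (endpoint URL and profile are placeholders).
aws s3 mb s3://migration --endpoint-url https://s3-hyperstore.example.com --profile hyperstore
aws s3 ls s3://migration --endpoint-url https://s3-hyperstore.example.com --profile hyperstore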

 

Figure 12. Cloudian Management Console—Capacity and Health Status


Figure 13. HyperStore S3 Bucket for Splunk Usage

Splunk VM Configuration

On vSphere Cluster A, eight VMs (splunk001 to splunk008) were installed, two on each server, each with 14 vCPUs and 128GB memory. Table 4 shows the Splunk VM resource allocation. One OS disk and four data disks were provisioned to each VM. The data disks were used for Splunk hot buckets and were provisioned from vSAN using the mirror (FTT=1) policy.

Table 4. Splunk VM Configuration

  • Splunk Master/Search Head (splunk001): 14 vCPUs, 128 GB memory, 1 x 100GB vSAN disk (OS), 4 x 600GB vSAN disks (data)
  • Splunk Indexers, 7 VMs (splunk002 to splunk008): 14 vCPUs, 128 GB memory, 1 x 100GB vSAN disk (OS), 4 x 600GB vSAN disks (data)
The four data disks were spread evenly over four PVSCSI controllers. The Linux Logical Volume Manager (LVM) combined the four VMDKs on each Splunk VM into a striped logical volume, which was formatted with the ext4 filesystem. The resulting volume backed the Splunk mount point (/opt/splunk) for hot data. Below are the sample commands used to create the logical volume and filesystem:

pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde

vgcreate vgsplunk /dev/sdb /dev/sdc /dev/sdd /dev/sde

lvcreate -l 100%FREE -i 4 -n lvsplunk vgsplunk

mkfs.ext4 /dev/vgsplunk/lvsplunk
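The filesystem then needs to be mounted at the Splunk home path; a minimal sketch follows, assuming the volume and mount point names above (the fstab entry is an assumption about how the mount was persisted):

# Create the mount point and mount the striped logical volume.
mkdir -p /opt/splunk
mount /dev/vgsplunk/lvsplunk /opt/splunk

# Assumed approach for persisting the mount across reboots.
echo '/dev/vgsplunk/lvsplunk /opt/splunk ext4 defaults 0 0' >> /etc/fstab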

Figure 14 shows a screenshot from one of the indexer VMs after the filesystem was created and mounted.


Figure 14. Disks Assigned to Indexer VM for Hot Data

Splunk Roles and Application Configuration

Figure 15 shows the key Splunk roles performed by the VMs.

Virtual machine splunk001 performs the search head role, and the remaining seven VMs perform the indexer role.

Splunk was configured in distributed deployment mode. In this mode, the search head distributes search requests to other instances, called search peers or indexers. The indexers perform the actual searches as well as the data indexing, and the search head merges the results and returns them to the user.

Splunk indexer clustering was enabled to provide high availability at the indexer level: while indexing data, each peer node replicates data from other peer nodes as governed by the Splunk replication factor. The replication factor in an indexer cluster is the number of data copies that the cluster maintains. During this configuration, you also designate a search factor, which determines the number of searchable copies of data the indexer cluster maintains; it must be less than or equal to the replication factor. As shown in Figure 16, the Splunk replication factor and search factor are both configured as 2.
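These factors can be set on the cluster master with the Splunk CLI; the following is a sketch matching the factors used in this solution, with the cluster secret as a placeholder:

# Hypothetical example: configure the indexer cluster master with
# replication factor 2 and search factor 2 (secret is a placeholder).
splunk edit cluster-config -mode master -replication_factor 2 -search_factor 2 -secret <pass4SymmKey>
splunk restart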


Figure 15. Splunk Instances and Roles


Figure 16. Splunk Configuration Summary

Splunk SmartStore migration: To validate the SmartStore migration, we initially deployed the traditional Splunk architecture with the hot, warm, and cold buckets all located on the indexers. The Splunk cluster ingested 2TB of data a day for seven days. After this period, SmartStore was configured, allowing a migration scenario to be validated. During the migration, the warm and cold Splunk buckets were uploaded to HyperStore. Note that this is a one-way operation within Splunk. Figure 17 shows the Splunk indexes.conf settings; it is an example where SmartStore is enabled for the index “[migration]” and where various settings specific to the remote storage are configured. The settings beginning with “remote.s3” are specific to the Cloudian S3 object storage. After SmartStore was enabled on the existing Splunk index, the migration started and was validated.

For more details about indexes.conf parameters, see Configure SmartStore Setting.
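A minimal sketch of what such a configuration can look like is shown below; the endpoint, credentials, and paths are placeholders rather than the values used in the tested environment, and Figure 17 plus the Splunk documentation remain the authoritative reference:

# Hypothetical SmartStore stanzas appended to indexes.conf (assumes SPLUNK_HOME
# is set; in a cluster these settings are distributed via the cluster master).
# Endpoint, credentials, and paths are placeholders.
cat <<'EOF' >> $SPLUNK_HOME/etc/system/local/indexes.conf
[volume:cloudian_s3]
storageType = remote
path = s3://migration
remote.s3.access_key = <access_key>
remote.s3.secret_key = <secret_key>
remote.s3.endpoint = https://<hyperstore-s3-endpoint>

[migration]
remotePath = volume:cloudian_s3/$_index_name
homePath   = $SPLUNK_DB/migration/db
coldPath   = $SPLUNK_DB/migration/colddb
thawedPath = $SPLUNK_DB/migration/thaweddb
EOF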


Figure 17. Sample indexes.conf for Splunk SmartStore Migration

Figure 18 shows the Splunk data moved to the S3 bucket called “migration”.


Figure 18. Splunk Data Migrated to Cloudian S3 Object Store

Monitoring Tools

We used the following monitoring tools in the solution testing:

vSAN Performance Service

vSAN Performance Service is used to monitor the performance of the vSAN environment using the vSphere web client. The performance service collects and analyzes performance statistics and displays the data in a graphical format. You can use the performance charts to manage your workload and determine the root cause of problems.

vSAN Health Check

vSAN Health Check delivers a simplified troubleshooting and monitoring experience of all things related to vSAN. Through the vSphere web client, it offers multiple health checks specifically for vSAN including cluster, hardware compatibility, data, limits, and physical disks. It is used to check the vSAN health before the mixed-workload environment deployment.

Splunk Monitoring Console

Splunk Monitoring console provides many features for monitoring the health and troubleshooting the Splunk Enterprise deployment.

Platform Validation

Before the deployment, it is highly recommended to validate the performance capabilities of the intended platform.

HCIBench is the preferred tool to validate vSAN block storage, covering both overall and I/O-profile-specific performance using synthetic I/O. HCIBench provides the ability to run user-defined workloads and a series of pre-defined tests, known as the EasyRun suite. When leveraging EasyRun, the HCIBench appliance executes four different standard test profiles that sample system performance and report key metrics.

COSBench is a benchmarking tool for measuring the performance of cloud object storage services such as Cloudian HyperStore.

Beyond the synthetic testing, it is advisable to leverage the data ingestion tools provided by Splunk, such as Eventgen, or, when feasible, to split production ingest data to a testing environment with a Splunk heavy forwarder. This helps users simulate representative ingestion and search workloads before deploying in production.

Production Criteria Recommendations 

vSphere and vSAN Configuration

Virtual Machine vCPU and Memory

Splunk virtual machines such as the search head and indexers are CPU and memory intensive, and the workloads require proper sizing of the VM vCPU and memory to achieve optimal performance. In vSphere environments running mixed workloads, the use of vCPU and memory reservations should be considered to ensure adequate compute resources.

Recommendation: Avoid CPU and memory overcommitment

Network

Splunk workloads are network intensive, and network port bandwidth is consumed by application traffic (data ingestion, searches), S3 object storage access, and vSAN distributed storage traffic. Considering this, provision ports with enough network bandwidth and choose network switches with a non-blocking architecture and deep buffers.

Recommendation: Use a minimum of 4 x 10Gbps ports; preferably use higher-bandwidth ports such as 25Gbps or above.

vSAN FTT 

The Number of Failures to Tolerate capability addresses the key customer and design requirement of availability. With FTT, availability is provided by maintaining replica copies of data, to mitigate the risk of a host failure resulting in lost connectivity to data or potential data loss.  

Recommendation: FTT=1. (This applies to the block storage provisioned to Splunk VMs. For Cloudian S3 object storage, follow the “Cloudian HyperStore Configuration” section.)

vSAN RAID 

vSAN can use RAID 1 for mirroring or RAID 5/6 for Erasure Coding. Erasure coding can provide the same level of data protection as mirroring (RAID 1), while using less storage capacity. 

Recommendation: RAID 1 to avoid performance overhead when compared with Erasure Coding. This applies to the block storage provisioned to Splunk VMs. For Cloudian S3 Object storage, follow the “Cloudian HyperStore Configuration” section.

vSAN Dedupe and Compression 

vSAN deduplication and compression can reduce raw storage capacity consumption and can be used when application-level compression is not used. In this case, Splunk natively compresses its data, so vSAN compression and deduplication may not provide significant savings. Therefore, it is recommended to deactivate deduplication and compression.

Recommendation: Disable deduplication and compression 

vSAN Encryption 

vSAN can perform data at rest encryption. Data is encrypted after all other processing, such as deduplication, is performed. Data at rest encryption protects data on storage devices if a device is removed from the cluster. Use encryption per your company’s Information Security requirements.  

Recommendation: Enable encryption if required by your company's Information Security Policy.

vSphere DRS  

DRS works on a cluster of ESXi hosts and provides resource management capabilities like load balancing and VM placement. DRS also enforces user-defined resource allocation policies at the cluster level while working with system-level constraints.

Recommendation: DRS–partially automated

vSphere High Availability 

vSphere HA provides high availability for virtual machines by pooling them and the hosts they reside on into a cluster. Hosts in the vSphere cluster are continually monitored. In the event of a failure, the virtual machines on a failed host are restarted on alternate hosts. 

Recommendation: HA Enabled.

Cloudian HyperStore Configuration

Table 5 shows the minimum hardware configuration for ESXi host sizing when deploying HyperStore.

Table 5. Minimum Hardware Configuration

 

Per ESXi host (standalone or single instance):

  • CPU: 1 socket, 12-core CPUs
  • RAM: 128GB
  • Network adapter: Minimum 2 x 10Gb; 25Gb recommended (especially for network-intensive, Splunk-like use cases where data from multiple indexers is moved)
  • Storage adapter: Must be listed on the vSAN Hardware Compatibility List (HCL)
  • Disks: Minimum of 1 flash drive per node for the metadata storage class; a minimum of 10 data drives per node is recommended for the vSAN Direct object storage class

When creating a new HyperStore instance, the following parameters, specified in the creation wizard (see Figure 10), affect the overall sizing of compute and storage requirements:

  • Namespace: The target deployment namespace for the cluster. This namespace needs to be pre-created in vCenter and assigned the appropriate vSAN storage policies and access permissions to users who can create HyperStore instances in it.
  • CPU required: Number of vCPUs associated with each pod. 16 vCPUs is the minimum; increase the vCPU count based on workload to keep CPU usage under 80%.
  • Memory required: RAM per pod. Minimum of 64GB is recommended.
  • Number of nodes: Number of nodes (pods) in a HyperStore cluster. It cannot exceed the number of physical hosts in the vSAN cluster. Larger clusters provide better performance and data efficiency. A minimum of three nodes is required to satisfy data protection policies.
  • Metadata storage class: Set the associated storage class for object metadata. Typically set to SSD or NVMe based class.
  • Object data storage class: Set the associated storage class of object data. This class is the bulk of where HyperStore data is stored, so vSAN Direct or SNA policy is preferred.
  • Metadata volume size: The amount of capacity dedicated to object metadata. 100TB capacity supports approximately 68 million objects.
  • Total object data storage size: The amount of RAW capacity for object data in the cluster. To calculate the usable capacity, refer to HyperStore Storage Policies below.
  • Object Data volume size: The size of PV the nodes consume to meet the total object data storage size value. Because each persistent volume exists on a single disk drive, this value should not exceed the unused capacity of the drive.
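As a worked example using the test environment's values from Table 3 (an illustration rather than a sizing rule, using 1 TB = 1,024 GB), the object data volume size is roughly the total object data storage size divided by the number of persistent volumes:

\[
\text{object data volume size} \approx \frac{\text{total object data storage size}}{\text{nodes} \times \text{volumes per node}} = \frac{55.5\ \text{TB}}{4 \times 8} \approx 1{,}776\ \text{GB}
\]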

HyperStore Storage Policies

Note: The HyperStore storage policy is unrelated to the vSphere/vSAN storage policy.

HyperStore storage policies are ways of protecting data so that it is durable and highly available to users. The HyperStore system lets you pre-configure one or more storage policies. Two options exist for creating storage policies. A default storage policy is created based on the number of nodes in a cluster at the creation time (for example: 3 nodes=RF3, 6 nodes=EC4+2). Additional policies can be created in the Cloudian Management Console.

After a new storage bucket is created, users can choose which pre-configured storage policy to use to protect data in that bucket.

Users cannot create buckets until at least one storage policy has been created. For each storage policy that you create, choose one of two data protection methods:

  • Replication: With replication, a configurable number of copies of each data object are maintained in the system, and each copy is stored on a different node. For example, with 3X replication 3 copies of each object are stored, with each copy on a different node.
  • Erasure coding: With erasure coding, each object is encoded into a configurable number (known as the “k” value) of data fragments plus a configurable number (the “m” value) of redundant parity fragments. Each of an object’s “k” plus “m” fragments is unique, and each fragment is stored on a different node. The object can be decoded from any “k” number of fragments. To put it another way, the object remains readable even if up to “m” nodes are unavailable. For example, in a 4+2 erasure coding configuration (4 data fragments plus 2 parity fragments), each object is encoded into a total of 6 unique fragments, which are stored on 6 different nodes, and the object can be decoded and read so long as any 4 of those 6 fragments are available.

In general, erasure coding requires less storage overhead—the amount of storage consumption above and beyond the original size of the stored objects, required to ensure data persistence and availability—than replication. Put differently, erasure coding is more efficient in its use of raw storage capacity than replication.

Number of Nodes and Storage Policy

The number of nodes in a HyperStore cluster affects the available storage policy. A 3-node cluster can only support Replica Factor (RF) 3. A cluster with 6 nodes can support RF3 as well as an Erasure Coding scheme of 4+2. The larger the cluster, the more efficient the EC scheme can be.

Data protection efficiency is calculated as a ratio of usable capacity to raw capacity as follows:

  • 3 nodes: RF3 = 33%
  • 6 nodes: EC4+2 = 67%
  • 8 nodes: EC6+2 = 75%
  • 10 nodes: EC8+2 = 80%
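Expressed as formulas (a sketch of the arithmetic behind these percentages):

\[
\text{efficiency}_{\mathrm{RF}} = \frac{1}{\mathrm{RF}}, \qquad \text{efficiency}_{\mathrm{EC}\,k+m} = \frac{k}{k+m}
\]

so that, for example, \( \tfrac{1}{3} \approx 33\% \), \( \tfrac{4}{4+2} \approx 67\% \), \( \tfrac{6}{6+2} = 75\% \), and \( \tfrac{8}{8+2} = 80\% \).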

Splunk

A Splunk Enterprise distributed deployment requires virtual machines with different roles; Table 6 lists some of the critical roles and their VM resource requirements.

Table 6. Splunk VM Hardware Configuration

 

Resource   | Standalone or Single Instance | Forwarder | Indexing (+ search) | Search Heads
CPU        | X                             | *         | X                   | X
Memory     | X                             | *         | X                   | X
Storage IO | X                             | *         | X                   | X
Networking | X                             | X         | X                   | X

Note: * in the above table represents a general comparison between the instance roles.

  • Forwarder: Used to forward data to the indexers and sized by GB/day; a sample size of 250GB/day results in roughly 23 Mbit/s of data ingestion (250 GB/day x 8 bits/byte / 86,400 seconds ≈ 23 Mbit/s). Forwarders are mainly network bound, with low CPU and memory demand, and do not consume much storage beyond a small OS disk. However, there are exceptions; for example, some forwarders used for Amazon Web Services (AWS) add-on applications need considerable CPU, memory, and storage IO.
  • Indexer: The database and index require most of the resources, because the indexer processes search head requests in addition to data ingestion and indexing. Indexers are also storage-capacity heavy and place a high IO demand on the storage system.
  • Search heads: Execute searches against the indexers and are CPU bound when servicing the initiated searches.

Splunk supports running Splunk Enterprise in several computing environments. For the supported environments, including but not limited to operating system (OS) version, OS settings, memory management scheme, and file systems (System Requirements), see Reference Hardware for capacity planning.

Conclusion

VMware Cloud Foundation delivers flexible, consistent, secure infrastructure and operations across private and public clouds. By combining traditional vSAN storage with the vSAN DPp and object storage such as Cloudian, the solution supports hot data (high performance) and warm data (large capacity) to meet the demands of the Splunk SmartStore architecture.

In addition, Cloudian HyperStore is highly secure, enterprise-grade, fully native, S3-compatible object storage software. HyperStore for containers is a containerized version of HyperStore, designed to run on the VMware vSAN Data Persistence platform. The integration supports Cloudian HyperStore with VMware Cloud Foundation with Tanzu, and combines the industry-leading VMware HCI platform and Cloudian HyperStore into a single, shared-nothing data platform.

CTOs' and CFOs' budget objectives can be achieved with dynamic provisioning, allowing enterprises to scale up and scale down. Further, VMware Cloud Foundation with VMware vSAN provides simplicity in management and Day 2 operations for Splunk workloads.

About the Author 

Palani Murugan, Senior Technical Marketing Architect in the VMware Cloud Infrastructure Business Group, authored this paper with contributions from the following members:

  • Scott Ekstrom, Technical Marketing Engineer at Cloudian
  • Amit Rawlani, Alliance Director at Cloudian
  • Ka Kit Wong, Staff Technical Marketing Architect in VMware Cloud Infrastructure Business Group
  • Ting Yin, Senior Technical Marketing Architect in VMware Cloud Infrastructure Business Group
  • Catherine Xu, Workload Technical Marketing Manager in VMware Cloud Infrastructure Business Group


