vSAN Frequently Asked Questions (FAQ)

Architecture

 

What are the hardware requirements for running vSAN?

vSAN requires at least two physical servers configured with hardware that is in the  VMware Compatibility Guide for Systems/Servers and VMware Compatibility Guide for vSAN. Using hardware that is not certified may lead to performance issues and/or data loss. The behavior of non-certified hardware cannot be predicted. VMware cannot provide support for environments running on non-certified hardware.

A vSAN cluster must have at least two physical hosts with local storage devices dedicated to vSAN. A vSAN cluster containing hosts with magnetic drives in the capacity tier is commonly called a “hybrid” cluster or configuration. A cluster with hosts containing flash devices in the capacity tier is referred to as an “all-flash” cluster or configuration. All-flash configurations are more commonly deployed to take advantage of flash drive performance and reliability.

Hosts participating in a vSAN cluster must be connected to the network using at least one network interface card (NIC). Multiple NICs are recommended for redundancy. Hybrid vSAN configurations can use 1Gb or higher networks although 10Gb is recommended. All-flash vSAN configurations require a minimum of 10Gb.

Cluster Size

A vSAN cluster supports from two up to a maximum of 64 physical hosts. Multiple clusters can be managed by a single VMware vCenter Server™ instance. vSAN 2-node configurations have two physical hosts. A stretched cluster can have up to 30 physical hosts (15 at each site).

The following blog discusses the factors to consider when deciding the size of a cluster: vSAN Clusters – Considerations when Sizing and Scaling

Hardware Deployment Options

A vSAN ReadyNode™ is an x86 server, available from all the leading server vendors, which is pre-configured, tested, and certified for vSAN. vSAN ReadyNodes provide an open, flexible approach when considering deployment methods. Organizations can continue to use their server vendor(s) of choice. Each ReadyNode is optimally configured for vSAN with the required amount of CPU, memory, network, I/O controllers, and storage devices.

Turn-key appliances such as  Dell EMC VxRail™, Hitachi UCP HC, and Lenovo ThinkAgile VX Series provide a fully integrated VMware hyper-converged solution for a variety of applications and workloads. Simple deployment enables customers to be up and running in as little as 15 minutes.

Custom configurations using jointly validated components from all the major OEM vendors are also an option. The vSAN Hardware Quick Reference Guide provides some sample server configurations as directional guidance. All components should be validated using the VMware Compatibility Guide for vSAN.

What is a vSAN disk group?

The flash device in the cache tier and the capacity device(s) are collectively known as a disk group. Each host has a minimum of one and a maximum of five disk groups. Each disk group consists of exactly one cache device and from one to seven capacity devices.
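
For reference, disk groups can be inspected from the command line. The following PowerCLI sketch is illustrative only; the vCenter address and cluster name are placeholder values, and output property names may vary slightly by PowerCLI version.

```powershell
# Illustrative PowerCLI sketch (names are placeholders); requires VMware PowerCLI.
Connect-VIServer -Server vcenter.example.com

$cluster = Get-Cluster -Name "vSAN-Cluster"

# One object per disk group in the cluster.
Get-VsanDiskGroup -Cluster $cluster | Format-Table -AutoSize

# Drill into the member disks; each group has exactly one cache disk
# (IsCacheDisk = True) and one to seven capacity disks.
Get-VsanDiskGroup -Cluster $cluster | ForEach-Object {
    Get-VsanDisk -VsanDiskGroup $_ |
        Select-Object CanonicalName, IsCacheDisk
}
```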

Can I mix all-flash disk groups and hybrid disk groups in the same host or cluster?

Mixing disk group types (all-flash and hybrid) is not supported. This complicates operational aspects such as balancing workloads, synchronizing components, and capacity management. All hosts in the cluster must be configured with the same type of disk groups (hybrid or all-flash).

What are the software requirements for running vSAN?

vSAN is natively embedded in the VMware vSphere® Hypervisor. vCenter Server is required to configure and manage a vSAN cluster. There is no need to install additional software or deploy “storage controller virtual appliances” to every host in the cluster as commonly found in other hyper-converged infrastructure (HCI) solutions. vSAN is enabled in a matter of minutes with just a few mouse clicks in the vSphere Web Client.

In certain implementation scenarios such as a vSAN stretched cluster and 2-node configurations, the deployment of a “witness host virtual appliance” is required to provide resiliency against "split-brain" scenarios. This virtual appliance is easily deployed using an OVA file provided by VMware. The witness virtual appliance is a pre-configured virtual machine (VM) running vSphere.

Can a witness appliance be shared across multiple deployments?

Prior to vSAN 7 Update 1, each 2-node deployment required a dedicated witness appliance.

vSAN 7 Update 1 greatly improves efficiency by allowing witness appliances to be shared across multiple 2-node sites.

Note: A witness appliance cannot be shared in a stretched cluster architecture.

How many 2-node deployments can share a witness appliance?

With vSAN 7 Update 1, a single witness appliance can support up to 64 2-node deployments.

 

How much memory is required in a vSAN host?

Hosts participating in a vSAN cluster should be configured with a minimum of 32GB of RAM. Higher amounts of RAM are recommended depending on application requirements, VM-to-host consolidation ratio goals, and other factors. More information can be found in  Understanding vSAN 6.x memory consumption (2113954).

What are the processor requirements for vSAN hosts?

Processors in the  VMware Compatibility Guide for Systems/Servers must be used in supported configurations. vSAN typically uses less than 10% of CPU resources on a host. The use of vSAN space efficiency features such as deduplication and compression can increase CPU utilization by approximately 5-10%. vSAN is supported with hosts that have a single CPU. Hosts with multiple CPUs are recommended for redundancy, performance, and higher VM-to-host consolidation ratios.

Why does vSAN require a flash device for the cache tier?

The flash device in the cache tier of each disk group is used as a write buffer in all-flash vSAN configurations. These cache devices are typically higher-endurance, lower-capacity devices. Data is de-staged from the cache tier to the capacity tier. Capacity tier devices are more commonly lower-endurance, higher-capacity flash devices. The majority of reads in an all-flash vSAN cluster are served directly from the capacity tier. An all-flash configuration provides a good balance of high performance, low latency, endurance, and cost-effectiveness.

The flash device in the cache tier of a hybrid vSAN configuration is used for read caching and write buffering: 70% of the capacity is allocated for read cache and 30% for buffering writes. For example, a 400GB cache device provides roughly 280GB of read cache and 120GB of write buffer. Data is de-staged from the cache tier to the capacity tier. The flash device in the cache tier enables very good performance for a hybrid configuration.

Does vSAN use VMware vSphere Virtual Volumes?

VMware vSphere Virtual Volumes™  is designed for use with external storage arrays. vSAN uses local drives to create a shared datastore. vSphere Virtual Volumes and vSAN can be used in the same cluster and both provide the benefits of storage policy-based management.

Can I use existing SAN and NAS storage in the same cluster as vSAN?

Yes, vSphere can access and use traditional VMFS and NFS datastores alongside vSAN and vSphere Virtual Volumes—all in the same cluster. If your SAN or NAS solution is compatible with vSphere Virtual Volumes, management is easier and more precise as storage policies can be used to manage all of your storage on a per-VM basis.

In most cases, vSphere Storage vMotion can be used to migrate VMs between these various datastore types. This feature makes it easy to migrate existing workloads when there is a need to perform maintenance or retire an older storage solution.

Where can I find guidance on sizing a vSAN cluster?

To evaluate an existing environment for migration to vSAN, the VMware HCI Assessment tool and Live Optics (formerly known as DPACK) are recommended options. It is best to request assistance with an assessment from your preferred VMware reseller. Data collected can be used along with the vSAN ReadyNode Sizer to help determine suitable vSAN ReadyNode server configurations. Additional guidance can be found in the vSAN Design Guide.

Does vSAN require reserve capacity or slack space for its operations? What is the recommended threshold and guidance to reserve capacity?

vSAN requires additional space for operations such as host maintenance mode data evacuation, component rebuilds, rebalancing operations, and VM snapshots. Activities such as rebuilds and rebalancing can temporarily consume additional raw capacity. Host maintenance mode temporarily reduces the total amount of raw capacity a cluster has. This is because the local drives on a host that is in maintenance mode do not contribute to vSAN datastore capacity until the host exits maintenance mode. The general guidance until vSAN 7 is to accommodate 25-30% of capacity as reserve capacity when designing and running a vSAN cluster. For example, a vSAN datastore with 20TB of raw capacity should always have 5-6TB of free space available for use as slack space. This recommendation is not exclusive to vSAN. Most other HCI storage solutions follow similar recommendations to allow for fluctuations in capacity utilization without disruption. See this blog article for more information:  vSAN Operations: Maintain Slack Space for Storage Policy Changes 
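
Free capacity can be checked against this guidance programmatically. The following PowerCLI sketch is illustrative only; "vSAN-Cluster" is a placeholder name, and Get-VsanSpaceUsage reports cluster-wide raw capacity figures.

```powershell
# Illustrative PowerCLI sketch: compare free capacity against the 25-30% guidance.
# "vSAN-Cluster" is a placeholder cluster name.
$usage   = Get-VsanSpaceUsage -Cluster "vSAN-Cluster"
$freePct = [math]::Round(($usage.FreeSpaceGB / $usage.CapacityGB) * 100, 1)

if ($freePct -lt 25) {
    Write-Warning "Only $freePct% free - below the 25-30% slack space guidance."
}
```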

Starting with vSAN 7 Update 1, there have been significant improvements that reduce the reserve capacity required to handle host failures and internal operations. The vSAN Sizer tool factors in a set of parameters based on design input to provide the required sizing guidance.

Is there any mechanism to monitor reserve capacity in a cluster?

vSAN 7 Update 1 introduces new workflows in the UI to optionally configure thresholds for vSAN operations and host failure. This is represented in the UI as Operations Reserve and Host Rebuild Reserve.

  • Operations reserve: the storage capacity reserved for vSAN internal operations such as resynchronization and rebuilds.
  • Host rebuild reserve: the storage capacity reserved for rebuilding one host’s worth of capacity, should there be a sustained outage of a host.

When the threshold is reached, a health alert is triggered and no further provisioning is allowed. This enhancement greatly helps administrators manage capacity efficiently.

 

How can I monitor TRIM/UNMAP-related metrics?

The vCenter UI can be used to monitor IOPS and throughput generated by TRIM/UNMAP commands. These metrics are available under [Monitor] > [vSAN] > [Performance] > [Backend]. Alternatively, they can be monitored through vsantop.

Are there recommendations for vSAN network connectivity?

The vSAN Network Design guide contains more information and recommendations for network connectivity in a vSAN cluster. Also, see the vSAN Stretched Cluster Bandwidth Sizing guide for information specific to stretched clusters.

Can I configure multiple vmknics with the vSAN service enabled to help improve resilience against vSAN network fabric failure?

Yes, this is a supported configuration. Note that vSAN 6.7 and higher versions include improvements that provide much faster failover times when one of the redundant fabrics fails. Cluster Quickstart also makes the setup process much easier by streamlining the configuration of a distributed virtual switch, including physical NIC assignment and the vmkernel ports for vSAN and vMotion.
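
A vmkernel adapter is tagged for vSAN traffic at creation time. The following PowerCLI sketch is illustrative only; the host, switch, port group, and IP values are placeholders for your environment.

```powershell
# Illustrative PowerCLI sketch: create a vmkernel adapter tagged for vSAN traffic.
# Host, switch, port group, and IP values are placeholders for your environment.
$vmhost = Get-VMHost -Name "esxi01.example.com"

New-VMHostNetworkAdapter -VMHost $vmhost `
    -VirtualSwitch "vSwitch1" -PortGroup "vsan-pg" `
    -IP "192.168.10.11" -SubnetMask "255.255.255.0" `
    -VsanTrafficEnabled $true
```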

Do NIC teaming solutions improve performance?

Generally speaking, NIC teaming can provide a marginal improvement in performance, but this is not guaranteed. The complexity and additional expense rarely justify the use of NIC teaming for the vSAN network. More details can be found in the vSAN Network Design guide.

Can I add a host that does not have local disks to a vSAN cluster?

A host with no local storage can be added to a vSAN cluster. Note that all hosts in a vSAN cluster require vSAN licensing to use the vSAN datastore regardless of whether they contribute storage to the vSAN datastore.

Recommendation: Use uniformly configured hosts for vSAN deployments. While compute-only hosts can exist in a vSAN environment and consume storage from other hosts in the cluster, VMware does not recommend unbalanced cluster configurations.

Is it possible to deploy a vCenter Server Appliance (VCSA) to a single host when building a new vSAN cluster?

The VCSA deployment wizard includes the ability to claim disks and turn on vSAN on a single host. This enables administrators to deploy vCenter Server to a new environment where vSAN will be the only datastore. This deployment option is typically referred to as "Easy Install". In addition, Cluster Quickstart provides a simplified workflow to configure and add hosts to an existing cluster.

Can I mix different hosts, controllers, drives, and so on in a cluster?

Consistent hardware and software configurations across all hosts in a vSAN cluster are recommended. However, mixing various host configurations in a vSAN cluster is supported as long as all components, firmware versions, drivers and so on are listed in the  VMware Compatibility Guide for Systems/Servers and  VMware Compatibility Guide for vSAN for the versions of vSphere and vSAN you are using. This flexibility enables organizations to upgrade and replace hosts and storage in an incremental fashion, which is commonly less expensive and easier than replacing entire storage arrays.

 Recommendation: Implement consistent hardware and software configurations across all hosts in a vSAN cluster. Verify vMotion compatibility across all of the hosts in the cluster - see this VMware Knowledge Base (KB) Article:   Enhanced vMotion Compatibility (EVC) processor support  (1003212).

Does vSAN require storage fabric host bus adapters (HBAs)?

No, vSAN uses standard network interface cards (NICs) found in nearly every x86 server platform. There is no need to provision and implement specialized storage networking hardware to use vSAN. See the  VMware vSAN Network Design guide for more information.

Are there vSphere features that are not supported with vSAN?

Nearly all vSphere features such as VMware vSphere vMotion™, VMware vSphere Distributed Resource Scheduler™, VMware vSphere High Availability, VMware vSphere Network I/O Control, and VMware vSphere Replication™ are compatible and supported with vSAN. VMware vSphere Fault Tolerance is supported for VMs running on vSAN except for stretched clusters.

The following vSphere features are not supported with vSAN:

  • VMware vSphere Distributed Power Management™
  • VMware vSphere Storage DRS™
  • VMware vSphere® Storage I/O Control

Can I share a single vSAN datastore across multiple vSphere clusters?

No, a vSAN datastore is directly accessible only by the hosts and VMs in the vSAN cluster.

How does vSAN store objects such as VM configuration files and virtual disks?

vSAN is an object-based datastore with a primarily flat hierarchy. Items such as a VM’s configuration (VMX) and virtual disks (VMDKs) are stored as objects. An object consists of one or more components. The size and number of components depend on several factors such as the size of the object and the storage policy assigned. The following figure shows common virtual machine objects.

Each object commonly consists of multiple components. The image below provides details on the number of components and the hosts where they are located. The vSAN Default Storage Policy is assigned which contains the rules Failures to Tolerate (FTT) = 1 and RAID-1 (mirroring). As a result, two components are created - two copies of the virtual disk - and there is a witness component that is used to achieve quorum if one of the other components is offline or there is a split-brain scenario.

Note: The witness component should not be confused with the witness host virtual appliance discussed earlier in this document as they are two different items.
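
For reference, a storage policy like the default one described above (FTT=1 with RAID-1 mirroring) can also be created and assigned programmatically. The following PowerCLI (SPBM) sketch is illustrative only; the policy and VM names are placeholders, and the capability identifier is resolved via Get-SpbmCapability.

```powershell
# Illustrative PowerCLI (SPBM) sketch: a policy equivalent to FTT=1 with RAID-1
# mirroring, then assigned to a VM. Policy and VM names are placeholders.
$ftt     = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1
$ruleSet = New-SpbmRuleSet -AllOfRules $ftt
$policy  = New-SpbmStoragePolicy -Name "FTT1-Mirror" -AnyOfRuleSets $ruleSet

# Apply the policy to the VM home object and its virtual disks.
Get-SpbmEntityConfiguration (Get-VM -Name "app-vm01") |
    Set-SpbmEntityConfiguration -StoragePolicy $policy
```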

When a VM is migrated to another host, are the VM’s objects migrated with the VM?

This concept is often referred to as “data locality”. vSAN does not require data locality to achieve excellent performance. vSAN does not tax the vSAN backend network with moving multiple gigabytes of data every time a VM is migrated to another host. There is no need to do this considering a 10Gb network has latencies from five to 50 microseconds (1). Flash devices such as a solid-state drive (SSD) have higher latencies ranging from 90 microseconds to eight milliseconds under heavy load (2). The few microseconds of latency added by reading data across a 10Gb network connection has no impact on performance.

vSAN also features a local read cache, which is kept in memory on the host where the VM is running. This helps avoid reads across the network and further improves performance, considering that reading from memory is orders of magnitude faster than reading from persistent storage devices.

  1. http://www.qlogic.com/Resources/Documents/TechnologyBriefs/Adapters/Tech_Brief_Introduction_to_Ethernet_Latency.pdf
  2. http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-p3700-spec.pdf

When a VM is migrated to another host, does the vSAN cache need to be re-warmed?

As discussed in the previous question, the storage objects belonging to a VM are not migrated with the VM. Therefore, data in the cache tier is not lost and does not need to be re-warmed.

Where can I find the vSAN hardware compatibility list (HCL)?

It is part of the VMware Compatibility Guide - here is the link:  VMware Compatibility Guide for vSAN 

 Recommendation:  Only use hardware that is found in the VMware Compatibility Guide (VCG). The use of hardware not listed in the VCG can lead to undesirable results.

Where can I find details about what can and can’t be changed in a vSAN ReadyNode?

Refer to the following KB for details: What You Can (and Cannot) Change in a vSAN ReadyNode™ (52084)

Where can I find guidance on vSphere boot devices for hosts in a vSAN cluster?

See these blog articles:

 vSAN Considerations When Booting from a Flash Device 

 M.2 SSD as Boot Device for vSAN 

What are the guidelines for sizing the cache tier in an all-flash vSAN configuration?

Refer to the vSAN Design Guide. Also, see this blog article for details:  Designing vSAN Disk groups – All-Flash Cache Ratio Update 

How do I determine if the storage controller firmware version installed is compatible and supported?

The storage controller firmware health check helps identify and validate the firmware release against the recommended versions. From 6.7 U3 onwards, the storage controller firmware health check allows for multiple approved firmware levels. This helps identify whether the installed firmware is obsolete, current, or newer, along with its support status.

I see my firmware and driver version is higher than what is listed in the VMware Compatibility Guide for vSAN. Am I supported by default for a higher version?

It depends on which components we are referring to. For the I/O controller, the exact version listed in the vSAN HCL is required for support. For disk drives, there is a minimum version for each family; as long as the installed version is higher than what is listed for that family, it is supported. This is decided and updated after discussion with the OEM partner.

Where can I find vSAN ReadyNode configurations for Intel Skylake and AMD EPYC processor families?

In the VMware Compatibility Guide for vSAN under the “vSAN ReadyNode Generation” section, choose “Gen3 – Xeon Scalable” for Intel Skylake and “Gen3-AMD-EPYC” for AMD EPYC.

Does vSAN support 4Kn storage devices?

Yes, 4Kn devices are supported for use in the capacity tier (only) of a vSAN environment. The following Knowledge Base article has more information: KB 2091600.

 

Availability

What happens if a host fails in a vSAN cluster?

vSAN will wait for 60 minutes by default and then rebuild the affected data on other hosts in the cluster. The 60-minute timer is in place to avoid unnecessary movement of large amounts of data. As an example, a reboot takes the host offline for approximately 10 minutes. It would be inefficient and resource-intensive to begin rebuilding several gigabytes or terabytes of data when the host is offline briefly.
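
The 60-minute delay is surfaced as a per-host advanced option. The following PowerCLI sketch is illustrative only and read-only; "vSAN-Cluster" is a placeholder name, and changing the value is generally advisable only under VMware guidance.

```powershell
# Illustrative PowerCLI sketch: read the per-host repair delay (in minutes).
# Adjust only under VMware guidance; newer releases also expose this as the
# cluster-level "Object repair timer" setting in the UI.
Get-Cluster -Name "vSAN-Cluster" | Get-VMHost | ForEach-Object {
    Get-AdvancedSetting -Entity $_ -Name "VSAN.ClomRepairDelay"
}
```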

vSphere HA is tightly integrated with vSAN. The VMs that were running on a failed host are rebooted on other healthy hosts in the cluster in a matter of minutes. A click-through demonstration of this scenario is available here:  vSphere HA and vSAN 50 VM Recovery.

 Recommendation: Enable vSphere HA for a vSAN cluster.

How does vSAN handle a network partition?

vSAN uses a quorum voting algorithm to help protect against “split-brain” scenarios and ensure data integrity. An object is available for reads and writes as long as greater than 50% of its components are accessible.

As an example, a VM has a virtual disk with a data component on Host1, a second mirrored data component on Host2, and a witness component on Host3. Host1 is isolated from Host2 and Host3. Host2 and Host3 are still connected over the network. Since Host2 and Host3 have greater than 50% of the components (a data component and a witness component), the VM’s virtual disk is accessible.

However, if all three hosts in our example above are isolated from each other, none of the hosts have access to greater than 50% of the components. vSAN makes the object inaccessible until the hosts are able to communicate over the network. This helps ensure data integrity.

 Recommendation: Build your vSAN network with the same level of resiliency as any other storage fabric.

What happens if a storage device fails in a vSAN host?

VMs are protected by storage policies that include failure tolerance levels. For example, a storage policy with a “Failures to Tolerate” (FTT) rule set to one with RAID-1 mirroring will create two copies of an object with each copy on a separate host. This means the VMs with this policy assigned can withstand the failure of a disk or an entire host without data loss.

When a device is degraded and error codes are sensed by vSAN, all of the vSAN components on the affected drive are marked degraded and the rebuilding process starts immediately to restore redundancy. If the device fails without warning (no error codes received from the device), vSAN will wait for 60 minutes by default and then rebuild the affected data on other disks in the cluster. The 60-minute timer is in place to avoid unnecessary movement of large amounts of data. As an example, a disk is inadvertently pulled from the server chassis and reseated approximately 10 minutes later. It would be inefficient and resource-intensive to begin rebuilding several gigabytes of data when the disk is offline briefly.

When the failure of a device is anticipated due to multiple sustained periods of high latency, vSAN evaluates the data on the device. If there are replicas of the data on other devices in the cluster, vSAN will mark these components as “absent”. “Absent” components are not rebuilt immediately as it is possible the cause of the issue is temporary. vSAN waits for 60 minutes by default before starting the rebuilding process. This does not affect the availability of a VM as the data is still accessible using one or more other replicas in the cluster. If the only replica of data is located on a suspect device, vSAN will immediately start the evacuation of this data to other healthy storage devices.

 Note: The failure of a cache tier device will cause the entire disk group to go offline. Another similar scenario is a cluster with deduplication and compression enabled. The failure of any disk (cache or capacity) will cause the entire disk group to go offline due to the way deduplicated data is distributed across disks.

 Recommendation: Consider the number and size of disk groups in your cluster with deduplication and compression enabled. While larger disk groups might improve deduplication efficiency, this also increases the impact on the cluster when a disk fails. Requirements for each organization are different so there is no set rule for disk group sizing.

What if there is not enough free capacity to perform all of the component rebuilds after one or more host failures?

In cases where there are not enough resources online to comply with all storage policies, vSAN 6.6 and newer versions will repair as many objects as possible. This helps ensure the highest possible levels of redundancy in environments affected by the unplanned downtime. When additional resources come back online, vSAN will continue the repair process to comply with storage policies.

 Recommendation: Maintain enough reserve capacity for rebuild operations and other activities such as storage policy changes, VM snapshots, and so on.

What happens when there are multiple failures (loss of hosts or disk groups) that exceed the configured threshold of failures?

Some vSAN objects will become inaccessible if the number of failures in a cluster exceeds the failures to tolerate (FTT) setting in the storage policy assigned to these objects. For example, let's consider a vSAN object such as a virtual disk (VMDK) that has a storage policy assigned where FTT=1 and RAID-1 mirroring is used for redundancy. This means there are two copies of the object with each copy on a separate host. If both hosts that contain these two copies are temporarily offline at the same time (two failures), that exceeds the number of failures to tolerate in the assigned storage policy. The object will not be accessible until at least one of the hosts is back online. A more catastrophic failure, such as the permanent loss of both hosts, requires a restore from backup media. As with any storage platform, it is always best to have backup data on a platform that is separate from your primary storage.

How do I backup VMs on vSAN?

Many third-party data protection products use VMware vSphere Storage APIs - Data Protection to provide efficient, reliable backup and recovery for virtualized environments. These APIs are compatible with vSAN just the same as other datastore types such as VMFS and NFS. Nearly all of these solutions should work with vSAN. It is important to obtain a support statement for vSAN from the data protection product vendor you use. Best practices and implementation recommendations vary by vendor. Consult with your data protection product vendor for optimal results.

 Recommendation: Verify your data protection vendor supports the use of their product with vSAN.

Is it possible to stop vSAN resynchronization operations?

Completely stopping resynchronization operations is not currently supported. In nearly all cases, this would not be recommended - especially in cases where vSAN is building new components to restore object redundancy after a disk or host failure. Additionally, from the vSAN 6.7 Update 3 release onwards, free capacity across storage devices, disk groups, and the cluster is implicitly monitored against predefined thresholds. vSAN pauses resync operations if disk space usage meets or exceeds the threshold. The operations are resumed when sufficient capacity is made available.

 Recommendation: While it is possible to throttle resynchronization traffic, it is not recommended unless advised to do so by VMware Global Support Services (GSS), as this will delay the remediation of VMs out of compliance with their storage policies.

How is vSAN impacted if vCenter Server is offline?

vCenter Server operates as the management plane for vSAN and is the primary interface used to manage and monitor vSAN. However, vCenter Server does not affect the data plane, i.e., the VM I/O path. Hence, when vCenter Server is offline, vSAN continues to function normally: VMs continue to run, and application availability is not impacted. Management features such as changing a storage policy, monitoring performance, and adding a disk group are unavailable until vCenter Server is restored.

vSAN has a highly available control plane for health checks using the VMware Host Client—even if vCenter Server is offline. Hosts in a vSAN cluster cooperate in a distributed fashion to check the health of the entire cluster. Any host in the cluster can be used to view vSAN Health. This provides redundancy for the vSAN Health data to help ensure administrators always have this information available.

The What happens to vSAN when vCenter Server is offline? blog article provides more details.

Does the vSAN iSCSI Target Service support Windows Server Failover Cluster (WSFC) configurations?

WSFC is supported with vSAN 6.7 and higher versions when using the vSAN iSCSI target service.

Additional details and considerations are outlined in the following Knowledge Base article, Guidelines for supported configurations, and the blog article, Introducing WSFC support on vSAN.

What are my options for redundancy in a vSAN stretched cluster configuration?

VMware vSAN Stretched Clusters are supported on both Hybrid configurations and All-Flash configurations.

vSAN 6.5 and previous versions mirror data across sites for redundancy. If a disk, host, or an entire site goes offline, the data is still available at the other site. When the offline disk, host, or site comes back online, data is resynchronized to restore redundancy. vSAN 6.6 also includes local failure protection. RAID-1 mirroring or RAID-5/6 erasure coding can be implemented within each stretched cluster site to provide local resiliency to disk and host failures. In addition to providing higher levels of redundancy, this minimizes production and resynchronization traffic across the intersite link.

Note: RAID-5/6 for local protection is available only with All-Flash configuration.

Does vSAN work with VMware Site Recovery Manager?

Yes,  VMware Site Recovery Manager™  is compatible with vSAN to automate data center migration and disaster recovery. vSphere Replication is used to perform per-VM replication. Site Recovery Manager is integrated with vSphere Replication.

Refer to the VMware Product Interoperability Matrices for version-specific interoperability details.

Can I use vCenter HA with vSAN Stretched Cluster?

While vCenter HA can be used with a vSAN stretched cluster (using DRS affinity/anti-affinity rules to pin the VMs to sites), a cost/benefit analysis must be done to understand whether it will work in your environment and operational model.

VCHA requires a maximum of 5ms latency between all three VMs: the primary, the secondary, and the witness. As such, placing the VCHA witness on the same site as the vSAN stretched cluster witness host would mean that site can be a maximum of 5ms away, not 200ms as is supported by the vSAN stretched cluster. If your vSAN witness site is more than 5ms away, an option is to co-locate the VCHA witness with either the primary or the secondary VCHA VM; however, this also means that if the co-located site fails, vCenter Server will be offline.

A vSAN stretched cluster is natively integrated with vSphere HA and, as such, offers automated failover and startup for all VMs on it - including vCenter Server. The benefit of VCHA over a single vCenter Server VM restarted by vSphere HA is on the order of a minute or two of startup time. If that extra startup time is acceptable, use vSphere HA and a single vCenter Server VM for operational simplicity.

If not, use VCHA with your chosen topology: either the VCHA witness on the same site as the vSAN witness (provided it is less than 5ms away), or co-located with the primary or secondary VCHA VMs, taking into consideration the failure scenarios that accompany such a co-located topology.

 

Performance

What is the Number of Disk Stripes per Object rule in a vSAN storage policy?

Setting this rule to a number other than the default of 1 instructs vSAN to stripe an object across multiple drives. For example, setting this rule to 4 instructs vSAN to stripe an object with this policy assigned across four drives. This rule can be beneficial especially in hybrid configurations when there are higher numbers of read cache misses. Since a read cache miss in a hybrid configuration causes vSAN to read directly from the capacity tier, which is magnetic disks, striping those reads across multiple drives can improve read performance. However, it is best to properly size the cache tier in a hybrid configuration rather than relying on striping objects across multiple drives to achieve better performance. In most cases, it is best to leave the striping rule at its default setting of 1 for hybrid and all-flash vSAN configurations.

A more detailed discussion is outlined in the following blog - vSAN stripes 
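
For reference, the stripe width is just another SPBM rule. The following PowerCLI sketch is illustrative only; the policy name is a placeholder, and as noted above, the default stripe width of 1 suits most cases.

```powershell
# Illustrative PowerCLI (SPBM) sketch: a policy that raises the stripe width to 4.
# The policy name is a placeholder; the default stripe width of 1 suits most cases.
$stripe  = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.stripeWidth") -Value 4
$ruleSet = New-SpbmRuleSet -AllOfRules $stripe
New-SpbmStoragePolicy -Name "Stripe4" -AnyOfRuleSets $ruleSet
```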

Why does vSAN sometimes stripe an object across multiple components even though the Number of Disk Stripes per Object rule is set to 1?

This can occur for a few reasons. The maximum component size is 255GB. If an object such as a virtual disk (VMDK) is larger than 255GB, vSAN will stripe this object across multiple components. As an example, consider a virtual disk that is 600GB. vSAN will stripe this object across three components that are approximately 200GB each in size. In this case, the components might reside on the same drive, on separate drives in the same disk group, or across multiple drives in separate disk groups and/or hosts (unlike the Number of Disk Stripes per Object rule where striped components are always striped across separate drives). Another reason vSAN might stripe an object across multiple components is to improve the balance of drive utilization across a cluster. Splitting large components into smaller components enables more flexibility in placing these components across drives with higher capacity utilization. The figure below shows a basic example of this.

 

What is the recommended way to test vSAN performance?

VMware provides a tool called HCIBench. It is essentially an automation wrapper around popular and proven synthetic test utilities. With HCIBench, you can invoke either Vdbench or Flexible I/O tester (FIO) to automate performance assessment in an HCI cluster.

HCIBench simplifies and accelerates proof-of-concept (POC) performance testing in a consistent and controlled manner. The tool fully automates the process of deploying test VMs, coordinating workload runs, aggregating test results, and collecting data for troubleshooting purposes. The output from HCIBench can be analyzed by the Performance Diagnostics feature in vSAN 6.6.1 and newer versions of vSAN. See this VMware Knowledge Base article for more information:  vSAN Performance Diagnostics (2148770) 

HCIBench can be used to evaluate the performance of vSAN and other HCI storage solutions in a vSphere environment.

Recommendation: Use HCIBench to run performance tests versus running a workload from a single VM. HCIBench can be configured to deploy and distribute multiple VMs across the hosts in an HCI cluster to provide more realistic and accurate test results.

How does vSAN minimize the impact of data resync operations when a drive or host fails?

vSAN 6.7 includes enhancements to the method used to dynamically balance and prioritize virtual machine and resync IO. It is important to maintain adequate performance while providing resources for resync operations to restore resilience. When there is contention for IO, vSAN guarantees approximately 20% of the bandwidth to resync operations while virtual machines utilize the remaining 80%. If there is no contention for bandwidth, resync operations can consume more bandwidth to reduce resync times. Virtual machines can use 100% of the bandwidth when there are no resync operations occurring.

Does vSAN require manual intervention to balance data across the hosts, disk groups or the cluster?

No. In vSAN 6.7 U3, users can configure proactive rebalancing as an automated action at the cluster level, letting the cluster balance data as it sees fit. When toggled on, this replaces the previous behavior of proactive rebalancing, which included tripping an alert in the health service followed by the need for manual intervention by the administrator to perform the rebalance.

 

Operations

What is the primary user interface (UI) utilized to configure and monitor vSAN?

The HTML5-based vSphere Client introduced with vSAN 6.7 is used to perform nearly all configuration and monitoring tasks, including the creation and assignment of vSAN storage policies. vSAN versions prior to 6.7 require the Flex-based vSphere Web Client. A single, common UI to administer vSphere and vSAN makes it easy for organizations to get to a "one team, one tool" model for managing virtualized infrastructure.

 

Is there a common dashboard to view configuration, inventory, capacity information, performance data, and issues in the environment?

VMware vSphere 6.7 and vSAN 6.7 introduce dashboards available directly in the HTML5 vSphere Client that make it easy for administrators to view key metrics at a glance. This enhancement is powered by VMware vRealize Operations. If more information is needed, a click opens the full-featured vRealize Operations user interface. A vRealize Operations license is required for full functionality. A subset of dashboards is included with a vSAN license.

How do I add storage capacity to a vSAN cluster?

Cluster Quickstart makes it easy to add compute and storage capacity to a vSAN cluster. Storage capacity can be increased in a few ways:

  • Hosts containing local storage devices can be added to a vSAN cluster. Disk groups must be configured for the new hosts after they are added to the cluster; the additional capacity is available for use once the disk groups are configured. This scale-out approach is most common and also adds compute capacity to the cluster.
  • More storage devices can be added to existing hosts, assuming there is room in the server’s chassis for these devices. After the storage devices are added, additional disk groups can be created or existing disk groups reconfigured to use the new devices. This is considered a scale-up approach.
  • Existing storage devices can be replaced with new, higher-capacity devices. Data should be evacuated from the existing storage devices before replacing them; the evacuation is performed using vSphere maintenance mode. This is also considered a scale-up approach.

Unlike traditional storage solutions, vSAN enables a “just-in-time” provisioning model. Storage and compute capacity can be quickly provisioned as needed.
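
The scale-up path can also be scripted. The following PowerCLI sketch is illustrative only; the host name and device canonical names are placeholders for your environment.

```powershell
# Illustrative PowerCLI sketch of the scale-up path: claim newly added devices
# into a new disk group. Host and device canonical names are placeholders.
$vmhost = Get-VMHost -Name "esxi01.example.com"

New-VsanDiskGroup -VMHost $vmhost `
    -SsdCanonicalName "naa.55cd2e404c000001" `
    -DataDiskCanonicalName "naa.55cd2e404c000002", "naa.55cd2e404c000003"
```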

 

How do I monitor the health of a vSAN cluster?

vSAN features a comprehensive health service appropriately called vSAN Health that actively tests and monitors many items such as hardware compatibility, verification of storage device controllers, controller queue depth, and environmental checks for all-flash and hybrid vSAN configurations. vSAN Health examines network connectivity and throughput, disk and cluster health, and capacity consumption. Proactive monitoring and alerting in vSAN Health helps ensure the environment is optimally configured and functioning properly for the highest levels of performance and availability.

Customers enabling the Customer Experience Improvement Program (CEIP) feature with vSAN 6.6 receive additional benefits through online health checks. These checks are dynamically updated from VMware’s online system as new issues are identified, knowledge base articles are created, and new best practices are discovered.

Is it possible to manage vSAN using a command-line interface (CLI)?

Yes, vSAN can be monitored and managed with PowerCLI and ESXCLI. There are also SDKs for popular programming languages that can be used with vSAN. Details and sample code can be found on  VMware {code} 
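
As a small illustration of both interfaces working together, the following sketch uses PowerCLI to select a host and then reaches the host-side "esxcli vsan" namespace via Get-EsxCli. The cluster name is a placeholder.

```powershell
# Illustrative sketch combining both CLIs: PowerCLI for vCenter-level objects,
# and Get-EsxCli -V2 to reach the host-side "esxcli vsan" namespace.
$vmhost = Get-Cluster -Name "vSAN-Cluster" | Get-VMHost | Select-Object -First 1
$esxcli = Get-EsxCli -VMHost $vmhost -V2

# Equivalent to running "esxcli vsan cluster get" on the host.
$esxcli.vsan.cluster.get.Invoke()
```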

What vSphere maintenance mode option should I use?

When a host that is part of a vSAN cluster is put into maintenance mode, the administrator is given three options concerning the data (vSAN components) on the local storage devices of that host. The option selected has a bearing on a couple of factors: The level of availability maintained for the objects with components on the host and the amount of time it will take for the host to enter maintenance mode. The options are:

  • Ensure accessibility (default)
  • Full data migration
  • No data migration

Details on how data is handled are provided in the vSAN documentation. In summary, the default option, “Ensure accessibility,” is used when the host will be offline for a shorter period of time - for example, during maintenance such as a firmware upgrade or adding memory to a host. “Full data migration” is typically appropriate for longer periods (hours or days) of planned downtime, or when the host is being permanently removed from the cluster. “No data migration” commonly allows the host to enter maintenance mode in the shortest amount of time. However, any objects with “Primary Level of Failures to Tolerate” (PFTT) set to zero that have components on the host going into maintenance mode are inaccessible until the host is back online.
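
These same options are available from PowerCLI. The following sketch is illustrative only; the host name is a placeholder.

```powershell
# Illustrative PowerCLI sketch: enter maintenance mode with the default vSAN
# data-handling option. Valid -VsanDataMigrationMode values are Full,
# EnsureAccessibility, and NoDataMigration.
Set-VMHost -VMHost "esxi01.example.com" -State Maintenance `
    -VsanDataMigrationMode EnsureAccessibility
```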

How would I know what Virtual Machines or objects would be impacted when a host enters maintenance mode?

Before moving a host into maintenance mode, an administrator can use the Data Migration Pre-Check feature to assess the impact of the chosen maintenance mode option. With 6.7 U3, significant updates were made to this feature to provide a detailed cluster-wide analysis of the data movement triggered by maintenance mode.

As a result, you can simulate a maintenance mode action and ascertain the impact on object compliance and accessibility, cluster capacity, and predicted health.

Does vSAN include an option to upload information about my environment to help improve the support experience when an issue occurs?

Yes, vSAN 6.6 and newer versions enable customers to upload anonymous information about their environments to VMware, which provides several benefits including:

  • Time spent gathering data is reduced when a support request (SR) is opened with VMware Global Support Services (GSS). A GSS Technical Support Engineer (TSE) can utilize vSAN Support Insight to view current and historic data about a customer’s environment and start troubleshooting efforts sooner, which leads to faster resolution times.
  • vSAN online health checks identify issues specific to a customer’s environment and suggest resolutions. These online health checks can also recommend changes that adhere to VMware best practices.
  • Options for proactive support calls depending on the tier of Support and Subscription (SnS) purchased.
  • VMware receives anonymous data from a large number of environments that can be utilized to identify trends, potential software bugs, and better understand how products are used. Bug fixes can potentially be developed faster, and improvements are implemented to provide a better overall customer experience.

Note: This requires participation in the VMware Customer Experience Improvement Program (CEIP).

 

Is it possible to migrate a vSAN hybrid configuration to an all-flash configuration?

Yes, a vSAN hybrid configuration can be migrated to an all-flash configuration. See the Administering VMware vSAN documentation.

Can a standard vSAN cluster be converted to a vSAN stretched cluster?

Yes, it is easy to convert a standard (non-stretched) vSAN cluster to a stretched cluster. This is performed in the “Fault Domains & Stretched Cluster” section of the vSAN UI. More details can be found in the vSAN Stretched Cluster Guide.

How do I upgrade vCenter Server for a cluster where vSAN is enabled?

Upgrading vCenter Server in a vSAN environment is similar to any other vSphere environment. Follow the process discussed in the vSphere Upgrade Guide, which can be found in VMware Documentation.

 Recommendation: Read the vSphere Upgrade Guide and product release notes prior to performing an upgrade.

What is the "vCenter state is authoritative" health check?

This check verifies that all hosts in the vSAN cluster are using a vCenter Server as the source of truth for the cluster configuration. This includes the vSAN cluster membership list. During normal operation, vCenter Server publishes the latest host membership list and updates the configuration for all hosts in the cluster. This health check reports an error if the vCenter Server configuration is not synchronized with a member host, and is no longer accepted as the source of truth. See this VMware Knowledge Base article for more information:  vSAN Health Service - Cluster health – vCenter state is authoritative (2150916) 

Does data need to be migrated or “rehydrated” when adding a disk to a disk group with deduplication and compression enabled?

vSAN does not migrate or “rehydrate” deduplicated and compressed data when a disk is added to a disk group. The configuration change takes only a few mouse clicks in the vSAN UI, and the additional capacity is available for use immediately. This makes it very easy to incrementally add capacity to a vSAN cluster with no disruption.

Can vSAN work with VMware vSphere Update Manager (VUM)?

Yes, administrators can use vSphere Update Manager (VUM) to manage the vSAN lifecycle. VUM generates automated build recommendations for vSAN clusters that align with the VMware Compatibility Guide for the specific hardware. From 6.7 U3 onwards, administrators can customize vSAN cluster baseline recommendations to update/patch the current version, upgrade to the latest version, or not show any recommendations.

Information in the VMware Compatibility Guide and vSAN Release Catalog is combined with information about the currently installed ESXi release. The vSAN Release Catalog maintains information about available releases, preference order for releases, and critical patches needed for each release. When a new, compatible update becomes available, a notification is proactively displayed in vSAN Health. This eliminates the manual effort of researching and correlating information from various sources to determine the best release for an environment.

Administrators can use the Remediate option in vSphere Update Manager to perform a rolling upgrade of the cluster. vSphere Update Manager migrates virtual machines from the host being upgraded to other hosts in the cluster with no downtime.

What is vSphere Lifecycle Manager (vLCM)?

vLCM is the next-generation solution for core vSphere/HCI lifecycle operations. In vSphere 7.0, vLCM defines lifecycle operations as software and firmware patching/upgrades. vLCM provides a powerful new framework based on a desired-state model to deliver simple, reliable, and consistent lifecycle operations for vSphere/HCI clusters.

vLCM is available in vSphere 7.0. It provides net new value and was designed to address many customer pain points and architectural inconsistencies with previous lifecycle solutions.

vSphere Lifecycle Manager is an integrated utility that can be used to manage the server lifecycle, including the ESXi host, driver, and firmware versions. vLCM is based on a desired-state model that continuously monitors the state for any deviations and enables a remediation workflow when there is drift. This helps provide a consistent and simplified methodology to maintain the lifecycle of a server stack that enables HCI.

The following blog article has more detailed information about vLCM: Top new features - vSAN 7 - vLCM

This video demonstrates the functionality: vLCM Demo

Does vLCM work only with vSAN ReadyNodes or can it be used with Build Your Own (BYO) hardware?

vLCM also works with BYO Dell 14th-generation and HPE Gen10 hardware listed in the VMware Compatibility Guide (VCG); it is not restricted to ReadyNodes.

What’s the difference between vLCM and VUM?

VMware’s long-standing solution for upgrade and patch management of the hypervisor is called vSphere Update Manager, or VUM. VUM is a good way for an administrator to manage the upgrade process of the hypervisor across the various hosts living in the data center. Its focus is primarily on upgrade and patch management of the hypervisor residing on the host. While it performs well, it was designed at a time when component drivers and firmware were not as mature as they are today, and it has limited ability to connect to the ecosystems of vendors responsible for firmware and some drivers. Because of this, VUM never properly emphasized this part of the lifecycle management of the host - an important aspect of proper maintenance in today’s environment. As a result, administrators have to manage server firmware (i.e., BIOS, storage I/O controllers, disk devices, NICs, and remote management tools) separately. While VUM was a step in the right direction, it lacked full server-stack management.

| Management Tasks | vSphere Update Manager | vSphere Lifecycle Manager |
|---|---|---|
| Upgrade and patch ESXi hosts | Yes | Yes |
| Install and update third-party software on hosts | Yes | Yes |
| Upgrade virtual machine hardware and VMware Tools | Yes | Yes |
| Update firmware of all ESXi hosts in a cluster | No | Yes |
| One desired image to manage entire cluster | No | Yes |
| Check hardware compatibility of hosts against vSAN Hardware Compatibility List | Yes | Yes |
| Checks for drift detection | No | Yes |

How can I install or use vLCM?

vSphere Lifecycle Manager is a service that runs in vCenter Server. Upon deploying the vCenter Server appliance, the vSphere Lifecycle Manager user interface is automatically enabled in the HTML5-based vSphere Client. To enable full server-stack firmware management, third-party vCenter plugins from the respective hardware vendors must be procured and installed.

What qualifies as the “full server-stack” firmware?

vLCM is able to manage the full server stack of firmware for vendor-supplied and supported hardware components. Typically, the list of hardware components includes BIOS, storage I/O controllers, disk devices, NICs, BMC (iLO/iDRAC), etc. The list of supported components varies by vendor but should be the same as what their products can manage today without vLCM. Only vendor-supplied hardware components are supported, so “off the shelf” components such as GPUs or consumer-grade devices will not be supported.

Does vLCM require homogeneous servers to automate lifecycle management?

A cluster managed by vLCM must contain servers from the same vendor. Keep in mind that the entire cluster is managed by one image, which is validated against all servers in the cluster. As an example, you might have a mix of server models in a cluster that are both “vLCM Capable ReadyNodes”. If the drivers and firmware for both models are in the same firmware package or repository, vLCM can manage them with one image. If not, the servers should be moved to a separate cluster.

What is drift detection?

vLCM can be used to apply a desired image at the cluster level, monitor compliance, and remediate the cluster if there is drift. This eliminates the need to monitor compliance for individual components and helps maintain a consistent state for the entire cluster in adherence to the VCG. Drift can occur when software, firmware, or drivers are either older or newer than the versions specified in the vLCM desired image.

Is vLCM related or the same thing as vRLCM?

No. vRLCM is the lifecycle manager for vRealize products and is independent of vLCM. vLCM is focused on the core vSphere/HCI platform and vRLCM is focused on the vRealize layer of the SDDC. vRLCM can be used independently of vLCM.

Who are the supported server vendors and platforms for firmware management with vLCM in vSphere?

For vSphere 7.0, the first two vendors supporting vLCM are Dell and HPE. VMware is working with several other large server vendors to provide vLCM support. Be sure to check the vSAN Compatibility Guide to see a full list of vLCM Capable ReadyNodes.   

What server software is required?

vSphere Lifecycle Manager is a service that runs in vCenter Server and uses the embedded vCenter Server PostgreSQL database. No additional installation is required to start using vLCM to manage ESXi and Vendor Addons. To take advantage of the full server-stack firmware management capabilities of vLCM, a server vendor plugin called a Hardware Support Manager (HSM) must be installed. For example, the Dell HSM is called OpenManage Integration for VMware vSphere (OMIVV), and for HPE the HSM is called the iLO Amplifier Pack.

Does vLCM check my servers for hardware compatibility?

When you initiate a hardware compatibility check for a cluster, vSphere Lifecycle Manager verifies that the components in the image are compatible with all storage controllers on the hosts in the cluster as per the vSAN VCG. vLCM scans the image and checks whether the physical I/O device controllers are compatible with the ESXi version specified in the image.

Does vLCM automatically update my servers?

No. From the Updates tab at the cluster level, click Check Compliance to initiate a validation of the desired image. vLCM will report any discrepancy or drift compared to the image. In addition, when any change is made to a desired image, vLCM will automatically validate the image and check compliance. When hosts are out of compliance, you can either right-click a single host and click Remediate, or click Remediate All.

Does vLCM work with VMware Cloud Foundation (VCF)?

Yes. VMware Cloud Foundation 4 supports vLCM with the following considerations:

  • Workload domains are supported with vLCM, but management domains must still use vSphere Update Manager (VUM).

  • When provisioning new workload domains you choose either Kubernetes or vLCM. vLCM cannot manage a Kubernetes workload domain in VCF 4.

  • NSX-T and vLCM are supported for workload domains. NSX-T with vLCM is not supported outside of VCF. 

What is a “desired state model” and what are the benefits?

vLCM is based on a desired state or declarative model (similar to Kubernetes) that allows the user to define a desired image (ESXi version, drivers, firmware) and apply to an entire vSphere cluster. Once defined and applied, all hosts in the cluster will be imaged with the desired state. Managing at the cluster level is a superior model compared to individual hosts as it provides consistency and simplicity.

A vLCM desired state image consists of the following elements:

  • ESXi Version (required)

  • Vendor Addon (optional)

  • Firmware and Drivers Addon (optional)

Are “air-gap” or offline vCenter environments supported with vLCM?

Yes. ESXi versions and Vendor Addons are stored in Update Manager’s internal depot (either downloaded directly from VMware for online systems, or manually uploaded by admins for air-gapped (“dark site”) systems).

Is NSX-T supported?

NSX-T with vLCM is supported in VMware Cloud Foundation for workload domains. In vSphere 7.0, NSX-T with vLCM is not supported outside of VCF. 

 

Space Efficiency

Does vSAN support TRIM/UNMAP?

Guest operating systems use TRIM (ATA) and UNMAP (SCSI) commands to reclaim space that is no longer in use. This helps the guest operating systems be more efficient with storage space usage. vSAN 6.7 U1 and higher versions have full awareness of TRIM/UNMAP commands sent from the guest OS and can reclaim the previously allocated storage as free space. This is an opportunistic space efficiency feature that can deliver better storage capacity utilization in vSAN environments.

Can I enable data services such as Deduplication and Compression on an existing vSAN cluster with data?

Yes, data services can be enabled on a newly provisioned cluster or an existing cluster with data. It is important to note that enabling data services requires an on-disk format change on all the disk groups. Hence, in the case of existing clusters, vSAN automatically evacuates data from disk groups and performs a rolling disk format change before enabling a data service.

 Recommendation: Enable data services at the time of provisioning to avoid unnecessary data migration. If you need to enable them on an existing cluster, ensure that there is sufficient slack space to accommodate the data evacuation.
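
For reference, this can also be done from PowerCLI. The following sketch is illustrative only; the cluster name is a placeholder, and parameter names reflect recent PowerCLI releases, so verify them against your version's documentation.

```powershell
# Illustrative PowerCLI sketch: enable deduplication and compression on an
# existing cluster. -SpaceEfficiencyEnabled covers dedupe and compression;
# -AllowReducedRedundancy (see the next question) lets the rolling reformat
# proceed with temporarily reduced redundancy if free capacity is tight.
Get-Cluster -Name "vSAN-Cluster" |
    Set-VsanClusterConfiguration -SpaceEfficiencyEnabled $true `
        -AllowReducedRedundancy $true
```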

What does “Allow Reduced Redundancy” do when enabling or disabling deduplication and compression or encryption?

When deduplication and compression or encryption are enabled or disabled, vSAN performs a rolling reformat of each disk group. All data on a disk group must be evacuated to other disk groups in the cluster before the reformat process can proceed.

By default, vSAN will ensure data is compliant with storage policies during the operation. If there is not enough free capacity in other disk groups, the operation will fail. Clicking “Allow Reduced Redundancy” when enabling deduplication and compression or encryption allows vSAN to reduce the number of copies of data temporarily, if needed, to complete the requested operation.

“Allow Reduced Redundancy” is more commonly required in small vSAN clusters such as three or four hosts with one disk group each. This option might also be required if the free capacity in the cluster is low.

How do deduplication and compression affect performance?

As with any storage solution that offers space efficiency features, there can be a slight trade-off in performance and/or additional resource utilization. The latest flash device and CPU technologies mitigate the vast majority of perceivable performance impacts when deduplication and compression are enabled. vSAN 6.7 U3 introduced performance enhancements that can specifically benefit deduplication and compression enabled clusters. These changes offer more consistent and predictable performance, especially with sequential writes. However, latency-sensitive applications should be tested prior to production deployment on any storage platform including vSAN where space efficiency features such as deduplication, compression, and erasure coding are enabled.

What is Compression-only and how is it different from the Deduplication and Compression feature?

Compression-only is a new feature introduced with vSAN 7 Update 1. It is a cluster-wide feature that can be enabled in the same way as Deduplication and Compression. Compression-only enables customers to take advantage of space efficiency techniques for workloads that may not necessarily benefit from deduplication. Furthermore, it provides customers with increased granularity to strike the right balance between performance and space efficiency.

What is the impact of a failed drive when Compression-only is enabled?

Compression-only reduces the failure domain to individual capacity drives. If a capacity drive fails in a disk group, only that drive is impacted; the disk group continues to operate with the remaining healthy capacity drives.

 

HCI Mesh

What is HCI Mesh?

HCI Mesh is a new feature introduced with vSAN 7 Update 1 that uses a unique software-based approach to disaggregate compute and storage resources. HCI Mesh brings together multiple independent vSAN clusters in a native, cross-cluster architecture that disaggregates resources and enables utilization of stranded capacity. The basic functional premise is to allow one or more vSAN clusters (clients) to remotely mount datastores from other vSAN clusters (servers) within the vCenter inventory. This approach maintains the essence and simplicity of HCI while greatly improving agility.

What key capabilities does HCI Mesh add to a vSAN-backed HCI?

HCI Mesh allows storage consumption across vSAN clusters, which helps utilize stranded capacity between clusters. It also gives administrators and architects the ability to scale compute and storage independently, easing design and operational complexity.

How is HCI Mesh different from composable/modular infrastructures?

HCI Mesh uses a software-based approach to disaggregation that can be implemented on any certified hardware. Composable infrastructure is based on a hardware-centric approach that requires specialized hardware.

Which protocol and data path does HCI Mesh use?

HCI Mesh utilizes vSAN's native protocol and data path for cross-cluster connections, which preserves the vSAN management experience and provides the most efficient and optimized network transport. For example, storage policy-based management (SPBM) and the vSAN management interfaces are available on the client cluster, where the remote vSAN datastore essentially resembles a local vSAN datastore.

Can remote vSAN datastores be mounted on compute-only clusters or vSAN clusters with no local storage?

All clusters participating in a mesh topology must be vSAN-enabled and have a local vSAN datastore.

Are there any scalability limitations with HCI Mesh?

A client cluster can mount a maximum of 5 remote vSAN datastores, and a server cluster can export its datastore to a maximum of 5 client clusters.

Can VMs be provisioned to span across multiple remote vSAN datastores?

All VMDKs for a VM must reside on a single datastore which can be either local or remote.

Does HCI Mesh work with other vSAN features?

HCI Mesh supports all vSAN features except Data-in-Transit encryption and Cloud Native Storage features. HCI Mesh is not supported in stretched cluster and 2-node deployments. Additionally, HCI Mesh does not support remote provisioning of File Services shares and iSCSI volumes; these services can be provisioned locally on clusters participating in a mesh topology, but not on a remote vSAN datastore.

What are the network recommendations for implementing HCI Mesh?

Network requirements and best practices are very similar to vSAN Fault Domain configurations where data traffic travels east-west across multiple racks in the data center. In general, low-latency, high-bandwidth network topologies are recommended for optimal performance. Sub-millisecond latency between two clusters is recommended for the most optimal workload performance; however, higher latencies are supported for workloads that are not latency sensitive. L2 or L3 topologies are supported, similar to other vSAN configurations.

General best practices and recommendations:

  • Design for redundancy everywhere for highest availability (multiple links, NICs, TOR/spine switches, etc.)
  • Use 25Gbps NICs or higher and storage-class switches
  • Use NIOC with vSphere Distributed Switches
  • LACP offers benefits but is operationally complex; LBT or Active/Passive failover is a simpler alternative

Note: Multiple vSAN VMkernel interfaces (i.e., an air-gapped network topology) are not supported.

Are there any availability considerations with HCI Mesh?

HCI Mesh uses existing vSphere HA and vSAN availability concepts to provide both compute and storage high availability. vSphere HA provides compute availability on the client cluster, and vSAN storage policies configured with FTT=N provide storage availability on the server cluster. If the network connection between a client and server cluster is severed, the remote vSAN datastore on a client host enters an All Paths Down (APD) state 60 seconds after the host becomes isolated from the server cluster. After that, vSphere follows the standard HA mechanisms for APD events and attempts to restart VMs after a 180-second delay.

Is cross-cluster vMotion (without storage vMotion) supported with HCI Mesh?

Yes. Virtual machines can be migrated between two clusters that share a datastore; this is fully supported with HCI Mesh.

 

File Service

How does file service work with vSAN?

vSAN File Services is powered and managed by the vSphere platform, which deploys a set of containers on each of the hosts. These containers act as the primary delivery vehicle for provisioning file services and are tightly integrated with the hypervisor.

Can I run Virtual Machines on top of a vSAN File service NFS share?

No, it is not supported to mount NFS to ESXi for the purpose of running virtual machines. The NFS shares may be used to mount NFS directly to virtual machines running on the vSAN cluster, but may not be used to store VMDKs for running virtual machines. vSAN NFS shares may be consumed by other VMware products such as VMware Cloud Director's NFS transfer file share.

What is the minimum number of hosts to configure file service?

A minimum of 3 hosts is required to configure file services, although the service continues to run with as few as 2 remaining hosts. vSAN File Services auto-scales at 1 container per host, up to 8 containers per cluster.

Is vSAN File Service supported on a stretched cluster or 2 node cluster?

With vSAN 7 and vSAN 7 Update 1, vSAN File Service is not supported on stretched clusters and 2 node deployments.

What is the estimated resource overhead per host?

Each instance is configured with 4 GB of RAM and 4 vCPUs. By default, there are no reservations applied to the resource pool associated with the entities required for vSAN File Service.

How is vSAN File Service monitored?

vSAN File Service can be monitored with vSAN Skyline Health. A health check group called “File Service - Infrastructure Health” monitors several parameters and includes an automated remediation option.
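Cluster health, including these file service checks, can also be queried from PowerCLI. A minimal sketch using the Test-VsanClusterHealth cmdlet ('vSAN-Cluster' is a placeholder name):

    # Run the full set of vSAN health checks for the cluster; the results include
    # the File Service infrastructure checks on clusters where file services are enabled.
    Test-VsanClusterHealth -Cluster (Get-Cluster -Name 'vSAN-Cluster')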

Does vSAN File Service require a specific vSphere licensing edition or feature?

No. However, if DRS is available, vSAN File Service creates a resource pool for its virtual machines; if the DRS license feature is missing, no resource pool is created.

What protocols and authentication methods are supported?

NFSv3, NFSv4.1, SMBv2.1, and SMBv3 are supported. Both NFS and SMB file shares can use Kerberos-based authentication when using Microsoft Active Directory.

 

Do I need to migrate or manage file services?

No. vSAN automatically manages the File Service VMs. When a host enters maintenance mode, its containers are automatically shut down and removed, and they are recreated once the host is no longer in maintenance mode.

Do I need to create or add VMDKs or objects to expand vSAN File Service storage?

No. vSAN File Service scales elastically and creates additional components as needed, without any manual intervention.

How can I limit the consumption of vSAN file shares?

Soft and hard share quotas can help manage capacity consumption (a hedged PowerCLI sketch follows the list below).

  • Hard quotas prevent users from writing data to disk once the limit is reached. Hard quotas strictly limit the user’s disk space, and no users are granted exceptions; once users reach their quota, they must request additional space before they can write more data.
  • Soft quotas send alerts when users are about to exceed their allotted disk space. Unlike hard quotas, there is no physical restriction that prevents users from saving their data. However, the alerts can support a corporate policy that helps manage data growth.
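As a hedged sketch, quotas might be assigned when creating a share from PowerCLI. New-VsanFileShare exists in recent PowerCLI releases, but the quota parameter names below are assumptions, and required parameters such as protocol and storage policy are omitted for brevity; verify against the PowerCLI reference:

    # Hedged sketch: create a file share with soft and hard quotas.
    # -SoftQuotaGB and -HardQuotaGB are assumed parameter names; verify before use.
    New-VsanFileShare -Name 'projects' -Cluster (Get-Cluster -Name 'vSAN-Cluster') `
        -SoftQuotaGB 80 -HardQuotaGB 100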
Can I provision file shares to Cloud-Native workloads?

Yes, vSAN File Services can be used to provision file shares to container workloads as well as traditional workloads.

How do NFS Shares recover from host failure or migrate during upgrades?

vMotion and vSphere HA are not used for migration or failure recovery. Services within vSphere monitor for failures and maintenance activities and drive the relocation of file services.

The containers powering vSAN File Services automatically restart on other hosts, independent of vSphere HA.

By default there is up to 1 container per host, but additional containers will run on a single host when a host (or hosts) have failed. When a host enters maintenance mode, the container powering a given share or group of shares is recovered on a different host.

How is vSAN File Services deployed?

vSAN File Services is deployed from the Cluster --> Configure --> vSAN Services menu. 

 

How are file shares created?

File shares are created from the Cluster --> Configure --> vSAN --> File Shares menu, where you specify the share name, protocol, storage policy, and optional quotas.

How is the vSAN File Service updated?

An updated OVF can be downloaded automatically or uploaded manually to the vCenter Server managing the cluster. A non-disruptive rolling upgrade then proceeds across the cluster, replacing the old containers with the new version.

 

Cloud-Native Storage

What is Cloud-Native Storage?

Cloud-Native Storage (CNS) is a term used to describe the storage that can be provisioned to Cloud-Native Applications (CNAs). These CNAs are typically containerized, deployed and managed by a container orchestrator like Kubernetes, Mesos, Docker Swarm, etc. The storage consumed by such apps could be ephemeral or persistent, but in most cases, it is required to be persistent.

What is a Container Storage Interface (CSI)?

Container Storage Interface (CSI) is a standardized API developed for container orchestration platforms to interface with storage plugins. This API framework enables vSAN and vVols to provision persistent volumes to Kubernetes-based containers running on vSphere.

Can vSAN datastore be used to provision persistent storage for a Kubernetes cluster?

Yes. With the release of vSAN 6.7 U3, vSAN supports provisioning persistent volumes to Kubernetes-based workloads. A brief walkthrough is available here - Cloud-Native Storage on vSAN. vSAN 7 extends this capability to also support file-based persistent volumes.

How can Kubernetes administrators provision appropriate storage intended for respective containers on vSAN?

Kubernetes administrators simply associate the "storage classes" of the respective containers with storage policies. vSAN uses standard Storage Policy Based Management (SPBM) to provision persistent volumes on the vSAN datastore.
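For example, a Kubernetes StorageClass can reference a vCenter storage policy by name through the vSphere CSI driver. A minimal sketch, where the policy name "vSAN-Gold" is a placeholder:

    # storageclass.yaml -- maps a Kubernetes StorageClass to a vCenter SPBM policy:
    #   apiVersion: storage.k8s.io/v1
    #   kind: StorageClass
    #   metadata:
    #     name: vsan-gold
    #   provisioner: csi.vsphere.vmware.com
    #   parameters:
    #     storagepolicyname: "vSAN-Gold"   # placeholder storage policy name
    kubectl apply -f storageclass.yaml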

What is the vSAN Data Persistence platform (DPp)?

vSAN 7 Update 1 introduces the vSAN Data Persistence platform (DPp), a new framework that simplifies the deployment and operation of modern applications built on cloud-native architectures. Modern databases and stateful services plug into the framework through APIs to maintain efficient persistence of these distributed applications and their data.

What is VMware vSAN Direct Configuration?

vSAN Direct Configuration is a new feature introduced with vSAN 7 Update 1 that provides optimized data storage for shared-nothing applications (SNA), such as Cassandra and MongoDB, that do not need data services. The applications are administered through the same vCenter management interface but use an optimized data path to access designated storage devices on the hosts in the vSAN cluster; these devices are not part of the vSAN datastore.

Where would I download or procure applications that can benefit from the vSAN Data Persistence Platform (DPp)?

The vSAN Data Persistence Platform (DPp) provides a framework for software technology partners to integrate with VMware infrastructure. The specific partners develop and provide the relevant software.

What are the licenses required to use features such as Encryption, File Service, or the vSAN Data Persistence Platform (DPp)?

The vSAN Licensing Guide is updated with the various feature offerings and their licensing requirements.

 

Security

Is encryption supported with vSAN?

Yes, vSAN supports Data-at-Rest Encryption using an AES-256 cipher. Data is encrypted after all other processing, such as deduplication, is performed. Data-at-rest encryption protects data on storage devices in case a device is removed from the cluster.

With vSAN 7 Update 1, vSAN introduces a Data-in-Transit Encryption capability. This feature complements Data-at-Rest Encryption and securely encrypts all vSAN traffic that travels between hosts.

Does vSAN Encryption require specialized hardware?

No, vSAN Encryption does not require any specialized hardware such as self-encrypting drives (SEDs). Some drives on the vSAN Compatibility Guide may have SED capabilities, but the use of those SED capabilities is not supported.

What are the prerequisites to enable vSAN Data-at-rest Encryption?

A Key Management Server (KMS) is required to enable and use vSAN encryption. Nearly all KMIP-compliant KMS vendors are compatible, with specific testing completed for vendors such as HyTrust®, Gemalto®, Thales e-Security®, CloudLink®, and Vormetric®.

Turning on encryption is a simple matter of clicking a checkbox. Encryption can be enabled when vSAN is enabled or after, with or without virtual machines (VMs) residing on the datastore.

Recommendation: Do not run the VMs that comprise a KMS cluster on the encrypted vSAN datastore.
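As a hedged PowerCLI sketch, registering a key provider and enabling encryption might look like the following. Add-KeyManagementServer, Get-KmsCluster, and Set-VsanClusterConfiguration are PowerCLI cmdlets, but the exact parameter sets should be verified against your release; all names, addresses, and ports are placeholders:

    # Register a KMIP-compliant key provider with vCenter (name, address, and port are placeholders).
    Add-KeyManagementServer -Name 'kms-01' -Address 'kms01.example.com' -Port 5696

    # Enable data-at-rest encryption on the cluster using the registered KMS cluster.
    Set-VsanClusterConfiguration -Configuration 'vSAN-Cluster' -EncryptionEnabled $true `
        -KmsCluster (Get-KmsCluster -Name 'kms-01')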

Should I deploy a Key Management Server (KMS) on the vSAN datastore that relies on that same KMS for key management?

No, this is not recommended. When a vSAN host with encryption enabled is restarted, it requests a new Host Key and Key Encryption Key (KEK) from the KMS. If the KMS is not online to provide these keys, the host will not be able to read the encrypted data. This creates a circular dependency resulting in no access to encrypted data. 

See this blog article for more information: Understanding vSAN Encryption – KMS Server Accessibility.

What is Data-in-transit encryption?

Data-in-transit encryption is a new cluster-wide feature introduced with vSAN 7 Update 1. Administrators can choose to enable it independently or along with Data-at-rest encryption.

Data-in-transit encryption securely encrypts vSAN data traffic that travels between hosts, using FIPS 140-2 validated cryptographic modules.

Does Data-in-transit encryption require a Key Management Server (KMS)?

No, Data-in-transit encryption does not require a Key Management Server (KMS).

 

What happens when a vCenter managing a vSAN datastore with encryption enabled is permanently offline?

There is no impact to the virtual machines running on the vSAN datastore with encryption enabled. After vSAN Encryption is configured, vSAN hosts communicate directly with the Key Management Server (KMS) cluster. If the original vCenter Server cannot be recovered, a new vCenter Server should be deployed as soon as possible.

See this blog article for more information: What happens to vSAN when vCenter Server is offline? 

This blog article provides more details on how communication occurs between vSAN hosts, vCenter Server, and the KMS cluster: Understanding vSAN Encryption – Booting when vCenter is Unavailable

Also, see this VMware Knowledge Base article: Moving a vSAN cluster from one vCenter Server to another (2151610).

What is the impact to the VMs running on a vSAN datastore with encryption enabled if the Key Management Server (KMS) goes offline?

The Key Encryption Key (KEK) is cached in each ESXi host's memory at boot. Hence, there is no impact to the virtual machines as long as the hosts remain powered on. If the hosts are restarted, the encrypted disk groups are unmounted and cannot be mounted until the KMS is restored.

How does vSAN Encryption differ from VM Encryption?

vSAN encryption operates at the storage level and encrypts the vSAN datastore.  

VM Encryption operates on a per-VM basis and encrypts I/O in flight, i.e., it encrypts I/O as it is generated by the VM.

Note: VMs encrypted with vSphere VM encryption can be deployed to a vSAN datastore just like other datastore types such as VMFS and NFS. However, vSAN space efficiency features such as deduplication and compression will provide little benefit with these encrypted VMs.

Do items such as backup and recovery, vSphere Replication, and so on work with vSAN encryption?

Yes. vSAN encryption was designed to maintain compatibility with other vSAN and vSphere features, as well as 3rd-party products including data protection solutions. Data is encrypted or decrypted just above the physical storage device layer. APIs such as vSphere Storage APIs for Data Protection (VADP) and vSphere APIs for IO Filtering (VAIO), which are used for data protection and other solutions, sit higher in the storage stack where data is not yet encrypted. Therefore, compatibility with these solutions is maintained when vSAN encryption is enabled.

Is two-factor authentication supported with vSAN?

Yes. Two-factor authentication methods, such as RSA SecurID® and Common Access Card (CAC), are supported with vSAN, vSphere, and vCenter Server.

Is vSAN part of a DISA STIG?

Yes, VMware vSAN is part of the VMware vSphere STIG Framework. The DISA STIG defines secure installation requirements for deploying vSAN on DoD networks. VMware worked closely with DISA to include vSAN in the existing vSphere STIG. See the Information Assurance Support Environment web site for details.

Has vSAN achieved FIPS 140-2 certification?

On December 05, 2017, the VMware VMkernel Cryptographic Module achieved FIPS 140-2 validation under the National Institute of Standards and Technology (NIST) Cryptographic Module Validation Program (CMVP).

vSAN also consumes the VMware VMkernel Cryptographic Module when providing data-at-rest encryption thanks to the tight integration between vSAN and the ESXi kernel. When vSAN Encryption is enabled, the vSAN datastore is encrypted with FIPS Approved AES-256 utilizing the validated VMware VMkernel Cryptographic Module. This delivers FIPS compliance without the need for costly self-encrypting drives (SEDs).

How can drives used in a vSAN cluster be safely decommissioned, removing any residual data?

vSAN 7 Update 1 introduces tools to securely wipe storage flash devices decommissioned from a vSAN cluster. This is done through a set of PowerCLI commands (or API), providing an efficient and secure methodology to erase data in accordance with NIST standards.
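The exact cmdlets are documented in the PowerCLI reference; the sketch below is illustrative only. Get-VsanDiskGroup and Get-VsanDisk are real PowerCLI cmdlets, but Invoke-VsanSecureWipe is a hypothetical stand-in for the actual secure-wipe command, and the cluster name is a placeholder:

    # Illustrative sketch only: identify a decommissioned device and wipe it.
    # Invoke-VsanSecureWipe is a hypothetical name; consult the PowerCLI reference
    # for the actual vSAN 7 U1 secure-wipe cmdlets.
    $disk = Get-VsanDiskGroup -Cluster (Get-Cluster -Name 'vSAN-Cluster') |
        Get-VsanDisk | Select-Object -First 1
    Invoke-VsanSecureWipe -Disk $disk   # hypothetical cmdlet name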

 

Miscellaneous

Where can I find vSAN blog articles?

Virtual Blocks contains a wealth of information on HCI, vSAN, disaster recovery, Site Recovery Manager, external storage, vSphere Virtual Volumes, and other topics related to storage and availability. The blog articles cover technical features and benefits, use cases, how-to guidance, recommendations, and more to help you get the best possible experience from these solutions.

What are the licenses required to use various features of vSAN?

vSAN license editions include Standard, Advanced, Enterprise, and Enterprise Plus. Based on the features required, the relevant licenses need to be obtained. The following guide provides detailed insight into the topologies, features, and their associated license editions: vSAN Licensing Guide

 
