Latency Round Robin Path Selection Policy

Latency PSP Deep Dive

Problem

 With the traditional path selection policies (PSPs), there is no logic or intelligence in selecting the optimal path(s). Many of our storage partners use the Round Robin PSP as it usually provides the best performance and failover. With the default RR IOPS limit of 1000, the next path isn’t used until 1000 IOs have been sent down the current path. Some storage vendors recommend changing this value to 1, which rotates to the next path after every IO. This can help with performance and failover but still does not apply any logic or intelligence. Consequently, if a path has higher latency or possibly hardware issues, that path will still be issued IO. Herein lies the problem.
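For reference, the value being changed is the per-device Round Robin IOPS limit. A generic example of setting it to 1 (with <Device_ID> as a placeholder; use the value your storage vendor actually recommends) looks like this:

esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=iops --iops=1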

 

 

Solution

 In vSphere 6.7 U1, we introduced a new sub-policy for the Round Robin (RR) Path Selection Policy (PSP), VMW_PSP_RR, which actively monitors paths for latency and outstanding IO (OIO). The policy considers path latency and OIO on each active, available path. This is accomplished via an algorithm that monitors the paths and calculates the average latency per path based on time and the number of OIOs. When the latency sub-policy is used, the logic monitors 16 IOs per path to calculate a working average, resulting in a value for each path. Once a sampling period has completed, the remaining IOs are directed to the path(s) the algorithm calculates to have the least latency. Using this latency mechanism, the Round Robin policy dynamically selects the optimal path(s), achieving better load balancing as well as better performance.

 

 

Details

 When the latency sub-policy is enabled, the algorithm looks at all available paths, sampling OIO and latency for each. In the diagram below, path 2 has 2 OIO and a latency of 5ms, and path 1 has 5 OIO and a latency of 1ms. Using the calculation P(avgLatency) = (Completion_time - Issue_time) / P(sampling IO count), the result is that path 1 is chosen to send IO for that sample cycle.

Latency PSP diagram
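To make the calculation concrete, here is a hedged worked example using the numbers from the diagram; the per-path total service times are invented purely so the averages match, and the 16-IO sample size is the default described above.

P1(avgLatency) = 16 ms total (completion - issue) / 16 sampled IOs = 1 ms
P2(avgLatency) = 80 ms total (completion - issue) / 16 sampled IOs = 5 ms

Since 1 ms < 5 ms, path 1 is selected for the remainder of that sample cycle, even though it has more OIO.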

 

Diving into the logic a little more, you can see in the diagram below how it works.

Legend:

  • T = Interval after which sampling should start again
  • m = Sampling IOs per path
  • t1 < t2 < t3  ---------------> 10ms < 20ms < 30ms
  • t1/m < t2/m < t3/m -----> 10/16 < 20/16 < 30/16

 

 In our testing, we found that the latency policy maintained almost full throughput even with up to 100ms of latency introduced on half of the paths.

 

 

Testing Results

Random 100% reads, 32 OIO, 2 Workers, Delay introduced on 2 paths of 4.

Graphs: 100% read IOPS, 100% read latency, 100% read CPU cost

 

Random 70% read / 30% write, 32 OIO, 2 workers, Delay introduced on 2 paths of 4.

Graphs: 70/30 read/write IOPS, 70/30 read/write latency, 70/30 read/write CPU cost

 

 Reviewing the graphs, you can see that throughput, latency, and CPU utilization all stay far more consistent across the tested range. The Latency RR PSP tends to provide much more consistent and higher performance than the traditional PSPs.

 

 

Configuring

To check via CLI whether the latency sub-policy option is enabled on the host:

esxcfg-advcfg  -g /Misc/EnablePSPLatencyPolicy

 

If the policy is enabled, the value will be 1; in 6.7 U1, it is enabled by default.

Value of EnablePSPLatencyPolicy is 1

 

To enable the configuration option that allows the latency-based sub-policy for VMW_PSP_RR:

esxcfg-advcfg -s 1 /Misc/EnablePSPLatencyPolicy
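Should you need to disable the option on a host, set the value back to 0:

esxcfg-advcfg -s 0 /Misc/EnablePSPLatencyPolicy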

 

 To check the value in the GUI, go into the advanced settings on your ESXi host, edit the settings, and search for "Misc.EnablePSPLatencyPolicy". Here you can verify or change the value: "1" is enabled, "0" is disabled.

Important Note: This does not mean a specific target has the latency PSP enabled; it means the host has the functionality enabled.

Latency PSP advanced settings
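If you prefer esxcli, the same host-level option can be queried with the following (equivalent to the esxcfg-advcfg check above):

esxcli system settings advanced list -o /Misc/EnablePSPLatencyPolicy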

 

Specific device settings and validation:

 

To check the current device sub-policy of a target, use the following command:

esxcli storage nmp psp roundrobin deviceconfig get -d <Device_ID>

Example: esxcli storage nmp psp roundrobin deviceconfig get -d naa.624a9370b97601e346f64ba900011028

 

To check the current sub-policy of an NVMeoF target:

esxcli storage hpp device list -d <NVMeoF Device_ID>

Example: esxcli storage hpp device list -d eui.0000000000000990742b0f00000006d0

 

To switch to the latency-based sub-policy of a target, use the following command:

esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=latency

Example: esxcli storage nmp psp roundrobin deviceconfig set -d naa.624a9370b97601e346f64ba900011028 --type=latency
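To revert a device back to the default Round Robin behavior, the same command accepts the 'default' type (see the command options listed further below):

esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=default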

 

If you want to change the default evaluation time or the number of sampling IOs to evaluate latency, use the following commands.

Note: Check with your storage vendor before changing the defaults. Vendors have found the defaults to be adequate.

 

For Latency evaluation time:

esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=latency --latency-eval-time=18000

 

For the number of sampling IOs:

esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=latency --num-sampling-cycles=32
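Both values can also be set in a single command; the numbers below are purely illustrative:

esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=latency --latency-eval-time=30000 --num-sampling-cycles=32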

 

To check the device configuration and sub-policy:

esxcli storage nmp device list -d <Device_ID>

 

 

Usage: esxcli storage nmp psp roundrobin deviceconfig set [cmd options]

Description:

 set         Allow setting of the Round Robin path options on a given device controlled by the Round Robin Selection Policy.

 

Cmd options:

 -B|--bytes=<long>   When the --type option is set to 'bytes' this is the value that will be assigned to the byte limit value for this device.

 -g|--cfgfile         Update the config file and runtime with the new setting. In case the device is claimed by another PSP, ignore any errors when applying to runtime configuration.

 -d|--device=<str>    The device you wish to set the Round Robin settings for. This device must be controlled by the Round Robin Path Selection Policy (except when -g is specified)(required)

-I|--iops=<long>     When the --type option is set to 'iops' this is the value that will be assigned to the I/O operation limit value for this device.

 -T|--latency-eval-time=<long>    When the --type option is set to 'latency' this value can control at what interval (in ms) the latency of paths should be evaluated.

 -S|--num-sampling-cycles=<long>  When the --type option is set to 'latency' this value will control how many sample IOs should be issued on each path to calculate latency of the path.

 -t|--type=<str>      Set the type of the Round Robin path switching that should be enabled for this device. Valid values for type are:

       bytes: Set the trigger for path switching based on the number of bytes sent down a path.

       default: Set the trigger for path switching back to default values.

       iops: Set the trigger for path switching based on the number of I/O operations on a path.

       latency: Set the trigger for path switching based on latency and pending IOs on path.

 -U|--useano=<bool>   Set useano to true, to also include non-optimized paths in the set of active paths used to issue I/Os on this device, otherwise set it to false.

 

 

For host profiles: Stateless

The “Edit host profile” window allows changing the additional latency sub-policy parameters, i.e. ‘latencyEvalTime’ and ‘samplingIOCount’.

 

Host Profile Latency PSP

 

The “Copy settings to Host Profile” window allows copying latency sub-policy settings into other extracted profiles.

Host Profile Latency PSP

 

 

 

For host profiles: Stateful

Settings persist across reboots using Esx.conf.

/storage/plugin/NMP/device[naa.624a9370b97601e346f64ba900011028]/rrNumSamplingCycles = "32"

/storage/plugin/NMP/device[naa.624a9370b97601e346f64ba900011028]/rrPolicy = "latency"

/storage/plugin/NMP/device[naa.624a9370b97601e346f64ba900011028]/rrLatencyEvalTime = "30000"

In the stateful case, latency sub-policy settings can also be applied using host profiles if the device using the sub-policy is shared across hosts.
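To confirm the persisted entries on a host, you can also search esx.conf directly (a quick sanity check, assuming shell access to the host and that the settings are stored in esx.conf as shown above):

grep rrPolicy /etc/vmware/esx.conf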

 

 

Setting via CLI

ESXCli command to change ‘latency-eval-time’:

esxcli storage nmp psp roundrobin deviceconfig set --type=latency --latency-eval-time=30000 --device=<Device_ID>

 

To check the setting:

esxcli storage nmp device list -d <Device_ID>

 

Example:

esxcli storage nmp device list -d naa.624a9370b97601e346f64ba900011028

naa.624a9370b97601e346f64ba900011028

Device Display Name: PURE Fibre Channel Disk (naa.624a9370b97601e346f64ba900011028)

Storage Array Type: VMW_SATP_ALUA

Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}}

Path Selection Policy: VMW_PSP_RR

Path Selection Policy Device Config: {policy=latency,latencyEvalTime=30000,samplingCycles=16,curSamplingCycle=1,useANO=0; CurrentPath=vmhba5:C0:T1:L7: NumIOsPending=0,latency=0}

Path Selection Policy Device Custom Config:

Working Paths: vmhba4:C0:T0:L7, vmhba6:C0:T1:L7, vmhba6:C0:T0:L7, vmhba5:C0:T1:L7, vmhba5:C0:T0:L7, vmhba4:C0:T1:L7, vmhba3:C0:T0:L7, vmhba3:C0:T1:L7

Is USB: false

 

ESXCli command to change ‘num-sampling-cycles’:

esxcli storage nmp psp roundrobin deviceconfig set --type=latency --num-sampling-cycles=32 --device=<Device_ID>

 

To check the setting:

esxcli storage nmp device list -d <Device_ID>

 

Example:

esxcli storage nmp device list -d naa.624a9370b97601e346f64ba900011028

naa.624a9370b97601e346f64ba900011028

Device Display Name: PURE Fibre Channel Disk (naa.624a9370b97601e346f64ba900011028)

Storage Array Type: VMW_SATP_ALUA

Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}}

Path Selection Policy: VMW_PSP_RR

Path Selection Policy Device Config: {policy=latency,latencyEvalTime=30000,samplingCycles=32,curSamplingCycle=1,useANO=0; CurrentPath=vmhba5:C0:T1:L7: NumIOsPending=0,latency=0}

Path Selection Policy Device Custom Config:

Working Paths: vmhba4:C0:T0:L7, vmhba6:C0:T1:L7, vmhba6:C0:T0:L7, vmhba5:C0:T1:L7, vmhba5:C0:T0:L7, vmhba4:C0:T1:L7, vmhba3:C0:T0:L7, vmhba3:C0:T1:L7

Is USB: false

 

 

Supported Protocols

 All the major protocols are supported: FC, iSCSI, and NVMeoF. Make sure to check with your storage vendor before making changes. If you do change a setting, it must be changed on all hosts accessing the same target.

 

Example of setting Latency PSP for NVMeoF target in UI:

NVMeoF Latency PSP
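The same change can also be made from the CLI through the HPP namespace. A hedged example is shown below; verify the exact option names with esxcli storage hpp device set --help on your ESXi build.

esxcli storage hpp device set -d <NVMeoF Device_ID> --pss=LB-Latency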

 

Checking NVMe target using esxcli:

esxcli storage hpp device list -d eui.0000000000000990742b0f00000006d0

eui.0000000000000990742b0f00000006d0

  Device Display Name: NVMe TCP Disk (eui.0000000000000990742b0f00000006d0)

  Path Selection Scheme: LB-Latency

  Path Selection Scheme Config: {latencyEvalTime=30000,samplingCycles=16;}

  Current Path: vmhba68:C0:T4:L10

  Working Path Set: vmhba67:C0:T3:L10, vmhba67:C0:T4:L10, vmhba68:C0:T3:L10, vmhba68:C0:T4:L10

  Is SSD: true

  Is Local: false

  Paths: vmhba67:C0:T3:L10, vmhba67:C0:T4:L10, vmhba68:C0:T3:L10, vmhba68:C0:T4:L10

  Use ANO: false

 

 

 

Example of how well the Latency RR PSP works with a stretch cluster (vMSC)

Courtesy of Cody Hosterman @ Pure Storage https://www.codyhosterman.com/2018/10/latency-based-psp-in-esxi-6-7-update-1-a-test-drive/

 

 Initially, the paths are not tagged as optimized/non-optimized. Consequently, IO is sent to all paths and must wait for the remote array to acknowledge writes. To see the path details in esxtop, select “d” for disk, then “P”, and enter the <disk.id>. With this configuration, the workload is pushing about 8,500 IOPS.

latency without optimized or latency psp

 

 Tagging the paths as optimized/non-optimized tells the host which paths are local and which are remote. This can greatly help with performance. Here you can see the remote array paths are no longer used, and the workload increases to 13,500 IOPS with the optimized paths.

optimized/non-optimized path selection

 

In the UI, you can also see which paths have active IO and the other available paths that don’t have IO. 

 

 Now, the optimized/non-optimized tagging has been removed and the Latency RR policy enabled. Almost immediately the non-optimized or remote array paths are dropped.

Latency PSP enabled

 

 After a short while, all the higher latency paths no longer have IO sent through them, and only the local, lower latency paths are used. This is accomplished solely by the latency RR PSP. Also, the IOPS are back to 13,500. Remember, the optimized/non-optimized tags are not used in this example.

 

After some validation and testing, some of our storage partners have made this their default SATP claim rule for their arrays. 
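For illustration only, a vendor-default claim rule of that kind generally takes the following form; the vendor and model strings are placeholders, so always use the exact rule published by your storage vendor:

esxcli storage nmp satp rule add -s VMW_SATP_ALUA -V <Vendor> -M <Model> -P VMW_PSP_RR -O "policy=latency"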

 

You can check to see if your target is using the Latency policy using the following command example.

esxcli storage nmp device list -d naa.6742b0f0000006d0000000000000007f

naa.6742b0f0000006d0000000000000007f

  Device Display Name: NFINIDAT iSCSI Disk (naa.6742b0f0000006d0000000000000007f)

  Storage Array Type: VMW_SATP_ALUA

  Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=off; {TPG_id=1,TPG_state=AO}}

  Path Selection Policy: VMW_PSP_RR

  Path Selection Policy Device Config: {policy=latency,latencyEvalTime=180000,samplingCycles=16,curSamplingCycle=11,useANO=0; CurrentPath=vmhba64:C0:T3:L11: NumIOsPending=0,latency=0}

  Path Selection Policy Device Custom Config:

  Working Paths: vmhba64:C0:T3:L11, vmhba64:C1:T3:L11, vmhba64:C6:T3:L11, vmhba64:C7:T3:L11

  Is USB: false

 

 

Customer Observed Results

 Customers who have enabled or are using the policy have noticed much more consistent performance. A few customers reported that, in a traditional (non-vMSC) setup, certain paths were consistently not being used. Usually, when we see this outcome, it is an indication of a path issue: possibly a hardware failure, bad cable, GBIC, HBA, etc. Such path issues are highlighted because the Latency PSP observes higher latency on them than on the other paths and ends up avoiding the affected path or paths. Customers who dug further into the issue ended up finding a physical problem with the unused path, not an issue with the Latency PSP.

 

 

Summary

 More and more vendors are validating and planning to move to the Latency RR PSP. It simplifies customer setup and provides very good, intelligent path selection and failover. If you or your vendor are not using the Latency RR PSP, it is something you should look into. Below, I’ve provided resources for checking and modifying pathing options.

 

 

Resources

Author

 Jason Massae is the Staff Technical Marketing Architect for Core Storage, vVols, and vSAN at VMware by Broadcom. Focusing on external storage, Jason works with VMFS, NFS, vVols, NVMeoF, and vSAN solutions for vSphere storage. Working closely with customers and engineering, he develops content to help customers optimize their storage solutions and deployments.

@jbmassae
