Latency Round Robin Path Selection Policy
Latency PSP Deep Dive
Problem
With the traditional path selection policies (PSPs), there's no logic or intelligence in selecting the optimal path(s). Many of our storage partners use the Round Robin PSP as it usually provides the best performance and failover. With the default RR PSP IOPS limit being 1000, the next path isn't used until 1000 IOs have been sent down the current path. Some storage vendors recommend changing this value to 1, which then directs every other IO to the next path. This can help with performance and failover but still applies no logic or intelligence. Consequently, if a path has higher latency or possibly hardware issues, that path will still be issued IO. Herein lies the problem.
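To make the limitation concrete, here is a minimal Python sketch (illustrative only, not VMware code) of classic Round Robin with an IOPS limit: the policy switches paths purely by IO count, so a degraded path still receives its full share of IO.

```python
# Illustrative sketch: classic Round Robin with an IOPS limit.
# The policy has no view of path health, so a slow path gets equal IO.

def round_robin(paths, total_ios, iops_limit):
    """Distribute IOs across paths, switching after `iops_limit` IOs."""
    counts = {p: 0 for p in paths}
    idx, on_path = 0, 0
    for _ in range(total_ios):
        counts[paths[idx]] += 1
        on_path += 1
        if on_path == iops_limit:        # switch to the next path
            idx = (idx + 1) % len(paths)
            on_path = 0
    return counts

# Two paths; assume path B is degraded (high latency). RR cannot see that.
print(round_robin(["A", "B"], 4000, 1000))  # {'A': 2000, 'B': 2000}
```

Whether the limit is 1000 or 1, the IO split is identical; only the switching frequency changes.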
Solution
In vSphere 6.7 U1, we introduced a new sub-policy for the Round Robin (RR) Path Selection Policy (PSP), VMW_PSP_RR, which actively monitors paths for latency and outstanding IO (OIO). The policy considers path latency and OIO on each active and available path. This is accomplished via an algorithm that monitors the paths and calculates the average latency per path based on time and the number of OIOs. When the latency policy is used, the logic monitors 16 IOs per path to calculate a working average, which results in a latency value for each path. Once a sampling period has completed, subsequent IOs are directed to the path(s) the algorithm calculates to have the least latency. Using the latency mechanism, the Round Robin policy dynamically selects the optimal path(s), achieving better load balancing as well as performance.
Details
When the latency policy is enabled, the algorithm looks at all available paths, sampling OIO and latency for each. In the diagram below, path 2 has 2 OIO and a latency of 5 ms, while path 1 has 5 OIO and a latency of 1 ms. Using the calculation P(avgLatency) = (Completion_t - Issue_t) / P(sampling IO count), the result is that path 1 is chosen to send IO for that sample cycle.
Diving into the logic a little more, you can see in the diagram below how it works.
Legend:
- T = Interval after which sampling should start again
- m = Sampling IOs per path
- t1 < t2 < t3 ---------------> 10ms < 20ms < 30ms
- t1/m < t2/m < t3/m -----> 10/16 < 20/16 < 30/16
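The selection step above can be sketched as follows. This is an illustrative Python model, not the ESXi implementation; in particular, the exact way latency and OIO are combined is internal to the kernel, so the avgLatency × (OIO + 1) score below is an assumption chosen to reproduce the behavior described above (path 1 wins despite more OIO).

```python
# Illustrative model of the latency sub-policy's selection step.
# The scoring formula is an assumption, not VMware source code.

def avg_latency(sample_latencies_ms):
    """P(avgLatency) = sum(completion - issue) / sampling IO count."""
    return sum(sample_latencies_ms) / len(sample_latencies_ms)

def pick_path(paths):
    """paths: {name: (sample_latencies_ms, outstanding_io)}.
    Score each path by avgLatency * (OIO + 1); pick the lowest score."""
    scores = {
        name: avg_latency(samples) * (oio + 1)
        for name, (samples, oio) in paths.items()
    }
    return min(scores, key=scores.get)

# Path 1: 5 OIO at ~1 ms; path 2: 2 OIO at ~5 ms (m = 16 sample IOs each).
paths = {
    "path1": ([1.0] * 16, 5),
    "path2": ([5.0] * 16, 2),
}
print(pick_path(paths))  # path1 (score 6.0 beats path2's 15.0)
```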
In our testing, we found that the latency policy maintained almost full throughput even with up to 100 ms of latency introduced on half the paths.
Testing Results
Random 100% reads, 32 OIO, 2 Workers, Delay introduced on 2 paths of 4.
Random 70% read / 30% write, 32 OIO, 2 workers, Delay introduced on 2 paths of 4.
Reviewing the graphs, you can see the throughput, latency, and CPU utilization all stay more constant throughout the range. The Latency RR PSP tends to provide much more consistent and higher performance than traditional PSPs.
Configuring
To check whether the latency policy functionality is enabled via CLI:
esxcfg-advcfg -g /Misc/EnablePSPLatencyPolicy
If the policy is enabled, the value will be 1; in 6.7 U1, it is enabled by default.
Value of EnablePSPLatencyPolicy is 1
To enable the configuration option to use latency-based sub-policy for VMW_PSP_RR:
esxcfg-advcfg -s 1 /Misc/EnablePSPLatencyPolicy
To check the value in the GUI, go into the advanced settings on your ESXi host, select edit settings, and search for "Misc.EnablePSPLatencyPolicy". Here you can verify or change the value: "1" is enabled, "0" is disabled.
Important Note: This does not mean a specific target has the latency PSP enabled; it means the host has the functionality enabled.
Specific device settings and validation:
To check the current device sub-policy of a target, use the following command:
esxcli storage nmp psp roundrobin deviceconfig get -d <Device_ID>
Example: esxcli storage nmp psp roundrobin deviceconfig get -d naa.624a9370b97601e346f64ba900011028
To check the current sub-policy of an NVMeoF target:
esxcli storage hpp device list -d <NVMeoF Device_ID>
Example: esxcli storage hpp device list -d eui.0000000000000990742b0f00000006d0
To switch to the latency-based sub-policy of a target, use the following command:
esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=latency
Example: esxcli storage nmp psp roundrobin deviceconfig set -d naa.624a9370b97601e346f64ba900011028 --type=latency
If you want to change the default evaluation time or the number of sampling IOs to evaluate latency, use the following commands.
Note: Check with your storage vendor before changing the defaults. Vendors have found the defaults to be adequate.
For Latency evaluation time:
esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=latency --latency-eval-time=18000
For the number of sampling IOs:
esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=latency --num-sampling-cycles=32
To check the device configuration and sub-policy:
esxcli storage nmp device list -d <Device_ID>
Usage: esxcli storage nmp psp roundrobin deviceconfig set [cmd options]
Description:
set Allow setting of the Round Robin path options on a given device controlled by the Round Robin Selection Policy.
Cmd options:
-B|--bytes=<long> When the --type option is set to 'bytes' this is the value that will be assigned to the byte limit value for this device.
-g|--cfgfile Update the config file and runtime with the new setting. In case the device is claimed by another PSP, ignore any errors when applying to runtime configuration.
-d|--device=<str> The device you wish to set the Round Robin settings for. This device must be controlled by the Round Robin Path Selection Policy (except when -g is specified)(required)
-I|--iops=<long> When the --type option is set to 'iops' this is the value that will be assigned to the I/O operation limit value for this device.
-T|--latency-eval-time=<long> When the --type option is set to 'latency' this value can control at what interval (in ms) the latency of paths should be evaluated.
-S|--num-sampling-cycles=<long> When the --type option is set to 'latency' this value will control how many sample IOs should be issued on each path to calculate latency of the path.
-t|--type=<str> Set the type of the Round Robin path switching that should be enabled for this device. Valid values for type are:
bytes: Set the trigger for path switching based on the number of bytes sent down a path.
default: Set the trigger for path switching back to default values.
iops: Set the trigger for path switching based on the number of I/O operations on a path.
latency: Set the trigger for path switching based on latency and pending IOs on path.
-U|--useano=<bool> Set useano to true, to also include non-optimized paths in the set of active paths used to issue I/Os on this device, otherwise set it to false.
For host profiles: Stateless
The “Edit host profile” window allows changing additional latency sub-policy parameters, i.e. ‘latencyEvalTime’ and ‘samplingIOCount’.
The “Copy settings to Host Profile” window allows copying latency sub-policy settings into other extracted profiles.
For host profiles: Stateful
Settings persist across reboots via esx.conf.
/storage/plugin/NMP/device[naa.624a9370b97601e346f64ba900011028]/rrNumSamplingCycles = "32"
/storage/plugin/NMP/device[naa.624a9370b97601e346f64ba900011028]/rrPolicy = "latency"
/storage/plugin/NMP/device[naa.624a9370b97601e346f64ba900011028]/rrLatencyEvalTime = "30000"
In stateful deployments, latency sub-policy settings can also be applied using host profiles if the device using the sub-policy is shared across hosts.
Setting via CLI
ESXCli command to change ‘latency-eval-time’:
esxcli storage nmp psp roundrobin deviceconfig set --type=latency --latency-eval-time=30000 --device=<Device_ID>
To check the setting:
esxcli storage nmp device list -d <Device_ID>
Example:
esxcli storage nmp device list -d naa.624a9370b97601e346f64ba900011028
naa.624a9370b97601e346f64ba900011028
Device Display Name: PURE Fibre Channel Disk (naa.624a9370b97601e346f64ba900011028)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}}
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=latency,latencyEvalTime=30000,samplingCycles=16,curSamplingCycle=1,useANO=0; CurrentPath=vmhba5:C0:T1:L7: NumIOsPending=0,latency=0}
Path Selection Policy Device Custom Config:
Working Paths: vmhba4:C0:T0:L7, vmhba6:C0:T1:L7, vmhba6:C0:T0:L7, vmhba5:C0:T1:L7, vmhba5:C0:T0:L7, vmhba4:C0:T1:L7, vmhba3:C0:T0:L7, vmhba3:C0:T1:L7
Is USB: false
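When auditing this setting across many devices, the "Path Selection Policy Device Config" string can be parsed programmatically. Below is a small illustrative helper (not an official VMware tool), assuming the output format shown above:

```python
# Illustrative parser for the "Path Selection Policy Device Config"
# string from `esxcli storage nmp device list` output.

def parse_psp_config(cfg):
    """Parse e.g. '{policy=latency,latencyEvalTime=30000,...; ...}'
    into a dict of the key=value settings before the first ';'."""
    inner = cfg.strip().strip("{}")
    first = inner.split(";")[0]            # keep only the settings section
    return dict(kv.split("=", 1) for kv in first.split(","))

cfg = ("{policy=latency,latencyEvalTime=30000,samplingCycles=16,"
       "curSamplingCycle=1,useANO=0; CurrentPath=vmhba5:C0:T1:L7: "
       "NumIOsPending=0,latency=0}")
settings = parse_psp_config(cfg)
print(settings["policy"], settings["latencyEvalTime"])  # latency 30000
```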
ESXCli command to change ‘num-sampling-cycles’:
esxcli storage nmp psp roundrobin deviceconfig set --type=latency --num-sampling-cycles=32 --device=<Device_ID>
To check the setting:
esxcli storage nmp device list -d <Device_ID>
Example:
esxcli storage nmp device list -d naa.624a9370b97601e346f64ba900011028
naa.624a9370b97601e346f64ba900011028
Device Display Name: PURE Fibre Channel Disk (naa.624a9370b97601e346f64ba900011028)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}}
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=latency,latencyEvalTime=30000,samplingCycles=32,curSamplingCycle=1,useANO=0; CurrentPath=vmhba5:C0:T1:L7: NumIOsPending=0,latency=0}
Path Selection Policy Device Custom Config:
Working Paths: vmhba4:C0:T0:L7, vmhba6:C0:T1:L7, vmhba6:C0:T0:L7, vmhba5:C0:T1:L7, vmhba5:C0:T0:L7, vmhba4:C0:T1:L7, vmhba3:C0:T0:L7, vmhba3:C0:T1:L7
Is USB: false
Supported Protocols
All the major protocols are supported: FC, iSCSI, and NVMeoF. Make sure to check with your storage vendor before making changes. If you do change a setting, it must be changed on all hosts accessing the same target.
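Because the setting must match on every host, a quick consistency check can help. Here is an illustrative sketch (a hypothetical helper, assuming you have already collected the sub-policy each host reports for the target):

```python
# Illustrative consistency check: flag hosts whose sub-policy for a
# shared target differs from the majority of hosts.
from collections import Counter

def inconsistent_hosts(host_policies):
    """host_policies: {hostname: policy}. Return hosts not using the
    most common policy (the setting must match on all hosts)."""
    majority, _ = Counter(host_policies.values()).most_common(1)[0]
    return sorted(h for h, p in host_policies.items() if p != majority)

print(inconsistent_hosts({"esx01": "latency", "esx02": "latency",
                          "esx03": "iops"}))  # ['esx03']
```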
Example of setting Latency PSP for NVMeoF target in UI:
Checking NVMe target using esxcli:
esxcli storage hpp device list -d eui.0000000000000990742b0f00000006d0
eui.0000000000000990742b0f00000006d0
Device Display Name: NVMe TCP Disk (eui.0000000000000990742b0f00000006d0)
Path Selection Scheme: LB-Latency
Path Selection Scheme Config: {latencyEvalTime=30000,samplingCycles=16;}
Current Path: vmhba68:C0:T4:L10
Working Path Set: vmhba67:C0:T3:L10, vmhba67:C0:T4:L10, vmhba68:C0:T3:L10, vmhba68:C0:T4:L10
Is SSD: true
Is Local: false
Paths: vmhba67:C0:T3:L10, vmhba67:C0:T4:L10, vmhba68:C0:T3:L10, vmhba68:C0:T4:L10
Use ANO: false
Example of how well the Latency RR PSP works with a stretch cluster (vMSC)
Courtesy of Cody Hosterman @ Pure Storage https://www.codyhosterman.com/2018/10/latency-based-psp-in-esxi-6-7-update-1-a-test-drive/
Initially, the paths are not tagged as optimized/non-optimized. Consequently, IO is sent to all paths and must wait for the remote array to acknowledge writes. To see the path details in esxtop, select “d” for disk, then “P” and the <disk.id>. With this configuration, the workload is pushing about 8,500 IOPS.
Changing the paths to be tagged as optimized/non-optimized tells the host which paths are local and which are remote. This can greatly help with performance. Here you can see the remote array paths are not used, and the workload increases to 13,500 IOPS.
In the UI, you can also see which paths have active IO and the other available paths that don’t have IO.
Now, the optimized/non-optimized tagging has been removed and the Latency RR policy has been enabled. Almost immediately, the non-optimized (remote array) paths are dropped.
After a short while, all the higher latency paths no longer have IO sent through them, and only the local, lower latency paths are used. This is accomplished solely by the latency RR PSP. Also, the IOPS are back to 13,500. Remember, the optimized/non-optimized tags are not used in this example.
After some validation and testing, some of our storage partners have made this their default SATP claim rule for their arrays.
You can check to see if your target is using the Latency policy using the following command example.
esxcli storage nmp device list -d naa.6742b0f0000006d0000000000000007f
naa.6742b0f0000006d0000000000000007f
Device Display Name: NFINIDAT iSCSI Disk (naa.6742b0f0000006d0000000000000007f)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=off; {TPG_id=1,TPG_state=AO}}
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=latency,latencyEvalTime=180000,samplingCycles=16,curSamplingCycle=11,useANO=0; CurrentPath=vmhba64:C0:T3:L11: NumIOsPending=0,latency=0}
Path Selection Policy Device Custom Config:
Working Paths: vmhba64:C0:T3:L11, vmhba64:C1:T3:L11, vmhba64:C6:T3:L11, vmhba64:C7:T3:L11
Is USB: false
Customer Observed Results
Customers who have enabled the policy have noticed much more consistent performance. A few customers reported paths, in a traditional (non-vMSC) setup, consistently not being used. Usually, this outcome indicates a path issue: possibly a hardware failure, bad cable, GBIC, HBA, etc. Such path issues are highlighted because the latency PSP observes higher latency than on the other paths and ends up avoiding the affected path(s). Customers who dug into the issue ended up finding a physical problem with the unused path, not an issue with the latency PSP.
Summary
More and more vendors are validating and planning to move to the latency RR PSP. It simplifies customer setup and provides very good, intelligent path selection and failover. If you or your vendor are not using the Latency RR PSP, it is something you should look into. Below, I've provided resources for checking and modifying pathing options.
Resources
- Change Default Parameters for Latency Round Robin (vmware.com)
- Viewing and Managing Storage Paths on ESXi Hosts (vmware.com)
- Using Claim Rules to Control ESXi Multipathing Modules (vmware.com)
- Modifying path information for ESXi hosts (2000552) (vmware.com)
- Changing a LUN to use a different Path Selection Policy (PSP) (1036189) (vmware.com)
- Changing Path Selection Policy for multiple LUNs (2053628) (vmware.com)
- Adjusting Round Robin IOPS limit from default 1000 to 1 (2069356) (vmware.com)
- VMware High Performance Plug-In and Path Selection Schemes
- VMware Path Selection Plug-Ins and Policies
- Define NMP SATP Rules (vmware.com)
- VMware SATPs
- Multipathing Considerations (vmware.com)
- Display SATPs for the Host (vmware.com)
- Using Claim Rules (vmware.com)
Author
Jason Massae is the Staff Technical Marketing Architect for Core Storage, vVols, and vSAN at VMware by Broadcom. Focusing on external storage, Jason works with VMFS, NFS, vVols, NVMeoF, and vSAN solutions for vSphere storage. Working closely with customers and engineering, he develops content collateral to help customers achieve the best possible storage solution and deployment.