UNMAP requests on thin-provisioned VMDKs may cause PSOD in vSphere 7.0 U3
vSphere 7.0 Update 3 changed the granularity used for space reclamation (TRIM/UNMAP). The goal was to make the command size uniform between regular VMFS6 and SESparse snapshots; with the update, both use 2GB.
The issue may arise when a guest OS issues a TRIM or UNMAP command with the new 2GB granularity on a VMFS datastore that is close to full. Such a request might require the VMFS metadata transaction to acquire locks on more than 50 Resource Clusters (RCs). VMFS might not handle that request correctly, and the ESXi host can fail with a purple diagnostic screen (PSOD). A VMFS metadata transaction that needs locks on more than 50 RCs is rare and can only happen on aged datastores.
In VMFS, a Resource Cluster is an allocation of blocks available for data to be written. For more detail, see my article on the new Affinity 2.0 in the vSphere 7.0 release.
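To make the 50-RC threshold more concrete, here is a toy model in Python. It is only a sketch: the number of blocks per RC (blocks_per_rc) and the way blocks map to RCs are assumptions made up for illustration, not the real VMFS on-disk layout. It simply counts how many distinct RCs a set of 1MB blocks would fall into when the blocks are contiguous versus scattered, as they might be on an aged datastore.

```python
# Toy model only. The RC size and block-to-RC mapping are illustrative
# assumptions, NOT the actual VMFS metadata layout.

RC_LOCK_LIMIT = 50  # threshold described in the text above


def resource_clusters_touched(block_numbers, blocks_per_rc):
    """Return the set of hypothetical RC indexes that hold the given blocks."""
    return {block // blocks_per_rc for block in block_numbers}


def exceeds_lock_limit(block_numbers, blocks_per_rc, limit=RC_LOCK_LIMIT):
    """True if the request would need locks on more than `limit` RCs."""
    return len(resource_clusters_touched(block_numbers, blocks_per_rc)) > limit


# A 2GB UNMAP covers 2048 blocks of 1MB each.
BLOCKS_IN_2GB = 2048
BLOCKS_PER_RC = 512  # hypothetical value, chosen only for the example

# Fresh datastore: the 2048 blocks sit next to each other.
contiguous = range(BLOCKS_IN_2GB)

# Aged datastore: the same number of blocks, spread out with gaps between them.
scattered = [i * 16 for i in range(BLOCKS_IN_2GB)]

print(len(resource_clusters_touched(contiguous, BLOCKS_PER_RC)))
print(len(resource_clusters_touched(scattered, BLOCKS_PER_RC)))
print(exceeds_lock_limit(scattered, BLOCKS_PER_RC))
```

In this model the contiguous request stays well under the limit, while the scattered one crosses it. That mirrors why the problem is tied to aged datastores rather than to the 2GB size alone.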
The issue only impacts thin-provisioned VMDKs; thick and eager-zeroed thick VMDKs are not impacted. Disabling space reclamation at the datastore level will not remove the potential for this issue to occur.
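If you want to see which disks fall into the affected category, a small helper like the one below may be a starting point. It is a sketch under assumptions: it expects an object that looks like the vSphere API's flat VMDK backing info, with boolean thinProvisioned and eagerlyScrub attributes, and the function names provisioning_type and is_potentially_affected are my own. The example uses stand-in objects rather than a live vCenter connection.

```python
from types import SimpleNamespace


def provisioning_type(backing):
    """Classify a disk backing as thin, eager-zeroed thick, or thick.

    Assumes `backing` exposes boolean `thinProvisioned` and `eagerlyScrub`
    attributes; a missing attribute is treated as False.
    """
    if getattr(backing, "thinProvisioned", False):
        return "thin"
    if getattr(backing, "eagerlyScrub", False):
        return "eager-zeroed thick"
    return "thick"


def is_potentially_affected(backing):
    """Only thin-provisioned VMDKs are exposed to the issue described above."""
    return provisioning_type(backing) == "thin"


# Stand-in objects for illustration; real values would come from the VM's
# virtual disk devices.
thin_disk = SimpleNamespace(thinProvisioned=True, eagerlyScrub=False)
thick_disk = SimpleNamespace(thinProvisioned=False, eagerlyScrub=False)
eager_disk = SimpleNamespace(thinProvisioned=False, eagerlyScrub=True)

for name, disk in [("thin", thin_disk), ("thick", thick_disk), ("eager", eager_disk)]:
    print(name, provisioning_type(disk), is_potentially_affected(disk))
```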
vSphere 7.0 Update 3a includes a fix for this issue. If a host has hit a PSOD after upgrading to Update 3, verify that this issue is the actual cause before upgrading to Update 3a.
You can read the details in the VMware ESXi 7.0 Update 3a Release Notes:
PR 2861632: If a guest OS issues UNMAP requests with large size on thin-provisioned VMDKs, ESXi hosts might fail with a purple diagnostic screen.