Preparing for a VMware Cloud Foundation BOM upgrade
VMware Cloud Foundation upgrades can be quite large in terms of the Bill of Materials (BOM) that may require upgrading. A common question I get asked is what a VCF administrator can do to ensure the environment is healthy, so that an upgrade through SDDC Manager's LCM service has the best chance of completing successfully.
My normal recommendations are the following:
- Take a log support bundle for any Workload Domain prior to the upgrade attempt.
- Run an SoS health check.
- Run the LCM pre-check in the UI for each Workload Domain.
- Run a password sanity check against all SDDC Manager managed components.
- Use the vCenter Web Client to review any alerts.
Below I am going to demonstrate these steps and outline the importance of each in the health validation.
Firstly, gathering an SoS bundle may sound like overkill given the sheer amount of logging that might need to be gathered, depending on the size of the environment. You will need to make a judgement call on whether this is feasible within a specific maintenance window.
With that said, capturing data before a major upgrade can be crucial for root cause analysis. Upgrades of some VMware appliances, such as vCenter Server, SDDC Manager and vRSLCM, are "migration upgrades" to a new appliance, and logging on the older appliance may not be imported. Logging prior to the upgrade comes in handy in the following cases:
- Unexpected behaviours or errors after a successful upgrade.
- A failed upgrade attempt that leaves the environment in an "unrecoverable" state requiring GSS assistance.
Capturing logs from all of the managed components in VMware Cloud Foundation can be done through the SDDC Manager command line "Supportability and Serviceability (SoS)" tool.
Connect to the SDDC Manager VM through SSH and run the command below, changing the "Domain Name" and directory as appropriate:
The SoS health check reports any dumps that have been created on the ESXi hosts in that Workload Domain, along with the NSX Controller cluster status. This is really useful for spotting intermittent issues in the workload domain that may indicate why an upgrade could go wrong.
curl i u
With the output, connect to each component and validate that the passwords log in successfully and are not expired or expiring soon.
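If you note down the password expiry date for each account while logging in, the "expired or expiring soon" part of this check can be sketched in a few lines of Python. This is only a minimal illustration: the `Credential` record, the component names and the 30-day warning window are my own assumptions, not an SDDC Manager format or API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Credential:
    # Hypothetical record filled in by hand; not an SDDC Manager output format.
    component: str
    username: str
    expires_on: date


def classify(cred, today, warn_days=30):
    """Return 'expired', 'expiring soon' or 'ok' for one credential."""
    if cred.expires_on < today:
        return "expired"
    if cred.expires_on <= today + timedelta(days=warn_days):
        return "expiring soon"
    return "ok"


def needs_attention(creds, today, warn_days=30):
    """List (component, username, status) for every credential that is not 'ok'."""
    flagged = []
    for cred in creds:
        status = classify(cred, today, warn_days)
        if status != "ok":
            flagged.append((cred.component, cred.username, status))
    return flagged
```

Anything returned by `needs_attention` should be rotated or remediated before the upgrade window; the warning window is a parameter so it can be stretched to cover the planned upgrade date.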
The workload domain pre-check also covers either the NSX-V or NSX-T managers and the additional vCenter VM.
If any issues are found during these workflows, the UI will report something like the image below.
Take proper action to resolve these issues before proceeding with an upgrade. A failed pre-check in the workflow will not stop you from attempting an upgrade, though depending on the failure the upgrade is unlikely to succeed and will probably fail early.
Expanding the failed component will provide more details and potential remediation steps.
Re-run the pre-check once you have fixed the reported failure to ensure SDDC Manager validates the fix.
While the above steps and tools will have caught most, if not all, of the environmental issues that SDDC Manager is aware of, the final check is to review the vCenter Web Client for any alerts or alarms that are unknown to SDDC Manager.
These can arise from some sort of manual intervention or third-party linkage that could cause issues during an upgrade, so it is worth calling out as a last place to check. I'm not going to demonstrate where and how to check these alerts, though you can find out more in the official documentation for vCenter.
The final, and probably the most important, step is to review the release notes for the BOM you plan to upgrade to, as there is valuable information in these releases that you can also use in a pre-validation process.
At this point you know that the system is healthy and you are aware of the potential issues that could be faced in the upgrade to the next release. Just download the appropriate bundles from the depot (through the UI or offline) and schedule your upgrade!
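To keep track of where you are across several Workload Domains, the checks in this post can be recorded as a simple go/no-go checklist. The sketch below is just one way to do that in Python; the check names and the `readiness` helper are hypothetical labels of mine and are not tied to any SDDC Manager API.

```python
# Hypothetical labels for the checks discussed in this post.
CHECKS = [
    "log bundle captured",
    "SoS health check",
    "LCM pre-check",
    "password sanity check",
    "vCenter alerts reviewed",
    "release notes reviewed",
]


def readiness(results):
    """Summarise a checklist.

    `results` maps a check name to True (done and clean) or False.
    Returns (ready, outstanding), where `outstanding` lists every check
    that is missing from `results` or recorded as False.
    """
    outstanding = [name for name in CHECKS if not results.get(name, False)]
    return (not outstanding, outstanding)
```

A check that was never recorded counts as outstanding, so a forgotten step shows up in the list rather than silently passing.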
About the Author:
Bryan O'Sullivan is a Staff Technical Support Engineer based in Cork, Ireland, supporting VMware Cloud Foundation in Global Support Services. This was originally posted here; it is being reposted with the author's consent.