I’ve decided that whilst it’s great to provision a Kubernetes blueprint with a standard network overlay to the vRealize Automation catalog, offering one that leverages NSX-T is even better. So a few days ago I started creating this new blueprint to make use of the NSX-T NCP (NSX Container Plugin). However, once the first machine had provisioned, I discovered that routing was seriously broken and no further machines would deploy. Since this is the secondary site (we use NSX-V in the primary site), I dived into the NSX-T UI to see what the issue could be.
Routing looked okay, but the transport nodes (the ESXi hosts in my vRA compute cluster) had connectivity issues when communicating with the NSX controller.
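As an aside, control-plane connectivity can also be checked from the transport node side. From memory, on an ESXi host you can drop into the NSX CLI and query its manager and controller connections; the exact output varies by version, so treat this as a sketch and check the NSX-T CLI reference for yours:

```
[root@esxi01:~] nsxcli
esxi01> get managers
esxi01> get controllers
```

A healthy host should show the controller connection as up; mine did not.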
Taking a look at the management cluster it was clear something wasn’t right:
On closer inspection…
Now I started to panic. When I upgraded from 2.3 to 2.4, I remember that the controllers were merged into the NSX manager before being deleted. Did I miss a step? Did I blow away my control plane before ensuring the upgrade was successful?
I decided to investigate further in the CLI.
For a start, is the controller even running?
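Since NSX-T 2.4 the controller function runs as a service on the unified manager appliance, so the first check from the manager CLI is simply whether the services are up. Roughly (command names from the NSX-T CLI; verify against the reference for your version):

```
nsxmanager> get services
```

This lists each service and its running state, which quickly shows whether the controller service itself has fallen over.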
Maybe the config has gone awry?
Time to get the full status:
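On the manager appliance this is, as far as I recall, a cluster status query; the verbose form breaks the result down per group (manager, controller, policy and so on):

```
nsxmanager> get cluster status verbose
```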
I was convinced at this point that I would have to add another manager node and remove the original some time later. That would involve a lot of work, as the original node is referenced in various firewall configs and is set as an endpoint in both vRA on-prem and vRA Cloud.
Then an alert caught my eye:
Okay, so I needed to sort that. Could it be the cause of all my pain?
At this point I SSH’d to the NSX-T manager and reset the password expiration using:

clear user root password-expiration

The change is confirmed by using:

get user root password-expiration
I then rebooted the manager. After a few minutes I was greeted with:
The moral of this story is clear – don’t let your NSX user passwords expire!
From now on I’m preventing my NSX passwords from expiring. It may not be best practice, but it is preferable to a loss of service.
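For reference, this can be done per local user from the manager CLI. Clearing the expiration disables it outright; alternatively, I believe the interval can be extended instead with `set user <username> password-expiration <days>` if you prefer to keep expiry. A sketch, assuming the usual local users (admin, root, audit) on your appliance:

```
clear user admin password-expiration
clear user root password-expiration
clear user audit password-expiration
```

Whichever route you take, check the result afterwards with `get user <username> password-expiration` for each account.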