So lately I updated a couple of vCenter servers, and in the process I hit a couple of errors that required some resolving…
- Expired Certs on Source vCenter
- Auth Provider error, due to something, potentially bad certs.
- An HPE Bug, filling up ramdisk, causing HA config issues.
- A change in security process, preventing login.
So a couple hiccups along the way. And now it’s time to resolve this one…
Yeahhhh, an alert on Certificates… Seems like VMware and certificate management are like Oil n Water. They don’t mix well.
Anyway… to the point: you’d figure you could simply navigate to the vCenter WebUI -> Home -> Administration -> Certificates, only to realize there’s nothing reporting as invalid or expired.
Checking for Expired Certs
What gives? Ahhh yes, more hidden secret stuff that is not in your face when it comes to the WebUI. Can you guess? That’s right, another VMware KB…
So while the other issues I’ve mentioned do have references and scripts relating to certs, the only “check” in those previous posts used openssl on the VCSA shell to grab the certificate from the listening service on its dedicated port, and that check was spurred by one particular symptom. So here’s the KB telling you how to actually check the certificates, the easiest way I’ve found so far (no check.py Python script needed):
for store in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list | grep -v TRUSTED_ROOT_CRLS); do echo "[*] Store :" $store; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $store --text | grep -ie "Alias" -ie "Not After";done;
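If you'd rather not eyeball every date yourself, here's a rough sketch of a helper that reads those Alias / Not After lines and flags anything already past its expiry. To be clear, flag_expired is a name I made up, not part of vecs-cli, and it assumes GNU date (for -d) plus openssl-style dates such as "Jan  1 00:00:00 2019 GMT".

```shell
# flag_expired: hypothetical helper, NOT part of vecs-cli.
# Reads "Alias" and "Not After" lines on stdin and prints
# STATUS<TAB>ALIAS<TAB>DATE, where STATUS is EXPIRED, ok or UNPARSED.
# Assumes GNU date (-d) and openssl-style "Not After" dates.
flag_expired() {
  local now alias_name line when epoch status
  now=$(date +%s)
  alias_name=""
  while IFS= read -r line; do
    case "$line" in
      *"Not After"*)
        # Drop everything up to the first colon, then trim whitespace.
        when=$(printf '%s' "${line#*:}" |
          sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
        # If date cannot parse the value, say so instead of guessing.
        epoch=$(date -d "$when" +%s 2>/dev/null) || {
          printf 'UNPARSED\t%s\t%s\n' "$alias_name" "$when"
          continue
        }
        if [ "$epoch" -lt "$now" ]; then status=EXPIRED; else status=ok; fi
        printf '%s\t%s\t%s\n' "$status" "$alias_name" "$when"
        ;;
      *Alias*)
        # Remember the alias so the next "Not After" line can be labelled.
        alias_name=$(printf '%s' "${line#*:}" |
          sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
        ;;
    esac
  done
}
```

With the function defined in your shell, you would pipe the output of the loop above into it: put "| flag_expired" after the final "done". Only the lines marked EXPIRED then need a closer look.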
That’s it! :D…. Just like the KB said, the output showed which cert was bad: in this case an old Root CA that was used in previous deployments of vCenter before the upgrades. So it turns out that even though you follow the required KB to get past the pre-check for expired certs, it doesn’t delete the old certificate’s CA cert.
There it is, the second CA Cert with expiry in 2019… OK so… You’d figure it would be easy to clean this up, but remember you couldn’t even see it in the WebUI, so you best believe there is no WebUI way to do this that protects you from human error.
Removing old Expired Certs
Instead, very brilliantly, you get… yes another KB! Booo Yeah… So let’s do this!
The main thing to note about this is…
Certificates are copied back to the VECS store because the CA Certificate which is expiring is published to the VMware Directory Service (VMDIR). When the Certificate is removed from VECS, VMDIR adds the Certificate back to VECS during a sync operation. This is done in order to ensure the integrity of the TRUSTED_ROOTS Certificate store, as deletion of an incorrect Certificate from this store could cause the environment to be irreparably damaged.
OK…. All I take away from this is that certs are important, so they have a second cert store as a backup to the first cert store… that’s all I can take away from this odd statement.
/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text | less
“Find the Certificate you wish to remove and make a note of the Alias and the X509v3 Subject Key Identifier.
Note: There Could be several Certificates to remove. Any expired and not in use certificates should be removed to avoid certificate related alarms.”
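Scrolling through less to pair each Alias with its key identifier gets old fast, so here's a small sketch that pulls the two out side by side. list_alias_ski is my own hypothetical name, and it assumes the identifier value sits on the line right after its "X509v3 Subject Key Identifier" label, the way openssl's text format prints it.

```shell
# list_alias_ski: hypothetical helper, NOT a VMware tool.
# Reads "vecs-cli entry list --text" style output on stdin and prints
# ALIAS<TAB>SUBJECT_KEY_ID for each entry.
# Assumes the key identifier value is on the line after its label.
list_alias_ski() {
  awk '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # A new Alias line starts a new entry; flush the previous one first.
    /^[ \t]*Alias[ \t]*:/ {
      if (alias != "") print alias "\t" ski
      alias = trim(substr($0, index($0, ":") + 1))
      ski = ""
      next
    }
    # The line following the label holds the identifier itself.
    want_ski { ski = trim($0); want_ski = 0; next }
    /X509v3 Subject Key Identifier/ { want_ski = 1 }
    END { if (alias != "") print alias "\t" ski }
  '
}
```

Usage would be the same vecs-cli entry list --store TRUSTED_ROOTS --text command as above, piped into list_alias_ski instead of less.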
Yes that is the plan…
List the trusted certs published to the VMware Directory Service using the following command (email@example.com password required). This command is in the same location as vecs-cli:
/usr/lib/vmware-vmafd/bin/dir-cli trustedcert list
Huh… in this case it looks like it is not here, so I should be safe to delete it from the normal store and it shouldn’t auto populate back in.
If you do see it (CN equal to x509v3 Key Identifier) then follow the linked KB to remove it, which seems to save a copy of the cert and use that saved copy to run another command to remove it from the store… super weird.
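To take the squinting out of that comparison, here's a minimal sketch. Both function names are made up by me, and it assumes the CN(id) that dir-cli prints is the key identifier as plain hex with no colons, which is how I read the KB's "CN equal to x509v3 Key Identifier".

```shell
# ski_to_id: hypothetical helper. Turns an openssl-style key identifier
# such as "32:76:13:4a" into bare uppercase hex ("3276134A").
ski_to_id() {
  printf '%s' "$1" | tr -d ': \t' | tr 'a-f' 'A-F'
}

# is_published: hypothetical helper. Reads "dir-cli trustedcert list"
# output on stdin and succeeds if any line contains the given key
# identifier (case-insensitive). Assumes the listed id is plain hex.
is_published() {
  # An empty pattern would match every line, so refuse it.
  [ -n "$1" ] || return 2
  grep -qiF -- "$(ski_to_id "$1")"
}
```

Since dir-cli still wants the SSO admin password, it is probably easiest to save its listing to a file first and then run is_published with the Subject Key Identifier as its argument and that file on stdin. Exit status 0 means the cert is still published in VMDIR and will likely come back after a sync.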
/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store TRUSTED_ROOTS --alias 3276134ad93b3688b5dc5dcfaa402e9bfd7af12f
Restart all services on the PSCs and on the vCenter Servers and ensure that all services start and respond normally and that you can log in and manage the environment.
service-control --stop --all
service-control --start --all
Took a lil while, then logged in… alert still there. I guess I just have to Reset to Green?
For now I clicked the Reset to Green link. Even after yet another vCenter patch, the alert did not show up again. Yay.