Change vCenter FQDN or IP on Veeam

Story

I recently did an infrastructure upgrade on my home lab, which included moving all my ESXi hosts into a dedicated subnet and making them more dependent on DNS. This has its pros and cons; in the process, all my ESXi hosts had their IP addresses changed. I also moved my vCenter and changed its IP address, which is now supported, yay.

Now I had to move Veeam along with it. It was originally in the same subnet as the ESXi hosts and vCenter, which have all moved, so instead of trying to manage cross-subnet comms, I changed Veeam's IP address and pointed its DNS settings to my AD DNS, which has all the ESXi and vCenter host records. It was easy enough: just change the Windows NIC IP address and the VM's VMPG.

How to

Now when I went to rescan the vCenter instance in Veeam, it complained about the certificate, since it was renewed during the vCenter upgrade. I decided I'd change the entry to be based on DNS now that everything else is as well, but when I went to edit the object in Veeam it was greyed out. Lucky for me, Veeam had a KB ready to go.

Challenge

The Name/FQDN/IP of the vCenter Server has changed, and needs to be updated within Veeam Backup & Replication.

Solution

Solves Name Change Only
This solution applies ONLY if the vCenter Server database has not changed.
(In my case I did an in-place upgrade, so the database is the same, which is what you want in order to preserve VM IDs and backup chains.)

If the Name/FQDN/IP of the vCenter changed due to a reinstall or upgrade, and a new vCenter database was used, the Ref-IDs will have changed. Due to the changed Ref-IDs you will need to follow the documented process in www.veeam.com/KB1299

Step 1

Prior to running the commands below you need to identify the Name\FQDN\IP Veeam is using to communicate with the VC currently. To do this, edit the entry for the vCenter under Backup Infrastructure and note the “Name:”.

Next perform the following steps to change that VMware Server’s name.

Step 2

Launch PowerShell from inside the Veeam Backup & Replication console. You can find the “PowerShell” button under the File-menu’s “Console” section.

The Veeam Backup & Replication PowerShell Toolkit will load.

Step 3

Run the following command:

$Servers = Get-VBRServer -name "old-name"

Replace old-name with the "Name" currently set for the vCenter in the Veeam Backup and Replication console.

Step 4

Run the following command next to change the name:

$Servers.SetName("new-name")

Replace new-name with the new name for the vCenter; this can be an IP, hostname, or FQDN.
Do not remove the quotes on either side.
This change takes effect as soon as the command in Step 4 completes.

How I did it – One Liner

Verify:

Get-VBRServer -name "Name from Step 1"

Change:

(Get-VBRServer -name "Name From Step 1").SetName("new.domain.com")

Results:

Now you can click Next, then Apply; it should get right past the certificate check if the certificates are all good… and end up with the following after a rescan:

That was easy enough. I don't fully understand why they grey out the UI for this change, but there you have it. Happy backups!

vSphere HA Agent cannot be correctly installed or configured… again

Story

Another vCenter Patch, Another problem 😀

This seems to be a recurring story these last couple of posts…

Error on Host

This time, after updating again, a host in the cluster had the error message.

Troubleshooting

Unlike the last time this happened, the event log wasn't as blatantly flooded with complaints about /tmp being full, and checking the host with

vdf -h

showed it only 90% full, which was still pretty high and might explain the one log event that I did see about it:

The ramdisk 'tmp' is full. As a result, the file /tmp/img-stg/data/vmware_f.v00 could not be written

That sat in the log right after this event about attempting to install a base ESXi image:

Installing image profile '(Updated) HPE-ESXi-Image' with acceptance level checking disabled

This seemed a bit weird, but I couldn't find any info other than the usual, very Microsoft-type answers of "you can just ignore it" or "usually this is not an issue, it's just vCenter connecting to the ESXi host and installing its agent".

OK I guess… moving on… the very next error event was:

Could not stage image profile '(Updated) HPE-ESXi-Image': ('VMware_bootbank_vmware-fdm_7.0.2-18455184', '[Errno 28] No space left on device')

Huh. Now, note this host was installed from the official VMware image provided by HPE for this exact hardware, which is on the VMware HCL, so there should be no funny business. However, I suspected a bit of the known HPE bug mentioned the last time this happened; it just hadn't fully flooded /tmp yet.

Lil Side Trail

So a couple of things to note here. First, ESXi is installed on a USB/SD card style setup, and with that it's well known that you should define a persistent log location as well as a scratch location. However, not many sources specify changing the system swap location.

  1. Persistent Log: VMware KB; Tech Blogger
    (Most standard ESXi log info)
  2. Scratch Log: VMware KB; Tech Blogger 1; Tech Blogger 2
    (Crash logs, support log creation)
  3. Swap Location: VMware Doc 1 (Configure); VMware Doc 2 (About); a Tech Blogger who seems to regurgitate the exact About page from VMware.

However, researching this even more, lots of posts on Reddit mentioned that VMs' swap files live in their VM directories, so if you're using a shared datastore they will reside there, and I shouldn't see issues around swap usage at the host level at all.

Yet if you look in the vCenter Web UI on an ESXi host, there are two options available: VM – Swap, and System Swap.

The VMware docs don't seem to accurately describe the difference between these two options.

Looking up the error about not being able to stage the file, I found this one blog post which, of course, mentioned changing the swap location to get past the error…

The main thing mentioned by the blogger is "The problem is caused by ESXi not having enough free space available to extract the installation packages." but he failed to specify where exactly that is, and the event log didn't specify it either. Since his solution was to adjust the system swap location, it begs the question: is the package extraction location the System Swap location?

Since the host settings only seem to be specified by the alternative option checkboxes:

Can use host cache
Can use datastore specified by host for swap files

It's still not fully clear to me where the swap is actually located with these assumed default settings, whether extraction of the image actually uses swap, or why the same image already on the ESXi host is being re-applied when you upgrade vCenter.

Resolution

So many questions, so few answers. Unfortunately, I'm going to go on a bit of a whim and simply try exactly what I did before: clear the file from /tmp that was taking up a lot of its space and install the HPE patch for the known bug, in hopes it resolves the issue…

Sure enough, it was the exact same thing as in my initial post; it just seems /tmp wasn't completely full yet, so the symptoms were a bit different. Here's what I did:

  1. vMotion all VMs to another host in the cluster (amazing vMotion works without issue)
  2. Ignore the HA warning on the VMs migrated
  3. Place Host into Maintenance mode (This clears the HA warnings on the VMs and cluster)
  4. Verify /tmp has room. Update any ESXi packages from the hardware vendor if applicable (a rough command sketch follows this list).
  5. Reboot the host.
  6. Exit Maintenance mode.
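
To put steps 4 and 5 into concrete commands, here's a rough sketch of what I ran from the host's SSH shell. The file and depot names are placeholders for whatever your hardware vendor ships, so treat this as a sketch under those assumptions rather than a canned fix:

# check ramdisk usage; 'tmp' should have free space
vdf -h
# see what is eating /tmp and remove only what you have identified as safe to delete
ls -lh /tmp
# if the vendor supplies an offline bundle for the bug, stage it somewhere with room
# (a datastore is safer than /tmp) and apply it
esxcli software vib update -d /vmfs/volumes/datastore1/hpe-fix-depot.zip
# reboot while still in maintenance mode
esxcli system shutdown reboot --reason "Applying vendor patch for /tmp ramdisk bug"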

Hope this helps someone who might see the same type of error events in their ESXi event logs.

Clear vCenter Alert Certificate Status

Story

So lately I updated a couple of vCenter servers, and in the process I hit a couple of errors that required some resolving…

  1. Expired Certs on Source vCenter
  2. Error [500] Auth Provider, due to something, potentially bad certs.
  3. An HPE Bug, filling up ramdisk, causing HA config issues.
  4. Change in security process; preventing login.

The Problem

So a couple hiccups along the way. And now it’s time to resolve this one…

Yeahhhh, an alert on certificates… Seems like VMware and certificate management are like oil and water: they don't mix well.

I've had some terrible times managing certificates with VMware. However, as blogged about here, it seems there's finally a way to use your own certificates via the WebUI.

Anyway… to the point. You'd figure you could simply navigate to the vCenter WebUI -> Home -> Administration -> Certificates, only to realize nothing is reporting as invalid or expired.

Checking for Expired Certs

What gives? Ahhh yes, more hidden secret stuff that is not in your face when it comes to the WebUI. Can you guess? That's right, another VMware KB.

So while the other issues I've mentioned do have references and scripts relating to certs, the only "check" in those previous posts was using openssl on the VCSA shell to grab the certificate from the listening service on its dedicated port, and that was based on a particular symptom which spurred the check. So here's the KB telling you how to actually check the certificates, the easiest way I've found so far (no check.py Python script needed):

# list every VECS store (except the CRL store) and show each entry's alias and expiry date
for store in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list | grep -v TRUSTED_ROOT_CRLS); do
  echo "[*] Store :" $store
  /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $store --text | grep -ie "Alias" -ie "Not After"
done

That's it! :D… And just like that, the output indicated which cert was bad: in this case, an old root CA that was used in previous deployments of vCenter before the upgrades. So it turns out that even though you follow the required KB to get past the pre-check for expired certs, it doesn't delete the old CA certificate.

There it is, the second CA cert with an expiry in 2019… OK, so… you'd figure it would be easy to clean this up, but remember, you couldn't even see it in the WebUI, so you'd best believe there is no WebUI way to do this that protects you from human error.

Removing old Expired Certs

Instead, very brilliantly, you get… yes another KB! Booo Yeah… So let’s do this!

The main thing to note about this is…

Certificates are copied back to the VECS store because the CA Certificate which is expiring is published to the VMware Directory Service (VMDIR). When the Certificate is removed from VECS, VMDIR adds the Certificate back to VECS during a sync operation. This is done in order to ensure the integrity of the TRUSTED_ROOTS Certificate store, as deletion of an incorrect Certificate from this store could cause the environment to be irreparably damaged.

OK… All I take away from this is that certs are important, so they have a second cert store as a backup to the first cert store… that's all I can take away from this odd statement.

/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text | less

“Find the Certificate you wish to remove and make a note of the Alias and the X509v3 Subject Key Identifier.

Note: There Could be several Certificates to remove. Any expired and not in use certificates should be removed to avoid certificate related alarms.”

Yes that is the plan…

List the trusted certs published to the VMware Directory Service using the following command (administrator@vsphere.local password required). This command is in the same location as vecs-cli:

/usr/lib/vmware-vmafd/bin/dir-cli trustedcert list

Huh… in this case it looks like it is not here, so I should be safe to delete it from the normal store and it shouldn’t auto populate back in.

If you do see it (CN equal to the X509v3 Subject Key Identifier), then follow the linked KB to remove it, which seems to save a copy of the cert and use that saved copy to run another command to remove it from the store… super weird.

/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store TRUSTED_ROOTS --alias 3276134ad93b3688b5dc5dcfaa402e9bfd7af12f
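
After the delete, I re-ran the same listing from earlier to confirm the expired entry was actually gone (same vecs-cli syntax as above, nothing new here):

/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text | grep -ie "Alias" -ie "Not After"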

Restart all services on the PSCs and on the vCenter Servers and ensure that all services start and respond normally and that you can log in and manage the environment.

service-control --stop --all
service-control --start --all

It took a lil while; then, logging in… the alert was still there. I guess I just have to Reset to Green?

For now, I clicked the Reset to Green link. Even after yet another vCenter patch, it didn't show up again. Yay.

Fixing [400] An error occurred while sending an authentication request to the vCenter Single Sign-On server

So, after the last two blog posts about fixing vCenter 7's access issues due to its certificate management workflows, I was greeted with this error when trying to sign in to the web UI on vCenter.

[400] An error occurred while sending an authentication request to the vCenter Single Sign-On server- An error occurred when processing meta data during vCenter Single Sign-On setup:the service provider validation failed. Verify that the server URL is correct and is in FQDN format, or that the hostname is a trusted service provider alias.

After a quick Google search I found yet another VMware KB discussing it.

Resolution
This is expected behavior.
VMware vSphere 7.0 enforces an FQDN, or an IP address that is reverse resolvable to an FQDN, to allow authentication for Single Sign-On.
Greeeeeeeeeeeeeeaaaaaaaaat! Thanks VMware, just another example of security destroying functionality.
What did I do? Exactly what it stated: I navigated to the WebUI URL using the hostname's fully qualified domain name, e.g. hostname.domain.end, because I had been attempting to access it by just the hostname, relying on the DNS suffix to auto-resolve the domain during queries.
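
If you want to sanity check that your vCenter name meets that requirement before blaming the browser, a quick nslookup in both directions from any client does the trick; the name and IP below are just placeholders:

nslookup vcsa.domain.end     # forward lookup: the name should resolve to the vCenter IP
nslookup 192.168.10.20       # reverse lookup: the IP should resolve back to the FQDN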

vSphere HA Agent cannot be correctly installed or configured

I updated a vCenter server to 7.0.x. When logging into the newly updated vCenter, one host in the cluster showed the following alert.

Error: “vSphere HA agent cannot be correctly installed or configured” (2056299) (vmware.com)

The KB didn't sound promising. Checking the host's event logs showed a bunch of errors about the /tmp ramdisk being full…

The ramdisk ‘tmp’ is full – VMware ESXi on HPE ProLiant – Davoud Teimouri – Virtualization and Data Center

For real? Wow, not gettin' lucky these last couple of weeks. Sure enough, it was the exact same issue; I cleared /tmp temporarily and downloaded the patch. When I vMotioned the VMs from this host onto another host, the VMs themselves showed alerts.

Virtual machine failed to become vSphere HA Protected and HA may not attempt to restart it after a failure.

I kept chugging along in hopes I'd resolve each VM later. However, as soon as I placed the affected host into maintenance mode, the alerts on all the VMs disappeared. I applied the patch exactly as the HPE KB stated for the ESXi version it was on.

With luck on my side, the host came up clean, came out of maintenance mode without an issue, and all errors and alerts were resolved. Woooo!

Hope this helps anyone doing a vCenter upgrade to 7.x

Fixing vCenter [500] An error occurred while fetching identity providers.

Story

So the other day I posted about upgrading vCenter to 7.0.x, and everything went fine during the upgrade. For some odd reason, a couple of days later when I went to navigate to the vCenter login page I was greeted with:

[500] An error occurred while fetching identity providers.

Kind of wish I had read this Reddit post right off the hop, cause the first reply turned out to be my answer at the end of this post.

I did, however, first hit this KB about it. I was a bit thrown off, as it indicated to only apply it if you see the following in the logs:

(/var/log/vmware/trustmanagement/trustmanagement-svcs.log)

2021-03-10T09:27:03.474Z [tomcat-exec-14  INFO  com.vmware.identity.token.impl.X509TrustChainKeySelector  opId=] Failed to find trusted path to signing certificate <STS Certificate Subject, example - C=US,CN=ssoserverSign\,dc\=vsphere\,dc\=local>
java.security.cert.CertPathBuilderException: Unable to find certificate chain.

Which I could not see, so I wasn’t sure if this was the issue or not. What I did see in my logs was the following:

2021-09-17T23:58:03.945Z [tomcat-exec-14 WARN com.vmware.vcenter.trustmanagement.impl.VcIdentityProviders opId=] com.vmware.sso.interop.ldap.NoSuchObjectLdapException: No such object
LDAP error [code: 32]

and

2021-09-18T01:19:01.322Z [tomcat-exec-26 INFO com.vmware.vapi.security.AuthenticationFilter opId=] Not successful authentication
java.lang.RuntimeException: Authentication data not found
Caused by: com.vmware.vapi.dsig.json.SignatureException: Cannot verify the signature over the provided data
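
(For reference, this is roughly how I was fishing those entries out of the log; the path comes from the KB above, and the grep filter is just my own.)

grep -iE "error|warn|exception" /var/log/vmware/trustmanagement/trustmanagement-svcs.log | tail -n 40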

So it wasn't matching. Looking at my firewall, I couldn't see any LDAP connections from vCenter to my LDAP server since the upgrade, so I decided instead to try a reboot. This simply made things worse.

No Healthy Upstream

Now when I'd try to access the vCenter Web UI I was greeted with a blank white web page with simple text stating "No Healthy Upstream". Looking into this, people hit this problem for several different reasons, as mentioned here and here, and for some odd reason this guy just changed his IP address?! Weird.

For me, I checked the local hosts file and it was fine, and I tried a couple of the other mentioned fixes, none of which worked for me.

Try Anyway

For some reason, at this point I decided to double down on the workaround mentioned in the initial VMware KB I found, as the main login symptom was exactly the same even though I couldn't validate the same log entries within the logs.

How to Copy Files to VCSA via WinSCP

Now, a couple of real quick things to note here. You need to copy a script to the VCSA. If you get "unable to agree on a cipher suite", you'll need to update your copy of WinSCP to a newer version. Also, instead of doing what VMware says about changing the shell on the VCSA, do what this guy suggests instead:

“In the new connection dialog, specify the Host name, User name and then click the Advanced button,

(VCSA 6.5)

Choose the Environment/SFTP option

Specify for SFTP server: shell /usr/libexec/sftp-server”

so much easier.

I decided to take a look at the script after copying it to the VCSA, and it had this line which had me hopeful it would actually work to resolve my issue:

/opt/likewise/bin/ldapmodify -x -h localhost -p 389 -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" -f sso-sts.ldif | tee -a $LOGFILE

So I followed along with the workaround specified in the KB…

1) Download the attached fixsts.sh script from this article and upload to the impacted PSC or vCenter Server with Embedded PSC to the /tmp folder.

2) If the connection to upload to the vCenter by the SCP client is rejected, run this from an SSH session to the vCenter:

chsh -s /bin/bash

3) Connect to the PSC or vCenter Server with an SSH session if you have not already per Step 2.

4) Navigate to the /tmp directory:

cd /tmp

5) Run chmod +x fixsts.sh to make the file executable.

chmod +x ./fixsts.sh

6) Run ./fixsts.sh.

./fixsts.sh

Restart services on all vCenters and/or PSCs in your SSO domain by using below commands:

service-control --stop --all
service-control --start --all

My results:

To my amazement it actually worked, and I was able to log in to the vCenter server!! Wooo!

*Update* Here’s a great blog post covering managing or creating custom certificates with vCenter 7

Kinda funny that 7.0 is stated as 6.8 in the scripts.. mhmm

ESXi Update Network Config Failed
Set ESXi IP via CLI

Real quick post here. I was moving my ESXi hosts and vCenter to a new dedicated subnet. I did the usual: stand up a temp Windows system in the new subnet, create a VMK with a temp IP in the new subnet, connect to the ESXi Web UI via that temp IP from the temp Windows machine, reconfigure the default TCP/IP stack's default gateway, change VMK0's IP address (and edit the management port group VLAN ID if applicable), and away I'd go.

However, on this one host, for some unknown stupid reason it would simply fail with "Failed – An Error occurred during host configuration", and the detailed log was just as vague: "operation failed diagnostics report unable to set network unreachable". OK… whatever, that shouldn't matter, do as I tell you! Here's a snippet of the error, and the CLI command that simply worked without bitching.

I just figured let's try the CLI way and see if it worked, and it turns out it did. Here's the source I used to figure out the command syntax.

The commands I used:

Get IPs:

esxcli network ip interface ipv4 get

Set new IP:

esxcli network ip interface ipv4 set -i vmk1 -I 1.1.1.1 -N 255.255.255.0 -t static
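
Since the whole point of the move was a new subnet, I also had to make sure the default TCP/IP stack gateway was correct. A rough sketch of doing that from the same shell; the gateway addresses below are placeholders:

esxcli network ip route ipv4 list
# if an old default route exists, remove it before adding the new one
esxcli network ip route ipv4 remove --gateway 1.1.1.1 --network default
esxcli network ip route ipv4 add --gateway 1.1.1.254 --network default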

Hope this helps someone.

FreeNAS Single SSD as ZIL and L2ARC

Quick story: I remember I set this up on my FreeNAS server in hopes of getting better performance. In reality, I don't think it helped anything because of my FreeNAS server's setup, which was an old desktop with 3 gigs of memory and a few SATA drives: 2 spindle and 1 SSD.

Took me a while but I finally found the original source I followed.

Main Parts (assume SSD is ada0):

root@freenas1:~ % gpart create -s gpt ada0
ada0 created
root@freenas1:~ % gpart add -a 4k -b 128 -t freebsd-zfs -s 10G ada0
ada0p1 added
root@freenas1:~ % gpart add -a 4k -t freebsd-zfs ada0
ada0p2 added

List Disk to get GUIDs

root@freenas1:~ % gpart list

Add the partitions as ZIL (log) and L2ARC (cache) on the pool (volume0)

root@freenas1:~ % zpool add volume0 log gptid/94a4bd28-aeb7-11e5-99ac-bc5ff42c6cb2
root@freenas1:~ % zpool add volume0 cache gptid/9a79622f-aeb7-11e5-99ac-bc5ff42c6cb2

Nice. You can use the zpool command to verify they're used as such:
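
A minimal check, assuming the pool is named volume0 as in the commands above:

zpool status volume0     # the 'logs' and 'cache' sections should list the two gptid partitions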

If you are paying attention you'll notice the GUIDs are different. Anyway, you can use the GUI to see the results as well: click the main volume under Storage -> Volume, then click the Show Details button.

If you pick any of the partitions in this list, at the bottom you get a button labeled "Remove", which can undo the previous additions made via the back-end SSH.

After this I removed the old Volume completely, including the old File based extent I was using on it.

I then created all new volumes, one on each drive, created 1 zvol on each volume, and used those zvols as device-based extents on the iSCSI service… and I couldn't believe the performance increase. I couldn't saturate the 1 Gbps link before with storage vMotions; now every single datastore maxes out the NIC, I hit 100 MB/s plus on every storage vMotion, and I increased my storage capacity. W00t! (Of course, I never had storage redundancy to begin with, so nothing lost, all gains.)

Summary

Don't bother using an SSD to try to gain speed on simple homelab FreeNAS servers. It's useless… "Some more specifics: as a rule of thumb L2ARC is only really useful if you have lots of RAM (64GB+) and a ZIL is only useful if you're performing lots of synchronous writes." – anodos
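
If you're curious whether your box even falls into that "lots of RAM" camp, a quick peek at physical memory versus the ARC's current and maximum size from the FreeNAS shell gives a hint. This is just a sketch; the sysctl names below are the standard FreeBSD/ZFS ones, and the values are reported in bytes:

sysctl hw.physmem kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max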

Upgrade and Migrate a vCenter Server

Intro

Hello everyone! Today I'll be doing a test in my home lab where I will be upgrading, not to be confused with updating, a vCenter server. If you are interested in staying on the version your vCenter is currently on and just want to patch it to the latest build, see my other blog post: VMware vCenter Updates using VAMI – Zewwy's Info Tech Talks

Before I get into it, there are a couple thing expected from you:

  1. An existing instance of vCenter deployed (for me yup, 6.7)
  2. A backup of the config or whole server via a backup product
  3. A Copy of the latest vCenter ISO (either from VMware directly or for me from VMUG)

Side Story

*Interesting Side Note* The VM creation date property is only a thing since vCenter 6.7. Before that, it was in the events table, which gets rotated out by retention policies. 🙂

*Side Note 2* I was doing some vMotions of VMs to prepare for rebooting a storage device hosting some datastores before the vCenter update, and oddly, even though a task hadn't completed, it would disappear from the Recent Tasks view. Clicking All Tasks showed the task in progress but at 0%, so no indication of the progress. The only trick that worked for me was to log off and back in.

A quick little side story: it had been a while since I'd logged into VMUG for anything, and I have to admit the site is unbelievably badly designed. It's so unintuitive I had to Google, again, how to get the ISOs I need from VMUG.

Also, for some reason, I don't know why, when I went to log in it stated my username and password were wrong. Considering I use a password manager, I was very confident it was something wrong on their end. Attempting a password reset delivered no email to my address.

Distraught, I decided to make another account with the same email, which, oddly enough, brought me right back to my old account on first login once created. Super weird. According to Reddit, I was not the only one to experience oddities with the VMUG site.

Also, on the note of VMware certification, I totally forgot you have to take one of the mandatory classes before you can challenge or take any of the VMware exams.

“Without the mandatory training? Yes, they represent a reasonable value proposition. With mandatory training? No, they do not. Requiring someone who’s been using your products for a decade to attend a class which covers how to spell ESXi is patronizing if not downright condescending. I only carry VMware certifications because I was able to attain them without going through the nonsense mandatory training.”

“The exam might as well cost $3500 and “include” the class for “free”.”

Don’t fully agree with that last one cause you can take any one class (AFAIK) and take all the exams. I get the annoyance of the barrier to entry, gotta keep the poor out. 😛

Simple Summary about VMUG.

  1. Create account and Sign up for Advantage from the main site.
  2. Download Files from their dedicated Repo Site.

Final gripes about VMUG:

  1. You can’t get Offline Bundles to create custom ESXi images.
  2. You can’t seem to get older versions of the software from there.
  3. The community response is poor.
  4. The site is unintuitive and buggy.

So now that we've finally got the vCenter 7 ISO…

For more technical coverage of upgrading vCenter, see VMware's guide.

For shits… moving ESXi hosts and vCenter to a new subnet.

1) Build the subnet, firewall rules, and VLANs
2) Configure all hosts with a new VMPG for the new VLAN
3) Move each host one at a time to the new subnet; ensure again that the network will be allowed to reach the vCenter server after migration
4) You can't change the management VMK to use the new VLAN from the vCenter GUI; you have to do it at the host level.
i) Place the host into maintenance mode and remove it from inventory (if the host was added by IP, otherwise just disconnect)
ii) Update the host's IP address via the host's console, and update DNS records
iii) Re-add the host to the cluster via its new DNS hostname

Changing vCenter Server IP address

Source: How to change vCenter and vSphere IP Address ( embedded PSC ) – Virtualblog.nl

I changed the IP address in the VAMI, and it even changed the vpxa config's serverIp to the new IP automatically. It worked. :O
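
If you want to double check from the host side that the agent really picked up the new vCenter IP, the vpxa config on each ESXi host is worth a peek. The path below is where it sits on my hosts (standard install); treat it as an assumption for your build:

grep -i serverIp /etc/vmware/vpxa/vpxa.cfg    # should now show the new vCenter server IP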

Upgrading vCenter

Using the vCenter ISO

The ISO is not a bootable one, so I mount it on a Windows machine that has access to the vCenter server.

Run the installer exe file…

Click Upgrade

I didn't enter the source ESXi host IP… let's see.

Nope, it wants all the info; fill all fields, including the source ESXi host info.

Yes.

Target ESXi Host for new VCSA deployment. Next

Target VCSA VM info. Next

Would you like, large or eXtra large?

Pick the VM's datastore location. Next.

VM temp info; again, ensure network connections are open between subnets if working with segregated networks.

Ready to deploy.

Deploying the VM to the target ESXi host. Once this was done, I got a message to move on to Stage 2, which can be done later; I clicked Next.

Note right here: when you get a prompt for entering the root password, I found it to be the target's root password, not actually the source's.

Second Note: Resolving the Certs Expired Pre-Check

While working on a client upgrade, this was more in my face: the source server pre-checks would not continue, stating certificates had expired.

I was wondering how to check the existing certs, and while this KB states you can check them via the WebUI, there could be a couple of issues.

1) You might not even be able to log in to the WebUI, as mentioned in this blog, a bit of a catch-22. (Note: the same goes for SSO domains, which can't be managed via VAMI, so if there's an AD issue with a source, you often get a 503 service error attempting to log on to the WebUI.)

2) It might not even show up in that area of the WebUI.

In these cases I managed to find this blog post… which, shockingly enough, is by the very guy who wrote the fixsts script used to fix my problem in this very blog post. :O

Checking Certs via the CLI

Grab the script from this VMware KB:

Download the checksts.py script attached to the above KB article.
Upload the attached script to the VCSA or external PSC.

For example, /tmp

Once the script has been successfully uploaded to VCSA, change the directory to /tmp.

For example:

cd /tmp

Run python checksts.py.

Okie dokie then, I guess this script doesn't check the required cert… so instead I followed along with this VMware KB (yes, another one).

In which case I ran the exact commands specified in the KB, saved the certificate output to a file, and opened it up in Windows by double-clicking the .crt file.

openssl s_client -connect MGMT-IP:7444 | more
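
If you'd rather not save the output to a file and open it in Windows, you can let openssl decode the validity dates right on the shell. Same MGMT-IP placeholder as the KB command above; this is just a convenience variation I use, not something from the KB:

echo | openssl s_client -connect MGMT-IP:7444 2>/dev/null | openssl x509 -noout -subject -dates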

So now, instead of running the fixsts script, this KB states to run the following to reset this certificate to use the machine cert (self-signed with valid date stamps, at least that's what this server showed when checking them via the certificate management area in the vCenter WebUI).

For the appliance (I don't deal with the Windows Server version as it's EOL):

# back up the current Machine SSL certificate and key from VECS
/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store MACHINE_SSL_CERT --alias __MACHINE_CERT > /var/tmp/MachineSSL.crt
/usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store MACHINE_SSL_CERT --alias __MACHINE_CERT > /var/tmp/MachineSSL.key
# back up the existing STS internal certificate and key (the one being replaced)
/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT > /var/tmp/sts_internal_backup.crt
/usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT > /var/tmp/sts_internal_backup.key
# delete the old STS internal entry and recreate it using the Machine SSL cert/key
/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT -y
/usr/lib/vmware-vmafd/bin/vecs-cli entry create --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --cert /var/tmp/MachineSSL.crt --key /var/tmp/MachineSSL.key

Then:

  • service-control --stop --all
  • service-control --start --all

In my case, for some odd reason, I saw a bunch of these when stopping and starting the services:

2021-09-20T18:35:47.049Z Service vmware-sts-idmd does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.

I was nervous that I may have broken it: at first it didn't seem to complete the startup command sequence, but after some time the WebUI was fully accessible again. Let's validate the cert with the same odd method we used above.

Sure enough, it showed a date-valid cert, which is the machine cert, self-signed.
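
If you prefer skipping the openssl trick entirely, the same vecs-cli syntax used earlier can read the store directly; the store name below matches the commands above:

/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store STS_INTERNAL_SSL_CERT --text | grep -ie "Alias" -ie "Not After"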

Running the Update Wizard… Boooo Yeah!

 

Uhhh, ok….

 

Ok dokie?

I didn’t care too much about old metrics.

nope.

Let’s go!

After some time…

Nice! and it appears to have worked. 🙂

Another Side Trail

I was excited cause I deployed this new VCSA off the FreeNAS datastore I wanted to bring down and reboot, but lo and behold, some new random VMs were on the datastore…

Doing some research, I found this simple explanation of them; however, it wasn't till I found this VMware article that I got the info I was really after.

Datastore selection for vCLS VMs

The datastore for vCLS VMs is automatically selected based on ranking all the datastores connected to the hosts inside the cluster. A datastore is more likely to be selected if there are hosts in the cluster with free reserved DRS slots connected to the datastore. The algorithm tries to place vCLS VMs in a shared datastore if possible before selecting a local datastore. A datastore with more free space is preferred and the algorithm tries not to place more than one vCLS VM on the same datastore. You can only change the datastore of vCLS VMs after they are deployed and powered on.

If you want to move the VMDKs for vCLS VMs to a different datastore or attach a different storage policy, you can reconfigure vCLS VMs. A warning message is displayed when you perform this operation.

You can perform a storage vMotion to migrate vCLS VMs to a different datastore. You can tag vCLS VMs or attach custom attributes if you want to group them separately from workload VMs, for instance if you have a specific meta-data strategy for all VMs that run in a datacenter.

In vSphere 7.0 U2, new anti-affinity rules are applied automatically. Every three minutes a check is performed, if multiple vCLS VMs are located on a single host they will be automatically redistributed to different hosts.

Note: When a datastore is placed in maintenance mode, if the datastore hosts vCLS VMs, you must manually apply storage vMotion to the vCLS VMs to move them to a new location or put the cluster in retreat mode. A warning message is displayed.

The enter maintenance mode task will start but cannot finish because there is 1 virtual machine residing on the datastore. You can always cancel the task in your Recent Tasks if you decide to continue.
The selected datastore might be storing vSphere Cluster Services VMs which cannot be powered off. To ensure the health of vSphere Cluster Services, these VMs have to be manually vMotioned to a different datastore within the cluster prior to taking this datastore down for maintenance. Refer to this KB article: KB 79892.

Select the checkbox "Let me migrate storage for all virtual machines and continue entering maintenance mode after migration." to proceed.

Huh, the checkbox was greyed out and I couldn't click it. I vMotioned them manually and the progress kept moving along.

How a Small Mistake Became a Big Problem

Background

This story is about the implementation of compliance requirements, and the technical changes made that caused some heartburn. In particular, Exchange Server and retention policies.

Very simple: compliance and regulatory practices.

A retention policy was being enforced, on Exchange no less; see here for more information on how to configure retention policies on Exchange.

You might notice you have to create tags for time frames. In this case there wasn't one already pre-populated that matched my needs, so you have to create those. You may have also noticed that while the tags are named by time frame, they are all defined in a number of days.

Human Error

So, long story short, I wrongly defined the number of days for the length of period I wanted, simply due to bad arithmetic; I swear I was an ace at math in school. Anyway, this small mistake made it into the tag definition, and it was deployed to all mailboxes. (Yep, there were approval steps, and it wasn't caught even with the pilot users.)

Once it was discovered, there were 2 options: 1) wait till specific people noticed and recover as required, or 2) do it all in one swoop.

Recover deleted messages in a user’s mailbox in Exchange Server | Microsoft Docs

After following this, it was determined that we couldn't find just the emails from the time frame we needed to restore. It turns out that's because every email's "whenChanged" timestamp became the same time the retention policy came into effect, so filtering by date was completely useless.

Digging a Hole

At this point we figured we'd just restore all the email and let the retention policy rerun with the proper time frame tag applied. While this did work, there was a recently added technique or property that would have restored the emails into the subfolders from which they were removed. Instead, all the emails were placed back into the users' Inbox.

This was a rough burn. Overall it did work; it just wasn't very clean, and there was some fallout from the whole ordeal.

Hope this story helps someone prevent the same mistakes.