vMotion – Not Allowed in the Current State

First things first, I vMotioned the VM to another host and that worked fine, so the issue appeared to be target related. I also found this post, which suggests restarting the management (hostd) and vpxa services:

/etc/init.d/hostd restart

/etc/init.d/vpxa restart

Doing this on the source ESXi host did nothing, which again suggested the issue was on the target. I ran the same commands on the target and the vMotion still failed.

I then disconnected the target ESXi host from vCenter, put it in maintenance mode, rebooted it, took it out of maintenance mode, reconnected it to vCenter, and this time the vMotion worked.
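If you'd rather handle the maintenance mode and reboot from the host's shell instead of the vCenter UI, the equivalent is roughly this (just a sketch; the disconnect/reconnect part is still done in vCenter):

# Put the host into maintenance mode, reboot it, then take it back out
esxcli system maintenanceMode set --enable true
reboot
# once the host is back up:
esxcli system maintenanceMode set --enable false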

Hope this helps someone.

ESXi 6.x Datastore Not Mounted

Quick post here. I had to recover from a flooded basement (sorry for the day-long outage), which meant putting my disks in another server, loading FreeNAS, importing my ZFS volumes, and recreating the iSCSI targets. I then added them to my ESXi hosts, and rescanning the HBAs showed the disks…

but the datastores were not visible…

so I googled and found this VMware thread with some helpful commands to try. (I do kind of agree with the OP that it's annoying they removed the front-end UI for importing volumes that could handle this.)

esxcli storage vmfs snapshot list

esxcfg-volume -M UUID
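For context, here's roughly how those two fit together; the UUID below is just a placeholder for whatever the snapshot list actually returns:

# List VMFS volumes the host is treating as snapshots/unresolved copies
esxcli storage vmfs snapshot list

# Force-mount one, keeping its existing signature (use the UUID or label from the list)
esxcfg-volume -M 5e2b9c1a-xxxxxxxx-xxxx-xxxxxxxxxxxx
# or the esxcli equivalent:
esxcli storage vmfs snapshot mount -u 5e2b9c1a-xxxxxxxx-xxxx-xxxxxxxxxxxx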

Ehh it worked!

Hope this helps someone. If this doesn't work, you might have some other underlying issue.

ESXi /tmp is Full

I'll keep this post short and to the point. I got errors in the alerts.

I was like, huh, interesting… so I went to validate it on the host by logging in via SSH and typing the command:

vdf -h

At the bottom you can see /tmp space usage:

I then found out about this cool command from this thread:

find /tmp/ -exec ls -larth '{}' \;

This lists all the files and their sizes to gander at, and that's when I noticed a really large file:

I decided to look up this file and found this lovely VMware KB:

The Workaround:

echo > /tmp/ams-bbUsg.txt
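For what it's worth, here's how I'd apply and verify that; the file name comes from the KB above, and the rest is just standard checks:

# Check how big the AMS log file has grown
ls -lh /tmp/ams-bbUsg.txt

# Truncate it in place, then confirm /tmp usage has dropped
echo > /tmp/ams-bbUsg.txt
vdf -h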

The solution:

To fix the issue, upgrade VMware AMS to version 11.4.5 (included in the HPE Offline Bundle for ESXi version 3.4.5), available at the following URLs:

HPE Offline Bundle for ESXi 6.7 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-a38161c3e8674777a8c664e05a

HPE Offline Bundle for ESXi 6.5 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-7d214544a7e5457e9bb48e49af

HPE Offline Bundle for ESXi 6.0 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-98c6268c29b3435e8d285bcfcc

Procedure

  1. Power off any virtual machines that are running on the host and place the host into maintenance mode.
  2. Transfer the offline bundle onto the ESXi host local path, or extract it onto an online depot.
  3. Install the bundle on the ESXi host.
    1. Install remotely from client, with offline bundle contents on an online depot:
      esxcli -s <server> -u root -p mypassword software vib install -d <depotURL/bundle-index.xml>
    2. Install remotely from client, with offline bundle on ESXi host:
      esxcli -s <server> -u root -p mypassword software vib install -d <ESXi local path><bundle.zip>
    3. Install from ESXi host, with offline bundle on ESXi host:
      esxcli software vib install -d <ESXi local path><bundle.zip>
  4. After the bundle is installed, reboot the ESXi host for the updates to take effect.
  5. (Optional) Verify that the vibs on the bundle are installed on your ESXi host.
    esxcli software vib list
  6. (Optional) Remove individual vibs. <vib name> can be identified by listing the vibs as shown in #5.
    esxcli software vib remove -n <vib name>
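To put the procedure together, here's a rough end-to-end example for the local-path case; the datastore path and bundle file name are placeholders, so adjust them to wherever you copied the bundle:

# Install the bundle from a local datastore path (example path/file name only)
esxcli software vib install -d /vmfs/volumes/datastore1/hpe-esxi6.7-offline-bundle-3.4.5.zip

# Reboot, then confirm the AMS vib is at the new version
reboot
esxcli software vib list | grep -i ams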

Summary

Use the commands shown to trace the source of the usage; your case may not be as easy as mine. Once you find the culprit, hopefully you can also find a solution. In my case I got super lucky, and other people had already found both the problem and the solution.

Veeam – More Than One Replica Candidate Found

Story Time!

The Problem!

So, a real quick one here. I edited a replication job and changed its source from production to a backup dataset within the Veeam Replication Job settings. I went to run the replication job and was presented with an error I had not seen before…

I had an idea of what happened (I believe the original ESXi host might have been rebuilt); I'm not 100% sure, just speculating. Either way, I was pretty sure the change I made to the job was not the source of the problem.

Since I wasn't concerned about the target VM being re-created entirely, I went to Veeam's Replicas, right-clicked the target VM, and picked Delete from Disk… and to my amazement the same error was presented…

Alright… kind of sucks, but here’s how I resolved it.

The Solution

Sadly I had to right-click the target VM under Veeam's Replicas and instead pick "Remove from Configuration". What's really annoying about this is that it removes the source VM from the replication job itself as well.

Why? Dunno, Veeam's coding choices...

So after successfully removing the target VM from Veeam's configuration, I manually deleted the target VM on the target ESXi host. Then I had to reconfigure the replication job and point it at the source VM again. If you're interested in why that's the case, see the link above.

After that the job ran successfully. Hope this helps someone.

vCenter Appliance Failed File Based Backup

Story Time

*UPDATE* VMware has pulled this garbage mess of an update version of vSphere. Why?

1) They PSOD ESXi Hosts...

2) Broke more shit than they fixed...

3) Broke and silently removed protocols for File Based Backups (This post)

As much as the backup failed, I failed along with it.

Task: Back up the vCenter Server using VAMI to create a file-based backup.

Now, for an ESXi host, you can do this super easily (at least for the config: install fresh and simply load the config back).

For a deeper and better understanding of backing up and restoring ESXi hosts, please read this really amazing blog post by Michael Bose from NAKIVO.

Back up ESXi configuration:

vim-cmd hostsvc/firmware/backup_config

and you will get a simple URL to download the file right to your management machine.
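For reference, the round trip looks roughly like this; the restore path is just an example, and the host has to be in maintenance mode for the restore (it reboots afterwards):

# Create the config backup; ESXi replies with a URL to download configBundle.tgz
vim-cmd hostsvc/firmware/backup_config

# Later, to restore, copy the bundle back onto the host (path is an example) and run:
vim-cmd hostsvc/firmware/restore_config /tmp/configBundle.tgz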

Does vCenter have something like this? (from my research…) No.

You use the vCenter Server Interface to perform a file-based backup of the vCenter Server core configuration, inventory, and historical data of your choice. The backed-up data is streamed over FTP, FTPS, HTTP, HTTPS, SFTP, NFS, or SMB to a remote system. The backup is not stored on the vCenter Server.

Which hasn't been updated since 2019. Let's make a couple of things clear here:

  1. The HTTP and HTTPS options mentioned above are not like the ESXi approach, where a backup file is created locally on the VCSA and you are presented with a simple URL to navigate to and download it. It expects the HTTP/HTTPS target to be a file server that accepts uploads (like a Dropbox of sorts).
  2. Lots of these "supported" protocols have pretty bad bugs, or simply don't work at all, which we'll see below.

Doing the Theory

So OK, I log into VAMI, click the Backup tab on the left-hand nav, and try to add an open SMB path I have available, because why not make my life somewhat easy…

Looking this up I get: VAMI Backup with SMB reports error: "Path not exported by the remote filesystem" (86069) (vmware.com), dated Oct 28, 2021. Nice, nice.

Alrighty then, I'll just spin up a dedicated FTP service on my FreeNAS box, I guess. I learned a couple of things about chroot and local users via FTP, but the short and sweet of it: I created a local account on the FreeNAS box, created a dataset under an existing mounted logical volume, and granted that account access to the path. Then I enabled local user login for the FTP server, specified that path as the user's home path, and enabled chroot on the FTP service, so when this user logs in all they can see is their home path, which to them appears as root. This (I felt) was a fair bit of security, even though it's a lab and not strictly needed, just nice… ANYWAY… once I had an FTP server ready…

Now I went to start a file-based backup of the vCenter Server:

First Error: Service Not Running

In my case I got an error that the PSC Health service was not running. This might just be because my lack of decent hardware caused some services to not start up in a timely manner. Either way, I navigated to Services in VAMI and started the PSC Health service. Lucky for me, there were no further errors on this part.

If you have service errors, you will have to check them out and get the required services up and running, which is out of the scope of this post.

Second Error: Number of Connections

The next error I got complained about the allowed number of connections to the target.

In my case there was an option for this in the FreeNAS FTP service configuration; I adjusted it to "0" (unlimited) in hopes of resolving the problem:

Restart the service, and try again…

Third Error: Unknown

This is starting to get annoying…

What kind of vague error is that?!

A guy in this thread states the path has to be empty? What?

I tried that, cleared some more space, and it seems to have sorta worked?

Clear the FTP user's home path, and try again:

Fourth Problem: Stuck @ 95%

The Job appeared to run but I noticed a couple things:

1) Even though the backup config said the overall size would only be roughly 400 MB, the job wrote around 1.8 GB.

2) All I/O appeared to stop and all resources returned to an idle state, while the job remained stuck processing at 95%.

OK… I found this thread, which suggested restarting the Auto Deploy service. I tried that and it didn't work; the job remained stuck at 95%.
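For reference, bouncing that service from the appliance shell looks something like this; I'm assuming the Auto Deploy service is named vmware-rbd-watchdog, so confirm that on your build first:

# Check the service state, then stop/start it (service name assumed)
service-control --status vmware-rbd-watchdog
service-control --stop vmware-rbd-watchdog
service-control --start vmware-rbd-watchdog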

I also found this VMware KB, however:

1) I have a tiny deployment, so there's no chance my DB would be 300 GB.

2) When I went to check the "buggy Python script", the "workaround" seemed to have already been implemented. So the version of vCenter I was on (7.0u3a) already had this "fix" in place.

3) The symptoms remained exactly the same, and the Python scripts remained in a "sleeping" state.

FFS already….

Try Anyway

Well, I saw that the files were created, so I decided to try the restore method in the VCSA deployment wizard anyway…

I forgot to take a screenshot here, but it basically stated there was a missing metafile.json file. I can only assume that when the backup process was stuck at 95%, it never created this required JSON file…

FUCK….

One Scheduled Run

I noticed that, overnight, a scheduled job had tried to run and produced yet another error message:

Well, that's still pretty vague. As far as I know there should be no connectivity issues, since files were created all the way up to 1.8 GB, so I don't see how it's network or permissions related. It's not available space either, since the target path was emptied every time and has already been shown to accept 1.8 GB of writes.

Like seriously, wtf gives here. The fact that there's an entirely new KB with an entire table listing the shit that is apparently wrong with this file-based backup honestly begs the question: where the FUCK is the QA in software these days? This shit is just fucking ridiculous already…

Check the Logs

*This log file only gets created the first time you click "Configure" under the Backup section of VAMI.

Here’s how to access the logs:

  1. Using PuTTY or similar, SSH in as root on the appliance.
  2. Type shell at the prompt.
  3. Type cd /var/log/vmware/applmgmt.
  4. Type more backup.log or tail backup.log.

[VCDB-WAL-Backup:PID-42812] [VCDB::_backup_wal_files:VCDB.py:797] INFO: VCDB backup WAL start not received yet.

Checking the entry, I find this thread, along with this Reddit post, which leads right back to the first shared thread, which does some bitching about the /etc/issues file… and I have a strange feeling that, just like the stuck-at-95% issue, I'll look at the file and it will probably be correct, just like it was for the guy who created the Reddit post.

Try Alternative Protocols

When I tried alternative protocols I came across more issues:

NFS – Had the same path issue SMB did: "Path not exported by remote system"

SCP – Was apparently silently dropped, much like what this thread mentioned. The amount of silence on that thread speaks volumes to me.

TFTP was also dropped.

You are so Fucked

Soo, I wonder if I can try to "upgrade" (aka downgrade) using the UI installer of a version that supposedly works (7.0u2b)…

Alright, so let me get this straight… I upgraded, and now I can't make a backup because the upgraded version is completely broken in terms of its file-based backups.

I can't roll back the upgrade without having kept the old VCSA, which in my case was removed since everything else, vSphere itself included, was working.

I can't "downgrade" an existing one, and I can't make a backup to restore my old one. OK, fine, well how about a huge FUCK YOU VMWARE while I try to come up with some sort of workaround for this utter fucking mess.

Infected Mushroom – U R So F**ked [HQ & 1080p] – YouTube

Workaround Option #1

Build a brand new vCenter, add hosts, and reconfigure.

The main issue here is the fact that if you rely on CBT, you're fucked: all the VM-IDs will have changed, so you will have to:

1) Edit and adjust all backup jobs to point to the new VM via its new VM-ID.

2) Let all the delta files be recalculated (which can mean major I/O on the storage units, depending on many factors: number of VMs, size of VMs, rate of change on the VMs, etc.).

Not an option I want to explore just yet.

Workaround Option #2

Back up and restore the config database?

Let's try… first, the backup…

Copy the Python scripts (hoping they're not all buggy and messed up too…).

Stop required services:

service-control --stop vmware-vpxd
service-control --stop vmware-content-library

Change the script permissions:

chmod +x backup_lin.py

Run it:
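(The exact invocation depends on where you extracted the KB's scripts; assuming they still take a -f flag for the output file like the older versions of the KB did, it looks roughly like this, with example paths:)

# Run from the directory the scripts were copied to; the output path is an example
python backup_lin.py -f /tmp/backup_VCDB.bak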

Make a copy of it via WinSCP.

Run the restore script… and

well, it was worth a shot, but that failed too…

Let's try pg_dump for shits…

I'd really recommend reading this blog post by Florian Grehl on Virten.net for great information on using Postgres on vCenter.

Connect to server via SSH (SSH enabled required on vCenter).

“To connect to the database, you have to enable SSH for the vCenter Server, login as root, and launch the bash shell. When first connecting to the appliance, you see the “Appliance Shell”. Just enter “shell” to enter the fully-featured bash shell.

The simplest way to connect to the databases is by using the “postgres” user, which has no password. It is convenient to also use the -d option to directly connect to the VCDB instance.”

# /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB

Cool, this lets us know the Postgres DB service is running. The most important takeaway from Florian's post is:

“When connecting, make sure that you use the psql binaries located in /opt/vmware/vpostgres/current/bin/ and not just the psql command. The reason is that VMware uses a more recent version than it is provided by the OS. In vSphere 7.0 for example, the OS binaries are at version 10.5 while the Postgres server is running 11.6”

Kool. I could use pg_dumpall, but I found it didn't work (maybe that was a wrong vCenter version being mixed in, not sure); either way, let's try just the VCDB instance…

Interesting, lol. As you can see, I got an error about a version mismatch. I found this thread about it, and with the info from Florian's post I had an idea, tried it out, and it actually worked. Mind… BLOWN.

rm /usr/bin/
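A less destructive way to get the same result (just a sketch of the idea, not necessarily the exact commands I ran) is to call pg_dump by its full vPostgres path so the client and server versions match:

# Dump only the VCDB instance using the vPostgres binaries; the output path is an example
/opt/vmware/vpostgres/current/bin/pg_dump -U postgres VCDB > /tmp/VCDB_dump.sql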

OK, let's take this file and place it on the newly deployed vCenter.

Even though the restore appeared to have worked, the vCenter instance booted up looking like a brand-new install. Worth a shot, I guess, but it did not work.

Workaround Option #3

I'm not sure this is even a fair option, as it only works if you have an existing backup of an alternative type. In my case I use Veeam, and it's saved my bacon I don't know how many times.

Sure enough, Veeam saved my bacon again. I ended up restoring a copy of my vCenter from before the 7.0u3a upgrade, which happened to be on 7.0u2d.

I managed to add an SMB path without it erroring out, and, unreal, I ran a file-based backup and it actually succeeded!!

Now I simply run the deploy wizard and pick Restore to build a new vCenter Server from this backup.

Ahhh VMware… dammit you got me again!

alright fine… grabs yet another copy of vCenter…

and this time…

are you fucking kidding me? Mhmmm interesting… VCSA 7.0 restore issue – VMware Technology Network VMTN

ok… good to know…

From this… to this….

then Deploy again…

It stated it failed due to user auth. However, I was able to log in and verify it worked, but sadly it also instantly expired the license. I was hoping I could get another 60 days without creating a new vCenter, reconfiguring, and breaking my VM-IDs and CBT delta points for my backup software.

Even this link states what I’m trying to do is not possible… ugh the struggles are real!

In the end I just started from scratch. Ugh.

When to VMware Snapshot

OK, a real quick, short post here. I figured I'd take a snapshot of my vCenter Server (the reason will be the next blog post). In this case I decided to snapshot the VM with its memory, figuring it would be faster than bringing the VM back up from a powered-off state, which is what reverting a normal (memory-less) snapshot would do.
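For reference, the same choice from the host shell with vim-cmd looks like this; the VM ID is an example, and the fourth argument is what toggles including memory:

# Find the VM's ID first
vim-cmd vmsvc/getallvms

# snapshot.create <vmid> <name> <description> <includeMemory> <quiesced>; 1 = with memory
vim-cmd vmsvc/snapshot.create 42 "pre-change" "snapshot before the change" 1 0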

In most cases that would probably be a fair assumption, but boy was I wrong.

It took a short time to save the snapshot, but almost 15 minutes or more to bring the VM back to full operational status with all memory intact… just check out these charts:

Here you can see it took maybe 5 minutes to save the memory state to disk; this would have been 0 minutes with a normal snapshot, since it doesn't save memory to disk. Then you can see the slower, longer recovery time it took to read that same memory state from disk and load it back into RAM.

Of course, taking a solid guess that disk I/O is much slower than memory I/O, the bottleneck would have to be none other than the actual disk…

Yup, there are the same matching results: fast disk writes and slow disk reads…

And there's the disk being 100% busy on the read requests. I'm not sure why the read performance on this drive was as bad as it was, but I have a feeling a regular boot would have been faster… I'll update this post if I do an actual test.

Meh, pretty much the same amount of time… I think I need some super-fast local storage… yet I'm so cheap I never get any… cheap bastard…

Changing vCenter Hostname


Why?!?! Cause I gotta!

Source: Changing your vCenter Server’s FQDN – VMware vSphere Blog

PreReqs, AKA Checklist

  • Backup all vCenter Servers that are in the SSO Domain before changing the FQDN of the vCenter Server(s)
  • Supports Enhanced Linked Mode (ELM)
  • Changing the FQDN is only supported for embedded vCenter Server nodes
  • Products which are registered with vCenter Server will first need to be unregistered prior to an FQDN change. Once the FQDN change is complete they can then be reregistered.
  • vCenter HA (VCHA) should be destroyed prior to an FQDN change and reconfigured after changes
  • All custom certificates will need to be regenerated
  • Hybrid Linked Mode with Cloud vCenter Server must be recreated
  • vCenter Server that has been renamed will need to be rejoined back to Active Directory
  • Make sure that the new FQDN/Hostname is resolvable to the provided IP address (DNS A records)

NOTE: If the vCenter Server was deployed using the IP as PNID/FQDN, then the following should also be considered:

  • The PNID change workflow cannot be used to change the IP address of vCenter Server
  • The PNID change workflow cannot be used to change the FQDN of vCenter Server

In this scenario, use the vCenter Server Appliance Management Interface (VAMI) to update hostnames or IP changes directly. 

The main thing I was expecting was the certificate issue. In my home lab, I removed the SSO domain concern before this change (I'm just on vsphere.local), no ELM, already using embedded (all-in-one), no VCHA, no Hybrid, oh yeah… and I'm not sure you "leave an SSO domain" before joining back to AD…

My Only Pre-Req

I went into DNS and pre-created A host records for the new server hostname: vCenter.zewwy.ca
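A quick sanity check before kicking anything off; the IP below is just a placeholder for the appliance's address:

# Confirm forward (and ideally reverse) lookups resolve as expected
nslookup vcenter.zewwy.ca
nslookup 192.168.0.10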

Steps

Basically log into VAMI, and change the name.

Then

and… well, WTF…

No matter what I do, it's greyed out… I thought maybe the untrusted cert might be an issue, so I tried from a machine with the fully trusted chain, and: same issue!

Like… why… why is Next greyed out? It's like whatever button-validation code is written for it is not being triggered. Is this a browser version issue? I can't find anything online about anyone having this issue… Why? Because I was right, it was the input validation…

Honestly, this is one of those MASSIVE facepalm moments in my life. I only realized after the fact that the username field was NOT auto-filled; it was just a greyed-out placeholder label provided as a suggestion… Fill in both fields and the Next button is un-greyed…

Step 4, check the checkbox to acknowledge the warning, and away… she goes!

At which point I clicked "redirect now" (both web addresses were still available; it didn't seem to matter which one you came from, the cert was untrusted either way, because the CA was not in my trusted CA store).

5 minutes later….

I tell ya, there's nothing more annoying than a spinning circle and the warning "don't refresh" when the status bar simply does not move… sure got some conflicting messages here…

*Starts to sweat*…

After about 10 minutes…

More Certificate Fun!

Alright, so after this, some quick takeaways… When I went to check the site, it was "untrusted", but not for the reason I had thought. I figured it would be the same issue as in the source blog, the hostname on the cert, but that was not the case; instead it was simply that the cert chain seemed to be missing and the issuer could not be verified:

as well as:

So what to do about this… You can download the CA cert from vcenter/certs/download.zip (for some reason I had to use IE). Then install the CA cert. (I noticed that even after I did this I still had a cert warning/error, but by the next day, maybe after a cache clear or update, it reported green in the web browser.)

Now when I logged in, I got the ol' cert alert in the vCenter UI.

The first thing to try is removing the old CAs.

Which I did, following this VMware KB

I simply followed my other post about this and just cleared/reset the alert to green. (Still good days later.)

Backup Solutions

Don't forget to change the server in your backup software; for example, I had to do this in Veeam.

These were my results…

Which, go figure, errored out…

So right-click, go to Properties of the object… Next, Next…

Accept the new certificate.

Now you'd figure all is well, but when I went to create a new backup job and attempted to expand the vCenter server in Veeam, it just hung there…

I ended up rebooting the server and then waiting for all the Veeam services to start. I reopened Veeam, went to Inventory, and clicked the vCenter server; it took a second and then showed all the hosts and VMs. I rescanned it to be safe and got this result, which was a bit different from the applied-settings confirmation above. I think maybe I forgot to rescan the host after applying the new settings, assuming it would have been done as part of the properties change wizard.

Which, lucky for me, now worked, and I was able to select a VM in the Veeam backup wizard and successfully back it up.

Final Caveats

Like, what the heck, everywhere else it's changed except at the shell. Let's see if we can change this.
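Since the VCSA runs Photon OS (systemd), the change at the shell would presumably be something like this, with my new name as the example:

# hostnamectl should handle it on Photon; confirm with plain hostname afterwards
hostnamectl set-hostname vcenter.zewwy.ca
hostname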

Well that was easy enough, no reboot required. 🙂

I also found the local hosts file doesn't update either; the file states it is managed by VAMI, so I may have to look there for potential solutions:

I noticed this since I had to do a workaround for something else, and sure enough caught it. I'll change it manually with vi for now and see what changes after a reboot.

Summary

Overall, literally quick n easy.

  1. Verify DNS records exist.
  2. Use VAMI to edit the hostname via the network management settings, click apply, and wait.
  3. Manually clear out the old certs that were created under the old hostname.
  4. Reconfigure your backup solution, which is vendor specific (I provided steps for Veeam, as that is the backup vendor I like to use).

Overall the task seemed to go pretty smoothly. I'll follow up with any other issues I might come across in the future. Cheers.


Change vCenter FQDN or IP on Veeam

Story

I recently did an infrastructure upgrade on my home lab, which included moving all my ESXi hosts into a dedicated subnet and making them all more dependent on DNS. This has its pros and cons; after it all, my ESXi hosts had their IP addresses changed. I also moved my vCenter and changed its IP address, which is now supported, yay.

Now I had to move Veeam along with it. Originally it was in the same subnet as the ESXi hosts and vCenter, which have all moved; instead of trying to manage cross-subnet comms, I changed Veeam's IP address and pointed its DNS settings at my AD DNS, which has all the ESXi and vCenter host records. It was easy enough: just change the Windows NIC IP address and the VM's port group (VMPG).

 How to

Now, when I went to rescan the vCenter instance in Veeam, it complained about the certificate, since the cert was renewed during the vCenter upgrade. I decided I'd change the entry to be based on DNS, now that everything else is as well. When I went to edit the object in Veeam, it was greyed out. Lucky for me, Veeam had a KB ready to go.

Challenge

The Name/FQDN/IP of the vCenter Server has changed, and needs to be updated within Veeam Backup & Replication.

Solution

Solves Name Change Only
This solution applies ONLY if the vCenter Server database has not changed.
(I did an upgrade, so yes, the database is unchanged, which is what you want in order to preserve VM-IDs and backup chains.)

If the Name/FQDN/IP of the vCenter changed due to a reinstall or upgrade, and a new vCenter database was used, the Ref-IDs will have changed. Due to the changed Ref-IDs you will need to follow the documented process in www.veeam.com/KB1299

Step 1

Prior to running the commands below, you need to identify the Name/FQDN/IP Veeam is currently using to communicate with the vCenter. To do this, edit the entry for the vCenter under Backup Infrastructure and note the "Name:".

Next perform the following steps to change that VMware Server’s name.

Step 2

Launch PowerShell from inside the Veeam Backup & Replication console. You can find the “PowerShell” button under the File-menu’s “Console” section.

The Veeam Backup & Replication PowerShell Toolkit will load.

Step 3

Run the following command:

$Servers = Get-VBRServer -name "old-name"

Replace old-name with the "Name" currently set for the vCenter in the Veeam Backup & Replication console.

Step 4

Run the following command next to change the name:

$Servers.SetName("new-name")

Replace new-name with the new name for the vCenter; this can be an IP, hostname, or FQDN.
Do not remove the quotes on either side.
This change will go into effect as soon as the command in Step 4 completes.

How I did it – One Liner

Verify:

Get-VBRServer -name "Name from Step 1"

Change:

(Get-VBRServer -name "Name From Step 1").SetName("new.domain.com")

Results:

Now you can click Next, then Apply; it should get right past the certificate check if the certificates are all good… and you'll end up with the following after a rescan:

That was easy enough. I don't fully understand why they grey out the UI for making this change, but there you have it. Happy backups!