Fix Orphaned Datastore in vCenter

Story

The Precursor

I did NOT want to write this blog post. This post comes from the fact that VMware is not perfect and I’m here to air some dirty laundry…. Let’s get started.

UPDATE* Read on if you want to get into the nitty gritty, otherwise go to the Summary section, for me rebooting the VCSA resolved the issue.

The Intro

OK, I’ll keep this short. 1 vCenter, 2 hosts, 1 cluster. 1 host started to act “weird”; Random power off,   Boots normal but USB controller not working.

Now this was annoying … A LOT, so I decided I would install ESXi on the local RAID array instead of USB.

Step 1) Make a backup of the ESXi config.

Step 2) Re-install ESXi. When I went to re-install ESXi it stated all data in the exiting datastore would be deleted. Whoops lets move all data first.

Step 2a) I removed all data from the datastore

Step 2b) Delete the Datastore, , and THIS IS THE STEP THAT CAUSED ME ALL FUTURE GRIEF IN THIS BLOG POST! DO NOT FORGET TO DO THIS STEP! IF YOU DO YOU WILL HAVE TO DO EVERYTHING ELSE THIS POST IS TALKING ABOUT!

Unmount, and delete the datastore. YOU HAVE BEEN WARNED!

*during my testing I found this was not always the case. I was however able to replicate the issue in my lab after a couple of attempts.

Step 3) Re-install ESXi

Step 4) Reload saved Config file, and all is done.

This is when my heart sunk.

The Assumptions

I had the following wrong assumptions during this terrible mistake:

  1. Datastore names are saved in the backup config.
    INCORRECT – Datastore names are literally volume labels and stay with the volume in which they were created on.
    UUID is stored on the device FS SuperBlock.
  2. Removing an orphaned Object in vCenter would be easy.
  3. Renaming a Datastore would be easy.
    1. If the host is managed by vCenter Server, you cannot rename the datastore by directly accessing the host from the VMware Host Client. You must rename the datastore from vCenter Server.
  4. Installing on USB drive defaults all install mount points on the USB drive.
    INCORRECT – There’s magic involved.

Every one of these assumptions burnt me hard.

The Problem

So it wasn’t until I clicked on the datastore section of vCenter when my heart sunk. The old datastore was listed attempting to right click and delete the orphaned datastore shot me with another surprise…. the options were greyed out, I went to google to see if I was alone. It turns out I was not alone, but the blog source I found also did not seem very promising… How to easily kill a zombie datastore in your VMware vSphere lab | (tinkertry.com)

Now this blog post title is very misleading, one can say the solution he did was “easy” but guess what … it’s not support by VMware. As he even states “Warning: this is a bit of a hack, suited for labs only”. Alright so this is no good so far.

There was one other notable source. This one mentioned looking out for related objects that might still be linked to the Datastore, in this case there was none. It was purely orphaned.

Talking to other in #VMware on libera chat, told me it might be possibly linked to a scratch location which is probably the reason for the option being greyed out, while this might be a reasonable case for a host, for vCenter in which the scratch location is dependent on a host itself, not vCenter, it should have the ability to clear the datastore, as the ESXi host itself will determine where the scratch location is stored (foreshadowing, this causes me more grief).

In my situation, unlike tinkertry’s situation, I knew exactly what caused the problem, I did not rename the datastore accordingly. Since the datastore name was not named appropriately after being re-created, it was mounted and shown as a new datastore.

The Plan

It’s one thing to fuck up, it another to fess up, and it’s yet another to have a plan. If you can fix your mistake, it’s prime evidence of learning and growing as you live life. One must always perceiver. Here’s my plan.

Since building the host new and restoring the config with a wrong datastore, I figured I’d I did the same but with the proper datastore in place, I should be able to remove it by bringing it back up.

I had a couple issues to overcome. First one was my 3rd assumption: That renaming a datastore was easy. Which, usually, it is, however… in this case attempting to rename it the same as the missing datastore simply told me the datastore already exists. Sooo poop, you can’t do it directly from a ESXi host unless it’s not managed by vCenter. So as you can tell a catch22, the only way to get past this was to do my plan, which was the same as how I got in this mess to being with. But sadly I didn’t know how bad a hole I had created.

So after installing brand new on another USB stick, I went to create the new datastore with the old name, overwriting the partition table ESXi install created… and you guessed it. Failed to create VMFS datastore. Cannot change host configuration. – Zewwy’s Info Tech Talks

Obviously I had gone through this before, but this time was different. it turned out attempting to clear the GPT partition table and replace it with msdos(MBR) based one failed telling me it was a read-only disk. Huh?

Googling this, I found this thread which seemed to be the root cause… Yeap my 4th assumption: “Installing on USB drive defaults all install mount points on the USB drive.”

so doing a “ls -l”, and “esxcli storage filesystem list” then “vmkfstools -P MOUNTPOINT” I was veriy esay to discover that the scratch and coredump were pointing to the local RAID logical volume I created which overwrote the initial datastore when ESXi was installed. Talk about a major annoyance, like I get why it did what it did, but in this case it is  major hindrance as I can’t clear the logical disk partition to create a new one which will be hold the datastore I need to have mounted there… mhmmm

So I kept trying to change the core dump location and the scratch location and the host on reboot kept picking the old location which was on the local RAID logical volume that kept preventing me from moving forward. Regardless if I did it via the GUI or if I did it via the backend cmd “vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp/scratch” even though VMware KB mentions to create this path path first with mkdir what I found was the creation of this path was not persistent, and it would seem that since it doesn’t exist at boot ESXi changes it via it’s usual “Magic”:

“ESXi selects one of these scratch locations during startup in this order of preference:
The location configured in the /etc/vmware/locker.conf configuration file, set by the ScratchConfig.ConfiguredScratchLocation configuration option, as in this article.
A Fat16 filesystem of at least 4 GB on the Local Boot device.
A Fat16 filesystem of at least 4 GB on a Local device.
A VMFS datastore on a Local device, in a .locker/ directory.
A ramdisk at /tmp/scratch/.”

So in this case, I found this post around a similar issue, and turns out setting the scratch location to just /tmp, worked.

When I attempted to wipe the drive partitions I was again greeted by read-only, however this time it was right back to the coredump location issues, which I verified by running:

esxcli system coredump partition get

which showed me the drive, so I used the unmounted final partition of the USB stick in it’s place:

esxcli system coredump partition set -p USBDriveNAA:PartNum

Which sure enough worked, and I was able to set the logical drive to have a msdos based partition, yay I can finally re-create the datastore and restore the config!

So when the OP in that one VMware thread post said congrats you found 50% of the problem I guess he was right it goes like this.

  1. Scratch
  2. Coredump

Fix these and you can reuse the logical drive for a datastore. Let’s re-create that datastore…

This is hen my heart sunk yet again…

So I created the datastore successfully however… I had to learn about those peskey UUID’s…

The UUID is comprised of four components. Lets understand this by taking example of one of the vmfs volume’s UUID : 591ac3ec-cc6af9a9-47c5-0050560346b9

System Time (591ac3ec)
CPU Timestamp (cc6af9a9)
Random Number (47c5)
MAC Address – Management Port uplink of the host used to re-signature or create the datastore (0050560346b9)

FFS… I can never be able to reproduce that… and sure enough thats why my UUIDs not longer aligned:

I figured maybe I could make the file, and create a custom symlink to that new file with the same name, but nope “operation not permitted”:

Fuck! well now I don’t know if i can fix this, or if restoring the config with the same datastore name but different UUID will fix it or make things worse…. fuck me man…. not sure I want to try this… might have to do this on my home lab first…

Alright I finally was able to reproduce the problem in my home lab!

Let’s see if my idea above will work…

Step 1) Make config Backup of ESXi host. (should have one before mess up but will use current)

Step 2) Reload host to factory defaults.

Step 3) rename datastore

Step 4) reload config

poop… I was afraid of that…

ok i even tried, disconnecting host from vcenter after deleting the datstore  I could, recreate with same name and it always attaches with appending (1) cause the datastore exists as far as vCenter thinks, since the UUID can never be recovered… I heard a vCenter reboot may help let’s see…

But first I want to go down a rabbit hole…. the Datastore UUID, in this case the ACTUAL datastore UUID, not the one listed in a VM’s config file (.vmx), not the one listed in the vCenter DB’s (that we are trying to fix), but the one actually associated with the Datastore… after much searching…  it seems it is saved in the File Systems “SuperBlock“, in most other File Systems there’s some command to edit the UUID if you really need to. However, for VMFS all I could find was re-signaturing for cloned volumes

So it would seem if I simply would have saved the first 4MB of the logical disk, or partition, not 100% sure which at this time, but I could have in theory done a DD to replace it and recovered the original UUID and then connect the host back to vCenter.

I guess I’ll try a reboot here see what happens….

Well look at that.. it worked…

Summary

  1. Try a reboot
  2. If reboot does not Fix it call VMware Support.
  3. If you don’t have support, You can try to much the with backend DB (do so at your own risk).

 

Azure AD and the ADConnect

Here’s what I did, I found this MS doc for reference:

  1. I followed this to guide me to make the “primary” tenant.
    no, I did not check either checkbox, **** em!
  2. I read this content to understand the tenant hierarchy.
  3. I added a custom domain (zewwy.ca), it said, sure no problem no federation issues, just verify. (Create a TXT record on the registrar to verify you own domain.)
    *refresh the page and the status will update accordingly.
  4.  I proceeded to download the Azure AD Connect msi file via the provided link after adding the custom domain.
  5. Install: (This was on Server 2016 Core)

2015.. interesting…

Click Accept Next.

Enter the Credentials from Step 1 (or enter the credentials provided by your MSP/CSP/VAR.

Enter the credentials of the local domain, enterprise admin account.

If you wish to do a hybrid Exchange setup check the second checkbox, Not sure how to configure this later but I’m sure there is a way. At this time that was not part of this post’s goals.

There was one snippet I missed, it appears to install a SQL express on the DC.

Then it appears to install a dedicated service.

This is Ground Control to Major Tom…

This is Major Tom to Ground Control… You’ve really made the grade!

They got all my passwords!

wait … it worked…. like what? No Errors?… No Service account creations? It actually just worked?…

Goto azure portal login, use my on prem credentials… and it logged me in….

I’m kind of mind blown right now. Well Guess on the next post can cover possibly playing with M365 services. Stay tuned. 😀

How to vMotion a VM without vCenter

Well here I am… again…

In short, you figure… “Ummm just vMotion the VM in vCenter” and for the most part I would agree, however what do you do if you need to move a VM, for example vCenter, and it just so happens to be on an ESXi host that is not within a cluster with other similar ESXi hosts, or in a cluster without EVC? (In most cases rare, sure) However I happened to be just in that situation recently.

First thing I thought I’d just copy the files via ESXi console, using the CP command, and it for the most part it seemed to work for one smaller VM. However when I went to do it against vCenter. It seemed to be going longer then I had expected. After nearly an hour… I decided to see what was going on… but since I was just using CP command how do I find out the processes time?

Yes, by running stat on target file and local file, and get a file size,

i.e stat -c “%s” /bin/ls”

Oh neat, so when I went to check the source was 28.5 Gigs…. and the target was 94 Gigs… wait wait what??? I can only assume something messed up with the copying cause the files were thin provisioned… not sure stopped the process and deleted the files…

Now I began to Google search and I wasn’t searching properly and found useless results such as this: Moving virtual machines with Storage vMotion (1005544) (vmware.com) then I got my act together and found exactly what I was looking for from here: How to move VMware ESXi VM to new datastore using vmkfstools | Alessandro Arrichiello (alezzandro.com)

So basically I liked this, and it was what I needed, I was just slightly annoyed that 1) there wasn’t a nice way to do multiple VMDKs via his examples, just for all the other files, so I took the one liner from the other files trick, and found out how to get the path I need from the files in question.

Low and behold here’s how to do the magic!

1) This assumes a shared datastore between hosts (if you need to move files between hosts without a shared datastore, follow this guide from VMware arena.) (I’m not sure but I think you can leave the VM’s registered, but they have to be powered down, and that there are no snapshots.)

2) Ensure you make the directory you wish to move the VM files to.

mkdir /vmfs/volumes/DatastoreTarget/VMData

3) Copy/Clone VMDK files to target.

find "/vmfs/volumes/DatastoreSource/VMData" -maxdepth 1 -type f | grep ".vmdk" | grep -v "flat" | while read file; do vmkfstools -i $file -d thin /vmfs/volumes/DatastoreTarget/VMData/${file##*/}; done

4) Copy remain files to target.

find "/vmfs/volumes/DatastoreSource/VMData/" -maxdepth 1 -type f | grep -v ".vmdk" | while read file; do cp "$file" "/vmfs/volumes/DatastoreTarget/VMData/"; done

Once done cloning and copying all necessary files, add the VM from the new datastore back to inventory.

In the vSphere client go to: Configuration->Storage->Data Browser, right click the destination datastore which you moved your VM to and click “Browse datastore”.

Browse to your VM and right click the .vmx file, then click “Add to inventory”

Boot up the VM to see if it works, when asked whether you copied or moved it, just answer that you moved it. In this case it all depends on if you want the VM DI to stay the same as it is known within vCenter. As long as you properly delete the old files and removed it from the host inventory, this will complete the VM migration. If you don’t plan on deleting the old VM, or do not care about VM IDs or backups, then select “I copied it”.

Hope this helps someone.

Failed to create VMFS datastore. Cannot change host configuration.

Quick one here. Create a new logical disk via RAID5, after an old logical unit failed from only a single bad disk.

No issues deleting the old logical disk, and creating a new one via HP storage controller commands.

However was greeted with this nice error.

From here, by Cookies04: ”

I had the same problem and in order to fix it I had to run three commands through an SSH connection. From what I have seen and found this error comes from having disks that were part of different arrays and contain some data on them. When I ran the commands I was then able to connect the data stores with no issues.

1. Show connected disks.

ls -lha /vmfs/devices/disks/

(Verify the disk is seen. You will probably see your disk ID then :1. This is a partition on the disk. We only need to work about the main disk ID.)

2. Show the error on disk.

partedUtil getptbl /vmfs/devices/disks/(disk ID)

(It will probably indicate that the GPT is located beyond the end of the disk.)

3. Wipe disk and rewrite with a basic MSDOS partion.

partedUtil setptbl /vmfs/devices/disks/(disk ID) msdos

(The output from this should be similar to msdos and the next line will be o o o o)

I hope this helps you out.”

Looks like it worked… Thanks Cookie04!

ESXi /tmp is Full

I’ll keep this post short and to the point. Gott errors in the alerts.

I was like huh, interesting… go to validate it on the host by logging in via SSH then typing the command:

vdf -h

At the bottom you can see /tmp space usage:

I then found out about this cool command from this thread:

find /tmp/ -exec ls -larth '{}' \;

This will list all the files and their sizes to gander at, when I noticed a really large file:

I decided to look up this file and found this lovely VMware KB:

The Workaround:

echo > /tmp/ams-bbUsg.txt

The solution:

To fix the issue, upgrade to VMware AMS to version 11.4.5 (included in the HPE Offline Bundle for ESXi version 3.4.5), available at the following URLs:

HPE Offline Bundle for ESXi 6.7 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-a38161c3e8674777a8c664e05a

HPE Offline Bundle for ESXi 6.5 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-7d214544a7e5457e9bb48e49af

HPE Offline Bundle for ESXi 6.0 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-98c6268c29b3435e8d285bcfcc

Procedure

  1. Power off any virtual machines that are running on the host and place the host into maintenance mode.
  2. Transfer the offline bundle onto the ESXi host local path, or extract it onto an online depot.
  3. Install the bundle on the ESXi host.
    1. Install remotely from client, with offline bundle contents on a online depot:
      esxcli -s <server> -u root -p mypassword software vib install -d <depotURL/bundle-index.xml>
    2. Install remotely from client, with offline bundle on ESXi host:
      esxcli -s <server> -u root -p mypassword software vib install -d <ESXi local path><bundle.zip>
    3. Install from ESXi host, with offline bundle on ESXi host:
      esxcli software vib install -d <ESXi local path><bundle.zip>
  4. After the bundle is installed, reboot the ESXi host for the updates to take effect.
  5. (Optional) Verify that the vibs on the bundle are installed on your ESXi host.
    esxcli software vib list
  6. (Optional) Remove individual vibs. <vib name> can be identified by listing the vibs as shown in #5.
    esxcli software vib remove -n <vib name>

    Summary

    Use the commands shown to trace the source of the usage, your case may not be as easy. Once found hopefully find a solution. In my case I got super lucky and other people already found the problem and solution.

Veeam – More Than One Replica Candidate Found

Story Time!

The Problem!

So real quick one here. I edited a Replication job and changed it source form production to a backup dataset within the Veeam Replication Job settings. I went to run the replication job and was presented with an error I have no seen before…

I had an idea of what happened (I believe the original ESXi host might have been rebuilt) I’m not 100% sure, but just speculating. I was pretty sure the change I made on the job was not the source of the problem.

Since I wasn’t concerned about the target VM being re-created entirely I decided to go to Veeam’s Replica’s, and right clicked the target VM, and picked Delete from Disk… to my amazement the same error was presented…

Alright… kind of sucks, but here’s how I resolved it.

The Solution

Sadly I had to right click the Target VM under Veeams Replicas, and instead picked “Remove from Configuration”. What’s really annoying about this is it will remove the source VM from the replication job itself as well.

Why? Unno Veeams coding choices...

So after successfully removing the target VM from Veeam’s configuration, I manually deleted the target VM on the host ESXi host. Then I had to reconfigure the replication job and point it to the source VM again. Again if your interested in why that’s the case see the link above.

After that the job ran successfully. Hope this helps someone.

Exchange Certificates and SMTP

Exchange and the Certificates

Quick Post here… If you need to change Certificates on a SMTP receiver using TLS.. how do you do it?

You might be inclined to search and find this MS Doc source: Assign certificates to Exchange Server services | Microsoft Docs

What you might notice is how strange the UI is designed, you simple find the certificate, and in it’s settings check off to use SMTP.

Then in the connectors options, you simply check off TLS.

Any sensible person, might soon wonder… if you have multiple certificates, and they can all enable the check box for SMTP, and you can have multiple connectors with the checkbox enabled for TLS…. then… which cert is being used?

If you have any familiarity with IIS you know that you have multiple sites, then you go enable HTTPS per site, you define which cert to use (usually implying the use of SNI).

When I googled this I found someone who was having a similar question when they were receiving a unexpected cert when testing their SMTP connections.

I was also curious how you even check those, and couldn’t find anything native to Windows, just either python, or openSSL binaries required.

Anyway, from the first post seems my question was answered, in short “Magic”…

“The Exchange transport will pick the certificate that “fits” the best, based on the if its a third party certificate, the expiration date and if a subject name on the certificate matches what is set for the FQDN on the connector used.” -AndyDavid

Well that’s nice…. and a bit further down the thread someone mentions you can do it manually, when they source non other than the Exchange Guru himself; Paul Cunninham.

So that’s nice to know.

The Default Self Signed Certificate

You may have noticed a fair amount of chatter in that first thread about the default certificate. You may have even noticed some stern warnings:

“You can’t unless you remove the cert. Do not remove the built-in cert however. ” “Yikes. Ok, as I mentioned, do not delete that certificate.”-AndyDavid

Well the self signed cert looks like is due to expire soon, and I was kind of curious, how do you create a new self signed certificate?

So I followed along, and annoyingly you need an SMB shared path accessible to the Exchange server to accomplish this task. (I get it; for clustered deployments)

Anyway doing this and using the UI to assign the certificates to all the required services. Deleted the old Self Signed Cert, wait a bit, close the ECP, reopen it and….

I managed to find this ms thread with the same issue.

The first main answer was to “wait n hour or more”, yeah I don’t think that’s going to fix it…

KarlIT700 – ”

Our cert is an externally signed cert that is due to expire next year so we wanted to keep using it and not have to generate a new self sign one.

We worked around this by just running the three PS commands below in Exchange PS

Set-AuthConfig -NewCertificateThumbprint <WE JUST USED OUR CURRENT CERT THUMPRINT HERE> -NewCertificateEffectiveDate (Get-Date)
Set-AuthConfig -PublishCertificate
Set-AuthConfig -ClearPreviousCertificate

 

Note: that we did have issues running the first command because our cert had been installed NOT allowing the export of the cert key. once we reinstalled the same cert back into the (local Computer) personal cert store but this time using the option to allow export of the cert key, the commands above worked fine.

We then just needed to restart ISS and everything was golden. :D”

Huh, sure enough this MS KB on the same issue..

The odd part is running the validation cmdlet:

(Get-AuthConfig).CurrentCertificateThumbprint | Get-ExchangeCertificate | Format-List

Did return the certificate I renewed UI the ECP webUI… even then I decided to follow the rest of the steps, just as Karl has mentioned using the thumbprint from the only self signed cert that was there.

Which sure enough worked and everything was working again with the new self signed cert.

Anyway, figured maybe this post might help someone.

vCenter Appliance Failed File Based Backup

Story Time

*UPDATE* VMware has pulled this garbage mess of an update version of vSphere. Why?

1) They PSOD ESXi Hosts...

2) Broke more shit then they fixed...

3) Broke and silently removed protocols for File Based Backups (This post)

As much as the backup failed, I failed along with it,

Task. Backup the vCenter Server using VAMI to create a file based backup.

Now for a ESXi host, you can do this super easy (at least the config so install new and simply load the config)

For a deep and better understanding of backing up and restoring ESXi host’s please read this really amazing blog post by Michael Bose from NAKIVO.

Back up ESXi configuration:

vim-cmd hostsvc/firmware/backup_config

and You will get a simple URL to download the file right to your management machine/computer.

Does vCenter have something like this? (from my research…) No.

You use the vCenter Server Interface to perform a file-based backup of the vCenter Server core configuration, inventory, and historical data of your choice. The backed-up data is streamed over FTP, FTPS, HTTP, HTTPS, SFTP, NFS, or SMB to a remote system. The backup is not stored on the vCenter Server.

Which hasn’t been updated since 2019. Let’s make a couple things here clear:

  1. The HTTP and HTTPS mentioned above are not like the ESXi style mentioned above where it creates a nice backup file locally on the VCSA and presents you with a simple URL to navigate to, to download it. It expects the HTTP/HTTPS to be a file based server to accept file transfers to (like dropbox).
  2. Lots of these “supported” protocols have pretty bad bugs, or simply don’t even work at all. Which well see below.

Doing the Theory

So OK, l log into VAMI, Click the Backup tab on the left hand nav, try to add a open SMB path I have available to use cause, why not, make my life some what easy…

Looking this up I get: VAMI Backup with SMB reports error: “Path not exported by the remote filesystem” (86069) (vmware.com) dated Oct 28,2021. Nice, nice.

Alrighty then, I’ll just spin up a dedicated FTP service on my freeNas box I guess. I learnt a couple things about chroot and local users via FTP, but the short and sweet was I created a local account on the FreeNAS box, created a Dataset under than existing mounted logical volume, and granted that account access to the path. Then enabled local user login for the FTP server, and specified that path as the user’s home path, and enabled chroot on the FTP service, so when this user logs in all they can see is their home path, which to that user appears as root. This (I felt) was a fair bit of security on it, even though its a lab and not needed, just nice…. ANYWAY… Once I had an FTP server ready….

Now I went to Start a File based backup of the vcenter server:

First Error: Service Not Running

In my case I got an error that the PSC Health service was not running, this might just be cause my lack of decent hardware for good performance might have caused some services to not start up in a timely manner. Either way, Navigating to Services in VAMI and started the PSC Health service. Lucky for me there was no further errors on this part.

If you have service errors you will have to check them out and get the required services up and running, which is out the scope of this post.

Second Error: Number of Connections

The next error I got complained about the allowed number of connections to the target.

Which in my case there was an option on the FreeNAS FTP service configurations for this, I adjusted it to “0” or unlimited in hopes to resolve this problem:

restart the service, and try again…

Third Error: Unknown

This is starting to get annoying…

What kind of vague error is that?!

Guy in this thread states the path has to be empty? what?

I tried that, cleared some more space, and it seems to have sorta worked?

Clear the FTP users home path, and try again:

Fourth Problem: Stuck @ 95%

The Job appeared to run but I noticed a couple things:

1) Even though the backup config said the overall size would only be roughly 400MB, the job ran to around 1.8 Gigs.

2)  All I/O appeared to stop and all Resources returned to an idle state, while the job remained stuck processing at 95%.

OK… I found this thread, which suggested to restart the autodeploy service, tried that and it didn’t work, the job remained stuck @ 95%.

I also found this VMware KB,  however,

1) I have a tiny deployment so no chance my DB would be 300Gigs.

2) When I went to check the “buggy python script” the “workaround” seemed to already have been implemented. So the versions of vCenter I was on (7.0u3a) already had this “fix” in place

3) The symptoms still remain to be exactly the same and the python scripts remain in a “sleeping” state.

FFS already….

Try Anyway

Well I saw the files were created, so I decided to try the restore method on the VCSA deployment wizard anyway…

I forgot to take a snippet here, but it basically stated there was a missing metafile.json file. I can only assume that when the backup process was stuck at 95% it never created this required json file…

FUCK….

One Scheduled Run

I noticed that I suppose overnight a scheduled job tried to run and provided yet a different error message:

Well that’s still pretty vague, as far as I know there should be no connectivity issues since file were created all the way up to 1.8 gigs, so I don’t see how it’s network, or permissions related, or even available space in this case, since all files were cleared, up to the already possible and shown to be written 1.8 gigs, which have been deleted to empty the path every time.

Liek seriously, wtf gives here. The fact there’s an entirely new KB with an entire Table of list of shit that apparently is wrong with this file based backup honestly begs the question, Where the FUCK is the QA in software these days? This shit is just fucking ridiculous already…

Check the Logs

*This Log file only gets created the first time you click “configure” under the backup section of VAMI.

Here’s how to access the logs:

Using putty or similar, SSH in as root on the appliance.
Type Shell at the prompt.
Type cd /var/log/vmware/applmgmt.
Type more backup.log or tail backup.log.

[VCDB-WAL-Backup:PID-42812] [VCDB::_backup_wal_files:VCDB.py:797] INFO: VCDB backup WAL start not received yet.

Checking the entry I find this thread. Along with this Reddit Post. Which leads right back to the first shared thread, which states some bitching about the /etc/issues files… and I have a strange feeling, just like the stuck @ 95% issue, I’ll look at the file and it will probably be correct just like the guy who created the Reddit post.

Try Alternative Protocols

When I tried alternative protocols I came across more issues:

NFS – Had the same path issue SMB did “Path not exported by remote system”

SCP – Was apparently silently dropped, much like what this thread mentioned. The amount of silence on that thread speaks volumes to me.

TFTP was also dropped.

You are so Fucked

Soo I wonder if I try to “upgrade” aka downgrade using the UI installer of a supposed version that works (7.0u2b)…

Alright so let me get this straight… I upgraded, and now I can’t make a backup cause the upgraded version is completely broken it terms of its File Basked Backups.

I can’t Roll back the upgrade without having kept the old VCSA, which was removed in my case since all other services was working, vSphere itself.

I can’t “downgrade” and existing one, I can’t make a backup to restore my old ones. OK fine well how about a huge FUCK YOU VMWARE. while I try to come up with some sort of work around for this utter fucking mess.

Infected Mushroom – U R So F**ked [HQ & 1080p] – YouTube

Work around option #1

Build a brand new vCenter, add hosts, and reconfigure.

The main issue here is the fact if you rely on CBT, you will be fucked and all the VM-IDs will have changed, so you will have to:

1) Edit and adjust all back up jobs to point to the new VM, via it’s new VM-IM.

2) Let the delta files be all recalculated (which can be major I/O on storage units depending on many different factors (# of VM, Size of VMs, change of files on VMs, etc)

Not and option I want to explore just yet.

Work Around option #2

Back and restore the config database?

Let’s try.. first backup…

copy python scripts (hope they not all buggy and messed up too..)

Stop required services:

service-control --stop vmware-vpxd
service-control --stop vmware-content-library

change the script permissions

chmod +x backup_lin.py

Run it:

Make a copy of it via WinSCP.

run the restore script… and

well was worth a shot but that failed too….

Lets try PG dump for shits…

I’d really recommend to read this blog post by Florian Grehl on Virden.net for great information around using postgres on vCenter.

Connect to server via SSH (SSH enabled required on vCenter).

“To connect to the database, you have to enable SSH for the vCenter Server, login as root, and launch the bash shell. When first connecting to the appliance, you see the “Appliance Shell”. Just enter “shell” to enter the fully-featured bash shell.

The simplest way to connect to the databases is by using the “postgres” user, which has no password. It is convenient to also use the -d option to directly connect to the VCDB instance.”

# /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB

Cool, this lets us know the postgres DB service is running. The most important take away from Florian’s post is:

“When connecting, make sure that you use the psql binaries located in /opt/vmware/vpostgres/current/bin/ and not just the psql command. The reason is that VMware uses a more recent version than it is provided by the OS. In vSphere 7.0 for example, the OS binaries are at version 10.5 while the Postgres server is running 11.6”

Kool, I could use pg_dumpall but I found it didn’t work (maybe that was wrong version of vcenter being mixed, not sure) either way lets try just the VCDB instance…

interesting, lol, as you see I got an error about version mismatch. I found this thread about it and with the info from Florians post, had an idea, tried it out, and it actually worked. Mind… BLOWN.

rm /usr/bin/

OK let’s take this file and place it on the newly deployed vcenter.

even though restore appeared to have worked the vCenter instance booted and showed to be like new install. Was worth a shot I guess, but did not work.

Work Around Option #3

I’m not sure this is even a fair option, as it only works if you have existing backup of alternative types. In my case I use Veeam and its saved my bacon I don’t know how many times.

Sure enough Veeam saved my bacon again. I ended up restoring a copy of my vCenter before the 7.0u3a, which happened to be on 7.0u2d.

I managed to add a SMB path without it erroring, and unreal, I ran a File Based Backup and it actually succeeded!!

Now I just simply run the deploy wizard, and pick restore to build a new vCenter server from this backup.

Ahhh VMware… dammit you got me again!

alright fine… grabs yet another copy of vCenter…

and this time…

are you fucking kidding me? Mhmmm interesting… VCSA 7.0 restore issue – VMware Technology Network VMTN

ok… good to know…

From this… to this….

then Deploy again…

It stated it failed, due to user auth. However I was able to login and verify it worked, but sadly it also instantly expired the license as well. I was hoping I could get another 60 days without creating a new center, reconfiguring and breaking my VM-IDs and CBT delta points for my backup software.

Even this link states what I’m trying to do is not possible… ugh the struggles are real!

In the end just started from scratch, Ugh,

When to VMware Snapshot

OK real quick short post here. I figured I’d take a snapshot of my vCenter server (reason will be next blog post).  In this case I decided to snapshot the VM with memory saving, I figured it would be faster than bringing the VM back up from a shutdown state as that’s what a normal snapshot would do.

In most cases that would probably be a fair assumption, but boy was I wrong.

It Took a short time save the snapshot but almost 15 min or more to bring the VM back to full operational status with all memory back in tact… just check out these charts:

Here you can see it took maybe 5 min to save the memory state to disk, this would have been a time of 0 minutes since a normal snapshot doesn’t save memory to disk. Then you can see the slower longer recovery time it took to get the same memory from disk and put it back into memory.

Of course taking a solid guess that disk I/O is much slower than Memory I/O the bottle would have to be non other than the actual disk….

Yup there’s the same matching results of fast disk writes, and slow disk reads…

and there’s the disk being 100% bust on the read requests. I’m not sure why the read performance on this drive was as bad as it was, but I have a feeling a regular boot would have been faster… I’ll update this post if I do an actual test.

Meh, pretty much same amount of time… I think I need some super fast local storage… yet I’m so cheap I never do…. cheap bastard…