vCenter Appliance Failed File Based Backup

Story Time

*UPDATE* VMware has pulled this garbage mess of an update version of vSphere. Why?

1) They PSOD ESXi Hosts...

2) Broke more shit then they fixed...

3) Broke and silently removed protocols for File Based Backups (This post)

As much as the backup failed, I failed along with it,

Task. Backup the vCenter Server using VAMI to create a file based backup.

Now for a ESXi host, you can do this super easy (at least the config so install new and simply load the config)

For a deep and better understanding of backing up and restoring ESXi host’s please read this really amazing blog post by Michael Bose from NAKIVO.

Back up ESXi configuration:

vim-cmd hostsvc/firmware/backup_config

and You will get a simple URL to download the file right to your management machine/computer.

Does vCenter have something like this? (from my research…) No.

You use the vCenter Server Interface to perform a file-based backup of the vCenter Server core configuration, inventory, and historical data of your choice. The backed-up data is streamed over FTP, FTPS, HTTP, HTTPS, SFTP, NFS, or SMB to a remote system. The backup is not stored on the vCenter Server.

Which hasn’t been updated since 2019. Let’s make a couple things here clear:

  1. The HTTP and HTTPS mentioned above are not like the ESXi style mentioned above where it creates a nice backup file locally on the VCSA and presents you with a simple URL to navigate to, to download it. It expects the HTTP/HTTPS to be a file based server to accept file transfers to (like dropbox).
  2. Lots of these “supported” protocols have pretty bad bugs, or simply don’t even work at all. Which well see below.

Doing the Theory

So OK, l log into VAMI, Click the Backup tab on the left hand nav, try to add a open SMB path I have available to use cause, why not, make my life some what easy…

Looking this up I get: VAMI Backup with SMB reports error: “Path not exported by the remote filesystem” (86069) (vmware.com) dated Oct 28,2021. Nice, nice.

Alrighty then, I’ll just spin up a dedicated FTP service on my freeNas box I guess. I learnt a couple things about chroot and local users via FTP, but the short and sweet was I created a local account on the FreeNAS box, created a Dataset under than existing mounted logical volume, and granted that account access to the path. Then enabled local user login for the FTP server, and specified that path as the user’s home path, and enabled chroot on the FTP service, so when this user logs in all they can see is their home path, which to that user appears as root. This (I felt) was a fair bit of security on it, even though its a lab and not needed, just nice…. ANYWAY… Once I had an FTP server ready….

Now I went to Start a File based backup of the vcenter server:

First Error: Service Not Running

In my case I got an error that the PSC Health service was not running, this might just be cause my lack of decent hardware for good performance might have caused some services to not start up in a timely manner. Either way, Navigating to Services in VAMI and started the PSC Health service. Lucky for me there was no further errors on this part.

If you have service errors you will have to check them out and get the required services up and running, which is out the scope of this post.

Second Error: Number of Connections

The next error I got complained about the allowed number of connections to the target.

Which in my case there was an option on the FreeNAS FTP service configurations for this, I adjusted it to “0” or unlimited in hopes to resolve this problem:

restart the service, and try again…

Third Error: Unknown

This is starting to get annoying…

What kind of vague error is that?!

Guy in this thread states the path has to be empty? what?

I tried that, cleared some more space, and it seems to have sorta worked?

Clear the FTP users home path, and try again:

Fourth Problem: Stuck @ 95%

The Job appeared to run but I noticed a couple things:

1) Even though the backup config said the overall size would only be roughly 400MB, the job ran to around 1.8 Gigs.

2)  All I/O appeared to stop and all Resources returned to an idle state, while the job remained stuck processing at 95%.

OK… I found this thread, which suggested to restart the autodeploy service, tried that and it didn’t work, the job remained stuck @ 95%.

I also found this VMware KB,  however,

1) I have a tiny deployment so no chance my DB would be 300Gigs.

2) When I went to check the “buggy python script” the “workaround” seemed to already have been implemented. So the versions of vCenter I was on (7.0u3a) already had this “fix” in place

3) The symptoms still remain to be exactly the same and the python scripts remain in a “sleeping” state.

FFS already….

Try Anyway

Well I saw the files were created, so I decided to try the restore method on the VCSA deployment wizard anyway…

I forgot to take a snippet here, but it basically stated there was a missing metafile.json file. I can only assume that when the backup process was stuck at 95% it never created this required json file…

FUCK….

One Scheduled Run

I noticed that I suppose overnight a scheduled job tried to run and provided yet a different error message:

Well that’s still pretty vague, as far as I know there should be no connectivity issues since file were created all the way up to 1.8 gigs, so I don’t see how it’s network, or permissions related, or even available space in this case, since all files were cleared, up to the already possible and shown to be written 1.8 gigs, which have been deleted to empty the path every time.

Liek seriously, wtf gives here. The fact there’s an entirely new KB with an entire Table of list of shit that apparently is wrong with this file based backup honestly begs the question, Where the FUCK is the QA in software these days? This shit is just fucking ridiculous already…

Check the Logs

*This Log file only gets created the first time you click “configure” under the backup section of VAMI.

Here’s how to access the logs:

Using putty or similar, SSH in as root on the appliance.
Type Shell at the prompt.
Type cd /var/log/vmware/applmgmt.
Type more backup.log or tail backup.log.

[VCDB-WAL-Backup:PID-42812] [VCDB::_backup_wal_files:VCDB.py:797] INFO: VCDB backup WAL start not received yet.

Checking the entry I find this thread. Along with this Reddit Post. Which leads right back to the first shared thread, which states some bitching about the /etc/issues files… and I have a strange feeling, just like the stuck @ 95% issue, I’ll look at the file and it will probably be correct just like the guy who created the Reddit post.

Try Alternative Protocols

When I tried alternative protocols I came across more issues:

NFS – Had the same path issue SMB did “Path not exported by remote system”

SCP – Was apparently silently dropped, much like what this thread mentioned. The amount of silence on that thread speaks volumes to me.

TFTP was also dropped.

You are so Fucked

Soo I wonder if I try to “upgrade” aka downgrade using the UI installer of a supposed version that works (7.0u2b)…

Alright so let me get this straight… I upgraded, and now I can’t make a backup cause the upgraded version is completely broken it terms of its File Basked Backups.

I can’t Roll back the upgrade without having kept the old VCSA, which was removed in my case since all other services was working, vSphere itself.

I can’t “downgrade” and existing one, I can’t make a backup to restore my old ones. OK fine well how about a huge FUCK YOU VMWARE. while I try to come up with some sort of work around for this utter fucking mess.

Infected Mushroom – U R So F**ked [HQ & 1080p] – YouTube

Work around option #1

Build a brand new vCenter, add hosts, and reconfigure.

The main issue here is the fact if you rely on CBT, you will be fucked and all the VM-IDs will have changed, so you will have to:

1) Edit and adjust all back up jobs to point to the new VM, via it’s new VM-IM.

2) Let the delta files be all recalculated (which can be major I/O on storage units depending on many different factors (# of VM, Size of VMs, change of files on VMs, etc)

Not and option I want to explore just yet.

Work Around option #2

Back and restore the config database?

Let’s try.. first backup…

copy python scripts (hope they not all buggy and messed up too..)

Stop required services:

service-control --stop vmware-vpxd
service-control --stop vmware-content-library

change the script permissions

chmod +x backup_lin.py

Run it:

Make a copy of it via WinSCP.

run the restore script… and

well was worth a shot but that failed too….

Lets try PG dump for shits…

I’d really recommend to read this blog post by Florian Grehl on Virden.net for great information around using postgres on vCenter.

Connect to server via SSH (SSH enabled required on vCenter).

“To connect to the database, you have to enable SSH for the vCenter Server, login as root, and launch the bash shell. When first connecting to the appliance, you see the “Appliance Shell”. Just enter “shell” to enter the fully-featured bash shell.

The simplest way to connect to the databases is by using the “postgres” user, which has no password. It is convenient to also use the -d option to directly connect to the VCDB instance.”

# /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB

Cool, this lets us know the postgres DB service is running. The most important take away from Florian’s post is:

“When connecting, make sure that you use the psql binaries located in /opt/vmware/vpostgres/current/bin/ and not just the psql command. The reason is that VMware uses a more recent version than it is provided by the OS. In vSphere 7.0 for example, the OS binaries are at version 10.5 while the Postgres server is running 11.6”

Kool, I could use pg_dumpall but I found it didn’t work (maybe that was wrong version of vcenter being mixed, not sure) either way lets try just the VCDB instance…

interesting, lol, as you see I got an error about version mismatch. I found this thread about it and with the info from Florians post, had an idea, tried it out, and it actually worked. Mind… BLOWN.

rm /usr/bin/

OK let’s take this file and place it on the newly deployed vcenter.

even though restore appeared to have worked the vCenter instance booted and showed to be like new install. Was worth a shot I guess, but did not work.

Work Around Option #3

I’m not sure this is even a fair option, as it only works if you have existing backup of alternative types. In my case I use Veeam and its saved my bacon I don’t know how many times.

Sure enough Veeam saved my bacon again. I ended up restoring a copy of my vCenter before the 7.0u3a, which happened to be on 7.0u2d.

I managed to add a SMB path without it erroring, and unreal, I ran a File Based Backup and it actually succeeded!!

Now I just simply run the deploy wizard, and pick restore to build a new vCenter server from this backup.

Ahhh VMware… dammit you got me again!

alright fine… grabs yet another copy of vCenter…

and this time…

are you fucking kidding me? Mhmmm interesting… VCSA 7.0 restore issue – VMware Technology Network VMTN

ok… good to know…

From this… to this….

then Deploy again…

It stated it failed, due to user auth. However I was able to login and verify it worked, but sadly it also instantly expired the license as well. I was hoping I could get another 60 days without creating a new center, reconfiguring and breaking my VM-IDs and CBT delta points for my backup software.

Even this link states what I’m trying to do is not possible… ugh the struggles are real!

In the end just started from scratch, Ugh,

Changing vCenter Hostname

Changing vCenter Hostname

Why?!?! Cause I gotta!

Source: Changing your vCenter Server’s FQDN – VMware vSphere Blog

PreReqs, AKA Checklist

  • Backup all vCenter Servers that are in the SSO Domain before changing the FQDN of the vCenter Server(s)
  • Supports Enhanced Linked Mode (ELM)
  • Changing the FQDN is only supported for embedded vCenter Server nodes
  • Products which are registered with vCenter Server will first need to be unregistered prior to an FQDN change. Once the FQDN change is complete they can then be reregistered.
  • vCenter HA (VCHA) should be destroyed prior to an FQDN change and reconfigured after changes
  • All custom certificates will need to be regenerated
  • Hybrid Linked Mode with Cloud vCenter Server must be recreated
  • vCenter Server that has been renamed will need to be rejoined back to Active Directory
  • Make sure that the new FQDN/Hostname is resolvable to the provided IP address (DNS A records)

NOTE: If the vCenter Server was deployed using the IP as PNID/FQDN, then the following should also be considered:

  • The PNID change workflow cannot be used to change the IP address of vCenter Server
  • The PNID change workflow cannot be used to change the FQND of vCenter Server

In this scenario, use the vCenter Server Appliance Management Interface (VAMI) to update hostnames or IP changes directly. 

The main thing I was expecting was the certificate issue. In my home lab, I removed SSO domain before this change (just using vpshere.local), no ELM, already using embedded (all-in-one), no VCHA, no Hybird, oh yeah…. not sure if you “leave an SSO domain”, before joining back to AD…

My Only Pre-Req

I went into DNS and pre-created A host records for the new server hostname: vCenter.zewwy.ca

Steps

Basically log into VAMI, and change the name.

Then

and and…. well WTF…

No matter what I do it’s greyed out… I thought maybe the untrusted cert, might be an issue so tried from a machine with full trusted chain, and same issue!

Like…. Why… why is Next greyed out? It’s like whatever Button Validation code is written for it is not being triggered, is this a browser version issue? I can’t find anything online with anyone having this issue…. Why? Cause I was right, it was the input validation…

Honestly, this is one of those MASSIVE facepalm moments in my life. I only realized after the fact the username field was NOT auto filled, it was only a label that was greyed and provided as a suggestion… Fill both fields and the next is ungreyed…

Step 4, check the checkbox to acknowledge the warning, and away… she goes!

At which point I clicked redirect now (both web addresses were still available as it didn’t seem to matter which you came from, the cert was untrusted either way, cause the CA not in my trusted ca store)

5 minutes later….

I tell ya nothing more annoying than a spinning circle and the warning “don’t refresh” when the status bar simply does not move… sure got some conflicting messages here….

*Starts to sweat*…

after about 10 minutes time…

More Certificate Fun!

Alright so after this, quick take always… when I went to check the site it was “untrusted” but not for the reason I had thought, I thought it would have been from the same issue as the source blog, and be the hostname on the cert but that was not the case, instead it was imply the the cert chain seemed to be missing, and the issuer could not be verified:

as well as:

So what to do about this… You can download the CA cert from vcenter/certs/download.zip (some reason I had to use IE). Then install the CA cert. (I noticed even after I did this I still had cert warning, error, but after the next day, maybe cache clearing or update, it reported green in the web browser).

Now when I logged in, I got the ol Cert Alert in the vCenter UI

first thing to try is removing old CA’s

Which I did, following this VMware KB

I simply followed my other post about this, and just cleared reset to green on the alert. (Still good days later).

Backup Solutions

Don’t forget to change the server in your backup software, such as I had to do this in Veeam.

These were my results…

Which go figure errored out…

So right click, go to properties of the object… Next, next…

Accept the certs new certificate

Now you figure all is well, but when I went to create a new backup job, when I attempted to expand the vcenter server in Veeam. It just hung there…

I ended up rebooting the server, and then waiting for all the Veeam services to be started. I reopened Veeam, and when to Inventory, clicked the vCenter server, took a second and then showed all the hosts, and the VMs. I clicked it and rescanned to be safe and got this result which was a bit different then the applied settings confirmation above. I think maybe I forgot to rescan the host after applying the new settings, assuming it would have done that as part of the properties change wizard.

which lucky for me now worked, and I was able to select a VM in the Veeam backup wizard, and it successfully backed up the VM.

Final Caveats

like what the heck, everywhere else its changed except at the shell. Let’s see if we change change this.

Well that was easy enough, no reboot required. 🙂

I also found the local hosts file doesn’t update either, in the file it states it managed by VAMI, so many have to look there for potential solutions:

I noticed this since I had to do a work around for something else, and sure enough caught it. I’ll change it manually with vi for now and see what changes after a reboot.

Summary

Overall, literally quick n easy.

  1. Verify DNS records exist.
  2. Use VAMI to edit hostname via editing the Network MGMT settings and change the hostname, click apply and wait.
  3. Manually clear out the old Certs that were created under the old hostname.
  4. Reconfigure you backup solution, which is vender specific (I provided step for Veeam as that is the Backup Vender I like to use)

Overall the task seemed to go pretty smooth. I’ll follow up with any other issue I might come across in the future. Cheers.

 

 

How to remove a Datastore from a vSphere Cluster

How to Remove a Datastore

Intro

Hey everyone,

I figured I’d write up a quick little help guide on removing a Datastore. Now this isn’t new and likely to be buried on the internet because of it. However in my searches I have found the following sources to be great reads. I highly recommend you check them out.

1)  Official Source VMware KB2004605.

2) A Blog guide by Sam McGeown, here.

3) A post by Mike on cswitchzero.

Now let’s go through the checklist from the official source one by one.

Check List

  • If the LUN is being used as a VMFS datastore, all objects (for example, virtual machines, templates, and Snapshots) stored on the VMFS datastore must be unregistered or moved to another datastore.-This one is pretty easy navigate to the datastore files and check. You may find some remanence from the following though.
  • All CD/DVD images located on the VMFS datastore must also be unmounted/unregistered from the virtual machines.-This shouldn’t even be the case if you did check one.
  • The datastore is not used for vSphere HA heartbeat.-This setting will use a folder labeled “.vSphere-HA”
    For a Quick overview of Datastore Heart beating See here
    To “remove” aka change them See here
  • The datastore is not part of a datastore cluster.-You can find useless help on this process from VMware here. I’m assuming it’s an easy task via the WebUI
  • The datastore is not managed by Storage DRS.-If you removed it from the datastore cluster, how could this be an issue?
  • The datastore is not configured as a diagnostic coredump Partition/File and Scratch Partition. For more information, see the following:
  • Storage I/O Control is disabled for the datastore.-See here on how to enable (disabling is the exact reverse)
  • No third-party scripts or utilities running on the ESXi host can access the LUN that has issue.-Honestly I’m not sure how you could check this… even when doing some quick research, you can have scripts I guess that are not on the hosts, but run by alternative machines via PowerCLI. As described in this community post. I guess you’d have to know, either way the scripts would just fail, shouldn’t affect the vSphere cluster.
  • If the LUN is being used as an RDM, remove the RDM from the virtual machine. Click Edit Settings, highlight the RDM hard disk, and click Remove. Select Delete from disk if it is not selected and click OK.Note: This destroys the mapping file but not the LUN content.

    – This is more involving the removing of the backend physical device. Which in my case is the final goal. Though if yours was just to remove a datastore while keeping the physical storage in place this can be ignored.

  • As noted by Sam but not the official source or Mike is if you see a .dvsData folder. as stated by SAM “The .vdsData folder is created on any VMFS store that has a Virtual Machine on it that also participates in the VDS – so by migrating your VMs off the datastore you’ll be ensuring the configuration data is elsewhere.”
  • Check that there are no processes locking the VMFS with this command:
esxcli storage core device world list -d

Datastore Removal Steps

Step 1) Follow the Checklist above.

Make sure no files reside on the Datastore.

Step 2) Unmount Datastore from all ESXi hosts.

As noted by SAM blog post even in vSphere 5.x using the C# phat client, this was possible to do via a wizard against all hosts that have the datastore mounted. Even on the newer HTML5 WebUI this is still possible (I think everyone wants to fully forget that VMware chose flash for a short time).

At this point the Datastore will show up as inaccessible to vSphere. As noted by both Mike and Sam. This will be the same anywhere from 5.x-7.x (As noted by Mike it might be slightly more important to follow procedures with earlier versions of ESXi 3 or 4). If the Check list was followed, there should be no issues unmounting the datastore.

If you need to do this via esxcli (Source):

# esxcli storage filesystem list

Unmount the datastore by running the command:

# esxcli storage filesystem unmount [-u UUID | -l label | -p path ]

For example, use one of these commands to unmount the LUN01 datastore:

# esxcli storage filesystem unmount -l LUN01

# esxcli storage filesystem unmount -u 4e414917-a8d75514-6bae-0019b9f1ecf4

# esxcli storage filesystem unmount -p /vmfs/volumes/4e414917-a8d75514-6bae-0019b9f1ecf4

Step 3) Detach the LUN from all hosts.

As noted by Sam, if you are on 5.x you might want to automate this via PowerCLI. Then noted by Mike, newer 7.x can now do this in bulk via the Management WebUI.

6/7 WebUI -> Hosts n Clusters -> Hosts -> Cluster -> Host -> Configure Tab -> Storage Device (left side tree) -> Highlight Device -> Detach

for esxcli

Obtaining the NAA ID of the LUN to be removed

esxcli storage vmfs extent list

To detach the device/LUN, run the command:

# esxcli storage core device set --state=off -d NAA_ID

6. To verify that the device is offline, run the command:

# esxcli storage core device list -d NAA_ID

The output, which shows that the Status of the disk is off.

Step 4) Rescan HBAs

At this point, if you rescan all HBAs on all hosts the inaccessible datastore should be gone from the WebUI.

At this point you can remove the LUN from being seen (disc from showing up under devices) this will either be iSCSI based configurations (remove static and dynamic IPs from the iSCSI initiator settings on each host.) Mostly likely for a shared VMFS datastore.

It could be a local disc over a local storage controller (such as a logical drive created in RAID) such as behind a Pxxx storage controller.

Removing the source device will always be dependent on how it was configured in the first place.

Summary

So today we covered removing a Datastore. The important thing to remember is removing a Datastore takes a lot more steps than removing one, cause so many different VM’s and services can be applied to a datastore once it has started being used.

In many cases, the SysLog and Scratch partition are big hang ups, and should be looked at closely. Which, however, as stated if you are actually checking for files on the datastore this stuff will be pretty evident.

In most cases, ensure you follow the check list and the process should be pretty smooth. Hope this helps someone.

*Note* I often provide screen shots to provide some context, in this case I decided to leave it more generic to span multiple versions of vSphere.

Creating Custom ESXi Image

Follow these steps

  1. Download Offline Bundle of ESXi Image
  2. Download Drivers E.G The Native ESXi USB NIC drivers
  3. Install PowerCLI (Set-ExecutionPolicy Remotesigned; Import-Module PowershellGet; Install-Module -Name VMware.PowerCLI)
  4. In PowerCLI connect the standard SoftwareDepot by typing:

    Add-EsxSoftwareDepot -DepotUrl <Path to zip>

  5. Get the ImageProfile list:

    Get-EsxImageProfile

  6. Clone standard ImageProfile:

    New-EsxImageProfile -CloneProfile ESXi-6.7.0-8169922-standard -Name MyProfile -Vendor <vendor>

  7.  [Only If Required] If your vib file has Acceptance Level – CommunitySupported, we need to set this Acceptance Level for our ImageProfile:

    Set-EsxImageProfile -ImageProfile MyProfile -AcceptanceLevel CommunitySupported

  8. Add our vib to SoftwareDepot:

    Get-EsxSoftwarePackage -PackageUrl <path to vib>

  9. Add our vib to ImageProfile:

    Add-EsxSoftwarePackage -PackageUrl

Error:

Search result.

Answer driver for specfic version (7.1, need 6.5)

So I downloaded the proper driver but I couldn’t figure out how to pick the right software package since the “get” command was actually already loaded the other driver, so it kept trying to add the 7.1 driver. Only thing I could think of was to close the powershell windows and start fresh…

10. Export ImageProfile to ISO image:

Export-EsxImageProfile -ImageProfile MyProfile -ExportToIso -FilePath

That was it! Sadly the laptop I wanted to use this on was still boot looping, and sadly the USB NIC “Insagnia” didn’t seem to work and was getting NFS4 client failed to load, and not network adapters found on the machine. But was worth a shot.

VMware vCenter Updates using VAMI

This is a quick post on the latest security release notification from VMware.

VMSA-2021-0002 (vmware.com)

If for whatever reason an update is not possible you can follow these workarounds.

While you can use VUM to distribute updates and patches to ESXi hosts.

You’ll have to use VAMI for updating vCenter.

You can download the latest patches here (vmware account required).

I did this on my lab vCenter,  took a lil while but not bad.

  1. Made a backup of the VCSA using Veeam
  2. Shutdown Veeam or any other backup solution that might use vCenter
  3. Notified anyone that might use vCenter that it would be inaccessible during update
  4. Attached ISO to VCSA VM (You can do as 4sysops did and upload to a datastore, or you can simply open the VCSA console via VMRC, and attach the ISO from your Downloads folder)
  5. Log into VAMI (https://vcsa:5480)
  6. Click Update on left nav, then Update -> Check CD-ROM
  7. The update should be available as the option, then click Stage and Install
  8. Accept the EULA, use/don’t use CEIP, Check I have a backup, Click Install.

It could take an hour or so, then everything is back to running state, here’s the summary page after completion:

You can read the alternative methods such as using CLI, or how to handle a vCenter HA cluster upgrade using the link above to 4sysops guide on upgrading vCenter.

Sorry this post is not as extensive as usual, just a heads up about the latest VMware patches. Stay Safe out there.

 

ESXi 6.7 on HPE DL380 G7

I had this long blog post I was going to write about HP screwing me on a ESXi upgrade, but in a nut shell you can read these ones about that how shebang:

  1. MonsterMuffin (crude)
  2. Claud “Admin” (Less crude)

As both of them mention you have to do a clean install, and you probably won’t have a config saved from that exact version as you are just updating to it, so your config on the old 5.x or 6.0 won’t work either. If you have dvswitches and all that fun jazz probably not a huge deal but if you have standard vswitches and lots of custom configurations around them including vlan tagging, well this can be crappy.

I did manage through all my trial n errors to get a working copy but it required workarounds I don’t think would have been supported, so meh just follow those…

I tried everything to get a ESXi system upgraded to 6.7 without loosing or reconfiguring the host, you figure just do anew install and reload the config.

However you can only load config for the same version a backup of that one was created. I eventually came across other things other PSOD’s and had to even at one point edit the boot file to remove a HP dedicated driver from loading. After all that meh, just install new with the custom images mentioned in the above Blogs.

I’m really sorry I would have covered these tasks in far more detail but spent a good couple days smashing my head just trying to get it to work. something are just not worth the effort, and blogging every annoying error and steps along the way on this one… is just one of those things.

Cheers.

vCenter 503 Service Unavailable

I was going to test a auditing script from a DefCon presenter on my AD server, when I was adding the USB controller and the USB stick I was passing thorugh to get the script in my VM was being weird.

First USB 3.0 connected just fine, and connected the USB device to the VM, but diskpart was not showing it. So I went to remove it and try a USB 2.0 controller, that failed to connect since the USB 3.0 was still showing there and I selected to remove it again, which it errored another concurrent task. Makes sense, till refreshing the page told me unprivileged account. I wasn’t sure what this was about, so I decided to open another window and navigate to my center web app… 503 service unavailable:

“503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x000055aec30ef1d0] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)”

What the… rebooting the VCSA showed no success still same error even with an incognito window.. ughh.

I found this thread: https://communities.vmware.com/thread/588755

I was going through this, and decided to try to renew the certs, even though my internal PKI certs were still valide (AFAIK, and checking the cert provided when accessing the page). Now here’s the thing, while I ran the certificate-manager script and renewed all the certs, I noticed my AD server somehow was down. I booted it back up. I’m not exactly sure which fixed it. So I decided to take another snapshot while it was in this “fixed state” and revert to the  broken state. After restoring o the broken state nothing was responding at all on the https service from the VCSA, so I gave it a simple reboot (which I did initially before I noticed my AD server was down, for some reason). Sure enough after the reboot everything was working fine with my internal PKI certs.

I guess if you set vCenter to use MS AD as the primary login domain and that domain is not available the web management service becomes unavailable… that kind of sucks. I should have noticed my AD was not operational but I didn’t have monitoring on it 😉 or use my local workstation as a AD member. Mostly just random VMs I have for testing.

Like most people, should have looked at the logs for a better idea of what the root cause was. I threw 2 darts at a dart board and had to revert to find the true root cause. Not the best way to troubleshoot, but sometimes if logs are not available it is another method…

Installing PowerCLI 12.0 Offline

PowerCLI 12.0

Offline Install

Checking VMwares source wasn’t too insightful…

Just this with the “Download” button redirecting to an alternative site non-other than powershellgallery.com …clicking manual Download gives you the raw nuget package let’s try to install first normally.

Install-Module -Name VMware.PowerCLI

No way it failed, expected, and it even states a warning about the network.

Alright so using an online computer copy the nuget package to the offline (use USB sticks, Floppy drives, Zip Drives, serial modem if that’s what it takes…)

In my case I was testing this on a VM and simply used a USB stick to mount it to the VM from the VMRC console, and copied the nuget file to c:\temp\PowerCLI

This from this MS Doc page on the cmdlet, is for Visual Studio, we are using powershell only…

This topic describes the command within the Package Manager Console in Visual Studio on Windows. For the generic PowerShell Install-Package command, see the PowerShell PackageManagement reference.

Sure enough this is where I gave up on this path. All the new stuff is nice with it all being connected makes life super easy, but in those locked down situations this is a hassel. Since I wasn’t sure how to install the nuget package via a simple ID option like Install-Package for VS PS, there wasn’t one for the regular PS Install-Package cmdlet. Then I went to google how to accomplish this and was a bit annoyed at all the steps required to do it via the package manager… Read this by William on Stackoverflow for more details.

Lucky for me I found an alternative blog post, which does an alternative offline install and much, much simpler.

From the online system instead of saving the nuget package we save the modules files themselves directly.

 Save-Module -Name VMware.PowerCLI -Path C:\temp\PSModules

Copy the entire contents of the PSModules folder to a storage medium of your choice (e.g. USB flash drive) and transfer the files to the desired offline system where PowerCLI is needed.

If you have admin rights on the target system, you can copy files to the location below.

 C:\Program Files\WindowsPowerShell\Modules

At this point he goes on about some settings and stuff, I wasn’t exactly sure how to use PowerCLI, as usually it opens up in a custom PS window before. Now you simple import-module *modulename*

Import-Module VMware.PowerCLI

Now creating custom ESXi images should be a breeze!

Extra Bits

Customer Experience Improvement Program (CEIP)

The VMware Customer Experience Improvement Program collects data about the use of VMware products. You can either agree (true) or disagree (false). For offline systems, only the rejection (false) makes sense. The command shown below suppresses future notifications within PowerCLI.

Set-PowerCLIConfiguration -Scope AllUsers -ParticipateInCeip $false

Ignore invalid SSL certificates

When using self-signed certificates in vCenter, PowerCLI will deny the connection. This behavior can be suppressed with the command:

Set-PowerCLIConfiguration -Scope AllUsers -InvalidCertificateAction Warn

Found the types from this old 5.1 documentation you can also set it to ignore instead of warn. 🙂 Cheers!

VMware ESXi boot and the Config

Sadly this post will be really short as again, lots going on. Recovering a host that failed after a regular reboot, which had a superblock corruption on it’s main OS drive. Also, the BELK series will be done, I just need a bit more time. Sorry for the delays.

“Failed to load /sb.v00” [Inconsistent Data]

Since this drive was not on the main datastore on the host all the VMs were unaffected.

Now loading linux showed the drive data was till accessible, but I also had a feeling this USB drive was on it’s way out. I created a copy using DD, *sadly I didn’t do it the smart way and place it on a drive big enough to save it as a image file, but instead directly to another drive of the same size.

I tried to install the same image of ESXi on top of the current one in hopes it would fix the boot partition files along the way. This only made the host get past /sb.v00 and vault randomly past it with “Fatal Error: 6 [Buffer Too Small]”

I was pretty tired at this point since the server boot times are rather long and attempts were becoming tedious. I did another DD operation of the drive, to the same drive (still not having learned my lesson) and when I awoke to my dismay, it failed only transferring 5 gigs with an I/O error. This really made me sure the drive was on the way out, but it was still mountable (the boot partitions 5, 6 and 8)

At this point you might be wondering, why doesn’t he just re-install and reload a backup config? Which is fair question, however one was not on hand, but surely it must be somewhere on the drive. I know how to create and recover on a working host but a one that can’t boot? Then I found this gem.

Now through out my attempts I did discover the boot partitions to be 5 and 6 and I did even copy them from a new install to my copied version I made about and it did boot but was a stock config. I was stumped till I read the section from the above blog post on “How to recover config from a system that doesn’t boot”. Line 7 was what nailed it on the head for me:

“mount /dev/sda5 /mnt/sda5

7. In the /mnt/sda5 directory, you can find the state.tgz file that contains ESXi configuration. This directory (in which state.tgz is stored) is called /bootblank/ when an ESXi host is booted.”

I was just like … wat? That’s it. Grabbed the bad main drive mounted on a linux system, saw the state.tgz file and made a copy of it, connected the new drive that had a base ESXi config, replaced the state.tgz file with the one I copied, booted it and there was the host in full working state with all network configs and registered VMs and everything.

Not sure why the config is stored in the boot partition, but there you go. Huge Shout out to Michael Bose for his write I suggest you check it out. I have saved it case it disappears from the internet and I can re-publish it. For now just visit the link. 🙂

Using A USB Device as a Datastore on VMware ESXi

USB datastore

Attach USB device in Windows -> DiskPart -> Select Disk -> Clean

To see USB device on host, stop arbitrator service, save to config, reboot

Now the hard parts, normally even following guide to see USB stick on host mount 4 gig volume, I had 16 gig cruiser to use.

Source

Perform a lspci -v to get all the USB UHCI and EHCI controllers to show up.
This shows up for example as:

DEVICE=/dev/disks/t10.JMicron_USB_to_ATA2FATAPI_Bridge
partedUtil mklabel ${DEVICE} msdos
END_SECTOR=$(eval expr $(partedUtil getptbl ${DEVICE} | tail -1 | awk '{print $1 " \\* " $2 " \\* " $3}') - 1)
/sbin/partedUtil "setptbl" "${DEVICE}" "gpt" "1 2048 ${END_SECTOR} AA31E02A400F11DB9590000C2911D1B8 0"
/sbin/vmkfstools -C vmfs5 -b 1m -S $(hostname -s)-local-datastore ${DEVICE}:1

That easy 😉