vCenter 503 Service Unavailable

I was going to test an auditing script from a DefCon presenter on my AD server, but while I was adding the USB controller, the USB stick I was passing through to get the script into my VM was being weird.

First, the USB 3.0 controller connected just fine and I attached the USB device to the VM, but diskpart was not showing it. So I went to remove it and try a USB 2.0 controller. That failed to connect since the USB 3.0 controller was still showing there, so I selected to remove it again, which errored out on another concurrent task. Makes sense, until refreshing the page told me unprivileged account. I wasn’t sure what this was about, so I decided to open another window and navigate to my vCenter web app… 503 service unavailable:

“503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x000055aec30ef1d0] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)”

What the… Rebooting the VCSA made no difference: still the same error, even in an incognito window. Ughh.

I found this thread: https://communities.vmware.com/thread/588755

I was going through this and decided to try renewing the certs, even though my internal PKI certs were still valid (AFAIK, and from checking the cert presented when accessing the page). Now here’s the thing: while I ran the certificate-manager script and renewed all the certs, I noticed my AD server was somehow down. I booted it back up, so I’m not exactly sure which of the two fixed it. So I decided to take another snapshot while it was in this “fixed state” and revert to the broken state. After restoring to the broken state, nothing was responding at all on the HTTPS service from the VCSA, so I gave it a simple reboot (which I had also done initially, before I noticed my AD server was down for some reason). Sure enough, after the reboot everything was working fine with my internal PKI certs.

I guess if you set vCenter to use MS AD as the primary login domain and that domain is not available, the web management service becomes unavailable… that kind of sucks. I should have noticed my AD was not operational, but I didn’t have monitoring on it 😉 and I don’t use my local workstation as an AD member. It’s mostly just random VMs I have for testing.

Like most people, I should have looked at the logs for a better idea of what the root cause was. I threw 2 darts at a dart board and had to revert to find the true root cause. Not the best way to troubleshoot, but sometimes, if logs are not available, it is another method…
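For next time, here is a minimal sketch of a log-first triage. The service-control command and the vpxd log path are what I would expect on a 6.x appliance, but the host name vcsa.lab.local and the helper function names are placeholders of mine, not something from the thread above.

```shell
#!/bin/sh
# Sketch only: VCSA_HOST and the function names are hypothetical placeholders.
VCSA_HOST="${VCSA_HOST:-vcsa.lab.local}"

# Map an HTTP status code to a short hint about where to look next.
classify_status() {
    case "$1" in
        200|301|302) echo "ok" ;;
        503)         echo "backend-down" ;;  # front end answers, service behind it does not
        000)         echo "unreachable" ;;   # curl got no HTTP response at all
        *)           echo "other" ;;
    esac
}

# Ask the web front end for its status code (-k skips certificate validation).
probe_vcsa() {
    curl -k -s -o /dev/null --max-time 10 -w '%{http_code}' "https://${VCSA_HOST}/ui/"
}

# Run on the VCSA itself: list service states, then show the tail of the vpxd log.
triage_on_vcsa() {
    service-control --status --all
    tail -n 50 /var/log/vmware/vpxd/vpxd.log
}
```

From a workstation, classify_status "$(probe_vcsa)" gives a quick first answer; a result of backend-down is the cue to SSH in and run triage_on_vcsa before touching any certificates.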

Installing PowerCLI 12.0 Offline

PowerCLI 12.0

Offline Install

Checking VMware’s source wasn’t too insightful…

Just this, with the “Download” button redirecting to an alternative site, none other than powershellgallery.com… Clicking manual Download gives you the raw nuget package. Let’s try to install normally first.

Install-Module -Name VMware.PowerCLI

No surprise, it failed as expected, and it even states a warning about the network.

Alright, so using an online computer, copy the nuget package to the offline one (use USB sticks, floppy drives, Zip drives, a serial modem if that’s what it takes…)

In my case I was testing this on a VM and simply used a USB stick to mount it to the VM from the VMRC console, and copied the nuget file to c:\temp\PowerCLI

This, from the MS Docs page on the cmdlet, is for Visual Studio; we are using PowerShell only…

This topic describes the command within the Package Manager Console in Visual Studio on Windows. For the generic PowerShell Install-Package command, see the PowerShell PackageManagement reference.

Sure enough, this is where I gave up on this path. All the new stuff is nice, and everything being connected makes life super easy, but in locked-down situations this is a hassle. I wasn’t sure how to install the nuget package via a simple ID option like the one Install-Package has in the VS Package Manager Console; there wasn’t one for the regular PS Install-Package cmdlet. Then I went to Google how to accomplish this and was a bit annoyed at all the steps required to do it via the package manager… Read this by William on Stack Overflow for more details.

Lucky for me, I found an alternative blog post describing a different offline install that is much, much simpler.

From the online system, instead of saving the nuget package, we save the module files themselves directly.

 Save-Module -Name VMware.PowerCLI -Path C:\temp\PSModules

Copy the entire contents of the PSModules folder to a storage medium of your choice (e.g. USB flash drive) and transfer the files to the desired offline system where PowerCLI is needed.

If you have admin rights on the target system, you can copy files to the location below.

 C:\Program Files\WindowsPowerShell\Modules

At this point he goes on about some settings and stuff. I wasn’t exactly sure how to use PowerCLI, as it used to open in its own custom PS window. Now you simply Import-Module *modulename*:

Import-Module VMware.PowerCLI

Now creating custom ESXi images should be a breeze!

Extra Bits

Customer Experience Improvement Program (CEIP)

The VMware Customer Experience Improvement Program collects data about the use of VMware products. You can either agree (true) or disagree (false). For offline systems, only the rejection (false) makes sense. The command shown below suppresses future notifications within PowerCLI.

Set-PowerCLIConfiguration -Scope AllUsers -ParticipateInCeip $false

Ignore invalid SSL certificates

When using self-signed certificates in vCenter, PowerCLI will deny the connection. This behavior can be suppressed with the command:

Set-PowerCLIConfiguration -Scope AllUsers -InvalidCertificateAction Warn

I found the accepted values in this old 5.1 documentation; you can also set it to Ignore instead of Warn. 🙂 Cheers!

VMware ESXi boot and the Config

Sadly this post will be really short as, again, there’s lots going on. I was recovering a host that failed after a regular reboot because of a superblock corruption on its main OS drive. Also, the BELK series will be done, I just need a bit more time. Sorry for the delays.

“Failed to load /sb.v00” [Inconsistent Data]

Since this drive was separate from the host’s main datastore, all the VMs were unaffected.

Now, booting Linux showed the drive data was still accessible, but I also had a feeling this USB drive was on its way out. I created a copy using dd. Sadly, I didn’t do it the smart way and place it on a drive big enough to save it as an image file; instead I copied it directly to another drive of the same size.

I tried to install the same image of ESXi on top of the current one in hopes it would fix the boot partition files along the way. This only got the host past /sb.v00, and it then failed at a random point further along with “Fatal Error: 6 [Buffer Too Small]”.

I was pretty tired at this point since the server boot times are rather long and attempts were becoming tedious. I did another dd operation of the drive, to the same target drive (still not having learned my lesson), and when I awoke, to my dismay, it had failed with an I/O error after transferring only 5 gigs. This really made me sure the drive was on the way out, but it was still mountable (the boot partitions 5, 6 and 8).
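For the record, the “smart way” I had in mind looks roughly like the sketch below: image the failing stick to a file on a bigger disk and keep going past read errors. The function names and example paths are made up for illustration, and a dedicated rescue tool may well do a better job on a dying drive.

```shell
#!/bin/sh
# Sketch only: image_drive, same_checksum and the example paths are hypothetical.

# Copy a failing drive to an image FILE, continuing past read errors.
# conv=noerror keeps going after an I/O error; conv=sync pads short reads
# with zeros so later data stays at the right offset in the image.
image_drive() {
    src="$1"    # e.g. /dev/sdb (the dying USB stick)
    img="$2"    # e.g. /mnt/big/esxi-usb.img on a larger disk
    dd if="$src" of="$img" bs=64k conv=noerror,sync
}

# Compare checksums of source and image (only meaningful after an error-free copy).
same_checksum() {
    a=$(cksum < "$1")
    b=$(cksum < "$2")
    [ "$a" = "$b" ]
}
```

One caveat on the design: because of conv=sync, the image is padded up to a multiple of the 64k block size, so on a device whose size is not a multiple of 64k the checksums will differ even after a clean copy.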

At this point you might be wondering, why doesn’t he just re-install and reload a backup config? Which is a fair question; however, one was not on hand, but surely it must be somewhere on the drive. I know how to create and restore one on a working host, but on one that can’t boot? Then I found this gem.

Now, throughout my attempts I did discover the boot partitions to be 5 and 6, and I even copied them from a new install to the copy I made above. It did boot, but with a stock config. I was stumped till I read the section from the above blog post on “How to recover config from a system that doesn’t boot”. Step 7 was what nailed it on the head for me:

“mount /dev/sda5 /mnt/sda5

7. In the /mnt/sda5 directory, you can find the state.tgz file that contains ESXi configuration. This directory (in which state.tgz is stored) is called /bootbank/ when an ESXi host is booted.”

I was just like … wat? That’s it. Grabbed the bad main drive mounted on a linux system, saw the state.tgz file and made a copy of it, connected the new drive that had a base ESXi config, replaced the state.tgz file with the one I copied, booted it and there was the host in full working state with all network configs and registered VMs and everything.
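Roughly, what I did by hand looks like the sketch below. The function name, mount points and the extra backup directory are my own placeholders; the only things taken from the write-up are partition 5 and the state.tgz file name.

```shell
#!/bin/sh
# Sketch only: transplant_state and all paths are hypothetical placeholders.
# Assumed setup on a Linux box, e.g.:
#   mount -o ro /dev/sdb5 /mnt/old5    # partition 5 of the failing drive
#   mount /dev/sdc5 /mnt/new5          # partition 5 of the fresh install

transplant_state() {
    old_mnt="$1"   # where the old boot partition is mounted
    new_mnt="$2"   # where the fresh install's boot partition is mounted
    bak_dir="$3"   # somewhere off both drives to keep the stock file
    [ -f "$old_mnt/state.tgz" ] || { echo "no state.tgz in $old_mnt" >&2; return 1; }
    # Refuse a file that is not a readable gzip'd tar archive.
    tar -tzf "$old_mnt/state.tgz" >/dev/null 2>&1 || {
        echo "state.tgz is unreadable" >&2; return 1; }
    # Keep the stock config from the fresh install as a fallback.
    if [ -f "$new_mnt/state.tgz" ]; then
        cp "$new_mnt/state.tgz" "$bak_dir/state.tgz.stock" || return 1
    fi
    cp "$old_mnt/state.tgz" "$new_mnt/state.tgz"
}
```

The archive check is there because the source drive was throwing I/O errors; copying a half-read state.tgz over a working stock config would just produce a second host that does not boot.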

Not sure why the config is stored in the boot partition, but there you go. Huge shout-out to Michael Bose for his write-up; I suggest you check it out. I have saved it in case it disappears from the internet, so I can re-publish it. For now just visit the link. 🙂

Using A USB Device as a Datastore on VMware ESXi

USB datastore

Attach USB device in Windows -> DiskPart -> Select Disk -> Clean

To see the USB device on the host, stop the USB arbitrator service, save the change to the config, and reboot.

Now the hard part: normally, even after following the guide to see the USB stick on the host, you end up mounting a 4 gig volume, and I had a 16 gig cruiser to use.

Source

Perform a lspci -v to get all the USB UHCI and EHCI controllers to show up.
This shows up for example as:

DEVICE=/dev/disks/t10.JMicron_USB_to_ATA2FATAPI_Bridge
partedUtil mklabel ${DEVICE} msdos
END_SECTOR=$(eval expr $(partedUtil getptbl ${DEVICE} | tail -1 | awk '{print $1 " \\* " $2 " \\* " $3}') - 1)
/sbin/partedUtil "setptbl" "${DEVICE}" "gpt" "1 2048 ${END_SECTOR} AA31E02A400F11DB9590000C2911D1B8 0"
/sbin/vmkfstools -C vmfs5 -b 1m -S $(hostname -s)-local-datastore ${DEVICE}:1

That easy 😉
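The only cryptic bit is the END_SECTOR line. As I understand it, the last line of the partedUtil getptbl output on a freshly labelled disk is the geometry (cylinders, heads, sectors per track, total sectors), and the script multiplies the first three and subtracts one. Here is the same arithmetic as a small sketch; the helper name and the sample geometry are made up.

```shell
#!/bin/sh
# Sketch only: end_sector is a hypothetical helper, the sample numbers are invented.
# Input:  the geometry line, "<cylinders> <heads> <sectors-per-track> <total-sectors>"
# Output: cylinders * heads * sectors - 1, used as the partition's last sector.
end_sector() {
    set -- $1                     # split the line into $1 $2 $3 $4
    echo $(( $1 * $2 * $3 - 1 ))
}

end_sector "1947 255 63 31293440"   # prints 31278554
```

On the host itself the input would come from partedUtil getptbl ${DEVICE} | tail -1, exactly as in the script above.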

Requesting, Signing, and Applying internal PKI certificates on VCSA 6.7

The Story

Everyone loves a good story. Well, today it begins with something I’ve wanted to do for a while but hadn’t got around to. I remember adjusting the certificates on vCenter 5.5, and it caused a lot of grief. Now, it may have been my ignorance, or it may have been due to poor documentation and guides, who knows. With VMware now going full Linux (Photon OS) for the vCenter deployments (much more lightweight), it’s still nice to see a green icon in your web browser when you navigate the nice new HTML5-based management interface. Funny that the guide I followed still had a “not secure” notification in the browser, even after applying their own certificate.

This might be because he didn’t install his Root CA certs into the trusted CA store on the machine he was browsing the web interface from. However, I’m still going to thank RAJESH RADHAKRISHNAN for his post on VMArena. It helped. I will cover some alternatives, however.

Not often I do this but I’m lazy and don’t feel like paraphrasing…

VCSA Certificate Overview

Before starting the procedure just a quick intro for managing vSphere Certificates, vSphere Certificates can manage in two different modes

VMCA Default Certificates

VMCA provides all the certificates for vCenter Server and ESXi hosts on the Virtual Infrastructure and it can manage the certificate lifecycle for vCenter Server and ESXi hosts. Using VMCA default the certificates is the simplest method and less overhead.

VMCA Default Certificates with External SSL Certificates (Hybrid Mode)
This method will replace the Platform Services Controller and vCenter Server Appliance SSL certificates, and allow VMCA to manage certificates for solution users and ESXi hosts. Also for high-security conscious deployments, you can replace the ESXi host SSL certificates as well. This method is Simple, VMCA manages the internal certificates and by using the method, you get the benefit of using your corporate-approved SSL certificates and these certificates trusted by your browsers.

Here we are discussing about the Hybrid mode, this the VMware’s recommended deployment model for certificates as it procures a good level of security. In this model only the Machine SSL certificate signed by the CA and replaced on the vCenter server and the solution user and ESXi host certificates are distributed by the VMCA.

I guess before I did the whole thing, whereas today I’m just going to be changing the cert that handles the web interface, which is all I really care about in this case.

Requirements

  • Working PKI based on Active directory Certificate Server.
  • Certificate Server should have a valid Template for vSphere environment
    Note : He uses a custom template he creates. I simply use the Web Server template built in to ADCS.
  • vCenter Server Appliance with root Access

Requesting the Certificate

Now requesting the certificate requires shell access, I recommend to enable SSH for ease of copying data to and from the VCSA as well as commands.

To do this log into the physical Console of the VCSA, in my case it’s a VM so I opened up the console from the VCSA web interface. Press F2 to login.

Enable both SSH and BASH Shell

OK, now we can SSH into the host to make life easier (I used putty):

Run

 /usr/lib/vmware-vmca/bin/certificate-manager

and select the operation option 1

Specify the following options:

  • Output directory path: path where will be generated the private key and the request
  • Country : your country in two letters
  • Name : The FQDN of your vCSA
  • Organization : an organization name
  • OrgUnit : type the name of your unit
  • State : your state or province name
  • Locality : your city
  • IPAddress : provide the vCSA IP address
  • Email : provide your E-mail address
  • Hostname : the FQDN of your vCSA
  • VMCA Name: the FQDN where is located your VMCA. Usually the vCSA FQDN

Once the private key and the request is generated select Option 2 to exit

Next we have to export the Request and key from the location.

There are several options on how to complete this. Option 1 is how our source did it…

Option 1 (WinSCP)

Using WinSCP for this operation.

To perform the export we need additional permission on the VCSA (root’s login shell has to be bash for WinSCP to connect). Type the following command:

chsh -s /bin/bash root

Once connected to vCSA from winscp tool navigate the path you have mentioned on the request and download the vmca_issued_csr.csr file.

Option 2 (cat)

Simply cat the CSR file and use the mouse to highlight the contents. Then paste it into the ADCS request textbox field.

Signing The Request

Now simply navigate to your signing certificate authority’s web interface. Usually you hope that the PKI admin has secured this with TLS and is not just using HTTP like our source, and instead uses HTTPS://FQDN/certsrv or just HTTPS://hostname/certsrv.

Now we want to request a certificate, via an advanced certificate request…

Now simply, submit and from the next page select the Base 64 encoded option and Download the Certificate and Certificate Chain.

Note :- You have to export the Chain certificate to .cer extension , by default it will be PKCS#7

Open Chain file by right click or double click navigate the certificate -> right click -> All Tasks -> export and save it as filename.cer

Now that we have our signed certificate and chain, let’s get to importing them back into the VCSA.

Importing the Certificates

Again there are two options here:

Option 1 (WinSCP)

Using WinSCP for this operation.

To perform the upload we need the same additional permission on the VCSA as before. If you have not already done so, type the following command:

chsh -s /bin/bash root

Once connected to vCSA from winscp tool navigate the path you have mentioned on the request and upload the certnew.cer file. Along with any chain CA certs.

Option 2 (cat)

Simply open the CER file in notepad, and use the mouse to highlight the contents. Then paste it into any file on the VCSA over the putty session.

E.G

vim /tmp/certnew.cer

Press I for insert mode. Right click to paste. ESC to change modes, :wq to save.

Run

 /usr/lib/vmware-vmca/bin/certificate-manager

and select the operation option 1

Enter administrator credentials and enter option number 2

Add the exported certificate and generated key path from previous steps and Press Y to confirm the change

Custom certificate for machine SSL: Path to the chain of certificate (srv.cer here)
Valid custom key for machine SSL: Path to the .key file generated earlier.
Signing certificate of the machine SSL certificate: Path to the certificate of the Root CA (root.cer , generated base64 encoded certificate).

Piss what did I miss…

That doesn’t mean shit to me.. “PC Load letter, wtf does that mean!?”

Googling, the answer was rather clear! Thanks Digicert!

Since I have an intermediate CA, supplying either the intermediate or the offline root certificate on its own would fail. I needed them both in one file, so I opened each .cer and pasted them into one file, “signedca.cer”.
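If you would rather not paste by hand, the same thing can be sketched in a few lines of shell. The function and file names are placeholders, and the order (intermediate first, then root) is what I used rather than something I have seen documented for certificate-manager.

```shell
#!/bin/sh
# Sketch only: build_chain and the file names are hypothetical placeholders.
# Concatenate Base64 (PEM) CA certificates into one file, in the order given.
build_chain() {
    out="$1"; shift
    : > "$out"
    for cert in "$@"; do
        # Reject DER/PKCS#7 exports: only Base64-encoded certs can be pasted together.
        grep -q -- '-----BEGIN CERTIFICATE-----' "$cert" || {
            echo "$cert is not Base64/PEM encoded" >&2; return 1; }
        # tr drops Windows carriage returns; awk 1 guarantees a trailing newline,
        # so one cert's END line never runs into the next cert's BEGIN line.
        tr -d '\r' < "$cert" | awk 1 >> "$out"
    done
}

# Example (files exported from ADCS):
#   build_chain signedca.cer intermediate.cer root.cer
```

A quick grep -c 'BEGIN CERTIFICATE' signedca.cer afterwards should report one line per CA in the chain.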

Now this did take a while, mostly around 70% and 85% but then it did complete!

Checking out the web interface…

Look at that green lock, and even the IP is listed in the SAN.. mhm does that mean…

Awwww yeah!!! Even navigating to the VCSA by IP, it’s still secure! Woop!

Conclusion

Changing the certificate in vCenter 6.7 is much more flexible and easier using the hybrid approach, and I say thumbs up. 😀 Thanks VMware.

Ohhh yea! Make sure you update your inventory hosts in your backup software with the new certificate, else you may get errors attempting backup and restore operations, as I did with Veeam. It was super easy to fix: just re-validate the host under the inventory area by going through the wizard for host configuration.

Rename a vSwitch in vSphere

I noticed I had named some vSwitches in the new host builds I had. This was nice. However, I also noticed I couldn’t name a vSwitch when creating it in vCenter. So how did I name them?

I quickly searched google, but the primary results were not what I was expecting….

1, 2, 3, 4

All of which either stated to edit the host config file or use CLI commands… well, I know I didn’t do the first thing, and I don’t remember using the CLI. Also, I don’t remember having to reboot the host. The only difference I can think of is that I named them at creation, not after the fact, but the vCenter wizard has no option for that… but sure enough, I checked my documentation.

If you log in to a host directly, you can name a vSwitch right when creating it. This just has to be done on each host in the cluster. It’s nice, but is it worth it?

Once you have it set up, it is really nice to have named vSwitches.
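If clicking through every host gets old, the naming at creation can also be sketched over SSH with esxcli. This is not what I did above (I used the host UI); the host names and switch name are placeholders, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: host names and the switch name are placeholders.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Create a named standard vSwitch on each host (SSH must be enabled on the hosts).
add_named_vswitch() {
    name="$1"; shift
    for host in "$@"; do
        run ssh "root@$host" esxcli network vswitch standard add --vswitch-name="$name"
    done
}

# Example:
#   add_named_vswitch vSwitch-Test esx1.lab.local esx2.lab.local
```

Names containing spaces would need extra quoting to survive the SSH hop, so the sketch assumes a name without them.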

Of course this doesn’t include dvSwitches, as those you can name, and they usually require uplinks to communicate between hosts. However, you can still deploy a test dvSwitch to multiple hosts without an uplink, though those VMs would only be able to talk to each other on a single host… which defeats the purpose of it, but you can move the VMs as a whole group between hosts, and if that “test switch” needs any change it would be distributed to all hosts.

ESXi 6.5 Stuck on Initializing Scheduler
PSOD PCPU1 could not start

I’m making this post short to note this odd experience with this host build.

The first weird thing was that when trying to install ESXi I couldn’t get past vmkusb. Not sure what it was about, but I only found this decent reddit post with the same problem.

In short, he noticed it would only get past this if a second USB device was plugged into the USB2 ports, and sure enough that worked for me too. Strange….

Then a couple of days later, while doing some more test boots, I got a Purple Screen of Death complaining about PCPU1 not starting or some shit. Ugh, again, let’s see what others have to say… Well, I found this VMware thread on it, where he basically stated that resetting the BIOS settings worked. After farting around with some BIOS settings, I had other failed boots, and my CPU and system were rather hot. I let everything cool down, added more powerful fans, and tried again after resetting the BIOS to factory, and much like the post, it worked.

After finalizing the build a little more, I switched to an old 2.5″ HDD… same problem, but I noticed it got stuck on “Initializing scheduler” before PSODing. While I searched for what might be up with that, I found this.

It didn’t help for shit: same problem on the next boot. Instead of just resetting the BIOS, I played around with a couple more settings: I enabled Intel AES-NI, which helps with offloading AES computations in the CPU, and another one which I sadly forgot, and then my next boot was fine. Saving this at this point in case it comes back again.

VMware Host Oddities

I’ll keep this post short as I find it a rather interesting turn of events…

I was ensuring config files were being backed up, as good sysadmin posture would suggest. When I went to run the configuration backup on one of the hosts, I was hit with a “general system error”. I wish I had saved the actual error message, but when I went to research it, I mostly got people complaining about the error when trying to vMotion (in this case I grabbed the most specific line from the callback in hopes of getting something). Most results simply said to restart the ESXi host’s management agents. I decided to see if I could vMotion or if the host showed any other symptoms, but everything showed green in vCenter, and vMotions worked without an issue.

I left it for the night (the night before) and decided to update the host, since it was running 6.5-u2 and needed to be updated to 6.5-u3. I was hoping it would resolve the config backup issue (I had an older copy on hand but wanted the most up to date). So I decided to do the update via an ISO image and a host reboot. Moved VMs, no problem; placed host into Maintenance mode, no issues; sent host for shutdown (I wasn’t using iLO or iDRAC, so I needed to manually mount the USB hosting the installation media).

While I was at the console waiting for the host to show “shutting down”, it simply… didn’t. After 10 minutes, which is far more than generous, I decided to press F2 to get into the console, only to have it tell me… uggghhh, I wish I had taken a picture of the error, something along the lines of you can’t use the console because you’ve been locked out. It was weird. After waiting another 10 minutes with nothing happening (I’m assuming there may have been some actual underlying issue with the datastore holding the scratch?), I decided to just do a hard shutdown (hold power for 5 seconds). I could rebuild this server faster if need be.

Powering it back up was perfectly fine through all POST checks. I booted the ESXi 6.5u3 installer, which booted fine and checked for logical drives, finding all of them perfectly fine, including the SD card holding the OS (scratch had been changed to a persistent location, another datastore on the local host backed by multiple disks). Anyway, I selected it, the installer saw there was an existing installation, and I selected to update it. Successful, reboot. It reboots fine, but the host doesn’t come back to vCenter…..

I logged in to the host directly and noticed it stated it had 18 VMs, all with a status of error. Now, I had moved them, and they were even still running on another host (thankfully it didn’t do anything stupid like try to hijack them). At this point I called VMware support and placed an SR. Once I had a tech on the line, I discussed my symptoms and issues and asked what the best course of action was, as I didn’t want to rebuild the host (I could, but time). In this case I simply un-registered all the VMs that were in error (as they were not associated with the host), then attempted to re-connect the host. It first failed, complaining about a bad username and password (I’m not sure if this was the root account used to add it, or the VPX user it uses to manage the host), but it then prompted the wizard like adding a new host. All other settings were still fine on the host (network and vSwitches w/ VMPGs and VMKs, and all datastores), and after this wizard the host was re-added back into the cluster with the latest updates. Running the backup config command:

esxcli storage vmfs extent list

worked without error. So yay! all fixed, but one thing still bothered me… what was the root cause…
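A note for future me: as far as I know, esxcli storage vmfs extent list only lists VMFS extents, and the usual ESXi-shell route for a config backup is the pair of vim-cmd calls sketched below. The wrapper function is my own placeholder, and it only prints the commands unless DRY_RUN is set to 0 on the host.

```shell
#!/bin/sh
# Sketch only: backup_host_config is a hypothetical wrapper, meant for the ESXi shell.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

backup_host_config() {
    # Flush the running configuration to the boot bank first...
    run vim-cmd hostsvc/firmware/sync_config
    # ...then build the backup bundle; the command prints a download URL for it.
    run vim-cmd hostsvc/firmware/backup_config
}
```

Syncing first matters because the bundle is built from what is saved on disk, not from whatever has changed in memory since the last automatic save.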

I decided to check my scratch settings real quick, and sure enough they hold good: scratch is on a redundant datastore on the local ESXi host. So I’m not sure what the root cause was, but it was fixed. I’m posting this for my own reference, just in case this issue re-arises.

 

vCenter SSO

The other day I covered installing vCenter.

Today I’ll do a very quick overview on setting up SSO with a Windows based AD Auth.

DNS

Step 1) validate vCenter can reach any AD via the Root domain name:
*Use the AD server for DNS. Third-party DNS leads to failure as it is missing specialized records, e.g. SRV records.*
*Ensure time is synced to within 5 minutes of the AD server.*

I ssh’d into the VCSA using root and then, “shell” and a regular old ping command to validate.
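In hindsight a ping was not enough; both pre-reqs can be checked up front. Below is a minimal sketch: the domain lab.local and the helper names are placeholders, the SRV name is the standard record AD clients use to locate a domain controller, and I am assuming nslookup is available in the VCSA shell.

```shell
#!/bin/sh
# Sketch only: DOMAIN and the function names are hypothetical placeholders.
DOMAIN="${DOMAIN:-lab.local}"

# AD clients locate domain controllers through this SRV record; a DNS server
# holding only an A record for the DC will not answer it.
srv_name() {
    echo "_ldap._tcp.dc._msdcs.$1"
}

# Look the record up (assumes nslookup exists where this runs).
check_srv() {
    nslookup -type=SRV "$(srv_name "$DOMAIN")"
}

# Compare two epoch timestamps; succeed when they differ by at most
# the allowed skew (default 300 seconds, i.e. the 5 minutes above).
skew_ok() {
    diff=$(( $1 - $2 ))
    [ "$diff" -lt 0 ] && diff=$(( 0 - $diff ))
    [ "$diff" -le "${3:-300}" ]
}
```

For example, skew_ok "$(date +%s)" "$dc_epoch" fails as soon as the two clocks are more than five minutes apart, where dc_epoch is however you obtain the DC’s time as an epoch value.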

Step 2) Follow Virten’s Guide for doing the Flash way, or CLI way to join vCenter to the Windows Domain. Via the HTML5 Web Client: Menu -> Administration -> SSO -> Configuration -> Active Directory Domain -> Click Join AD (hidden behind the menu in the snippet)

Enter the domain to join, and an account that is allowed to join systems to the domain; in my case I used my Domain Admin account:

Populate the fields and click Join, and sure enough you will join the domain without issue… if you have a properly working NTP/AD architecture, that is…

Thanks VMware… Ugghh ok, and if I use the CLI maybe some more verbose error?

What do you mean, “DC not found”? What kind of PCLoadLetter error is this? I just verified lookup via DNS, which is like the primary pre-req besides firewalls, and I have already configured my actual firewalls… so what gives? Googling this error leads me to this.

and I quote “On ESXi 6.5, the command is executed from /usr/lib/likewise/bin. If you haven’t enabled the AD firewall rule mentioned earlier, you must temporarily unload the ESXi firewall – assuming it is enabled – for this to work. Failing this, you will get an Error: NERR_DCNotFound [code 0x00000995] error.”

Are you ****in’ with me…. for reals… man wtf VMware….

Shit, right, this is the VCSA, not an ESXi host… ugggh, quick research…

What… da… How did I not know about this?! There’s a special VCSA management page. Everything online just uses the “Web Client”, which all of VMware’s documentation assumes to be the Flash client, and which doesn’t even reference this at all!

https://vcsa:5480

Alrighty then… logging in… mhmm

That’s awesome but I don’t see firewall, maybe if I navigate to networking…

Nope, NICs settings and that’s about it:

C’mon those firewall settings have to be here, I don’t want to have to be forced to use flash…. cmon…..

F*** it says it’s for 6.7 I’m clearly on 6.5 there has to be a way…

After some deeper digging (I found out the VCSA uses Python scripts that read specific files to build the firewall), then talking this problem over with someone on the IRC channel #vmware, and digging a bit further, I found this VMware post….

At first I was simply using a third-party DNS that had JUST an A host record for the AD server, not any of the other service records for LDAP or anything else. After changing my DNS settings on the VCSA to point to the AD server itself, I got a different error at the CLI:

Bahhh what? oh wait… lol all my time is wrong, everywhere…

NTP – Fixing Time

Actual time 8:20 PM Winnipeg Central Time. Mon Oct 7, 2019

AD server time: 2:09 PM Mon Oct 7, 2019 (CST)

VCSA time: Tue Oct 8 01:15:08 UTC 2019

What a gong show… let’s fix this! First, MS says to leave the PDC on system time so that it gets its time from the host, since the host gets accurate time. Well, not for me. I could point the host to an external source and wait for the PDC time to change automatically. But if you want to domain join the hosts, they should follow the hierarchy and use the PDC for time: catch-22. So instead the PDC points to an external source, and the hosts point to the PDC for time and DNS (this makes it easy to change the external time provider and avoids time sync issues).

So fixing PDC time:

before:

after

Now time has changed and my firewall shows the successful packets, but why is my offset still so off? And why is my time an hour off?

Here’s my local workstation:

Yet here’s my PDC:

OK, everything I checked online said I did it right, but the syntax in one of the guides I was following didn’t seem right, so I tried again and this time it worked, finally!

K, now I can update each host in my lab….

Before:

Configure:

After:

Finally VCSA itself, https://vcsa:5480 (login as root) -> Time

Before:

Configure:

After:

Yay, after fixing my time everywhere:

Joining VCSA to Windows Domain via CLI

/opt/likewise/bin/domainjoin-cli join $domain $user '$password'

YAY!

Quick Re-Cap:

So bad news is this isn’t as short a blog as I wanted, but good news is we are all learning something! Yay!

Now that we got our system domain joined (reboot required)

waiting… waiting….

Verifying AD object on AD server (core, via PowerShell)

and on the HTML 5 Web Client:

Adding Identity Source

Now I can finally follow adding the Identity source A) AD Auth from here.

Click on Identity Sources -> Add Identity Source:

omg finally something that was dead simple…

Defining Permissions

Now click on global Permissions.

Click “+” icon, and if system join is all good it should be able to query the AD and find the users when typed into the Name field:

Let’s test it….

Second attempt but pushing to children objects:

and yay this time I was able to get in successfully:

But I had to put in my UPN (user@domain.local). What if I just want to enter my user name…

What a bunch of poop. That’s because we didn’t set the primary SSO domain… back in the VCSA settings, https://vcsa:5480 – summary shows

back on vCenter Web Client, Menu -> Administration -> SSO -> Configure -> Identity Sources -> select new source -> click Set as Default:

login again:

Success! And finally, as the source virten post stated, the “Use Windows Authentication” option is greyed out unless the Enhanced Authentication Plugin is installed. You can find the download link at the bottom of the login screen.

Summary

That was a bit more painful than I wanted it to be, but it really was nice that it was this painful, because it reminded me of the moving parts that have to be set up correctly for this all to play nicely to begin with.

I hope this guide has helped someone. Please leave a comment, any comment will do!!!

 

Using VMware Update Manager (VUM)

VUM

Overview

In this post I’m going to try to upgrade one of my ESXi 5.5 hosts to 6.5 using the VUM that is now built into the VCSA by default. I followed this video on YouTube for reference.

The first thing I noticed, which the video doesn’t mention, is that (at least for 6.5) the VUM tab is only available using the Flash-based client. If you use the HTML5-based web client, the Update tab isn’t shown:

HTML 5 Missing update Tab:

Flash based client with Update Tab:

As soon as I see a video, I can tell which version is being used as its sooo different. (Flash sucks)

Import Image

About a minute into the source video he gets to where the images for major host upgrades are. Since I had no existing images like the video did, I decided to try the import wizard, which popped up a useful Windows selection dialog box (as I was testing this from the latest version of Windows 10, 1903, with the latest Chrome and its built-in Flash after enabling it). On top of that, I uploaded the image from a UNC path (\\ip\some\folder\file.iso): after verifying access via Windows Explorer, I simply pasted the UNC path into the dialog box’s address bar and selected the ISO image, and it worked.

Only once an image is uploaded and selected does the option to create a baseline show itself.

Baseline

Let the beat drop! *Bass beat drops* What the point of this is I can’t exactly tell yet, it seems to be a one to one mapping between a name and the ISO image being used?

Paraphrasing the video guide: “Now that we completed this useless step, we navigate to the cluster needing to be upgraded”, in my case a single 5.5 test host, by clicking on “Go to compliance view”.

Attach Baseline

It would seem the Update tab is available at the vCenter level, datacenter level, cluster level or host level, depending on the scope to which you wish to deploy a “baseline”. Once within the scope you choose (I’m at host), click on “attach baseline” after ensuring you are still on the Update tab.

Much like in the source video, after attaching the baseline the compliance level was shown as unknown. Let’s follow along and “scan for updates”.

Now I’m assuming that because I am at the host level, with only one host, I don’t see the tabs with compliant and others. In this case it does change to “non-compliant” because, as the speaker states, “The hosts listed as non-compliant do not match the version of ESXi associated with the attached baseline”. AKA these hosts need to be upgraded.

Remediate

Click it to begin the upgrade process for the host/cluster, which will begin a wizard! Ohhh mighty wizard, guide me to the light at the end of the tunnel!

When I clicked next, Flash gave me an error prompt telling me my session had expired and kicked me back to the login page, even though I was still pretty actively working on it (snippets don’t take that long). Stupid Flash. I logged back in and went back to the wizard:

Agree to the EULA.

Schedule it or do it now by not checking a scheduled time.

Pick your additional options and remediation options. For my test I picked to suspend my VMs, as they can’t be vMotioned live: there is no EVC-based cluster of the hosts, and they are all standalone at the time of this writing. So let’s try that.

After clicking finish I didn’t see much of anything happening in the vCenter tasks, so I logged into the host being upgraded and saw it was suspending the VMs in question:

Now it has to copy all the memory from these VMs to disk so this could take a bit of time… then I’d assume I’ll be disconnected from the host once it reboots.

Monitoring my pings for these servers, the pings dropped once the VMs entered the suspend state (makes sense) but the host is still responding (makes sense).
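If you would rather have the up/down moments logged than stare at a ping window, here is a minimal sketch of the idea. It assumes a Linux-style ping (the -c and -W flags) on the machine doing the watching; the function names are made up for illustration and are not from the video or from VMware.

```shell
# probe HOST: succeed if the host answers a single ping.
# Assumes Linux iputils ping flags; adjust for other platforms.
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# state_of HOST: print "up" or "down".
state_of() {
    if probe "$1"; then echo up; else echo down; fi
}

# watch_host HOST [INTERVAL] [COUNT]
# Print a timestamped line each time the host changes state.
# COUNT=0 (the default) means keep checking until interrupted.
watch_host() {
    host=$1
    interval=${2:-5}
    count=${3:-0}
    last=""
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        now=$(state_of "$host")
        if [ "$now" != "$last" ]; then
            echo "$(date '+%H:%M:%S') $host $now"
            last=$now
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Running something like watch_host with a VM’s address in one terminal and the host’s address in another would show the VMs going down at suspend time while the host stays up until its reboot.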

I decided at this point to go get some food (I’m lazy and don’t cook), so by the time I had returned I was rather shocked to see the host had successfully been updated, showed compliant, and my systems were right back to operational…

Summary

Besides the flash rubbish, this was overall a rather good experience. :O

I think I may upgrade more hosts this way in the future. I didn’t even have to step into my basement at all. That was great!

Until…

I was going to update my second host at home and was hit with this…

Well wtf… then it hit me in the face… oh yeah… I forgot about that. This is a nice real-world example of third-party, unsupported drivers. When I checked my own blog I luckily found the reference to the driver and where I got it, and it appears it still works for 6.x, so I can only guess I’ll have to remove the VIB, run the VUM update procedure, then manually re-install the third-party driver… let’s try this!
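A quick pre-check could flag this kind of VIB before VUM complains. This is a minimal sketch, assuming the usual output of esxcli software vib list (VIB name in the first column, with an acceptance level such as CommunitySupported somewhere on the line) and that awk is available wherever you run it. The function name is made up for illustration, and whether a given unofficial driver is actually packaged as CommunitySupported is an assumption you should check on your own host.

```shell
# list_community_vibs: read `esxcli software vib list` output on stdin
# and print the name (first column) of every VIB whose line carries
# the CommunitySupported acceptance level.
list_community_vibs() {
    awk '/CommunitySupported/ { print $1 }'
}

# Example use on the host (not run here):
#   esxcli software vib list | list_community_vibs
```

Anything this prints is a candidate for the remove, upgrade, re-install dance described below.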

Remove VIB

Following this as a reference, I did the same thing:

esxcli software vib list

esxcli software vib remove --vibname DLink-528T

Ughhhh…

I remember that script/VIB. He was generally a really cool guy and I really loved his blog posts, but his VIB has been rather garbage… as others have mentioned.

Errors

As the picture shows, a reboot is required now… I let VUM do its thing with the standard ESXi 6.5u2 baseline I was using, and after the server rebooted I got a problem:

“There was a problem with the Network Device specified on the command line. Error: No NIC found with MAC address.”

Discussion :

The NIC to be used as the management NIC has no drivers installed for it.

Ohhh crap. I forgot that when I installed ESXi on this desktop I had to make a custom image, and that is a requirement for systems with custom builds. I removed the drivers for the one NIC, which was not the one for ESXi mgmt, but the built-in NIC on the mainboard is Realtek and… yeah… Anyway, I’ll make a post on creating a custom image. After a good while of failing to get what I needed (as it was my hypervisor with my internet-providing VM), I managed to find my initial build, re-install it manually, re-register the VMs, re-create the vSwitches and bring everything back up.

In this case I could have used my Veeam server; however, none of my other hypervisors have multiple NICs, so using them was not an option. My lab is def not a redundant setup.
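The “No NIC found” surprise above is the sort of thing worth checking before remediating a host built from a custom image. This is a minimal sketch, assuming the usual layout of esxcli network nic list, where the first three columns are Name, PCI Device and Driver, and that awk is available. The function name is made up for illustration.

```shell
# nic_drivers: read `esxcli network nic list` output on stdin and
# print "NIC DRIVER" pairs, skipping the two header lines.
# Relies on the first three columns being Name, PCI Device, Driver.
nic_drivers() {
    awk 'NR > 2 && NF >= 3 { print $1, $3 }'
}

# Example use on the host (not run here):
#   esxcli network nic list | nic_drivers
```

If the driver shown for the management uplink is not one that ships in the stock image behind your baseline, a standard upgrade image will likely leave that NIC without a driver, and a custom image is the safer route.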