Upgrading Windows 10 2016 LTSB to 2019 LTSC

*Note 1* – This retains the Channel type.
*Note 2* – Requires a new Key.
*Note 3* – You can go from LTSB to SA, keeping files, if you specify a new key.
*Note 4* – LTSC versions.
*Note 5* – Access to ISOs. This is hard, and most places state to use the MS download tool. I, however, managed to get the image and key thanks to having an MSDN (aka Visual Studio) subscription.

I attempted to grab the 2021 Eval copy and ran the setup exe. When it got to the point of choosing to keep existing files (aka upgrading), those options were all greyed out… 🙁

So I said no to that and grabbed the 2019 copy, which, when running the setup exe directly, asks for the key before moving on in the install wizard… and that let me keep existing files (upgrade) 🙂

My enjoyment was short-lived when I was presented with a nice Windows update failed window.

Classic. So the usual, “sfc /scannow”

Classic. So fix it, “dism /online /cleanup-image /restorehealth”

Stop, Disable Update service, then clear cache:
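For reference, the usual sequence looks roughly like this (a sketch, run from an elevated PowerShell prompt; SoftwareDistribution is the standard update cache location):

Stop-Service -Name wuauserv, bits -Force                 # stop Windows Update and BITS
Set-Service -Name wuauserv -StartupType Disabled         # keep it from kicking back in mid-cleanup
Remove-Item C:\Windows\SoftwareDistribution\Download\* -Recurse -Force   # clear the update cache
# when done: Set-Service wuauserv -StartupType Manual; Start-Service wuauserv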

Scan system files again, “sfc /scannow”

Reboot, make sure the system still boots fine (check), do another sfc /scannow, which returns 100% clean. Run Windows Update (after re-enabling the service), which comes back saying 100% up to date. Run the installer….

For… Fuck… Sakes… what logs are there for this dumb shit? Log files created when you upgrade Windows 11/10 to a newer version (thewindowsclub.com)

setuperr.log (same location as setupact.log): data about setup errors during the installation. Review all errors encountered during the installation phase.

Coool… where is this dumb shit?

Log files created when an upgrade fails during installation before the computer restarts for the second time.

  • C:\$Windows.~BT\Sources\panther\setupact.log
  • C:\$Windows.~BT\Sources\panther\miglog.xml
  • C:\Windows\setupapi.log
  • [Windows 10:] C:\Windows\Logs\MoSetup\BlueBox.log

OK checking the log…..

Lucky me, something exists as documented; count my blessings. What's this file got for me?

PC Load Letter? WTF does that mean?! While it’s not listed in this image (it must have been resolved), I had a line that stated “required profile hive does not exist”. I managed to find this MS thread on the same problem, and thankfully someone in the community came back with an answer: create a new local temp account and remove all old profiles and accounts on the system (this might be hard for some; it was not an issue for me). Sadly, I still got Windows 10 install failed.
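For anyone wanting to script that cleanup step, here's a rough sketch in PowerShell (the temp account name is hypothetical, and double-check the profile list before removing anything):

# Create a temporary local admin account to work from
$pass = Read-Host -Prompt 'Temp account password' -AsSecureString
New-LocalUser -Name 'TempUpgrade' -Password $pass
Add-LocalGroupMember -Group 'Administrators' -Member 'TempUpgrade'
# Signed in as that account, list the old (non-special) profiles, then remove them
Get-CimInstance -ClassName Win32_UserProfile |
  Where-Object { -not $_.Special -and $_.LocalPath -notlike '*TempUpgrade*' } |
  Remove-CimInstance -WhatIf   # drop -WhatIf once the list looks right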

The next one that stuck out like a sore thumb for me was “PidGenX function failed on this product key”, which led me to this thread all the way back from 2015.

There’s a useless comment by “SaktinathMukherjee” saying they downloaded some third-party software to fix their problem; don’t be that dink, that’s grossly negligent bullshit. The real hero is a comment by a guy named “Nathan Earnest” – “I had this same problem for a couple weeks. Background: I had a brand new Dell Optiplex 9020M running Windows 8.1 Pro. We unboxed it and connected it to the domain. I received the same errors above when attempting to do the Windows 10 upgrade. I spent about two weeks parsing through the setup error logs seeing the same errors as you. I started searching for each error (0x800xxxxxx) + Windows 8.1. Eventually I found one suggesting that there is a problem that occurs during the update from Windows 8 to Windows 8.1 in domain-connected machines. It doesn’t appear to cause any issues in Windows 8.1, but when you try to upgrade to Windows 10… “something happened.”

In my case, the solution: Remove the Windows 8.1 machine from the domain, retry the Windows 10 upgrade, and it just worked. Afterwards, re-join the machine to the domain and go about your business.

Totally **** dumb… but it worked. I hope it helps someone else.”

Again, I’m free to try stuff, so since I was testing I cloned the machine and left it disconnected from the network, then under Computer Properties changed it from the domain to a workgroup (which doesn’t remove the computer object from AD; the machine just removes itself from the domain). After this I ran another sfc /scannow just to make sure the VM cloning hadn’t caused any issues, and with 100% green I ran the installer yet again, and guess what… Nathan was right. The update finally succeeded. I can now choose to rename the PC and rejoin the domain, or whatever, but the software on the machine shouldn’t need to be reinstalled.
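If you'd rather script the domain-to-workgroup switch than click through Computer Properties, something like this should do it (a sketch; the workgroup and domain names are placeholders):

Remove-Computer -WorkgroupName 'TEMPWG' -Force -Restart   # drop to a workgroup; the AD computer object stays behind
# later, to rejoin: Add-Computer -DomainName 'corp.example.com' -Restart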

Another fun dumb day in paradise, I hope this blog post ends up helping someone.


Updating PowerCLI 12

If you did an offline install, you may need to grab the package files from an online machine. Otherwise, you may come across a warning about an existing instance of PowerCLI when you go to run the main install cmdlet.
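For context, this is roughly what I mean by the main install cmdlet, plus the offline variant (a sketch; paths are placeholders):

Install-Module -Name VMware.PowerCLI -Scope CurrentUser    # online install/update from the PowerShell Gallery
Save-Module -Name VMware.PowerCLI -Path 'C:\Temp\PowerCLI' # offline: save on an online machine, then copy the folder over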

When I first went to run this, it told me the version would be installed “side-by-side” with my old version. Oh yeah, I forgot I did that…

Alright, so I use the -Force switch, and it fails again… Oi…

Lucky for me, the world is full of bloggers these days, and someone else had come across this problem for the exact same reason.

VMware.PowerCLI install update error – Install-Package: Authenticode issuer | vGeek – Tales from real IT system Administration environment (vcloud-lab.com)

If you want all the nitty-gritty details, check out their post; the main part I needed was this one line: “This issue can be resolved deleting modules from the PowerShell modules folder inside Program Files. Once the modules folder for VMware are deleted try installing modules again, you can also mention the modules installation scope.”

AKA, delete the old one, or point the install to another location. He states he needed the old version but doesn’t specify for what. Anyway, I’ll just delete the old files.
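In practice that cleanup looks something like this (a sketch; confirm the path holds only the old VMware modules before deleting anything):

Get-ChildItem 'C:\Program Files\WindowsPowerShell\Modules' -Filter 'VMware*'   # see what's there first
Remove-Item 'C:\Program Files\WindowsPowerShell\Modules\VMware*' -Recurse -Force
Install-Module -Name VMware.PowerCLI -Scope CurrentUser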

So, at this point I figured I was going to have a snippet of a 100% clean install, but no, again something happened, and it is discussed here.

If I’m lucky I will not need to use any of the conflicting cmdlets and if I do; I’ll follow the suggestions in that thread.

OK, let’s move on. Well, the commands were still not there, so it looks like this has to succeed. There’s no prefix option during install, only on import (which you can only do after the install), and the other option was to clobber the install. Not interested, so I went into Windows add/remove features and removed the PowerShell module for Hyper-V. No reboot required, and the install worked.
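For reference, the three ways out of that cmdlet-name conflict look roughly like this (a sketch; the feature name is what the Hyper-V PowerShell module is called on my Windows 10 box, so verify it on yours):

Install-Module -Name VMware.PowerCLI -Scope CurrentUser -AllowClobber   # option A: clobber the conflicting names (no thanks)
Import-Module Hyper-V -Prefix HV                                        # option B: prefix the Hyper-V nouns at import (Get-HVVM, etc.)
Disable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V-Management-PowerShell   # option C: what I did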

The Hyper-V MMC snap-in still works for most of my needs. Now I finally have the two required pre-reqs in place.

Step 2a) Connect to server via PowerCLI

Why did this happen?

A: Because vCenter uses a self-signed certificate, and the system accessing it doesn’t have vCenter’s CA certificate in its own trusted CA store.

How can it be resolved?

A: Option 1) Have a proper PKI deployed, get a properly signed cert for this service from the CA admin, and assign the cert to the vCenter management services. This option is outside the scope of this post.

Option 2) Install the self-signed CA cert into the trusted root CA store of the machine running PowerCLI.

Option 3) Set the PowerCLI parameter settings to prompt to accept untrusted certificates.

I chose option 3:

Make sure when you set your variable to use single quotes and not double quotes (why this parameter takes System.String instead of SecureString is beyond me).
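For context, the whole thing is only a couple of lines; a sketch with placeholder server and credential values (the single quotes keep PowerShell from expanding $ or other special characters in the password):

Set-PowerCLIConfiguration -InvalidCertificateAction Prompt -Scope User -Confirm:$false   # prompt on self-signed certs instead of failing
$pass = 'Sup3r$ecretP@ss'                                                                # single quotes: taken literally
Connect-VIServer -Server 'vcenter.lab.local' -User 'administrator@vsphere.local' -Password $pass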

While I understand the importance of PowerShell for scripting, automation, and mass-deployment situations, requiring it to apply a single toggle setting is a bit ridiculous. Take note, VMware; do better.

vSphere HA Agent cannot be correctly installed or configured… again

Story

Another vCenter Patch, Another problem 😀

This seems to be a recurring story these last couple posts…

Error on Host

This time, after updating, a host in the cluster again had the error message.

Troubleshooting

Unlike the last time this happened, the event log wasn’t as blatantly flooded with complaints about /tmp being full. Checking the host with

vdf -h

showed it only 90% full. That was still pretty high, and might have explained the one log event I did see about it:

The ramdisk 'tmp' is full. As a result, the file /tmp/img-stg/data/vmware_f.v00 could not be written

This was in the log right after an event about attempting to install a base ESXi image:

Installing image profile '(Updated) HPE-ESXi-Image' with acceptance level checking disabled

This seemed a bit weird, but I couldn’t find any info other than the usual very Microsoft-type answers of “you can just ignore it” or “usually this is not an issue, it’s just vCenter connecting to the ESXi host and installing its agent”.

OK I guess… moving on… the very next error event was:

Could not stage image profile '(Updated) HPE-ESXi-Image': ('VMware_bootbank_vmware-fdm_7.0.2-18455184', '[Errno 28] No space left on device')

Huh. Now, note this host was installed using the official VMware image provided by HPE for this exact hardware, supported by the VMware HCL, so there should be no funny business. However, I suspect it’s a bit of the known HPE bug mentioned the last time this happened; it just hasn’t fully flooded /tmp yet.

Lil Side Trail

So, a couple of things to note here. First, ESXi is installed on a USB/SD-card style setup, so as is well known you should define the persistent log location as well as the scratch location. However, not many sources specify changing the system swap location. (A sketch of where these settings live follows the list below.)

  1. Persistent Log; VMware KB; Tech Blogger
    (Most standard ESXi Log info)
  2. Scratch Log: VMware KB; Tech Blogger 1; Tech Blogger 2
    (Crash Logs, Support log creations)
  3. Swap Location: VMware Doc 1 (Configure), VMware Doc 2 (About), a tech blogger who seems to regurgitate the exact About page from VMware.
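As promised above, here's roughly where those three settings live from the ESXi shell (a sketch; datastore names and paths are placeholders, and the exact options are from memory, so verify against the KBs linked above before changing a production host):

esxcli system syslog config set --logdir=/vmfs/volumes/datastore1/logs/esxi01   # 1. persistent log location
esxcli system syslog reload
esxcli system settings advanced set -o /ScratchConfig/ConfiguredScratchLocation -s /vmfs/volumes/datastore1/.locker-esxi01   # 2. scratch (reboot to apply)
esxcli sched swap system set --datastore-enabled true --datastore-name datastore1   # 3. system swap on a datastore
esxcli sched swap system get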

However, researching this even more, lots of posts on Reddit mentioned the swap files for VMs live in their VM directories, so if you’re using a shared datastore they will reside there, and I shouldn’t see issues around swap usage at the host level at all.

And if you look in the vCenter Web UI on an ESXi host, there are two options available: VM – Swap, and System Swap.

The VMware docs don’t seem to accurately describe the difference between these two options.

Looking up the error about not being able to stage the file, I found this one blog post, which of course mentioned changing the swap location to get past the error…

The main thing mentioned by the blogger is “The problem is caused by ESXi not having enough free space available to extract the installation packages.” But he failed to specify where exactly that is, and the event log didn’t specify it either. Now, since his solution was to adjust the system swap location, it begs the question: is the package extraction location the system swap location?

Since the host settings only seem to be specified with the alternative option checkboxes:

Can use host cache
Can use datastore specified by host for swap files

It’s still not fully clear to me where the swap is actually located with these assumed default settings, whether extraction of the image actually uses swap, or why the same image already on the ESXi host is being re-applied when you upgrade vCenter.

Resolution

So many questions, so few answers. Unfortunately I’m going to go on a bit of a whim and simply try exactly what I did before: clear the file from /tmp that was taking up a lot of its space, and install the HPE patch for the known bug, in hopes it resolves the issue….
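For reference, the check-and-clear part from the ESXi shell is roughly this (a sketch; the offending file and the patch depot path are placeholders for whatever you find on your own host):

vdf -h        # check the tmp ramdisk usage
ls -lh /tmp   # find what is eating the space
rm /tmp/<offending-file>
esxcli software vib install -d /vmfs/volumes/datastore1/hpe-patch-depot.zip   # vendor patch, if applicable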

Sure enough, it was the exact same thing as in my initial post; it just seems /tmp wasn’t completely full yet, so the symptoms were a bit different.

  1. vMotion all VMs to another host in the cluster (amazingly, vMotion works without issue)
  2. Ignore the HA warning on the VMs migrated
  3. Place Host into Maintenance mode (This clears the HA warnings on the VMs and cluster)
  4. Verify /tmp has room. Update any ESXi packages from the hardware vendor if applicable.
  5. Reboot the host.
  6. Exit Maintenance mode.

Hope this helps someone who might see the same type of error events in their ESXi event logs.

Microsoft Exchange Vulns and Buggy Updates

I’ll keep this post short. If you are unaware, there’s been a big hack on exchange servers.

Microsoft Exchange hack, explained (cnbc.com)

I ran the IOC scripts from MS to see if I was affected, and it appears I may have been.
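For anyone wanting to run the same check: the IOC script Microsoft published at the time was Test-ProxyLogon.ps1 from their CSS-Exchange GitHub repo. Usage is roughly the following (from memory, so treat it as a sketch and check the repo's readme):

# From the Exchange Management Shell, scan all servers and write results to a folder
Get-ExchangeServer | .\Test-ProxyLogon.ps1 -OutPath "$home\Desktop\ProxyLogonLogs"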

Initiated my own lab DRP/BCP: informed myself that services would be down, and restored AD and Exchange from backups taken before the logged incidents. Took the OWA reverse proxy rule down till the servers could be fully patched.

Booted restored VMs, patched, hopefully good to go.

Then, doing Patch Tuesday updates, users’ laptops started failing to boot after installing KB5000802. All I could find was news of printing causing BSODs… classic. In my case it was causing crashes at boot. I did my usual trick, but got a different error, then ran the Windows startup repair process, which amazingly got it to boot, but it said it had reverted an update (the one above). I attempted the install again, but hit the same problem. I didn’t want to re-image, as it was a VIP’s machine and time was of the essence. On a whim, I decided to install all the latest drivers from the laptop OEM vendor (in case something was using MS drivers instead); after that I tried the update again and got a successful install. Phewwww!

VMware HA down after 6.5 patch

The Story

So the other day I tested the latest VMware patch that was released, as blogged about here.

Then I ran the patch on a client’s setup, which was on 6.5 instead of 6.7. I didn’t think it would be much different, and in terms of steps to follow, it wasn’t.

First thing to note, though, is validating the vCenter root password to ensure it isn’t expired (on 6.7U1 and newer, the updater will tell you the upgrade can’t continue otherwise).

Logged into vCenter (SSH/Console); once in the shell:

passwd

To see the status of the account.

chage -l root

To set the root password to never expire (do so at your own risk, or if allowed by policies)

chage -I -1 -m 0 -M 99999 -E -1 root
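(For reference on those flags: -I -1 turns off the inactivity lockout, -m 0 and -M 99999 set the minimum and maximum password age, and -E -1 removes the account expiry date.)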

Install patch update, and reboot vCenter.

All is good until…

ERROR: HA Down

So after I logged into the vCenter server, an older cluster was fine, but a newer cluster with newer hosts showed a couple errors.

For the cluster itself:

“cannot find vSphere HA master”

For the ESXi hosts

“Cannot install the vCenter Server agent service”

So off to the internet I go! I also asked people on IRC if they’d come across this, and crickets. I found this blog post, and all the troubleshooting steps unfortunately led to no real solution. It was a bit annoying that it said “it could be due to many reasons such as…”, listed them off with a vCenter update being one of them, but then went through common, standard troubleshooting steps. Which is nice, but none of them are analytical enough to determine which root cause it actually was, so you can actually resolve it instead of throwing darts at a dartboard.

Anyway, I decided to create an SR with VMware and uploaded the logs, while I kept looking for an answer myself, and found this VMware KB.

Funnily enough, the resolution states… “This issue is resolved in vCenter Server 6.5.x, available at VMware Downloads.”

That’s ironic; I just updated, which caused this problem, hahaha.

Anyway, my colleague noticed the workaround…

“To work around this issue in earlier versions, place the affected host(s) in maintenance mode and reboot them to clear the reboot request.”

I didn’t exactly check the logs and wasn’t sure if there actually was a pending reboot, but figured it was worth a shot.

The Reboot

So, vMotion all VMs off the host, no problem; put it into maintenance mode, no problem; send the host for a reboot….

Watching the screen: still at the ESXi console login… monitoring sensors indicate the host is inaccessible, pings are still up, and the Embedded Host Controller (EHC) is unresponsive… ugghhhh, ok…

Pressing F2/F12 at the console gives “direct management has been disabled”… like, uhhh, ok…

I found this, a command to hard reboot, but I couldn’t SSH in and couldn’t access the Embedded Host Controller… so there was no way to enter it…

reboot -n -f

Then I found this, with the same problem… the solution: like any computer in a stuck state, hard shutdown. So I pressed the power button for 10-20 seconds till the server was fully off, then powered it back on.

The Unexpected

At this point I was figuring the usual: it comes back up and shows up in vCenter. Nope; instead the server showed as disconnected in vCenter, in a downed state. I managed to log into the Embedded Host Controller, but found the VMs I had vMotioned off still on it in a ghosted state. I figured this wouldn’t be a problem; after reconnecting to vCenter it should pick up on the clean state of those VMs being on the other hosts.

Click reconnect host…

Error: failed to login with the vim admin password

Not gonna lie, at this point I got pretty upset. You know, HULK SMASH! type deal. However, instead of smashing my monitors, which wouldn’t have been helpful, I went back to Google.

I found this VMware KB, along with this thread post, and pieced together a resolution from both. The main thing was the KB wanted to reinstall the agents, while the thread suggested most people just needed the services restarted.

So I removed the host from vCenter (Remove from Inventory), removed the ghosted VMs via the EHC, enabled SSH, and restarted the hostd and vpxa services:

/etc/init.d/hostd restart

/etc/init.d/vpxa restart

Then re-added the host to vCenter and to the cluster, and it worked just fine.

The Next Server

Alright, so now vMotion all the VMs over to this freshly rebooted host, so we can do the same thing on the other ESXi host to make sure they’re all good.

Set the host into maintenance mode and reboot; sure enough, this server hangs at the reboot just like the other host. I figured the process was going to be the same here; however, the results actually were not.

This time the host actually did reconnect to vCenter after the reboot but it was not in Maintenance mode…. wait what?

I figured that was weird and decided to give it another reboot. When I went to put it into maintenance mode, it got stuck at 2%… I was like, ughhhh, wat? The weird part was it even listed orphaned/ghosted VMs, so I thought maybe it had some at this point.

Googling this, I didn’t find an answer, and just when I was about to hard reboot the host again (after 20 minutes), it succeeded. I was like, wat?

Then I sent a reboot, which I think took like 5 minutes to apply; all kinds of weird were happening. While it was rebooting I disconnected the host from vCenter (not removed it), waited for the reboot, then accessed this host’s EHC.

It was at this point I got a bit curious about how you determine if a host needs a reboot, since vCenter didn’t tell me and the EHC didn’t tell me… How was I supposed to know, considering I didn’t install any additional VIBs after deployment? I found this Reddit post with the same question.

Some weird answers, the best being:

vim-cmd hostsvc/hostsummary|grep -i reboot
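If you're already in PowerCLI, the same flag should also be visible on the host's summary object; this one is an assumption on my part from the API object rather than something from the thread:

(Get-VMHost -Name 'esxi01.lab.local').ExtensionData.Summary.RebootRequired   # $true if a reboot is pending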

The real thing that made me raise my brow was this convo bit:

Like, wat?!?!?! Hahaha. Anyway, by this time I got an answer from VMware support, and they simply asked when the error happened, if I had a snippet of the error, and if I had rebooted the vCenter server….

Like, really… ok, don’t look at the logs I provided then. So, ignoring the email for now to actually fix the problem, I looked at the logs myself for the host I was currently working on and noticed one entry that should have been shown on the host’s summary page.

“Scratch location not set”… well, poop… see this KB. After correcting that and rebooting the server again, it seemed to be working perfectly fine.
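If you'd rather fix the scratch location from PowerCLI than the UI, something like this should work (a sketch with placeholder host and datastore names; it still needs a reboot to take effect):

Get-AdvancedSetting -Entity (Get-VMHost 'esxi01.lab.local') -Name 'ScratchConfig.ConfiguredScratchLocation' |
  Set-AdvancedSetting -Value '/vmfs/volumes/datastore1/.locker-esxi01' -Confirm:$false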

So I removed it from the inventory, ensured no vpxuser existed on the host, restarted the services, and re-added the host.

Moment of Truth

So after ALL that! I got down on my knees, I put my head down on my chair, I locked my hands together, and I prayed to some higher power to let this work.

I proceeded to enable HA on the cluster. The process of configuring HA on both hosts lingered at 8% for a while. I took a short walk in preparation for the failure, and to my amazement it worked!

WOOOOOOOOO!!!

Summary

After this, I’d almost recommend rebooting hosts to validate them before doing a vCenter update, but that’s also a bit excessive. So maybe at least run the command above on your ESXi hosts to ensure there’s no pending reboot before initiating a vCenter update.

I hope this blog post helps anyone experiencing the same type of issue.