VMware ESXi boot and the Config

Sadly this post will be really short as again, lots going on. Recovering a host that failed after a regular reboot, which had a superblock corruption on it’s main OS drive. Also, the BELK series will be done, I just need a bit more time. Sorry for the delays.

“Failed to load /sb.v00” [Inconsistent Data]

Since this drive was not on the main datastore on the host all the VMs were unaffected.

Now loading linux showed the drive data was till accessible, but I also had a feeling this USB drive was on it’s way out. I created a copy using DD, *sadly I didn’t do it the smart way and place it on a drive big enough to save it as a image file, but instead directly to another drive of the same size.

I tried to install the same image of ESXi on top of the current one in hopes it would fix the boot partition files along the way. This only made the host get past /sb.v00 and vault randomly past it with “Fatal Error: 6 [Buffer Too Small]”

I was pretty tired at this point since the server boot times are rather long and attempts were becoming tedious. I did another DD operation of the drive, to the same drive (still not having learned my lesson) and when I awoke to my dismay, it failed only transferring 5 gigs with an I/O error. This really made me sure the drive was on the way out, but it was still mountable (the boot partitions 5, 6 and 8)

At this point you might be wondering, why doesn’t he just re-install and reload a backup config? Which is fair question, however one was not on hand, but surely it must be somewhere on the drive. I know how to create and recover on a working host but a one that can’t boot? Then I found this gem.

Now through out my attempts I did discover the boot partitions to be 5 and 6 and I did even copy them from a new install to my copied version I made about and it did boot but was a stock config. I was stumped till I read the section from the above blog post on “How to recover config from a system that doesn’t boot”. Line 7 was what nailed it on the head for me:

“mount /dev/sda5 /mnt/sda5

7. In the /mnt/sda5 directory, you can find the state.tgz file that contains ESXi configuration. This directory (in which state.tgz is stored) is called /bootblank/ when an ESXi host is booted.”

I was just like … wat? That’s it. Grabbed the bad main drive mounted on a linux system, saw the state.tgz file and made a copy of it, connected the new drive that had a base ESXi config, replaced the state.tgz file with the one I copied, booted it and there was the host in full working state with all network configs and registered VMs and everything.

Not sure why the config is stored in the boot partition, but there you go. Huge Shout out to Michael Bose for his write I suggest you check it out. I have saved it case it disappears from the internet and I can re-publish it. For now just visit the link. 🙂

Fixing Veeam (Veeam Service won’t Start)

Veeam Won’t Start

Yeap, the one thing you don’t want can happen at the worst time. For me I was testing a hypervisor upgrade scenario, and my host sure enough failed to come up successfully. Well…. shit.

While I was going crazy trying to bring my host back up (the stock ESXi images wasn’t good enough cause…. RealTek, yeah… this Mobo I picked was an overall bad choice, sad cause it’s ASUS… anyway…

I went to go restore some VMs from backup onto other hosts till I could recover my main host (find that custom ESXi install image) and to my dismay… Veeam console failed to connect…

Failed to connect to the Veeam Backup & Replication server:
No connection could be made because the target machine actively refused it :9392

ughhhh, what? this is a standalone server, not domain joined, no special services account or MSAs, or separate servers, like what gives?

Event viewer is literally useless… as nothing shows anywhere for any hints.

First Fix Attempt

OK so, the usual, google, and let’s see here

Like other symptoms not much help and a generic console error, so this fix was worth a shot, what I took away from it was how to do a manual DB backup (assuming this is all the settings and configurations if re-install required) and some registry keys used by Veeam and that this was not the problem (not the droids you are after). I thought maybe I had updated and not tested, as I do tend to do shutdown instead of reboot, with my limited resources and well windows is heavy on resources.

HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication\SqlServerName (This is the server name where SQL is running)
HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication\SqlInstanceName (This is the instance name needed for the connection, which is in the format Servername\InstanceName)
HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication\SqlDatabaseName (This is the database name in the Databases folder once you connect)

But sadly no good, as I guess my issue is not related to any lock files on the SQL DB… ok so what else is there…

Second Fix

So I started reading this one and at first I was thinking, yup same problem, and reading along, I like Foggy but them not sharing the answer was rather annoying… then after some others reported the solution and my jaw literally dropped (probably why they tell you call support, cause this is some dirty laundry…)

as Tommy stated

“It is very likely to caused by the changing of the host name, do refer to the following link, i managed to my Veeam service started again.”

What….

sure enough running the req query command and hostname showed I had indeed changed the hostname to something more suitable AFTER installation.

Why they’d rely on a reg key vs a simply enviroment variable is really beyond me, cause the problem with using a reg key for this is pretty clear here….

So let’s try to fix this, thanks to the second guys reply by spacecrab:

“I know this is an old post, but thank you for replying with this information. I installed Veeam Backup and Replication before changing the default generated hostname, and it was really throwing me through a loop. The fix noted at that url worked perfectly after I rebooted to reset the services. I’ll relay the content here in case that sources goes away.

In my case I had renamed the computer from a default WIN234dfasd type name to a ‘much’ better alternative. Veeam refers to the local computer name in a couple of registry entries and promptly stopped working – which we didn’t notice until later.

The keys are:

HKLM\SOFTWARE\Veeam\Veeam Backup and Replication\SqlServerName
HKLM\SOFTWARE\Veeam\Veeam Backup Catalog\CatalogSharedFolderPath

Backup of the site’s Virtual Machines is now running again.”

alright let’s update some keys to be Veeam…. I just used reg edit to do this vs figuring out the exact query (although I probably should figure out a query in-case other keys but meh….

and after a reboot… Woah! all the Veeam services are running, sure enough I can connect to my standalone Veeam Server! Wooo thanks Spacecrab!