PVE Hosts Won’t Boot, Missing Drive

This is pretty dumb… every other hypervisor I’ve ever played with, if the boot drive is fine… the OS boots… period….

Yet the other day I tried to boot my PVE host and it just wouldn’t boot. It would get stuck, stating that a datastore (nothing the OS actually depends on to boot) was causing the boot to fail….

I found this PVE thread that was more recent with a comment that worked for the OP.

“if you created the partition via the gui, we create a systemd-mount unit under /etc/systemd/system

(e.g. mnt-datastore-foo.mount)

you can disable that unit with ‘systemctl disable <unit>’

or delete the file

we’ll improve the docs in the near future and have planned to make the gui disk management a bit easier in regards to removing/reusing”
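For anyone who wants to script the check the quote describes, here is a minimal sketch that lists any .mount unit files in a directory. The default path is the one named in the quote; the function name find_mount_units is made up for illustration.

```python
from pathlib import Path


def find_mount_units(unit_dir="/etc/systemd/system"):
    """Return the sorted names of all *.mount unit files found in unit_dir.

    An empty list means there are no GUI-created mount units in that
    directory (or the directory does not exist).
    """
    directory = Path(unit_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.mount"))
```

Any name it returns could then be passed to the ‘systemctl disable <unit>’ command from the quote.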

This was posted in 2021, yet when I checked that path (after booting into recovery mode) it didn’t contain any file ending in .mount… so I’m not sure what that is about. I did, however, find this thread, which was exactly my problem… and funny enough, the OP literally posted their own answer (which is the answer here as well), with no other comments on the post, which was created in 2018…

[SOLVED] – Reboot Proxmox Host now will not boot | Proxmox Support Forum

“So I had upgraded some packages and the proxmox host recommended rebooting the system. After rebooting the system hangs at the screen showing [DEPEND] in yellow for 3 lines:
Dependency failed for /mnt/Media
Dependency failed for local file system
Dependency failed for File system check on /dev/Data-Storage/Media

I tried running control-D to continue but it does not continue.

I’m guessing I need to clean up the entries how can i do that? I’m assuming I just need to boot into emergency and edit /etc/fstab and remove the entries?

OK yes removing those from /etc/fstab fixed it and now it boots.”

This is exactly what I did as well… I saw the offending entry, which was a BTRFS storage I had configured in the past, and that storage unit had since been shut down. (I thought I blogged this, but I only blogged about using LVM over iSCSI: Configuring shared LVM over iSCSI on Proxmox – Zewwy’s Info Tech Talks)

Anyway, I removed the entry from fstab and rebooted… bam, the PVE host came right up.
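If you would rather not hand-edit the file, the same fix can be sketched in a few lines of Python. This is only an illustration: the function name disable_fstab_entry is made up, and the sample entries borrow the device and mount point from the quoted thread, with the ext4 type and options being my own assumptions. It comments the entry out instead of deleting it, so it is easy to put back.

```python
# Hypothetical fstab contents for illustration only.
SAMPLE = """\
/dev/pve/root / ext4 errors=remount-ro 0 1
/dev/Data-Storage/Media /mnt/Media ext4 defaults 0 2
"""


def disable_fstab_entry(fstab_text, mount_point):
    """Comment out every active fstab line whose mount point matches."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Skip blank lines and lines that are already comments.
        is_active = bool(fields) and not fields[0].startswith("#")
        # The second fstab field is the mount point.
        if is_active and len(fields) >= 2 and fields[1] == mount_point:
            out.append("# disabled: " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

You would read /etc/fstab, pass the text through this, and write it back from the emergency shell. Another common approach is to keep the entry and add the nofail mount option, so a missing device no longer blocks the boot.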

Constructive criticism for PVE: yes, any knowledgeable Linux sysadmin will figure out how to fix this, as I just did here. However, how about NOT having the boot process fail simply because a configured storage is not available… like all other hypervisors… BOOT the host and show the storage as failed in the management UI so it can be cleaned up that way…. Just.. food for thought….

HP Laptop – OS Boot Loop

I just wanted to make a short post today on how I fixed a laptop I thought was fully toast.

The Story

This story begins months ago: a user’s laptop wouldn’t boot properly following a Windows Update. When I took a look at it, after he mentioned it just going into a “looping cycle”, it was acting really weird! Symptoms of the device:

  1. The system would boot into the UEFI/BIOS menu without any issues, and could stay running there endlessly.
  2. The system could run all UEFI-based hardware tests, and all reported functional hardware with no faults.
  3. As soon as you would get into the boot loader of any OS, the system would hard shutdown and power back on, wash, rinse, repeat.

What had me so baffled was that any OS boot would cause the hard shutdown (power lights all go off, screen goes dead blank), and then the power LED would come back on and the POST screen would show. If I interrupted it by going into the BIOS or doing self tests, it wouldn’t hard shutdown at all.

I tried everything. I had a few of these laptops already taken apart, so I even tried swapping all the parts, including the battery, which on these particular laptops is the source of power for the CMOS. Yup, the laptop battery is the BIOS config saving power source. However, even that didn’t fix it, and thus it sat on a shelf for months.

Till Today

I was working on another project when I got hit with a layer 2 segregation issue in the design plans, which had me really upset and my mind hurting. So I decided to step back from the problem, and I just happened to have this particular laptop on my desk that day. I had needed some laptops for testing, realized it was this machine, and so it just sat there.

I decided to take another shot at it. Since I was already on a path of failure, figured what’s the worst, just a bit more wasted time before going home.

So anyway, I thought I might as well see if there’s some new firmware and whether that might help fix it (it seemed almost firmware related). So, lo and behold, I grab the latest firmware for this laptop and create a “recovery USB stick”, then find out you simply plug that USB stick into the laptop, power off the machine, press and hold “Windows Key + B”, then power the unit on while still holding that key combination.

Holy crap, first time I follow instructions and it actually works, mind blown. So it completes the firmware update, everything seems fine, and I try to boot a Linux OS from a USB drive. Boot loop, ahhh FFS.

I decided to vent my frustrations on the local #SkullSpace IRC channel, and another IT tech from the States said something of the usual nature: “Open and reseat all the things?” Which of course, as I stated above, I had already tried: I had a couple of these open for repair and had swapped all the goodies with no different result.

When I replied to them with what I stated above (that I had a few of these laptops already taken apart and had tried swapping all the parts, including the battery, which is the power source for the CMOS on these particular laptops), I noticed I had done the whole firmware upgrade without the battery plugged in at all.

I decided to plug in the battery and try to boot (of course this was always done before, so I didn’t think anything of it). When I booted, it stated the CMOS had been reset (well yeah, the battery was unplugged the whole time); I pressed Enter to continue… and it didn’t boot loop.

At this moment I was like “WTF”. I was blown away to see after months of collecting dust I somehow magically managed to get this laptop to boot normally.

That’s what I call a good way to end the day…. now about that layer 2 segregation issue….

*Update* It went right back into the OS boot loop, it’s effed. 😉 It would require a full mainboard replacement, not happening.

VMware ESXi boot and the Config

Sadly this post will be really short as, again, there’s lots going on. I was recovering a host that failed after a regular reboot, with a superblock corruption on its main OS drive. Also, the BELK series will be done, I just need a bit more time. Sorry for the delays.

“Failed to load /sb.v00” [Inconsistent Data]

Since this drive did not hold the main datastore on the host, all the VMs were unaffected.

Now, loading Linux showed the drive data was still accessible, but I also had a feeling this USB drive was on its way out. I created a copy using dd. Sadly, I didn’t do it the smart way and place it on a drive big enough to save it as an image file, but instead copied it directly to another drive of the same size.

I tried to install the same image of ESXi on top of the current one in hopes it would fix the boot partition files along the way. This only made the host get past /sb.v00 and fault at a random point after it with “Fatal Error: 6 [Buffer Too Small]”

I was pretty tired at this point since the server boot times are rather long and attempts were becoming tedious. I did another dd operation of the drive, to the same drive (still not having learned my lesson), and when I awoke, to my dismay, it had failed with an I/O error after transferring only 5 GB. This really made me sure the drive was on the way out, but it was still mountable (the boot partitions 5, 6 and 8).
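For what it’s worth, the “smart way” of saving a failing drive as an image file can be sketched roughly like this. It is not what I ran; image_device is a made-up name, and the idea of writing zeros over unreadable blocks mimics what dd does with conv=noerror,sync. A purpose-built tool such as ddrescue handles bad media far better than this sketch.

```python
import os


def image_device(src_path, dst_path, block_size=1024 * 1024):
    """Copy src_path into the file dst_path block by block.

    Blocks that raise an I/O error are replaced with zeros so the copy
    keeps going. Returns (bytes_copied, bad_blocks).
    """
    copied = 0
    bad_blocks = 0
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the source
        src.seek(0)
        while copied < size:
            want = min(block_size, size - copied)
            try:
                chunk = src.read(want)
            except OSError:
                # Unreadable block: pad with zeros and skip past it.
                chunk = b"\x00" * want
                bad_blocks += 1
                src.seek(copied + want)
            if not chunk:
                break  # unexpected end of data
            dst.write(chunk)
            copied += len(chunk)
    return copied, bad_blocks
```

The source would be something like the failing USB drive’s device node and the destination a file on a bigger, healthy disk; both paths are up to you.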

At this point you might be wondering, why doesn’t he just re-install and reload a backup config? Which is a fair question; however, one was not on hand, but surely it must be somewhere on the drive. I know how to create and recover one on a working host, but on one that can’t boot? Then I found this gem.

Now, throughout my attempts I did discover the boot partitions to be 5 and 6, and I even copied them from a new install to the copied version I made above, and it did boot, but with a stock config. I was stumped till I read the section from the above blog post on “How to recover config from a system that doesn’t boot”. Step 7 was what nailed it on the head for me:

“mount /dev/sda5 /mnt/sda5

7. In the /mnt/sda5 directory, you can find the state.tgz file that contains ESXi configuration. This directory (in which state.tgz is stored) is called /bootbank/ when an ESXi host is booted.”

I was just like … wat? That’s it. I grabbed the bad main drive, mounted it on a Linux system, saw the state.tgz file and made a copy of it, connected the new drive that had a base ESXi config, replaced the state.tgz file with the one I copied, booted it, and there was the host in full working state with all network configs and registered VMs and everything.

Not sure why the config is stored in the boot partition, but there you go. Huge shout out to Michael Bose for his write-up; I suggest you check it out. I have saved it in case it disappears from the internet, so I can re-publish it. For now just visit the link. 🙂
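The swap itself is just a file copy, but here is a minimal sketch of it for reference. It assumes both boot partitions are already mounted on a Linux box; the mount points in the usage note and the name transplant_state are made up, and keeping the stock file as a .stock backup is my own precaution, not something from the guide.

```python
import shutil
from pathlib import Path


def transplant_state(old_boot, new_boot, name="state.tgz"):
    """Copy state.tgz from the old boot partition over the new one.

    The stock file on the new install is kept as '<name>.stock'.
    Returns the backup path, or None if there was nothing to back up.
    """
    src = Path(old_boot) / name
    dst = Path(new_boot) / name
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found")
    backup = None
    if dst.exists():
        backup = dst.with_name(name + ".stock")
        shutil.copy2(dst, backup)  # keep the stock config, just in case
    shutil.copy2(src, dst)
    return backup
```

With the old drive’s partition 5 mounted at, say, /mnt/old5 and the fresh install’s at /mnt/new5, the call would be transplant_state("/mnt/old5", "/mnt/new5").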

Upgrading a Windows Volume from MBR to GPT to support UEFI boot and features

I’m going to keep this post short, so there won’t be any use of the TOC plugin I recently deployed. 😛

I recently used an image of a sysprepped machine that I use to deploy new machines. To my dismay, the image was created with an MBR partition and was mostly used via BIOS boot options. This isn’t very secure, as many security features depend on UEFI.

It’s been well known that moving from MBR to GPT back in the day was a painful process. I won’t go over the details as this “Microsoft Mechanics” video does a decent job of doing this.

If you’d like a little more nitty gritty details, you can view this Technet Blog.

In short:

  1. Boot into PE.
  2. Use the “mbr2gpt” command to validate and convert the partition.
  3. Boot into the Mainboard Config (BIOS/UEFI).
  4. Configure the boot option to UEFI.
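Step 2 boils down to two mbr2gpt invocations, which a small helper can spell out. This is a sketch, not a deployment script: mbr2gpt_commands is a made-up name and disk 0 is an assumption about which disk holds Windows. The /validate, /convert, /disk and /allowFullOS switches come from Microsoft’s MBR2GPT documentation; /allowFullOS is only needed when running from the full OS instead of PE.

```python
def mbr2gpt_commands(disk=0, full_os=False):
    """Return the validate and convert command lines as argument lists."""
    # /allowFullOS is required only outside of Windows PE.
    extra = ["/allowFullOS"] if full_os else []
    return [
        ["mbr2gpt", "/validate", f"/disk:{disk}", *extra],
        ["mbr2gpt", "/convert", f"/disk:{disk}", *extra],
    ]
```

Each list can be handed to subprocess.run on the target machine, running the convert step only if the validate step succeeds.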

Now THAT was easy!