Fix Orphaned Datastore in vCenter

Story

The Precursor

I did NOT want to write this blog post. This post comes from the fact that VMware is not perfect and I’m here to air some dirty laundry…. Let’s get started.

*UPDATE* Read on if you want to get into the nitty gritty; otherwise jump to the Summary section. In my case, rebooting the VCSA resolved the issue.

The Intro

OK, I’ll keep this short. 1 vCenter, 2 hosts, 1 cluster. One host started to act “weird”: random power-offs, and it would boot normally but the USB controller stopped working.

Now this was annoying … A LOT, so I decided I would install ESXi on the local RAID array instead of USB.

Step 1) Make a backup of the ESXi config.

Step 2) Re-install ESXi. When I went to re-install ESXi, it stated all data in the existing datastore would be deleted. Whoops, let’s move all data first.

Step 2a) I removed all data from the datastore

Step 2b) Delete the Datastore, and THIS IS THE STEP THAT CAUSED ME ALL FUTURE GRIEF IN THIS BLOG POST! DO NOT FORGET TO DO THIS STEP! IF YOU SKIP IT YOU WILL HAVE TO DO EVERYTHING ELSE THIS POST TALKS ABOUT!

Unmount, and delete the datastore. YOU HAVE BEEN WARNED!

*During my testing I found this was not always the case; I was, however, able to replicate the issue in my lab after a couple of attempts.

Step 3) Re-install ESXi

Step 4) Reload the saved config file, and all is done.
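
For reference, the backup in step 1 and the restore in step 4 can be done straight from the ESXi shell. Roughly like this (a sketch; the bundle path is just an example, use whatever download location the backup command gives you):

vim-cmd hostsvc/firmware/sync_config
vim-cmd hostsvc/firmware/backup_config
(Syncs the running config to disk, then prints a download URL for the configBundle tgz. Download it and keep it somewhere safe.)

esxcli system maintenanceMode set --enable true
vim-cmd hostsvc/firmware/restore_config /tmp/configBundle.tgz
(To restore: copy the bundle back to the host as /tmp/configBundle.tgz, enter maintenance mode, then run the restore. The host reboots itself.)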

This is when my heart sunk.

The Assumptions

I had the following wrong assumptions during this terrible mistake:

  1. Datastore names are saved in the backup config.
    INCORRECT – Datastore names are literally volume labels and stay with the volume they were created on.
    The UUID is stored in the file system superblock on the device.
  2. Removing an orphaned Object in vCenter would be easy.
  3. Renaming a Datastore would be easy.
    1. If the host is managed by vCenter Server, you cannot rename the datastore by directly accessing the host from the VMware Host Client. You must rename the datastore from vCenter Server.
  4. Installing on USB drive defaults all install mount points on the USB drive.
    INCORRECT – There’s magic involved.

Every one of these assumptions burnt me hard.

The Problem

So it wasn’t until I clicked on the datastore section of vCenter that my heart sunk. The old datastore was still listed. Attempting to right-click and delete the orphaned datastore hit me with another surprise… the options were greyed out. I went to Google to see if I was alone. It turns out I was not, but the blog source I found also did not seem very promising… How to easily kill a zombie datastore in your VMware vSphere lab | (tinkertry.com)

Now that blog post title is very misleading. One can say the solution he used was “easy”, but guess what… it’s not supported by VMware. As he even states: “Warning: this is a bit of a hack, suited for labs only”. Alright, so no good so far.

There was one other notable source. That one mentioned looking out for related objects that might still be linked to the Datastore; in this case there were none. It was purely orphaned.

Talking to others in #vmware on Libera Chat, I was told it might be linked to a scratch location, which is probably the reason the option was greyed out. While that might be a reasonable restriction on a host, the scratch location is determined by the ESXi host itself, not by vCenter, so vCenter should still have the ability to clear the datastore (foreshadowing: this causes me more grief).

In my situation, unlike tinkertry’s, I knew exactly what caused the problem: I did not rename the datastore accordingly. Since the re-created datastore was not given the same name, it was mounted and shown as a brand new datastore.

The Plan

It’s one thing to fuck up, it’s another to fess up, and it’s yet another to have a plan. If you can fix your mistake, it’s prime evidence of learning and growing as you live life. One must always persevere. Here’s my plan.

Since I got here by rebuilding the host and restoring the config with the wrong datastore in place, I figured that if I did the same thing but with the proper datastore in place, I should be able to remove the orphan by bringing it back up.

I had a couple of issues to overcome. The first was my 3rd assumption: that renaming a datastore is easy. Which, usually, it is; however, in this case attempting to rename it to match the missing datastore simply told me a datastore with that name already exists. Sooo, poop. You also can’t do it directly from an ESXi host unless that host is not managed by vCenter. So, as you can tell, a catch-22. The only way past this was my plan, which was the same as how I got into this mess to begin with. Sadly, I didn’t yet know how deep a hole I had created.

So after installing fresh on another USB stick, I went to create the new datastore with the old name, overwriting the partition table the ESXi install had created… and you guessed it: Failed to create VMFS datastore. Cannot change host configuration. – Zewwy’s Info Tech Talks

Obviously I had gone through this before, but this time was different. It turned out that attempting to clear the GPT partition table and replace it with an msdos (MBR) based one failed, telling me it was a read-only disk. Huh?

Googling this, I found this thread which seemed to point at the root cause… yep, my 4th assumption: “Installing on USB drive defaults all install mount points on the USB drive.”

So after an “ls -l”, an “esxcli storage filesystem list”, and then a “vmkfstools -P MOUNTPOINT”, it was very easy to discover that the scratch and coredump locations were pointing at the local RAID logical volume I had created, which overwrote the initial datastore when ESXi was installed. Talk about a major annoyance. I get why it did what it did, but in this case it’s a major hindrance, as I can’t clear the logical disk partition to create the new one which will hold the datastore I need mounted there… mhmmm.
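
Roughly, the checks looked like this (the datastore name is a placeholder; adjust for your own volumes):

ls -l /scratch
(The /scratch symlink shows which volume the scratch location actually resolves to.)

esxcli storage filesystem list
(Lists mounted volumes with their UUIDs and mount points.)

vmkfstools -P /vmfs/volumes/datastore1
(Shows the file system type, UUID, and the device/partition backing a given volume.)

esxcli system coredump partition get
(Shows which partition the coredump is currently configured on.)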

So I kept trying to change the coredump location and the scratch location, and on every reboot the host kept picking the old location on the local RAID logical volume, which kept blocking me from moving forward. This happened regardless of whether I did it via the GUI or via the backend command “vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp/scratch”. Even though the VMware KB mentions creating this path first with mkdir, what I found was that the path was not persistent, and since it doesn’t exist at boot, ESXi changes the location via its usual “magic”:

“ESXi selects one of these scratch locations during startup in this order of preference:
The location configured in the /etc/vmware/locker.conf configuration file, set by the ScratchConfig.ConfiguredScratchLocation configuration option, as in this article.
A Fat16 filesystem of at least 4 GB on the Local Boot device.
A Fat16 filesystem of at least 4 GB on a Local device.
A VMFS datastore on a Local device, in a .locker/ directory.
A ramdisk at /tmp/scratch/.”

So in this case I found this post about a similar issue, and it turns out setting the scratch location to just /tmp worked.
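
For what it’s worth, here’s a rough sketch of checking and changing it from the shell (the /tmp value is simply what ended up working for me):

esxcli system settings advanced list -o /ScratchConfig/CurrentScratchLocation
(Check where scratch currently resolves to.)

vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp
(Set the configured scratch location; it only takes effect after a reboot.)

reboot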

When I attempted to wipe the drive partitions I was again greeted by read-only; however, this time it came down to the coredump location, which I verified by running:

esxcli system coredump partition get

which showed me the drive, so I used the unmounted final partition of the USB stick in its place:

esxcli system coredump partition set -p USBDriveNAA:PartNum

Which sure enough worked, and I was able to give the logical drive an msdos-based partition table. Yay, I can finally re-create the datastore and restore the config!
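
For completeness, the full coredump shuffle looked roughly like this (the partition value is a placeholder; use whatever “list” shows for your USB stick):

esxcli system coredump partition list
(Shows candidate diagnostic partitions and which one is active/configured.)

esxcli system coredump partition set -p USBDriveNAA:PartNum
esxcli system coredump partition set --enable true
(Point the coredump at the spare partition, then activate it.)

esxcli system coredump partition get
(Confirm the active and configured partitions now point where you expect.)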

So when the OP in that one VMware thread said “congrats, you found 50% of the problem”, I guess he was right. It goes like this:

  1. Scratch
  2. Coredump

Fix these and you can reuse the logical drive for a datastore. Let’s re-create that datastore…

This is when my heart sunk yet again…

So I created the datastore successfully, however… I had to learn about those pesky UUIDs…

The UUID is made up of four components. Let’s break it down using one of my VMFS volume’s UUIDs as an example: 591ac3ec-cc6af9a9-47c5-0050560346b9 (see the commands just after this list for how to look these up yourself).

  • System time (591ac3ec)
  • CPU timestamp (cc6af9a9)
  • Random number (47c5)
  • MAC address of the management port uplink of the host used to re-signature or create the datastore (0050560346b9)
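
If you want to poke at these yourself, a few ways to see a volume’s UUID from the ESXi shell (the datastore name is just an example):

ls -l /vmfs/volumes/
(The friendly datastore names are just symlinks pointing at the UUID directories.)

vmkfstools -P /vmfs/volumes/datastore1
(Prints the VMFS UUID and which device/partition backs the volume.)

esxcli storage vmfs extent list
(Lists each mounted VMFS volume with its UUID and backing device.)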

FFS… I’ll never be able to reproduce that… and sure enough, that’s why my UUIDs no longer aligned.

I figured maybe I could create the file and make a custom symlink to it with the old name, but nope: “operation not permitted”.

Fuck! Well, now I don’t know if I can fix this, or whether restoring the config with the same datastore name but a different UUID will fix it or make things worse… fuck me man… not sure I want to try this… might have to do this in my home lab first…

Alright I finally was able to reproduce the problem in my home lab!

Let’s see if my idea above will work…

Step 1) Make a config backup of the ESXi host. (You should have one from before the mess-up, but I’ll use the current one.)

Step 2) Reload host to factory defaults.

Step 3) Rename the datastore.

Step 4) Reload the config.

poop… I was afraid of that…

OK, I even tried disconnecting the host from vCenter after deleting the datastore. I could re-create it with the same name, but it always attaches with a “(1)” appended, because as far as vCenter is concerned the original datastore still exists, since the UUID can never be recovered… I heard a vCenter reboot may help; let’s see…

But first I want to go down a rabbit hole… the Datastore UUID. In this case the ACTUAL datastore UUID: not the one listed in a VM’s config file (.vmx), not the one listed in the vCenter DB (which we are trying to fix), but the one actually associated with the Datastore. After much searching… it seems it is saved in the file system’s “SuperBlock”. On most other file systems there’s some command to edit the UUID if you really need to; for VMFS, all I could find was re-signaturing for cloned volumes.

So it would seem that if I had simply saved the first 4 MB of the logical disk, or of the partition (not 100% sure which at this time), I could in theory have done a dd to put it back, recovered the original UUID, and then connected the host back to vCenter.
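
In theory (and I have NOT tested this, so treat it purely as a thought experiment; the device name and backup path are made up), that save-and-restore would look something like:

dd if=/vmfs/devices/disks/naa.xxxxxxxxxxxx:1 of=/vmfs/volumes/SomeOtherDatastore/vmfs-header.bin bs=1M count=4
(Save the first 4 MB of the VMFS partition somewhere off that disk before wiping it.)

dd if=/vmfs/volumes/SomeOtherDatastore/vmfs-header.bin of=/vmfs/devices/disks/naa.xxxxxxxxxxxx:1 bs=1M count=4
(Write it back over the freshly re-created volume to bring the old UUID back. Again: untested, and very easy to destroy data with.)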

I guess I’ll try a reboot here see what happens….

Well look at that.. it worked…

Summary

  1. Try a reboot of the VCSA.
  2. If a reboot does not fix it, call VMware Support.
  3. If you don’t have support, you can try to muck with the backend DB (do so at your own risk).

 

ESXi 6.x Datastore Not Mounted

Quick post here. I had to recover from a flooded basement (sorry for the day of outage). I had to put my disks in another server, load FreeNAS, import my ZFS volumes, recreate the iSCSI targets, and then add them to my ESXi hosts. Rescanning the HBAs showed the disks…

but the datastores were not visible…

So I googled and found this VMware thread with some helpful commands to try. (I do kind of agree with the OP that it’s annoying they removed the front-end UI for import that could handle this.)

esxcli storage vmfs snapshot list

esxcfg-volume -M UUID
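
Roughly, the flow goes like this (the UUID is just an example; use whatever the snapshot list shows for your volume):

esxcli storage vmfs snapshot list
(Each unresolved volume shows up with its VMFS UUID and label.)

esxcfg-volume -M 4e414917-a8d75514-6bae-0019b9f1ecf4
(Persistently mount the volume by UUID; lowercase -m would mount it only until the next reboot.)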

Ehh it worked!

Hope this helps someone. If this doesn’t work, you might have some other underlying issue.

How to remove a Datastore from a vSphere Cluster


Intro

Hey everyone,

I figured I’d write up a quick little help guide on removing a Datastore. Now, this isn’t anything new, and this post is likely to be buried on the internet because of it. However, in my searches I found the following sources to be great reads; I highly recommend you check them out.

1)  Official Source VMware KB2004605.

2) A Blog guide by Sam McGeown, here.

3) A post by Mike on cswitchzero.

Now let’s go through the checklist from the official source one by one.

Check List

  • If the LUN is being used as a VMFS datastore, all objects (for example, virtual machines, templates, and snapshots) stored on the VMFS datastore must be unregistered or moved to another datastore. This one is pretty easy: navigate to the datastore files and check. You may find some remnants from the following items, though.
  • All CD/DVD images located on the VMFS datastore must also be unmounted/unregistered from the virtual machines. This shouldn’t even come up if you did the first check.
  • The datastore is not used for vSphere HA heartbeat. This setting uses a folder labeled “.vSphere-HA”.
    For a quick overview of datastore heartbeating, see here.
    To “remove” (aka change) the heartbeat datastores, see here.
  • The datastore is not part of a datastore cluster. You can find some (fairly useless) help on this process from VMware here. I’m assuming it’s an easy task via the WebUI.
  • The datastore is not managed by Storage DRS. If you removed it from the datastore cluster, how could this still be an issue?
  • The datastore is not configured as a diagnostic coredump partition/file or as the scratch partition. These two are common hang-ups (as my own story above shows).
  • Storage I/O Control is disabled for the datastore. See here for how to enable it (disabling is the exact reverse).
  • No third-party scripts or utilities running on the ESXi host can access the LUN that has the issue. Honestly, I’m not sure how you could fully check this… from some quick research, you can also have scripts that don’t live on the hosts but are run from other machines via PowerCLI, as described in this community post. I guess you’d have to know; either way those scripts would just fail, and shouldn’t affect the vSphere cluster.
  • If the LUN is being used as an RDM, remove the RDM from the virtual machine. Click Edit Settings, highlight the RDM hard disk, and click Remove. Select Delete from disk if it is not selected and click OK. Note: This destroys the mapping file but not the LUN content.

    – This is more about removing the backend physical device, which in my case is the final goal. If your goal is just to remove a datastore while keeping the physical storage in place, this can be ignored.

  • As noted by Sam, but not by the official source or Mike: watch for a .dvsData folder. As stated by Sam, “The .dvsData folder is created on any VMFS store that has a Virtual Machine on it that also participates in the VDS – so by migrating your VMs off the datastore you’ll be ensuring the configuration data is elsewhere.”
  • Check that there are no processes locking the VMFS, with this command (using the device’s NAA ID):
esxcli storage core device world list -d NAA_ID
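
Putting that last check together with finding the right device, a rough sketch (the NAA ID is a placeholder):

esxcli storage vmfs extent list
(Find the NAA ID of the device backing the datastore you want gone.)

esxcli storage core device world list -d naa.xxxxxxxxxxxx
(If any worlds show up against that device, something on the host is still holding it open.)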

Datastore Removal Steps

Step 1) Follow the Checklist above.

Make sure no files reside on the Datastore.

Step 2) Unmount Datastore from all ESXi hosts.

As noted in Sam’s blog post, even in vSphere 5.x using the C# fat client this was possible to do via a wizard against all hosts that have the datastore mounted. Even in the newer HTML5 WebUI this is still possible (I think everyone wants to fully forget that VMware chose Flash for a short time).

At this point the datastore will show up as inaccessible in vSphere, as noted by both Mike and Sam. This is the same anywhere from 5.x to 7.x (as noted by Mike, it may be slightly more important to follow the procedures on earlier versions like ESXi 3 or 4). If the checklist was followed, there should be no issues unmounting the datastore.

If you need to do this via esxcli (Source):

# esxcli storage filesystem list

Unmount the datastore by running the command:

# esxcli storage filesystem unmount [-u UUID | -l label | -p path ]

For example, use one of these commands to unmount the LUN01 datastore:

# esxcli storage filesystem unmount -l LUN01

# esxcli storage filesystem unmount -u 4e414917-a8d75514-6bae-0019b9f1ecf4

# esxcli storage filesystem unmount -p /vmfs/volumes/4e414917-a8d75514-6bae-0019b9f1ecf4

Step 3) Detach the LUN from all hosts.

As noted by Sam, if you are on 5.x you might want to automate this via PowerCLI. Then noted by Mike, newer 7.x can now do this in bulk via the Management WebUI.

6/7 WebUI -> Hosts n Clusters -> Hosts -> Cluster -> Host -> Configure Tab -> Storage Device (left side tree) -> Highlight Device -> Detach

for esxcli

Obtain the NAA ID of the LUN to be removed:

esxcli storage vmfs extent list

To detach the device/LUN, run the command:

# esxcli storage core device set --state=off -d NAA_ID

To verify that the device is offline, run the command:

# esxcli storage core device list -d NAA_ID

The output shows that the Status of the disk is off.

Step 4) Rescan HBAs

At this point, if you rescan all HBAs on all hosts the inaccessible datastore should be gone from the WebUI.
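
If you would rather do the rescan from the shell on each host, a quick sketch:

esxcli storage core adapter rescan --all
(Rescans every HBA on that host; repeat per host, or use the WebUI to rescan storage for the whole cluster.)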

At this point you can remove the LUN from being presented at all (so the disk stops showing up under devices). For iSCSI-based configurations (most likely for a shared VMFS datastore), this means removing the static and dynamic discovery entries from the iSCSI initiator settings on each host.

Or it could be a local disk behind a local storage controller, such as a logical drive created in RAID behind a Pxxx storage controller.

Removing the source device will always be dependent on how it was configured in the first place.

Summary

So today we covered removing a Datastore. The important thing to remember is that removing a datastore takes a lot more steps than creating one, because so many different VMs and services can be tied to a datastore once it’s in use.

In many cases, the syslog and scratch locations are the big hang-ups and should be looked at closely. However, as stated, if you are actually checking for files on the datastore this stuff will be pretty evident.

In most cases, ensure you follow the check list and the process should be pretty smooth. Hope this helps someone.

*Note* I often provide screenshots to give some context; in this case I decided to keep it more generic to span multiple versions of vSphere.

ESXi new install; failed to create new Datastore

Well, I booted up a new server, created a new logical drive, booted ESXi, and… failed to create the datastore. What is this?

Did Google help? Yeah, the forums helped.

1. Show connected disks.

ls -lha /vmfs/devices/disks/

(Verify the disk is seen. You will probably see your disk ID and then :1. That is a partition on the disk; we only need to worry about the main disk ID.)

Neat. next

2. Show the error on disk.

partedUtil getptbl /vmfs/devices/disks/(disk ID)

(It will probably indicate that the GPT is located beyond the end of the disk.)

Ohhh yeah, huh… fix it

3. Wipe the disk and rewrite it with a basic MSDOS partition table.

partedUtil setptbl /vmfs/devices/disks/(disk ID) msdos

(The output from this should show msdos, and the next line will be 0 0 0 0.)

Go create the datastore after this; yay, it worked. Please note to use your own values, the images are just for reference.

*UPDATE* I went to reuse some old drives from an old RAID controller. In this case I had removed the logical drive from the old RAID configuration and pulled the disks. Since they used the same caddies as an alternative server, I went on to create some new logical drives there to use as an alternative datastore on that particular host.

In the examples above, it would fail at creation of the datastore. In this example it failed at the point in the wizard where you define the partition to create, with an error as follows:

“Either the selected disk already has a VMFS datastore or the host cannot perform a partition table conversion. Select another disk” in a nice red banner.

Attempting my usual fix as mentioned above resulted in…

… to be updated (I have such a headache right now from the endless issues)

I had to clear the drives to fix this problem: delete the logical drive, rip the drives out of the server, and use a USB enclosure with “diskpart” and the “clean” command on Windows to wipe the drives.

Then after that, the health light on the server went off, claiming my one disk or caddy is “unauthentic” even though it was just working. Apparently terribly engineered caddies.

To dig into that issue I had to get into iLO, for which the admin password was unknown, so I had to dig up my old blog post to get into that. And now, after all that… I have a headache.

Good job computers, you managed to make my day fantastic… again.

Using A USB Device as a Datastore on VMware ESXi

USB datastore

Attach USB device in Windows -> DiskPart -> Select Disk -> Clean

To see the USB device on the host: stop the USB arbitrator service, save the config, and reboot.
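
On the host side, stopping and disabling the arbitrator looks roughly like this:

/etc/init.d/usbarbitrator stop
(Stops the USB arbitrator so the host can claim the stick instead of passing it through to VMs.)

chkconfig usbarbitrator off
(Keeps it disabled across reboots.)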

Now the hard part: normally, even after following the guide to get the USB stick visible on the host, it only mounts a 4 GB volume; I had a 16 GB Cruzer to use.

Source

Perform an lspci -v to get all the USB UHCI and EHCI controllers to show up.
The device then shows up, for example, as:

# Point DEVICE at the USB stick's disk device (yours will have a different ID)
DEVICE=/dev/disks/t10.JMicron_USB_to_ATA2FATAPI_Bridge
# Wipe the existing partition table
partedUtil mklabel ${DEVICE} msdos
# Work out the last usable sector from the disk geometry (cylinders * heads * sectors - 1)
END_SECTOR=$(eval expr $(partedUtil getptbl ${DEVICE} | tail -1 | awk '{print $1 " \\* " $2 " \\* " $3}') - 1)
# Create a single GPT partition of type VMFS spanning the whole stick
/sbin/partedUtil "setptbl" "${DEVICE}" "gpt" "1 2048 ${END_SECTOR} AA31E02A400F11DB9590000C2911D1B8 0"
# Format it as VMFS5 and label the datastore after the host name
/sbin/vmkfstools -C vmfs5 -b 1m -S $(hostname -s)-local-datastore ${DEVICE}:1

That easy 😉

Remove “inaccessible” datastore from VCSA

In my previous post I mentioned restoring my ESXi host after a bad upgrade. Today, when I attempted to add it back into vCenter, it complained that a datastore with the same name exists. I was a bit stumped when I saw it showing up under the datastore area as inaccessible, when there should have been nothing referencing it. Googling led me to this gem, where MikeOD states:

“I figured it out. I was double checking on VM’s on those datastores. Under “related objects”, there were no VM’s or hosts, but there were two old templates that were still referenced by the original VCenter. When I right clicked on the template and selected “remove from inventory”, the data stores disappeared.”

mhmmm, looking at the associated VM, I checked one of its settings and sure enough, an old ISO was mounted on it.

Just as Mike said, as soon as I removed the association by changing the VM’s CD/DVD drive back to Client Device, the inaccessible datastore went away.

You can also check for templates, snapshots, etc.