TPM Security on an ESXi VM

A great part about vSphere 7 is that it introduced the ability to add TPM-based hardware (a vTPM) to a VM.

Let’s see if we can pull it off in our lab.

What I need is a Key Provider. Lucky for us, with 7.0.3 VMware provides a “Native Key Provider” (NKP).

During my deployment of the NKP, one requirement is to make a backup of the key, which was failing for me. I found this VMware thread with someone having the same issue.

Sure enough, the comment by “acartwright” was pretty helpful, as I too opened the browser console and noticed the CORS errors. The only difference was I wasn’t using CNAMEs, per se, but I had done a pilot of vCenter renaming, and the mismatch between the names showing up and the ones listed in the console reminded me of that. When I went to check the hostname and the local hosts file, sure enough, they had the incorrect name in there.

So, after following the steps in my old blog post to fix the hostname and the local hosts file, I tried to back up the NKP and it worked this time. 😀

So, sure enough, after this I went to add the TPM and couldn’t find it. Oh right, it’s a newer feature; I’ll have to update the VM’s compatibility (hardware version).

Made a snapshot, updated to the latest hardware version, VM boots fine, let’s add the TPM hardware… error: can’t add a TPM to a VM with snapshots. Ugh, fine, delete the snapshot (tested that the VM boots fine before doing this), add the TPM, success.

Before changing the VM boot option to EFI, boot the VM into Windows RE and use the mbr2gpt command to convert the boot disk from MBR to GPT, the partition layout EFI expects.
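From the Windows RE command prompt that looks roughly like this (a sketch; disk 0 is an assumption on my part, so run the validate first and only convert if validation passes):

mbr2gpt /validate /disk:0

mbr2gpt /convert /disk:0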

Once completed, change the VM boot option to EFI and tick the Secure Boot checkbox.

Congrats, you just configured an ESXi VM with a vTPM module. 🙂

 

Updating PowerCLI 12

If you did an offline install, you may need to grab the package files from an online machine. Otherwise, you may have come across a warning about an existing instance of PowerCLI when you go to run the main install cmdlet.

When I first went to run this, it told me the version would be installed “side-by-side” with my old version. Oh yeah, I forgot I did that…

Alright, so I use the -Force switch, and it fails again… Oi…
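For reference, the attempts looked roughly like this (a sketch; VMware.PowerCLI is the real module name, the scope is just my choice):

Install-Module -Name VMware.PowerCLI -Scope AllUsers          # warns it will install side-by-side with the old version

Install-Module -Name VMware.PowerCLI -Scope AllUsers -Force   # still dies with the Authenticode issuer error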

Lucky for me the world is full of bloggers these days, and someone else had also come across this problem for the exact same reason.

VMware.PowerCLI install update error – Install-Package: Authenticode issuer | vGeek – Tales from real IT system Administration environment (vcloud-lab.com)

If you want all the nitty-gritty details check out their post; the main part I needed was this one line: “This issue can be resolved deleting modules from the PowerShell modules folder inside Program Files. Once the modules folder for VMware are deleted try installing modules again, you can also mention the modules installation scope.”

AKA, delete the old one, or point the install at another location. He states he needed the old version but doesn’t specify for what. Anyway, I’ll just delete the old files.
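In practice that’s something like this from an elevated PowerShell prompt (a sketch; the path is the standard Windows PowerShell 5.1 module location, adjust if yours differs):

# remove the old VMware module folders so the new install doesn't trip over them
Get-ChildItem 'C:\Program Files\WindowsPowerShell\Modules' -Directory | Where-Object Name -like 'VMware.*' | Remove-Item -Recurse -Force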

So, at this point I figured I was going to have a snippet of a 100% clean install, but no, again something happened, and it is discussed here.

If I’m lucky I won’t need to use any of the conflicting cmdlets, and if I do, I’ll follow the suggestions in that thread.

OK, let’s move on. Well, the commands were still not there; it looks like the install has to fully succeed. There’s no prefix option during install, only on import (which you can only do after a successful install), and the other option was to clobber the conflicting cmdlets. Not interested, so I went into Windows add/remove features and removed the PowerShell module for Hyper-V. No reboot required, and the install worked.
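For the record, the alternatives I passed on look roughly like this (a sketch; -AllowClobber and -Prefix are real parameters, but I haven’t verified the prefix propagates through PowerCLI’s nested modules):

Install-Module -Name VMware.PowerCLI -AllowClobber            # let the install overwrite the conflicting Hyper-V cmdlet names

Import-Module VMware.PowerCLI -Prefix VMW                     # e.g. Get-VMWVM instead of Get-VM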

The Hyper-V MMC snap-in still works for most of my needs. Now I finally have the two required pre-reqs in place.

Step 2a) Connect to the server via PowerCLI

Why did this happen?

A: Because vCenter uses a self-signed certificate, and the system accessing it doesn’t have the vCenter’s CA certificate in its own trusted CA store.

How can it be resolved?

A: Option 1) Have a proper PKI deployed, get a properly signed cert for this service from the CA admin, and assign the cert to the vCenter management services. This option is outside the scope of this post.

Option 2) Install the self-signed CA cert into the trusted CA store of the machine running PowerCLI.

Option 3) Set the PowerCLI parameter settings to prompt to accept untrusted certificates.

I chose option 3:

Make sure when you set your password variable to use single quotes and not double quotes, so PowerShell doesn’t expand any special characters (why this parameter takes System.String instead of SecureString is beyond me).
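Putting it together, the session setup looks something like this (a sketch; the server name and credentials are placeholders, the cmdlets and parameters are the real ones):

Set-PowerCLIConfiguration -InvalidCertificateAction Prompt -Scope User

$pass = 'My$uperSecret!'    # single quotes so PowerShell doesn't expand the $

Connect-VIServer -Server vcenter.lab.local -User 'administrator@vsphere.local' -Password $pass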

While I understand the importance of PowerShell for scripting, automation, and mass-deployment situations, requiring it just to apply a single toggle setting is a bit ridiculous. Take note, VMware; do better.

Fix Orphaned Datastore in vCenter

Story

The Precursor

I did NOT want to write this blog post. This post comes from the fact that VMware is not perfect and I’m here to air some dirty laundry…. Let’s get started.

*UPDATE* Read on if you want to get into the nitty-gritty; otherwise skip to the Summary section. For me, rebooting the VCSA resolved the issue.

The Intro

OK, I’ll keep this short: 1 vCenter, 2 hosts, 1 cluster. One host started to act “weird”: random power-offs, then it boots normally but the USB controller isn’t working.

Now this was annoying … A LOT, so I decided I would install ESXi on the local RAID array instead of USB.

Step 1) Make a backup of the ESXi config.
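A minimal sketch of that from the ESXi shell (the standard vim-cmd config backup; it spits out a download URL where you replace the * with your host’s address):

vim-cmd hostsvc/firmware/backup_config

# returns something like: Bundle can be downloaded at : http://*/downloads/<random>/configBundle.tgz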

Step 2) Re-install ESXi. When I went to re-install ESXi, it stated all data in the existing datastore would be deleted. Whoops, let’s move all the data first.

Step 2a) I removed all data from the datastore

Step 2b) Delete the datastore, and THIS IS THE STEP THAT CAUSED ME ALL FUTURE GRIEF IN THIS BLOG POST! DO NOT FORGET TO DO THIS STEP! IF YOU SKIP IT, YOU WILL HAVE TO DO EVERYTHING ELSE THIS POST IS TALKING ABOUT!

Unmount, and delete the datastore. YOU HAVE BEEN WARNED!

*During my testing I found this was not always the case; I was, however, able to replicate the issue in my lab after a couple of attempts.

Step 3) Re-install ESXi

Step 4) Reload the saved config file, and all is done.
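The restore half, roughly (a sketch; the host has to be in maintenance mode, the bundle path is wherever you uploaded configBundle.tgz, and the host reboots itself after applying it):

vim-cmd hostsvc/maintenance_mode_enter

vim-cmd hostsvc/firmware/restore_config /tmp/configBundle.tgz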

This is when my heart sunk.

The Assumptions

I had the following wrong assumptions during this terrible mistake:

  1. Datastore names are saved in the backup config.
    INCORRECT – Datastore names are literally volume labels and stay with the volume they were created on.
    The UUID is stored in the filesystem superblock on the device.
  2. Removing an orphaned Object in vCenter would be easy.
  3. Renaming a Datastore would be easy.
    1. If the host is managed by vCenter Server, you cannot rename the datastore by directly accessing the host from the VMware Host Client. You must rename the datastore from vCenter Server.
  4. Installing on USB drive defaults all install mount points on the USB drive.
    INCORRECT – There’s magic involved.

Every one of these assumptions burnt me hard.

The Problem

So it wasn’t until I clicked on the datastore section of vCenter that my heart sunk. The old datastore was still listed, and attempting to right-click and delete the orphaned datastore shot me with another surprise… the options were greyed out. I went to Google to see if I was alone. It turns out I was not, but the blog source I found also did not seem very promising… How to easily kill a zombie datastore in your VMware vSphere lab | (tinkertry.com)

Now this blog post title is very misleading; one can say the solution he did was “easy”, but guess what… it’s not supported by VMware. As he even states: “Warning: this is a bit of a hack, suited for labs only”. Alright, so this is no good so far.

There was one other notable source. This one mentioned looking out for related objects that might still be linked to the datastore; in this case there were none. It was purely orphaned.

Talking to others in #VMware on Libera Chat, I was told it might be linked to a scratch location, which is probably the reason for the option being greyed out. While that might be a reasonable safeguard for a host, the scratch location is determined by the ESXi host itself, not by vCenter, so vCenter should still have the ability to clear the orphaned datastore object (foreshadowing: this causes me more grief).

In my situation, unlike TinkerTry’s, I knew exactly what caused the problem: I did not rename the datastore accordingly. Since the datastore was not named appropriately after being re-created, it was mounted and shown as a new datastore.

The Plan

It’s one thing to fuck up, it’s another to fess up, and it’s yet another to have a plan. If you can fix your mistake, it’s prime evidence of learning and growing as you live life. One must always persevere. Here’s my plan.

Since I had built the host new and restored the config with the wrong datastore, I figured that if I did the same but with the proper datastore in place, I should be able to remove the orphaned object by bringing it back up.

I had a couple of issues to overcome. The first was my 3rd assumption: that renaming a datastore is easy. Which, usually, it is, however… in this case attempting to rename it the same as the missing datastore simply told me the datastore already exists. Sooo, poop. You also can’t do it directly from an ESXi host unless it’s not managed by vCenter. So, as you can tell, a catch-22: the only way to get past this was to do my plan, which was the same as how I got into this mess to begin with. But sadly I didn’t know how bad a hole I had created.

So after installing fresh on another USB stick, I went to create the new datastore with the old name, overwriting the partition table the ESXi install created… and you guessed it: Failed to create VMFS datastore. Cannot change host configuration. – Zewwy’s Info Tech Talks

Obviously I had gone through this before, but this time was different. It turned out attempting to clear the GPT partition table and replace it with an msdos (MBR) based one failed, telling me it was a read-only disk. Huh?

Googling this, I found this thread which seemed to point at the root cause… Yep, my 4th assumption: “Installing on a USB drive defaults all install mount points onto the USB drive.”

So, using “ls -l”, “esxcli storage filesystem list” and then “vmkfstools -P MOUNTPOINT”, it was very easy to discover that the scratch and coredump locations were pointing to the local RAID logical volume, which the ESXi install had claimed when it overwrote the initial datastore. Talk about a major annoyance; I get why it did what it did, but in this case it is a major hindrance, as I can’t clear the logical disk’s partitions to create the new one that will hold the datastore I need mounted there… mhmmm
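Roughly what that poking around looks like (the volume path is a placeholder for whatever your scratch symlink points at):

ls -l /scratch                          # shows where the scratch symlink points

esxcli storage filesystem list          # lists mounted filesystems and their volume paths

vmkfstools -P /vmfs/volumes/<volume>    # shows which device backs that mount point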

So I kept trying to change the coredump location and the scratch location, and on every reboot the host kept picking the old location on the local RAID logical volume, which kept preventing me from moving forward. It made no difference whether I did it via the GUI or via the backend command “vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp/scratch”. Even though the VMware KB mentions creating this path first with mkdir, what I found was that the creation of this path was not persistent, and since it doesn’t exist at boot, ESXi changes it via its usual “magic”:

“ESXi selects one of these scratch locations during startup in this order of preference:
The location configured in the /etc/vmware/locker.conf configuration file, set by the ScratchConfig.ConfiguredScratchLocation configuration option, as in this article.
A Fat16 filesystem of at least 4 GB on the Local Boot device.
A Fat16 filesystem of at least 4 GB on a Local device.
A VMFS datastore on a Local device, in a .locker/ directory.
A ramdisk at /tmp/scratch/.”

So in this case, I found this post about a similar issue, and it turns out setting the scratch location to just /tmp worked.
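In other words (same vim-cmd advanced-option syntax as above, just pointed at /tmp, plus a reboot so it actually takes effect):

vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp

reboot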

When I attempted to wipe the drive’s partitions I was again greeted by read-only; however, this time it came down to the coredump location, which I verified by running:

esxcli system coredump partition get

which showed me the drive, so I used the unmounted final partition of the USB stick in its place:

esxcli system coredump partition set -p USBDriveNAA:PartNum

Which sure enough worked, and I was able to set the logical drive to have an msdos-based partition table. Yay, I can finally re-create the datastore and restore the config!

So when the OP in that one VMware thread said “congrats, you found 50% of the problem”, I guess he was right; it goes like this:

  1. Scratch
  2. Coredump

Fix these and you can reuse the logical drive for a datastore. Let’s re-create that datastore…

This is when my heart sunk yet again…

So I created the datastore successfully, however… I had to learn about those pesky UUIDs…

The UUID is composed of four components. Let’s understand this by taking one VMFS volume’s UUID as an example: 591ac3ec-cc6af9a9-47c5-0050560346b9

System Time (591ac3ec)
CPU Timestamp (cc6af9a9)
Random Number (47c5)
MAC Address – Management Port uplink of the host used to re-signature or create the datastore (0050560346b9)

FFS… I’ll never be able to reproduce that… and sure enough, that’s why my UUIDs no longer aligned:

I figured maybe I could make the file and create a custom symlink to that new file with the same name, but nope, “operation not permitted”:

Fuck! Well, now I don’t know if I can fix this, or if restoring the config with the same datastore name but a different UUID will fix it or make things worse… fuck me, man… not sure I want to try this… might have to do this in my home lab first…

Alright I finally was able to reproduce the problem in my home lab!

Let’s see if my idea above will work…

Step 1) Make a config backup of the ESXi host. (I should have one from before the mess-up, but I’ll use the current one.)

Step 2) Reload host to factory defaults.

Step 3) Rename the datastore.

Step 4) Reload the config.

poop… I was afraid of that…

OK, I even tried disconnecting the host from vCenter after deleting the datastore. I could re-create it with the same name, but it always attaches with a “(1)” appended, because as far as vCenter is concerned the datastore still exists, since the UUID can never be recovered… I heard a vCenter reboot may help; let’s see…

But first I want to go down a rabbit hole… the datastore UUID. In this case the ACTUAL datastore UUID: not the one listed in a VM’s config file (.vmx), not the one listed in the vCenter DB (that we are trying to fix), but the one actually associated with the datastore… After much searching… it seems it is saved in the file system’s “superblock”. In most other file systems there’s some command to edit the UUID if you really need to; however, for VMFS all I could find was re-signaturing for cloned volumes.

So it would seem that if I had simply saved the first 4 MB of the logical disk, or of the partition (not 100% sure which at this time), I could in theory have done a dd to put it back, recovered the original UUID, and then connected the host back to vCenter.
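Purely in theory, and untested, that save-and-restore would look something like this (the device name is a placeholder, and whether the superblock lives at the start of the disk or of the VMFS partition is exactly the part I’m not sure about):

dd if=/vmfs/devices/disks/<naa.id>:1 of=/tmp/vmfs-header.bin bs=1M count=4    # save the first 4MB before wiping anything

dd if=/tmp/vmfs-header.bin of=/vmfs/devices/disks/<naa.id>:1 bs=1M count=4    # in theory, write it back over the re-created volume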

I guess I’ll try a reboot here and see what happens…

Well, look at that… it worked…

Summary

  1. Try a reboot (of the VCSA).
  2. If a reboot does not fix it, call VMware Support.
  3. If you don’t have support, you can try to muck with the backend DB (do so at your own risk).

 

How to vMotion a VM without vCenter

Well here I am… again…

In short, you figure… “Ummm, just vMotion the VM in vCenter”, and for the most part I would agree. However, what do you do if you need to move a VM, for example vCenter itself, and it just so happens to be on an ESXi host that is not within a cluster with other similar ESXi hosts, or is in a cluster without EVC? (Rare in most cases, sure.) Well, I happened to be in just that situation recently.

First I thought I’d just copy the files via the ESXi console using the cp command, and for the most part it seemed to work for one smaller VM. However, when I went to do it against vCenter, it seemed to be taking longer than I had expected. After nearly an hour… I decided to see what was going on… but since I was just using the cp command, how do I find out the progress?

Yes, by running stat on the target file and the source file to get their file sizes,

i.e. stat -c "%s" /bin/ls

Oh neat. So when I went to check, the source was 28.5 gigs… and the target was 94 gigs… wait, wait, what??? I can only assume something messed up with the copying because the files were thin provisioned… not sure; I stopped the process and deleted the files…

Now I began to Google, and at first I wasn’t searching properly and found useless results such as this: Moving virtual machines with Storage vMotion (1005544) (vmware.com). Then I got my act together and found exactly what I was looking for here: How to move VMware ESXi VM to new datastore using vmkfstools | Alessandro Arrichiello (alezzandro.com)

So basically I liked this, and it was what I needed; I was just slightly annoyed that there wasn’t a nice way to do multiple VMDKs via his examples, only for all the other files. So I took the one-liner from the “other files” trick and worked out how to get the paths I needed for the VMDK files in question.

Lo and behold, here’s how to do the magic!

1) This assumes a shared datastore between hosts (if you need to move files between hosts without a shared datastore, follow this guide from VMware Arena). (I’m not sure, but I think you can leave the VMs registered; they do have to be powered down, though, and have no snapshots.)

2) Ensure you make the directory you wish to move the VM files to.

mkdir /vmfs/volumes/DatastoreTarget/VMData

3) Copy/Clone VMDK files to target.

find "/vmfs/volumes/DatastoreSource/VMData" -maxdepth 1 -type f | grep ".vmdk" | grep -v "flat" | while read file; do vmkfstools -i "$file" -d thin "/vmfs/volumes/DatastoreTarget/VMData/${file##*/}"; done

4) Copy the remaining files to the target.

find "/vmfs/volumes/DatastoreSource/VMData/" -maxdepth 1 -type f | grep -v ".vmdk" | while read file; do cp "$file" "/vmfs/volumes/DatastoreTarget/VMData/"; done

Once done cloning and copying all necessary files, add the VM from the new datastore back to inventory.

In the vSphere client go to: Configuration->Storage->Data Browser, right click the destination datastore which you moved your VM to and click “Browse datastore”.

Browse to your VM and right click the .vmx file, then click “Add to inventory”

Boot up the VM to see if it works. When asked whether you copied or moved it, answer that you moved it if you want the VM ID to stay the same as it is known within vCenter; as long as you properly delete the old files and remove the old VM from the host inventory, this completes the VM migration. If you don’t plan on deleting the old VM, or don’t care about VM IDs or backups, then select “I copied it”.

Hope this helps someone.

Failed to create VMFS datastore. Cannot change host configuration.

Quick one here. I created a new logical disk via RAID 5 after an old logical unit failed from only a single bad disk.

No issues deleting the old logical disk, and creating a new one via HP storage controller commands.

However, I was greeted with this nice error.

From here, by Cookies04: ”

I had the same problem and in order to fix it I had to run three commands through an SSH connection. From what I have seen and found this error comes from having disks that were part of different arrays and contain some data on them. When I ran the commands I was then able to connect the data stores with no issues.

1. Show connected disks.

ls -lha /vmfs/devices/disks/

(Verify the disk is seen. You will probably see your disk ID then :1. This is a partition on the disk. We only need to work about the main disk ID.)

2. Show the error on disk.

partedUtil getptbl /vmfs/devices/disks/(disk ID)

(It will probably indicate that the GPT is located beyond the end of the disk.)

3. Wipe disk and rewrite with a basic MSDOS partion.

partedUtil setptbl /vmfs/devices/disks/(disk ID) msdos

(The output from this should be similar to msdos and the next line will be o o o o)

I hope this helps you out.”

Looks like it worked… Thanks, Cookies04!

vMotion – Not Allowed in the Current State

First things first, I vMotioned the VM to another host and that worked fine, so the issue appeared to be target-related. I also found this post, which states to restart the management (hostd) and vpxa services:

/etc/init.d/hostd restart

/etc/init.d/vpxa restart

Doing this on the source ESXi host did nothing, which again suggested the issue was on the target. I did the same on the target and it still failed.

I then disconnected the target ESXi host, put it in maintenance mode, rebooted it, took it out of maintenance mode, reconnected it to vCenter, and this time the vMotion worked.
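If you’d rather do that dance from the host’s shell instead of the UI, it’s roughly (standard esxcli calls; the reason string is just a label):

esxcli system maintenanceMode set --enable true

esxcli system shutdown reboot --reason "clearing stuck vMotion state"

esxcli system maintenanceMode set --enable false    # once it's back up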

Hope this helps someone.

ESXi 6.x Datastore Not Mounted

Quick post here. I had to recover from a flooded basement, sorry for the day-long outage. I had to put my disks in another server, load FreeNAS, import my ZFS volumes, and re-create the iSCSI targets. Then I added them to my ESXi hosts, and rescanning the HBAs showed the disks…

but the datastores were not visible…

So I Googled and found this VMware thread with some helpful commands to try. (I do kind of agree with the OP that it’s annoying they removed the front-end UI for import that could handle this.)

esxcli storage vmfs snapshot list

esxcfg-volume -M UUID

Ehh it worked!

Hope this helps someone. If this doesn’t work, you might have some other underlying issue.

ESXi /tmp is Full

I’ll keep this post short and to the point. I got errors in the alerts.

I was like, huh, interesting… I went to validate it on the host by logging in via SSH and typing the command:

vdf -h

At the bottom you can see /tmp space usage:

I then found out about this cool command from this thread:

find /tmp/ -exec ls -larth '{}' \;

This will list all the files and their sizes to gander at, which is when I noticed a really large file:

I decided to look up this file and found this lovely VMware KB:

The Workaround:

echo > /tmp/ams-bbUsg.txt

The solution:

To fix the issue, upgrade VMware AMS to version 11.4.5 (included in the HPE Offline Bundle for ESXi version 3.4.5), available at the following URLs:

HPE Offline Bundle for ESXi 6.7 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-a38161c3e8674777a8c664e05a

HPE Offline Bundle for ESXi 6.5 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-7d214544a7e5457e9bb48e49af

HPE Offline Bundle for ESXi 6.0 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-98c6268c29b3435e8d285bcfcc

Procedure

  1. Power off any virtual machines that are running on the host and place the host into maintenance mode.
  2. Transfer the offline bundle onto the ESXi host local path, or extract it onto an online depot.
  3. Install the bundle on the ESXi host.
    1. Install remotely from client, with offline bundle contents on a online depot:
      esxcli -s <server> -u root -p mypassword software vib install -d <depotURL/bundle-index.xml>
    2. Install remotely from client, with offline bundle on ESXi host:
      esxcli -s <server> -u root -p mypassword software vib install -d <ESXi local path><bundle.zip>
    3. Install from ESXi host, with offline bundle on ESXi host:
      esxcli software vib install -d <ESXi local path><bundle.zip>
  4. After the bundle is installed, reboot the ESXi host for the updates to take effect.
  5. (Optional) Verify that the vibs on the bundle are installed on your ESXi host.
    esxcli software vib list
  6. (Optional) Remove individual vibs. <vib name> can be identified by listing the vibs as shown in #5.
    esxcli software vib remove -n <vib name>

Summary

Use the commands shown to trace the source of the usage; your case may not be as easy. Once found, hopefully you can find a solution. In my case I got super lucky, and other people had already found the problem and the solution.

Veeam – More Than One Replica Candidate Found

Story Time!

The Problem!

So, real quick one here. I edited a replication job and changed its source from production to a backup dataset within the Veeam replication job settings. I went to run the replication job and was presented with an error I had not seen before…

I had an idea of what happened (I believe the original ESXi host might have been rebuilt); I’m not 100% sure, just speculating. I was pretty sure the change I made to the job was not the source of the problem.

Since I wasn’t concerned about the target VM being re-created entirely, I decided to go to Veeam’s Replicas, right-clicked the target VM, and picked Delete from Disk… To my amazement, the same error was presented…

Alright… kind of sucks, but here’s how I resolved it.

The Solution

Sadly, I had to right-click the target VM under Veeam’s Replicas and instead pick “Remove from Configuration”. What’s really annoying about this is that it also removes the source VM from the replication job itself.

Why? Dunno, Veeam’s coding choices…

So after successfully removing the target VM from Veeam’s configuration, I manually deleted the target VM on the ESXi host. Then I had to reconfigure the replication job and point it to the source VM again. Again, if you’re interested in why that’s the case, see the link above.

After that the job ran successfully. Hope this helps someone.