ESXi 6.5 Stuck on Initializing Scheduler
PSOD PCPU1 could not start

I’m making this post short to note this odd experience with this host build.

First weird thing was when trying to install ESXi I couldn’t get past vmkusb not sure what it was about but only found this decent reddit post with the same problem.

In short he noticed it would only get past this if a second USB was plugged into the USB2 ports, and sure enough that worked for me too. strange….

Then a couple days later while doing some more test boots, I get a Purple screen of death, complaining about the PCPU1 not starting or some shit, ugh again lets see what others have to say… well I found this vmware thread on it, he basically stated that resetting the BIOS settings worked, after farting around with some bios settings, I had other failed boots and my CPU and system was rather hot. I let everything cool down, added more powerful fans and tried again after resetting the BIOS to factory and much like the post it worked.

After finalizing the build a little more, I switched to an old 2.5″ HDD.. same problem but I noticed it gets stuck on initializing scheduler before PSODing, while I searched for what might be up with that I found this

It did help shit, same problem next boot, instead of just resetting the BIOS I played around with a couple more settings like I enabled intel AES-NI which helps for CPU offloading of AES computations. and another one which I sadly forgot, and then my next boot was fine. saving this at this point in case it comes back again.

VMware Host Oddities

I’ll keep this post short as I find it rather interesting turn of events…

I was ensuring config files were being backed up as good Sysadmin Posture would suggest. I noticed an issue on one of the hosts when I went to run the configuration backup. When I was hit with a “general system error”. I Wish I would have saved the actual error message but when I went to research it, it gave me more people complaining about the error when trying to vMotion (In this case I grabbed the most specific line from the call back in hopes to get something) most results simply pointed to saying restarted the ESXi hosts management agents. I decided to see if I could vMotion or if any other symptoms from the host, but everything showed green in vCenter, and vMotions worked without an issue.

I left it for the night (before) and decided to update the host since it was running 6.5-u2 and needed to be updated to 6.5-u3. I was hoping it would resolve the config backup issue (I had an older copy on hand but wanted most up to date) So I decided to do the update via a ISO image and a host reboot, moved VMs, no problem, placed host into Maintenance mode, no issues, send host for shutdown (I wasn’t using iLO or iDRAC, so needed to manually mount the USB hosting the installation media), while I was at the console waiting for the host to show “shutting down” it simply …. didn’t… after 10 minutes which is far more then generous I decided to press F2 to get into the console to have to tell me … uggghhh I wish I would have taken a picture of the error, something along the lines of you can’t use the console cause you been locked out…. it was weird, after waiting another 10 minutes and nothing happening (I’m assuming there must have maybe been some actual underlying issue with the datastore holding the scratch?) I decided to just do a hard shutdown (hold power for 5 seconds). I could rebuild this server faster if need be. Powering it back up was perfectly fine through all POST checks, booted the ESXi 6.5u3 installer, booted fined, checked for logical drives, found all of them perfectly fine including the SD card holding the OS (scratch changed to a persistent location, another datastore on the local host with multiple disks used for a datastore) anyway I selected it and it saw there was an existing installation and selected to update it, successful, reboot. reboots fine, but host doesn’t come back to vCenter…..

Log into host directly, and notice it states it has 18 VMs, all with a status of error, now I had moved them and they are even still running on another host (thankfully it didn’t do anything stupid to try and hijack them) at this point I had called VMware support and placed an SR. Once I had a tech on the line, I discussed my symptoms and issues. At this point I asked what was the best course of action as, I didn’t want to rebuild the host (I could, but time). In this case I simply un-registered all the VMs that were in error as they were not associated with the host), then attempted to re-connect the host. It first failed complaining about bad username and password (I’m not sure if this was the root account used to add it, or the VPX user it uses to manage the host) but it prompted the wizard like adding a new host, since all other settings were still fine on the host (network and vswitches w/ VMPGs and VMKs, and all Datatstores) after this wizard the host was re-added back into the cluster with the latest updates, and running the backup config command:

esxcli storage vmfs extent list

worked without error. So yay! all fixed, but one thing still bothered me… what was the root cause…

I decided to check my scratch settings real quick, and sure enough it hold good and is on redundant datastore on the local ESXi host, so not sure what the root cause was, but it was fixed. I’m posting this for my own reference. Just incase this issue re-arises.

 

vCenter SSO

vCenter SSO

The other day I covered installing vCenter.

Today I’ll do a very quick overview on setting up SSO with a Windows based AD Auth.

DNS

Step 1) validate vCenter can reach any AD via the Root domain name:
*USE AD SERVER FOR DNS, 3rd Party DNS leads to failure as missing specialized records, E.G. srv records)
*Ensure Time is synced to within 5 minutes of AD server*

I ssh’d into the VCSA using root and then, “shell” and a regular old ping command to validate.

Step 2) Follow Virten’s Guide for doing the Flash way, or CLI way to join vCenter to the Windows Domain. Via the HTML5 Web Client: Menu -> Administration -> SSO -> Configuration -> Active Directory Domain -> Click Join AD (hidden behind the menu in the snippet)

Enter the domain to join, and an account that is allowed to join systems to the domain, in my case I used my Domain ADmin Account:

Populate the fields, and click joing and sure enough you will join the domain without issue… if you have a proper working NTP/AD architecture that is…

Thanks VMware… Ugghh ok, and if I use the CLI maybe some more verbose error?

What do you mean you “DC not found” what kind of PCLoadLetter error is this? Like I just verified lookup via DNS which is like the primary pre-req besides firewalls, which I have already configured my actually firewalls… so what gives, Googling this error leads me to this.

and I quote “On ESXi 6.5, the command is executed from /usr/lib/likewise/bin. If you haven’t enabled the AD firewall rule mentioned earlier, you must temporarily unload the ESXi firewall – assuming it is enabled – for this to work. Failing this, you will get an Error: NERR_DCNotFound [code 0x00000995] error.”

Are you ****in’ with me…. for reals… man wtf VMware….

Shit, right this is the VCSA not a ESXi host… ugggh quick research…

What… da… How, did I not know about this?! There’s a special VCSA management page, everything online just uses the “Web Client” which all VMware’s documentation assumes this to be the Flash client, which doesn’t even reference this at all!

https://vcsa:5480

Alrighty then… logging in… mhmm

That’s awesome but I don’t see firewall, maybe if I navigate to networking…

Nope, NICs settings and that’s about it:

C’mon those firewall settings have to be here, I don’t want to have to be forced to use flash…. cmon…..

F*** it says it’s for 6.7 I’m clearly on 6.5 there has to be a way…

After some deeper digging ( I found out VCSA uses python scripts to use specific files to build the firewall) then also talking this problem over with someone on the IRC channel #wmware, and digging a bit further and finding this vmware post….

I was at first simply using a third part DNS, having JUST an A host record for the AD server, not any of the other service records for LDAP or anything else, after changing my DNS settings on the VCSA to point to the AD server itself I got a different error at the CLI:

Bahhh what? oh wait… lol all my time is wrong, everywhere…

NTP – Fixing Time

Actual time 8:20 PM Winnipeg Central Time. Mon Oct 7, 2019

AD server time: 2:09 PM Mon Oct 7, 2019 (CST)

VCSA time: Tue Oct 8 01:15:08 UTC 2019

What a gong show… let’s fix this! First MS states to leave the PDC to system time to get form the host as host gets acurate time, well not for me. I could point the host to external, and wait then changing PDC time auto. But if you want to Domain join the hosts they should follow the hierarchy and use the PDC as time, catch 22, so instead PDC points to external source, and hosts will point to PDC for time and DNS (this allows for ease for changing external time provider and no issues with time sync).

So fixing PDC time:

before:

after

NOw time has changed and my firewall shows the successful packets, but why is my offset still so off? and why is my time an hour off?

Here’s my local workstation:

Yet here’s my PDC:

ok everything I checked online I’m sure I did it right but the syntax on one of the guides I was following didn’t seem right and I tried again and this time it worked, finally!

K, now I can update each host in my lab….

Before:

Configure:

After:

Finally VCSA itself, https://vcsa:5480 (login as root) -> Time

Before:

Configure:

After:

Yay, after fixing my time everywhere:

Joining VSCA to Windows Domain via CLI

/opt/likewise/bin/domainjoin-cli join $domain $user '$password'

YAY!

Quick Re-Cap:

So bad news is this isn’t as short a blog as I wanted, but good news is we are all learning something! Yay!

Now that we got our system domain joined (reboot required)

waiting… waiting….

Verifying AD object on AD server (core, via powerhsell)

and on the HTML 5 Web Client:

Adding Identity Source

Now I can finally follow adding the Identity source A) AD Auth from here.

Click on Identity Sources -> Add Identity Source:

omg finally something that was dead simple…

Defining Permissions

Now click on global Permissions.

Click “+” icon, and if system join is all good it should be able to query the AD and find the users when typed into the Name field:

Lets test it….

Second attempt but pushing to children objects:

and yay this time I was able to get in successfully:

but I had to put in my UPN (user@doman.local) what if I just want to enter my user name…

What a bunch of poop, that’s cause we didn’t set the primary SSO domain… back in the VCSA settings https://vcsa:5480 – summary shows

back on vCenter Web Client, Menu -> Administration -> SSO -> Configure -> Identity Sources -> select new source -> click Set as Default:

login again:

success, and finally as the source virten post stated, the “Use Windows Authentication” option is greyed out unless the Enhanced Authentication Plugin is installed. You can find the download link at the bottom of the login screen.

Summary

That was a bit more painful then I wanted it to be, but it really was nice that it was this painful cause it reminded me of the moving parts that have to be setup correct for this all to play nicely to begin with.

I hope this guide has helped someone. Please leave a comment, any comment will do!!!

 

Using VMware Update Manager (VUM)

VUM

Overview

In this post I’m going to try and upgrade one of my ESXi 5.5 host to 6.5 using VCSA’s now built in by default VUM. I followed this video on youtube for reference.

First thing I noticed but the video doesn’t mention is that (at least for 6.5) VUM tab is only available using the flash based client. If you use the HTML5 based web client the Update tab isn’t shown:

HTML 5 Missing update Tab:

Flash based client with Update Tab:

As soon as I see a video, I can tell which version is being used as its sooo different. (Flash sucks)

Import Image

About a minute into the source video he gets to where the major images are for major host upgrades. Since I had no existing images provided like the video I decided to try the import wizard, which poped up a useful Windows Selection Dialog box (as I was testing this form the latest version of Windows 10 – 1903, with the latest Chrome (Built in flash after enabling it)), on top of that I uploaded the the image from a UNC path (\\ip\some\folder\file.iso) after verifying access via windows explorer, I simply pasted the UNC path into the Windows Selection Dialog box address bar, and selected the ISO image. and it worked.

only once a image is uploaded, and selected does the creation of a baseline option make itself shown.

Baseline

Let the beat drop! *Bass beat drops* What the point of this is I can’t exactly tell yet, it seems to be a one to one mapping between a name and the ISO image being used?

paraphrasing the video guide “Now that we completed this useless step, we navigate to the cluster needing to be upgraded” In my case a single 5.5 test host, and by clicking on “Go to compliance view”

Attach Baseline

It would seem the Update Tab is available, at either the vSphere host level, Datacenter level, cluster level or host level depending on the scope you wish to deploy a “baseline”. Once within the scope you choose, I’m at host, click on “attach baseline” after ensuring you are still on the update tab.

much like the source after attaching the baseline the compliance level was shown as unknown, let’s follow along and “scan for updates”.

Now I’m assuming cause I am at the host level I don’t see the tabs with compliant and others cause there is only one host. and in this case it does change to “non-compliant” cause as the speaker states “The hosts listed as non-compliant do not match the version of ESXi associated with the attached baseline” AKA these hosts need to be upgraded.

Remediate

Click it to being the upgrade process for the host/cluster, which will being a wizard! Ohhh might wizard guide me to the light at the end of the tunnel!

while I clicked next, flash gave me an error prompt telling me my session had expired, and kicked me out back to the login page, even though I was still pretty actively working on it (snippets don’t take that long). Stupid flash, logged back in and back to the wizard:

agree to the EULA

Schedule it or do it now by not checking a scheduled time.

Pick your additional options and remediation options (I picked for my test to suspend my VMs as they are unable to be vmotions live due to no EVC based cluster of the hosts. they are all stand alone at the time of this writing. so lets try that.

After clciking finish I didn’t see anyting much happening at the vcenter tasks, so I logge dinto the host being upgraded and saw it was suspending the vms in question:

Now it has to copy all the memory from these VMs to disk so this could take a bit of time… then I’d assume I’ll be disconnected from the host once it reboots.

Monitoring my pings for these servers the pings have dropped starting the suspend stat (makes sense) but the host is still responding (makes sense).

I decided at this point to go get some food, I’m lazy and don’t cook, so by the time I had returned I was rather shocked to see the host had succefully been updated, showed compliant and my systems were right back to operational…

Summary

Besides the flash rubbish, this was overall a rather good experience. :O

I think I may upgrade more hosts this way in the future. I didn’t even have to step into my basement at all. That was great!

Until…

I was going to update my second host at home and was hit with this…

well wtf… then it hit me in the face… oh yeah…. I forgot about that, this is a nice real possible word example of third party, unsupported drivers. When I checked my own blog, and lucky the reference to the driver, and where I got it, it appears it still works for 6.x, so I can only guess I’ll have to remove the VIB, run the VUM update procedure, then manually re-install the third party driver… lets try this!

Remove VIB

Following this as a reference, I did the same thing:

esxcli software vib list

esxcli software vib remove --vibname DLink-528T

Ughhhh…

I remember that script/VIB, He was generally really cool guy an dI really loved his blog posts, but his VIB has been rather garbage…. as others have mentioned

Errors

As the picture shows a reboot is required now…. I let VUM do its thing with the standard ESXi 6.5u2 baseline I was using, after the server rebooted I got a problem:

“There was a problem with the Network Device specified on the command line. Error: No NIC found with MAC address.”

Discussion :

The NIC to be used as the management NIC has no drivers installed for it.

Ohhh crap, I forgot when I installed ESXi on this desktop I had to make a custom image, and this is a requirement for systems with custom builds, I removed the drivers for the one NIC but it was not for the ESXi mgmt, but the built in NIC on the mainboard is Realtek and… yeah… anyway, I’ll make a post on creating a custom image, but after a good while of failing to get what I need (as it was my hypervisor with my internet providing VM). I managed to find my initial build and re-install it manually and re-register the VMs and re-create the vSwitches and brought everything back up.

In this case I could have used my Veeam server, however none of my other hypervisors have multiple NICs and thus not an option to use them. My lab is def no redundant lab setup.

Reset ESXi trial license

Quoted directly by Aaron from:

“This guide will give you the steps needed to reset the license file so that you can apply the evaluation license back to your ESXi host.

WARNING: This is for education/informational testing/development purposes only, and should not be used on a production server.

To reset your expired ESX 4.x, ESXi 4.x, ESXi 5.x or ESXi 6.x 60 day evaluation license:

  1. Login to the HOST via SSH or Shell
  2. Remove /etc/vmware/license.cfg
  3. Copy /etc/vmware/.#license.cfg to /etc/vmware/license.cfg
  4. Restart the vpxa service

Or simply copy the code below and paste it into your SSH session.

rm -r /etc/vmware/license.cfg 
cp /etc/vmware/.#license.cfg /etc/vmware/license.cfg 
/etc/init.d/vpxa restart

Then open the “Licensed Features” option in the configuration tab of the ESXi host through the vSphere Client.

Click on “Edit” in the top right of the “Licensed Features” page

Once the “Assign License” window opens you will see two options. There will be a category for “Evaluation Mode” and Assigned License. Click on the “(No License Key)” option and then click “OK”. This will set the host back to “evaluation” mode and will give you access to all features for 60-days!”

*Update* This works if the current trial licensed hasn’t expired yet. If it has already expired, ether it be existing trial or an alternative license type, the above trick doesn’t seem to work. When navigating to the license area of the host afterwards it expects you to enter a key. (I didn’t test actually putting in all 0’s which might just be the field text indicator and not actually filled). but it wouldn’t let me proceed. In this case a host reboot might be required by following these commands instead:

rm /etc/vmware/vmware.lic
rm /etc/vmware/license.cfg
reboot

This worked and I was able to spin up my VMs on the host again.

Installing vCenter

Installing vCenter

Since vCenter will not be support on Windows moving forward, all discussion of vCenter will simply be referenced by its new known acronym; VCSA. vCenter based on linux.

I just signed up for VMUG advantage as such I get to play with vCenter at home, yay, else get the required ISO from VMware’s product portal using your own VMware login ID.

Although 6.7 is out, and well polished, 6.7 cannot manage ESXi 5.5 hosts, since I still have a few I’d like to use in my cluster, I’m going to be using VSCA 6.5 for this guide.

Also, I technically only have 5.5 based hosts at this moment (I love the phat (C#) client).

new version PhotonOS?

VCSA CPU and RAM Requirements

VCSA Storage Requirements

Open/Mount the ISO on your OS of choice. For me in Windows, simply mount the ISO and navigate into the vcsa-ui-installer\Win32\installer.exe

Run it!

Stage 1

*Drools* I’m not sure what to do… *Clicks Install*

Introduction; Next
EULA; Accept; Next
VCSA + PSC
Target Host + port + username + Password; next
VM Name + Root Password
Select Datastore (I enabled Thin Disk)
Give a system name (which you’ll want to point to the IP address you define, in the DNS servers used by the VCSA and any client systems needing management access)
IP Address
IP MASK
Gateway + DNS Server


Finish.

Now it states this will take a few minutes as it depends on, the hardware specs of the ESXi host it was deployed to, and maybe internet speed if these RPMs are not on the OVF template that was deployed. Also the VM has to boot.

Quick Break time!

Interesting default… until it finally completes…

Stage 2

NEXT!

NTP servers (0.ca.pool.ntp.org,1.ca.pool.ntp.org,2.ca.pool.ntp.org)

Next

New SSO domain, create a password for administrator@vsphere.local (I’ll create a SSO domain for zewwy.ca later to allow my local AD based accounts to have logon rights later on in this or another tutorial).

DEPLOY!

Mhmm, after 2 attempts I kept getting a pschealth service error. I googled it but the VMware KB was rather useless.

On the third try, I set the system name to IP address, as well as set the vCenter to simply use the hosts time, instead of NTP (even though I used the same NTP server the host was using… so shrug), also waited a little bit longer when starting stage 2, and on the third try it finally succeeded the installation.

Then I added the license key and assigned it to vCenter. which was provided to me when I checked out the “purchase” on VMUG advantage partner site.

Summary

Over all the process is very straight forward. In the next post I’ll cover adding hosts, assigning keys, connecting VCSA to an AD server for an alternative SSO domain. Stay tuned!

ESXi 6.5 on Proliant Gen9 Hardware Status Unknown

I’ll keep this post short.

If you have a Proliant Gen9 server and running ESXi 6.5 u2 along with VSCA 6.5u2.

You will get all hosts not displaying any hardware status. This should fixed immediately as you don’t get alerts on any hardware faults via IPMI. This includes status from hosts running ESXi 5.5 or 6.5.

The first fix is to upgrade the VCSA to 6.5u3.

After upgrading the VCSA to 6.5u3… Hardware status will come back for each host.. however.. if you are running ESXi 6.5u2 on the Gen9 servers you’ll something like this:

as you can see some sensors are a lil wonky…

The fix here is to upgrade the host to 6.5u3 via the HPE build.

After the hosts and the VCSA are on 6.5u3 all is good and hardware faults will again will trigger critical alarms on vSphere.

A general system error occurred: Launch failure

Failure to Launch

Sound the Alarm! Sound the Alarm!

*ARRRRRRREEEEEEEEERRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
ARRRRRRREEEEEEEEERRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR

and one heck of a day sweating bullets, am-I-right?!

Also sorry about the lack of updates, with spring and work, it’s been hard to find time to blog. Deepest apologies.

The Story

Well it’s Tuesday so it’s clearly a day for a story, and boy do I have a story for you. I was going about my usual way of working.. endlessly to meet my goal… that never able to acquire goal of perfection…. anyway… I had completed a couple major tasks of going from vCenter 5.5 to 6.5u2, and while I have yet to blog about that fun bag (cause I had used my own PKI and certificates so the migration scripts VMware provides pooped the bed till I replaced them with self signed) but I digress this has nothing to do with that… well sort of.

Where was I… oh right… I was vMotioning a couple VMs to some new 6.5U2 hosts I had setup on new hardware (yeah buddy this was a whole new world (Why did I just say that in the voice of Princess Jasmine?)) but alas the final VM was not meant to move for it had stalled @ 20 percent (I had this happen once before during my deployment and discovered some interesting things about multi-NIC vmotions) I probably should blog about that toooooo but alas my time is running short. (I still have many other things to tackle and blog about). and then……..

“A general system error occurred: Launch failure”

ugh…. I could have jumped right to the logs but I jumped on google instead…

CloudSpark apparently recently came across this issue and posted on all the many places he apparently could think… reddit, and VMware forums

the VMware post got a tad long, but it was more the post on reddit that got me wondering….

“Haven’t seen this personally, but there’s a few things via google on the “Failed to connect to peer process” error that suggest it could be due to running out of storage. Specifically in one scenario, the /tmp/vmware-root folder on the ESXi host fills up with logs.” – Astat1ne

When I SSH into the host and ran “df -h” I was surprised to be reported back with an error:

esxcli returned an error: 1

Something to that extent anyway. I ended up Moving all the VMs back off the host, and swapped out the SD card the ESXi software was installed on (I had a copy of the SD card with a clone of the ESXi software and host configure that was created right after the host was installed and configured).

Sure enough after reboot powering it back on, “df -h” returned clean. I was then able to vMotion all the VMs back onto the set of new hosts.

Def a generic error message I’ve never seen, and thinking about it now is rather comical. (I know when your in the middle of the problem it’s not so funny)…

BUT it sure is now! :D…. this is so going to come back to bite me….

(await updates here post April 30th)

How to Shrink a VMDK

Hey all,

Not often you have to shrink a VMDK file, expanding one is super easy, even on a live Virtual Machine. Shrinking one however, isn’t as straight forward.

This guy does a decent job giving a step by step tutorial, but you can soon realize you can do it even faster, and without cloning…

1) Use his math to get the disk size you need to edit inside the vmdk:

The number highlighted above, under the heading #Extent description, after the letters RW, defines the size of the VMware virtual disk (VMDK).

this number – 83886080, and it’s calculated as follows:

40 GB = 40 * 1024 * 1024 * 1024 / 512 = 83886080

2) Only shrink VMDKs in which you know the end of the disk contains allocated blocks, do this in a test only, make sure you have backups.

Now instead of cloning, simply remove the disk from the vm, and re-attach it. watch it’s reattached size be smaller, and it matches, much like the source guys post.

Free Hypervisor Backup
Before Part 3

The Story

I’m currently in the quest to fulfill my needs for a free VM backup solution, my hypervisor of choice right now is ESXi, while there are great alternative free hypervisors ( Citrix’s Xen, Microsoft’s Hyper-V, Linux’s KVM) I personally always felt comfortable with VMwares user interface (I’m talking the old Windows Phat Client). I totally understood the need for a web based client, however I felt it a sham to drop the phat client as to provided multiple benefits that the Web UI just doesn’t. Although the latest 6.7 Web UI has been pretty decent, besides their placement of the Hostname location…. :@

Anyway… if you’ve been following my posts you’ll see I decided to give Veeam Free a shot. I love these guys, great software, so I figured best to get better antiquated with their software, to my dis-may their simply relied on VMwares APIs solely. Which meant no SSH backdoor tricks for use ESXi free users.

However as I also mentioned in my previous post, an awesome dude who runs virtuallyGhetto.com, William Lam; wrote a script to complete the task we wanted via the hosts local CLI, which can be connected to via SSH. The script is called GhettoVCB. I took a quick look at the source code and did find a couple instances of zombie code and other anomalies but for the most part looked decent enough to give it a shot. Now I will get to this stuff in my next post using the same example VM I will specify here where I came across a couple issues and interesting facts I discovered during this adventure.

My Discoveries

The first thing I noticed about the script was the lack of certain dependency checks, in this case there’s no actual source code dependencies that his script relies on, it’s all meant to run on ESXi, and sure enough there’s some line of code to check for that…

Nice, however there doesn’t seem to be a check to validate if the datastore specified in the very first variable has enough space to complete it’s task.

In this case the script would simply error out once the destination ran out of space stating the source VMDK was the issue, this lead me briefly down the wrong rabbit whole. After validating I had no issues with my source VMDK (booted the VM and checked all services and FileSystem integrity) I noticed a couple things.

1) Even though I specified Thin disc for my destination, which had enough space to store the VM data (60~ GBs), the thin disc was attempting to create the full provisioned size of the disc, cause…..

2) I forgot the source disc was set to Thick Provisioned Eagered Zero

So there was a couple things about this VM….

1) It’s my ZoneMinder VM which holds my IP camera footage on motion detection

2) This data is mostly useless due to no events having taken place

Alright so now my goal was the following:

1) Remove all the un-needed data

A) Open ZoneMinder Web UI
B) Click on Camera
C) Delete All Events
D) wait a while before all the MySQL queries to kick off to clear the data
E) Used “df -h” to watch usage drop

2) Once I had all the data removed, I had to re-claim the space. I decided to dig up my old blog post on the subject matter… Well that was a bit underwhelming and simply provided my links instead of any valid examples… So I hope to provide a bit better details here:

A) I’m running Linux not Windows so sdelete is out.

B) My first attempts I decided to follow this DD exampledd if=/dev/zero of=/zeroes && rm -f /zeroes” (DO NOT DO THIS, from my tests it caused mySQL service to not come up properlly). I then found people stating to use secure-delete or zerofree, however I had some issues with these against a live system and wanted a simple live system, general technique… if there so many references stating dd can do it… how.. then I found this
“dd if=/dev/zero of=/tmp/empty.dd bs=1048576; rm /tmp/empty.dd”
I’m assuming maybe cause I used /tmp instead of /… only diff I can think of.

C) At this point I made a backup using ghettoVCB which I had to use an alternate Datastore to save the VMDK that I would then finally hole punch, in this case the ghettoVCB script converts the VMDK from thick to Thin however will still be the size of the provisioned Disc.

D) Now this is where I started to get a bit… annoyed… However I did learn a few things… one is that & to bring a process to the background doesn’t disconnect that process from the current terminal session. So while I was SSH’d in and running the clone operation, it wasn’t fully completing when my SSH session timed out, even though I used & to run it in the background. Read this for more info but the gist take away is this: “nohup and disown both can be said to suppress SIGHUP, but in different ways. nohup makes the program ignore the signal initially (the program may change this). nohup also tries to arrange for the program not to have a controlling terminal, so that it won’t be sent SIGHUP by the kernel when the terminal is closed. disown is purely internal to the shell; it causes the shell not to send SIGHUP when it terminates.” –Gilles

E) OK so after a few annoying hours of failed transfers due to my own ignorance. I finally had a legit copy of my VMDK in thin format but sucking up a lot of space (See pic above) *Note I technically did the DD trick after making a backup, but the size shown on the datastore would still be teh same* the final part… hole punching: So quick re-cap, we deleted unused data, then zeroed out the unused blocks, we see a low size in the guest system (E.G. df -h), but we still see a large size used by the thin provisioned disc. K let’s hole punch
“vmkfstools -K /vmfs/volume/Datastore/VM/VM.vmdk”

Host Status During Hole Punch

CPU ESXi Host

CPU Storage Unit (FreeNAS)

Disk Usage Host

Disk Usage Storage Unit (FreeNAS)

Datastore Latency Host

Network Usage Host

As you can tell from the source images we can tell a couple things:

1) HolePunching a VMDK does a high read I/O on the datastore

2) If the Datastore is iSCSI based it’ll saturate the iSCSI NIC (this can cause a performance degradation for other VMs utilizing the same Datastore

3) The latency increases due to the high Read I/O on the Datastore, again this directly affects performance of VMs running on the same Datastore

Once done the VMDK looked liked this:

Results / Summary

*NOTE* You can use the following command to convert a Thick Disk to thin manually, if you wish to Hole Punch a VMDK without the script.

vmkfstools -i Source-Thick.vmdk -d thin Destination-thin.vmdk

So why did I go through all this pain? Well I didn’t like the fact I was backing up useless data, weather it be pointless old images, or zeros. In the end the same VM backup went from taking almost an hour down to 5 minutes!