vCLS High CPU usage

The Story

So I went to vMotion a VM to do some maintenance work on a host. Target machine well over 50% CPU usage.. what?! That can’t be right, it’s not running anything…

I tried hard powering the VM off, but it just came right back up suckin CPU cycles with it….

The Hunt

alright Google, what ya got for me… I found this blog post by “Tripp W Black” he mentions stopping a vCenter Service called “VMware ESX Agent Manager”, which he stops and then deletes the offending VMs, sounds like a plan. Let’s try it, so login into VAMI. (vcenter.consonto.com:5480)

K, let’s stop it… let me hard power off the VM now… ehh the VM is staying dead and host CPU:

K let’s go kill the other droid I have causing an issue…

ok I got them all down now, but the odd part is I can’t delete them from disk much like Sir Black mentioned in their blog post. The options is greyed out for me, let’s start the service and see what happens…

The Pain

Well, that was extremely annoying, it seemed to have worked only for a moment and the CPU usages came right back, so I stopped the service again, but I can’t delete the VMs…

Similar issues in vSphere 8, even suggestions to stay running in retreat mode, which I’ll get to in a moment. So, if you are unfamiliar, vCLS are small VMs that are distributed to ESXi hosts to keep HA and DRS features operational, even if vCenter itself goes down. The thing is, I’m not even using HA or DRS, I created a cluster for merely EVC purposes, so I can move VMs between hosts live at my own leisure and without downtime. What’s annoying is I shouldn’t have to spend half my weekend day trying to solve a bug in my HomeLab due to poor design choices.

The Constructive Criticism

VMware…. do not assume a cluster alone requires vCLS. Instead, enable vCLS only when HA or DRS features are enabled.

Now that we have that very simple thing out of the way.

The Fix

So, as we mentioned we are able to stop the vCLS VMs when we stop the EAM service on vCenter, but that won’t be a solution if the server gets rebooted. I decided to Google to see how other people delete vCLS when it doesn’t seem possible.

I found this reddit thread, in which they discuss the same thing mentioned above “Retreat Mode”. However, after setting the required settings (which is apparently tattoo’d after done), I still couldn’t delete the VMs, even after restarting the vpxd service. Much like ‘bananna_roboto’ I ended up deleting the vCLS VMs from the ESXi host UI directly, however when checking vCenter UI the still showed on all the hosts.

After rebooting the vCenter server, all the vCLS VMs were gone, at first, I thought they’d come back, but since the retreat mode setting was applied it seems they do not get recreated. Hence, I will leave Retreat mode enabled as suggested in the reddit thread for now, since I am not using HA or DRS.

So if you want to use EVC in a cluster, but not HA and DRS and would like to skim even more memory from your hosts, while saving on buggy CPU cycles, apparently “Retreat mode” is what you need.

If you do need those features, and you are unable to delete the old vCLS VMs, and restarting the EAM service doesn’t resolve your issue (which it didn’t for me), you may have to open a support case with VMware.

Any, I hope this helped someone. Cheers.