No coredump target has been configured. Host core dumps cannot be saved.

ESXi on SD Card

Ohhh ESXi on SD cards, it got a little controversial but we managed to keep you, doing the latest install I was greet with the nice warning “No coredump target has been configured. Host core dumps cannot be saved.

What does this mean you might ask. Well in short, if there ever was a problem with the host, log files to determine what happened wouldn’t be available. So it’s a pick your poison kinda deal.

Store logs and possibly burn out the SD/USB drive storage, which isn’t good at that sort of thing, or point it somewhere else. Here’s a nice post covering the same problem and the comments are interesting.

Dan states “Interesting solution as I too faced this issue. I didn’t know that saving coredump files to an iSCSI disk is not supported. Can you please provide your source for this information. I didn’t want to send that many writes to an SD card as they have a limited number (all be it a very large number) of read/writes before failure. I set the advanced system setting, Syslog.global.logDir to point to an iSCSI mounted volume. This solution has been working for me for going on 6 years now. Thanks for the article.”

with the OP responding “Hi Dan, you can definately point it to an iscsi target however it is not supported. Please check this KB article: https://kb.vmware.com/s/article/2004299 a quarter of the way down you will see ‘Note: Configuring a remote device using the ESXi host software iSCSI initiator is not supported.’”

Options

Option 1 – Allow Core Dumps on USB

Much like the source I mentioned above: VMware ESXi 7 No Coredump Target Has Been Configured. (sysadmintutorials.com)

Edit the boot options to allow Core Dumps to be saved on USB/SD devices.

Option 2 – Set Syslog.global.logDir

You may have some other local storage available, in that case set the variable above to that local or shared storage (shared storge being “unsupported”).

Option 3 – Configure Network Coredump

As mentioned by Thor – “Apparently the “supported” method is to configure a network coredump target instead rather than the unsupported iSCSI/NFS method: https://kb.vmware.com/s/article/74537

Option 4 – Disable the notification.

As stated by Clay – ”

The environment that does not have Core Dump Configured will receive an Alarm as “Configuration Issues :- No Coredump Target has been Configured Host Core Dumps Cannot be Saved Error”.
In the scenarios where the Core Dump partition is not configured and is not needed in the specific environment, you can suppress the Informational Alarm message, following the below steps,

Select the ESXi Host >

Click Configuration > Advanced Settings

Search for UserVars.SuppressCoredumpWarning

Then locate the string and and enter 1 as the value

The changes takes effect immediately and will suppress the alarm message.

To extract contents from the VMKcore diagnostic partition after a purple screen error, see Collecting diagnostic information from an ESX or ESXi host that experiences a purple diagnostic screen (1004128).”

Summary

In my case it’s a home lab, I wasn’t too concerned so I followed Option 4, then simply disabled file core dumps following the second steps in Permanently disable ESXi coredump file (vmware.com)

Note* Option 2 was still required to get rid of another message: System logs are stored on non-persistent storage (2032823) (vmware.com)

Not sure, but maybe still helps with I/O to disable coredumps. Will update again if new news arises.

Manually Fix Veeam Backup Job after VM-ID change

The Story

There’s been a couple time where my VM-IS’s change:

  • A vSphere server has crashed beyond a recoverable state.
  • A server has been removed and added back into the inventory in vSphere.
  • Manually move a VM to a new ESXi host.
    • VM removed from inventory, and readded.
  • Loss vCenter Server.
  • Full VM Recovery via Veeam.

What sucks is when you go to run the Job in Veeam after any of the above, the job simply fails to find the object. You can edit the job by removing the VM and re-adding it, but this will build a whole new chain, which you can see in the repo of Veeam after such events occur:

As you can see two chains, this has been an annoyance for a long time for me, as there’s no way to manually set the VM-ID in vCenter, it’s all automanaged.

I found this Veeam thread discussing the same issue, and someone mentioned “an old trick” which may apply, and linked to a blog post by someone named “Ideen Jahanshahi”.

I had no idea about this, let’s try…

Determine VM-ID on vCenter

The source uses powerCLI, which I’ve covered installing, but easier is to just use the Web UI, and in the address bar grab it after the vms parameter.

Determine VM-ID in Veeam

The source installs SSMS, and much like my fixing WSUS post, I don’t like installing heavy stuff on my servers to do managerial tasks. Lucky for me, SQLCMD is already installed on the Veeam server so no extra software needed.

Pre-reqs for SQLCMD

You’ll need the hostname. (run command hostname).

You’ll need the Instance name. (Use services.msc to list SQL services)

Connect to Veeam DB

Open CMD as admin

sqlcmd -E -S Veeam\VEEAMSQL2012

use VeeamBackup
:setvar SQLCMDMAXVARTYPEWIDTH 30
:setvar SQLCMDMAXFIXEDTYPEWIDTH 30
SELECT bj.name, bo.object_id FROM bjob bj INNER JOIN ObjectsInJobs oij ON bj.id = oij.job_id INNER JOIN Bobjects bo ON bo.id = oij.object_id WHERE bj.type=0
go

Some reason above code wouldn’t work on my latest build/install of Veeam, but this one worked:

SELECT name, job_id, bo.object_id FROM bjobs bj INNER JOIN ObjectsInJobs oij ON bj.id = oij.job_id INNER JOIN BObjects bo ON bo.id = oij.object_id WHERE bj.type=0

In my case after remove the VM from inventory and readding it:

As you can see they do not match, and when I check the VM size in the job properties the size can’t be calculated cause the link is gone.

Fix the Broken Job

UPDATE bobjects SET object_id = 'vm-55633' WHERE object_id='vm-53657'

After this I checked the VM size in the job properties and it was calculated, to my amazement it fully worked it even retained the CBT points, and the backup job ran perfectly. Woo-hoo!

This info is for educational purposes only, what you do in your own environment is on you. Cheers, hope this helps someone.

vCLS High CPU usage

The Story

So I went to vMotion a VM to do some maintenance work on a host. Target machine well over 50% CPU usage.. what?! That can’t be right, it’s not running anything…

I tried hard powering the VM off, but it just came right back up suckin CPU cycles with it….

The Hunt

alright Google, what ya got for me… I found this blog post by “Tripp W Black” he mentions stopping a vCenter Service called “VMware ESX Agent Manager”, which he stops and then deletes the offending VMs, sounds like a plan. Let’s try it, so login into VAMI. (vcenter.consonto.com:5480)

K, let’s stop it… let me hard power off the VM now… ehh the VM is staying dead and host CPU:

K let’s go kill the other droid I have causing an issue…

ok I got them all down now, but the odd part is I can’t delete them from disk much like Sir Black mentioned in their blog post. The options is greyed out for me, let’s start the service and see what happens…

The Pain

Well, that was extremely annoying, it seemed to have worked only for a moment and the CPU usages came right back, so I stopped the service again, but I can’t delete the VMs…

Similar issues in vSphere 8, even suggestions to stay running in retreat mode, which I’ll get to in a moment. So, if you are unfamiliar, vCLS are small VMs that are distributed to ESXi hosts to keep HA and DRS features operational, even if vCenter itself goes down. The thing is, I’m not even using HA or DRS, I created a cluster for merely EVC purposes, so I can move VMs between hosts live at my own leisure and without downtime. What’s annoying is I shouldn’t have to spend half my weekend day trying to solve a bug in my HomeLab due to poor design choices.

The Constructive Criticism

VMware…. do not assume a cluster alone requires vCLS. Instead, enable vCLS only when HA or DRS features are enabled.

Now that we have that very simple thing out of the way.

The Fix

So, as we mentioned we are able to stop the vCLS VMs when we stop the EAM service on vCenter, but that won’t be a solution if the server gets rebooted. I decided to Google to see how other people delete vCLS when it doesn’t seem possible.

I found this reddit thread, in which they discuss the same thing mentioned above “Retreat Mode”. However, after setting the required settings (which is apparently tattoo’d after done), I still couldn’t delete the VMs, even after restarting the vpxd service. Much like ‘bananna_roboto’ I ended up deleting the vCLS VMs from the ESXi host UI directly, however when checking vCenter UI the still showed on all the hosts.

After rebooting the vCenter server, all the vCLS VMs were gone, at first, I thought they’d come back, but since the retreat mode setting was applied it seems they do not get recreated. Hence, I will leave Retreat mode enabled as suggested in the reddit thread for now, since I am not using HA or DRS.

So if you want to use EVC in a cluster, but not HA and DRS and would like to skim even more memory from your hosts, while saving on buggy CPU cycles, apparently “Retreat mode” is what you need.

If you do need those features, and you are unable to delete the old vCLS VMs, and restarting the EAM service doesn’t resolve your issue (which it didn’t for me), you may have to open a support case with VMware.

Any, I hope this helped someone. Cheers.

USB NICs on ESXi hosts

Quick post here, I wanted to use a USB based NIC to allow one of my hosts to be able to host the firewall used for internet access, this would allow for host upgrades without downtime.

My first concern was the USB bus on the host, being a bit older, I double checked and sad days it was only USB 2.0. Checking my internet speed, it turns out it’s 300 mbps, and USB 2.0 is 480 mbps, so while I may only be able to use less then half of the full speed of the gig NIC, it was still within spec of the backend, and thus won’t be a bottle neck.

Now when I plugged in the USB nic, I sadly was not presented with a new NIC option on the host.

When I googled this I found an awesome post by non-other than one of my online hero’s Willam Lam. Which he states the following:

“With the release of ESXi 7.0, a USB CDCE (Communication Device Class Ethernet) driver was added to enable support for hardware platforms that now leverages a Virtual EEM (Ethernet Emulation Module) for their out-of-band (OOB) management interface, which was the primary motivation for this enhancement.

One interesting and beneficial side effect of this enhancement is that for any USB network adapters that conforms to the CDCE specification, they would automatically get claimed by ESXi and show up as an available network interface demonstrated in my homelab with the screenshot below.”

Then shows a snippet of running a command:

esxcfg-nics -l

Which for me listed the same results as the UI:

Considering I’m running the latest built of 7.x, I guess the device not “conform to the CDCE specification”.

A bit further in the post he shows running:

lsusb

When ran shows the device is seen by the host:

Let’s try to install the Flings USB Driver, see if it works.

“This Fling supports the most popular USB network adapter chipsets found in the market. The ASIX USB 2.0 gigabit network ASIX88178a, ASIX USB 3.0 gigabit network ASIX88179, Realtek USB 3.0 gigabit network RTL8152/RTL8153 and Aquantia AQC111U.”

Step 1 – Download the ZIP file for the specific version of your ESXi host and upload to ESXi host using SCP or Datastore Browser. Done

Luck the error message was clickable, and it provided a helpful hint to navigate to the host as it maybe due to certificate not trusted, and sure enough that was the case.

Step 2 – Place the ESXi host into Maintenance Mode using the vSphere UI or CLI (e.g. esxcli system maintenanceMode set -e true)

Some reason the command line wasn’t returning from the command above, and I had to enable Maintenance mode via the UI. Done.

Step 3 – Install the ESXi Offline Bundle (6.5/6.7) or Component (7.0)

For (7.0+) – Run the following command on ESXi Shell to install ESXi Component:

esxcli software component apply -d /path/to/the component zip

For (6.5/6.7) – Run the following command on ESXi Shell to install ESXi Offline Bundle:

esxcli software vib install -d /path/to/the offline bundle zip

and my results:

Ohhh FFS… Google!!!!!! HELP!!!! Only one hit…

only only 2 responses close to an answer are… “Ok I can confirm that if you create a 7u1 ISO and upgrade to that first, you can then add the latest fling module to it. Key bit of info that is not in the installation instructions” and “Workaround: Update the ESXi host to 7.0 Update 1. Retry the driver installation.”

Uhhhh I thought I just updated my hosts to the latest patches… what am I running?

“7.0.3, 21686933″… checking the source Flings page, oh… it’s a dropdown menu… *facepalm*

I downloaded the ESXi 8 version, let me try the 703 one…

ESXi703-VMKUSB-NIC-FLING-55634242-component-19849370.zip

Reboot! and?

Ehhh it worked, I can now bind it to a vSwitch. I hope this helps someone :). I’m also wondering if this will burn me on future ESXi updates/upgrade. I’ll post any updates if it does.

TPM security on a ESXi VM

Great part about vSphere 7 is it introduced the ability to add a TPM based hardware to a VM.

Let’s see if we can pull it off in our lab.

What I need a Key Provider, Lucky for use with 7.0.3 VMware provides a “Native Key Provider

During my deployment of the NKP, one requirement is to make a backup of the key I guess, which was failing for me. I found this VMware thread with someone having the same issue.

Sure enough, the comment by “acartwright” was pretty helpful, as I too opened the browser console and noticed the CORS errors. The only diff was I wasn’t using CNAMEs, per say, but I had done a pilot of vCenter renaming. the fact the names showing up as not matching and the ones that were listed in the console reminded me of that. When I went to check the hostname, and local host file, sure enough they had the incorrect name in there.

So, after following the steps in my old blog post to fix the hostname and the localhosts file, I tried to backup the NKP and it worked this time. 😀

So, sure there after this I went to add the TPM and I couldn’t find it, oh right it’s a newer feature, I’ll have to update the VM’s compatibility mode.

Made snapshot, updated to latest hardware ID, boots fine, lets add the TPM hardware, error can’t add TPM with snapshots. Ugh, fine delete snapshot (tested VM boots fine before doing this), add TPM success.

Before changing the VM boot option to EFI, boot the VM and boot the OS into Windows RE, use mbr2gpt command to convert the boot partitions to the proper type supported by EFI.

Once completed, change VM boot options to EFI, and check off secure boot.

Congrats you just configured a ESXi VM with a vTPM module. 🙂

 

ESXi 6.x Datastore Not Mounted

Quick post here, I had to recover from a flooded basement. Sorry for the day outage. I had to put my disc in another server and load FreeNAS, and import my ZFS volumes, recreate the iSCSI targets, and then I added them to my ESXi hosts, and rescanning the HBAs shows the disks…

but the datastores were not visible…

so I googled and found this VMware thread with some helpful commands to try. (I do kind of agree with the OP, that its annoying they removed the front end UI for import that could handle this)

esxcli storage vmfs snapshot list

esxcfg-volume -M UUID

Ehh it worked!

Hope this helps someone. If this doesn’t work you might have some other underling issue?

vSphere HA Agent cannot be correctly installed or configured… again

Story

Another vCenter Patch, Another problem 😀

This seems to be a reoccurring story these last couple posts…

Error on Host

This time after updating again a host in the cluster had the error message.

Troubleshooting

Un like the last time this happened, the event log wasn’t as blatant (flooded) complaining about the /tmp being full. and checking the host with

vdf -h

which showed only 90% full, which was still pretty high, which might have explained the one log event that I did see about it:

The ramdisk 'tmp' is full. As a result, the file /tmp/img-stg/data/vmware_f.v00 could not be written

Which was in the log right after this event of attempting to install a base ESXi image?

Installing image profile '(Updated) HPE-ESXi-Image' with acceptance level checking disabled

This seemed a bit weird but I could find any info other than what’s usuallly a very Microsoft type answer of “you can just ignore it” or “usually this is not an issue, just it says vCenter saying it is connecting to esxi host and installing it’s agent

OK I guess… moving on… the very next error event was:

Could not stage image profile '(Updated) HPE-ESXi-Image': ('VMware_bootbank_vmware-fdm_7.0.2-18455184', '[Errno 28] No space left on device')

Huh, Now note this host was installed running the official VMware Image provided by HPE for this exact hardware supported by the VMware HCL. So there should be no funny business. However I feel maybe there’s a bit of the known HPE bug as mentioned the last time this happened. It just hasn’t fully flooded /tmp just yet.

Lil Side Trail

So couple things to note here, first the ESXi image is installed on a USB/SD Card style setup as such it should be well know to define the persistent log location, as well as the scratch location. However, not many source specify changing the system swap location.

  1. Persistent Log; VMware KB; Tech Blogger
    (Most standard ESXi Log info)
  2. Scratch Log: VMware KB; Tech Blogger 1; Tech Blogger 2
    (Crash Logs, Support log creations)
  3. Swap Location: VMware Doc 1 (Configure), VMware Doc2 (About), Tech Blogger Who seem to regurgitate the exact about page from VMware.

However, researching this even more lots of posts on reddit mentioned the swap file for VM’s being on their VM directories, so if using a shared datastore they will reside there, and I shouldn’t see issues around swap usage at all at the host level.

Which if you look on the vCenter Web UI on a ESXi hosts there are two options available: VM – Swap, and System Swap.

The VMware docs doesn’t seem to describe accurately the difference between these two options.

Lookup up the error about not being able to stage the file I found this one blog post which of course mentioned changing the swap location to get past the error…

The main thing mentioned by the blogger is “The problem is caused by ESXi not having enough free space available to extract the installation packages.” but failed to specify where that exactly is, and the event log didn’t specify that either. Now since his solution was to adjust the system swap location, it begs the question. Is the package extraction location the System Swap location?

Since the host settings seem to be only specified with the alternative option checkboxes as:

Can use host cache
Can use datastore specified by host for swap files

It’s still not fully clear to me where the swap is actually located with these, assumed default settings. Or if extraction of the image actually using swap, or why the same imagine already on the ESXi host is being re-applied when your upgrade vCenter?

Resolution

So many question, so little answers, so unfortunately I’m going to go on a bit of a whim, and simply try exactly what I did before, clear the file from the /tmp location that was takin up a lot of it’s space, install the HPE patch for the known bug, in hopes it resolves the issue….

Sure enough the exact same thing happened, as in my initial post it just seems it wasn’t fully full. So the symptoms were just a bit different.

  1. vMotion all VMs to another host in the cluster (amazing vMotion works without issue)
  2. Ignore the HA warning on the VMs migrated
  3. Place Host into Maintenance mode (This clears the HA warnings on the VMs and cluster)
  4. Verify /tmp has room. Update any ESXi packages from the hardware vendor if applicable.
  5. Reboot the host.
  6. Exit Maintenance mode.

Hope this helps someone who might see the same type of error events in their ESXi event logs.

ESXi Update Network Config Failed
Set ESXi IP via CLI

Real quick post here. I was moving my ESXi hosts and vCenter to a new dedicated subnet. I did the usual; had a temp Windows System in the new subnet, create VMK with temp IP in new subnet, connect to ESXi Web UI via new Temp IP in new Subnet via temp Windows machine. Reconfigure default TCP/IP stack default gateway, change VMK0 IP address (and edit management port group VLAN id if applicable). and Away I’d go.

However on this one host for some unknown stupid reason it would simply fail “Failed – An Error occurred during host configuration”, and the detailed log was just as vague “operation failed diagnostics report unable to set network unreachable” OK… whatever, that shouldn’t matter do as I tell you! Here’s a snippet of the error, and the CLI command that simply worked without bitching.

I just figured let’s try the CLI way and see if it worked, and it turns out it did. The source I used to figure out the command syntax.

The commands I used:

Get IPs:

esxcli network ip interface ipv4 get

Set new IP:

esxcli network ip interface ipv4 set -i vmk1 -I 1.1.1.1 -N 255.255.255.0 -t static

Hope this helps someone.

ESXi new install; failed to create new Datastore

Well I booted up a new server, created a new logical drive, bot ESXi and Failed to create datastore… what is this?

Google help? Yeah Forms help.

1. Show connected disks.

ls -lha /vmfs/devices/disks/

(Verify the disk is seen. You will probably see your disk ID then :1. This is a partition on the disk. We only need to work about the main disk ID.)

Neat. next

2. Show the error on disk.

partedUtil getptbl /vmfs/devices/disks/(disk ID)

(It will probably indicate that the GPT is located beyond the end of the disk.)

Ohhh yeah, huh… fix it

3. Wipe disk and rewrite with a basic MSDOS partion.

partedUtil setptbl /vmfs/devices/disks/(disk ID) msdos

(The output from this should be similar to msdos and the next line will be o o o o)

Go to create data store after this, yay it worked. Please note to use your own values, images are just for reference.

*UPDATE* I went to reuse some old drives from an old RAID controller. In this case I had removed the logical drive from the old RAID configuration, pulled the disks. Since they were same Caddy as an alternative server, and went on to create some new logical drives to use as an alternative datastore on this particular host.

In the examples above, it would fail at creation of the datastore. In this example it failed at the point in the wizard to define the partition to create. with an error as follows:

“Either the selected disk already has a VMFS datastore or the host cannot perform a partition table conversion. Select another disk” in a nice red banner.

Now attempting My usual fix as mentioned above resulted in…

… to be updated (i have such a headache right now from the endless issues)

Had to clear the drives to fix this problem (delete logical drive) rip Drives out of server, use a USB enclosure to use “diskpart” and the “clean command on windows to clean the drives.

Then after that the health light on the server went off, saying my one disk or caddy is “unauthentic” even though it was just working. Apparently terrible engineer caddy’s.

Which to find out this issue I had to get into iLO which the admin password was unknown so had to run up my old blog post to get into that. and now after all that.. I have a headache.

Good job computers, you managed to make my day fantastic… again.

ESXi VM disconnected after applying patch

Keep this short.

I had to update a ESXi host locally as it’s mgmt connection would drop with all VMs having to go down on it. As it’s a single host with virtualized network components.

On one of the VMs this was an opportunity to update it since it needed to be shutdown temperately anyway. I took a snapshot of the VM, updated it, validated updates were fine, removed the snapshot. Then proceeded to update the host:

vim-cmd hostsvc/Maintenance_mode_enter
esxcli software vib update -d "/path/to/file.zip"
vim-cmd hostsvc/Maintenance_mode_exit
reboot

Nothing special here. However once the host came back up and I was able to access it via vCenter, one of the VM’s was shown as “disconnected” I’ve seen this with ESXi hosts before, but not particularly with a VM.

Oddly enough there’s only one datastore on the host and all other VMs are fine, and checking the datastore, all files are where they should be.

I figured maybe remove the VM from inventory and just re-add it via the vmx file, however the option was greyed out.

It turned out there was apparently still a snapshot left on the VM (noticed via delta files existing within the VMs folder path).

Removing all the snapshots resolved the issue. Turns out the VM was also running, but didn’t show the green play icon, thus I wasn’t aware of it’s powered on state. Which also explains the greyed out context menu for removing from inventory.

Hope this helps someone.