Failed to create VMFS datastore. Cannot change host configuration.

Quick one here. I created a new RAID 5 logical disk after an old logical unit failed from just a single bad disk.

No issues deleting the old logical disk, and creating a new one via HP storage controller commands.

However, I was greeted with this nice error.

From here, by Cookies04: “

I had the same problem and in order to fix it I had to run three commands through an SSH connection. From what I have seen and found this error comes from having disks that were part of different arrays and contain some data on them. When I ran the commands I was then able to connect the data stores with no issues.

1. Show connected disks.

ls -lha /vmfs/devices/disks/

(Verify the disk is seen. You will probably see your disk ID and then :1. This is a partition on the disk. We only need to worry about the main disk ID.)

2. Show the error on disk.

partedUtil getptbl /vmfs/devices/disks/(disk ID)

(It will probably indicate that the GPT is located beyond the end of the disk.)

3. Wipe the disk and rewrite it with a basic MSDOS partition table.

partedUtil setptbl /vmfs/devices/disks/(disk ID) msdos

(The output from this should be similar to msdos and the next line will be 0 0 0 0)

I hope this helps you out.”

Looks like it worked… Thanks, Cookies04!
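For the record, the whole sequence boils down to something like this (the naa.* disk ID below is made up; use your own from step 1):

ls -lha /vmfs/devices/disks/ # find the disk ID; ignore the :1 partition entries

partedUtil getptbl /vmfs/devices/disks/naa.600508b1001c0a0b # will likely complain the GPT is beyond the end of the disk

partedUtil setptbl /vmfs/devices/disks/naa.600508b1001c0a0b msdos # wipe it with an empty msdos label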

vMotion – Not Allowed in the Current State

First things first, I vMotioned the VM to another host and that worked fine, so the issue appeared to be target-related. I also found this post, which says to restart the hostd (management) and vpxa services:

/etc/init.d/hostd restart

/etc/init.d/vpxa restart

Doing this on the source ESXi host did nothing, which again suggested the issue was on the target. I ran the same commands on the target host and the vMotion still failed.
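(As an aside, if restarting hostd and vpxa individually doesn’t cut it, you can bounce all the management agents in one go, though it’s more disruptive:)

services.sh restart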

I then disconnected the target ESXi host from vCenter, put it in maintenance mode, rebooted it, took it out of maintenance mode, reconnected it to vCenter, and this time the vMotion worked.
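If you’d rather do the maintenance-mode dance from SSH on the target instead of clicking through vCenter, it’s roughly:

esxcli system maintenanceMode set --enable true # enter maintenance mode

reboot

esxcli system maintenanceMode set --enable false # exit maintenance mode once it’s back up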

Hope this helps someone.

Veeam – More Than One Replica Candidate Found

Story Time!

The Problem!

So, real quick one here. I edited a replication job and changed its source from production to a backup dataset within the Veeam Replication Job settings. I went to run the replication job and was presented with an error I had not seen before…

I had an idea of what happened (I believe the original ESXi host might have been rebuilt), though I’m not 100% sure; I’m just speculating. I was pretty sure the change I made to the job was not the source of the problem.

Since I wasn’t concerned about the target VM being re-created entirely, I went to Veeam’s Replicas, right-clicked the target VM, and picked Delete from Disk… and to my amazement, the same error popped up…

Alright… kind of sucks, but here’s how I resolved it.

The Solution

Sadly, I had to right-click the target VM under Veeam’s Replicas and instead pick “Remove from Configuration”. What’s really annoying about this is that it removes the source VM from the replication job itself as well.

Why? Dunno… Veeam’s coding choices…

So after successfully removing the target VM from Veeam’s configuration, I manually deleted the target VM on the ESXi host. Then I had to reconfigure the replication job and point it at the source VM again. Again, if you’re interested in why that’s the case, see the link above.

After that the job ran successfully. Hope this helps someone.

vSphere HA Agent cannot be correctly installed or configured… again

Story

Another vCenter Patch, Another problem 😀

This seems to be a recurring story these last couple of posts…

Error on Host

This time, after updating, a host in the cluster once again had the error message.

Troubleshooting

Unlike the last time this happened, the event log wasn’t as blatantly flooded with complaints about /tmp being full. I checked the host with

vdf -h

which showed /tmp at only 90% full. That’s still pretty high, and it might explain the one log event I did see about it:

The ramdisk 'tmp' is full. As a result, the file /tmp/img-stg/data/vmware_f.v00 could not be written

That event sat in the log right after this one, about attempting to install a base ESXi image:

Installing image profile '(Updated) HPE-ESXi-Image' with acceptance level checking disabled

This seemed a bit weird, but I couldn’t find any info beyond what’s usually a very Microsoft-type answer of “you can just ignore it” or “usually this is not an issue; it just means vCenter is connecting to the ESXi host and installing its agent”.

OK I guess… moving on… the very next error event was:

Could not stage image profile '(Updated) HPE-ESXi-Image': ('VMware_bootbank_vmware-fdm_7.0.2-18455184', '[Errno 28] No space left on device')

Huh. Now, note that this host was installed with the official VMware image provided by HPE for this exact hardware, supported on the VMware HCL, so there should be no funny business. However, I suspect there’s a bit of the known HPE bug at play, as mentioned the last time this happened; it just hasn’t fully flooded /tmp yet.

Lil Side Trail

So, a couple of things to note here. First, this ESXi install lives on a USB/SD-card style setup, and for those it’s well known that you should define a persistent log location as well as a scratch location (a quick sketch follows the list below). However, not many sources mention changing the system swap location.

  1. Persistent log: VMware KB; Tech Blogger
    (most standard ESXi log info)
  2. Scratch location: VMware KB; Tech Blogger 1; Tech Blogger 2
    (crash logs, support bundle creation)
  3. Swap location: VMware Doc 1 (Configure); VMware Doc 2 (About); a tech blogger who seems to regurgitate the exact About page from VMware
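For reference, the first two can be set from the ESXi shell along these lines (the datastore path below is hypothetical):

esxcli system syslog config set --logdir=/vmfs/volumes/datastore1/esxi-logs # persistent log location

esxcli system syslog reload # apply the syslog change

vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/datastore1/.locker # scratch location; takes effect after a reboot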

However, researching this even more, lots of posts on Reddit mentioned that each VM’s swap file lives in its VM directory, so with a shared datastore they reside there, and I shouldn’t see swap-usage issues at the host level at all.

Sure enough, if you look at an ESXi host in the vCenter Web UI, there are two options available: VM Swap and System Swap.

The VMware docs don’t seem to accurately describe the difference between these two options.

Looking up the error about not being able to stage the file, I found this one blog post, which of course mentioned changing the swap location to get past the error…

The main thing the blogger mentions is that “The problem is caused by ESXi not having enough free space available to extract the installation packages,” but he fails to specify where exactly that is, and the event log didn’t specify it either. Since his solution was to adjust the system swap location, it raises the question: is the package extraction location the System Swap location?

The host settings seem to be specified only via these alternative option checkboxes:

Can use host cache
Can use datastore specified by host for swap files

So it’s still not fully clear to me where swap actually lives with these (assumed default) settings, whether extracting the image actually uses swap, or why the same image that’s already on the ESXi host gets re-applied when you upgrade vCenter.
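That said, esxcli will at least show you what the host is currently configured to do, and lets you pin system swap to a specific datastore if you want to be explicit (the datastore name below is hypothetical):

esxcli sched swap system get # show the current system swap settings

esxcli sched swap system set --datastore-enabled true --datastore-name datastore1 # pin system swap to a specific datastore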

Resolution

So many questions, so few answers. So, unfortunately, I’m going to go on a bit of a whim and simply try exactly what I did before: clear the file that was taking up most of /tmp’s space and install the HPE patch for the known bug, in hopes that resolves the issue…
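On the host that meant something like the following; if it’s the same known HPE bug, the space hog is typically the AMS usage file, but verify with ls before deleting anything:

vdf -h # check ramdisk usage; look at the tmp line

ls -lh /tmp # find what’s eating the space

rm /tmp/ams-bbUsg.txt # the file the buggy HPE AMS agent leaves behind (an assumption; confirm it’s your culprit first)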

Sure enough, it turned out to be the exact same thing as in my initial post; /tmp just wasn’t completely full yet, so the symptoms were slightly different.

  1. vMotion all VMs to another host in the cluster (amazingly, vMotion works without issue)
  2. Ignore the HA warning on the VMs migrated
  3. Place Host into Maintenance mode (This clears the HA warnings on the VMs and cluster)
  4. Verify /tmp has room. Update any ESXi packages from the hardware vendor if applicable.
  5. Reboot the host.
  6. Exit Maintenance mode.

Hope this helps someone who might see the same type of error events in their ESXi event logs.

ESXi Update Network Config Failed
Set ESXi IP via CLI

Real quick post here. I was moving my ESXi hosts and vCenter to a new dedicated subnet. I did the usual: stand up a temp Windows system in the new subnet, create a VMkernel interface with a temp IP in the new subnet, connect to the ESXi Web UI via that temp IP from the temp Windows machine, reconfigure the default TCP/IP stack’s default gateway, change vmk0’s IP address (and edit the management port group VLAN ID if applicable), and away I’d go.

However, on this one host, for some unknown stupid reason, it would simply fail with “Failed – An error occurred during host configuration”, and the detailed log was just as vague: “operation failed diagnostics report unable to set network unreachable”. OK… whatever; that shouldn’t matter, do as I tell you! Here’s a snippet of the error, and the CLI command that simply worked without complaint.

I figured I’d just try the CLI way and see if it worked, and it turns out it did. Here’s the source I used to figure out the command syntax.

The commands I used:

Get IPs:

esxcli network ip interface ipv4 get

Set new IP:

esxcli network ip interface ipv4 set -i vmk1 -I 1.1.1.1 -N 255.255.255.0 -t static
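Since the move also involved a new default gateway (and, in my case, a management VLAN change), the matching commands look roughly like this; the addresses and names below are hypothetical:

esxcli network ip route ipv4 add --gateway 1.1.1.254 --network default # set the default route

esxcli network vswitch standard portgroup set -p "Management Network" -v 20 # change the port group VLAN ID, if applicable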

Hope this helps someone.

New-MailboxImportRequest Failed

This is going to be another short post.

Working on an Exchange migration this weekend, I was using our backup software to export users’ mailboxes from the most recent backup of our old Exchange server, then importing them into the new Exchange server after creating each mailbox.

I would have loved to simply select each user’s mailbox as a whole and import those PST files. However, testing showed that this simply created a sub-item with the user’s name and all their folders under it, instead of properly placing them in the primary parent hierarchy. So I was forced to export each item individually (Inbox, Sent Items, Drafts, etc.) and import them the same way. I initially didn’t script this, as there were only about 30-40 users to migrate; I figured it was easier to just go through the wizards… until I discovered some users had created folders outside of their Inbox! Ohhh boy… Anyway, it turns out that if you exceed 9 imports for a single mailbox without specifying a unique name for each (even after they succeed), you get an error as follows:

“The name must be unique per mailbox. There isn’t a default name available for a new request owned by mailbox xyz”

The solution was easy enough to find, and a good band-aid indeed:

Get-MailboxImportRequest -Status Completed | Remove-MailboxImportRequest

However, in my case I found I was sometimes still getting the error even though I had cleared all the completed import requests (with default names, obviously). It turned out I was hitting a weird bug where imports were showing as Queued, yet if I piped them into Get-MailboxImportRequestStatistics | Select Status, they reported a status of Completed… (If you want all the details, pipe into Format-List instead of Select.)

Get-MailboxImportRequest -Status Queued | Get-MailboxImportRequestStatistics | Select Status

lol, I wasn’t sure what to make of this, but there were two solutions:

  1. Clear the “Queued” imports that are really Completed.
  2. Give your new import a unique name using the -Name parameter.
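Sketches of both (the mailbox, path, and folder names below are made up):

Get-MailboxImportRequest -Status Queued | Where-Object { ($_ | Get-MailboxImportRequestStatistics).Status -eq 'Completed' } | Remove-MailboxImportRequest

New-MailboxImportRequest -Mailbox xyz -FilePath '\\backupserver\pst\xyz-SentItems.pst' -TargetRootFolder 'Sent Items' -Name 'xyz-SentItems'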

I’ll admit, though, that Exchange 2016 is more intuitive to manage than old Exchange 2010.