Veeam Backup Failed – SSL/TLS handshake failed

Another day, another issue.

Processing VirtualMachineName Error: Cannot get service content.
Soap fault. SSL_ERROR_SYSCALL
Error observed by underlying SSL/TLS BIO: Unknown errorDetail: 'SSL/TLS handshake failed', endpoint: 'https://vcenter.domain.localca:443/sdk'
SOAP connection is not available. Connection ID: [vcenter.domain.local].
Failed to create NFC download stream. NFC path: [nfc://conn:vcenter.domain.local,nfchost:host-#,stg:datastore-#@VirtualMachineName/VirtualMachineName.vmx].
--tr:Unable to open source file

If you come across this error, check whether you have any firewalls sitting between your Veeam proxy server and the vCenter server.
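A quick way to sanity-check this (just a sketch; the hostname below is the placeholder from the error output, so swap in your own vCenter FQDN): from the Veeam proxy, try hitting the vSphere SDK endpoint directly and watch whether the TLS handshake completes. Recent Windows builds ship curl, so this works from either a Windows or Linux proxy:

curl -kv https://vcenter.domain.local:443/sdk

If the connection times out or resets before any certificate exchange, something between the proxy and vCenter is eating TCP 443, which in this case was the firewall.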

I’ve blogged about this type of problem before, but in that case it was DNS; in this case it’s a firewall.

In most cases it’s either:

1) PEBKAC
2) DNS
3) Firewall <— This Case
4) A/V
5) a Bug

You may have noticed a lack of posts lately. It’s not that I can’t figure out content to share; it’s a lack of motivation. I’ve been burnt out with work from the pandemic: when everyone got a bunch of free money and time off, I just got more work. Did I get more pay? I’ll let you decide. The amount of support calls, sheesh. My only real motivation is not to be hassled. That, and the fear of losing my job, but y’know, that will only make someone work just hard enough not to get fired.

This site has earned me $0, so that also doesn’t help. Thanks everyone for all the support keeping this site alive.

Send an Email using PowerShell

Source: Send-MailMessage (Microsoft.PowerShell.Utility) – PowerShell | Microsoft Docs

Build your parameter object as a hashtable (we’ll splat it below)….

$mailParams = @{
    SmtpServer = 'heimdall.dgcm.ca'
    Port = 25
    UseSSL = $false
    From = 'notifications@dgcm.ca'
    To = 'nos_rulz@msn.com'
    Subject = ('ON-PREM SMTP Relay - ' + (Get-Date -Format g))
    Body = 'This is a test email using ON-PREM SMTP Relay'
    DeliveryNotificationOption = 'OnFailure', 'OnSuccess'
}

And then send it….

Send-MailMessage @mailParams

If there are any pre-reqs required I’ll update this blog post. That should be it though. Easy peasy lemon squeezy.

Fix Orphaned Datastore in vCenter

Story

The Precursor

I did NOT want to write this blog post. This post comes from the fact that VMware is not perfect and I’m here to air some dirty laundry…. Let’s get started.

*UPDATE* Read on if you want to get into the nitty-gritty; otherwise go to the Summary section. For me, rebooting the VCSA resolved the issue.

The Intro

OK, I’ll keep this short: 1 vCenter, 2 hosts, 1 cluster. One host started to act “weird”: random power-offs, and it would boot normally but with the USB controller not working.

Now this was annoying … A LOT, so I decided I would install ESXi on the local RAID array instead of USB.

Step 1) Make a backup of the ESXi config.

Step 2) Re-install ESXi. When I went to re-install ESXi it stated all data in the existing datastore would be deleted. Whoops, let’s move all the data first.

Step 2a) I removed all data from the datastore

Step 2b) Delete the datastore, and THIS IS THE STEP THAT CAUSED ME ALL FUTURE GRIEF IN THIS BLOG POST! DO NOT FORGET TO DO THIS STEP! IF YOU DO, YOU WILL HAVE TO DO EVERYTHING ELSE THIS POST IS TALKING ABOUT!

Unmount, and delete the datastore. YOU HAVE BEEN WARNED!

*During my testing I found this was not always the case. I was, however, able to replicate the issue in my lab after a couple of attempts.

Step 3) Re-install ESXi

Step 4) Reload the saved config file, and all is done.

This is when my heart sunk.

The Assumptions

I had the following wrong assumptions during this terrible mistake:

  1. Datastore names are saved in the backup config.
    INCORRECT – Datastore names are literally volume labels and stay with the volume on which they were created.
    The UUID is stored in the filesystem superblock on the device.
  2. Removing an orphaned object in vCenter would be easy.
  3. Renaming a datastore would be easy.
    1. If the host is managed by vCenter Server, you cannot rename the datastore by directly accessing the host from the VMware Host Client. You must rename the datastore from vCenter Server.
  4. Installing on a USB drive defaults all install mount points to the USB drive.
    INCORRECT – There’s magic involved.

Every one of these assumptions burnt me hard.

The Problem

So it wasn’t until I clicked on the datastore section of vCenter that my heart sunk. The old datastore was still listed, and attempting to right-click and delete the orphaned datastore shot me with another surprise… the options were greyed out. I went to Google to see if I was alone. It turns out I was not, but the blog source I found also did not seem very promising… How to easily kill a zombie datastore in your VMware vSphere lab | (tinkertry.com)

Now this blog post title is very misleading; one can say the solution he used was “easy”, but guess what… it’s not supported by VMware. As he even states, “Warning: this is a bit of a hack, suited for labs only”. Alright, so this is no good so far.

There was one other notable source. This one mentioned looking out for related objects that might still be linked to the datastore; in this case there were none. It was purely orphaned.

Talking to others in #vmware on Libera Chat, I was told it might possibly be linked to a scratch location, which is probably the reason the option was greyed out. While that might be a reasonable restriction for a host, the scratch location is determined by the ESXi host itself, not by vCenter, so vCenter should still have the ability to clear the datastore object (foreshadowing: this causes me more grief).

In my situation, unlike TinkerTry’s, I knew exactly what caused the problem: I had not renamed the datastore accordingly. Since the re-created datastore was not given the original name, it was mounted and shown as a new datastore.

The Plan

It’s one thing to fuck up, it’s another to fess up, and it’s yet another to have a plan. If you can fix your mistake, it’s prime evidence of learning and growing as you live life. One must always persevere. Here’s my plan.

Since I got here by building the host new and restoring the config with the wrong datastore, I figured that if I did the same but with the properly named datastore in place, I should be able to clear the orphaned object by bringing it back up.

I had a couple of issues to overcome. The first was my 3rd assumption: that renaming a datastore was easy. Usually it is, however… in this case attempting to rename it to match the missing datastore simply told me the datastore already exists. Sooo, poop: you can’t do it directly from an ESXi host unless that host is not managed by vCenter. A catch-22, as you can tell; the only way past this was to execute my plan, which was the same process that got me into this mess to begin with. Sadly, I didn’t yet know how deep a hole I had dug.

So after installing brand new on another USB stick, I went to create the new datastore with the old name, overwriting the partition table the ESXi install had created… and, you guessed it: Failed to create VMFS datastore. Cannot change host configuration. – Zewwy’s Info Tech Talks

Obviously I had gone through this before, but this time was different. It turned out attempting to clear the GPT partition table and replace it with an msdos (MBR) based one failed, telling me it was a read-only disk. Huh?

Googling this, I found this thread, which seemed to point at the root cause… yep, my 4th assumption: “Installing on a USB drive defaults all install mount points to the USB drive.”

So, using “ls -l”, “esxcli storage filesystem list”, and then “vmkfstools -P MOUNTPOINT”, it was very easy to discover that the scratch and coredump locations were pointing to the local RAID logical volume I created, which had overwritten the initial datastore when ESXi was installed onto it. Talk about a major annoyance; I get why it did what it did, but in this case it’s a major hindrance, as I can’t clear the logical disk’s partition table to create the new datastore I need mounted there… mhmmm
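For reference, the trace looked roughly like this (a sketch; the volume UUID is a placeholder for whatever your mount point resolves to):

ls -l /scratch                       # see where the scratch symlink actually points
esxcli storage filesystem list       # map volume labels/UUIDs to their backing devices
vmkfstools -P /vmfs/volumes/UUID     # confirm which device backs that mount point
esxcli system coredump partition get # and check where the coredump partition lives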

So I kept trying to change the coredump location and the scratch location, and on every reboot the host kept picking the old location on the local RAID logical volume, which kept blocking me from moving forward. It didn’t matter whether I did it via the GUI or via the backend command “vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp/scratch”. Even though the VMware KB mentions creating this path first with mkdir, what I found was that the created path was not persistent, and since it doesn’t exist at boot, ESXi changes the location via its usual “magic”:

“ESXi selects one of these scratch locations during startup in this order of preference:
The location configured in the /etc/vmware/locker.conf configuration file, set by the ScratchConfig.ConfiguredScratchLocation configuration option, as in this article.
A Fat16 filesystem of at least 4 GB on the Local Boot device.
A Fat16 filesystem of at least 4 GB on a Local device.
A VMFS datastore on a Local device, in a .locker/ directory.
A ramdisk at /tmp/scratch/.”

So in this case, I found this post about a similar issue, and it turns out setting the scratch location to just /tmp worked.
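For completeness, the change that finally stuck for me looked roughly like this (same advanced option the KB uses, just pointed at /tmp; a reboot is still needed for it to take effect):

vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp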

When I attempted to wipe the drive partitions I was again greeted by read-only; however, this time it was right back to the coredump location issue, which I verified by running:

esxcli system coredump partition get

which showed me the drive, so I used the unmounted final partition of the USB stick in its place:

esxcli system coredump partition set -p USBDriveNAA:PartNum

Which, sure enough, worked, and I was able to set the logical drive to have an msdos-based partition table. Yay, I can finally re-create the datastore and restore the config!
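With scratch and coredump both off the device, the wipe itself is the same partedUtil fix covered in the “Failed to create VMFS datastore” post linked above (the disk ID below is a placeholder; use the naa.* identifier from ls /vmfs/devices/disks/):

partedUtil setptbl /vmfs/devices/disks/naa.XXXXXXXXXXXXXXXX msdos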

So when the OP in that one VMware thread said “congrats, you found 50% of the problem,” I guess he was right. It goes like this:

  1. Scratch
  2. Coredump

Fix these and you can reuse the logical drive for a datastore. Let’s re-create that datastore…

This is when my heart sunk yet again…

So I created the datastore successfully, however… I then had to learn about those pesky UUIDs…

The UUID is composed of four components. Let’s understand this by taking one VMFS volume’s UUID as an example: 591ac3ec-cc6af9a9-47c5-0050560346b9

System Time (591ac3ec)
CPU Timestamp (cc6af9a9)
Random Number (47c5)
MAC Address – Management Port uplink of the host used to re-signature or create the datastore (0050560346b9)

FFS… I’ll never be able to reproduce that… and sure enough, that’s why my UUIDs no longer aligned:
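If you want to see the mismatch for yourself, a couple of stock commands make it obvious (names below are examples):

esxcli storage vmfs extent list      # shows each VMFS label, its UUID, and the backing device
ls -l /vmfs/volumes/                 # the datastore-name symlinks point at the current UUIDs

The label can be made to match the old datastore, but the UUID the symlink points at is brand new, and the UUID is what vCenter keys on.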

I figured maybe I could make the file, and create a custom symlink to that new file with the same name as the old one, but nope: “operation not permitted”:

Fuck! Well, now I don’t know if I can fix this, or if restoring the config with the same datastore name but a different UUID will fix it or make things worse… fuck me, man… not sure I want to try this… might have to do this in my home lab first…

Alright I finally was able to reproduce the problem in my home lab!

Let’s see if my idea above will work…

Step 1) Make a config backup of the ESXi host. (You should have one from before the mess-up, but I’ll use the current one.)

Step 2) Reload the host to factory defaults.

Step 3) Rename the datastore.

Step 4) Reload the config.

poop… I was afraid of that…

OK, I even tried disconnecting the host from vCenter after deleting the datastore. I could re-create it with the same name, but it always attaches with a (1) appended, because the datastore still exists as far as vCenter thinks, since the UUID can never be recovered… I heard a vCenter reboot may help; let’s see…

But first I want to go down a rabbit hole… the datastore UUID. In this case the ACTUAL datastore UUID: not the one listed in a VM’s config file (.vmx), not the one listed in the vCenter DB (the one we are trying to fix), but the one actually associated with the datastore. After much searching… it seems it is saved in the file system’s “superblock”. On most other file systems there’s some command to edit the UUID if you really need to; for VMFS, however, all I could find was re-signaturing for cloned volumes.

So it would seem that if I had simply saved the first 4 MB of the logical disk, or of the partition (not 100% sure which at this time), I could in theory have done a dd to put it back, recovered the original UUID, and then connected the host back to vCenter.
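Purely as a sketch of that theory (untested by me, and obviously destructive if you get it wrong; the device and datastore names are placeholders):

dd if=/vmfs/devices/disks/naa.XXXXXXXXXXXXXXXX of=/vmfs/volumes/OtherDatastore/ds-header.bin bs=1M count=4    # save the first 4 MB before wiping
dd if=/vmfs/volumes/OtherDatastore/ds-header.bin of=/vmfs/devices/disks/naa.XXXXXXXXXXXXXXXX bs=1M count=4    # in theory, write it back after recreating the same layout

Whether 4 MB is the right amount, and whether it needs to be the whole disk or just the partition, is exactly the part I’m unsure about, so treat this as a thought experiment rather than a procedure.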

I guess I’ll try a reboot here see what happens….

Well look at that.. it worked…

Summary

  1. Try a reboot
  2. If reboot does not Fix it call VMware Support.
  3. If you don’t have support, you can try to muck with the backend DB (do so at your own risk).

Azure AD and the ADConnect

*Note: installing Azure AD Connect (AD Sync) on a Core server is not supported, but it appears to work.

Here’s what I did, I found this MS doc for reference:

  1. I followed this to guide me to make the “primary” tenant.
    no, I did not check either checkbox, **** em!
  2. I read this content to understand the tenant hierarchy.
  3. I added a custom domain (zewwy.ca); it said sure, no problem, no federation issues, just verify. (Create a TXT record at your registrar to verify you own the domain.)
    *Refresh the page and the status will update accordingly.
  4. I proceeded to download the Azure AD Connect MSI file via the link provided after adding the custom domain.
  5. Install: (This was on Server 2016 Core)

2015.. interesting…

Click Accept, then Next.

Enter the credentials from Step 1 (or the credentials provided by your MSP/CSP/VAR).

Enter the credentials of a local domain enterprise admin account.

If you wish to do a hybrid Exchange setup, check the second checkbox. I’m not sure how to configure this later, but I’m sure there is a way; at this time it was not part of this post’s goals.

There was one snippet I missed: it appears to install SQL Express on the DC.

Then it appears to install a dedicated service.

This is Ground Control to Major Tom…

This is Major Tom to Ground Control… You’ve really made the grade!

They got all my passwords!

wait … it worked…. like what? No Errors?… No Service account creations? It actually just worked?…

Go to the Azure portal login, use my on-prem credentials… and it logged me in…

I’m kind of mind blown right now. Well Guess on the next post can cover possibly playing with M365 services. Stay tuned. 😀

How to vMotion a VM without vCenter

Well here I am… again…

In short, you figure… “Ummm, just vMotion the VM in vCenter,” and for the most part I would agree. However, what do you do if you need to move a VM, for example vCenter itself, and it just so happens to be on an ESXi host that is not in a cluster with other similar ESXi hosts, or in a cluster without EVC? (Rare, sure.) But I happened to be in exactly that situation recently.

First I thought I’d just copy the files via the ESXi console using the cp command, and for the most part it seemed to work for one smaller VM. However, when I went to do it against vCenter, it seemed to be taking longer than I had expected. After nearly an hour… I decided to see what was going on… but since I was just using cp, how do I find out the copy’s progress?

Yes, by running stat on the target file and the source file to get their sizes,

i.e. stat -c "%s" /bin/ls
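So to watch progress on a big copy, a crude loop like this does the trick (paths are examples; run it in a second SSH session):

while true; do stat -c "%s" /vmfs/volumes/DatastoreTarget/VMData/VirtualMachineName-flat.vmdk; sleep 30; done

Compare that against the same stat on the source file and you get a rough idea of how far along it is.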

Oh neat. So when I went to check, the source was 28.5 gigs… and the target was 94 gigs… wait, wait, what??? I can only assume the copy inflated things because the files were thin provisioned… not sure. I stopped the process and deleted the files…

Now I began to Google, and at first I wasn’t searching properly and found useless results such as this: Moving virtual machines with Storage vMotion (1005544) (vmware.com). Then I got my act together and found exactly what I was looking for here: How to move VMware ESXi VM to new datastore using vmkfstools | Alessandro Arrichiello (alezzandro.com)

So basically I liked this, and it was what I needed; I was just slightly annoyed that there wasn’t a nice way to do multiple VMDKs via his examples, only for all the other files. So I took the one-liner from the “other files” trick and worked out how to get the paths I needed for the VMDKs in question.

Lo and behold, here’s how to do the magic!

1) This assumes a shared datastore between hosts (if you need to move files between hosts without a shared datastore, follow this guide from VMware Arena). (I’m not sure, but I think you can leave the VMs registered; they just have to be powered down and have no snapshots.)

2) Ensure you make the directory you wish to move the VM files to.

mkdir /vmfs/volumes/DatastoreTarget/VMData

3) Copy/Clone VMDK files to target.

find "/vmfs/volumes/DatastoreSource/VMData" -maxdepth 1 -type f | grep ".vmdk" | grep -v "flat" | while read file; do vmkfstools -i $file -d thin /vmfs/volumes/DatastoreTarget/VMData/${file##*/}; done

4) Copy the remaining files to the target.

find "/vmfs/volumes/DatastoreSource/VMData/" -maxdepth 1 -type f | grep -v ".vmdk" | while read file; do cp "$file" "/vmfs/volumes/DatastoreTarget/VMData/"; done

Once done cloning and copying all necessary files, add the VM from the new datastore back to inventory.

In the vSphere client go to: Configuration->Storage->Data Browser, right click the destination datastore which you moved your VM to and click “Browse datastore”.

Browse to your VM and right click the .vmx file, then click “Add to inventory”
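If you’d rather not leave the shell, registering it from the host’s command line should work too (a sketch; the .vmx path matches the example directory used above, so adjust for your own VM name):

vim-cmd solo/registervm /vmfs/volumes/DatastoreTarget/VMData/VirtualMachineName.vmx

It returns the new VM ID on success.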

Boot up the VM to see if it works. When asked whether you copied or moved it, just answer that you moved it; it all depends on whether you want the VM ID to stay the same as it is known within vCenter. As long as you properly delete the old files and remove the old VM from the host inventory, this will complete the VM migration. If you don’t plan on deleting the old VM, or do not care about VM IDs or backups, then select “I copied it”.

Hope this helps someone.

Failed to create VMFS datastore. Cannot change host configuration.

Quick one here. I created a new logical disk via RAID 5 after an old logical unit failed from only a single bad disk.

No issues deleting the old logical disk, and creating a new one via HP storage controller commands.

However, I was greeted with this nice error.

From here, by Cookies04: ”

I had the same problem and in order to fix it I had to run three commands through an SSH connection. From what I have seen and found this error comes from having disks that were part of different arrays and contain some data on them. When I ran the commands I was then able to connect the data stores with no issues.

1. Show connected disks.

ls -lha /vmfs/devices/disks/

(Verify the disk is seen. You will probably see your disk ID then :1. This is a partition on the disk. We only need to worry about the main disk ID.)

2. Show the error on disk.

partedUtil getptbl /vmfs/devices/disks/(disk ID)

(It will probably indicate that the GPT is located beyond the end of the disk.)

3. Wipe the disk and rewrite with a basic MSDOS partition.

partedUtil setptbl /vmfs/devices/disks/(disk ID) msdos

(The output from this should be similar to msdos and the next line will be 0 0 0 0)

I hope this helps you out.”

Looks like it worked… Thanks, Cookies04!

ESXi /tmp is Full

I’ll keep this post short and to the point. Got errors in the alerts.

I was like, huh, interesting… I went to validate it on the host by logging in via SSH and typing the command:

vdf -h

At the bottom you can see /tmp space usage:

I then found out about this cool command from this thread:

find /tmp/ -exec ls -larth '{}' \;

This will list all the files and their sizes to gander at; that’s when I noticed a really large file:

I decided to look up this file and found this lovely VMware KB:

The Workaround:

echo > /tmp/ams-bbUsg.txt

The solution:

To fix the issue, upgrade VMware AMS to version 11.4.5 (included in the HPE Offline Bundle for ESXi version 3.4.5), available at the following URLs:

HPE Offline Bundle for ESXi 6.7 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-a38161c3e8674777a8c664e05a

HPE Offline Bundle for ESXi 6.5 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-7d214544a7e5457e9bb48e49af

HPE Offline Bundle for ESXi 6.0 Version 3.4.5

https://www.hpe.com/global/swpublishing/MTX-98c6268c29b3435e8d285bcfcc

Procedure

  1. Power off any virtual machines that are running on the host and place the host into maintenance mode.
  2. Transfer the offline bundle onto the ESXi host local path, or extract it onto an online depot.
  3. Install the bundle on the ESXi host.
    1. Install remotely from client, with offline bundle contents on a online depot:
      esxcli -s <server> -u root -p mypassword software vib install -d <depotURL/bundle-index.xml>
    2. Install remotely from client, with offline bundle on ESXi host:
      esxcli -s <server> -u root -p mypassword software vib install -d <ESXi local path><bundle.zip>
    3. Install from ESXi host, with offline bundle on ESXi host:
      esxcli software vib install -d <ESXi local path><bundle.zip>
  4. After the bundle is installed, reboot the ESXi host for the updates to take effect.
  5. (Optional) Verify that the vibs on the bundle are installed on your ESXi host.
    esxcli software vib list
  6. (Optional) Remove individual vibs. <vib name> can be identified by listing the vibs as shown in #5.
    esxcli software vib remove -n <vib name>

Summary

Use the commands shown to trace the source of the usage; your case may not be as easy. Once found, hopefully you can find a solution. In my case I got super lucky: other people had already found both the problem and the solution.

Veeam – More Than One Replica Candidate Found

Story Time!

The Problem!

So, real quick one here. I edited a replication job and changed its source from production to a backup dataset within the Veeam replication job settings. I went to run the replication job and was presented with an error I had not seen before…

I had an idea of what happened (I believe the original ESXi host might have been rebuilt); I’m not 100% sure, just speculating. I was pretty sure the change I made to the job was not the source of the problem.

Since I wasn’t concerned about the target VM being re-created entirely, I decided to go to Veeam’s Replicas, right-click the target VM, and pick Delete from Disk… to my amazement, the same error was presented…

Alright… kind of sucks, but here’s how I resolved it.

The Solution

Sadly, I had to right-click the target VM under Veeam’s Replicas and instead pick “Remove from Configuration”. What’s really annoying about this is that it will remove the source VM from the replication job itself as well.

Why? Dunno; Veeam’s coding choices…

So after successfully removing the target VM from Veeam’s configuration, I manually deleted the target VM on the target ESXi host. Then I had to reconfigure the replication job and point it at the source VM again. Again, if you’re interested in why that’s the case, see the link above.

After that the job ran successfully. Hope this helps someone.

Exchange Certificates and SMTP

Exchange and the Certificates

Quick post here… If you need to change certificates on an SMTP receiver using TLS… how do you do it?

You might be inclined to search and find this MS Doc source: Assign certificates to Exchange Server services | Microsoft Docs

What you might notice is how strangely the UI is designed: you simply find the certificate and, in its settings, check off SMTP as a service it can be used for.

Then, in the connector’s options, you simply check off TLS.

Any sensible person might soon wonder… if you have multiple certificates that can all have the SMTP checkbox enabled, and you can have multiple connectors with the TLS checkbox enabled… then… which cert is being used?

If you have any familiarity with IIS, you know that when you have multiple sites and go to enable HTTPS per site, you define which cert to use (usually implying the use of SNI).

When I googled this, I found someone with a similar question who was receiving an unexpected cert when testing their SMTP connections.

I was also curious how you even check that, and couldn’t find anything native to Windows; it seems you need either Python or OpenSSL binaries.
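For what it’s worth, if you do have OpenSSL handy (or a Linux box nearby), this is the usual way to see which cert a connector actually hands out (the hostname is an example):

openssl s_client -connect mail.example.com:25 -starttls smtp -servername mail.example.com

The certificate block it prints is the one Exchange picked for that session.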

Anyway, from the first post it seems my question was answered; in short, “magic”…

“The Exchange transport will pick the certificate that “fits” the best, based on whether it’s a third-party certificate, the expiration date, and if a subject name on the certificate matches what is set for the FQDN on the connector used.” -AndyDavid

Well that’s nice… and a bit further down the thread someone mentions you can do it manually, sourcing none other than the Exchange guru himself: Paul Cunningham.

So that’s nice to know.

The Default Self Signed Certificate

You may have noticed a fair amount of chatter in that first thread about the default certificate. You may have even noticed some stern warnings:

“You can’t unless you remove the cert. Do not remove the built-in cert however. ” “Yikes. Ok, as I mentioned, do not delete that certificate.”-AndyDavid

Well, the self-signed cert looks like it’s due to expire soon, and I was kind of curious: how do you create a new self-signed certificate?

So I followed along, and annoyingly you need an SMB share path accessible to the Exchange server to accomplish this task. (I get it; it’s for clustered deployments.)

Anyway, I did this, used the UI to assign the certificate to all the required services, deleted the old self-signed cert, waited a bit, closed the ECP, reopened it and…

I managed to find this ms thread with the same issue.

The first main answer was to “wait an hour or more”; yeah, I don’t think that’s going to fix it…

KarlIT700 – ”

Our cert is an externally signed cert that is due to expire next year so we wanted to keep using it and not have to generate a new self sign one.

We worked around this by just running the three PS commands below in Exchange PS

Set-AuthConfig -NewCertificateThumbprint <WE JUST USED OUR CURRENT CERT THUMPRINT HERE> -NewCertificateEffectiveDate (Get-Date)
Set-AuthConfig -PublishCertificate
Set-AuthConfig -ClearPreviousCertificate

 

Note: that we did have issues running the first command because our cert had been installed NOT allowing the export of the cert key. once we reinstalled the same cert back into the (local Computer) personal cert store but this time using the option to allow export of the cert key, the commands above worked fine.

We then just needed to restart IIS and everything was golden. :D”

Huh, sure enough, there’s this MS KB on the same issue…

The odd part is running the validation cmdlet:

(Get-AuthConfig).CurrentCertificateThumbprint | Get-ExchangeCertificate | Format-List

did return the certificate I renewed via the ECP web UI… even then, I decided to follow the rest of the steps just as Karl mentioned, using the thumbprint from the only self-signed cert that was there.

Which, sure enough, worked, and everything was working again with the new self-signed cert.

Anyway, figured maybe this post might help someone.