Hypertext String Validation via Powershell

So I had this running code:

function isURL($URL) 
{
$uri = $URL -as [System.URI]
$uri.AbsoluteURI -ne $null -and $uri.Scheme -match "http|https"
}

isURL('http://www.powershell.com')
isURL('test')
isURL($null)
isURL('zzz://zumsel.zum')
isURL('hp:')
isURL('https:')
isURL('http')
isURL('http:/incomplete')
isURL('Maybenot.http://complete') #our function has an outlier here
isURL('http://complete.should.return.true')
isURL('https://also.complete.should.return.true')

Though there was one outlier: -match does an unanchored regex match, so the scheme "maybenot.http" still matches "http". Let’s fix that…

I was having some issues playing around with different things, till I got me head out my ass and followed the KISS principle.

Found this simple reference… and made a simple change in my code…

function isURL($URL) 
{
$uri = $URL -as [System.URI]
$uri.AbsoluteURI -ne $null -and $uri.Scheme -like "http*"

}

isURL('http://www.powershell.com')
isURL('test')
isURL($null)
isURL('zzz://zumsel.zum')
isURL('hp:')
isURL('https:')
isURL('http')
isURL('http:/incomplete')
isURL('Maybenot.http://complete') #All Good now :)
isURL('http://complete.should.return.true')
isURL('https://also.complete.should.return.true')
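One caveat with -like "http*": it also lets through any scheme that merely starts with http (httpx://, for example). Here is a minimal sketch of a stricter check, assuming you only ever want plain http or https. Test-HttpUrl is a made-up name, not part of the script above.

```powershell
# Hypothetical stricter variant: accept only the exact schemes http and https
function Test-HttpUrl {
    param([string]$Url)

    # -as returns $null instead of throwing when the string cannot be parsed
    $uri = $Url -as [System.Uri]

    # Relative strings such as 'test' parse, but are not absolute URIs
    if ($null -eq $uri -or -not $uri.IsAbsoluteUri) { return $false }

    # Exact match on the scheme, no wildcards or regex involved
    return $uri.Scheme -in @('http', 'https')
}

Test-HttpUrl 'http://complete.should.return.true'
Test-HttpUrl 'Maybenot.http://complete'
Test-HttpUrl 'httpx://example.com'
```

The first call should come back True and the other two False, since neither "maybenot.http" nor "httpx" is an exact match.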

Normally, if you’re coding in other languages rather than writing scripts, you’d want to write actual test code blocks. In scripting you usually just keep things simple by using input validation. If you look online you’ll see suggestions to use Invoke-WebRequest, but that depends on a working network stack and puts load on the server for something that could easily be validated client side before any server requests are made.

Hope this helps someone.

Bonus (getting all sub paths from a URL string):

$Tet = "http://somesite.notorg/subsite/subite2/s3/doc/folder/no/matter/how/deep?"
$Array = ($Tet -split "/")
# Elements 0-2 are "http:", "" and the host name; the rest is the path
$Array = $Array[3..($Array.Length - 1)]
$FullLine = ""
foreach ($Item in $Array)
{
    $FullLine = $FullLine + "\" + $Item
}
$FullLine
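If you would rather let .NET do the parsing, the same System.Uri type used earlier exposes the path directly. A rough sketch follows. Note that unlike the split approach, it drops the trailing ? because that belongs to the query, not the path.

```powershell
$Tet = "http://somesite.notorg/subsite/subite2/s3/doc/folder/no/matter/how/deep?"

# AbsolutePath is everything between the host name and the query string
$SubPath = ([System.Uri]$Tet).AbsolutePath.Replace('/', '\')
$SubPath
```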

Mailbox Offline Exception

Since I need some email from an address I use, I figured I’d have some fun and spin up the ol’ Exchange server.

To my surprise when I attempted to login to OWA (since the front ends were loading just fine) after authentication I would be greeted with “Microsoft.Exchange.Data.Storage.MailboxOfflineException”.

My initial googling didn’t turn up much in the way of good results.

I went to the server and did the usual checks of services and such, and noticed the root cause: low disk space. I figured extending the logical volume and a reboot would suffice… nope. Problem persisted.

I decided to run the MS Exchange health checker: https://aka.ms/ExchangeHealthChecker

Even after getting everything green in the health checker, the problem persisted.

A bit more Google-fu and I was able to track down someone with a similar problem on TechNet, with some useful guidance to use eseutil.exe to check the database.

The database did indeed return “Dirty Shutdown”.

Ran the repair commands. *Note* you should try /r before using /p; if /r works you don’t need /p, as it’s a hard recovery and data loss could ensue from it. I didn’t care as it’s just lab data.
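For reference, the check and repair sequence looks roughly like the sketch below. The database path and the E00 log prefix are assumptions, so substitute your own. Get-ShutdownState is just a hypothetical helper to pull the state line out of the header dump.

```powershell
# Hypothetical helper: pull the "State:" value out of eseutil /mh output
function Get-ShutdownState {
    param([string[]]$HeaderLines)
    foreach ($line in $HeaderLines) {
        if ($line -match '^\s*State:\s*(.+?)\s*$') { return $Matches[1] }
    }
    return $null
}

# Assumed paths and log prefix; change these to match your server
$dbDir  = 'D:\ExchangeDB\MailboxDB01'
$dbFile = "$dbDir\MailboxDB01.edb"

# Only run the real commands where eseutil is actually available
if (Get-Command eseutil -ErrorAction SilentlyContinue) {
    # 1. Dump the header and check for Clean vs Dirty Shutdown
    Get-ShutdownState (eseutil /mh $dbFile)

    # 2. Soft recovery first: replay the logs into the database
    eseutil /r E00 /l $dbDir /d $dbDir

    # 3. Hard repair only if /r fails (this can throw data away)
    # eseutil /p $dbFile
}
```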

OK, checking again it returned “Clean Shutdown”. Everything I’ve read says it should be mountable now. Failed to mount….

As a last-ditch effort, I tried Googling some more in case I had missed something. I found this nice post by Eric Simson:

Step 1: Backup the Database (my case don’t care)
Step 2: Check Storage
(Was the cause, extended volume to 190GB used out of 250GB)
Step 3: Restart Exchange Services (Yeap, ran health checker)
Step 4: Check Database State (Yeap fixed it)
Step 5: Repair Exchange Database (Yeap fixed it)

Yet it still failed to mount, even after a reboot and using PowerShell AND using accept data loss…

I was about to give up when I had one final idea. I realized that since /p does a hard recovery of the DB even if the log files are lost, and the log files take up a lot of space…

At this point I had well over 50% free space on the server. I ran the repair DB command again just to be safe.

Wait.. what.. no error…. I guess I was only at 24% free space before, and that wouldn’t cut it. I don’t get why, considering -AcceptDataLoss was specified.

Go to log in to OWA…. Ehhhh!!! There’s my emails!

Hope this helps someone.

Log Searching with Powershell

Context: you have a log directory with hundreds of log files and you need to look for a specific string, but you don’t know which file it resides in.

With PowerShell we can narrow things down in two ways.

  1. If we roughly know when the log entry was written, we can constrain on time.
  2. We can then use Select-String to filter further.

$daysToCheck = $(get-date).AddDays(-2)

-2 in this case indicates I want to find files that were modified at most 2 days ago. This means from right now, go back a max of 2 days.

Get-ChildItem -Recurse | ?{$_.LastWriteTime -gt $daysToCheck} | Select-String "String to Search for" -list | Select Path

In this example it’ll search the current working directory, as no path was given to Get-ChildItem. The -List switch is important: it lists each file the string was found in only once; otherwise the file path is listed for every instance of the string within the file.

This will list all the files that contain the string in question. What you do with this list is up to you. However, you at least now know where to look further for more information on whatever it is you might be looking for.
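If you find yourself doing this a lot, the two steps wrap up nicely into a function. This is only a sketch: Find-RecentLogString and its parameter names are my own invention, and I added -SimpleMatch so the search text is treated literally rather than as a regex (drop it if you want regex behaviour like the one-liner above).

```powershell
# Hypothetical wrapper around the time filter + Select-String combo
function Find-RecentLogString {
    param(
        [string]$Path = '.',     # directory to search, defaults to the current one
        [string]$Pattern,        # text to look for
        [int]$Days = 2           # how far back to go
    )

    $cutoff = (Get-Date).AddDays(-$Days)

    Get-ChildItem -Path $Path -Recurse -File |
        Where-Object { $_.LastWriteTime -gt $cutoff } |
        Select-String -Pattern $Pattern -SimpleMatch -List |
        Select-Object -ExpandProperty Path
}
```

Usage would be something like Find-RecentLogString -Path C:\Logs -Pattern "String to Search for" -Days 2, where C:\Logs is a placeholder for your own log directory.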

Hope this helps someone.

WinRM on Server Core

Prerequisites

  • AD with an Enterprise CA
    Why? For easier certificate management. If you want step-by-step details using a self-signed cert, you can read this blog post by Tyler Muir. Thanks, Tyler, for your wonderful blog post; it was a real help to me.
  • Server Core (2016+)
  • A Certificate Template published and available to client machines

Now you *Technically* don’t need a template if you are using a self-signed cert. However, there are some prerequisites for the certificate. The official Microsoft source states:

“WinRM HTTPS requires a local computer Server Authentication certificate with a CN matching the hostname to be installed. The certificate mustn’t be expired, revoked, or self-signed.”

If you have a valid cert but it isn’t of the Server Authentication type, you will get an error:

Which is super descriptive and to the point.

Implementation

Basic Implementation

If you don’t have a Server Authenticating certificate, consult your certificate administrator. If you have a Microsoft Certificate server, you may be able to request a certificate using the web certificate template from HTTPS://<MyDomainCertificateServer>/certsrv.

Once the certificate is installed type the following to configure WINRM to listen on HTTPS:

winrm quickconfig -transport:https

If you don’t have an appropriate certificate, you can run the following command with the authentication methods configured for WinRM. However, the data won’t be encrypted.

winrm quickconfig

Example:

On my domain-joined Core server, using a “Computer”/Machine template certificate:

powershell
cd Cert:\LocalMachine\My
Get-Certificate -Template Machine

Ensure you exit out of PowerShell before running the winrm commands:

winrm quickconfig -transport:HTTPS

Congrats you’re done.

Advanced Implementation

Now remember in the above it stated “If you don’t have a Server Authenticating certificate, consult your certificate administrator. If you have a Microsoft Certificate server, you may be able to request a certificate using the web certificate template ”

That’s what this section hopes to cover.

There’s only one other pre-req I can think of besides the primary ones mentioned at the start of this blog post.

Once these are met, request a certificate from the CA and ensure it’s installed on the client machine you wish to configure WinRM on. Once installed grab the certificate Thumbprint.
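To grab the thumbprint from the command line, something like the following sketch should do. It assumes the certificate you want is the newest one in the machine store that has a private key and the Server Authentication EKU (OID 1.3.6.1.5.5.7.3.1). Select-ServerAuthCert is a hypothetical helper name.

```powershell
# Hypothetical helper: pick the newest cert that has a private key
# and lists Server Authentication in its enhanced key usage
function Select-ServerAuthCert {
    param([object[]]$Certificates)

    $Certificates |
        Where-Object {
            $_.HasPrivateKey -and
            ($_.EnhancedKeyUsageList.ObjectId -contains '1.3.6.1.5.5.7.3.1')
        } |
        Sort-Object NotBefore -Descending |
        Select-Object -First 1
}

# The Cert: drive only exists on Windows, so guard the real lookup
if (Test-Path 'Cert:\LocalMachine\My') {
    $cert = Select-ServerAuthCert (Get-ChildItem 'Cert:\LocalMachine\My')
    $cert.Thumbprint   # paste this into CertificateThumbprint
}
```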

Creating the listener using the certificate ThumbPrint:

winrm create winrm/config/Listener?Address=*+Transport=HTTPS '@{Hostname="<YOUR_DNS_NAME>"; CertificateThumbprint="<COPIED_CERTIFICATE_THUMBPRINT>"}'

Manually configuring the Firewall:

netsh advfirewall firewall add rule name="Windows Remote Management (HTTPS-In)" dir=in action=allow protocol=TCP localport=5986

Start the service:

net start winrm

Issues

Failed to create listener

Error: “The function: “HttpSetServiceConfiguration” failed unexpectedly. Error=1312.”

Resolution: Ensure the machine actually has the key required for the certificate. See Reference Three in this blog for more details.

Not Supported Certificate

Error: “The requested certificate template is not supported by this CA”

Resolution: Ensure you typed the certificate template name correctly. If so, ensure it is published to the CA signing the certificate.

References

Zero

official Microsoft source

One

Straight to the point command references at site below:
ITOM Practitioner Portal (microfocus.com)

Two

Another great source that covers manual setup of WinRM:
Visual Studio Geeks | How to configure WinRM for HTTPS manually

Three

When using the MMC snap-in pointed at a Core server’s certificate store, I generated the cert request and imported the certificate, all remotely via the MMC certificate snap-in. Whenever I would go to create the listener it would error out with “The function: “HttpSetServiceConfiguration” failed unexpectedly. Error=1312.”

I could only find this guy’s blog post covering it, where he seems to indicate that he wasn’t importing the key for the cert.

Powershell WinRM HTTPs CA signed certificate configuration | vGeek – Tales from real IT system Administration environment (vcloud-lab.com)

This reminded me of a similar issue using Microsoft User Migration Tool, where the cert store showed it had the cert key (little key icon in the cert MMC snap-in) but the key wasn’t actually available. I felt this was the same case. Creating the request from the client machine directly, copying it to the CA, signing it, copying the signed cert back to the client machine and installing it manually resolved the issue.

I might have been able to just use the cert I created via the MMC snap-in by running

certutil -repairstore my <serial number>

I did not test this and simply created the certificate (Option 2) from scratch.

Four

“The requested certificate template is not supported by this CA.

A valid certification authority (CA) configured to issue certificates based on this template cannot be located, or the CA does not support this operation, or the CA is not trusted.”

This one led me down a rabbit hole for a long time. Whenever I had everything in place and requested the certificate via PowerShell I would get this error. If you Google it you will get endless posts saying all you need to do is “Publish it to your CA”, such as this and this.

It wasn’t until I attempted to manually create the certificate (Option 2) that it finally stated the proper reason, which was:

“A certificate issued by the certificate  authority cannot be installed. Contact your system administrator.
a certificate chain could not be built to a trusted root authority.”

I then checked, and sure enough (I have no clue how) my DC was missing the Offline Root Certificate in its Trusted Root Authority store.

Again, all buggy: attempting to do it remotely via the Certificates MMC snap-in caused an error, so I had to manually copy the offline root cert file to the domain controller and install it with certutil.

This error can also stem from specifying a certificate template that doesn’t exist on the CA, hence all the blog posts saying to “publish it”. HOWEVER, in my case I had assumed “Computer” was the template name, when (as seen in the MMC cert snap-in) that is only the display name; the actual name of this template is “Machine”.

Five

I just have to share this, cause this trick saved my bacon. If you use RDP to manage a core server, you can also use the same RDP to copy files to the core server. Since you know, server core doesn’t have a “GUI”.

On windows server core, how can I copy file located in my local computer to the windows server? – Server Fault

In short

  1. Enable your local drive under the Local Resources tab of RDP before connecting.
  2. Open Notepad in the RDP session on the Core server.
  3. Press CTRL+O (or File->Open). Change the file type to all.
  4. Use Notepad’s file explorer to move files. 😀

Six

Another thing to note about Core Server 2016:

Unable to Change Security Settings / Log on as Batch Service on Server Core (microsoft.com)

Server Core 2016 does not have the added capability via FOD (Features on Demand).

Thus it does not have secpol or mmc.exe natively. To configure settings, either use Group Policy or, if testing on standalone instances or Server Core 2016, define the security policies on a system with a GUI installed, export them, and import them into Core using secedit.

¯\_(ツ)_/¯

Microsoft Certificate Auto-enrollment

Source: Certificate Autoenrollment in Windows Server 2016 (part 3) – PKI Extensions (sysadmins.lv)

Thanks to Vadims Podans for his detailed write up.

Source 2: Basic: How to set up automatic certificate enrollment in Active Directory – Druva Documentation

Source 3 (Official): Configure server certificate auto-enrollment | Microsoft Docs

Overview

Autoenrollment configuration in general consists of three steps: configure the autoenrollment policy, prepare certificate templates and prepare certificate issuers. Each configuration step is described in the next sections.

Pre-requirements

  • Working AD
  • Enterprise CA
  • Proper Permissions (This post assumes domain admin rights)

Setup

Configure Autoenrollment Policy

  1. Start Group Policy editor. In Active Directory environment, use Group Policy Management Console (gpmc.msc). In workgroup environment, use Local Group Policy Editor (gpedit.msc);
  2. Expand to
 Computer Configuration\Policies\Windows Settings\Security Settings\Public Key Policies
  3. Double-click on Certificate Services Client – Auto-enrollment;
  4. Set Configuration Model to Enabled;
  5. Configure the policy save settings:
  6. Repeat steps 2-5 for User Configuration node.

*Note 1* You technically don’t *NEED* a policy; the minimum you do need is the registry setting the policy defines. The reason for the policy is obviously scalability. The key it defines is:

Key: SOFTWARE\Policies\Microsoft\Cryptography\AutoEnrollment
Value: AEPolicy
Type: DWORD

Of course, HKLM or HKCU will be used depending on which one was defined in the policy, so if you want user auto-enrollment ensure the value is defined in HKCU. If you want machine auto-enrollment ensure it is defined in HKLM.

*Note 2* Vadims doesn’t cover what each value represents, or what possible values are available. I was only able to find this source on it which made the following statements:

“Hi,
http://technet.microsoft.com/en-us/library/cc731522.aspx

The two checkboxes (point 7) control the value of AEPolicy
0 = non
1 = second
6 = first
7= both selected”
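To see what a given machine currently has, you can read the value straight from the registry. A small sketch follows. The meanings below are taken only from the forum quote above (I haven’t found them documented officially), and Get-AEPolicyMeaning is a hypothetical helper name.

```powershell
# Hypothetical helper mapping AEPolicy to the two policy checkboxes,
# based solely on the values quoted above
function Get-AEPolicyMeaning {
    param([int]$Value)
    switch ($Value) {
        0 { 'neither checkbox selected' }
        1 { 'second checkbox only' }
        6 { 'first checkbox only' }
        7 { 'both checkboxes selected' }
        default { "unknown value $Value" }
    }
}

# Machine policy lives under HKLM; swap in HKCU: for the user policy
$key = 'HKLM:\SOFTWARE\Policies\Microsoft\Cryptography\AutoEnrollment'
if (Test-Path $key) {
    $prop = Get-ItemProperty -Path $key -Name AEPolicy -ErrorAction SilentlyContinue
    if ($prop) { Get-AEPolicyMeaning $prop.AEPolicy }
}
```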

Configuring Certificate Templates

This section covers how to configure certificate templates.

Default settings

The following are the default settings:

  • Both domain administrators from the root domain, and enterprise administrators for fresh installations of Windows Server 2003 (and newer) domains may configure templates.
  • Certificate template ACLs are viewed in the Certificate Templates MMC snap-in.
  • Certificate templates can be cloned or edited using the Certificate Templates MMC snap-in.
  • Certificate templates need to be published before they can be used.
  • Authenticated Users have Read permission on the Template. (Leave it be)

Creating a new template for the autoenrollment of Web Server Cert

In this exercise we will create a certificate template intended for Server Authentication, usually for a web server (IIS). As an additional requirement, the certificate will be stored on the server. To create a new template for autoenrollment for a web server:

  1. Log on to a computer where ADCS Remote Server Administration Tools (RSAT) are installed with Enterprise Admins permissions;
  2. Press Win+R key combination on the keyboard.
  3. In the Run dialog box, type certtmpl.msc, and then click Ok.
    The Certificate Templates MMC snap-in may also be invoked using the Certification Authority MMC snap-in by selecting the Certificate Templates folder, right-clicking, and then selecting Manage.
  4. In the console tree, click Certificate Templates.
  5. In the details pane, right-click the Web Server template, and then click Duplicate Template.
  6. The Compatibility tab of the new template properties dialog box appears. Configure compatibility settings to minimum OS version that will consume this template and minimum OS version of CA server that will issue certificates based on this template. (In my Lab Server 2016, and client Windows 10)
  7. On the General tab, give it a name. Do not publish in AD. If you want more info on these 2 checkboxes, read Vadims’ guide on creating a smart card cert.
  8. Click the Request Handling tab. This tab is used to define how the certificate request should be processed. Use default settings in this tab.
  9. Switch to Cryptography tab:
    I use Key Storage Provider, RSA, 2048, Requests can use any provider.
  10. Switch to Subject Name tab. This tab is used to define how the subject name and certificate properties will be built.
    *IMPORTANT* Check off “Use subject information from existing certificates for autoenrollment renewal requests”.
  11. Switch to Security tab. This tab is used to define which users or groups may enroll or autoenroll for a certificate template. A user or group must have the Read, Enroll, and Autoenroll permissions to automatically be enrolled for a certificate template.
    In our case any web server computers joined to the domain will be granted Read, Enroll, Autoenroll permissions.

Publishing the Certificate Template

Once the certificate template is prepared for autoenrollment, it must be added to the Enterprise CA server for issuance. This section describes how to add a certificate template to the CA for issuance using the Certification Authority MMC snap-in. For examples using certutil and PowerShell, see Vadims’ post.

*Note* Standalone CA does not support certificate templates

Configuring CA using MMC

The most convenient way to add certificate template to CA is to use Certification Authority MMC snap in:

  1. Log on to CA server or computer with Remote Server Administration Tools installed with CA Administrator permissions;
  2. Press Win+R key combination on the keyboard;
  3. In the Run… dialog, type “certsrv.msc”;
  4. If necessary, click on root node, then press Action menu and select Retarget Certification Authority to connect to desired CA server;
  5. When connected, expand CA node and select Certificate Templates folder. You will see certificate templates supported for issuance by this CA.
  6. In Action menu, select New and Certificate Template to Issue menu. In the opened dialog, select target template and press Ok to finish. Ensure that certificate template is listed in Certification Authority MMC console.

Request and Issue Initial Certificate

Now, with all the pre-reqs in place, all one has to do is log into the domain-joined machine and request a certificate. In our example, since we picked Server 2016 and the recipient as Windows 10, the template is saved as a version 4 template.

*Note* Version 3 and 4 templates do not show up under the CA’s web enrollment option.

If everything was done correctly, you should be able to see the template listed in the Certificates snap-in for the machine on the client side:

Fill in a common name and a couple of DNS name fields to keep browsers’ SAN requirements happy. Once filled in, the Enroll option should be available.

Testing and Validating

Well now that we got that, not sure how to test it getting renewed outside of the time going by…

I did discover this command by searching for an answer:

certutil -pulse

Well, that doesn’t tell me much… wonder what the official MS source has to say…

Real mature, Microsoft… This isn’t new either; here’s a bit more detailed answer from good ol’ TechNet (RIP).

“Certutil -pulse will initiate autoenrollment requests.

It is equivalent to doing the following in the CertMgr.msc console (in Vista and Windows 7)

Right-click Certificates , point to All Tasks , click Automatically Enroll and Retrieve Certificates .

The command does require that

– any autoenrollment GPO settings have already been applied to the target user or computer

– a certificate template enables Read, Enroll and Autoenroll permissions for the user or a global or universal group containing the user

– The group membership is recognized in the users Token (they have logged on after the membership was added”

This action is available only when you right click the very top “Certificates” node, not the sub folders node under the Personal folder.

So again, I wasn’t sure how to validate that it will work when the time comes, as running the above action in certmgr only gave me the option to enroll in the Computer certificate template. All the other templates were marked as “unavailable”, even though I had manually enrolled the cert above without issue. That made me wonder if there’s a difference between auto-renewal of a certificate and auto-enrollment.

I found this post from a “Field Engineer” which seemed to conclude that they are tied together in some form.

“The Autoenrollment Group Policy has to be enabled for this feature to work. This feature will also work on certificates issued prior to enabling it.”

However no other details. From what I can tell.. The command certutil -pulse triggers the following Scheduled Task:

Microsoft\Windows\CertificateServicesClient\SystemTask

Which AFAIK will only trigger certificate issuance for certs that are about to expire. How close to expiry? I’m not sure. There was the option in the template to log at 10% remaining, but I’m not sure that’s the threshold it uses to trigger a certificate renewal.

I’m not sure if there’s a specific parameter you can set to tell it to renew a certificate before this expiry time.
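I don’t have an answer for the threshold, but you can at least watch how much lifetime each certificate has left and see when autoenrollment kicks in. A small sketch follows. Get-CertLifetimeRemaining is a hypothetical helper, and it makes no claim about what percentage actually triggers a renewal.

```powershell
# Hypothetical helper: percentage of a certificate's validity period still left
function Get-CertLifetimeRemaining {
    param(
        [datetime]$NotBefore,
        [datetime]$NotAfter,
        [datetime]$Now = (Get-Date)
    )
    $total = ($NotAfter - $NotBefore).TotalSeconds
    $left  = ($NotAfter - $Now).TotalSeconds
    [math]::Round(100 * $left / $total, 1)
}

# The Cert: drive only exists on Windows, so guard the real lookup
if (Test-Path 'Cert:\LocalMachine\My') {
    Get-ChildItem 'Cert:\LocalMachine\My' | ForEach-Object {
        [pscustomobject]@{
            Subject     = $_.Subject
            NotAfter    = $_.NotAfter
            PercentLeft = Get-CertLifetimeRemaining $_.NotBefore $_.NotAfter
        }
    }
}
```

Run it before and after a certutil -pulse and compare the NotAfter dates to see whether anything was actually renewed.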

If you know please leave a comment.

Final Note… Ensure you enable the auto rebind feature introduced in IIS 8.5 and later. I’ve had this bite me.

Renew Subordinate CA Certificate to Offline Root

Setup

If you follow other posts on renewing a sub-CA certificate, they usually have two things to make their lives easier:

  1. A server with a GUI
  2. an Online Enterprise Root CA

We have neither of those. We have:

  1. an Offline Root CA (at least it has a GUI)
  2. A Server Core Sub CA

Like many times in the past, the MMC remote snap-in pointed at a remote Core server is lacking the context menu or ability to do what you need. For example, this poor guy who posted in Windows QA.

Steps

Step 1) Log into the Server Core Sub CA.

RDP, direct console, whatever floats your boat on this one.

Step 2) Run the following command:

Certutil -renewCert ReuseKeys

Now you get a pop-up asking you to select an online CA server to sign the cert. In small writing, the pop-up says you can click Cancel and manually sign the request, which is saved under the c:\ path.

Step 3) Copy the Request File to the Offline CA

Now save the request file and copy it onto your Offline Root CA. How you accomplish this is up to you and your setup. If it’s all virtualized, do the vUSB trick I often do. If you have RDP access to the Sub CA, use this RDP resource and notepad trick.

Step 4) Issue Certificate on Offline CA
– Open Certificate Authority Tool.
– Right Click Server Node -> All Tasks -> Submit New Request -> Select the request file created in Step 2
– Click on Pending Requests Folder -> Right Click Certificate -> Issue
– Go back to Issued Certificates Folder -> Double Click new Certificate -> Details Tab -> Copy to File -> Save it

Step 5) Copy Issued Certificate back to Sub CA

Whatever means you used for Step 3, do it in reverse.

Step 6) Install the new Certificate on the Sub CA

certutil -installcert <path to signed certificate>

OK, Stop the Service:

sc stop CertSvc

Then Start it back up:

sc start CertSvc

Then from a remote management machine with the Cert Authority MMC Snap-in added, check the properties on the Sub-CA. You should see the new certificate listed.

Hope this Helps someone.

Filter n Find Contextually from CMD output

C:\> command > c:\txtfile.txt
C:\> powershell
PS C:\> Get-Content c:\txtfile.txt | Select-String -Pattern "<string you're interested in finding>" -Context 2,4

Context 2,4 means 2 lines above, and 4 lines below the string pattern found.
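If you are already in PowerShell you can skip the temp file and pipe the command straight in, e.g. ipconfig /all | Select-String -Pattern "DNS" -Context 2,4. Here is a tiny self-contained sketch with made-up sample lines, just to show what the context values give you:

```powershell
# Made-up sample output standing in for whatever your command prints
$lines = 'one', 'two', 'three', 'MATCH here', 'five', 'six', 'seven', 'eight', 'nine'

$hit = $lines | Select-String -Pattern 'MATCH' -Context 2,4

$hit.Context.PreContext    # the 2 lines before the match
$hit.Line                  # the matching line itself
$hit.Context.PostContext   # the 4 lines after the match
```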

Super useful trick.

Veeam Backup Failed – SSL/TLS handshake failed

Another day, another issue.

Processing VirtualMachineName Error: Cannot get service content.
Soap fault. SSL_ERROR_SYSCALL
Error observed by underlying SSL/TLS BIO: Unknown errorDetail: 'SSL/TLS handshake failed', endpoint: 'https://vcenter.domain.localca:443/sdk'
SOAP connection is not available. Connection ID: [vcenter.domain.local].
Failed to create NFC download stream. NFC path: [nfc://conn:vcenter.domain.local,nfchost:host-#,stg:datastore-#@VirtualMachineName/VirtualMachineName.vmx].
--tr:Unable to open source file

If you come across this error, check if you have any firewalls between your Veeam proxy Server, and the vCenter server.

I’ve blogged about this type of problem before, but in that case it was DNS, in this case it’s a Firewall.
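A quick way to rule the firewall in or out is to test the port from the Veeam proxy itself. On Windows the built-in Test-NetConnection cmdlet with -Port 443 does the job. The sketch below does the same thing with plain .NET so it also runs where that cmdlet isn’t available. Test-TcpPort is a hypothetical helper name, and the host name is the placeholder from the error above.

```powershell
# Hypothetical helper: returns $true if a TCP connection can be opened in time
function Test-TcpPort {
    param(
        [string]$HostName,
        [int]$Port,
        [int]$TimeoutMs = 3000
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $task = $client.ConnectAsync($HostName, $Port)
        # Wait returns $false on timeout and throws if the connection is refused
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    } catch {
        return $false
    } finally {
        $client.Close()
    }
}

# Run from the proxy: False here points at DNS or a firewall in the path
Test-TcpPort -HostName 'vcenter.domain.local' -Port 443
```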

In most cases it’s either:

1) PEBKAC
2) DNS
3) Firewall <— This Case
4) A/V
5) a Bug

You may have noticed a lack of posts lately. It’s not that I can’t figure out content to share, it’s a lack of motivation. I’ve been burnt out with work since the pandemic, when everyone got a bunch of free money and time off… I just got more work. Did I get more pay? I’ll let you decide. The amount of support calls, sheesh. That’s my only real motivation: not to be hassled. That and the fear of losing my job, but y’know, it will only make someone work just hard enough not to get fired.

This site has earned me $0, so that also doesn’t help. Thanks everyone for all the support keeping this site alive.

Send an Email using Powershell

Source: Send-MailMessage (Microsoft.PowerShell.Utility) – PowerShell | Microsoft Docs

Build your parameter hashtable….

$mailParams = @{
SmtpServer = 'heimdall.dgcm.ca'
Port = 25
UseSSL = $false
From = 'notifications@dgcm.ca'
To = 'nos_rulz@msn.com'
Subject = ('ON-PREM SMTP Relay - ' + (Get-Date -Format g))
Body = 'This is a test email using ON-PREM SMTP Relay'
DeliveryNotificationOption = 'OnFailure', 'OnSuccess'
}

And then send it, splatting the hashtable with @ instead of passing it with $….

Send-MailMessage @mailParams

if there’s any pre-reqs required I’ll update this blog post. That should be it though. Easy Peasy Lemon Squeezy.

Fix Orphaned Datastore in vCenter

Story

The Precursor

I did NOT want to write this blog post. This post comes from the fact that VMware is not perfect and I’m here to air some dirty laundry…. Let’s get started.

*UPDATE* Read on if you want to get into the nitty gritty; otherwise go to the Summary section. For me, rebooting the VCSA resolved the issue.

The Intro

OK, I’ll keep this short. 1 vCenter, 2 hosts, 1 cluster. 1 host started to act “weird”: random power-offs, boots normally but the USB controller is not working.

Now this was annoying … A LOT, so I decided I would install ESXi on the local RAID array instead of USB.

Step 1) Make a backup of the ESXi config.

Step 2) Re-install ESXi. When I went to re-install ESXi it stated all data in the existing datastore would be deleted. Whoops, let’s move all data first.

Step 2a) I removed all data from the datastore

Step 2b) Delete the datastore, and THIS IS THE STEP THAT CAUSED ME ALL FUTURE GRIEF IN THIS BLOG POST! DO NOT FORGET TO DO THIS STEP! IF YOU DO YOU WILL HAVE TO DO EVERYTHING ELSE THIS POST IS TALKING ABOUT!

Unmount, and delete the datastore. YOU HAVE BEEN WARNED!

*during my testing I found this was not always the case. I was however able to replicate the issue in my lab after a couple of attempts.

Step 3) Re-install ESXi

Step 4) Reload saved Config file, and all is done.

This is when my heart sank.

The Assumptions

I had the following wrong assumptions during this terrible mistake:

  1. Datastore names are saved in the backup config.
    INCORRECT – Datastore names are literally volume labels and stay with the volume on which they were created.
    UUID is stored on the device FS SuperBlock.
  2. Removing an orphaned Object in vCenter would be easy.
  3. Renaming a Datastore would be easy.
    1. If the host is managed by vCenter Server, you cannot rename the datastore by directly accessing the host from the VMware Host Client. You must rename the datastore from vCenter Server.
  4. Installing on USB drive defaults all install mount points on the USB drive.
    INCORRECT – There’s magic involved.

Every one of these assumptions burnt me hard.

The Problem

So it wasn’t until I clicked on the datastore section of vCenter that my heart sank. The old datastore was still listed, and attempting to right-click and delete the orphaned datastore hit me with another surprise…. the options were greyed out. I went to Google to see if I was alone. It turns out I was not, but the blog source I found did not seem very promising either… How to easily kill a zombie datastore in your VMware vSphere lab | (tinkertry.com)

Now this blog post title is very misleading. One can say the solution he used was “easy”, but guess what… it’s not supported by VMware. As he even states: “Warning: this is a bit of a hack, suited for labs only”. Alright, so this is no good so far.

There was one other notable source. This one mentioned looking out for related objects that might still be linked to the Datastore, in this case there was none. It was purely orphaned.

Talking to others in #VMware on Libera Chat, I was told it might be linked to a scratch location, which is probably the reason the option is greyed out. While this might be reasonable for a host, the scratch location depends on the host itself, not vCenter, so vCenter should have the ability to clear the datastore; the ESXi host itself determines where the scratch location is stored (foreshadowing: this causes me more grief).

In my situation, unlike tinkertry’s situation, I knew exactly what caused the problem, I did not rename the datastore accordingly. Since the datastore name was not named appropriately after being re-created, it was mounted and shown as a new datastore.

The Plan

It’s one thing to fuck up, it’s another to fess up, and it’s yet another to have a plan. If you can fix your mistake, it’s prime evidence of learning and growing as you live life. One must always persevere. Here’s my plan.

Since building the host new and restoring the config with the wrong datastore got me here, I figured if I did the same but with the proper datastore in place, I should be able to remove it by bringing it back up.

I had a couple of issues to overcome. The first was my 3rd assumption: that renaming a datastore was easy. Usually it is; however, in this case attempting to rename it to the same name as the missing datastore simply told me the datastore already exists. Sooo poop, you can’t do it directly from an ESXi host unless it’s not managed by vCenter. So as you can tell, it’s a catch-22: the only way past this was my plan, which was the same as how I got into this mess to begin with. But sadly I didn’t know how deep a hole I had dug.

So after installing brand new on another USB stick, I went to create the new datastore with the old name, overwriting the partition table the ESXi install created… and you guessed it: Failed to create VMFS datastore. Cannot change host configuration. – Zewwy’s Info Tech Talks

Obviously I had gone through this before, but this time was different. It turned out that attempting to clear the GPT partition table and replace it with an msdos (MBR) based one failed, telling me it was a read-only disk. Huh?

Googling this, I found this thread which seemed to be the root cause… Yeap my 4th assumption: “Installing on USB drive defaults all install mount points on the USB drive.”

So, doing an “ls -l”, an “esxcli storage filesystem list” and then a “vmkfstools -P MOUNTPOINT”, it was very easy to discover that the scratch and coredump were pointing to the local RAID logical volume I created, which overwrote the initial datastore when ESXi was installed. Talk about a major annoyance. Like, I get why it did what it did, but in this case it is a major hindrance, as I can’t clear the logical disk partition to create a new one which will hold the datastore I need to have mounted there… mhmmm

So I kept trying to change the coredump location and the scratch location, and on reboot the host kept picking the old location on the local RAID logical volume, which kept preventing me from moving forward. It didn’t matter whether I did it via the GUI or via the backend command “vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp/scratch”. The VMware KB mentions creating this path first with mkdir, but what I found was that the path was not persistent, and it would seem that since it doesn’t exist at boot, ESXi changes it via its usual “Magic”:

“ESXi selects one of these scratch locations during startup in this order of preference:
The location configured in the /etc/vmware/locker.conf configuration file, set by the ScratchConfig.ConfiguredScratchLocation configuration option, as in this article.
A Fat16 filesystem of at least 4 GB on the Local Boot device.
A Fat16 filesystem of at least 4 GB on a Local device.
A VMFS datastore on a Local device, in a .locker/ directory.
A ramdisk at /tmp/scratch/.”
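
As a rough illustration of that fall-through order (my reading of the quoted list, not VMware’s actual implementation), here is a minimal Python sketch. The function name `pick_scratch`, the candidate labels and the usable/unusable flags are all made up for this example; they model a host where the configured path is missing at boot and a local VMFS datastore exists.

```python
from typing import Optional, Sequence, Tuple


def pick_scratch(candidates: Sequence[Tuple[str, bool]]) -> Optional[str]:
    """Return the first usable location, walking the list in order of preference."""
    for location, usable in candidates:
        if usable:
            return location
    return None


# Hypothetical boot-time state: the labels paraphrase the quoted list and the
# flags are assumptions, not values read from a real host.
boot_candidates = [
    ("configured location (/tmp/scratch, missing at boot)", False),
    ("FAT16 >= 4 GB on the local boot device", False),
    ("FAT16 >= 4 GB on a local device", False),
    ("VMFS datastore on a local device (.locker/)", True),
    ("ramdisk at /tmp/scratch/", True),
]

chosen = pick_scratch(boot_candidates)
print(chosen)
```

With a configured path that does not exist at boot, the first usable entry further down the list wins, which would match the host falling back to the local datastore on every reboot.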

So in this case, I found this post about a similar issue, and it turns out setting the scratch location to just /tmp worked.

When I attempted to wipe the drive partitions I was again greeted by read-only. This time, however, it came down to the coredump location, which I verified by running:

esxcli system coredump partition get

which showed me the drive, so I used the unmounted final partition of the USB stick in its place:

esxcli system coredump partition set -p USBDriveNAA:PartNum

Which sure enough worked, and I was able to give the logical drive an msdos-based partition table. Yay, I can finally re-create the datastore and restore the config!

So when the OP in that one VMware thread said “congrats, you found 50% of the problem”, I guess he was right. It goes like this:

  1. Scratch
  2. Coredump

Fix these and you can reuse the logical drive for a datastore. Let’s re-create that datastore…

This is when my heart sank yet again…

So I created the datastore successfully, however… I had to learn about those pesky UUIDs…

The UUID is composed of four components. Let’s understand this by taking the example of one VMFS volume’s UUID: 591ac3ec-cc6af9a9-47c5-0050560346b9

System Time (591ac3ec)
CPU Timestamp (cc6af9a9)
Random Number (47c5)
MAC Address – Management Port uplink of the host used to re-signature or create the datastore (0050560346b9)
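
To make the breakdown concrete, here is a minimal Python sketch that splits the example UUID into those four fields. The helper name `split_vmfs_uuid` is made up for this example, and treating the first field as Unix epoch seconds in hex is an assumption on my part based on the “System Time” label above.

```python
from datetime import datetime, timezone


def split_vmfs_uuid(uuid: str) -> dict:
    """Split a VMFS UUID into the four fields described above."""
    system_time, cpu_timestamp, random_part, mac = uuid.split("-")
    # Assumption: the first field is seconds since the Unix epoch, in hex.
    seconds = int(system_time, 16)
    return {
        "system_time_hex": system_time,
        "system_time_utc": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "cpu_timestamp": cpu_timestamp,
        "random": random_part,
        # Re-insert the colons to get the usual MAC address notation.
        "mac": ":".join(mac[i:i + 2] for i in range(0, len(mac), 2)),
    }


parts = split_vmfs_uuid("591ac3ec-cc6af9a9-47c5-0050560346b9")
for name, value in parts.items():
    print(f"{name}: {value}")
```

Three of the four fields depend on when and where the datastore was created, which is why a re-created datastore ends up with a different UUID even if the name matches.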

FFS… I’ll never be able to reproduce that… and sure enough that’s why my UUIDs no longer aligned.

I figured maybe I could make the file and create a custom symlink to that new file with the same name, but nope, “operation not permitted”.

Fuck! Well, now I don’t know if I can fix this, or if restoring the config with the same datastore name but a different UUID will fix it or make things worse… fuck me man… not sure I want to try this… might have to do this on my home lab first…

Alright I finally was able to reproduce the problem in my home lab!

Let’s see if my idea above will work…

Step 1) Make a config backup of the ESXi host. (I should have had one from before the mess-up, but will use the current one.)

Step 2) Reload host to factory defaults.

Step 3) Rename datastore.

Step 4) Reload config.

poop… I was afraid of that…

OK, I even tried disconnecting the host from vCenter. After deleting the datastore I could recreate it with the same name, but it always attaches with (1) appended, because as far as vCenter is concerned the datastore already exists, since the UUID can never be recovered… I heard a vCenter reboot may help, let’s see…

But first I want to go down a rabbit hole… the datastore UUID. In this case the ACTUAL datastore UUID: not the one listed in a VM’s config file (.vmx), not the one listed in the vCenter DB (that we are trying to fix), but the one actually associated with the datastore… After much searching, it seems it is saved in the file system’s “SuperBlock”. In most other file systems there’s some command to edit the UUID if you really need to. However, for VMFS all I could find was re-signaturing for cloned volumes.

So it would seem that if I had simply saved the first 4MB of the logical disk (or the partition, I’m not 100% sure which at this time), I could in theory have used dd to write it back, recover the original UUID, and then connect the host back to vCenter.
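
For what it’s worth, the “save the first 4MB” half of that idea is just a bounded byte copy. Below is a minimal Python sketch of it, run against a throwaway file standing in for the disk. The function name `save_leading_bytes` and the file names are made up for this example, and nothing here has been tested against a real VMFS volume; whether writing such a copy back would actually restore the UUID remains the unverified theory above.

```python
import os
import tempfile

FOUR_MIB = 4 * 1024 * 1024


def save_leading_bytes(source_path: str, backup_path: str,
                       size: int = FOUR_MIB, chunk: int = 1024 * 1024) -> int:
    """Copy the first `size` bytes of source_path to backup_path; return bytes copied."""
    copied = 0
    with open(source_path, "rb") as src, open(backup_path, "wb") as dst:
        while copied < size:
            block = src.read(min(chunk, size - copied))
            if not block:  # source is shorter than `size`
                break
            dst.write(block)
            copied += len(block)
    return copied


# Demo on a 5 MiB stand-in file (hypothetical, NOT a real disk device).
with tempfile.TemporaryDirectory() as workdir:
    fake_disk = os.path.join(workdir, "fake_disk.img")
    backup = os.path.join(workdir, "first4mb.bin")
    with open(fake_disk, "wb") as f:
        f.write(os.urandom(5 * 1024 * 1024))

    copied = save_leading_bytes(fake_disk, backup)

    # Confirm the backup is byte-identical to the start of the source.
    with open(fake_disk, "rb") as f, open(backup, "rb") as b:
        matches = f.read(copied) == b.read()

print(copied, matches)
```

This only covers taking the copy; the restore direction is deliberately left out since I never got to try it.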

I guess I’ll try a reboot here and see what happens…

Well look at that.. it worked…

Summary

  1. Try a reboot
  2. If a reboot does not fix it, call VMware Support.
  3. If you don’t have support, you can try to muck with the backend DB (do so at your own risk).