ESXi 6.5 on Proliant Gen9 Hardware Status Unknown

I’ll keep this post short.

If you have a Proliant Gen9 server and running ESXi 6.5 u2 along with VSCA 6.5u2.

You will get all hosts not displaying any hardware status. This should fixed immediately as you don’t get alerts on any hardware faults via IPMI. This includes status from hosts running ESXi 5.5 or 6.5.

The first fix is to upgrade the VCSA to 6.5u3.

After upgrading the VCSA to 6.5u3… Hardware status will come back for each host.. however.. if you are running ESXi 6.5u2 on the Gen9 servers you’ll something like this:

as you can see some sensors are a lil wonky…

The fix here is to upgrade the host to 6.5u3 via the HPE build.

After the hosts and the VCSA are on 6.5u3 all is good and hardware faults will again will trigger critical alarms on vSphere.

Migrating Users and Passwords
Same NetBIOS Name

Story

*NOTE* All information here is provided as is for educational purposes only, whatever you do in your own environment is on you.

I want to give a bit of a background here, the goal was to migrate users to a new child domain, that was previously a 2 way forest wide trust. In this case scenario, there was no need for a 2 way trust as both domains were owned and operated by the same corporation. A simplified AD structure was conceived, new workflow servers, new clean permissions throughout.

Everything was coming along swimming until, after some extensive research, how to migrate users and their passwords without service interruptions…

With Windows Authentication in the back end there was only one choice, ADMT v3.2 and it’s associated documentation … yeah that’s not a content based website, it’s a download to a doc, that was last updated 2014 with support with Windows Server 2012 R2.

It states the following:

Any DB works (I used SQL 2016 Express)
Other things, read it if you wish to be over/under whelmed
Needs a Trust

Simple Overview

Here’s a simple of an idea of how the Forests are laid out, and you can see where the users are planned to be migrated… just one major problem… You can’t create a trust between forests where the NetBIOS match. And Yes, that is a thread from 2011 unanswered, which I will answer for you tonight. Which you can see from my design, is exactly the problem I was facing. I was initially hoping this could be done without a trust which turned out was the answer which lead me to the answer..

You can do this via trust but not how you might normally think about it.

Build out a new dc in your source domain and allow it to replicate properly, be sure that it is a dc/gc and dns server.  Disconnect this dc from the current domain and expect to NEVER connect to this domain again.

Do a metadata cleanup of this dc….

Along with this, How to rename the NetBIOS name, now mix everything into one huge sch-melting pot and what do we get…. this blog post.

Setup

ADMT

*Note don’t bother installing SQL + ADMT until the member server is domain joined to the target domain. In this example Windows Server 2016 is used to host ADMT services. So this server is domain joined to Special.NewDomain.com

First Download ADMT from here

Based on the size of the installer I’m assuming this is an online installer and instead of dicking around trying to find an offline version.. simply connect this server to the internet. However before we begin ADMT requires an SQL instance to utilize, to keep life easy we will install SQL Express on this server and run it locally for the migration.

Installing SQL Express 2016

Now looking at where to find specific version to download and use… I wasn’t sure which version was best
I decided to start with Express Core …
Next, next, next, Mixed Auth (just in case) sa password.

Installing ADMT

Now with SQL Express on the target domain joined server, double click the ADMT v3.2 installer exe.

Accept the EULA. Accept/Decline the CEIP

Use local SQL and….

DB Import… NO, next

Now we should be good to run ADMT (as a Domain admin on Special.NewDomain.com), IF you installed SQL + ADMT before joining to the target domain and did not choose to use mixed auth and have no sa account, follow my previous blog post to recover access to the SQL Express instance, granting your domain admin account access.

Preparing the Source Domain

Now how you choose to accomplish this is entirely up to you. If you wish to go the route suggested by the TechNet Post to create a new DC thats a GC and rip it out of the Forest/Domain via a MetaData Cleanup… be my guest but that’s a lot of work.

Instead I choose to simply create a secondary version of the Special.local DC via a backup, but I could have easily made a clone since it’s all virtualized. So for me it started out like this…

Clearly at this point a trust still can’t be established as NetBIOS names are still the same, however now we have no fear of mucking up the source domain as it’s simply a clone will all users and their passwords still encrypted within AD. So this migration will require 2 things:

1 – A domain rename

2- Password Export Server (Covered later in this post)

Renaming the Source Domain

Changing the IP Address

Since this is a clone, and I was not interested in alternative firewalls outside the windows firewalls I connected the source domain to the same subnet as the target domain to ease life. This requires the IP address to change.

So open Network and Sharing Center and edit the adapter settings accordingly, this will however break the DC’s DNS service. So lets fix that.

Deleting the DC A host Record

On the DC open DNS snap-in, and navigate to the top where the SOA and NS records are, and below that search for the A host record for the DC itself, and delete that record (remember this should be a clone or copy of a DC from the original source DC so no risk should be had here).

Reset DNS settings

ipconfig /flushdns
net stop dns
net stop netlogon
net start dns
net start netlogon

Create a new DNS zone

Open DNS snap-in again on cloned source DC, and create a new DNS Zone for the new domain name.

Complete the wizard.

Configure Domain to Accept new DNS Suffix

– Open ADSI Edit
– Right Click ADSI Edit -> Connect to…
– Leave defaults -> ok
– Expand “Default Naming Context”
– Right Click Domain Parent Object -> Properties

– Enter the new domain name into the msDS-AllowedDNSSuffixes

Enable update DNS Suffix option

Server will reboot after this step, again since it is a clone and not actually hosting AD services for any production need this is no problem, right?

Get the required XML to edit:

Everything I was reading online stated that you need to do all this from a member server, and you need to copy rendom and another application from the System32 from a DC, etc, etc, all a bunch of rubbish… everytime I attempted to follow such guides the rendom command would spit out some lines that seems was supposed to be parsed by something else to provide useful return output. People stated running from System32 directly fixed that issue for them, but not for me. Instead I decided to run all the commands directly from the DC since it was the lowest risk for me. as stated serveral times why above. Sooo…..

– Open CMD as admin on DC
– Run “rendom /list”

Edit the XML file (open via elevated cmd prompt)

save and check by running “rendom /showforest”

It should report the changes you made to the XML file.

Upload the XML to the DC

Prepare DC

Execute Domain Rename

Let DC reboot and then complete the rename.

End the Domain Rename process

So now the setup should look like this…

Now as you can see we no longer have the same NetBIOS name and thus we can create a trust here to migrate users using ADMT and PES yay!

The Trust

Conditional Forwarders

For the trust to work each domain must be reachable by the other domains DC via FQDN. This obviously requires conditional forwarders to be configured for each DC accordingly.

So opening a MMC.exe application, from a member system with RSAT installed, or directly on Special.NewDomain.com if it has the desktop experience. Then Add the DNS snap-in. Add a conditional forwarder, in my case I added NotSpecial.com pointing to the IP address of the cloned and renamed DC.

Then doing the same thing on NotSpecial.com DC, opening the DNS application (or remotely with RSAT), and creating a conditional forwarder that says special.newdomain.com pointed to the IP address of the actual child DC, in the same subnet as in the diagram.

At this point ensure that Target DC (Special.NewDoamin.com) can ping NotSpecial.com, and that Source DC (Special.com) can ping Special.NewDomain.com. If yes, we can now go ahead and build the trust.

Building the Trust

Open domain and trusts. Right click domain and properties:

Click the Trusts Tab, New Trust:

Complete the wizard for both sides of the trust. I had a domain admin account in each source and target domain.

With admin account on each domain and already logged in as domain admin on the NotSpecial domain, wizard completes successfully:

Now with a trust in place, we could start just migrating users, but we need those passwords migrated as well, else we will have a bunch of angry users.

ADMT & PES

Permissions

Nest Special\DomainAdmin into NotSpecial\BuiltIn\Administrators group, as well as into Special\BuiltIn\Administrators group. You might be wondering why? Well I hit this error when attempting to migrate users passwords:

After reading this, I made the changes above and it finally got past this error when attempting to migrate users passwords.

Logged on to ADMT as Special\DomainAdmin

Password Export Server Setup

Step 1) Create the encryption key for the migration:

admt key /option:create /sourcedomain:notspecial.com /keyfile:"C:\path\to\file.pes" /keypassword:*

Step 2) Copy the Key to the NotSpecial.com DC (I used RDP)

Step 3) Grab PES installer from here, and get it on NotSpecial.com DC

You should now have this on NotSpecial.com DC:

Step 4) Install PES running the MSI from an elevated cmd prompt:
If you’re wondering why, I was about to smash a monitor when the installer kept telling me the password was wrong for the encryption file, when I knew for certain I wasn’t putting it in wrong, and someone else blogged about it.

I used a installed using local system account cause again this DC will be shutdown after the migration.

Step 5) Complete the Install and after reboot start the service

ADMT and Migrating Users

At this point we should be officially ready to migrate users, on ADMT open ADMT:

Right click the folder and select the user migration wizard
Populate the domain names and tree source domain controllers should pick up automatically.

Select your users, Pick a target OU, then select to migrate password:

Given you followed the permissions section, this should work:

Keep target state same as source and don’t copy SID as we have no intention of using SID filtering.

These settings worked great for me, change based on your needs.

again these settings worked for me.

After the process…

It worked! However i was amazed even in my first test run, there was one noticeable message in the log:

Rename UPN name user@NewDomain.com to user@NewDomain.com. Cannot create accounts with the same UPN name as another UPN in the enterprise.

Well cause there already exists a user with that UPN at the parent, but why is it picking the parent for setting the UPN? Who knows… but much like that reference you can bulk select users in ADUAC Snap-in, and select the child domain from the drop down text-box.

Summary

  1. Create a Copy of the Source Domain Controller
  2. Rename its Domain
  3. Connect to target domain subnet
  4. create conditional forwarders
  5. create two way trust
  6. Setup ADMT
  7. Setup PES
  8. Migrate Users
  9. Remove Trust, and Shutdown NotSpecial.com DC
  10. Happy Dance

Hope everyone enjoyed this post, and hopefully someone finds it useful.

Resetting Access to SQLEXPRESS 2016

So today I wanted to rerun a task of using ADMT on a server I configured, I was now connecting to this server via a different domain account then when I had first configured the server in my first test run. Now for the purpose of this particular DB and server purpose I could easily rebuild… but what if… you’re in a situation in which the data and the access to it is much more important.

Also… I was lazy… so I researched.. after a little while (my dev got involved too) it was simply a mission… a purpose a GOAL to figure it out… in the end it was really easy it was just SYNTAX, oh gawd the important of syntax.

Now the SQL team at Microsoft does a lot of wonky things and doesn’t follow standards that most other divisions follow, so it hats off to the walls, if the SQL guys are on mushrooms today better expect some funky changes without notice or documentation. Wait… what… anyway…

My research began.. . stackoverflow.… to… a random blog, archived on the wayback machine ahhh I feel that’d what will become of my blog….

Let me paraphrase everything:

  1. If you are a local admin you can fix this
  2. SYNTAX

My main issue was even though I kept starting the service with the sqlservr -m as the blog posted this resulted in an error, it turns out you have to specify the instance name… so

  1. change DIR to “C:\Program Files\Microsoft SQL Server\MSSQL13.SQLEXPRESS\MSSQL\Binn”
  2. run “sqlservr -s SQLEXPRESS -m”
    *Note* If you are on said mushroom you can skip the space and run “sqlservr -sSQLEXPRESS -m” and it will work too
    At this point there are two lines to watch out for:
    A) SQL service is running in single user mode (-m)
    B) SQL service is running under admin account that is logged in
  3.  open another cmd prompt as admin and run “sqlcmd -S COMPUTERNAME\INSTANCENAME -E”
    E.G. sqlcmd -S SQLSERVER\SQLEXPRESS -E
    *NOTE* the computername is super important else you will get a logon failure at the other console screen showing the compueraccount instead of the user account for some reason.

Because of this error I kept trying other things, my dev tried a couple things and thought one of them was enabling TCP/IP stack, but I said this was all local commands and connections with sqlcmd since this SQLEXPRESS doesn’t come bundled with SSMS and there was no member server or client system with SSMS available to connect or use, as to the point of resetting access locally.

once you get the 1> prompt you can just follow the other guides which is:

1> CREATE LOGIN [DOMAIN\USER] FROM WINDOWS
2>go
1>exec sp_addsrvrolemember @loginame=’DOMAIN\USER’,@rolename=’sysadmin’
2>go
1>exit

Yes that is loginame and not loginname, cause mushrooms.

A Productive Nightmare

The Story

Lack of Space

It all begins with a new infrastructure design, it’s brilliant. All the technical stuff a side, the system is built and ready for use, one problem the new datastore is slightly overused (many plans for service migrations and old bloated servers to be removed but have not yet been completed). I had one datastore that was used for a test environment, with the whole test environment down and removed this datastore would be perfect temp location till the appropriate datastore could be acquired.

The Next Day

I was chatting with our in house developer when a user walks in asking why they couldn’t complete a task on the system, figuring a work flow server issue simply rebooting it often fixed any issues with it, however this time I also received an email from the DBA stating reports of a DB issue due to bad blocks on the storage level.

At this point my heart sank, I quickly logged into the storage unit and was shocked to not see any notification of issues, deciding right then and there to move to back to reliable storage I made the svMotion, while it was in progress the storage unit I was logged into finally showed errors of disk failure, one disk had failed while the other had become degraded (In a RAID 1+0 this can be bad news bears) after the svMotion completed there was still a corrupted DB (we all have backups right?) lucky it was just a configuration DB for the workflow server and not any actual data, so I provided the DBA with a backup of the database files, didn’t take long and everything was back to green.

That Weekend

I decided to play catch up on the weekend due to the disruptive nature of the disk failure that week, to my dismay and only by chance the new host in the new cluster was showing disconnected from vCenter…   What the…

Since I wasn’t sure what was going on here at first I chatted with the usual’s on IRC, I was informed instantly “RAMdisk is full”. After some lengthy recovery work (shutting down VMs and manually migrating them to an active host in vCenter) I discovered it was cause the ESXi host did lose connectivity to its OS storage (in this case was installed on an SD card)

So I updated the firmware on the host server. This so far (after a couple weeks now) has resolved this issue.

Then while I was working on the above host lost of connectivity, the other host lost connection to vCenter! However this one had much different signs and symptoms, after doing the exact same process of moving VMs off this host, it was determined by VMware support that it was “possibly” due to the loss of the one datastore. Remember the datastore I discussed above, although I had moved any VM usage of it from the hosts I did not remove it as an active datastore, so although the storage unit was accessible while the disks had failed, for some reason the whole storage unit had failed (UI was now unresponsive). So I had to remove this datastore and all associated paths. After all this everything was again green for this cluster.

So much for that weekend…

That Storage Unit

Yeah alright so that storage unit… it was a custom built FreeNAS box that was spliced together from a HP DL385p Gen8 server. I got this thing for dirt cheap and was working as a datastore perfectly fine before the disc failure so I don’t blame the hardware or even FreeNAS or all the crap that happened. It was just a perfect storm.

So I decided to try something different with this unit first… since I had been using an LSI 9211-8i flashed in IT mode (JBOD) for the SAS expanders in the front (25 disk sff). I decided I would try to build my first hyper-converged setup. That meant creating a FreeNAS VM, hardware passthough the storage controller (LSI 9211-8i) and then created datastores using the discs in the front.

Sooo

The Paradox

The first issue I had was the fact you need a datastore to host the FreeNAS VMs config and hard drive files… but if we are going to do hardware pass-through of the entire SAS exapnders via the LSI card, that means it’s not accessible or usable for the host OS. Uggghhhh, now we could use NFS or iSCSI but the goal for me was to have a full self contained system not relying on another host system, now I can easily install ESXi on a USB or SD card, but it won’t allow me to use these as datastores. At least not on there own…

Come here USB datastore… I mostly followed this blog post on it by Virten however I personally love this old one by non other than my favorite VMware blogger William Lam of VirtuallyGhetto.com

*My Findings* Much like the comments on here and many other blog and form posts about doing this is I could not get this to work on 5.1 or 5.5 those builds are too finiky and I’d always get the same error about no logical partition defined or something, yet worked perfectly fine in 6.5 or 6.7 (I personally don’t use 6.0)

OK, so I decided to use ESXi 6.7, installed on a SD card, and setup a 8 gig USB based Datastore. Next Issue is you have to reserve the memory else you’d be limited to even less than 4 gigs as ESXi will complain there is not enough from on the datastore for the swap file. Not a big deal here as we have plenty of RAM to use (100 Gigs HP genuine ECC memory).

I did manage to get FreeNAS installed on said datastore and as you’d expect it was slowwwwww. My mind started to run wild and though about RAMDisc and if it was possible to use that as a datastore… in theory.. it is! William is still around! 😀

Couple notes on this

1) you need a actual Datastore as it seems like ESXI just creates system links to the PMem Datastore. (I noticed this by attempting to ssh into the host and simple copy the VM’s files over, it failed stating out of space, even though there was enough defined for the PMem Datastore).

2) You create the VM and defined the HDD to be on PMem Datastore and will warn you of non persistence.

Sure enough I created a FreeNAS VM on the PMem and it was fast install, but as soon as the host needed a reboot, attempt to power on that VM and it says the HDD is gonezo. So this was cool, but without persistence it sort of sucks.

Anyway I didn’t need the FreeNAS OS to have fast I/O anyway, so stuck with the USB based datastore. Then I went to pass-through the controller, now enabling pass-though on the controller worked fine, but the VM wouldn’t start.

Checking the logs and googling revealed only ONE finding! 

No matter what I tried the LSI card or the built in HBA same error as the post above:
“WARNING: AMDIOMMU: 309: Mapping for iopn 0x100 to mpn 0x134bb00 on domain 1 with attr 0x3 failed; iopn is already mapped to mpn 0x100 with attr 0x1
WARNING: VMKPCIPassthru: 4054: Failed to setup IOMMU mapping for 1 pages starting at BPN 0x100000100”

Yay, another idea gone to shit and time wasted, I learned some things but I wanted to learn something and bring some use back to this system… ugh fine! I’ll just put it back to normal connecting the SAS expanders to the P420i HBA and use the 2GB battery backed cache to define a speedy datastore and just keep it simple…

The Terrible HBA

 I don’t wish this HBA on anyone seriously, so after I put it all back to normal, the first thing I find is:

  1. When I booted the server and let the system post, when it got up to the storage controller part (Past the bottom indication to press F9 for setup, F10 for Smart Provision, and F11 for Boot Menu) it will list the storage controller and it’s running firmware in this case v8.00.
    Half the time if I pressed F5, if there was no previous error codes and no disks or logical units defined I someones got into the ACU (Array Configuration Utility) the other half the fans would kick up to 100% and stay there while the ACU booted (showing nothing but an HP logo and a slow progress bar) and when ACU finally did load I’d be presented with “No Storage Controller found”
    (Trust me I got a 40 min video of me yelling at the server for being stupid haha)
  2. This issue would become 100% apparent as soon as I plugged in a drive with a logical unit defined from another (updated) version of Smart Array.
    To get around this issue I ended up grabbing the “latest” HP SSA (Smart Storage Administrator) tool from, HPs site. Now I quote latest due to the fact is it’s from 2013… No this allowed me to finally build some arrays for me to use with the planned ESXi build.

I noticed that at first I wasn’t seeing the new logical drive I defined in the HP SSA in ESXi itself, I totally forgot to grab HPE custom build as it includes all required drivers for these pieces of hardware.

First thing i notice after grabbing HPE’s custom ESXi build… in this case 6.7 (requires VMware login)  is that the keyboard is buggering out on me when attempting to configure the management NIC.

At first I thought maybe the USB stick was crapping out due to the many OS installs I’ve been doing on it. So I decide to move to using the logical array I built, the custom installer does see the new array and away I go, still buggy, so I thought maybe it’s the storage controller firmware? Looking up the firmware for P420i or equivalent appears there are numerous post of issues and firmware updates.. turns out there’s even a 8.32(c) Nov 2017 update, since I was too lazy to build a custom offline installer for this firmware flash I used an install of Windows Server 2016 and ran the live updater, to my amazement it worked flawless… yet also to my amazement Windows worked perfectly fine on the same logical array regardless of the firmware it was running (Is this a VMware issue…??)

So after re-installing the custom ESXi 6.7 from HPE, the host was still being buggy… and now started to PSOD (Purple Screen of Death)… are you kidding me, after everything that’s already happened… ughhhhh…

Googling this I found either

A) Old posts of Vendor finger Pointing (Around ESXi 3-4)

B) Newer Posts (ESXi 6.7~) this lead me to the only guy who claimed to have fixed his PSOD and how he did it here

Which I found I was not having the same errors showing which lead me to my first link due to the logs. Having updated all the firmware, and running HPEs builds I could only think to try the ESXi 6.5-U2 build as the firmware was supposedly supported for that build.

Now running ESXi 6.5-U2 without any issues, and no PSOD! Unfortunately without warranty on this hardware I have no way to get HP to investigate this newer 6.7 build to run on this particular hardware.

Icing on the Cake

Alright so now I should finally be good to go to use this hypervisor for testing purposes right? Well I had a bunch of spare discs and slots to create a separate datastore for more VMs yay…

Until I went to boot that latest HP SSA offline I listed above that fixed the fan speed and no controller found for the ACU, well now this latest HP SSA was getting stuck at a  white screen! AHHHHHHHHHHHHHHHHH how do I create of manage the logical unit and build arrays if the offline software is stuck, well i could have installed and learned how to use the hpssacli and their associated commands but since I was already kind of stressed and bummed out at this point installed Windows Server 2016 and ran the HP SSA for that which looks exactly like the offline version.

Finally created all my arrays, installed the only stable version of ESXi with associated drivers, have all my datastores on the host showing green, created a dedicated restore proxy and am finally getting some use back from this thing….

Conclusion

What… a …. freaking… NIGHTMARE!

 

A certificate chain could not be built to a trusted root authority

Today I tried installing .NET 4.6.2 onto an offline WIndows 7 to setup playnite.

Every since I built my PiCade I’ve been huge into loaders, almost all of them seem to rely on RetroArch for an old game emulation which is fine by me. Check out those sites for their respective offerings. My PiCade uses Lakka which is just a different loader that I find is rather well compiled for ARM based devices such as the Pi. They do offer a x86-64 build however I found that since it’s a linux derivative you have to be pretty Linux swanky to do good with it, by that I mean supporting hardware can be a bit more difficult as you have to be able to get the drivers for certain hardware yourself, in my case using an old HP laptop with an ATI card inside it was CPU rendering everything so N64 and Dreamcase would CHUGGGG, even though the system is way more than capable.

Anyway… Playnite only has one dependency for Windows, thats .NET 4.6.2, so when I downloaded the offline installer I didn’t expect issues, until…

Error! A certificate chain could not be built to a trusted root authority.

Ughhh ok… Google… wtf does that mean? MS says

Grab their certificate

install using elevated cmd with…

certutil -addstore root X:\Where\you\saved\the\Cert.crt

Sure enough after this I was successfully able to install .NET 4.6.2

Fixing WindowsRE

To make a long story short, in my previous post I covered some issues I had dealing with MBR2GPT. In which case I fried the Recovery Partition, and thus ruining my Advanced Startup abilities.

So here’s what happens… As soon as you either A) move the Recovery partition (via GParted) or B) Delete it you’ll have a disabled WinRE.

Checking BCDedit will show Recovery: Yes, and a recovery sequence but agentc will state otherwise:

Heck you may as well even wipe the useless BCD settings at this point!

Sooo how do you fix it?

If you did A) and simply moved the Recovery Partition, the fix is pretty easy.

If you did B) then First you’ll need to shrink a bit of space on your primary disk that hosts the Windows OS files (or another whole different disk, doesn’t really matter to me mon) either way…

I did recall someone recreating the files that are within the actual recovery folder, but I sadly can’t find it now, also this is a good post thread on it

In my case I had another laptop with the same Windows version deployed that still had the recovery partition, so I simply used a linux live to DD it onto a USB drive, and then do the same Linux Live on the laptop with the deleted partition and created another partition with the exact same sector count, and dd the image back into place.

This however still doesn’t fix the WinRE, and simple doing “reagentc /enable” fails stating the WinRE location is null, which from the picture above remains the fact.

I was stuck on this for a while until I stumbled upon this Technet post (not my own… haha woah!)

After reading this, I followed along by mounting my Recovery Partition:

Cleaned my Reagent.xml From this…

to this…

Set the WinRE location with the /setreimage option on reagentc, and enabled that puppy for the win!

This was good enough for me! After this all the advanced recovery options were available again, so I could do things like MBR2GPT without using the /allowfullos switch. 😀

Sorry about the crappy pictures and no headers… I clearly was super lazy on this one.

My Delightful Challenges with MBR2GPT

The Story

I’m not gonna cover in this blog what MBR2GPT is… honestly there’s mroe than enough coverage on this. Instead mines gonna cover a little rabbit hole I went down.  Then how I managed to move on, this might even be a multi part series since I did end up removing my recovery partition in the whole mix up.

How did I get here?

In this case the main reason I got here was due to how I was migrating a particular machine, normally I wouldn’t do it this was but since the old machine was not being reused, I ended up DDing (making a direct coopy) of an SSD from old laptop to M2 SSD on new laptop. New laptop has more storage capacity and since the recovery partition was now smack dab in the center of the disk instead of at the end, I wasn’t able to simply extend the partition within Windows Disk Management Utility (diskmgmt).

Instead I opted to use GParted Live, to move the recovery partition to the end of the drive and extend the usual user data partition containing Windows and all that other fun stuff. Nothing really crazy or exciting here.

After the move and partition expansion ran Windows Check Disk to ensure everything was good (chkdsk /F) and sure enough everything was good. Since this newer laptop uses EUFI and all the fun secureboot jazz along with the Windows 10 OS that was on the SSD already, but the SSD partition scheme was MBR… whommp whomp, but of course I knew about MBR2GPT to make this final transition the easiest thing on earth!

Until….

Disk layout validation failed for disk 0

Ughhhh… ok well I’m sure there’s some simple validation requirements for this script…

  • The disk is currently using MBR
  • There is enough space not occupied by partitions to store the primary and secondary GPTs:
    • 16KB + 2 sectors at the front of the disk
    • 16KB + 1 sector at the end of the disk
  • There are at most 3 primary partitions in the MBR partition table
  • One of the partitions is set as active and is the system partition
  • The disk does not have any extended/logical partition
  • The BCD store on the system partition contains a default OS entry pointing to an OS partition
  • The volume IDs can be retrieved for each volume which has a drive letter assigned
  • All partitions on the disk are of MBR types recognized by Windows or has a mapping specified using the /map command-line option

Ughhh, again, ok that’s a check across the board… wonder what happened…

Well I quickly looked back at my open pages, but couldn’t find it (I must have closed it) but someone on a thread mentioned all they did was delete their recovery partition. I ended up doing this, but it still failed on me…

checking the logs (%windir%/setupact.log)

I had errors, such as partition too close to end, and I extended and shrunk the partition. After going through most random errors in the log I was stuck on one major error I could not find a good solution to…

Cannot find OS partition(s) for disk 0

so I started looking over all the helpful posts on the internet, like this, and this

The first one I came across for general help, and it was a nice post, but it actually didn’t help me solve the problem. The second one I actually hit cause it was literally dead on with my issue…

“After checking logs (%windir%/setuperr.log), it was clear it had a problem with my recovery boot option — it was going through all GUIDs in my BCD (Boot Configuration Data), and failed on the one assigned to recovery entry. This entry was disabled (yet still had GUID assigned to it, which apparently led nowhere and that seems to have been the root cause). I searched some discussion forums, and most of them said that I would have somehow create recovery partition.

Fortunately, it turned out to be a lot easier 🙂 Windows has another command line tool, called REAgentC, which can be used to manage its recovery environment. I ran reagentc /info, which showed that recovery is disabled and its location is not set, but it had assigned the same GUID that was failing when running MBR2GPT. So, I run reagentc /enable, which set recovery location and voila, this time MBR2GPT finished its job successfully.”

That’s cool and all but for me…

  1. I removed my recovery partition, so any GUID pointing to it is irrelevant and its gone
  2. reagentc was not showing as an available command, either on my main Windows installation, or the windows PE that comes with the 1809 installer ISO I was using.

After farting around for a while, finding out I wasted my time with ADK and other annoyance around Windows Deployment methods and solutions when all I wanted was this simple problem solved, I moved on to another solution, a lil more hands on…

If he fixed his with reagentc… and it was due to a problem in the BCD… let’s see what we can do with the BCD directly without third-party tools. Bring on the….. BCDedit!

 

Microsoft Windows [Version 10.0.17763.437]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\WINDOWS\system32>diskmgmt

C:\WINDOWS\system32>mbr2gpt /validate /allowfullos
MBR2GPT: Attempting to validate disk 0
MBR2GPT: Retrieving layout of disk
MBR2GPT: Validating layout, disk sector size is: 512 bytes
Cannot find OS partition(s) for disk 0

C:\WINDOWS\system32>c:\Windows\setuperr.log

C:\WINDOWS\system32>diskmgmt

C:\WINDOWS\system32>mbr2gpt /validate /allowfullos
MBR2GPT: Attempting to validate disk 0
MBR2GPT: Retrieving layout of disk
MBR2GPT: Validating layout, disk sector size is: 512 bytes
Cannot find OS partition(s) for disk 0

C:\WINDOWS\system32>bcdedit

Windows Boot Manager
--------------------
identifier              {bootmgr}
device                  partition=\Device\HarxxiskVolume1
description             Windows Boot Manager
locale                  en-US
inherit                 {globalsettings}
default                 {current}
resumeobject            {c982d23f-8xx8-11e8-b5xx-ed8385ce2xx2}
displayorder            {current}
toolsdisplayorder       {memdiag}
timeout                 30

Windows Boot Loader
-------------------
identifier              {current}
device                  partition=C:
path                    \WINDOWS\system32\winload.exe
description             Windows 10
locale                  en-US
inherit                 {bootloadersettings}
recoverysequence        {b09bfaxx-6xx4-11e9-98ce-ead29f9a0a02}
displaymessageoverride  Recovery
recoveryenabled         No
allowedinmemorysettings 0x15000075
osdevice                partition=C:
systemroot              \WINDOWS
resumeobject            {c982d23f-8xx8-11e8-b5xx-ed8385ce2xx2}
nx                      OptIn
bootmenupolicy          Standard

Even though I disabled the recovery partition via the “recoveryenabled” no setting the issue kept happening, I could only guess it was this setting, but how can I delete it… ohhhhhh

C:\WINDOWS\system32>bcdedit /deletevalue {current} recoverysequence
The operation completed successfully.

C:\WINDOWS\system32>bcdedit

Windows Boot Manager
--------------------
identifier              {bootmgr}
device                  partition=\Device\HarxxiskVolume1
description             Windows Boot Manager
locale                  en-US
inherit                 {globalsettings}
default                 {current}
resumeobject            {c982d23f-8xx8-11e8-b5xx-ed8385ce2xx2}
displayorder            {current}
toolsdisplayorder       {memdiag}
timeout                 30

Windows Boot Loader
-------------------
identifier              {current}
device                  partition=C:
path                    \WINDOWS\system32\winload.exe
description             Windows 10
locale                  en-US
inherit                 {bootloadersettings}
displaymessageoverride  Recovery
recoveryenabled         No
allowedinmemorysettings 0x15000075
osdevice                partition=C:
systemroot              \WINDOWS
resumeobject            {c982d23f-8xx8-11e8-b5xx-ed8385ce2xx2}
nx                      OptIn
bootmenupolicy          Standard

C:\WINDOWS\system32>mbr2gpt /validate /allowfullos
MBR2GPT: Attempting to validate disk 0
MBR2GPT: Retrieving layout of disk
MBR2GPT: Validating layout, disk sector size is: 512 bytes
MBR2GPT: Validation completed successfully

No way! it worked! Secure boot here I come!

I hope this blog posts helps someone else out there!

Summary

Check the logs: C:\Windows\setuperr.log

There might have been a way I could have moved forward without even have deleted my recovery partition, I just happening to know it was not needed for this system, so it was something that was worth a try. However if I probably had looked in the log before even having done that I may have avoided this rabbit hole.

But…. I learnt something, and that was cool.

Note don’t delete your recovery partition. Bringing it back is an extremely painful process without re-install. I will cover how to accomplish this in my next post. I was actually going to this yesterday, however other work has come up. I will however complete the blog post this week… I promise! (It’s like I’m talking to myself in these posts… like no one actually reads these… do they?)

A general system error occurred: Launch failure

Failure to Launch

Sound the Alarm! Sound the Alarm!

*ARRRRRRREEEEEEEEERRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
ARRRRRRREEEEEEEEERRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR

and one heck of a day sweating bullets, am-I-right?!

Also sorry about the lack of updates, with spring and work, it’s been hard to find time to blog. Deepest apologies.

The Story

Well it’s Tuesday so it’s clearly a day for a story, and boy do I have a story for you. I was going about my usual way of working.. endlessly to meet my goal… that never able to acquire goal of perfection…. anyway… I had completed a couple major tasks of going from vCenter 5.5 to 6.5u2, and while I have yet to blog about that fun bag (cause I had used my own PKI and certificates so the migration scripts VMware provides pooped the bed till I replaced them with self signed) but I digress this has nothing to do with that… well sort of.

Where was I… oh right… I was vMotioning a couple VMs to some new 6.5U2 hosts I had setup on new hardware (yeah buddy this was a whole new world (Why did I just say that in the voice of Princess Jasmine?)) but alas the final VM was not meant to move for it had stalled @ 20 percent (I had this happen once before during my deployment and discovered some interesting things about multi-NIC vmotions) I probably should blog about that toooooo but alas my time is running short. (I still have many other things to tackle and blog about). and then……..

“A general system error occurred: Launch failure”

ugh…. I could have jumped right to the logs but I jumped on google instead…

CloudSpark apparently recently came across this issue and posted on all the many places he apparently could think… reddit, and VMware forums 

the VMware post got a tad long, but it was more the post on reddit that got me wondering….

“Haven’t seen this personally, but there’s a few things via google on the “Failed to connect to peer process” error that suggest it could be due to running out of storage. Specifically in one scenario, the /tmp/vmware-root folder on the ESXi host fills up with logs.” – Astat1ne

When I SSH into the host and ran “df -h” I was surprised to be reported back with an error:

esxcli returned an error: 1

Something to that extent anyway. I ended up Moving all the VMs back off the host, and swapped out the SD card the ESXi software was installed on (I had a copy of the SD card with a clone of the ESXi software and host configure that was created right after the host was installed and configured).

Sure enough after reboot powering it back on, “df -h” returned clean. I was then able to vMotion all the VMs back onto the set of new hosts.

Def a generic error message I’ve never seen, and thinking about it now is rather comical. (I know when your in the middle of the problem it’s not so funny)…

BUT it sure is now! :D…. this is so going to come back to bite me….

(await updates here post April 30th)

Exporting OPNsense HAProxy Let’s Encrypt Certificates

You know… in case you need it for the backend service… or a front end IDS inspection… whatever suits your needs for the export.

Step 1) Locate the Key and certificate, use the ACME logs!

cat /var/log/acme.sh.log | grep “Your cert”

*No that is not a variable for your cert, actually use the line as is

Step 2) Identify your Certificate and Key

Step 3) run the openssl command to create your file:

openssl pkcs12 export out certificate.pfx inkey privateKey.key in certificate.crt

Step 4) use WinSCP to copy your files to your workstation

*Note use SFTP when connecting to OPNsense, for some reason SCP just no worky

Testing Active Sync

I’ll keep this post short.. I swear this time haha

The Story

I recently blogged about using OPNsense as a reverse proxy for Exchange. This was really nice, and I was able to access Outlook Anywhere, literally from anywhere with owa. However…

The Problem

For some reason I could not connect with Active Sync… Even when I had my phone in the same network pretty much. I thought it might had to do with the self signed certificate (I should have known it was not when selecting “Accept all certificates” under the connections options still didn’t work), so  I wastefully exported the certificate and key from my OPNsense server and imported it into the Exchange ECP, and assigned it to the IIS and SMTP services. (I’m probably going to change these back to the self signed as I don’t really have intentions on completing these steps every 60 days.

Since I didn’t want to use my account to test on Microsoft test site (this is a life saver), I used a account and email I was setting up for my colleague… and it passed… I was shocked. (At first I thought it was cause of the certificate changes… I soon found out.. it was not).

So I tested the same connection settings on my phone and it worked!

Woo Hoo!

The Real Problem and Solution

My happiness was sort lived once I attempted to add my account…. which still failed…. what the heck? I had all the settings the same exc…ep…t… my account…. wait a minute…. Google… WTF! Admins can’t use active Sync?!?!? Why isn’t that more specified in the documentation! I was aggravated about this for days… cause of something that was sooo simple!

I’ll cover exporting and importing certificates for other uses in a nother blog post. I just wanted to get this one out cause even though I had configured everything in my previous blog post about using OPNsense as a reverse proxy correctly, I wanted to follow up on why Active Sync was working for me, cause everything else on online guides made it sound so simple, and it rather is… until you find out a little secret GEM MS didn’t tell you about…