vCenter SSO

The other day I covered installing vCenter.

Today I’ll do a very quick overview of setting up SSO with Windows-based AD auth.

DNS

Step 1) Validate that vCenter can reach AD via the root domain name:
*Use the AD server for DNS. Third-party DNS leads to failure because it is missing the specialized records AD relies on, e.g. SRV records.*
*Ensure Time is synced to within 5 minutes of AD server*

I SSH’d into the VCSA as root, ran “shell”, and used a regular old ping command to validate.
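A ping only proves the A record resolves, so here is a minimal sketch of what a fuller DNS check could look like. The four names are the standard SRV records an AD domain registers for locating domain controllers. The function names (ad_srv_names, check_ad_srv) and the example.local domain are made up for illustration, and the lookup assumes nslookup is available on whatever box runs the check.

```shell
#!/bin/sh
# Print the SRV record names a client looks up to locate a domain controller.
# "ad_srv_names" is a hypothetical helper name, not a VMware or Likewise tool.
ad_srv_names() {
    domain=$1
    for prefix in _ldap._tcp _kerberos._tcp _ldap._tcp.dc._msdcs _kerberos._tcp.dc._msdcs; do
        printf '%s.%s\n' "$prefix" "$domain"
    done
}

# Query each record against the DNS server the machine is configured to use.
# Assumes nslookup is installed; nothing is queried until this is called.
check_ad_srv() {
    for name in $(ad_srv_names "$1"); do
        echo "== $name"
        nslookup -type=SRV "$name"
    done
}

# Just list the names for an example domain (no network access needed).
ad_srv_names example.local
```

Running check_ad_srv with your own domain against a DNS server that only holds an A record for the AD server should come back empty for all four names, which is the kind of setup that bit me here.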

Step 2) Follow Virten’s Guide for doing the Flash way, or CLI way to join vCenter to the Windows Domain. Via the HTML5 Web Client: Menu -> Administration -> SSO -> Configuration -> Active Directory Domain -> Click Join AD (hidden behind the menu in the snippet)

Enter the domain to join and an account that is allowed to join systems to the domain. In my case I used my Domain Admin account:

Populate the fields, click Join, and sure enough you will join the domain without issue… if you have a properly working NTP/AD architecture, that is…

Thanks VMware… Ugghh ok, and if I use the CLI maybe some more verbose error?

What do you mean “DC not found”? What kind of PC LOAD LETTER error is this? I just verified lookup via DNS, which is pretty much the primary prerequisite besides firewalls, and I have already configured my actual firewalls… so what gives? Googling this error led me to this.

and I quote “On ESXi 6.5, the command is executed from /usr/lib/likewise/bin. If you haven’t enabled the AD firewall rule mentioned earlier, you must temporarily unload the ESXi firewall – assuming it is enabled – for this to work. Failing this, you will get an Error: NERR_DCNotFound [code 0x00000995] error.”

Are you ****in’ with me…. for reals… man wtf VMware….

Shit, right, this is the VCSA, not an ESXi host… ugggh, quick research…

What… da… How did I not know about this?! There’s a special VCSA management page. Everything online just uses the “Web Client”, which all of VMware’s documentation assumes to be the Flash client, and none of it even references this page at all!

https://vcsa:5480

Alrighty then… logging in… mhmm

That’s awesome but I don’t see firewall, maybe if I navigate to networking…

Nope, NICs settings and that’s about it:

C’mon those firewall settings have to be here, I don’t want to have to be forced to use flash…. cmon…..

F*** it says it’s for 6.7 I’m clearly on 6.5 there has to be a way…

After some deeper digging (I found out the VCSA uses Python scripts that read specific files to build the firewall), talking this problem over with someone on the IRC channel #vmware, and digging a bit further, I found this VMware post….

At first I was simply using a third-party DNS that had JUST an A host record for the AD server, without any of the other service records for LDAP or anything else. After changing the DNS settings on the VCSA to point to the AD server itself, I got a different error at the CLI:

Bahhh what? oh wait… lol all my time is wrong, everywhere…

NTP – Fixing Time

Actual time 8:20 PM Winnipeg Central Time. Mon Oct 7, 2019

AD server time: 2:09 PM Mon Oct 7, 2019 (CST)

VCSA time: Tue Oct 8 01:15:08 UTC 2019
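To put numbers on that mess: Kerberos, which the domain join relies on, tolerates five minutes of clock skew by default. Below is a small sketch that works out the gaps from the readings above. The helper names (to_seconds, skew_seconds) are hypothetical. It assumes the readings were taken at roughly the same moment, that the AD clock is compared as a plain wall-clock reading, and that 8:20 PM Central (CDT, UTC-5 in October) is 01:20 UTC the next day.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight. Leading zeros are stripped
# so the shell does not try to read "08" as an octal number.
to_seconds() {
    h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# Absolute difference between two same-day HH:MM:SS readings, in seconds.
skew_seconds() {
    d=$(( $(to_seconds "$1") - $(to_seconds "$2") ))
    [ "$d" -lt 0 ] && d=$(( -d ))
    echo "$d"
}

MAX_SKEW=300   # default Kerberos tolerance: 5 minutes

# VCSA (01:15:08 UTC) vs actual time (8:20 PM CDT = 01:20:00 UTC)
echo "VCSA vs actual: $(skew_seconds 01:15:08 01:20:00)s (limit ${MAX_SKEW}s)"
# AD server (2:09 PM) vs actual local time (8:20 PM), both as wall-clock readings
echo "AD vs actual:   $(skew_seconds 14:09:00 20:20:00)s (limit ${MAX_SKEW}s)"
```

Under those assumptions the VCSA lands just inside the limit at 292 seconds, while the AD server is 22260 seconds (a bit over six hours) out, so the PDC is the clock to fix first.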

What a gong show… let’s fix this! First, MS states to leave the PDC getting its time from the host, since the host gets accurate time. Well, not for me. I could point the host to an external source and wait for the PDC time to change automatically. But if you want to domain join the hosts, they should follow the hierarchy and use the PDC for time: catch-22. So instead the PDC points to an external source, and the hosts point to the PDC for time and DNS (this makes it easy to change the external time provider without time sync issues).

So fixing PDC time:

before:

after

Now the time has changed and my firewall shows the successful packets, but why is my offset still so far off? And why is my time an hour off?

Here’s my local workstation:

Yet here’s my PDC:

OK, from everything I checked online I’m sure I did it right, but the syntax in one of the guides I was following didn’t seem right, so I tried again and this time it worked, finally!

K, now I can update each host in my lab….

Before:

Configure:

After:

Finally VCSA itself, https://vcsa:5480 (login as root) -> Time

Before:

Configure:

After:

Yay, after fixing my time everywhere:

Joining VCSA to Windows Domain via CLI

/opt/likewise/bin/domainjoin-cli join $domain $user '$password'

YAY!

Quick Re-Cap:

So bad news is this isn’t as short a blog as I wanted, but good news is we are all learning something! Yay!

Now that we got our system domain joined (reboot required)

waiting… waiting….

Verifying the AD object on the AD server (Core, via PowerShell)

and on the HTML 5 Web Client:

Adding Identity Source

Now I can finally follow adding the Identity source A) AD Auth from here.

Click on Identity Sources -> Add Identity Source:

omg finally something that was dead simple…

Defining Permissions

Now click on global Permissions.

Click “+” icon, and if system join is all good it should be able to query the AD and find the users when typed into the Name field:

Lets test it….

Second attempt but pushing to children objects:

and yay this time I was able to get in successfully:

but I had to put in my UPN (user@domain.local). What if I just want to enter my username…

What a bunch of poop, that’s cause we didn’t set the primary SSO domain… back in the VCSA settings https://vcsa:5480 – summary shows

back on vCenter Web Client, Menu -> Administration -> SSO -> Configure -> Identity Sources -> select new source -> click Set as Default:

login again:

success, and finally as the source virten post stated, the “Use Windows Authentication” option is greyed out unless the Enhanced Authentication Plugin is installed. You can find the download link at the bottom of the login screen.

Summary

That was a bit more painful than I wanted it to be, but it really was nice that it was this painful, because it reminded me of the moving parts that have to be set up correctly for this all to play nicely to begin with.

I hope this guide has helped someone. Please leave a comment, any comment will do!!!

 

Remove “inaccessible” datastore from VCSA

In my previous post I mentioned restoring my ESXi after a bad upgrade. Today when I attempted to add it back into vCenter, it complained stating a Datastore with the same name exists. I was a bit stumped when I saw it showing up under the datastore area as inaccessible, when there should be nothing referencing it. Googling led me to this gem where MikeOD states:

“I figured it out. I was double checking on VM’s on those datastores. Under “related objects”, there were no VM’s or hosts, but there were two old templates that were still referenced by the original VCenter. When I right clicked on the template and selected “remove from inventory”, the data stores disappeared.”

mhmmm, looking at the associated VM, I checked one of its settings and sure enough, an old ISO was mounted on it:

just as Mike said, as soon as I removed the association, by changing the VM to client device, the inaccessible datastore went away.

You can also check for templates, snapshots, etc.

Using VMware Update Manager (VUM)

VUM

Overview

In this post I’m going to try to upgrade one of my ESXi 5.5 hosts to 6.5 using the VCSA’s now built-in-by-default VUM. I followed this video on YouTube for reference.

First thing I noticed but the video doesn’t mention is that (at least for 6.5) VUM tab is only available using the flash based client. If you use the HTML5 based web client the Update tab isn’t shown:

HTML 5 Missing update Tab:

Flash based client with Update Tab:

As soon as I see a video, I can tell which version is being used as it’s sooo different. (Flash sucks)

Import Image

About a minute into the source video he gets to where the images for major host upgrades are. Since I had no existing images like the video did, I decided to try the import wizard, which popped up a useful Windows selection dialog box (I was testing this from the latest version of Windows 10, 1903, with the latest Chrome and its built-in Flash after enabling it). On top of that, I uploaded the image from a UNC path (\\ip\some\folder\file.iso): after verifying access via Windows Explorer, I simply pasted the UNC path into the dialog box’s address bar and selected the ISO image, and it worked.

Only once an image is uploaded and selected does the option to create a baseline show up.

Baseline

Let the beat drop! *Bass beat drops* What the point of this is I can’t exactly tell yet, it seems to be a one to one mapping between a name and the ISO image being used?

paraphrasing the video guide “Now that we completed this useless step, we navigate to the cluster needing to be upgraded” In my case a single 5.5 test host, and by clicking on “Go to compliance view”

Attach Baseline

It would seem the Update tab is available at the vCenter level, datacenter level, cluster level or host level, depending on the scope to which you wish to deploy a “baseline”. Once within the scope you choose (I’m at host), make sure you are still on the Update tab and click on “attach baseline”.

much like the source after attaching the baseline the compliance level was shown as unknown, let’s follow along and “scan for updates”.

Now I’m assuming cause I am at the host level I don’t see the tabs with compliant and others cause there is only one host. and in this case it does change to “non-compliant” cause as the speaker states “The hosts listed as non-compliant do not match the version of ESXi associated with the attached baseline” AKA these hosts need to be upgraded.

Remediate

Click it to begin the upgrade process for the host/cluster, which will begin a wizard! Ohhh mighty wizard, guide me to the light at the end of the tunnel!

while I clicked next, flash gave me an error prompt telling me my session had expired, and kicked me out back to the login page, even though I was still pretty actively working on it (snippets don’t take that long). Stupid flash, logged back in and back to the wizard:

agree to the EULA

Schedule it or do it now by not checking a scheduled time.

Pick your additional options and remediation options. For my test I picked suspending my VMs, as they can’t be vMotioned live because the hosts are not in an EVC-based cluster; they are all standalone at the time of this writing. So let’s try that.

After clicking Finish I didn’t see much of anything happening in the vCenter tasks, so I logged into the host being upgraded and saw it was suspending the VMs in question:

Now it has to copy all the memory from these VMs to disk so this could take a bit of time… then I’d assume I’ll be disconnected from the host once it reboots.

Monitoring my pings for these servers, the pings dropped once the suspend started (makes sense), but the host is still responding (makes sense).

I decided at this point to go get some food (I’m lazy and don’t cook), so by the time I returned I was rather shocked to see the host had successfully been updated, showed compliant, and my systems were right back to operational…

Summary

Besides the flash rubbish, this was overall a rather good experience. :O

I think I may upgrade more hosts this way in the future. I didn’t even have to step into my basement at all. That was great!

Until…

I was going to update my second host at home and was hit with this…

well wtf… then it hit me in the face… oh yeah…. I forgot about that. This is a nice real-world example of third-party, unsupported drivers. When I checked my own blog, luckily I had kept the reference to the driver and where I got it, and it appears it still works for 6.x. So I can only guess I’ll have to remove the VIB, run the VUM update procedure, then manually re-install the third-party driver… let’s try this!

Remove VIB

Following this as a reference, I did the same thing:

esxcli software vib list

esxcli software vib remove --vibname DLink-528T
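If you don’t remember the exact VIB name, a small filter over the list output saves some scrolling. This is only a sketch: vib_names_matching is a hypothetical helper, the awk filter assumes the VIB name is the first column of the list output, and the esxcli call is skipped when esxcli isn’t on the PATH. As far as I know, esxcli software vib remove also accepts --dry-run to preview a removal, which is worth trying before the real thing.

```shell
#!/bin/sh
# Read `esxcli software vib list` output on stdin and print the names
# (first column) that contain the given text, ignoring case.
vib_names_matching() {
    awk -v pat="$1" 'index(tolower($1), tolower(pat)) { print $1 }'
}

# Only talk to esxcli when it exists, i.e. when run on the ESXi host itself.
if command -v esxcli >/dev/null 2>&1; then
    esxcli software vib list | vib_names_matching dlink
fi
```

The name it prints is what goes after --vibname in the remove command.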

Ughhhh…

I remember that script/VIB. He was generally a really cool guy and I really loved his blog posts, but his VIB has been rather garbage…. as others have mentioned

Errors

As the picture shows a reboot is required now…. I let VUM do its thing with the standard ESXi 6.5u2 baseline I was using, after the server rebooted I got a problem:

“There was a problem with the Network Device specified on the command line. Error: No NIC found with MAC address.”

Discussion :

The NIC to be used as the management NIC has no drivers installed for it.

Ohhh crap, I forgot when I installed ESXi on this desktop I had to make a custom image, and this is a requirement for systems with custom builds, I removed the drivers for the one NIC but it was not for the ESXi mgmt, but the built in NIC on the mainboard is Realtek and… yeah… anyway, I’ll make a post on creating a custom image, but after a good while of failing to get what I need (as it was my hypervisor with my internet providing VM). I managed to find my initial build and re-install it manually and re-register the VMs and re-create the vSwitches and brought everything back up.

In this case I could have used my Veeam server, however none of my other hypervisors have multiple NICs and thus not an option to use them. My lab is def no redundant lab setup.

Reset ESXi trial license

Quoted directly by Aaron from:

“This guide will give you the steps needed to reset the license file so that you can apply the evaluation license back to your ESXi host.

WARNING: This is for education/informational testing/development purposes only, and should not be used on a production server.

To reset your expired ESX 4.x, ESXi 4.x, ESXi 5.x or ESXi 6.x 60 day evaluation license:

  1. Login to the HOST via SSH or Shell
  2. Remove /etc/vmware/license.cfg
  3. Copy /etc/vmware/.#license.cfg to /etc/vmware/license.cfg
  4. Restart the vpxa service

Or simply copy the code below and paste it into your SSH session.

rm -r /etc/vmware/license.cfg
cp /etc/vmware/.#license.cfg /etc/vmware/license.cfg
/etc/init.d/vpxa restart

Then open the “Licensed Features” option in the configuration tab of the ESXi host through the vSphere Client.

Click on “Edit” in the top right of the “Licensed Features” page

Once the “Assign License” window opens you will see two options. There will be a category for “Evaluation Mode” and Assigned License. Click on the “(No License Key)” option and then click “OK”. This will set the host back to “evaluation” mode and will give you access to all features for 60-days!”

Installing vCenter

Since vCenter will not be supported on Windows moving forward, all discussion of vCenter will simply reference it by its newer acronym, VCSA: vCenter based on Linux.

I just signed up for VMUG advantage as such I get to play with vCenter at home, yay, else get the required ISO from VMware’s product portal using your own VMware login ID.

Although 6.7 is out, and well polished, 6.7 cannot manage ESXi 5.5 hosts. Since I still have a few I’d like to use in my cluster, I’m going to be using VCSA 6.5 for this guide.

Also, I technically only have 5.5 based hosts at this moment (I love the phat (C#) client).

new version PhotonOS?

VCSA CPU and RAM Requirements

VCSA Storage Requirements

Open/Mount the ISO on your OS of choice. For me in Windows, simply mount the ISO and navigate into the vcsa-ui-installer\Win32\installer.exe

Run it!

Stage 1

*Drools* I’m not sure what to do… *Clicks Install*

Introduction; Next
EULA; Accept; Next
VCSA + PSC
Target Host + port + username + Password; next
VM Name + Root Password
Select Datastore (I enabled Thin Disk)
Give a system name (which you’ll want to point to the IP address you define, in the DNS servers used by the VCSA and any client systems needing management access)
IP Address
IP MASK
Gateway + DNS Server


Finish.

Now it states this will take a few minutes as it depends on, the hardware specs of the ESXi host it was deployed to, and maybe internet speed if these RPMs are not on the OVF template that was deployed. Also the VM has to boot.

Quick Break time!

Interesting default… until it finally completes…

Stage 2

NEXT!

NTP servers (0.ca.pool.ntp.org,1.ca.pool.ntp.org,2.ca.pool.ntp.org)

Next

New SSO domain, create a password for administrator@vsphere.local (I’ll create a SSO domain for zewwy.ca later to allow my local AD based accounts to have logon rights later on in this or another tutorial).

DEPLOY!

Mhmm, after 2 attempts I kept getting a pschealth service error. I googled it but the VMware KB was rather useless.

On the third try, I set the system name to IP address, as well as set the vCenter to simply use the hosts time, instead of NTP (even though I used the same NTP server the host was using… so shrug), also waited a little bit longer when starting stage 2, and on the third try it finally succeeded the installation.

Then I added the license key, which was provided to me when I checked out the “purchase” on the VMUG Advantage partner site, and assigned it to vCenter.

Summary

Over all the process is very straight forward. In the next post I’ll cover adding hosts, assigning keys, connecting VCSA to an AD server for an alternative SSO domain. Stay tuned!

Veeam, SMB, and the Failed to get disk free space

The Story

I wanted to try Veeam B&R Free again, now that I discovered a trick on re-issuing the 60 day trial key on ESXi hosts so I should be able to get past my old issue I blogged about “The VMware Screw“…

So I D/L the latest n greatest from Veeam and that’s B&R 9.5 Update 4, grab the latest builds here (Veeam Login Req)…

Run the installer, nothing special here. Love the new UI, amazing how much nicer it is vs the old Free Edition.

Anyway, navigate to Backup Infrastructure to add a Repo, in this case a simple USB HDD I was sharing via SMB on a FreeNAS server. I had created it with open access so no authentication was required to access the share.

As shown here, I was accessing the file share without issue in Windows Explorer…

However, attempting to add it as a Repo…

Whomp, whomm, whmomomomomom.

Kind of annoying that anonymous SMB is, I guess, not supported as a Repo type, or maybe just not with my particular setup. I’m not sure of the exact reason for this error, as I don’t have access to Veeam’s source code. Anyway, I started to google for a possible solution. Annoyingly, the first result was simply a post in which a Veeam rep linked to the second most common solution post, which basically stated:

“add the registry setting:

Key: HKLM\SOFTWARE\Veeam\Veeam Backup and Replication\
DWORD: NetUseShareAccess = 1

As per KB1735”

Did that and…. Failed to get disk free space. Since a lot of people on these threads mention the use of a username and password, I decided to follow this guide on creating a user for an SMB share on FreeNAS.

Follow the Veeam wizard and… “Can’t get disk free space”

Ughhhh… I was about to give up and actually attempt a physical alternative then I noticed something people were saying…

ender – “Depending on your SMB server, you may need to enter the username as DOMAIN\user or SERVERNAME\user before it’s accepted.”

StivoBerlin – “Have you tried to write “YOUDOMAINONYOURSYNOLOGY\youruser” instead of only “youruser” for the NAS login ?”

michaelbrandi – “I have found that it makes a difference if you enter “full” credentials so not admin but IP\admin, not sure it’s universal, but it helped me.”

That’s when I added an account to Veeam as “FREENAS\TestUser” along with the password, and used that credential after entering the path, and got past the error!
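The pattern in those quotes boils down to “prefix the account with the server or domain name”. A tiny sketch of that idea, where qualify_user is a made-up helper and FREENAS and TestUser are the names from my setup:

```shell
#!/bin/sh
# Prefix a bare username with the SMB server (or domain) name, leaving
# already-qualified names such as DOMAIN\user untouched.
qualify_user() {
    server=$1 user=$2
    case $user in
        *\\*) printf '%s\n' "$user" ;;
        *)    printf '%s\\%s\n' "$server" "$user" ;;
    esac
}

qualify_user FREENAS TestUser
qualify_user FREENAS 'FREENAS\TestUser'
```

Both calls print FREENAS\TestUser, which is the form Veeam finally accepted for this share.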

I was actually rather impressed at the speed of the backups considering it was a USB drive shared over the network via SMB from FreeNAS…

Look at that, backed up a 20 gig vm in 8 minutes wooo! Not bad for Free.

Maybe I’ll re-follow up on my old free VM backup series and bring it back with a proper tut on each step required to make it work.

For now I hope this information helps someone!

ESXi 6.5 on Proliant Gen9 Hardware Status Unknown

I’ll keep this post short.

If you have a Proliant Gen9 server running ESXi 6.5u2 along with VCSA 6.5u2, none of your hosts will display any hardware status. This should be fixed immediately, as you won’t get alerts on any hardware faults via IPMI. This includes status from hosts running ESXi 5.5 or 6.5.

The first fix is to upgrade the VCSA to 6.5u3.

After upgrading the VCSA to 6.5u3… Hardware status will come back for each host.. however.. if you are running ESXi 6.5u2 on the Gen9 servers you’ll see something like this:

as you can see some sensors are a lil wonky…

The fix here is to upgrade the host to 6.5u3 via the HPE build.

After the hosts and the VCSA are on 6.5u3, all is good and hardware faults will again trigger critical alarms in vSphere.

A Productive Nightmare

The Story

Lack of Space

It all begins with a new infrastructure design. It’s brilliant. All the technical stuff aside, the system is built and ready for use. One problem: the new datastore is slightly overused (there are many plans for service migrations and for old bloated servers to be removed, but they have not yet been completed). I had one datastore that had been used for a test environment; with the whole test environment down and removed, this datastore would be a perfect temp location until the appropriate datastore could be acquired.

The Next Day

I was chatting with our in-house developer when a user walked in asking why they couldn’t complete a task on the system. I figured it was a workflow server issue, and simply rebooting it often fixed any issues with it. However, this time I also received an email from the DBA reporting a DB issue due to bad blocks at the storage level.

At this point my heart sank. I quickly logged into the storage unit and was shocked not to see any notification of issues. Deciding right then and there to move back to reliable storage, I started the svMotion. While it was in progress, the storage unit I was logged into finally showed disk failure errors: one disk had failed while the other had become degraded (in a RAID 1+0 this can be bad news bears). After the svMotion completed there was still a corrupted DB (we all have backups, right?). Luckily it was just a configuration DB for the workflow server and not any actual data, so I provided the DBA with a backup of the database files. It didn’t take long and everything was back to green.

That Weekend

I decided to play catch up on the weekend due to the disruptive nature of the disk failure that week, to my dismay and only by chance the new host in the new cluster was showing disconnected from vCenter… What the…

Since I wasn’t sure what was going on here at first, I chatted with the usuals on IRC and was informed instantly: “RAMdisk is full”. After some lengthy recovery work (shutting down VMs and manually migrating them to an active host in vCenter), I discovered the cause was that the ESXi host had lost connectivity to its OS storage (in this case it was installed on an SD card).

So I updated the firmware on the host server. This so far (after a couple weeks now) has resolved this issue.

Then, while I was working on the above host’s loss of connectivity, the other host lost connection to vCenter! However, this one had much different signs and symptoms. After doing the exact same process of moving VMs off this host, it was determined by VMware support that it was “possibly” due to the loss of the one datastore. Remember the datastore I discussed above? Although I had moved all VM usage off of it, I did not remove it as an active datastore from the hosts. So although the storage unit was still accessible when the disks first failed, for some reason the whole storage unit had since failed (the UI was now unresponsive). So I had to remove this datastore and all associated paths. After all this, everything was again green for this cluster.

So much for that weekend…

That Storage Unit

Yeah alright, so that storage unit… it was a custom-built FreeNAS box that was spliced together from an HP DL385p Gen8 server. I got this thing for dirt cheap and it was working perfectly fine as a datastore before the disk failure, so I don’t blame the hardware or even FreeNAS for all the crap that happened. It was just a perfect storm.

So I decided to try something different with this unit first… Since I had been using an LSI 9211-8i flashed in IT mode (JBOD) for the SAS expanders in the front (25 disk SFF), I decided I would try to build my first hyper-converged setup. That meant creating a FreeNAS VM, passing the storage controller (LSI 9211-8i) through to it in hardware, and then creating datastores using the disks in the front.

Sooo

The Paradox

The first issue I had was the fact that you need a datastore to host the FreeNAS VM’s config and hard drive files… but if we are going to do hardware pass-through of the entire set of SAS expanders via the LSI card, that means they’re not accessible or usable by the host OS. Uggghhhh. Now, we could use NFS or iSCSI, but the goal for me was to have a fully self-contained system not relying on another host system. I can easily install ESXi on a USB or SD card, but it won’t allow me to use these as datastores. At least not on their own…

Come here, USB datastore… I mostly followed this blog post on it by Virten; however, I personally love this old one by none other than my favorite VMware blogger, William Lam of VirtuallyGhetto.com

*My Findings* Much like the comments on here and many other blog and forum posts about doing this, I could not get this to work on 5.1 or 5.5. Those builds are too finicky and I’d always get the same error about no logical partition defined or something, yet it worked perfectly fine in 6.5 or 6.7 (I personally don’t use 6.0).

OK, so I decided to use ESXi 6.7, installed on an SD card, and set up an 8 gig USB-based datastore. The next issue is that you have to reserve the memory, else you’d be limited to even less than 4 gigs, as ESXi will complain there is not enough room on the datastore for the swap file. Not a big deal here as we have plenty of RAM to use (100 gigs of genuine HP ECC memory).
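The arithmetic behind the reservation trick: ESXi sizes a VM’s swap (.vswp) file as configured memory minus reserved memory, so a full reservation needs no swap space on the datastore at all. A quick sketch, where vswp_size_mb is a hypothetical helper and the 8192 MB VM size is just an example figure:

```shell
#!/bin/sh
# Swap file size in MB = configured memory - memory reservation.
vswp_size_mb() {
    configured=$1 reserved=$2
    echo $(( configured - reserved ))
}

echo "No reservation:   $(vswp_size_mb 8192 0) MB of swap on the datastore"
echo "Full reservation: $(vswp_size_mb 8192 8192) MB of swap on the datastore"
```

With no reservation, an 8192 MB VM would want the whole 8 gig datastore for swap alone, leaving nothing for the VM’s disk; with everything reserved, the swap file shrinks to zero.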

I did manage to get FreeNAS installed on said datastore and, as you’d expect, it was slowwwwww. My mind started to run wild and I thought about a RAMDisk and whether it was possible to use that as a datastore… in theory.. it is! William is still around! 😀

Couple notes on this

1) You need an actual datastore, as it seems like ESXi just creates symlinks to the PMem datastore. (I noticed this by attempting to SSH into the host and simply copy the VM’s files over; it failed stating out of space, even though there was enough defined for the PMem datastore.)

2) You create the VM and define the HDD to be on the PMem datastore, and it will warn you of non-persistence.

Sure enough I created a FreeNAS VM on the PMem and it was fast install, but as soon as the host needed a reboot, attempt to power on that VM and it says the HDD is gonezo. So this was cool, but without persistence it sort of sucks.

Anyway, I didn’t need the FreeNAS OS to have fast I/O, so I stuck with the USB-based datastore. Then I went to pass through the controller. Enabling pass-through on the controller worked fine, but the VM wouldn’t start.

Checking the logs and googling revealed only ONE finding!

No matter what I tried the LSI card or the built in HBA same error as the post above:
“WARNING: AMDIOMMU: 309: Mapping for iopn 0x100 to mpn 0x134bb00 on domain 1 with attr 0x3 failed; iopn is already mapped to mpn 0x100 with attr 0x1
WARNING: VMKPCIPassthru: 4054: Failed to setup IOMMU mapping for 1 pages starting at BPN 0x100000100”

Yay, another idea gone to shit and time wasted, I learned some things but I wanted to learn something and bring some use back to this system… ugh fine! I’ll just put it back to normal connecting the SAS expanders to the P420i HBA and use the 2GB battery backed cache to define a speedy datastore and just keep it simple…

The Terrible HBA

I don’t wish this HBA on anyone seriously, so after I put it all back to normal, the first thing I find is:

  1. When I booted the server and let the system POST, once it got to the storage controller part (past the bottom indication to press F9 for setup, F10 for Smart Provision, and F11 for Boot Menu) it would list the storage controller and its running firmware, in this case v8.00.
    Half the time when I pressed F5, provided there were no previous error codes and no disks or logical units defined, I got into the ACU (Array Configuration Utility). The other half, the fans would kick up to 100% and stay there while the ACU booted (showing nothing but an HP logo and a slow progress bar), and when the ACU finally did load I’d be presented with “No Storage Controller found”.
    (Trust me, I’ve got a 40 min video of me yelling at the server for being stupid haha)
  2. This issue would become 100% apparent as soon as I plugged in a drive with a logical unit defined from another (updated) version of Smart Array.
    To get around this issue I ended up grabbing the “latest” HP SSA (Smart Storage Administrator) tool from HP’s site. I put latest in quotes because it’s from 2013… Now this allowed me to finally build some arrays to use with the planned ESXi build.

I noticed that at first I wasn’t seeing the new logical drive I had defined in the HP SSA in ESXi itself. I had totally forgotten to grab HPE’s custom build, which includes all the required drivers for these pieces of hardware.

The first thing I noticed after grabbing HPE’s custom ESXi build, in this case 6.7 (requires VMware login), was that the keyboard was buggering out on me when attempting to configure the management NIC.

At first I thought maybe the USB stick was crapping out due to the many OS installs I’d been doing on it, so I decided to move to using the logical array I built. The custom installer did see the new array and away I went. Still buggy. So I thought, maybe it’s the storage controller firmware? Looking up the firmware for the P420i or equivalent, there appear to be numerous posts about issues and firmware updates.. turns out there’s even an 8.32(c) Nov 2017 update. Since I was too lazy to build a custom offline installer for this firmware flash, I used an install of Windows Server 2016 and ran the live updater. To my amazement it worked flawlessly… yet also to my amazement, Windows worked perfectly fine on the same logical array regardless of the firmware it was running (is this a VMware issue…??)

So after re-installing the custom ESXi 6.7 from HPE, the host was still being buggy… and now started to PSOD (Purple Screen of Death)… are you kidding me, after everything that’s already happened… ughhhhh…

Googling this I found either

A) Old posts of Vendor finger Pointing (Around ESXi 3-4)

B) Newer posts (ESXi 6.7~). This led me to the only guy who claimed to have fixed his PSOD, and how he did it, here

I found I was not seeing the same errors he showed, which led me back to my first link because of the logs. Having updated all the firmware and running HPE’s builds, I could only think to try the ESXi 6.5-U2 build, as the firmware was supposedly supported for that build.

Now running ESXi 6.5-U2 without any issues, and no PSOD! Unfortunately without warranty on this hardware I have no way to get HP to investigate this newer 6.7 build to run on this particular hardware.

Icing on the Cake

Alright so now I should finally be good to go to use this hypervisor for testing purposes right? Well I had a bunch of spare discs and slots to create a separate datastore for more VMs yay…

Until I went to boot that latest HP SSA offline tool I listed above, the one that fixed the fan speed and “no controller found” problems with the ACU. Well, now this latest HP SSA was getting stuck at a white screen! AHHHHHHHHHHHHHHHHH, how do I create or manage the logical units and build arrays if the offline software is stuck? Well, I could have installed hpssacli and learned how to use it and its associated commands, but since I was already kind of stressed and bummed out at this point, I installed Windows Server 2016 and ran the HP SSA for that, which looks exactly like the offline version.

Finally created all my arrays, installed the only stable version of ESXi with associated drivers, have all my datastores on the host showing green, created a dedicated restore proxy and am finally getting some use back from this thing….

Conclusion

What… a …. freaking… NIGHTMARE!

 

A general system error occurred: Launch failure

Failure to Launch

Sound the Alarm! Sound the Alarm!

*ARRRRRRREEEEEEEEERRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR
ARRRRRRREEEEEEEEERRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR

and one heck of a day sweating bullets, am-I-right?!

Also sorry about the lack of updates, with spring and work, it’s been hard to find time to blog. Deepest apologies.

The Story

Well it’s Tuesday so it’s clearly a day for a story, and boy do I have a story for you. I was going about my usual way of working… endlessly, to meet my goal… that never-attainable goal of perfection… anyway… I had completed a couple of major tasks going from vCenter 5.5 to 6.5U2, and while I have yet to blog about that fun bag (cause I had used my own PKI and certificates, so the migration scripts VMware provides pooped the bed till I replaced them with self-signed ones), I digress, this has nothing to do with that… well, sort of.

Where was I… oh right… I was vMotioning a couple VMs to some new 6.5U2 hosts I had setup on new hardware (yeah buddy this was a whole new world (Why did I just say that in the voice of Princess Jasmine?)) but alas the final VM was not meant to move for it had stalled @ 20 percent (I had this happen once before during my deployment and discovered some interesting things about multi-NIC vmotions) I probably should blog about that toooooo but alas my time is running short. (I still have many other things to tackle and blog about). and then……..

“A general system error occurred: Launch failure”

ugh…. I could have jumped right to the logs but I jumped on google instead…

CloudSpark apparently came across this issue recently and posted about it in every place he could think of… reddit, and the VMware forums

the VMware post got a tad long, but it was more the post on reddit that got me wondering….

“Haven’t seen this personally, but there’s a few things via google on the “Failed to connect to peer process” error that suggest it could be due to running out of storage. Specifically in one scenario, the /tmp/vmware-root folder on the ESXi host fills up with logs.” – Astat1ne

When I SSH’d into the host and ran “df -h” I was surprised to get an error back:

esxcli returned an error: 1

Something to that extent anyway. I ended up moving all the VMs back off the host, and swapped out the SD card the ESXi software was installed on (I had a copy of the SD card with a clone of the ESXi software and host configuration that was created right after the host was installed and configured).

Sure enough, after powering the host back on, “df -h” returned clean. I was then able to vMotion all the VMs back onto the set of new hosts.
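Since the hint that cracked this was “running out of storage”, here is a rough sketch of how that check could be scripted instead of eyeballed. It assumes the usual df column layout (Use% second from the right, mount point last) and mount points without spaces; full_filesystems is a made-up helper name, not an ESXi tool, and the 90 percent threshold is just my pick.

```shell
# Sketch only: list mount points at or above a usage threshold.
# Reads df-style output on stdin, so it also works on saved output.
# full_filesystems is a hypothetical helper name, not an ESXi tool.
full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {                              # skip the header row
            use = $(NF - 1)                   # "Use%" column, second from the right
            sub(/%/, "", use)                 # strip the percent sign
            if (use + 0 >= limit) print $NF   # mount point is the last column
        }'
}

# Usage on the host (over SSH):
#   df -h | full_filesystems 90
```

If df itself errors out like it did for me, the function simply prints nothing, so the error from df is still the thing to look at first.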

Def a generic error message I’d never seen before, and thinking about it now, it’s rather comical. (I know when you’re in the middle of the problem it’s not so funny)…

BUT it sure is now! :D…. this is so going to come back to bite me….

(await updates here post April 30th)

VMware ESXi 5.5
D-Link DGE-530T RevC

The Story

Are you guys ready for a story? This one is actually not so bad. A couple days ago I posted on Facebook asking if anyone happened to have a spare PCI/PCIe Network Interface Card (NIC). Since it was going to be used for internet access I was OK with it being 100 Mbps, but was aiming for 1000 (now that Shaw provides over 300 Mbps internet, clearly 100 doesn’t cut it).

After a day of no luck, and a bunch of funny remarks (as almost none of my friends had any idea of what I was talking about), I decided to take another look through my old computer hardware to see what I could scrounge up…

PCI NIC Found!

Well, well, not even dusty, a PCI NIC, exactly what I needed in my hypervisor to play with OPNsense. I originally was going to try layer 2 trunking via VLANs; however, the main vSwitch already had VMkernel NICs bound to the physical adapter @ layer 3, and the same interface on my firewall (Palo Alto) wouldn’t allow me to create a layer 2 sub-interface if the main interface was already bound to layer 3. Since I wanted my OPNsense VM to get an actual public IP address, I needed a connection from my VM directly to my modem at layer 2… yeah, another NIC. So here we are, and it didn’t take long for me to shut down my VMs, install the card, and boot my hypervisor back up. (I hope to one day have multiple hypervisors so I don’t have to shut down my VMs, but even then, if you don’t pay, chances are you won’t get access to the APIs that migrate the memory states of the VMs for you, so it’s a hassle either way…) Anyway, back to the story.

PCI NIC Found … NOT

Oh Borat, who brought you in?!?! So as you may have guessed I went to add a new vSwitch for my new VM to get its direct Public IP, and to my dismay there was no physical NIC to pick… what the….

So to Google! And hopefully either VMware support or, usually always better, personal blogs! We all love these, right… ahem… anyway…

You can probably guess where the official answer went, but I’ll enlighten you as I did follow along for … pain? OK I don’t know why I did, I was really hopeful it wasn’t going to be the answer I knew it was going to be….

Hey! Some of the commands they provided helped, or did they? All this was, was some BS data chasing to tell you, It’s Not supported, SOWWY!

Clearly, there must be some answers by the community forums right??

Community’s great! VMwares…. :S

So what do we get… One… unanswered and crying about a badly referenced link to source two... also unanswered, crying about the same stuff we already know…. it’s officially not supported. Well I’m running ESXi 5.5 Free and using GhettoVCB’s scripts, also unsupported, so not really an issue… the issue is the lack of help right now.

But bring me down? I don’t think so. The internet has many sites, and many people sharing their knowledge, how?!?! BLOGS! Ahem…

Blogs to the Rescue!

Yes, believe it or not, it is the power of the real untethered, unfiltered beauty that is blogging that actually gets us some meat and potatoes. My first source showed signs of light! One problem, it’s literally 9 years old and using ESXi 4. OK, well it also wanted a fair amount of direct file placing and special manipulation. Most of this works fairly differently in ESXi 5.x, where VIBs or precompiled binaries that work with esxcli are the preferred method. I avoid saying supported here, cause I use these methods to install unsupported packages :D.

Alright, so now what, well the Holy Grail! This King managed to not only blog about getting this working but shared the drivers/vibs packages required to get it to work too! Epic! Let’s get this dang NIC working…

1) Grab the VIB files

2) Change your support level on ESXi5+:

~ # esxcli software acceptance set --level=CommunitySupported
Host acceptance level changed to 'CommunitySupported'.

3) Install the driver with: “esxcli software vib install -v /DLink-528T-1.x86_64.vib”

4) Reboot

Sounds simple enough lets give it a shot… and I hit some errors, classic…

I won’t show the errors just yet as I have it all in one long snippet, but basically I had a bit of a problem cause of the GhettoVCB scripts I had pushed onto my host, and the error results weren’t exactly clear… I attempted a couple things first, like copying the VIB to the path it kept complaining about and specifying the fully qualified path to the VIB… nothing, till I stumbled across this...

esxcli software vib install -v /full/path/to/.vib -f

which finally gave me a driver install successful!
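For next time, here is a small sketch of a wrapper around that install step, so a relative path gets caught before esxcli ever sees it. As far as I know, esxcli wants -v to be a URL or an absolute path, and -f (--force) skips the dependency and acceptance-level checks, so treat it as a last resort for packages you trust. The names install_vib and is_abs_vib_path and the example path are all made up for illustration.

```shell
# Sketch only: refuse anything that is not an absolute path to a .vib
# before handing it to esxcli. Helper names are hypothetical.
is_abs_vib_path() {
    case "$1" in
        /*.vib) return 0 ;;   # starts with / and ends in .vib
        *)      return 1 ;;
    esac
}

install_vib() {
    vib="$1"
    if ! is_abs_vib_path "$vib"; then
        echo "need an absolute path to a .vib file, got: $vib" >&2
        return 1
    fi
    # -f forces the install past dependency and acceptance-level checks.
    esxcli software vib install -v "$vib" -f
}

# Usage on the host, with a placeholder path:
#   install_vib /tmp/example-driver.vib
```

This only covers local files; a URL passed to -v would need its own check.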

Alright, and after reboot…..


OMG! No way, there it is with the proper name and everything. Considering the blog post I followed was for a different NIC model I wasn’t sure if it would work, but there it is… so let’s not get too far ahead of ourselves, and see if it comes up and is able to transmit packets…
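If you’d rather confirm this from SSH than from the client, something like the sketch below should do it. It reads the output of “esxcli network nic list” on stdin and looks for a keyword on any vmnic line; nic_listed is a made-up helper name, and the “D-Link” keyword is my assumption about what the description column will say for this card.

```shell
# Sketch only: succeed if any vmnic line in the piped-in NIC list
# mentions the given keyword (case-insensitive).
# nic_listed is a hypothetical helper name.
nic_listed() {
    awk -v kw="$1" '
        BEGIN { found = 1 }                        # 1 = not found (shell failure)
        /^vmnic/ && index(tolower($0), tolower(kw)) { found = 0 }
        END { exit found }'
}

# Usage on the host:
#   esxcli network nic list | nic_listed "D-Link" && echo "NIC is visible"
```

Matching on a keyword rather than a fixed column keeps it from depending on the exact column layout, which differs between ESXi versions.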

I was having some issues initially so I decided to give my lil netbook a simple /24 IP and give my OPNsense a simple /24 IP just to validate the card wasn’t the issue, or the drivers I just installed.

Plug them together, lights come up, that’s good… checking the ESXi host in vSphere…

That’s good, and finally can we transmit?!?!

Hey!!!! we have communication! Now it’ll be figuring out getting the Public IP configured properly. But we’ll save that for another post. 😀 Cheers!