Upgrade and Migrate a vCenter Server

Intro

Hello everyone! Today I’ll be doing a test in my home lab where I will be upgrading, not to be confused with updating, a vCenter server. If you want to stay on the version your vCenter is currently on and just patch it to the latest build, see my other blog post: VMware vCenter Updates using VAMI – Zewwy’s Info Tech Talks

Before I get into it, there are a couple of things expected of you:

  1. An existing instance of vCenter deployed (for me yup, 6.7)
  2. A backup of the config or whole server via a backup product
  3. A Copy of the latest vCenter ISO (either from VMware directly or for me from VMUG)

Side Story

*Interesting Side Note* The VM creation date property has only existed since vCenter 6.7. Before that it lived in the events table, which gets rotated out by retention policies. 🙂

*Side Note 2* I was doing some vMotions of VMs to prepare for rebooting a storage device hosting some datastores before the vCenter update, and oddly, even though a task hadn’t completed, it would disappear from the Recent Tasks view. Clicking All Tasks showed the task in progress but at 0%, so there was no indication of the progress. The only trick that worked for me was to log off and back in.

A quick little side story: it had been a while since I had logged into VMUG for anything, and I have to admit the site is unbelievably badly designed. It’s so unintuitive I had to Google, again, how to get the ISOs I need from VMUG.

Also, for some reason I don’t know, when I went to log in it stated my username and password were wrong. Considering I use a password manager, I was very confident the problem was on their end. Attempting a password reset produced no email to my address.

Distraught, I decided to make another account with the same email, which, oddly enough, brought me right back to my old account on first login. Super weird. According to Reddit I was not the only one to experience oddities with the VMUG site.

Also on the note of VMware certification, I totally forgot you have to take one of the mandatory classes before you can challenge, or take any of the VMware exams.

“Without the mandatory training? Yes, they represent a reasonable value proposition. With mandatory training? No, they do not. Requiring someone who’s been using your products for a decade to attend a class which covers how to spell ESXi is patronizing if not downright condescending. I only carry VMware certifications because I was able to attain them without going through the nonsense mandatory training.”

“The exam might as well cost $3500 and “include” the class for “free”.”

Don’t fully agree with that last one because you can take any one class (AFAIK) and then take all the exams. I get the annoyance of the barrier to entry; gotta keep the poor out. 😛

Simple Summary about VMUG.

  1. Create account and Sign up for Advantage from the main site.
  2. Download Files from their dedicated Repo Site.

Final gripes about VMUG:

  1. You can’t get Offline Bundles to create custom ESXi images.
  2. You can’t seem to get older versions of the software from there.
  3. The community response is poor.
  4. The site is unintuitive and buggy.

So now that we finally have the vCenter 7 ISO…

For a more technical coverage of updating vCenter see VMware’s guide.

For shits and giggles: moving ESXi hosts and vCenter to a new subnet.

1) Build the subnet, firewall rules, and VLANs
2) Configure all hosts with a new VM port group for the new VLAN
3) Move each host one at a time to the new subnet; ensure again that the network will be allowed to reach the vCenter server after migration
4) You can’t change the management VMkernel’s VLAN from the vCenter GUI; it has to be done at the host level.
i) Place the host into maintenance mode and remove it from inventory (if the host was added by IP; otherwise just disconnect)
ii) Update the host’s IP address via the host’s console, and update DNS records
iii) Re-add the host to the cluster via its new DNS hostname
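For reference, the host-level change boils down to a few esxcli commands run from the host’s console or shell. This is only a rough sketch with placeholder values; the port group name, VLAN ID, and addresses below are assumptions, so adjust them for your environment:

# set the management port group's VLAN, then re-IP the management vmkernel interface
esxcli network vswitch standard portgroup set -p "Management Network" -v 20
esxcli network ip interface ipv4 set -i vmk0 -I 10.0.20.11 -N 255.255.255.0 -t static
# update the default gateway (remove the old default route first if one exists)
esxcli network ip route ipv4 add -n 0.0.0.0/0 -g 10.0.20.1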

Changing vCenter Server IP address

Source: How to change vCenter and vSphere IP Address ( embedded PSC ) – Virtualblog.nl

I changed the IP address in the VAMI, and it even changed the vpxa config’s serverIp to the new address automatically. It worked. :O

Upgrading vCenter

Using the vCenter ISO

The ISO is not a bootable one, so I mount it on a Windows machine that has access to the vCenter server.

Run the installer exe file…

Click Upgrade

I didn’t enter the source ESXi host IP… let’s see.

Nope, it wants all the info; fill in all fields, including the source ESXi host info.

Yes.

Target ESXi Host for new VCSA deployment. Next

Target VCSA VM info. Next

Would you like, large or eXtra large?

Pick the VM’s datastore location. Next.

VM temporary network info; again, ensure network connections are open between subnets if you’re working with segregated networks.

Ready to deploy.

Deploying the VM to the target ESXi host. Once this was done I got a message to move on to Stage 2, which can be done later; I clicked next.

Note right here: when you get the prompt for entering the root password, I found it to be the target root password, not actually the source.

Second Note: Resolving the Certs Expired Pre-Check

While working on a client upgrade, this was more in my face: the source server pre-checks would not continue, stating certificates had expired.

I was wondering how to check the existing certs, and while this KB states you can check them via the WebUI, there could be a couple of issues.

1) You might not even be able to log in to the WebUI, as mentioned in this blog; a bit of a catch-22. (Note: the same goes for SSO domains, which can’t be managed by VAMI, so if there’s an AD issue on the source, you often get a 503 service error attempting to log on to the WebUI.)

2) It might not even show up in that area of the WebUI.

In these cases I managed to find this blog post… which, shockingly enough, is by the very guy who wrote the fixsts script used to fix my problem in this very blog post. :O

Checking Certs via the CLI

Grab Script from This VMware KB

Download the checksts.py script attached to the above KB article.
Upload the attached script to the VCSA or external PSC.

For example, /tmp

Once the script has been successfully uploaded to VCSA, change the directory to /tmp.

For example:

cd /tmp

Run python checksts.py.

Okie dokie then, I guess this script doesn’t check the required cert… so instead I followed along with this VMware KB (yes, another one).

In which case I ran the exact commands as specified in the KB, saved the certificate to a file, and opened it up in Windows by double clicking the .crt file.

openssl s_client -connect MGMT-IP:7444 | more
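If you only want the validity dates and don’t feel like opening the cert in Windows, piping the same output through openssl x509 works too (using the same MGMT-IP placeholder as above):

echo | openssl s_client -connect MGMT-IP:7444 2>/dev/null | openssl x509 -noout -dates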

So now, instead of running the fixsts script, this KB states to run the following to reset this certificate to use the machine cert (self-signed with valid date stamps, at least that’s what this server showed when checking them via the Certificate Management area in the vCenter WebUI).

For the Appliance (I don’t deal with the Windows Server version as it’s EOL):

/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store MACHINE_SSL_CERT --alias __MACHINE_CERT > /var/tmp/MachineSSL.crt
/usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store MACHINE_SSL_CERT --alias __MACHINE_CERT > /var/tmp/MachineSSL.key
/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT > /var/tmp/sts_internal_backup.crt
/usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT > /var/tmp/sts_internal_backup.key
/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT -y
/usr/lib/vmware-vmafd/bin/vecs-cli entry create --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --cert /var/tmp/MachineSSL.crt --key /var/tmp/MachineSSL.key

Then:

  • service-control --stop --all
  • service-control --start --all

In my case, for some odd reason, I saw a bunch of these when stopping and starting the services:

2021-09-20T18:35:47.049Z Service vmware-sts-idmd does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.

I was nervous at first that I may have broken it, as the startup command sequence didn’t seem to complete for quite a while, but after some time the WebUI was fully accessible again. Let’s validate the cert with the same odd method we used above.

Which sure enough showed a date valid cert that is the machine cert, self-signed.

Running the Update Wizard… Boooo Yeah!

 

Uhhh, ok….

 

Ok dokie?

I didn’t care too much about old metrics.

nope.

Let’s go!

After some time…

Nice! and it appears to have worked. 🙂

Another Side Trail

I was excited because I deployed this new VCSA off the FreeNAS datastore I wanted to bring down and reboot, but lo and behold, some new random VMs are on the datastore…

Doing some research, I found this simple explanation of them; however, it wasn’t until I found this VMware article that I got the info I was more after.

Datastore selection for vCLS VMs

The datastore for vCLS VMs is automatically selected based on ranking all the datastores connected to the hosts inside the cluster. A datastore is more likely to be selected if there are hosts in the cluster with free reserved DRS slots connected to the datastore. The algorithm tries to place vCLS VMs in a shared datastore if possible before selecting a local datastore. A datastore with more free space is preferred and the algorithm tries not to place more than one vCLS VM on the same datastore. You can only change the datastore of vCLS VMs after they are deployed and powered on.

If you want to move the VMDKs for vCLS VMs to a different datastore or attach a different storage policy, you can reconfigure vCLS VMs. A warning message is displayed when you perform this operation.

You can perform a storage vMotion to migrate vCLS VMs to a different datastore. You can tag vCLS VMs or attach custom attributes if you want to group them separately from workload VMs, for instance if you have a specific meta-data strategy for all VMs that run in a datacenter.

In vSphere 7.0 U2, new anti-affinity rules are applied automatically. Every three minutes a check is performed, if multiple vCLS VMs are located on a single host they will be automatically redistributed to different hosts.

Note:When a datastore is placed in maintenance mode, if the datastore hosts vCLS VMs, you must manually apply storage vMotion to the vCLS VMs to move them to a new location or put the cluster in retreat mode. A warning message is displayed.

The enter maintenance mode task will start but cannot finish because there is 1 virtual machine residing on the datastore. You can always cancel the task in your Recent Tasks if you decide to continue.
The selected datastore might be storing vSphere Cluster Services VMs which cannot be powered off. To ensure the health of vSphere Cluster Services, these VMs have to be manually vMotioned to a different datastore within the cluster prior to taking this datastore down for maintenance. Refer to this KB article: KB 79892.

Select the checkbox Let me migrate storage for all virtual machines and continue entering maintenance mode after migration. to proceed.

Huh, the checkbox is greyed out and I can’t click it.
I vMotioned the vCLS VMs manually and the maintenance mode task then kept moving up.

How to remove a Datastore from a vSphere Cluster

How to Remove a Datastore

Intro

Hey everyone,

I figured I’d write up a quick little help guide on removing a datastore. Now, this isn’t new and is likely to be buried on the internet because of it. However, in my searches I found the following sources to be great reads; I highly recommend you check them out.

1)  Official Source VMware KB2004605.

2) A Blog guide by Sam McGeown, here.

3) A post by Mike on cswitchzero.

Now let’s go through the checklist from the official source one by one.

Check List

  • If the LUN is being used as a VMFS datastore, all objects (for example, virtual machines, templates, and snapshots) stored on the VMFS datastore must be unregistered or moved to another datastore. - This one is pretty easy: navigate to the datastore’s files and check. You may find some remnants from the items below, though.
  • All CD/DVD images located on the VMFS datastore must also be unmounted/unregistered from the virtual machines. - This shouldn’t even come up if you completed the first check.
  • The datastore is not used for vSphere HA heartbeat. - This setting will use a folder labeled “.vSphere-HA”
    For a quick overview of datastore heartbeating, see here
    To “remove” (aka change) them, see here
  • The datastore is not part of a datastore cluster. - You can find useless help on this process from VMware here. I’m assuming it’s an easy task via the WebUI.
  • The datastore is not managed by Storage DRS. - If you removed it from the datastore cluster, how could this be an issue?
  • The datastore is not configured as a diagnostic coredump partition/file or as the scratch partition. - For a quick way to verify this, see the shell commands after this list.
  • Storage I/O Control is disabled for the datastore. - See here on how to enable it (disabling is the exact reverse).
  • No third-party scripts or utilities running on the ESXi host can access the LUN that has the issue. - Honestly, I’m not sure how you could check this… even after doing some quick research. I guess you can have scripts that are not on the hosts but are run from other machines via PowerCLI, as described in this community post. You’d have to know; either way the scripts would just fail, and it shouldn’t affect the vSphere cluster.
  • If the LUN is being used as an RDM, remove the RDM from the virtual machine. Click Edit Settings, highlight the RDM hard disk, and click Remove. Select Delete from disk if it is not selected and click OK. Note: This destroys the mapping file but not the LUN content.

    – This is more about removing the backend physical device, which in my case is the final goal. If your goal was just to remove a datastore while keeping the physical storage in place, this can be ignored.

  • As noted by Sam, but not by the official source or Mike: check whether you see a .dvsData folder. As stated by Sam, “The .vdsData folder is created on any VMFS store that has a Virtual Machine on it that also participates in the VDS – so by migrating your VMs off the datastore you’ll be ensuring the configuration data is elsewhere.”
  • Check that there are no processes locking the VMFS with this command:
esxcli storage core device world list -d <NAA ID>
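For the coredump and scratch items in the list above, here is a quick way to verify from each host’s shell. This is just a sketch; the advanced option path is the standard ScratchConfig location:

esxcli system coredump partition get
esxcli system coredump file list
esxcli system settings advanced list -o /ScratchConfig/ConfiguredScratchLocation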

Datastore Removal Steps

Step 1) Follow the Checklist above.

Make sure no files reside on the Datastore.

Step 2) Unmount Datastore from all ESXi hosts.

As noted in Sam’s blog post, even in vSphere 5.x using the C# fat client this was possible to do via a wizard against all hosts that have the datastore mounted. Even in the newer HTML5 WebUI this is still possible (I think everyone wants to fully forget that VMware chose Flash for a short time).

At this point the datastore will show up as inaccessible in vSphere, as noted by both Mike and Sam. This will be the same anywhere from 5.x to 7.x (as noted by Mike, it might be slightly more important to follow the procedures with earlier versions like ESXi 3 or 4). If the checklist was followed, there should be no issues unmounting the datastore.

If you need to do this via esxcli (Source):

# esxcli storage filesystem list

Unmount the datastore by running the command:

# esxcli storage filesystem unmount [-u UUID | -l label | -p path ]

For example, use one of these commands to unmount the LUN01 datastore:

# esxcli storage filesystem unmount -l LUN01

# esxcli storage filesystem unmount -u 4e414917-a8d75514-6bae-0019b9f1ecf4

# esxcli storage filesystem unmount -p /vmfs/volumes/4e414917-a8d75514-6bae-0019b9f1ecf4

Step 3) Detach the LUN from all hosts.

As noted by Sam, if you are on 5.x you might want to automate this via PowerCLI. Then noted by Mike, newer 7.x can now do this in bulk via the Management WebUI.

6/7 WebUI -> Hosts and Clusters -> Hosts -> Cluster -> Host -> Configure Tab -> Storage Devices (left side tree) -> Highlight Device -> Detach

For esxcli:

Obtaining the NAA ID of the LUN to be removed

esxcli storage vmfs extent list

To detach the device/LUN, run the command:

# esxcli storage core device set --state=off -d NAA_ID

To verify that the device is offline, run the command:

# esxcli storage core device list -d NAA_ID

The output shows that the status of the disk is off.

Step 4) Rescan HBAs

At this point, if you rescan all HBAs on all hosts the inaccessible datastore should be gone from the WebUI.
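If you’d rather kick off the rescan from the shell on each host instead of the WebUI, this should do it:

esxcli storage core adapter rescan --all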

At this point you can remove the LUN from being seen (the disk from showing up under devices). For iSCSI-based configurations, most likely the case for a shared VMFS datastore, remove the static and dynamic discovery IPs from the iSCSI initiator settings on each host.

It could also be a local disk on a local storage controller, such as a logical drive created in RAID behind a Pxxx storage controller.

Removing the source device will always be dependent on how it was configured in the first place.

Summary

So today we covered removing a datastore. The important thing to remember is that removing a datastore takes a lot more steps than creating one, because so many different VMs and services can end up relying on a datastore once it has started being used.

In many cases, the syslog and scratch partitions are the big hang-ups and should be looked at closely. As stated, though, if you are actually checking for files on the datastore this stuff will be pretty evident.

In most cases, ensure you follow the check list and the process should be pretty smooth. Hope this helps someone.

*Note* I often provide screen shots to provide some context, in this case I decided to leave it more generic to span multiple versions of vSphere.

ESXi new install; failed to create new Datastore

Well, I booted up a new server, created a new logical drive, booted ESXi, and it failed to create a datastore… what is this?

Google to the rescue? Yeah, the forums helped.

1. Show connected disks.

ls -lha /vmfs/devices/disks/

(Verify the disk is seen. You will probably see your disk ID followed by :1. This is a partition on the disk. We only need to worry about the main disk ID.)

Neat. next

2. Show the error on disk.

partedUtil getptbl /vmfs/devices/disks/(disk ID)

(It will probably indicate that the GPT is located beyond the end of the disk.)

Ohhh yeah, huh… fix it

3. Wipe the disk and rewrite it with a basic MSDOS partition table.

partedUtil setptbl /vmfs/devices/disks/(disk ID) msdos

(The output from this should be similar to “msdos” and the next line will be 0 0 0 0)
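To confirm the fix, you can re-run the earlier check; it should now report an msdos label with an empty partition list:

partedUtil getptbl /vmfs/devices/disks/(disk ID)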

Go create the datastore after this; yay, it worked. Please note to use your own values, the images are just for reference.

*UPDATE* I went to reuse some old drives from an old RAID controller. In this case I had removed the logical drive from the old RAID configuration and pulled the disks. Since they were in the same caddies as an alternative server, I went on to create some new logical drives to use as an alternative datastore on this particular host.

In the examples above, it would fail at creation of the datastore. In this example it failed at the point in the wizard where you define the partition to create, with an error as follows:

“Either the selected disk already has a VMFS datastore or the host cannot perform a partition table conversion. Select another disk” in a nice red banner.

Now, attempting my usual fix as mentioned above resulted in…

… to be updated (i have such a headache right now from the endless issues)

I had to clear the drives to fix this problem: delete the logical drive, rip the drives out of the server, and use a USB enclosure with “diskpart” and the “clean” command on Windows to wipe the drives.
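For reference, the diskpart sequence is along these lines (run from an elevated command prompt; the disk number below is just an example, so double check with list disk before cleaning anything):

diskpart
list disk
select disk 2
clean
exit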

Then after that the health light on the server went off, saying my one disk or caddy is “unauthentic” even though it was just working. Apparently terribly engineered caddies.

To track down that issue I had to get into iLO, for which the admin password was unknown, so I had to dig up my old blog post to get into that. And now, after all that… I have a headache.

Good job computers, you managed to make my day fantastic… again.

ESXi VM disconnected after applying patch

Keep this short.

I had to update an ESXi host locally, as its mgmt connection would drop and all VMs on it would have to go down, since it’s a single host with virtualized network components.

For one of the VMs this was an opportunity to update it, since it needed to be shut down temporarily anyway. I took a snapshot of the VM, updated it, validated the updates were fine, and removed the snapshot. Then I proceeded to update the host:

vim-cmd hostsvc/maintenance_mode_enter
esxcli software vib update -d "/path/to/file.zip"
vim-cmd hostsvc/maintenance_mode_exit
reboot

Nothing special here. However, once the host came back up and I was able to access it via vCenter, one of the VMs showed as “disconnected”. I’ve seen this with ESXi hosts before, but not particularly with a VM.

Oddly enough there’s only one datastore on the host and all other VMs are fine, and checking the datastore, all files are where they should be.

I figured maybe remove the VM from inventory and just re-add it via the vmx file, however the option was greyed out.

It turned out there was apparently still a snapshot left on the VM (noticed via delta files existing within the VM’s folder path).
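A quick way to confirm leftover snapshots from the host’s shell, if you’re in the same spot (the VM ID comes from getallvms; the datastore and folder names below are placeholders):

vim-cmd vmsvc/getallvms
vim-cmd vmsvc/snapshot.get <vmid>
ls /vmfs/volumes/<datastore>/<vm folder>/ | grep -i delta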

Removing all the snapshots resolved the issue. Turns out the VM was also running but didn’t show the green play icon, so I wasn’t aware of its powered-on state, which also explains the greyed-out context menu for removing it from inventory.

Hope this helps someone.

VMware HA down after 6.5 patch

The Story

So the other day I tested the latest VMware patch that was released as blogged about here.

Then I ran the patch on a client’s setup which was on 6.5 instead of 6.7. I didn’t think it would be much different, and in terms of steps to follow it wasn’t.

The first thing to note though is to validate the vCenter root password to ensure it isn’t expired (on 6.7U1 and newer); otherwise the updater will tell you the upgrade can’t continue.

Logged into vCenter (SSH/Console) once in the shell:

passwd

To see the status of the account.

chage -l root

To set the root password to never expire (do so at your own risk, or if allowed by policies)

chage -I -1 -m 0 -M 99999 -E -1 root

Install patch update, and reboot vCenter.

All is good until…

ERROR: HA Down

So after I logged into the vCenter server, an older cluster was fine, but a newer cluster with newer hosts showed a couple errors.

For the cluster itself:

“cannot find vSphere HA master”

For the ESXi hosts

“Cannot install the vCenter Server agent service”

So off to the internet I go! I also asked people on IRC if they had come across this, and crickets. I found this blog post, and all the troubleshooting steps led to no real solution unfortunately. It was a bit annoying that it says “it could be due to many reasons such as…” and lists them off, with a vCenter update being one of them, but then goes through common, standard troubleshooting steps. Which is nice, but none of them are analytical enough to determine which root cause it actually was, so you can resolve it instead of throwing darts at a dartboard.

Anyway, I decided to create an SR with VMware and uploaded the logs. While I kept looking for an answer, I found this VMware KB.

Which, funnily enough, the resolution states… “This issue is resolved in vCenter Server 6.5.x, available at VMware Downloads.”

That’s ironic: I just updated and that’s what caused this problem, hahaha.

Anyway, my colleague noticed the “workaround”…

“To work around this issue in earlier versions, place the affected host(s) in maintenance mode and reboot them to clear the reboot request.”

I didn’t exactly check the logs and wasn’t sure if there actually was a pending reboot, but figured it was worth a shot.

The Reboot

So, vMotion all VMs off the host, no problem; put it into maintenance mode, no problem; send the host for a reboot…

Watching the screen, it was still at the ESXi console login… monitoring sensors indicated the host was inaccessible, pings were still up, and the Embedded Host Client (EHC) was unresponsive… ugghhhh, ok…

Pressing F2/F12 at the console: “direct management has been disabled”… like, uhhh, ok…

I found this, a command to hard reboot, but I can’t SSH in and I can’t access the Embedded Host Client… so there’s no way to enter it…

reboot -n -f

Then I found this with the same problem… the solution: like any computer in a stuck state, hard shutdown. So I pressed the power button for 10-20 seconds, until the server was fully off, then powered it back on.

The Unexpected

At this point I was expecting the usual: it comes back up and shows up in vCenter. Nope; instead the server showed as disconnected in vCenter, in a downed state. I managed to log into the Embedded Host Client, but found the VMs I had vMotioned off still on it in a ghosted state. I figured this wouldn’t be a problem; after reconnecting to vCenter it should pick up on the clean state of those VMs being on the other hosts.

Click reconnect host…

Error: failed to login with the vim admin password

Not gonna lie, at this point I got pretty upset. You know, HULK SMASH! type deal. However, instead of smashing my monitors, which wouldn’t have been helpful, I went back to Google.

I found this VMware KB, along with this thread post, and pieced together a resolution from both. The main thing was the KB wanted to reinstall the agents; in the thread post it seemed most people just needed the services restarted.

So I removed the host from vCenter (Remove from inventory), also removed the ghosted VMs via the EHC, enabled SSH, and restarted the VPXA and HOSTD services.

/etc/init.d/hostd restart

/etc/init.d/vpxa restart

Then re-added the host to vCenter and to the cluster, and it worked just fine.

The Next Server

Alright, so now vMotion all the VMs to this now-rebooted host, so we can do the same thing on the other ESXi host to make sure they are all good.

Go to set the host into maintenance mode and reboot; sure enough, this server hangs at the reboot just like the other host. I figured the process was going to be the same here; however, the results actually were not.

This time the host actually did reconnect to vCenter after the reboot, but it was not in maintenance mode… wait, what?

I figured that was weird and would give it another reboot. When I went to put it into maintenance mode, it got stuck at 2%… I was like, ughhhh, what? The weird part was it even stated there were orphaned, ghosted VMs, so I thought maybe it had them at this point.

Googling this, I didn’t find an answer, and just when I was about to hard reboot the host again (after 20 minutes) it succeeded. I was like, what?

Then I sent a reboot, which I think took like 5 minutes to apply; all kinds of weird were happening. While it was rebooting I disconnected the host from vCenter (not removed), waited for the reboot, then accessed this host’s EHC.

It was at this point I got a bit curious about how you determine if a host needs a reboot, since vCenter didn’t tell me and the EHC didn’t tell me… How was I supposed to know, considering I didn’t install any additional VIBs after deployment? I found this reddit post with the same question.

Some weird answers, the best being:

vim-cmd hostsvc/hostsummary|grep -i reboot

The real thing that made me raise my brow was this convo bit:

Like, wat?!?!?! hahaha. Anyway, by this time I got an answer from VMware support, and they simply asked when the error happened, if I had a snippet of the error, and if I had rebooted the vCenter server…

Like, really… ok, don’t look at the logs I provided. So, ignoring the email for now to actually fix the problem, at this point I looked at the logs myself for the host I was currently working on and noticed one entry which should be shown on the summary page of the host.

“Scratch location not set”… well, poop… you can see this KB; after correcting that and rebooting the server again, it seemed to be working perfectly fine.
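For reference, pointing scratch at a persistent location from the shell looks roughly like this (the datastore and folder names are placeholders, and a reboot is needed for the change to take effect):

mkdir /vmfs/volumes/datastore1/.locker-esxi01
esxcli system settings advanced set -o /ScratchConfig/ConfiguredScratchLocation -s /vmfs/volumes/datastore1/.locker-esxi01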

So I removed it from inventory, ensured no vpxuser existed on the host, restarted the services, and re-added the host.

Moment of Truth

So after ALL that! I got down on my knees, I put my head down on my chair, I locked my hands together, and I prayed to some higher power to let this work.

I proceeded to enable HA on the cluster. The process of configuring HA on both hosts lingered at 8% for a while. I took a short walk in preparation for the failure; to my amazement, it worked!

WOOOOOOOOO!!!

Summary

After this I’d almost recommend validating host reboots before doing a vCenter update, but that’s also a bit excessive. So maybe at least run the command above on your ESXi servers to ensure there’s no pending reboot before initiating a vCenter update.

I hope this blog post helps anyone experiencing the same type of issue.

 

Creating Custom ESXi Image

Follow these steps

  1. Download Offline Bundle of ESXi Image
  2. Download Drivers E.G The Native ESXi USB NIC drivers
  3. Install PowerCLI (Set-ExecutionPolicy Remotesigned; Import-Module PowershellGet; Install-Module -Name VMware.PowerCLI)
  4. In PowerCLI connect the standard SoftwareDepot by typing:

    Add-EsxSoftwareDepot -DepotUrl <Path to zip>

  5. Get the ImageProfile list:

    Get-EsxImageProfile

  6. Clone standard ImageProfile:

    New-EsxImageProfile -CloneProfile ESXi-6.7.0-8169922-standard -Name MyProfile -Vendor <vendor>

  7.  [Only If Required] If your vib file has Acceptance Level – CommunitySupported, we need to set this Acceptance Level for our ImageProfile:

    Set-EsxImageProfile -ImageProfile MyProfile -AcceptanceLevel CommunitySupported

  8. Add our vib to SoftwareDepot:

    Get-EsxSoftwarePackage -PackageUrl <path to vib>

  9. Add our vib to ImageProfile:

    Add-EsxSoftwarePackage -ImageProfile MyProfile -SoftwarePackage <package name>

Error:

Search result.

Answer: the driver is for a specific version (7.1, need 6.5).

So I downloaded the proper driver, but I couldn’t figure out how to pick the right software package since the “get” command had already loaded the other driver, so it kept trying to add the 7.1 driver. The only thing I could think of was to close the PowerShell window and start fresh…

10. Export ImageProfile to ISO image:

Export-EsxImageProfile -ImageProfile MyProfile -ExportToIso -FilePath <path to ISO>

That was it! Sadly the laptop I wanted to use this on was still boot looping, and sadly the USB NIC (“Insignia”) didn’t seem to work; I was getting “NFS4 client failed to load” and no network adapters found on the machine. But it was worth a shot.

VMware vCenter Updates using VAMI

This is a quick post on the latest security release notification from VMware.

VMSA-2021-0002 (vmware.com)

If for whatever reason an update is not possible you can follow these workarounds.

While you can use VUM to distribute updates and patches to ESXi hosts, you’ll have to use VAMI to update vCenter itself.

You can download the latest patches here (vmware account required).

I did this on my lab vCenter; it took a little while but wasn’t bad.

  1. Made a backup of the VCSA using Veeam
  2. Shutdown Veeam or any other backup solution that might use vCenter
  3. Notified anyone that might use vCenter that it would be inaccessible during update
  4. Attached ISO to VCSA VM (You can do as 4sysops did and upload to a datastore, or you can simply open the VCSA console via VMRC, and attach the ISO from your Downloads folder)
  5. Log into VAMI (https://vcsa:5480)
  6. Click Update on left nav, then Update -> Check CD-ROM
  7. The update should be available as the option, then click Stage and Install
  8. Accept the EULA, use/don’t use CEIP, Check I have a backup, Click Install.

It could take an hour or so, then everything is back to running state, here’s the summary page after completion:

You can read the alternative methods such as using CLI, or how to handle a vCenter HA cluster upgrade using the link above to 4sysops guide on upgrading vCenter.
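For reference, the CLI route from the appliance shell boils down to something like this (a sketch; the patch ISO still has to be attached to the VCSA VM first):

software-packages stage --iso --acceptEulas
software-packages list --staged
software-packages install --staged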

Sorry this post is not as extensive as usual, just a heads up about the latest VMware patches. Stay Safe out there.

 

ESXi 6.7 on HPE DL380 G7

I had this long blog post I was going to write about HP screwing me on an ESXi upgrade, but in a nutshell you can read these posts about the whole shebang:

  1. MonsterMuffin (crude)
  2. Claud “Admin” (Less crude)

As both of them mention, you have to do a clean install, and you probably won’t have a config saved from that exact version since you are just updating to it, so your config from the old 5.x or 6.0 won’t work either. If you have dvSwitches and all that fun jazz it’s probably not a huge deal, but if you have standard vSwitches and lots of custom configuration around them, including VLAN tagging, well, this can be crappy.

I did manage, through all my trial and error, to get a working copy, but it required workarounds I don’t think would have been supported, so meh, just follow those…

I tried everything to get an ESXi system upgraded to 6.7 without losing or reconfiguring the host; you’d figure you could just do a new install and reload the config.

However, you can only load a config backup onto the same version it was created from. I eventually came across other things, other PSODs, and at one point even had to edit the boot file to stop a dedicated HP driver from loading. After all that, meh, just install fresh with the custom images mentioned in the above blogs.

I’m really sorry, I would have covered these tasks in far more detail, but I spent a good couple of days smashing my head just trying to get it to work. Some things are just not worth the effort, and blogging every annoying error and step along the way on this one… is just one of those things.

Cheers.

Managing HPE Storage controllers on VMware ESXi

HPE Storage on ESXi

Quick Overview

Assumptions: device drivers and tools are already on the ESXi host, as servers like these running ESXi should be using authorized images from the vendor and be on the Hardware Compatibility List (HCL).

If not, use this guy’s blog on how to manually install the tools that should otherwise already be on the server in question.

I recently decided to double check some server setups running for testing. Since it was all tests I figured I’d talk about some of the implications of simple misconfigurations or even just the unexpected.

Most of these commands I used are from following Kalle’s blog, and the command list was super useful.

List PCI Devices

To start, if you are in a pinch and need to find which storage controller is in use by the hypervisor, run this to list all the devices (at least the ones on the PCI bus):

lspci -vvv

This will present you with a long list of devices; for my test device (an HP DL385 Gen8) it turned out to be an HP Smart Array P420i:

That’s cool.

Storage Config

To see the current config run:

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config

This shows me what I already knew: I have 2 logical drives, both created with RAID 1+0 fault tolerance, each from a different number of different-sized drives. In this case, one from four 900 GB SAS drives and the other from twelve 300 GB SAS drives.

From this information we can’t determine the speed of the drives.

Controller Status

To view the status of the controller:

 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show status

From this we can tell the type of controller, double-verifying the results from the lspci command, and that there is cache available. We’re still not sure at this point what type of cache we are dealing with. Our goal is to use the Battery-Backed Write Cache for the logical volumes… but we still have some things to cover before we get there.

 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show detail

With these details we get to see more of the juicy information; here we can tell we have a cache board for the controller, available in “slot 0” as indicated by the “slot” attribute.

Also note the Drive Write Cache, which is when the physical drive itself enables caching. However, we again want to use the BBWC to prevent data loss in the event of a power outage, so as not to leave our VMs with corrupted virtual drives. Read this thread for a bit more detail about this.

Physical Disk Status

To view all the disks and if they are OK:

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd all show status

in my case they were all OK.

Physical Disk Details

Now this is where we get to see more details on those SAS disks I talked about earlier:

 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd all show detail

here we can tell now that the 300 Gig SAS disk is a 10K SAS disk, not bad… 🙂

Logical Drive Status

Run this to get a very basic status report of the logical drives created from all the physical drives.

 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld all show status

 

Logical Drive Details

Change the all to the logical volume ID number, in this case 2 for the 300 Gig based array.

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 show

Just to show the difference against the logical disk I know I enabled cache on, which has unreal better performance…

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 1 show

Now, I created these logical drives during the boot of the server using the BIOS/UEFI tools on the system. Luckily, though, we can adjust these settings right from the ESXi CLI. 🙂

Enable Logical Write Cache

Just like magic:

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 logicaldrive 2 modify arrayaccelerator=enable

Being specific to change logical drive 2, which was the one that did not have cache enabled originally… checking it after running the above command shows it has cache! 🙂

SSD SmartPath Caveat

One thing I noticed when playing with SSDs in HPE servers…

Here’s a post about why SSD Smart Path is not always a good choice. (Note: it’s down; you have to use the Google cache.)

I’ll let these graphs speak for themselves…

Latency during the cloning operation went from 14 ms with SmartPath, to 9 ms with no cache, to 4 ms with BBWC. With BBWC it completed so fast I didn’t even need to cancel. A 10x performance increase.

Interesting Side Story

I was going over this blog post while checking storage on my home lab’s DL380 G6. I had it powered off for a while, and I noticed some terrible latency times on write operations on the datastore as I was vMotioning a VM to it. As it turns out, the write cache battery doesn’t charge when the server is powered off, even if it’s still plugged in.

For me it took about an hour and a half to two hours for the battery status to change and the write cache to become enabled. I’ll let this chart speak for itself as well…

I also found this really cool hack: if you have a dead BBWC battery you can hack it to use regular batteries. This is so cool I kinda wish I remembered what I did with the old dead one I had…

All Commands

Just in case Kalle’s site goes down, here’s the list he shared for both ESXi 5.x and 6.x:

Show configuration
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl all show config
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config
Controller status
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl all show status
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show status
Show detailed controller information for all controllers
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl all show detail
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show detail
Show detailed controller information for controller in slot 0
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 show detail
Rescan for New Devices
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli rescan
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli rescan
Physical disk status
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd all show status
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd all show status
Show detailed physical disk information
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd all show detail
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd all show detail
Logical disk status
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld all show status
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld all show status
View Detailed Logical Drive Status
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 show
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 show
Create New RAID 0 Logical Drive
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 create type=ld drives=1I:1:2 raid=0
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 create type=ld drives=1I:1:2 raid=0
Create New RAID 1 Logical Drive
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2 raid=1
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2 raid=1
Create New RAID 5 Logical Drive
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2,2I:1:6,2I:1:7,2I:1:8 raid=5
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2,2I:1:6,2I:1:7,2I:1:8 raid=5
Delete Logical Drive
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 delete
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 delete
Add New Physical Drive to Logical Volume
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7
Add Spare Disks
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 array all add spares=2I:1:6,2I:1:7
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 array all add spares=2I:1:6,2I:1:7
Enable Drive Write Cache
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify dwc=enable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 modify dwc=enable
Disable Drive Write Cache
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify dwc=disable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 modify dwc=disable
Erase Physical Drive
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd 2I:1:6 modify erase
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd 2I:1:6 modify erase
Turn on Blink Physical Disk LED
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 modify led=on
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 modify led=on
Turn off Blink Physical Disk LED
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 modify led=off
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 modify led=off
Modify smart array cache read and write ratio (cacheratio=readratio/writeratio)
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify cacheratio=100/0
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 modify cacheratio=100/0
Enable smart array write cache when no battery is present (No-Battery Write Cache option)
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify nbwc=enable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 modify nbwc=enable
Disable smart array cache for certain Logical Volume
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=disable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=disable
Enable smart array cache for certain Logical Volume
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=enable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=enable
Enable SSD Smart Path
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 array a modify ssdsmartpath=enable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 array a modify ssdsmartpath=enable
Disable SSD Smart Path
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 array a modify ssdsmartpath=disable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 array a modify ssdsmartpath=disable

vCenter 503 Service Unavailable

I was going to test an auditing script from a DefCon presenter on my AD server, when adding the USB controller and the USB stick I was passing through (to get the script into my VM) started being weird.

The first USB 3.0 controller connected just fine, and I connected the USB device to the VM, but diskpart was not showing it. So I went to remove it and try a USB 2.0 controller; that failed to connect since the USB 3.0 controller was still showing there, and I selected to remove it again, which errored because of another concurrent task. Makes sense, until refreshing the page told me “unprivileged account”. I wasn’t sure what this was about, so I decided to open another window and navigate to my vCenter web app… 503 Service Unavailable:

“503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x000055aec30ef1d0] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)”

What the… rebooting the VCSA showed no success; still the same error, even with an incognito window… ugh.
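If you hit something similar, checking which services actually came up from the VCSA shell is a quick first step (not something I did at the time):

service-control --status --all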

I found this thread: https://communities.vmware.com/thread/588755

I was going through this, and decided to try to renew the certs, even though my internal PKI certs were still valid (AFAIK, and checking the cert presented when accessing the page). Now here’s the thing: while I ran the certificate-manager script and renewed all the certs, I noticed my AD server somehow was down. I booted it back up. I’m not exactly sure which fixed it. So I decided to take another snapshot while it was in this “fixed state” and revert to the broken state. After restoring to the broken state, nothing was responding at all on the HTTPS service from the VCSA, so I gave it a simple reboot (which I did initially, before I noticed my AD server was down). Sure enough, after the reboot everything was working fine with my internal PKI certs.

I guess if you set vCenter to use MS AD as the primary login domain and that domain is not available, the web management service becomes unavailable… that kind of sucks. I should have noticed my AD was not operational, but I didn’t have monitoring on it 😉 and I don’t use my local workstation as an AD member; it’s mostly just random VMs I have for testing.

Like most people, I should have looked at the logs for a better idea of what the root cause was. I threw two darts at a dartboard and had to revert to find the true root cause. Not the best way to troubleshoot, but sometimes, if logs are not available, it is another method…