SharePoint – An Update Conflict Has Occurred

So the other day I was getting my test environment replicated to the latest state of production. I had spun up my front end before replicating it; when I noticed that in CA, I powered it down and replicated it fresh. By then, though, I had already replicated the DB server, and I did not re-replicate it. That mismatch was more than likely the cause of this problem.

So after having replicated the front end and spinning it back up, I went to switch a site from HTTP to HTTPS. I got my cert ready, bound it to my IIS site listener, went to CA to edit the Alternate Access Mappings (AAMs) and….

ERROR … “An update conflict has occurred, and you must re-try this action. The object SPFarm Name=SharePoint_Config is being updated by DOMAIN\username, in the STSADM process, on machine MACHINE-NAME. View the tracing log for more information about the conflict.”

Googling this there are a couple good references like this old one on the sharepointdiary even from SP 2007 so it’s a long known thing. Just not to me. 😛

This one also helped me, as it pointed out that the two resources above point to older directories.

Resolution

    1. Stop the SharePoint Timer Service.
    2. On the front end, navigate to: %SystemDrive%\ProgramData\Microsoft\SharePoint\Config
    3. Find the folder named with numbers and hyphens (a GUID).
    4. Delete everything but cache.ini (make a backup of cache.ini if you want).
    5. Restart the Timer Service.
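
For anyone who would rather script the file cleanup in step 4, here is a minimal Python sketch. It is not a SharePoint tool: the function name clear_config_cache and the backup file name are made up for illustration, it assumes the Timer Service is already stopped, and it only touches files directly inside the folder you hand it.

```python
import shutil
from pathlib import Path


def clear_config_cache(cache_dir, keep="cache.ini"):
    """Delete every file in cache_dir except `keep`, after backing `keep` up.

    Hypothetical helper for illustration only. Stop the SharePoint Timer
    Service before calling it and start the service again afterwards.
    Returns the sorted names of the files that were deleted.
    """
    cache_dir = Path(cache_dir)
    keep_file = cache_dir / keep
    if not keep_file.is_file():
        raise FileNotFoundError(f"{keep} not found in {cache_dir}")

    # Back up cache.ini next to the cache folder so it survives the cleanup.
    backup = cache_dir.parent / (keep + ".bak")
    shutil.copy2(keep_file, backup)

    deleted = []
    for entry in cache_dir.iterdir():
        # Leave sub-folders and cache.ini itself alone.
        if entry.is_file() and entry.name.lower() != keep.lower():
            entry.unlink()
            deleted.append(entry.name)
    return sorted(deleted)
```

You would point it at the GUID folder from step 3, then restart the Timer Service as in step 5.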

I noticed when I went back to CA that almost all my collections were missing from the Configure Alternate Access Mappings page. So I rebooted the front end.

After that I was able to adjust the AAMs without issue. Hope this helps someone.

Veeam – Can’t get service content. Soap fault.

So the other day I added a new Windows managed server to Veeam, and as usual I came across some errors and issues that had to be resolved, along with some tips on what to look out for to resolve them. Besides one error being used for two different issues (network vs authorization), it’s generally not that bad, and it’s easy to decipher which of the two is the cause. However, sometimes you come across an error that seems to have multiple causes, and knowing which one it is can be difficult to diagnose.

Today was one of those times. After adding the new managed server as a Veeam vSphere Proxy, I was hit with this error when attempting to complete any replication jobs…

Processing configuration Error: Client error: Cannot get service content.
Soap fault. No DataDetail: 'get host by name failed in tcp_connect()', endpoint: 'https://vcenter.domain.local:443/sdk'

Googling this I found one post on the Veeam forums that was basically a dead end.

And this nice thread on Spiceworks.

The only thing different between this Proxy and my other one was that it was not domain joined, which I didn’t see as a pre-req… and sure enough it’s not, but in my case it was phlight’s response that nailed it for me:

“I attempted to connect to vcenter from my remote proxy and found that it didn’t have an entry for vcenter in DNS.  Remoted into vcenter and performed ipconfig /registerdns.  Remote proxy could then connect to vcenter.  I did a test replication job successfully. Yeah!”

In my case the error showed the vCenter server by a hostname that was not fully qualified. Domain-joined machines automatically append the domain suffix on a DNS request, but a standalone system, even one pointing to the same DNS servers, won’t. As soon as I saw this I had two options:

  1. Add a domain suffix in the DNS settings of the Proxy as to make the vcenter server lookup succeed OR
  2. Just add a static record in the Proxy host file.

Since I didn’t need this system to do any other domain lookups, I simply did #2. Then my replication job worked. Why it didn’t fall back to another proxy that did work is beyond me…..
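
If you want to check from the proxy which form of the name actually resolves before picking an option, here is a rough Python sketch. The helper names are made up, the suffix list is whatever you pass in, and treating any name with a dot as fully qualified is a simplification of what a real resolver does.

```python
import socket


def candidate_names(host, suffixes):
    """Return the names worth testing for `host`.

    A name that already contains a dot is treated as fully qualified
    (a simplification); a bare name is tried as-is, then with each suffix.
    """
    if "." in host:
        return [host]
    return [host] + [f"{host}.{suffix}" for suffix in suffixes]


def first_resolvable(host, suffixes, resolver=socket.gethostbyname):
    """Return (name, ip) for the first candidate that resolves, else None."""
    for name in candidate_names(host, suffixes):
        try:
            return name, resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
    return None


if __name__ == "__main__":
    # "vcenter" and "domain.local" mirror the placeholders in the error above.
    print(first_resolvable("vcenter", ["domain.local"]))
```

If only the suffixed name resolves, the standalone proxy needs either the DNS suffix (option 1) or the hosts file entry (option 2).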

Why the proxy needs to communicate with vCenter is also beyond me…

Veeam – Adding a Windows Managed Server

Unlike most other blog posts that seem to love to follow the “happy path”, that never happens with me so I’m going to go over this cause something WILL go wrong…

Pre-required reading.

Now I got this as my first error attempting to add the server:

Things to check here:

  1. Network and services:
    In my case the first issue was DNS and the DNS cache. Since I had added a newly created hostname, the Veeam server was querying its local DNS cache. I had to ensure all DNS servers had a valid record (nslookup/dig), then validate it on the local system (ping), which failed and required a local DNS cache flush (ipconfig /flushdns). Also make sure you didn’t click “No” when first connected to the network, else it would have set the firewall zone to “Public”; change it back to Private or open the firewall accordingly.
  2. File and Print Services on target:
    Next I had to create a temp share folder to ensure share services were started (since I was using Windows 10, and not Server 2016/2019), much like others have mentioned… somewhere (I’ll link if I find the Veeam thread again).
  3. This can also show up if the user account is incorrectly entered or if used as “.\user”. While this was stated as a solution to an alternative issue (to be mentioned below), I got the error above using the account in that syntax. I had to use “HOSTNAME\USERNAME”.

The second error I got was:

Things to check here:

  1. Are you using local accounts (the managed server being added is not part of a domain)? More than likely yes; otherwise you haven’t granted the domain account local administrative rights on the server being added. This case is covered in this Veeam thread.

This issue is not Veeam specific rather MS specific, which has been the case since the inception of Windows Vista.

If you are in this boat you have 3 options:

  1. Join the host to the same domain as Veeam. Create a dedicated domain account and place it into the managed server’s local admins group (preferably via GPO). *Most recommended

    If domain joining is out of the question these are the other 2 options…

  2. Enable and use the built-in local administrator account (“HOSTNAME\Administrator”). *Recommended if domain join is not possible (it’s less likely that this account would be directly compromised vs the alternative solution). This is also mentioned by Gostev directly in the Veeam thread shared above.
  3.  Disable UAC for local account to utilize remote calls:
cmd /c reg add HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\system /v LocalAccountTokenFilterPolicy /t REG_DWORD /d 1 /f

This adds a registry value that turns off UAC remote restrictions (token filtering) for local accounts. As mentioned by Gostev, this isn’t done automatically because it’s a security risk. No solution seems good here (besides domain joining). In this case it’s better to just use the local admin account… ughhh.

and sure enough using the local administrator account worked and the wizard moved on…

The rest of it’s a wizard, if you got to this point there should be no other major issues moving on…

*UPDATE* Veeam 11, if you can’t get option 2 to work, you’ll have to update to Veeam 11a, whenever that’s set to be released. See this Veeam Forum post for more details. Only option for V11 is to disable UAC… :S

Creating a Windows Image for Deployment without WDS/ADM

Creating a Windows Image

Intro

If you got the administrative team in your org to handle all the ins and outs of WDS and ADM deployments including using DISM for driver injection for custom images and ALL THAT jazz. Rock on!

If you find you have only a small amount of OS’s to support and want to keep up with only one image of the latest version with set software without having to learn all the WDS and ADM, and networking with PXE and DHCP and all that fun stuff then this might be a nice alternative for your needs.

Intro

You can technically prep the OS on any laptop or a VM as long as you generalize the image during the final prep stage into OOBE mode. If you want to keep hardware specific drivers and such then do not generalize the image.

Basic Steps

I used a VM and started it as follows:

  1. Boot and install Windows 10 (I gave it a bare minimum 40 Gig drive)
  2. At the OOBE first time boot after install, press CTRL+SHIFT+F3, let the system boot into the admin account using AUDIT mode.
  3. Install apps and all updates, run the community decrapifier, clean up the start menu.
  4. In Sysprep, pick OOBE and shutdown.
  5. Using a Linux live environment, dd the drive to any shared storage.
  6. Use Linux Live to DD image from shared storage to local desktop/laptop drive and boot, which will boot right into OOBE ready for AD joining or whatever.
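
Steps 5 and 6 are just a block-for-block copy in each direction. As a rough illustration of what dd is doing (and how you could confirm the copy on shared storage matches the source), here is a small Python sketch. The function names and the 4 MiB block size are my own choices, and the paths are whatever image file or device you pass in; it is not a replacement for dd itself.

```python
import hashlib


def copy_image(src, dst, block_size=4 * 1024 * 1024):
    """Copy src to dst in fixed-size blocks, roughly what `dd bs=4M` does.

    Returns the SHA-256 hex digest of the bytes that were written.
    """
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            fout.write(block)
            digest.update(block)
    return digest.hexdigest()


def sha256_of(path, block_size=4 * 1024 * 1024):
    """Return the SHA-256 hex digest of a file, read block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Comparing the digest returned by copy_image with sha256_of on the stored image is a cheap sanity check before deploying it to a pile of laptops.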

Caveats

*known caveat…

  1. Don’t open Bitlocker area while configuring the system in Audit mode.

All good right? Well normally after I dd the image and let the OOBE run, I like to extend the partition space from the initial 40 to the remainder of the local disk, whatever size it happens to be until…

You might be wondering what the big deal is here. With the existing system, not much. However, if I extend the VM’s drive to mimic copying this image onto a larger disk, you will notice the option to extend the partition is grayed out. This is because the WinRE partition sits between the data partition and the new free space, so the extended partition’s sectors would not be contiguous, which is not allowed for partitions on a disk. Since MS provides no way to nicely move it using Disk Management, we have to rely on other tools. In this case I’m going to rely on gparted.
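
To make the contiguity problem concrete, here is a toy Python model of a disk layout. The sector numbers and partition names are invented for the example, and it ignores real-world details like the GPT backup header; it only shows that a partition can grow into free space that sits directly after it.

```python
from collections import namedtuple

# start and end are inclusive sector numbers (made-up values in the example).
Partition = namedtuple("Partition", "name start end")


def extendable_sectors(partitions, name, disk_sectors):
    """Return how many free sectors sit directly after partition `name`."""
    ordered = sorted(partitions, key=lambda p: p.start)
    for index, part in enumerate(ordered):
        if part.name == name:
            if index + 1 < len(ordered):
                next_start = ordered[index + 1].start
            else:
                next_start = disk_sectors  # nothing after it but free space
            return next_start - part.end - 1
    raise ValueError(f"no partition named {name}")


if __name__ == "__main__":
    # Default install layout copied onto a bigger (1000 sector) disk:
    # WinRE sits between the data partition and the free space.
    before = [Partition("efi", 0, 99), Partition("msr", 100, 199),
              Partition("data", 200, 699), Partition("winre", 700, 799)]
    # Layout after moving the data partition to the end of the image.
    after = [Partition("efi", 0, 99), Partition("msr", 100, 199),
             Partition("winre", 200, 299), Partition("data", 300, 799)]
    print(extendable_sectors(before, "data", 1000))  # 0
    print(extendable_sectors(after, "data", 1000))   # 200
```

With the default layout the data partition has zero sectors to grow into, which is the grayed-out Extend option.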

You can grab gparted live from here

GParted to Move WinRE Partition

This is the part that sucks the most, cause even though it’s easily possible to do otherwise, MS made the install wizard place the WinRE partition at the end of the disk (this might be able to be manually configured, but I did the ol’ pick disk, click install).

Normally I’d have my IODD device with a huge drive attach to write stuff to, but in this case since it’s a VM I’ll add a 1 Gig disk to save the WinRE partition to while moving the rest of the data to the end of the disk.

So add the drive, then edit the VM boot options to force it into the UEFI menu. Once powered on, click the disk icon in the upper right and mount the ISO. Once all the dots are green to indicate they are mounted, boot from CD/DVD…

No keymap changes, enter, yeah default language, auto login, whatever…

There it is, the two drives we want to work with. Now let’s quickly format the 1 gig drive..

So create the partition table (Device -> Create Partition Table), new partition, all space, FS NTFS, click the green check mark.

Awesome, as you can see I then mounted that partition and used DD to copy the WinRE partition as a whole…

I sort of covered up the gparted window but I know /dev/sda4 was the WinRE partition based on the size and information.

Now for the biggest pain… we have to delete that partition, move the data partition over and then re-create the open space as the same partition we just deleted, and copy the contents back, so that the data partition sectors can be contiguous even though it remains the “3rd” partition.

Weird computer science…

Anyway let’s do this..

Delete the WinRE partition, /dev/sda4 in my case:

shift /dev/sda3 to end of disk, I selected move/size and just dragged the partition on the slider all the way to the right, click ok and it should look like this:

Now we re-create the /dev/sda4 as ntfs and hidden,diag flags:

*NOTE* This takes a lot of CPU and disk I/O as all the data has to be shifted, which is also why gparted warns of possible data loss (if there’s any issue with the actual disk). The time it takes depends on the size of the overall data partition. I also recommend only doing this after a backup or some alternative copy is made. In terms of VMware, I had this on its own VM cloned from another; I avoided a snapshot as it would create a delta file larger than simply cloning the VM.

*Note* Manage flags will be grayed out till the partition is created and applied.

Then we copy the contents back…

Time for the fun part, does windows still boot?

Sweet still boots, still in sysprep audit mode, lets quickly check the disk.

chkdsk /f

all’s good, and here’s a nice picture to show the data drive at the end of the disk as to be extendable to any machine it’s deployed to:

I can now remove the 1GB drive, and fix the WinRE now.

Fix WinRE

Checking with BCDedit

bcdedit

Checking with reagentc

reagentc /info

bcdedit /set {current} recoveryenabled no
bcdedit /deletevalue {current} recoveryenabled
bcdedit /deletevalue {current} recoverysequence

Then mount that recovery partition we moved above (part 4)

diskpart
select disk 0
select part 4
assign letter=r

change dir to c:\windows\system32\recovery

fix up reagent.xml to:

<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<WindowsRE version="2.0">
	<WinreBCD id=""></WinreBCD>
	<WinreLocation path="" id="0" offset="0"></WinreLocation>
	<ImageLocation path="" id="0" offset="0"></ImageLocation>
	<PBRImageLocation path="" id="0" offset="0" index="0"></PBRImageLocation>
	<PBRCustomImageLocation path="" id="0" offset="0" index="0"></PBRCustomImageLocation>
	<InstallState state="0"></InstallState>
	<OsInstallAvailable state="0"></OsInstallAvailable>
	<CustomImageAvailable state="0"></CustomImageAvailable>
	<WinREStaged state="0"></WinREStaged>
	<ScheduledOperation state="4"></ScheduledOperation>
	<OperationParam path=""></OperationParam>
	<OsBuildVersion path=""></OsBuildVersion>
	<OemTool state="0"></OemTool>
</WindowsRE>

Then use reagentc to reset the path:

reagentc /setreimage /path r:\recovery\WindowsRE

This succeeded then

reagentc /enable

and of course error till I read this technet… offf

huh..

reagentc /enable /auditmode

Yay that worked..

Managing HPE Storage controllers on VMware ESXi

HPE Storage on ESXi

Quick Overview

Assumptions: device drivers and tools are already on the ESXi host, as servers such as these running ESXi should be using authorized images from the vendor and be on the Hardware Compatibility List (HCL).

If not, use this guy’s blog on how to manually install the tools that should otherwise already be on the server in question.

I recently decided to double check some server setups running for testing. Since it was all tests I figured I’d talk about some of the implications of simple misconfigurations or even just the unexpected.

Most of these commands I used from following Kalle’s blog and the command list was super useful.

List PCI Devices

To start, if you are in a bind and need to find what storage controller is in use by the hypervisor, run this to list all the devices (at least the ones on the PCI bus):

lspci -vvv

This will present you with a long list of devices. For my test device (an HP DL385 Gen8) it turned out to be an HP Smart Array P420i:

That’s cool.

Storage Config

To see the current config run:

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config

This shows me what I already knew: I have 2 logical drives, both created with RAID 1+0 fault tolerance, from different numbers of different-sized drives. In this case one is built from four 900 Gig SAS drives, and the other from twelve 300 Gig SAS drives.

From this information we can’t determine the speed of the drives.

Controller Status

To view the status of the controller:

 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show status

From this we can tell the type of controller, double verifying the results from the lspci command, and that there is cache available. Still not sure at this point what type of cache we are dealing with. Our goal is to use the Battery-Backed Write Cache (BBWC) for the logical volumes.. but we still have some things to cover before we get there.

 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show detail

with these details we get to see more of the juicy information, here we can tell we have a cache board for the controller available in “slot 0” as indicated by the “slot” attribute.

Also note the Drive Write Cache, which is when the physical drive itself enables caching. However, we again want to use the BBWC, which prevents data loss in the event of a power outage, so as not to leave our VMs with corrupted virtual drives. Read this thread for a bit more detail about this.

Physical Disk Status

To view all the disks and if they are OK:

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd all show status

in my case they were all OK.

Physical Disk Details

Now this is where we get to see more details on those SAS disks I talked about earlier:

 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd all show detail

here we can tell now that the 300 Gig SAS disk is a 10K SAS disk, not bad… 🙂

Logical Drive Status

Run this to get a very basic status report of the logical drives created from all the physical drives.

 /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld all show status

 

Logical Drive Details

Change the all to the logical volume ID number, in this case 2 for the 300 Gig based array.

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 show

Just to show the difference against the logical disk I know I enabled cache on, and which has unreal better performance…

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 1 show

Now I created these logical drives during the boot of the server using the BIOS/UEFI tools on the system. Luckily, though, we can adjust these settings right from the ESXi shell. 🙂

Enable Logical Write Cache

Just like magic:

/opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 logicaldrive 2 modify arrayaccelerator=enable

Being specific to change logical drive 2 which was the one that did not have cache enabled originally… checking it after running the above command shows it has cache! 🙂

SSD SmartPath Caveat

One thing I noticed when playing with SSDs in HPE servers…

Here’s a post about why SSD Smart Path is not always a good choice. (Note: it’s down, so you have to use Google cache.)

I’ll let these graphs speak for themselves…

Latency went from SmartPath 14ms, No Cache 9ms, BBWC 4ms while doing the cloning operation. With BBWC it completed so fast I didn’t even need to cancel. 10x performance increase.

Interesting Side Story

I was going over this blog post while checking storage on my homelabs DL380 G6. I had it powered off for a while and I noticed some terrible latency times on the write operations on the datastore as I was vMotioning a VM to it. As it turns out the battery write cache doesn’t charge the battery when the server is powered off and still plugged in.

For me it took about an hour and a half to 2 hours for the battery status to change and the write cache to become enabled. I’ll let this chart speak for itself as well…

I also found this really cool hack if you have a dead BBWC battery you can hack it to use regular batteries. This is so cool I kinda wish I remembered what I did with the old dead one I had…

All Commands

Just in case Kalle’s site goes down, here’s the list he shared for both ESXi 5.x and 6.x

Show configuration
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl all show config
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config
Controller status
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl all show status
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show status
Show detailed controller information for all controllers
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl all show detail
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show detail
Show detailed controller information for controller in slot 0
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 show detail
Rescan for New Devices
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli rescan
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli rescan
Physical disk status
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd all show status
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd all show status
Show detailed physical disk information
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd all show detail
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd all show detail
Logical disk status
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld all show status
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld all show status
View Detailed Logical Drive Status
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 show
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 show
Create New RAID 0 Logical Drive
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 create type=ld drives=1I:1:2 raid=0
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 create type=ld drives=1I:1:2 raid=0
Create New RAID 1 Logical Drive
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2 raid=1
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2 raid=1
Create New RAID 5 Logical Drive
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2,2I:1:6,2I:1:7,2I:1:8 raid=5
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2,2I:1:6,2I:1:7,2I:1:8 raid=5
Delete Logical Drive
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 delete
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 delete
Add New Physical Drive to Logical Volume
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7
Add Spare Disks
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 array all add spares=2I:1:6,2I:1:7
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 array all add spares=2I:1:6,2I:1:7
Enable Drive Write Cache
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify dwc=enable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 modify dwc=enable
Disable Drive Write Cache
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify dwc=disable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 modify dwc=disable
Erase Physical Drive
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd 2I:1:6 modify erase
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 pd 2I:1:6 modify erase
Turn on Blink Physical Disk LED
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 modify led=on
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 modify led=on
Turn off Blink Physical Disk LED
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld 2 modify led=off
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 ld 2 modify led=off
Modify smart array cache read and write ratio (cacheratio=readratio/writeratio)
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify cacheratio=100/0
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 modify cacheratio=100/0
Enable smart array write cache when no battery is present (No-Battery Write Cache option)
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify nbwc=enable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 modify nbwc=enable
Disable smart array cache for certain Logical Volume
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=disable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=disable
Enable smart array cache for certain Logical Volume
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=enable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 logicaldrive 1 modify arrayaccelerator=enable
Enable SSD Smart Path
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 array a modify ssdsmartpath=enable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 array a modify ssdsmartpath=enable
Disable SSD Smart Path
ESXi 5.5 -> /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 array a modify ssdsmartpath=disable
ESXi 6.5 -> /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 array a modify ssdsmartpath=disable
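
Since the only difference between the two ESXi versions is the tool path, here is a small Python sketch for building these command strings from a workstation before pasting them into a host shell. The two paths come straight from the list above; the function names are made up, and the sketch only builds strings, it does not run anything on a host.

```python
# Tool paths taken from the command list above.
TOOL_PATHS = {
    "5.5": "/opt/hp/hpssacli/bin/hpssacli",
    "6.5": "/opt/smartstorageadmin/ssacli/bin/ssacli",
}


def storage_command(esxi_version, arguments):
    """Return the full command line for the given ESXi version."""
    try:
        tool = TOOL_PATHS[esxi_version]
    except KeyError:
        raise ValueError(f"no tool path known for ESXi {esxi_version}") from None
    return f"{tool} {arguments}"


def set_array_accelerator(esxi_version, slot, logical_drive, enabled):
    """Build the command that turns controller cache on or off for one LD."""
    state = "enable" if enabled else "disable"
    return storage_command(
        esxi_version,
        f"ctrl slot={slot} logicaldrive {logical_drive} "
        f"modify arrayaccelerator={state}",
    )


if __name__ == "__main__":
    print(storage_command("5.5", "ctrl all show config"))
    print(set_array_accelerator("6.5", 0, 2, True))
```

The second call prints the same line used in the Enable Logical Write Cache section.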

Setting SPN causes Credential Prompts on SharePoint Site

I’ll keep this post short. So the other day I noticed after doing some audits that some sites were not using Kerberos, even though the SharePoint Web Application Auth Providers settings were already configured to use Kerberos. In most cases that means auth falls back to the less secure NTLM method. Sure enough, the SPN was not configured for the service.

So in a test environment…

SETSPN -S HTTP/SPSite domain\webappserviceaccount

To my dismay, when I attempted to access the site I was presented with a credential prompt. Entering my creds did make auth succeed, but it shouldn’t have prompted for creds, considering all the requirements for Kerberos were there, and if that failed it should fall back to NTLM. In either case the SSO part is usually handled by the internet security settings on the client machines. Since these are all managed by company GPOs, I know they were in fact good, as nothing there had changed and the site was working fine before setting the SPN.

Googling this I only found a couple of examples of this, like here. I attempted a reboot and that failed. Since it was test I could start over again, and I verified the only change was the setting of the SPN, which caused this to happen even though everyone is stating it’s not related. In this case it def was.

The only solution I found from my testing was to:

  1. Go to the Web Application in CA
  2. Highlight the problematic Site, click on Auth Providers in the Ribbon
  3. Click default (claims Auth)
  4. Switch it back to NTLM. (Watch front end server resources spike as IIS is reconfigured.) (Can’t remember if a reboot was required here.)
  5. After it’s done, access the site. Ensure there are no prompts for creds and SSO works as intended.
  6. Ensure the SPN exists and is proper.
  7. Reset the site’s Auth Providers setting back to Kerberos. (Again, a reboot may be required.)
  8. Access the site. SSO (no prompt for creds after already logged on) and Kerberos (klist shows a ticket for the site) should work as expected.

When I went to implement this in production I figured it was less risky to just set the auth provider to NTLM before even setting the SPN thus there should be no point in time where it prompts for credentials for the end user. Despair ensues…

So I set auth to NTLM.. It prompts for creds (wait what…), and even worse, after entering creds many times the site will not load… WAT!?!

In a panic I call my superior. He wants to look through the logs, but they show no major indicators (event viewer). I mention my usual quick n easy first thing to try… yeah, reboot! Sure enough a reboot resolved the issue. Not sure why that happened, but I continued on as above from this point and sure enough got both SSO and Kerberos working as intended.

Let this be a friendly reminder that even though you test stuff in test, even the slightest change in your procedure can have devastating consequences. Hope this post was insightful for someone.

 

vCenter 503 Service Unavailable

I was going to test an auditing script from a DefCon presenter on my AD server, when the USB controller I was adding and the USB stick I was passing through to get the script into my VM started being weird.

First the USB 3.0 controller connected just fine, and I connected the USB device to the VM, but diskpart was not showing it. So I went to remove it and try a USB 2.0 controller. That failed to connect since the USB 3.0 one was still showing there, so I selected to remove it again, which errored out on another concurrent task. Makes sense, till refreshing the page told me unprivileged account. I wasn’t sure what this was about, so I decided to open another window and navigate to my vCenter web app… 503 service unavailable:

“503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x000055aec30ef1d0] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)”

What the… rebooting the VCSA showed no success still same error even with an incognito window.. ughh.

I found this thread: https://communities.vmware.com/thread/588755

I was going through this, and decided to try to renew the certs, even though my internal PKI certs were still valid (AFAIK, and checking the cert provided when accessing the page). Now here’s the thing: while I ran the certificate-manager script and renewed all the certs, I noticed my AD server somehow was down. I booted it back up. I wasn’t exactly sure which fixed it, so I decided to take another snapshot while it was in this “fixed state” and revert to the broken state. After restoring to the broken state nothing was responding at all on the https service from the VCSA, so I gave it a simple reboot (which I did initially, before I noticed my AD server was down for some reason). Sure enough, after the reboot everything was working fine with my internal PKI certs.

I guess if you set vCenter to use MS AD as the primary login domain and that domain is not available, the web management service becomes unavailable… that kind of sucks. I should have noticed my AD was not operational, but I didn’t have monitoring on it 😉 and I don’t use my local workstation as an AD member. Mostly just random VMs I have for testing.
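
Since I had no monitoring, even a tiny reachability check would have pointed at the dead AD server sooner. Here is a minimal Python sketch of that idea. The host names and the choice of ports (389 for LDAP on the domain controller, 443 for the vCenter web service) are assumptions for illustration; swap in whatever your environment actually depends on.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_dependencies(targets, probe=port_open):
    """targets maps a label to (host, port); returns label -> reachable."""
    return {label: probe(host, port) for label, (host, port) in targets.items()}


if __name__ == "__main__":
    # Hypothetical host names for illustration only.
    results = check_dependencies({
        "AD (LDAP)": ("dc01.domain.local", 389),
        "vCenter (HTTPS)": ("vcenter.domain.local", 443),
    })
    for label, reachable in results.items():
        print(f"{label}: {'up' if reachable else 'DOWN'}")
```

A port answering does not prove the service is healthy, but a DOWN on the domain controller would have saved me the cert renewal detour.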

Like most people, I should have looked at the logs for a better idea of what the root cause was. I threw 2 darts at a dart board and had to revert to find the true root cause. Not the best way to troubleshoot, but sometimes if logs are not available it is another method…

Installing PowerCLI 12.0 Offline

PowerCLI 12.0

Offline Install

Checking VMwares source wasn’t too insightful…

Just this, with the “Download” button redirecting to an alternative site, none other than powershellgallery.com … Clicking Manual Download gives you the raw nuget package. Let’s try to install normally first.

Install-Module -Name VMware.PowerCLI

No way it failed, expected, and it even states a warning about the network.

Alright so using an online computer copy the nuget package to the offline (use USB sticks, Floppy drives, Zip Drives, serial modem if that’s what it takes…)

In my case I was testing this on a VM and simply used a USB stick to mount it to the VM from the VMRC console, and copied the nuget file to c:\temp\PowerCLI

This, from the MS Doc page on the cmdlet, is for Visual Studio; we are using PowerShell only…

This topic describes the command within the Package Manager Console in Visual Studio on Windows. For the generic PowerShell Install-Package command, see the PowerShell PackageManagement reference.

Sure enough this is where I gave up on this path. All the new stuff is nice, with it all being connected making life super easy, but in those locked down situations this is a hassle. I wasn’t sure how to install the nuget package via a simple ID option like Install-Package has in VS PS; there wasn’t one for the regular PS Install-Package cmdlet. Then I went to google how to accomplish this and was a bit annoyed at all the steps required to do it via the package manager… Read this by William on Stackoverflow for more details.

Lucky for me I found an alternative blog post, which does an alternative offline install and much, much simpler.

From the online system, instead of saving the nuget package we save the module files themselves directly.

 Save-Module -Name VMware.PowerCLI -Path C:\temp\PSModules

Copy the entire contents of the PSModules folder to a storage medium of your choice (e.g. USB flash drive) and transfer the files to the desired offline system where PowerCLI is needed.

If you have admin rights on the target system, you can copy files to the location below.

 C:\Program Files\WindowsPowerShell\Modules
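
The copy itself is nothing special, every module folder that Save-Module produced just needs to land under the Modules directory. If you are doing this on more than one box, here is a minimal Python sketch of that copy. The function name is made up, it assumes Python 3.8+ is available on the offline system, and it needs the same admin rights as copying by hand.

```python
import shutil
from pathlib import Path


def install_saved_modules(saved_dir, modules_dir):
    """Copy every module folder from saved_dir into modules_dir.

    Returns the sorted names of the module folders that were copied.
    """
    saved_dir, modules_dir = Path(saved_dir), Path(modules_dir)
    modules_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for module in saved_dir.iterdir():
        if module.is_dir():
            # dirs_exist_ok lets a re-run refresh an existing copy (3.8+).
            shutil.copytree(module, modules_dir / module.name,
                            dirs_exist_ok=True)
            copied.append(module.name)
    return sorted(copied)
```

For example, install_saved_modules(r"E:\PSModules", r"C:\Program Files\WindowsPowerShell\Modules"), where E: stands in for wherever your USB stick shows up.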

At this point he goes on about some settings and stuff. I wasn’t exactly sure how to use PowerCLI, as it usually opened up in a custom PS window before. Now you simply import-module *modulename*

Import-Module VMware.PowerCLI

Now creating custom ESXi images should be a breeze!

Extra Bits

Customer Experience Improvement Program (CEIP)

The VMware Customer Experience Improvement Program collects data about the use of VMware products. You can either agree (true) or disagree (false). For offline systems, only the rejection (false) makes sense. The command shown below suppresses future notifications within PowerCLI.

Set-PowerCLIConfiguration -Scope AllUsers -ParticipateInCeip $false

Ignore invalid SSL certificates

When using self-signed certificates in vCenter, PowerCLI will deny the connection. This behavior can be suppressed with the command:

Set-PowerCLIConfiguration -Scope AllUsers -InvalidCertificateAction Warn

I found the types in this old 5.1 documentation; you can also set it to Ignore instead of Warn. 🙂 Cheers!

Palo Alto Networks Cert Import Stuck Uploading

Using latest browser indeed gets stuck importing certificate:

Uploading SSL Certs stuck on Uploading Screen (Reddit post by u/thehayk in r/paloaltonetworks)

Yup, had to use IE. Sigh, I’ll never get away from this browser. Same with browsers locking down mixed content and blocking iframes using lower grade TLS 1.0 or 1.1. So in these cases I still have to tell people to use an older browser. How does this increase security when functionality is removed for perceived security risks, when lots of these systems can be in locked down networks where the risks of weaker cipher suites are low?

Now we have to tell people to use older more insecure browsers to access resources or older web services, then they start browsing the internet inadvertently with a vulnerable browser.

Thanks Google, *slow clap*…

Oh yeah also when you make your certs, use “Host Name” not Alt Name to create proper certs with Subject Alternative Names

ESXi Upgrade Failure

Upgrading one of my ESXi hosts in my lab failed on me. Sure enough, I figured this might happen and put a head on my usually headless server. This means I plugged in a monitor. On the screen I saw this:

well that sucks, googling I found this thread from VMware.

looking closer at the boot error before this it stated:

“system does not have secure boot enabled.” This being an old mini desktop from the mid 2000’s, it had UEFI but did not have the “feature” of secure boot. Clearly an afterthought at the time. Now the odd part is when I hit the boot menu key (“F12” in my case), I had the “legacy” BIOS style list showing P0: Hard Disk and EFI: Hard Disk. When I picked the P0 one it booted just fine. So I figured it was just a simple boot order fix, adjusting some settings much like the thread (disable EFI boot and stick with legacy). I couldn’t see a way in my EFI/BIOS options to disable the alternative boot types, so I put the legacy type at the top of the list and the EFI one at the bottom, yet every time I booted it would boot the EFI one. When I checked vCenter, the host wouldn’t remediate (aka update to the new version) on its own, so I had to click remediate, run downstairs, and ensure I was there to pick the legacy disk boot. Even after setting the boot order in the BIOS it wouldn’t stick to legacy, and this was the only way I could get the upgrade to succeed.

Dang Computers…

Oh yeah.. this happened to me too. While I was trying to migrate some servers, I wanted to move some VMs’ vNICs into different VMPGs, so I decided to rename the one they were currently using. I created the new VMPG in the alternative vSwitch, and I was a bit stumped to see them already there. I had presumed that once I renamed the VMPG it would show the new name in the VM settings and still be on that old vSwitch (in secret it is). When I went to delete the vSwitch it told me error, failed to delete, “a specified parameter is not correct”. Googling, I found this 10 year old blog that is still relevant in ESXi 6.5.

Had to simply edit the VMs vNics and change them back. Dang Computers…