Clear vCenter Alert Certificate Status

Story

So lately updated a couple vCenter server servers, and in my process I hit a couple errors that required some resolving…

The Problem

So a couple hiccups along the way. And now it’s time to resolve this one…

Yeahhhh and alert on Certificates… Seems like VMware and certificate management is like Oil n Water. They don’t mix well.

I’ve had some terrible times managing certificates with VMware. However as blogged about here, seems there’s finally a way to use your own certificates via the WebUI.

Anyway… to the point, you figured you simply navigate to the vCenter WebUI -> Home -> Administration -> Certificates. Only to realize there’s nothing reporting as invalid or expired.

Checking for Expired Certs

What gives? Ahhh yes, more hidden secret stuff that is not in your face when it comes to the WebUI. Can you guess? That’s right another VMware KB…

So while the other issues I’ve mentioned does have references and script in relation to certs, the only “check” in those previous posts was using openssl on the VCSA shell to grab the certificate from the listening service on the dedicated port. Which was based on a particular symptom which spurred that check. So here’s the KB telling you how to actually check the certificates the easiest way I found so far (no check.py; python script needed)

for store in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list | grep -v TRUSTED_ROOT_CRLS); do echo "[*] Store :" $store; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $store --text | grep -ie "Alias" -ie "Not After";done;

That’s it! :D…. which just like the KB indicated which cert was bad, in this case, an old Root CA that was used in previous deployments of vCenter before upgrades, So it turns out even though you follow the required KB to get past the pre-check of expired certs. It doesn’t delete the old certificates CA Cert.

There it is, the second CA Cert with expiry in 2019… OK so… You figured it would be easy to clean this up, but remember you couldn’t even see it in the WebUI, so you best believe there is no WebUI way to do this that protects you from human error.

Removing old Expired Certs

Instead, very brilliantly, you get… yes another KB! Booo Yeah… So let’s do this!

The main thing to note about this is…

Certificates are copied back to the VECS store because the CA Certificate which is expiring is published to the VMware Directory Service (VMDIR). When the Certificate is removed from VECS, VMDIR adds the Certificate back to VECS during a sync operation. This is done in order to ensure the integrity of the TRUSTED_ROOTS Certificate store, as deletion of an incorrect Certificate from this store could cause the environment to be irreparably damaged.

OK…. All I take away from this is Certs are important so they have a second cert store as a backup to the first cert store… that’s all I can take away form this odd statement.

/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text | less

“Find the Certificate you wish to remove and make a note of the Alias and the X509v3 Subject Key Identifier.

Note: There Could be several Certificates to remove. Any expired and not in use certificates should be removed to avoid certificate related alarms.”

Yes that is the plan…

List the trusted certs published to the VMware Directory Service using the following command (administrator@vsphere.local password required). This command is in the same location as vecs-cli:

/usr/lib/vmware-vmafd/bin/dir-cli trustedcert list

Huh… in this case it looks like it is not here, so I should be safe to delete it from the normal store and it shouldn’t auto populate back in.

If you do see it (CN equal to x509v3 Key Identifier) then follow the linked KB to remove it, which seems to save a copy of the cert and use that saved copy to run another command to remove it from the store… super weird.

/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store TRUSTED_ROOTS --alias 3276134ad93b3688b5dc5dcfaa402e9bfd7af12f

Restart all services on the PSCs and on the vCenter Servers and ensure that all services start and respond normally and that you can log in and manage the environment.

service-control --stop --all
service-control --start --all

Took a liil while, then logging in… alert still there, I guess I just have to Reset to Green?

For Now Clicked the Reset to Green link. Even after Yet another vCenter patch, it still did not show up anymore. Yay.

September 21, 2021September 21, 2021

Fixing [400] An error occurred while sending an authentication request to the vCenter Single Sign-On server

So After the last two blog posts about fixing vCenter7’s access issues due to it’s due certificate monument work flows. I was greeted with this error when trying to sign into the web UI on vCenter.

[400] An error occurred while sending an authentication request to the vCenter Single Sign-On server- An error occurred when processing meta data during vCenter Single Sign-On setup:the service provider validation failed. Verify that the server URL is correct and is in FQDN format, or that the hostname is a trusted service provider alias.

After a a quick google search I found yet another VMware KB discussing it.

Resolution

This is an expected behavior.
VMware vSphere 7.0 enforce FQDN or IP address reverse resolvable to FQDN to allow authentication for Single-Sign on.

Greeeeeeeeeeeeeeaaaaaaaaat! Thanks VMware, just another example of security destroying functionality.

What did I do? Exactly what it stated, I navigated to the WebUI URL using the hostnames Fully Qualified Domain Name E.G: Hostname.domain.end

Cause I was attempting to access it just by just the hostname as domain info was being auto resolved by the domain suffix during queries.

September 21, 2021

vSphere HA Agent cannot be correctly installed or configured

I updated a vCenter server to 7.0.x when logging into the newly updated vCenter one host in the cluster state the following alert.

Error: “vSphere HA agent cannot be correctly installed or configured” (2056299) (vmware.com)

The KB didn’t sound promising. Checking the hosts event logs. a bunch of errors about /tmp ramdisk being full…

The ramdisk ‘tmp’ is full – VMware ESXi on HPE ProLiant – Davoud Teimouri – Virtualization and Data Center

For real? Wow, not gettin’ lucky last couple weeks. Sure enough exact same issue, cleared /tmp temporality, and downloaded the patch. When I vMotion the VMs from this host onto another host the VMs themselves showed alerts.

Virtual machine failed to become vSphere HA Protected and HA may not attempt to restart it after a failure.

I kept chugging alone in hopes I’d resolve each VM later. However as soon as I placed the issued host into maintenance mode, the alerts from all VMs disappeared. Applied to patch exactly as the HPE KB stated for the ESXi version it was on.

With luck on my side, the host came up clean, and came out of maintenance mode without an issue, and all error and alerts were resolved. Woooo!

Hope this helps anyone doing a vCenter upgrade to 7.x

September 20, 2021November 9, 2021

Fixing vCenter [500] An error occurred while fetching identity providers.

Story

So The other day I posted about upgrading vCenter to 7.0.x while everything went fine during the upgrade. For some odd reason a couple days later when I went to navigate to the vCenter login page I was greeted with:

[500] An error occurred while fetching identity providers.

Kind of wished I had read this reddit post right off the hop, cause the first reply was is going to be my answer at the end of this post.

I did however first hit this KB about it as well I was a bit thrown off has it indicated to only do it if you see the following in the logs:

(/var/log/vmware/trustmanagement/trustmanagement-svcs.log)

2021-03-10T09:27:03.474Z [tomcat-exec-14  INFO  com.vmware.identity.token.impl.X509TrustChainKeySelector  opId=] Failed to find trusted path to signing certificate <STS Certificate Subject, example - C=US,CN=ssoserverSign\,dc\=vsphere\,dc\=local>
java.security.cert.CertPathBuilderException: Unable to find certificate chain.

Which I could not see, so I wasn’t sure if this was the issue or not. What I did see in my logs was the following:

2021-09-17T23:58:03.945Z [tomcat-exec-14 WARN com.vmware.vcenter.trustmanagement.impl.VcIdentityProviders opId=] com.vmware.sso.interop.ldap.NoSuchObjectLdapException: No such object
LDAP error [code: 32]

and

2021-09-18T01:19:01.322Z [tomcat-exec-26 INFO com.vmware.vapi.security.AuthenticationFilter opId=] Not successful authentication
java.lang.RuntimeException: Authentication data not found
Caused by: com.vmware.vapi.dsig.json.SignatureException: Cannot verify the signature over the provided data

So it wasn’t matching. Looking at my firewall I couldn’t see any LDAP connections from vCenter to my LDAP server since the upgrade. So I decided instead to try a reboot. This simply made things worse.

No Healthy Upsteam

Now when I’d try access vCEnter Web UI I was greeted with a blank white web page with simple text stating “No Healthy Upstream”, now looking into this, people reached this problem for several different reasons. As mentioned here and here and for some odd reason this guy just changed his IP address?! Weird.

For me I checked the local Hosts file and it was fine, and couple other mentioned fixes and they all didn’t work for me.

Try Anyway

For some reason at this point I decided to double the mentioned work around in the initial VMware KB I found as the main login symptom was exactly the same even though I couldn’t validate the same log entries within the logs.

How to Copy Files to VCSA via WinSCP

Now a couple real quick things to note here. You need to copy a script to the VCSA. If you get unable to agree on a cipher suite, you’ll need to update your copy of WinSCP to a newer version. Also instead of doing what VMware says to change the shell on the VCSA, do what this guy suggests instead:

“In the new connection dialog, specify the Host name, User name and then click the Advanced button,

(VCSA 6.5)

Choose the Environment/SFTP option

Specify for SFTP server: shell /usr/libexec/sftp-server”

so much easier.

I decided to take a look at the script after copying it to the VCSA, and it had this line which had me hopeful it would actually work to resolve my issue:

/opt/likewise/bin/ldapmodify -x -h localhost -p 389 -D "cn=administrator,cn=users,$DOMAINCN" -w "$DOMAINPASSWORD" -f sso-sts.ldif | tee -a $LOGFILE

So I followed along with the workaround specified in the KB…

1) Download the attached fixsts.sh script from this article and upload to the impacted PSC or vCenter Server with Embedded PSC to the /tmp folder.

2) If the connection to upload to the vCenter by the SCP client is rejected, run this from an SSH session to the vCenter:

chsh -s /bin/bash

3) Connect to the PSC or vCenter Server with an SSH session if you have not already per Step 2.

4) Navigate to the /tmp directory:

cd /tmp

5) Run chmod +x fixsts.sh to make the file executable.

chmod +x ./fixsts.sh

6) Run ./fixsts.sh.

./fixsts.sh

Restart services on all vCenters and/or PSCs in your SSO domain by using below commands:

service-control --stop --all
service-control --start --all

my results:

To my Amazement it actually worked, and I was able to login into the vCenter server!! Wooo!

*Update* Here’s a great blog post covering managing or creating custom certificates with vCenter 7

Kinda funny that 7.0 is stated as 6.8 in the scripts.. mhmm

September 19, 2021

ESXi Update Network Config Failed
Set ESXi IP via CLI

Real quick post here. I was moving my ESXi hosts and vCenter to a new dedicated subnet. I did the usual; had a temp Windows System in the new subnet, create VMK with temp IP in new subnet, connect to ESXi Web UI via new Temp IP in new Subnet via temp Windows machine. Reconfigure default TCP/IP stack default gateway, change VMK0 IP address (and edit management port group VLAN id if applicable). and Away I’d go.

However on this one host for some unknown stupid reason it would simply fail “Failed – An Error occurred during host configuration”, and the detailed log was just as vague “operation failed diagnostics report unable to set network unreachable” OK… whatever, that shouldn’t matter do as I tell you! Here’s a snippet of the error, and the CLI command that simply worked without bitching.

I just figured let’s try the CLI way and see if it worked, and it turns out it did. The source I used to figure out the command syntax.

The commands I used:

Get IPs:

esxcli network ip interface ipv4 get

Set new IP:

esxcli network ip interface ipv4 set -i vmk1 -I 1.1.1.1 -N 255.255.255.0 -t static

Hope this helps someone.

September 15, 2021September 15, 2021

FreeNAS Single SSD as ZIL and L2ARC

Quick Story I remember I set this up on my FreeNAS server in hopes to get better performance, in reality, I don’t think it helped anything cause of my FreeNAS servers setup. Which was an old desktop with 3 Gigs of memory and a couple SATA drives, 2 spindle and 1 SSD.

Took me a while but I finally found the original source I followed.

Main Parts (assume SSD is ada0):

root@freenas1:~ % gpart create -s gpt ada0
ada0 created
root@freenas1:~ % gpart add -a 4k -b 128 -t freebsd-zfs -s 10G ada0
ada0p1 added
root@freenas1:~ % gpart add -a 4k -t freebsd-zfs ada0
ada0p2 added

List Disk to get GUIDs

root@freenas1:~ % gpart list

Add partitions as Zil and L2ARC on a logical disk (volume0)

root@freenas1:~ % zpool add volume0 log gptid/94a4bd28-aeb7-11e5-99ac-bc5ff42c6cb2
root@freenas1:~ % zpool add volume0 cache gptid/9a79622f-aeb7-11e5-99ac-bc5ff42c6cb2

Nice you can use the zpool command to verify their used as such:

If you are paying attention you’ll noticed the guid are different. Anyway you can use the GUI to see the results as well, if you click the main volume under Storage -> Volume, Then click the Show details button.

If you pick any of the partitions in this list, at the bottom you get a button labeled “Remove”. To undo the previous additions made via the back end SSH.

After this I removed the old Volume completely, including the old File based extent I was using on it.

I then created all new Volumes, one volume on each drive, then created 1 zVol on each volume, then used those zVols as Device based Extents on the iSCSI service…. and I couldn’t believe the performance increase, I couldn’t saturate the 1gbps link before with storage vMotions. Now every single Datastore maxes out the NIC and I hot 100 MB/s plus on every storage vMotion and I increased my storage capacity. W00t (of course I never had storage redundancy to begin with so nothing lost, all gains.

Summary

Don’t bother using a SSD to try and gain speeds on simple homelab FreeNAS servers. It’s useless… “Some more specifics: as a rule of thumb L2ARC is only really useful if you have lots of RAM (64GB+) and a ZIL is only useful if you’re performing lots of synchronous writes.” – anodos

September 15, 2021September 20, 2021

Upgrade and Migrate a vCenter Server

Intro

Hello everyone! Today I’ll be doing a test in my home lab where I will be upgrading, not to be confused with updating, a vCenter server. If you are interested in staying on the version your vCenter is currently on but just patch to the latest version, see my other blog post: VMware vCenter Updates using VAMI – Zewwy’s Info Tech Talks

Before I get into it, there are a couple thing expected from you:

An existing instance of vCenter deployed (for me yup, 6.7)
A backup of the config or whole server via a backup product
A Copy of the latest vCenter ISO (either from VMware directly or for me from VMUG)

Side Story

*Interesting Side Note* VM Creation dates property is only a thing since vCenter 6.7. Before that it was in the events table that gets rotated out from retention policies. 🙂

*Side Note 2* I was doing some vmotions of VMs to prepare rebooting a storage device hosting some datastores before the vCenter update, and oddly even though the Task didn’t complete it would disappear from the recent task view. Clicking all Tasks showed the task in progress but @ 0% so no indication of the progress. The only trick that worked for me was to log off and back in.

A quick little side story, it was a little while since I had logged into VMUG for anything, and I have to admit the site setup is unbelievably bad designed. It’s so unintuitive I had to Google, again, how to get the ISO’s I need from VMUG.

Also for some reason, I don’t know why, when I went to log in it stated my username and password is wrong. Considering I use a password manager, I was very confident it was something wrong on their end. Attempting to do a password reset, provided no email to my email address.

Distort I decided to make a another account with the same email, which oddly enough when created brought me right back to my old account on first log in. Super weird. According to Reddit I was not the only one to experience oddities with VMUG site.

Also on the note of VMware certification, I totally forgot you have to take one of the mandatory classes before you can challenge, or take any of the VMware exams.

“Without the mandatory training? Yes, they represent a reasonable value proposition. With mandatory training? No, they do not. Requiring someone who’s been using your products for a decade to attend a class which covers how to spell ESXi is patronizing if not downright condescending. I only carry VMware certifications because I was able to attain them without going through the nonsense mandatory training.”

“The exam might as well cost $3500 and “include” the class for “free”.”

Don’t fully agree with that last one cause you can take any one class (AFAIK) and take all the exams. I get the annoyance of the barrier to entry, gotta keep the poor out. 😛

Simple Summary about VMUG.

Create account and Sign up for Advantage from the main site.
Download Files from their dedicated Repo Site.

Final gripes about VMUG:

You can’t get Offline Bundles to create custom ESXi images.
You can’t seem to get older versions of the software from there.
The community response is poor.
The site is unintuitive and buggy.

So now that we finally got the vCenter 7 ISO

For a more technical coverage of updating vCenter see VMware’s guide.

For shits.. moving esxi hosts, and vcenter to new subnet.

1) Build Subnet, and firewall rules and vlans
2) Configure all hosts with new VMPG for new vlan
3) Move each host one at a time to new subent, ensure again that network will be allowed to the vCenter server after migration
4) Can’t change VMK for mgmt to use VLAN from the vCenter GUI, have to do it at host level.
i) Place host into maintenance mode, remove from inventory (if host were added by IP, otherwise just disconnect)
ii) Update hosts IP address via the hosts console, and update DNS records
iii) Re-add the host to the cluster via new DNS hostname

Changing vCenter Server IP address

Source: How to change vCenter and vSphere IP Address ( embedded PSC ) – Virtualblog.nl

changed IP address in the VAMI, it even changed the vpxa config serverIP address to the new IP automatically. it worked. :O

Upgrading vCenter

Using the vCenter ISO

The ISO is not a bootable one, so for me I mount it on to a Windows machine that has access to the vCenter server.

Run the installer exe file…

Click Upgrade

I didn’t enter the source ESXi host IP.. lets see

nope wants all the info, fill all fields including source esxi host info.

Yes.

Target ESXi Host for new VCSA deployment. Next

Target VCSA VM info. Next

Would you like, large or eXtra large?

pick VMs datastore location, next.

VM temp info, again insure network connections are open between subnets if working with segregated networks.

Ready to deploy.

Deploying VM to target ESXi host. Once this was done got a message to move on to Stage 2, which can be done later, I clicked next.

Note right here, when you get a prompt for entering the Root password, I found it to be the target Root password not actually the source.

Second Note Resolving Certs Expired Pre-Check

While working on a client upgrade, it was more in my face when doing the source server pre-checks and would not continue stating certificates expired.

I was wondering how to check Existing certs and while this KB states you can check it via the WebUI There could be a couple issues.

1) You might not even be able to login into the WebUI as mentioned in this Blog, a bit of a catch 22. (Note* same goes for SSO domains, it can’t be managed by VAMI, so if there’s an AD issue with a source, you often get a service 503 error attempting to log on to the WebUI)

2) It might not even show up in that area of the WebUI.

In these cases I managed to find this blog post… which shockingly enough is the very guy who wrote the fixsts script used to fix my problem in this very blog post :O

Checking Certs via the CLI

Grab Script from This VMware KB

Download the checksts.py script attached to the above KB article.
Upload to attached script to the VCSA or external PSC.

For example, /tmp

Once the script has been successfully uploaded to VCSA, change the directory to /tmp.

For example:

cd /tmp

Run python checksts.py.

OK Dokie then, I guess this script doesn’t check the required cert… so instead I followed along with this VMware KB (Yes another one).

In which case I ran the exact commands as specified in the KB and saved the certificate to a txt, file and opened it up in Windows by double clicking the .crt file.

openssl s_client -connect MGMT-IP:7444 | more

So now instead of running the fixsts script, this KB states to run the following to reset this certificate to use the Machine Cert (self signed with valid date stamps, at least that’s what this server showed when checking them via the Certificate management are in the vCenter WebUI).

For the Appliance (I don’t deal with the Windows Server version as it EOL)

/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store MACHINE_SSL_CERT --alias __MACHINE_CERT > /var/tmp/MachineSSL.crt
/usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store MACHINE_SSL_CERT --alias __MACHINE_CERT > /var/tmp/MachineSSL.key
/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT > /var/tmp/sts_internal_backup.crt
/usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT > /var/tmp/sts_internal_backup.key
/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT -y
/usr/lib/vmware-vmafd/bin/vecs-cli entry create --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --cert /var/tmp/MachineSSL.crt --key /var/tmp/MachineSSL.key

Then:

```
service-control --stop --all
```
```
service-control --start --all
```

In my case for some odd reason I saw a bunch of these… when stopping and starting the services

2021-09-20T18:35:47.049Z Service vmware-sts-idmd does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.

I was nervous at first I may have broke it, after sometime it didn’t complete the startup command sequence, and after some time the WebUI was fully accessible again. Let’s validate the cert with the same odd method we did above.

Which sure enough showed a date valid cert that is the machine cert, self-signed.

Running the Update Wizard… Boooo Yeah!

Uhhh, ok….

Ok dokie?

I didn’t care too much about old metrics.

nope.

Let’s go!

After some time…

Nice! and it appears to have worked. 🙂

Another Side Trail

I was excited cause I deployed this new VCSA off the FreeNAS Datastore I wanted to bring and reboot. but low and behold some new random VMs are on the Datastore…

doing some research I found this simple explanation of them however it wasn’t till I found this VMware article with the info I was more after.

Datastore selection for vCLS VMs

The datastore for vCLS VMs is automatically selected based on ranking all the datastores connected to the hosts inside the cluster. A datastore is more likely to be selected if there are hosts in the cluster with free reserved DRS slots connected to the datastore. The algorithm tries to place vCLS VMs in a shared datastore if possible before selecting a local datastore. A datastore with more free space is preferred and the algorithm tries not to place more than one vCLS VM on the same datastore. You can only change the datastore of vCLS VMs after they are deployed and powered on.

If you want to move the VMDKs for vCLS VMs to a different datastore or attach a different storage policy, you can reconfigure vCLS VMs. A warning message is displayed when you perform this operation.

You can perform a storage vMotion to migrate vCLS VMs to a different datastore. You can tag vCLS VMs or attach custom attributes if you want to group them separately from workload VMs, for instance if you have a specific meta-data strategy for all VMs that run in a datacenter.

In vSphere 7.0 U2, new anti-affinity rules are applied automatically. Every three minutes a check is performed, if multiple vCLS VMs are located on a single host they will be automatically redistributed to different hosts.

Note:When a datastore is placed in maintenance mode, if the datastore hosts vCLS VMs, you must manually apply storage vMotion to the vCLS VMs to move them to a new location or put the cluster in retreat mode. A warning message is displayed.

The enter maintenance mode task will start but cannot finish because there is 1 virtual machine residing on the datastore. You can always cancel the task in your Recent Tasks if you decide to continue.

The selected datastore might be storing vSphere Cluster Services VMs which cannot be powered off. To ensure the health of vSphere Cluster Services, these VMs have to be manually vMotioned to a different datastore within the cluster prior to taking this datastore down for maintenance. Refer to this KB article: KB 79892.

Select the checkbox Let me migrate storage for all virtual machines and continue entering maintenance mode after migration. to proceed.

huh, the checkbox is greyed out and I can’t click it.

vmotioned them and the process kept moving up.

September 1, 2021September 14, 2021

How a Small Mistake Became a Big Problem

Background

This story is about implantation of compliance requirements, and the technical changes made that caused some heartburn. In particular Exchange server and retention policies.

Very simple; Compliance, and regulatory practices.

Retention Policy was becoming enforced. As such on Exchange no less, see here for more information on how to configure retention policies on Exchange.

You might notice you have to create tags of time frames. In this case there wasn’t one already pre-populated with my needs. So you have to create those. You may have also noticed that the only name time frame is all defined with number of days.

Human Error

So long story short, I wrongly defined the number of days for the length of period I wanted to defined. Simply due to bad arithmetic, I swear I was an ace at math in school. Anyway, after this small mistake on the tag definition, and it was deployed to all Mailboxes. (Yeap there was steps of approval, and wasn’t caught even during pilot users).

Once it was discovered, there were 2 options. 1) Wait till specific people notice and recover as required, or 2) do it all in one swoop.

Recover deleted messages in a user’s mailbox in Exchange Server | Microsoft Docs

After following this, it was determined that we couldn’t find just the emails from the time frame we needed to restore. This turns out cause all the emails “whenChanged” timestamp all became the same time the retention policy came into effect. So filtering by Date was completely useless.

Digging a Hole

At this point we figured we’d just restore all email, and let the retention policy rerun with the proper time frame tag applied. While this did work, there was a technique or property that was recently added that would have restored the emails into the sub folders in which they were removed from. Instead, all the emails were placed back into users Inbox.

This was a rough burn. Overall it did work, it just wasn’t very clean and there was some fallout from the whole ordeal.

Hope this story helps someone prevent the same mistakes.

August 5, 2021August 6, 2021

SharePoint Site Has Not Been Shared With You

SharePoint Site Not Accessible

Overview Story

Created brand new SharePoint Teams site.

Enabled Publishing Feature.

Created Access groups, Nest User in Group.

Troubleshooting

First Source Disable Access request feature. This feature was enabled, disable it, Result….

Second Source far more thing to try.

Cache, not the case same issue from any browser and client machine.
*PROCEED WITH CAUTION* Stop and Start “Microsoft SharePoint Foundation Web Application” service from Central Admin >>Application Management >>Manage services on server. In case, you face issues, use STSADM command line.cd “c:\program files\common files\microsoft shared\Web Server Extentions\16\BIN”Stop:
stsadm -o provisionservice -action stop -servicetype spwebservice
[Time taken ~30min not sure went for a break since it was taking so long]
iisreset /noforce [!1min]
Start:
stsadm -o provisionservice -action start -servicetype spwebservice
[Roughly 15 min]
iisreset /noforce

Result, SharePoint completely broken, Central Admin is accessible, but all sites are in a non working state. I cannot recommend to try this fix, but if you have to ensure you are doing so either in a test environment or have a backup plan.

I managed to resolve the issue in my test environment, as it turned out the sites were all defined to be HTTPS, however the binding was done manually on IIS to certs created, and then the AAMs updated on CA. Sources one and two

Running these commands SharePoint is not aware to recreate the HTTPS bindings so this has to be done manually.

Result:

Ughhhhhhhhhhhh!

3.If are migrated from SharePoint 2010, or backup-restore/import-exported: If your source site collection is in classic windows authentication mode and the target is in claims authentication

Not the case, but tried it anayway, and results…

This is really starting to bug me…

4.If you have a custom master page verify it’s published! Checked-out master pages could cause this issue.

We did make changes to the master page to resolve some UI issues, but this had to be published to even have those changes show, so YES, no change in result.

5.If you have this feature enabled: “Limited-access user permission lockdown mode” at site collection level – Deactivate it. – Because, this prevents limited access users from getting Form pages!

Found this, deactivated it and….

Ughhhhh….

6.On publishing sites: Make sure you set the cache accounts: Super User & Reader to a domain account for the SharePoint 2013 Web Application with appropriate rights.

I’ve read this from a different source as well however none of my other sites that are teams sites with publishing enabled have these extra properties defined and they are working without issue, I decided to try anyway,

Result:

7. If you didn’t run the product and configuration wizard after installation/patch, you may get this error even if you are a site collection administrator. Run it once and get rid of this issue.

Let’s give it a whirl… Now I pushed this down from #2 as it’s pretty rough like the suggestion I put as #2, which I should probably shift down for the same risk reasons, but I did try this before the other ones and it like the stsadm commands it broke my ShaerPoint Sites, it stated errors about features defined in the content database of attached sites that the features were not installed on the front end server.

In this case I tried to run my scripts I had written to fix these types of issues (publishing of script pending), but it wouldn’t accept the site URL stating it was not a site collection, it was at this point running get-spsite returned an error…

and sure enough…

AHHHHHHHHHHHH!!!!!!!!!

I asked my colleague for help since he seems to be good at solving issue when I feel I’ve tried everything. He noted the master pages and layouts area had unique permissions, setting it to inherit from parent made the pages finally load. But is this normal? I found someone asking about the permissions here, apparently it shouldn’t be inherited and they list the default permission sets:

Yes, Master Page Gallery use unique permissions by default.
Here is the default permissions for your information:
Approvers SharePoint Group Approvers Read
Designers SharePoint Group Designers Design
Hierarchy Managers SharePoint Group Hierarchy Managers Read
Restricted Readers SharePoint Group Restricted Readers Restricted Read
Style Resource Readers SharePoint Group Style Resource Readers Read
System Account User SHAREPOINT\system Full Control

Setting there defaults I still got the page was not shared problem, for some reason it works when I set the library to inherit permissions from the parent site.

When checking other sites I was intrigued when there was only the Style Resource Readers defined. I found this blog on a similar issue when sub site inheritance is broken which interesting enough directly mentions this group.

“It occurs when the built-in SharePoint group “Style Resource Readers” has been messed with in some way.”

Will check this out.

So I went through his entire validation process for that group, it all checked out, however as soon as I broke inheritance on the master page library, made it the “known defaults” and verified all Style Resources users were correct and all permission for it on all listed libraries and lists all match, I continued to receive Site not shared with you error.

I don’t feel the fix I found is correct, but I can’t find any other cause, or solution at this time. I hope this blog post helps someone, but I’m sad to say I wasn’t able to determine the root cause nor the proper solution.

August 4, 2021September 24, 2021

Renew Certificate with same Key via CMD

certutil -store my

The command above can be used to get the required serial number for the cert needing to be renewed. This should show the machine store, if you need certs displayed for the user store remove the “my” keyword.

certreq -enroll -machine -q -PolicyServer * -cert <serial#> renew reusekeys

If you get the following error:

Ensure the machine account has enroll permission on the published certificate template. For step by step guidance follow this blog post by itexperince.

If you get this error “The certificate Authority denied teh request. A required certificate is not within it’s validity period when verifying against currect system clock”:

Ensure the Certificate you are attempting to renew is not already expired.

If it is follow my guide on creating new certs via CLI.

afterwards it should succeed.

*NOTE* this option archives the old certificate, and generates a new one with a new expiration date, with the same key, with a new serial number. How services that are bound to the certificate update themselves, I’m not sure for this test I did not have the certificate bound to any particular services. Verifying this actually the web server using the cert did automatically bind to the new cert, I’d still recommend you verify where the certificate is being used and ensure those services are update/restarted accordingly to apply the changes.