Resolving a 503 response from HAProxy

Story

A while ago I blogged about using OPNsense with HAProxy as a reverse proxy for Exchange services. Now you can serve many other applications but HTTP(s) has become very common place. This has simplified network requirements at layer 4 and has pushed most security up to level 7 (either patch management (updates) or a next generation firewall (NGF)). Anyway, sometimes the best form of security is simply blocking access to areas that shouldn’t need to be accessed, specially from public facing sides. Imagine a dedicated room, such as a server room, you would keep the doors to this area locked, and generally not directly accessibly from the outside (a door facing an outside wall), same concept applies here for services. Of course you still want users to be able to access the receptionist area. In this case, receptionist area is like the OWA portal, and the server room access is like the ECP portal.

Now in my previous post, I did attempt to not have a public way access to the ECP area, you’d have to be on the inside network to reach it. However much like the comment on that post, if you new about the redirect URL with application layer (HTTP requests with URL parameters) and manually entered the redirect URL path you would still manage to get the ECP login page from the public facing side. (whoops).

Now this isn’t the point of this blog post but will be a nice follow up once the actual concept of this post is… presented?

The issue

Anyway, when using HA proxy one might notice that the logging is rather low. (this is by design for them as to prevent flooding the server’s local storage with well, logs). Why don’t they simply define limit based logging and do FIFO (first in, first out) log rotation based on these limits? Not sure, anyway, first thing you’ll notice is that you’ll get 503 responses, and nothing but “client connections” in the log area:

As you can tell, pretty ****in’ useless. Nothing we didn’t already know, connections on port 80/443 are allowed and passed to the load balancer. However the load balancer is still not servicing content correctly. Let’s move on.

Troubleshooting

At first I was fairly confident all my real servers, conditions, and rules were created successfully and the order was good within the “public services”(interface listener).

Googling the generic issue provided, well, generic answers which didn’t help me. If I knew what the HAProxy service was doing I could stand a way better chance to solve it.

Enable Logging

First we enable logging on the actual service from “info” to “Debug”.

*Note remember to change it back to info to avoid log flooding*

However, This still didn’t provide me any insight when I went to check out the log section.

Turns out there’s separate level of logging for each listener you have. So under your specific “Public Service” aka interface listener, enable advanced logging on it:

Once I had this level of logging enabled I could finally see which backend server was being hit after the request.

Solution

In my case it turned out it was hitting a completely different backend then what the rules defined within the “Public Service”/Listener was defined. When I checked the rule on which the wrong backend it was hitting, it turned out this rule was missing the very condition it was suppose to have on it, and actually had no conditions defined. As such it was hit on any request that was passed to it, since it was higher up in the list of rules in the list of rules on the “Public Service”/Listener.

I hope that made sense, anyway. In this case I ensured the rule for that backend server had the actual condition attached to it that it was suppose to serve. In this case it’s all mostly hostname based and not even complicated using things like regex, or path parameters, etc.

Icing on the Cake

Now remember my story at the beginning trying to block ECP and failing at the redirect. Now I didn’t like that and I came up with a Condition and Rule set that works.

Now as you can see from this, I created two conidtions, if the path ends with ecp (this might be an issue if there are any other backends that happened to have a path that ends in ecp) lucky for me that’s not the case. This woulda been great if managing alternative domains on the same interface, but the second condition is a bit more direct/specific. As you can see from the first image it states to look out for any URL with the parameter of URL if the parameter of the redirect to the ECP. Then in the rule specified the OR condition so if either condition is met, the request is blocked.

Cheers!

vCenter 503 Service Unavailable

I was going to test a auditing script from a DefCon presenter on my AD server, when I was adding the USB controller and the USB stick I was passing thorugh to get the script in my VM was being weird.

First USB 3.0 connected just fine, and connected the USB device to the VM, but diskpart was not showing it. So I went to remove it and try a USB 2.0 controller, that failed to connect since the USB 3.0 was still showing there and I selected to remove it again, which it errored another concurrent task. Makes sense, till refreshing the page told me unprivileged account. I wasn’t sure what this was about, so I decided to open another window and navigate to my center web app… 503 service unavailable:

“503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x000055aec30ef1d0] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)”

What the… rebooting the VCSA showed no success still same error even with an incognito window.. ughh.

I found this thread: https://communities.vmware.com/thread/588755

I was going through this, and decided to try to renew the certs, even though my internal PKI certs were still valide (AFAIK, and checking the cert provided when accessing the page). Now here’s the thing, while I ran the certificate-manager script and renewed all the certs, I noticed my AD server somehow was down. I booted it back up. I’m not exactly sure which fixed it. So I decided to take another snapshot while it was in this “fixed state” and revert to the  broken state. After restoring o the broken state nothing was responding at all on the https service from the VCSA, so I gave it a simple reboot (which I did initially before I noticed my AD server was down, for some reason). Sure enough after the reboot everything was working fine with my internal PKI certs.

I guess if you set vCenter to use MS AD as the primary login domain and that domain is not available the web management service becomes unavailable… that kind of sucks. I should have noticed my AD was not operational but I didn’t have monitoring on it 😉 or use my local workstation as a AD member. Mostly just random VMs I have for testing.

Like most people, should have looked at the logs for a better idea of what the root cause was. I threw 2 darts at a dart board and had to revert to find the true root cause. Not the best way to troubleshoot, but sometimes if logs are not available it is another method…

App Pool Crash on First Load

I’ll keep this on brief; for real this time.

So you created a MSA/gMSA for your Dev to use on ASP.NET.

You granted it Logon as a service rights, as well as batch logon right via group nesting in IIS_USRS group. You granted it all proper permissions on the physical path that IIS is using for the Site/App Pool, as well as any Database permissions if applicable. Yet every time you attempt to navigate the site you get a “503; Service unavailable” and when you go to check the app pool you find it is down. Right click it, select start and it comes right back up without issue, wash, rinse, repeat.

Turns out this happens cause you didn’t fully qualify the MSA/gMSA under the App Pool’s Identity settings. Even though you enter “gMSAAcct$” under the identity field and leave password fields blank, and IIS accepts this… without fault, what I believe is happening here is even though the check IIS has in place, does validate this to a be a real domain account, or service account, it doesn’t prepend or append (depending on which user construct you want to refer to) where ever it stores this user account. This is only a guess.

So you have to fully qualify it; “Domain\gMSAAcct$” You’ll notice it (IIS) will accept it just like it did before. Then watch in amazement as the page loads and doesn’t crash when you attempt to load it in a browser….