Careful Cloning ESXi Hosts

I’ll keep this post short. I was doing some ESXi host deployments in my home lab, and I noticed that when I would install on a 120GB SSD, the install would go smoothly, but I wasn’t able to use any of the storage as a Datastore. However, if I took a fresh install copy of ESXi from installing onto an 8GB USB Stick and DD’d it to the 120GB SSD I got several advantages from this:

When done via a USB3 Pipe of Linux live holding a copy of my base image to deploy I could get speeds in excess of 100 MB/s, and with only 8GB of data to transfer, the “install” would complete in a mere 90 seconds.
The IP address and root password are preconfigured to what I already now, and I can simply change the IP address from the DCUI and call it a day.

Using this method I could have a host up in less than 5 minutes (2 min to boot linux live, 90 seconds to install the base ESXi OS image, and 2 more to boot ESXi). This was of course on machine without ECC and all the server hardware firmware jazz… in those cases install times are always longer. anyway…

This was an amazing option, until I noticed that when I connect one machine in I just deployed and changed the IP address, and (since I’m super anal about networking during this type of project/operations) I noticed my ping from one machine (a completely different IP address) started to drop when the new device came up… and after a while the ping responses would come back but drop from the new host, and vice versa, flip and flop it goes. I’m used to this usually if there’s an IP conflict and two devices have the same IP address. In this case they were different IP addresses… after enough symptom gathering and logical deduction of because I had to assume that the MAC address just be the same and this is the same problem in reverse (different IP’s but same MAC) and as such experiencing the same symptoms.

To validate this I simply deployed my image to a new machine, then I went on the hunt to figure out how to see the MAC address, since I couldn’t plug in the NIC and get to the web based MGMT interface I had to figure out how to do that via the console CLI directly… mhmm after enough googling on my phone I found this spiceworks thread with my answer:

vim-cmd hostsvc/net/info | grep “mac =”

I then checked this against the ESXi host that I saw the flipping flopping with, and sure enough they matched… After doing a fresh install I noticed that the first 3 sections match the physical MAC, but in my DD deployed ones they retain the MAC of the system from which it was installed and those when I ran the command above, I could tell which ones were deployed via my method. This was further mentioned in this reddit thread by a commenter who goes by the name of sryan2K1:

“The physical NIC MACs are never used. vmk ports, along with VMs themselves will all use VMWare’s OUI as the first half of the address on the wire.”

OK, now maybe I can still salvage my deployment method by simply deleting and recreating the VMK after deployment, but I’d guess it best be done via the DCUI or direct console… I found one KB by VMware/Broadcom but it gave a 404 but Luckly there was a wayback machine link for it here.

Which states the following:

“During Initial Installation and DCUI, ESXi management interface (default vmk0) is created during installation.

The MAC address assigned will be the primary active physical NIC (pnic) associated.

If the associated vmnic is modified with the management interface vmkernel will once again assign MAC address of the associated physical NIC.

To create a VMkernel port and attach it to a portgroup on a Standard vSwitch, run these commands:

esxcli network ip interface add --interface-name=vmkX --portgroup-name=portgroup
esxcli network ip interface ipv4 set --interface-name=vmkX --ipv4=ipaddress --netmask=netmask --type=static"

Alternatively, you can also use esxcli to create the management interface vmkernel on the VDS.

Creation of the management interface with the ‘esxcli network’ will generate a VMware Universally Unique address instead of the pnic MAC address.

It is recommended to use the esxcli network IP interface method to create the management interface and not use DCUI.

Workarounds: None

Additional Information:
Using DCUI to remove vmnic binding from management vmkernel or any modification will apply change at vSwitch level. Management interface is associated with propagating the change to any port groups within the vSwtich level.

Impact/Risks: None.”

I”m assuming it means if you use the DCUI to reconfigure the MGMT interface settings the MAC will automatically be reconfigured to match what I found during initial clean install and mentioned in the reddit thread of using the first 3 sections to derive the MAC of the VMK.

But what if you don’t have any additional interfaces to use to make the section change in the DCUI to have that actually happen? cause what I’ve noticed changing the IP address and disabling IPv6 and rebooting did not change the VMK’s MAC address. Oh there’s in option in the DCUI “Reset Network Settings” within there there’s several options, I simply picked reset to factory defaults. Said success, checked the MAC via the first command stated above and bam the VMK nic changed to what it should be! Sweet my deployment method is still viable.

Hope this helps someone.

September 21, 2024September 27, 2024

Wireless ESXi Host

The Story

So, the other day I pondered an idea. I wanted to start making some special art pieces made from old motherboards, and then I also started to wonder could I actually make such an art piece… and have it functional?

I took an apart my old build that was a 1U server I made from an old PA-500 and a motherboard I repurposed from a colleague who gifted me their old broken system. Since it was a 1U system, I had purchased 2 special pieces to make it work, a special CPU heatsink (complete solid copper, with a side blower fan, and a 300 watt 1U PSU. both of which made lots of noise.

I also have another project going called “Operation Shut the fuck up” in which all the noisy servers I run will be either shutdown or modified to make zero noise. I hope with the project to also reduce my overall power consumption.

So I started by simply benching the Mobo and working off that, which spurred a whole interest into open case computer designs. I managed to find some projects on Thingiverse for 2020 extrusions and corner braces, cable ties… the works. The build was coming along swimmingly. There was just one thing that kept bugging me about the build… The wires…

Now I know the power cable will be required reguardless, but my hope was to have/install an outlet at the level the art piece was going to be placed at and have it nicely nested behind the art piece to hide it. Now there were a couple ways to resolve this.

Use an Ethernet over Power (Powerline) adapter to use the existing copper power lines already installed in the house. (Not to be confused with PoE).
There was just one problem with this, my existing Powerline kit died right when I wanted to use it for the purpose. (Looking inside looks like the, soldered to the board, fuse blew, might be as simple as replacing that but it could be a component behind the fuse failed and replacing it would simply blow the new fuse).
*This is still a very solid option as the default physical port can be used and no other software/configuration/hackery needs to be done, (Plug n Play).
The next best option would be to use one of these RJ45 to Wireless adapters:
Wireless Portable WiFi Repeater/Bridge/AP Modes, VONETS VAP11G-300.
VONETS VAP11G-500S Industrial 2.4GHz Mini WiFi Bridge Wireless Repeater/Router Ethernet to WiFi Adapter
This option is not as good as the signal quality over wireless is not has good as physical even when using Powerline adapters. However, this option much like the Powerline option, again allows the use of the default NIC, and only the device itself would need to be preconfigured using another system but otherwise again no software/configuration/hackery needs to be done.
Straight up use a WiFi Adapter on the ESXi host.

Now if you look up this option you’ll see many different responses from:

It can’t be done at all. But USB NICs have community drivers.
This is true and I’ve used it for ESXi hosts that didn’t have enough NICs for the different Networks that were available (And VLAN was not a viable option for the network design). But I digress here, that’s not what were are after, Wifi ESXi, yes?
It can’t be done. But option 1, powerline is mentioned, as well as option 2 to use a WiFi bridge to connect to the physical port.
Can’t be done, use a bridge. Option 2 specified above. and finally…
Yeah, ESXi doesn’t support Wifi (as mentioned many times) but….. If you pass the WiFi hardware to a VM, then use the vSwitching on the host.. Maybe…

As directly quoted by.. “deleted” – “I mean….if you can find a wifi card that capable, or you make a VM such as pfsense that has a wifi card passed through and that has drivers and then you router all traffic through some internal NIC thats connected to pfsense….”

It was this guys comment that I ran with this crazy idea to see if it could be done…. Spoiler alert, yes that’s why I’m writing this blog post.

The Tasks

The Caveats

While going through this project I was hit with one pretty big hiccup which really sucks but I was able to work past it. That is… It won’t be possible to Bridge the WAN/LAN network segments in OPNsense/PFsense with this setup. Which really sucked that I had to find this out the hard way… as mentioned by pfsense parent company here:

“BSS and IBSS wireless and Bridging

Due to the way wireless works in BSS mode (Basic Service Set, client mode) and IBSS mode (Independent Basic Service Set, Ad-Hoc mode), and the way bridging works, a wireless interface cannot be bridged in BSS or IBSS mode. Every device connected to a wireless card in BSS or IBSS mode must present the same MAC address. With bridging, the MAC address passed is the actual MAC of the connected device. This is normally a desirable facet of how bridging works. With wireless, the only way this can function is if all the devices behind that wireless card present the same MAC address on the wireless network. This is explained in depth by noted wireless expert Jim Thompson in a mailing list post.

As one example, when VMware Player, Workstation, or Server is configured to bridge to a wireless interface, it automatically translates the MAC address to that of the wireless card. Because there is no way to translate a MAC address in FreeBSD, and because of the way bridging in FreeBSD works, it is difficult to provide any workarounds similar to what VMware offers. At some point pfSense® software may support this, but it is not currently on the roadmap.”

Cool what does that mean? It means that if you are running a flat /24 network, as most people in home networks run a private subnet of 192.168.0.0/24, that this device will not be able to communicate in the layer 2 broadcast domain. The good news is ESXi doesn’t needs to work, or utilizes features of broadcast domains. It does however mean that we will need to manage routes as communications to the host using this method will have to be on it’s own dedicated subnet and be routed accordingly based on your network infrastructure. If you have no idea what I’m talking about here then it’s probably best not to continue on with this blog post.

Let’s get started. Oh another thing, at the time of this writing a physical port is still required to get this setup as lots of initial configurations still need to take place on the ESXi host via the Web GUI which can initially only be accessible via the physical port, maybe when I’m done I can make a mirco image of the ESXi hdd with the required VM, but even then the passthrough would have to be configured… ignore this rambling I’m just thinking stupid things…

Step 1) Have a ESXi host with a PCI-e based WiFi card.

I’ve tested this with both desktop Mobo with a PCI-e Wifi card, and a laptop with a built in Wifi Card, in both cases this process worked.

As you can see here I have a very basic ESXi server with some old hardware but otherwise still perfectly useable. For this setup it will be ESXi on USB stick, and for fun I made a Datastore on the remaining space on the USB stick since it was a 64 Gig stick. This is generally a bad idea, again for the same reasons mentioned above that USB sticks are not good at HIGH random I/O, and persistent I/O on top of that, but since this whole blog post is getting an ESXi host managed via WiFi which is also frowned upon why not just go the extra mile and really piss everyone off.

Again I could have done everything on the existing SATA based SSD and avoid so much potential future issue…. but here I am… anyway…

You may also note that at this time in the post I am connecting to a physical adapter on the ESXi host as noted by the IP addresses… once complete these IP addresses will not be used but remain bound the physical NIC.

Step 2) Create VM to manage the WiFi.

Again I’m choosing to use OPNsense cause they are awesome in my opinion.

I found I was able to get away with 1 GB of memory (even though min stated is 2) and 16 GB HDD, if I tried 8 GB the OPNsense installer would fail even though it states to be able to install on 4 GB SD Cards.

Also note I manually change boot from BIOS to EFI which has long been supported. At this stage also check off boot into EFI menu, this allows the VMRC tool to connect to ISO images from my desktop machine that I’m using to manage the ESXi host at this time.

Installing OPNsense

Now this would be much faster had I simply used the SSD, but since I’m doing everything the dumbest way possible, the max speed here will be roughly 8 MB/s… I know this from the extensive testing I’ve done on these USB drives from the ESXi install. (The install caused me so much grief hahah).

Wow 22 MB/s amazing, just remember though that this will be the HDD for just the OPNsense server that won’t need storage I/O, it’ll simply boot and manage the traffic over the WiFi card.

And much like how ESXi installed on the exact same USB drive, we are going to configure OPNsense to not burn out the drive. By following the suggestions in this thread.

Configuring OPNsense

Much like the ESXi host itself at this point I have this VM connected to the same VMPG that connects to my flat 192.168 network. This will allow us to gain access to the web interface to configure the OPNsense server exactly in the same manner we are currently configuring the ESXi host. However, for some reason the main interface while it will default assign to LAN it won’t be configured for DHCP and assumes 192.168.1.1/24 IP… cool, so log into the console and configure the LAN IP address to be reachable per your config, in my case I’m going to give it an IP address in my 192.168.0.0/24 network.

Again this IP will be temporary to configure the VM via the Web GUI. Technically the next couple steps can be done via the CLI but this is just a preference for me at this time, if you know what you are doing feel free to configure these steps as you see fit.

I’m in! At this point I configure SSH access and allow root and password login. Since this it a WiFi bridged VM and not one acting as a firewall between my private network and the public facing internet this is fine for me and allows more management access. Change these how you see fit.

At this point, I skip the GUI wizard. Then configured the settings per the link above.

Even with only 1 GB of memory defined for the VM, I wonder if this will cause any issues, reboot, system seems to have come up fine… moving on.

Holy crap we finally have the pre-reqs in place. All we have to do now is configure the WiFi card for PCI passthrough, give it to the VM, and reconfigure the network stacks. Let’s go!

Locate WiFi card and Configure Passthrough

So back on the ESXi web interface go to … Host -> Manage -> Hardware and configure the device for pasththrough until, you find all devices are greyed out? What the… I’ve done this 3 times what happed….

All PCI Passthrough devices grayed out on ESXi 6.7U3 : r/vmware (reddit.com)

FFS, OK I should have mentioned this in the pre-reqs but I guess in all my previous builds test this setting must have been enabled and available on the boards I was using… I hope I’m not hooped here yet again in this dang project…

Great went into the BIOS could find nothing specific for VT-d or VT-x (kind of amazed VM were working on this thing the whole time. I found one option called XD bit or something, it was enabled, I changed it to disabled, and it caused the system to go into a boot loop. It would start the ESXi boot up and then half way in randomly reboot, I changed the setting back and it works just fine again.

I’m trying super hard right now not to get angry cause everything I have tried to get this server up and running while not having to use the physical NIC has failed… even though I know it’s possible cause I did this 2 other times successfully and now I’m hung cause of another STUPID ****ING technicality.

K I have one other dumb idea up my ass… I have a USB based WiFi NIC, maybe just maybe I can pass that to OPNsense…

VMware seems to possibly allow it: Add USB Devices from an ESXi Host to a Virtual Machine (vmware.com)

OPNsense… Maybe? compatible USB Wifi (opnsense.org)

Here goes my last and final attempt at this hardware….

Attempting USB WiFi Passthrough

Add device, USB Controller 2.0.

Add Device, Find USB device on host from drop down menu.

Boot VM….. (my hearts racing right now, cause I’m in a HAB (Heightened Anger Baseline) and I have no idea if this final work around is going to work or not).

Damn it doesn’t seem to be showing under interfaces… checking dmesg on the shell…

I mean it there’s it has the same name as the PCI-e based WiFi card I was trying to use, but that is 1) pulled from the machine, and 2) we couldn’t pass it through, and dmesg shows it’s on the usbus1… that has to be it… but why can’t I see it in the OPNsense GUI?

OMG… I think this worked… I went to Interfaces wireless, then added the run0 I saw in dmesg….

I then added as an available interface….

For some weird reason it gave it a weird assignment as WifIBridge… I went back into the console and selected option 2 to assign interfaces:

Yay now I can see an assignable interface to WAN. I pick run0

Now back into OPNsense GUI… OMG… there we go I think we can move forward!

Once you see this we can FINALLY start to configure the wireless connection that will drive this whole design! Time for a quick break.

Configuring WiFi on OPNsense

No matter if you did PCI-e passthrough or USB passthrough you should now have an accessible OPNsense via LAN, and assigned the WiFi device interface to WAN. Now we need to get WAN connected to the actual WiFi.

So… Step 1) remove all blocking options to prevent any network issues, again this is an internal bridge/router, and not a Edge Firewall/NAT.

Uncheck Block Private Networks (Since we will be assigning the WAN interface a Private IP), and uncheck Block bogon networks.

Step 2) Define your IP info. In my case I’m going to be providing it a Static IP. I want to give it the one that is currently being used to access it that is bound to the vNIC, but since it’s alread bound and in use we’ll give it another IP in the same subnet and move the IP once it’s released from the other interface. For now we will also leave it as a slash 32 to prevent a network overlap of the interface bound on LAN thats configured for a /24.

No IPv6.

Step 3) Define SSID to connect to and Password.

I did this and clicked apply and to my dismay.. I couldn’t get a ping response… I ssh’d into the device by the current VMX nic IP and even the device itself couldn’t ping it (interface is down, something is wrong).

Checking the OPNsense GUI under INterface Assignments I noticed 2 WiFI interfaces (somehow I guess from me creating it above, and then running the wizard on the console?).

Dang I wanted to grab a snip, but from picking the main one (the other one was called a clone), it has now been removed from the dropdown, and after picking that one the pings started working!

Not sure what to say here, but now at this point you should have a OPNsnese server accessible by LAN (192.168.0.x) and WAN (192.168.0.x). The next thing is we need to make the Web interface accessible by the WAN (Wireless) interface.

Basically, something as horrendous as this drawing here:

Anyway… the first goal is to see if the WiFi hold up, to test this I simply unplug the physical cable from the beaitful diagram above, and make sure the pings to the WAN interface stay up… and they both went down….

This happened to me on my first go around on testing this setup… I know I fixed it.. I just can’t remember how… maybe a reboot of the VM, replug in physical cable. Before I reboot this device I’ll configure a gateway as well.

Interesting, so yup that fixed the WiFi issue, OPNsense now came up clean and WiFi still ping response even when physical nic is removed from the ESXi host… we are gonna make it!

interesting the LAN IP did not come up and disappeared. But that’s OK cause I can access the Web GUI via the WAN IP (Wirelessly).

finally OK, we finally have our wireless connection, now we just need to create a new vSwitch and MGMT network on the ESXi host that we will connect to the OPNsense on the VMX0 side (LAN) that you can see is free to reconfigure. This also free’d the IP address I wanted to use for the WAN, but since I’ve had so many issues… I’m just going to keep the one I got working and move on.

Configure the Special Managment network.

I’m going to go on record and say I’m doing it this way simply cause I got this way to work, if you can make it work by using the existing vSwitch and MGMT interfaces, by all means giver! I’m keeping my existing IPs and MGMT interfaces on the default switch0 and creating a new one for the wireless connection simply so that if I want to physically connect to the existing connection.. I simply plug in the cable.

Having said that on the ESXi host it’s time to create a new vSwitch:

Now create the new VMK, the IP given here is the in the new subnet that will be routed behind the OPNsense WAN. In my example I created a new subnet 192.168.68.0/24 this will be routed to the WAN IP address given to OPNsense in my example here that will be 192.168.0.33. (Outside the scope of this blog post I have created routes for this on my devices gateway devices, also since my machine is in the same subnet at the OPNsense WAN IP, but the OPNsense WAN IP address is not my subnets gateway IP this can cause what is known as asymetric routing, to resolve this you simply have to add the same route I just mentioned to the machine managing the devices. You have been warned, design your stuff better than I’m doing here… this is all simply for educational purposes… don’t do this ever in production)

Now we need to create a VMPG for the VM to connect the VMX0 IP into the new vSwitch to provide it the gateway IP for that new subnet (192.168.68.1/24)

Now we can finally configure the vNIC on the OPNsense VM to this new VMPG:

Before we configure the OPNsense box to have this new IP address let’s configure the ESXi gateway to be that:

OK finally back on the OPNsense side let’s configure the IP address…

Now to validate this it should simply be making sure the ESXi host can ping this IP…

All I should have to do now is configure the route on my machine doing all this work and I should also be able to ping it…

More success… final step.. unplug physical nic to pings stay up?? OMG and they do!!! hahaha:

As you can see the physical NIC IP drops but the new secret MGMT IPs behind the WiFi stay up! There’s one final thing we need to do though.

Configure Auto Start of OPNsense

This is a critical step in the design setup as the OPNsense needs to come up automatically in order to be able to manage the ESXi host if there is ever a reboot of the host.

Then simply configure the auto start setting for this VM:

I also go in and change the auto start delay to 30 seconds.

Summary

And there you have it… and ESXi host completely managed via WiFi….

There are a ton of limitations:

No Bridging so you can’t keep a flat layer 2 broadcast domain. Thus:
Requires dedicated routes and complex networking.
All VM traffic is best handled directly on internal vSwitch otherwise all other VM traffic will share the same WiFi gateway providing a terrible experince.
The Web interface will become sluggish when the network interface is under load.
However it is overall actually possible.
* Using PCI-e passthrough disallows snapshots/vMotions of the OPNsense VM but USB does allow it, when doing a storage vMotion the VM crashed on me, for some reason auto start disabled too had to manually start the VM back up. (I did this by re-IPing the ESXi server via console and plugging in a phsyical cable)
With USB WiFi Nic connections can be connected/disconnected from the host, but with PCI-e Passthrough these options are disabled.
With USB NIC you can add more vNICs to OPNsense and configure them, it just brings down the network overall for about 4-5 min, but be patient it does work.Here’s a Speedtest from a Windows Virtual Machine on the ESXi host.

Hope you all enjoyed this blog post. See ya all next time!

*UPDATE* Remember when I stated I wanted to keep those VMKs in place incase I ever wanted to plug the physical cable back in? Yeah that burnt me pretty hard. If you want a backup physical IP make it something different then you existing network subets and write it down on the NIC…

For some really strange reason HTTPS would work but all other connections such as SSH would timeout very similar to an asymmetric routing issue, and it actually cause it kind was. I’m kinda shocked that HTTPS even managed to work… huh…

Here’s a conversation I had with other on VMware IRC channel trying to troubleshoot the issue. Man I felt so dumb when I finally figured out what was going on.

*Update 2* I notice that the CPU usage on the OPNsense VM would be very high when traffic through it was taking place (and not even high bandwidth here either) AND with the pffilter service disabled, meaning it working it pure routing mode.

High CPU load with 600Mbit (opnsense.org)

Poor speeds and high CPU usage when going through OPNsense?

“Furthermore, set the CPU to 1 core and 4 sockets. Make sure you use VirtIO nics and set Multiqueue to 4 or 8. There is some debate going on if it should be 4 or 8. By my understanding, setting it to 4 will force the amount of queues to 4, which in this case matches your amount of CPU cores. Setting it to 8 will make OPNsense/FreeBSD select the correct amount.” Says Mars

“In this case this is also comparing a linux-based router to a BSD based one. Linux will be able to scale throughput much easily with less CPU power required when compared to the available BSD-based routers. Hopefully with FreeBSD 13 we’ll see more optimization in this regard and maybe close the gap a bit compared to what Linux can do.” Says opnfwb

Mhmmm ok I guess first thing I can try is upping the CPU core count. But this VM also hosts the connection I need to manage it… Seems others have hit this problem too…

Can you add CPU cores to VM at next restart? : r/vmware (reddit.com)

while the script is decent, the comment by cowherd is exactly what I was thinking I was going to do here: “Could you clone the firewall, add cores to the clone, then start it powering up and immediately hard power off the original?”

I’ll test this out when time permits and hopefully provide some charts and stats.

September 9, 2024September 9, 2024

PA VM in bazaar state… by Design

So today I had some weird stuff happening (Fedora Download was downloading slow, 300 KB/s)… I thought it was the mirror, but no matter what mirror I picked I had the same results, I asked a buddy to verify my findings and they could download Fedora with speed… Long story short, I thought maybe it was my firewall, and my colleague mentioned the same. Since this is a Lab setup it would be nice to get a perpetual license for learning purposes, but PAN clearly don’t work like. I was pretty sure my license had expired, so decided to first quick finds out what happens when a license expires: What Happens When Licenses Expire? (paloaltonetworks.com)…

Threat Prevention	Alerts appear in the System Log indicating that the license has expired. You can still: Use signatures that were installed at the time the license expired, unless you install a new Applications-only content update either manually or as part of an automatic schedule. If you do, the update will delete your existing threat signatures and you will no longer receive protection against them. Use and modify Custom App-ID™ and threat signatures. You can no longer: Install new signatures. Roll signatures back to previous versions.

Good to know, nothing that would cause the issue I’m experiencing….

DNS Security	You can still: Use local DNS signatures if you have an active Threat Prevention license. You can no longer: Get new DNS signatures.

nope… and…

Advanced URL Filtering / URL Filtering	You can still: Enforce policy using custom URL categories. You can no longer: Get updates to cached PAN-DB categories. Connect to the PAN-DB URL filtering database. Get PAN-DB URL categories. Analyze URL requests in real-time using advanced URL filtering.
WildFire	You can still: Forward PEs for analysis. Get signature updates every 24-48 hours if you have an active Threat Prevention subscription. You can no longer: Get five-minute updates through the WildFire public and private clouds. Forward advanced file types such as APKs, Flash files, PDFs, Microsoft Office files, Java Applets, Java files (.jar and .class), and HTTP/HTTPS email links contained in SMTP and POP3 email messages. Use the WildFire API. Use the WildFire appliance to host a WildFire private cloud or a WildFire hybrid cloud.
AutoFocus	You can still: Use an external dynamic list with AutoFocus data for a grace period of three months. You can no longer: Access the AutoFocus portal. View the AutoFocus Intelligence Summary for Monitor log or ACC artifacts.
Cortex Data Lake	You can still: Store log data for a 30-day grace period, after which it is deleted. Forward logs to Cortex Data Lake until the end of the 30-day grace period.
GlobalProtect	You can still: Use the app for endpoints running Windows and macOS. Configure single or multiple internal/external gateways. You can no longer: Access the Linux OS app and mobile app for iOS, Android, Chrome OS, and Windows 10 UWP. Use IPv6 for external gateways. Run HIP checks. Use Clientless VPN. Enforce split tunneling based on destination domain, client process, and video streaming application.

All a bunch of nope…

VM-Series	See the VM-Series Deployment Guide.
Support	You can no longer: Receive software updates. Download VM images. Benefit from technical support.

This is a VM series yes… so what does that link mean….

VM-Series	You can still: You can continue to configure and use the firewall you deployed prior to the license expiring with no change in session capacity. The firewall won’t reboot automatically and cause a disruption in traffic. However, if the firewall reboots for any reason, the firewall enters an unlicensed state. While unlicensed, a firewall supports a maximum of 1,200 sessions. No other management plane features or configuration options are restricted.

OK… Maybe… but I’m sure a download of a single file doesn’t take over 1,200 sessions… while I did reboot the unit (cloned, power off OG, power on clone, etc)

All other things are the same as posted above… Then I noticed some really weird things….

Checking for updates doesn’t state anything about license status, just tries and quietly fails.
Checking support status shows “Device not found on this update server”
Dynamic Updates do not show a “currently installed” version.
1. The current version installed with Review Policies, and review apps under action.
2. The previous installed one will have the same plus a revert action.
3. Downloaded one will have an install action.
4. All others seen since last communication to PAN will have download
Retrieving licenses from licenses server returns “Failed to install features. The device is not found.
Finally the smoking gun… Serial Number on the Dashboard will be listed as unknown.

So, I ended Googling this and found not one, but TWO KB’s!!!

Serial number becomes “unknown” after changing the instance typ… – Knowledge Base – Palo Alto Networks

and

Serial number becomes “unknown” upon rebooting PA-VM – Knowledge Base – Palo Alto Networks

After reading these, it all made sense… and it’s all rather dumb… to paraphrase it simply….

It’s due to DRM, how the DRM works is it derives the serial number from two ID’s CPUID and UUID… and when you migrate a PAN VM the CPU is different cause of the different host it resides… this in turn breaks the licensing.

*Standing Ovation*

What’s PAN solution… Open a support ticket… that’s right.. instead of coming up with a technical solution to make DRM work while still retaining the ability to migrate the VM (The most important and valuable reason why you want to run it as a VM anyway)….

Instead of having a way to edit the CPUID and UUID in the PAN portal to fix this yourself…..

No they want you to waste their tech support personals time….

This ….. IS……. DUMB!!!!!

October 6, 2023April 17, 2024

Configure Certificate-Based Administrator Authentication on a Palo Alto Networks Firewall

Source

As a “more secure” alternative to password-based authentication to the firewall web interface, you can configure certificate-based authentication for administrator accounts that are local to the firewall. Certificate-based authentication involves the exchange and verification of a digital signature instead of a password.

Configuring certificate-based authentication for any administrator disables the username/password logins for all administrators on the firewall; administrators thereafter require the certificate to log in.

To avoid any issues I created a snapshot of the PA VM. This took out my internet for roughly 30 seconds or so.

Step 1) Generate a certificate authority (CA) certificate on the firewall.
You will use this CA certificate to sign the client certificate of each administrator.
Create a Self-Signed Root CA Certificate.
Alternatively, Import a Certificate and Private Key from your enterprise CA or a third-party CA.

I do have a PKI I can use but no specfic key-pair that’s nice for this purpose, for the ease of testing I’ll create a local CA cert on the PAN FW.

Step 2) Configure a certificate profile for securing access to the web interface.
Configure a Certificate Profile.
Set the Username Field to Subject.
In the CA Certificates section, Add the CA Certificate you just created or imported.

Now for ease of use and testing I’m not defining CRL or OCSP.

Step 3) Configure the firewall to use the certificate profile for authenticating administrators.
Select Device -> Setup – > Management and edit the Authentication Settings.
Select the Certificate Profile you created for authenticating administrators and click OK.

Step 4) Configure the administrator accounts to use client certificate authentication.
For each administrator who will access the firewall web interface, Configure a Firewall Administrator Account and select Use only client certificate authentication.
If you have already deployed client certificates that your enterprise CA generated, skip to Step 8. Otherwise, go to Step 5.

Step 5) Generate a client certificate for each administrator.
Generate a Certificate. In the Signed By drop-down, select a self-signed root CA certificate.

Step 6) Export the client certificate.
Export a Certificate and Private Key. (I saved as pcks12, with a password)
Commit your changes. The firewall restarts and terminates your login session. Thereafter, administrators can access the web interface only from client systems that have the client certificate you generated.

File was in my downloads folder.

Step 7)Import the client certificate into the client system of each administrator who will access the web interface.

Refer to your web browser documentation. I am using windows, so I’m assuming the browser (Edge) will use the windows store, so I installed it to my user cert store by simply double clicking the file and providing the password in the import wizard prompt. Then checked my local user cert store.

Time to commit and see what happens…

as soon as I committed I got a prompt for the cert:

If I open a new InPrivate window and don’t offer the certificate I get blocked:

If I provide the certificate the usual FBA login page loads.

So now any access to the firewall requires the use of this key, and a known login creds. Though the notice stated it “disables the username/password logins for all administrators on the firewall” my testing showed that not to be true, it simply locks down access to the FBA page requiring the user of the created certificate.

Using Internal PKI

Let’s try to set this up, but instead of self signed, let’s try using an interal PKI, in this case Windows PKI using Windows based CA’s.

Pre-reqs, It is assumed you already have a windows domain, PKI and CA all already configured. If you require asstance please see my blog post on how to set such a environment up from scratch here: Setup Offline Root CA (Part 1) – Zewwy’s Info Tech Talks

This post also assumes you have a Palo Alto Networks firewall in which you want to secure the mgmt web interface with increased authentication mechanisms.

Step 1) Import all certificates into the PA firewall so it shows a valid stack:

Step 3) Generate a client certificate for each administrator.
Generate a Certificate. In the Signed By drop-down, select a CSR.

Now I’m not 100% certain how this all works, so I name the name common name and SAN the same as the local admin account I wanted to secure.

Then export CSR, and sign it by your internal CA server. and import it back into the PA firewall. In my case I decided (for testing purposes and simply due to pure ignorance) to create the certificate using the Web Server Template, even though I know this is going to be a certificate used for user authentication. *shrug* The final result should look like this:

Step 4) Configure the firewall to use the certificate profile for authenticating administrators. Pick the Cert profile created in Step 2.
Select Device -> Setup – > Management and edit the Authentication Settings.
Select the Certificate Profile you created for authenticating administrators and click OK. (At this point I recommend to not commit until at least the certificate created earlier is exported.)

Step 5) Configure the administrator accounts to use client certificate authentication.
For each administrator who will access the firewall web interface, Configure a Firewall Administrator Account and select Use only client certificate authentication.

This is where things start to feel weird in the whole process of this stuff…. It seem as soon as you check this checkbox off, the password fields disappear:

Before:

After:

Which makes it seem like it just changes the account the account from password based to just certificate based, and not 2fa as expected. On top of that, why can’t I specify which certificate to use, does this mean any certificate that exists within the PA store is good enough? I guess I’ll have to test to see if that’s the case anyway…

Step 6) Export the client certificate.
Export a Certificate and Private Key. (I saved as pcks12, with a password)

Step 7) Commit, and watch it be like before, where the web login won’t even show an FBA page until you present a certificate first. Which again seems like the firewall doesn’t associate certain certificates with certain users, instead it seems to lock down the FBA page to require ANY certificate (with key?) that is configured or signed by the CA’s specified in the Certificate profile.

Which seems like such a dumb design, it be way better off, that when you check off Certificate based option for a user, you have to pick which cert, then instead of blocking the FBA page as a whole, when that user’s credentials are entered into the FBA page, it then checks/asks for the certificate specified in the one selected in the user creation process.

I seemed to be getting stuck at 400 bad request even with the certificate in my personal store. My only guess is due to the point I mentioned about that I picked web server template when I signed the certificate, which you can see client auth is missing from the useage field:

I didn’t make a snapshot (or you maybe running a physical firewall), how do I fix this? Well… access the console directly (VM use the hypervisor console), or if physical use the console port, or if you configured SSH access, SSH in, and revert the config. I figured “load config last-saved” would have worked, but it didn’t I guess last saved is running config so the command to me feels useless. I could be missing something on that, so instead I had to pick a config from a couple months ago. The first time around it didn’t state anything about restarting the web mgmt service, but when picking the older config it does:

This must be cause of the Cert Profile binding option in the auth section of the mgmt settings. Further validating my assumptions on the design choice.

Now I was able to log back in to the MGMT web interface, load the config with all my work on it (so I didn’t have to redo all the steps above). Let’s simply recreate the “user” cert but using a client template, and see how that goes…

1) Delete the old cert (check)

2) Create new cert (check)
This time, no additional fields (not even a SAN):

Signed using User template:

Import it into the firewall… (Check) (No clue where that TLSv1.3 cert came from…)

Export it from the firewall… (check)

Import into client machine, user’s personal store.. (check) (Interesting shows assign to the admin account that requested the certificate)

Double check the Mgmt auth settings (check), so only main difference is the client cert now and… error 400… ****

I reverted again, after which I loaded the config above again, but this time changing the cert profile selected on the mgmt auth section to be the self signed one that worked in the orginal posting I made about this stuff, oddly enough after commit on my reg web browser I couldn’t get the web interface to load (400 error) but with incong/in-private window I got the prompt for the admin cert and I got the FBA page.

So for testing one last time to get the Internal PKI cert to work. I decided to make one last change. When I made the certificate I specified the subject name to be that of the account (in this case I had an account on the PA firewall of akamin. I also decided to use the Template I created for making user certs for Global Protect which were templated for client auth. The final results on the PA looked like this:

and exporting, and importing into client machine cert store looked like this:

As you can see this looks much cleaner then all my previous attempts, and shows all assigned to be the user in which we want to login as. The only other change was I created another Certificate Profile, but did not check off any of the Blocked options. Once I committed this change I got a 400 on my regular web browser, but opening an in-private window I got:

Finally! Picking it we can see it auto populated the username:

However don’t be fooled by this, I was easily able to change the name in the field and log in as another user. In this case I changed the name to another local admin, and entered the password of that user and logged in just fine. Further validating that all it’s doing is blocking access to the FBA page to anyone who has Any cert signed by the CA’s listed in the Certificate Profile.

Now I want to figure out the regular browser 400 error problem so I don’t have to open an in-private window each time. Usually this means just cleaning the cache, but when picking what to clear I picked last hour and everything but browsing history, that didn’t work. Reboot did work.

The next task is to see if I could load this certificate onto a YubiKey, and be able to use it’s ability to act as a certificate key store.

Yubikey

Source

First annoying thing this source is missing is that you need the YubiKey MiniDriver installed in order to complete this task.

The next thing that burnt me, was when I went to import the certificate it kept saying my PIN was wrong. Which first lead to my PIN becoming blocked, which lead me to reading all this stuff. Since my PIN was locked after 3 attempts, I nearly locked the PUK as the first two entries were wrong, and lucky me it was set to the default, and I managed to unblock the PIN. I then managed to set a new PUK and PIN. I did this by using ykman. Which was available on Fedoras native repo, so I did this using fedora live.

What I don’t get is, is there a different pin for WebAuthN vs this one for certificates? It seems like it, cause even when the certificate pin was blocked my WebAuthN was still working.

Back to my Windows machine

Plug in Yubikey.. then:

certutil –csp "Microsoft Base Smart Card Crypto Provider" –importpfx C:\Path\to\your.pfx

When prompted, enter the PIN. If you have not set a PIN, the default value is 123456.

This time it worked, yay…

ok now how to test this…., I’ll try to access the mgmt web interface from a random computer one thats not the one that I tested above which has the key already installed in the user’s windows cert store. Mhmmm what do I have… how about my old as Acer Netbook running windows 7 32bit, there’s no way that’ll work… would it…

I try to acess the web console and sure enough 400 bad gateway… plug in the Yubi Key…

There’s no way….

No freaking way…. and try to access the web mgmt…

No way! it actually worked, that’s unbelievable!

enter the pin I configured above (not the WebAuthN Pin)…

crazy… I can’t believe that works… so yeah this is a feasible solution, but it’s still not as good as WebAuthN, which I hope will be supported soon.

Weird… I went to access the web interface from my machine that has the cert in my cert store, but now it seems to want the yubikey even though the cert is in my user store, I tried an in-private window but same problem… do I have to reboot my machine again? Fuck no, that didn’t work either… like WTF.

Tried another browser, Chrome, SAME THING! It’s like when running the command to import a certificate into a YubiKey it overrides the one on the local store and always asks for the YubiKey when picking that certificate. Which doesn’t make any sense…

I grabbed the cert, imported into the user store on another machine, and bam it works as intended… it just seems on the machine in which you import it into a yubi then it always wants the yubikey on that machine, regardless of the certificate being in the users cert store… which still doesn’t make sense…

OK, so I deleted the certificate from my user cert store, re-imported it, open an in-private window and now it accepted the cert without asking for the YubiKey. I still don’t understand what’s going on here…. but that fixed it….

Things I still don’t understand though… if I set this user option to require certificate and the password fields disappear in the PAN Web mgmt interface, then why is it still asking for a password for the user? Why is a certificate required before login if there’s a toggle for certificate based login on the user’s setting? Wouldn’t it make more sense that the Web UI stays available and once you enter your creds then based on the creds entered the PAN OS looks up if that check box is enabled, and then ask for the certificate? And you’d have to configure which certificate in the User settings so that it actually ties a specific user to a specific cert, so you don’t have any cert is good for any admin? So many questions…. so little answers…

Wait a second, I can’t remember this users password, and I can’t login, ah nuts I made a typo in the cert.. FFS man…

What makes it even dumber is it states No auth profile found, but what it really means is that user doesn’t exist. Now instead of mucking around creating a new cert import/export/import and all that jazz, lets create a user akamin check off Cert based which means no password set and lets see what happens…

Oh interesting….

Now that the user was properly defined as the common name when the cert was created you can no longer specify a user account, and it forces the one specified. But if this is the case, how does an admin login who isn’t defined to use certificate based login? While this makes sense on which user is supposed to use which cert without having to defined it in the user’s setting. However, it doesn’t explain the forced certificated requirement before the FBA page, or how admins not configured for certificate based login can even login now.

¯\_(ツ)_/¯

I lost my keys… what do I do?

If you have the default admin account and left it as normal (no cert requirement), you can sign in via SSH or direct console and remove the config from the auth settings:

Configuration
delete deviceconfig system certificate-profile
commit

That should be all to get back to normal weblogin, but you’d still need to have an accounts configured to not have the certificate checkbox on those user’s settings.

It seems like that this can work as long as you leave the default admin account configured for regular auth (username and password).Maybe you can still make it work as long as there’s a lot of certs and redundancy. I haven’t exactly tested that out.

~~OK so above I simply reverted cause it was the only change I had. This only works in two conditions:~~

~~You know exactly when the change was implemented.~~
~~You have the latest running config saved.~~

Thinking about this I think the latest running will always be there so you just have to know when the change was implemented. revert to that, then load the last running, and turn off the cert profile on the mgmt auth settings area.

~~but what if you don’t know when that was made, well let’s see if we can make the change via the CLI…~~

~~So I found the location on where to set it….~~

~~set deviceconfig system certificate-profile~~

~~I can’t seem to set it to null… I found a similar question here, which only further validated my concern above about other admins who aren’t configured for cert login…~~

~~“However at the very beginning of the Web Page I can read:~~

“Configuring certificate-based authentication for any administrator disables the username/password logins for all administrators on the firewall; administrators thereafter require the certificate to log in.””

Unfortunately, the linked source is dead, but I’m sure it’s still in play. With the thread having no real answer to the question, it seems my only option is the steps I did before… revert to a config before it was implemented, load the old running config, and within the web UI remove the Cert profile, which totally fucking sucks ass…. However, as we discovered, if we configure a cert with a common name that isn’t a user on the PAN, then we can use that to access the FBA page with accounts that are not checked off with the setting in the users’ setting. I wonder if this is something that wasn’t intended and I discovered it simply by chance which kinda shows the poor implementation design here.

~~I think I covered everything I can about this topic here… Now since this account I created was a superuser (read only) and now that the user exists… I’ll revert…. or wait….~~

~~maybe I can delete that user, and then go back to just needing the cert and I can sign in with another account to fix this… let’s try that LOL.~~

~~Finally direct guidance! Woo~~

~~and….~~

~~No matter what browser or machine I try to connect to it, it just error 400.~~

~~This… shit… sucks.~~

~~Can you make this idea work… yes.~~

Can you fix this if you lose all your keys, not easily, you’d have to know the exact commit the change was made, and if there were other changes made after that, they temp not be applied during the recovery period.

Facepalm…. I don’t know why I didn’t think of this sooner… you don’t use set, you use delete in the cli to set it to none.

April 10, 2023April 10, 2023

ASUS calling Microsoft

Back Story

I’ll try to keep this post short as I’m behind on many other posts I have to finish. hahah :S

Anyway, I was thinking it’s time to update my pihole, when I checked the admin web interface to check for clients to see who’d still be using it for DNS, and then I’d make a list and be prepared to change them as required (any outside of DHCP of course, as I’d simply change the IP there). Now you might be wondering, why change the IP address? Which is a fair question, I could just update the one in question, but I had bigger plans to move it to another server, I didn’t want to give the other server multiple IPs, so I figured it be easier to spin up the new service on that server and simply change the DNS on the DHCP server/service. Anyway… where was I, oh right, checking the web admin I noticed the top client was my new ASUS RT-AX88U. I was hoping to get a model that supported Tomato like the old RT-N16 I had for so many years which I recently broke and so replaced it with this unit. It currently can’t run Tomato like I managed to do with the RT-N16. So, I just had configured it for AP mode. Figured it doesn’t need to do much else for now besides serve unreal good WiFi.

Yet it’s calling home to “dns.msftncsi.com”, when I looked up this domain it seems to be used mostly by windows machines to check to make sure they are online.

Fix This

Looking a bit further into it I managed to find this magical Reddit post (I really love reddit, I’ve found so many helpful posts there). Anyway let’s see if we can follow the steps on this router.

Step 1 – Enable Access

The source uses telnet, but I’m not a fan of transferring creds in cleartext, unless I know for certain it’s a completely isolated network. Since the router supports SSH, I enabled that instead and logged in. *note* I had to remove the fingerprint from the old RT-N16 I used to SSH into.

Step 2 – Gain Shell Access to your Router

K, with that done, let’s see if we can edit the nvram, but let’s take a look as the OP suggests.

Step 3 – Look deep into NVRAM

nvram show | sort | less

I used the less command instead, as my old linux instructor once said “less is more” using less you can use the up and down arrow keys to scroll through the results, and look-e-here: (Press Q to exit less)

Step 4 – Finding the Droids

The droids I was after. Time to eliminate them.

Step 5 – Kill the Probe Content Droid

nvram set dns_probe_content=127.0.0.1

Step 6 – Kill the Probe Host Droid

nvram set dns_probe_host=""

Step 7 – Prevent Droid Resurrection

nvram commit

Step 8 – Fully Enforce Your New Empire

reboot

Verify:

Noice!

September 19, 2021

ESXi Update Network Config Failed
Set ESXi IP via CLI

Real quick post here. I was moving my ESXi hosts and vCenter to a new dedicated subnet. I did the usual; had a temp Windows System in the new subnet, create VMK with temp IP in new subnet, connect to ESXi Web UI via new Temp IP in new Subnet via temp Windows machine. Reconfigure default TCP/IP stack default gateway, change VMK0 IP address (and edit management port group VLAN id if applicable). and Away I’d go.

However on this one host for some unknown stupid reason it would simply fail “Failed – An Error occurred during host configuration”, and the detailed log was just as vague “operation failed diagnostics report unable to set network unreachable” OK… whatever, that shouldn’t matter do as I tell you! Here’s a snippet of the error, and the CLI command that simply worked without bitching.

I just figured let’s try the CLI way and see if it worked, and it turns out it did. The source I used to figure out the command syntax.

The commands I used:

Get IPs:

esxcli network ip interface ipv4 get

Set new IP:

esxcli network ip interface ipv4 set -i vmk1 -I 1.1.1.1 -N 255.255.255.0 -t static

Hope this helps someone.

February 9, 2021

UniFi Shows MAC address instead of Hostname

I noticed this recently, that the UniFi management interface would show some clients as just their mac addresses instead of the host names like most other devices.

Searching I found this one, but it was after an update, I did not update the software.

Then I found this thread which was more what I was looking for, which tells me how the name is retrieved … “DHCP Snooping”.

Alright, so taking a look at the DHCP server, I noticed it was indeed empty names on the IPs that were given out.

Didn’t take me long to determine that it was Android devices. When I wanted to configure a hostname to the device I found out with the latest version.. I can’t?

“Hostname is used to easily identify and remember hosts connected to a network. It’s set on boot, e.g. from /etc/hostname on Linux based systems. Hostname is also a part of DHCPREQUEST (standardized as code 12 by IETF) which a DHCP client (Android device in our case) makes to DHCP server (WiFi router) to get an IP address assigned. DHCP server stores the hostnames to offer services like DNS. See details in How to ping a local network host by hostname?.

Android – instead of using Linux kernel’s hostname service – used property net.hostname (since Android 2.2) to set a unique host name for every device which was based on android_id. This hostname property was used for DHCP handshake (as added in Android 2.2 and 4.0). In Android 6 net.hostname continued to be used (1, 2, 3, 4) in new Java DHCP client when native dhcpcd was abandoned and later service was removed in Android 7. Since Android 8 – when android_id became unique to apps – net.hostname is no more set, so a null is sent in DHCPREQUEST. See Android 8 Privacy Changes and Security Enhancements:

net.hostname is now empty and the dhcp client no longer sends a hostname

So the WiFi routers show no host names for Android 8+, neither we can set / unset / change it.

However on rooted devices you can set net.hostname manually using setprop command or add in some init’s .rc file to set on every boot. Or use a third party client like busybox udhcpc to send desired hostname and other options to router. See Connecting to WiFi via ADB Shell.”

Well then… Now I have to manually set Aliases and use DHCP reservations just to be able to track these devices… cause “privacy”

Summary…. Thumbs up… man!

February 8, 2021

Palo Alto Networks – Service Routes

The Story

You can read about Service routes from PAN directly here.

Basically … “The firewall uses the management (MGT) interface by default to access external services, such as DNS servers, external authentication servers, Palo Alto Networks services such as software, URL updates, licenses and AutoFocus. An alternative to using the MGT interface is to configure a data port (a regular interface) to access these services. The path from the interface to the service on a server is known as a service route. The service packets exit the firewall on the port assigned for the external service and the server sends its response to the configured source interface and source IP address.”

This is generally used if you configure the firewall, but don’t actually happen to physically plug anything into the MGMT port of the Firewall (MGMT on Physical or VNIC0 on VMs). However the device does have a internet connection, or has some interface on the dataplane that has access to a specific service. Whatever the need may be they can be useful to know they exist and can be utilized for certain situations.

When I discussed this with a friend who deploys many of these devices, it was opted to use the MGMT interface for most things. I did note one case such as Email, where you could configure the service route for that via the gateway interface for the mail server, thus only require one IP in the ACLs of the mail relay/server.

He did note that you could not test email from the passive firewall, as the interface won’t be active. Which could be problematic for other monitoring services such as SNMP, if utilized. Which was noted. Luckily many different services (SNMP/Email/LDAP) can be configured independently and all default to the MGMT interface.

Summary

The main reason I even noticed this was due to email not working on the alternative firewall after it took over from a failover, even though the dashboard on both firewall stated the running configs are both the same. Well it turns out that service routes I guess are not tested for synchronization between peers.

So yeah… not that if you are using Service Routes with PAN firewalls.

December 21, 2020

Resolving a 503 response from HAProxy

Story

A while ago I blogged about using OPNsense with HAProxy as a reverse proxy for Exchange services. Now you can serve many other applications but HTTP(s) has become very common place. This has simplified network requirements at layer 4 and has pushed most security up to level 7 (either patch management (updates) or a next generation firewall (NGF)). Anyway, sometimes the best form of security is simply blocking access to areas that shouldn’t need to be accessed, specially from public facing sides. Imagine a dedicated room, such as a server room, you would keep the doors to this area locked, and generally not directly accessibly from the outside (a door facing an outside wall), same concept applies here for services. Of course you still want users to be able to access the receptionist area. In this case, receptionist area is like the OWA portal, and the server room access is like the ECP portal.

Now in my previous post, I did attempt to not have a public way access to the ECP area, you’d have to be on the inside network to reach it. However much like the comment on that post, if you new about the redirect URL with application layer (HTTP requests with URL parameters) and manually entered the redirect URL path you would still manage to get the ECP login page from the public facing side. (whoops).

Now this isn’t the point of this blog post but will be a nice follow up once the actual concept of this post is… presented?

The issue

Anyway, when using HA proxy one might notice that the logging is rather low. (this is by design for them as to prevent flooding the server’s local storage with well, logs). Why don’t they simply define limit based logging and do FIFO (first in, first out) log rotation based on these limits? Not sure, anyway, first thing you’ll notice is that you’ll get 503 responses, and nothing but “client connections” in the log area:

As you can tell, pretty ****in’ useless. Nothing we didn’t already know, connections on port 80/443 are allowed and passed to the load balancer. However the load balancer is still not servicing content correctly. Let’s move on.

Troubleshooting

At first I was fairly confident all my real servers, conditions, and rules were created successfully and the order was good within the “public services”(interface listener).

Googling the generic issue provided, well, generic answers which didn’t help me. If I knew what the HAProxy service was doing I could stand a way better chance to solve it.

Enable Logging

First we enable logging on the actual service from “info” to “Debug”.

*Note remember to change it back to info to avoid log flooding*

However, This still didn’t provide me any insight when I went to check out the log section.

Turns out there’s separate level of logging for each listener you have. So under your specific “Public Service” aka interface listener, enable advanced logging on it:

Once I had this level of logging enabled I could finally see which backend server was being hit after the request.

Solution

In my case it turned out it was hitting a completely different backend then what the rules defined within the “Public Service”/Listener was defined. When I checked the rule on which the wrong backend it was hitting, it turned out this rule was missing the very condition it was suppose to have on it, and actually had no conditions defined. As such it was hit on any request that was passed to it, since it was higher up in the list of rules in the list of rules on the “Public Service”/Listener.

I hope that made sense, anyway. In this case I ensured the rule for that backend server had the actual condition attached to it that it was suppose to serve. In this case it’s all mostly hostname based and not even complicated using things like regex, or path parameters, etc.

Icing on the Cake

Now remember my story at the beginning trying to block ECP and failing at the redirect. Now I didn’t like that and I came up with a Condition and Rule set that works.

Now as you can see from this, I created two conidtions, if the path ends with ecp (this might be an issue if there are any other backends that happened to have a path that ends in ecp) lucky for me that’s not the case. This woulda been great if managing alternative domains on the same interface, but the second condition is a bit more direct/specific. As you can see from the first image it states to look out for any URL with the parameter of URL if the parameter of the redirect to the ECP. Then in the rule specified the OR condition so if either condition is met, the request is blocked.

Cheers!

October 7, 2020

Windows MPIO to FreeNAS iSCSI Target

Intro

Well I made some mistake, the system worked but not utilizing its max capabilities..

I had been successfully using FreeNAS as a iSCSI target for a disk mounted in Windows Server, but only one path being used at all times…

Windows Side

Source

I first needed the MPIO feature installed:

Click Manage > Add Roles And Features.
Click Next to get to the Features screen.
Check the box for Multipath I/O (MPIO).
Complete the wizard and wait for the installation to complete.

Noice.

Then we need to configure MPIO to use iSCSI

Click Start and run MPIO.
Navigate to the Discover Multi-Paths tab.
Check the box to Add Support For iSCSI Devices.
Click OK and reboot the server when prompted.

For me I didn’t get prompted for a reboot and reopening MPIO showed the checkbox unchecked, I had to click the add button then I got a prompt to reboot:

Now before I continue to get MPIO working on the source side, I need to fix some mistakes I made on the Target side. To ensure I was safe to make the required changes on the target side I first did the following:

Completed any tasks that were using the disk for I/O
Validated no I/O for disk via Resource manager
Stopped any services that might use the disk for I/O
Took the disk offline in Disk Manager
Disconnected the Disc in iSCSI initiator

We are now safe to make the changes on the target before reconnecting the disk to this server, now on to FreeNAS.

FreeNAS Side

Source

I much like the source specified added an IP to the existing portal.. which I apparently shouldn’t have done.

Stop the iSCSI service for changes to be made.

Now delete the secondary IP from the one portal:

Now click add portal to create the secondary portal with the alternative IP.

There we go now just have to edit the target:

Now, that you have multiple portals/Group IDs configured with different IP addresses, these can be added to the targets.

Editing the existing targets to add iSCSI Group IDs

Once you have a target defined, you can click the Add extra iSCSI Group link to add the multiple Port Group ID backings.

Add extra iSCSI group IDs to each target in FreeNAS

Make sure you have the iSCSI service running. It does hurt at this point to bounce the service to ensure everything is reading the latest configuration, however with FreeNAS the configuration should take effect immediately.

Make sure iSCSI service is running in FreeNAS

Now we can go back to Windows to get the final configurations done. 🙂

Back on Windows

Configuring iSCSI

Launch iSCSI on the application server and select the iSCSI service to start automatically. Browse to the Discovery tab. Do the following for each iSCSI interface on the storage appliance:

Click Discover Portal.
Enter the IP address of the iSCSI appliance.
Click OK.
Repeat the above for each IP address on the iSCSI storage appliance.

Browse to Targets. An entry will appear for each available volume/LUN that the server can see on the storage appliance.

Configure Each Volume

For each volume, do the following:

Click Connect to open the Connect To Target dialogue.
Check the box to Enable Multi-Path.
Click Advanced. This will allow us how to connect the first iSCSI session from the first NIC on the server. We can connect to the first interface on the iSCSI appliance.
In the Advanced Settings box, select Microsoft iSCSI Initiator in Local Adapter, the first NIC of the server in Initiator IP, and the first NIC of the storage appliance in Target Portal IP.
Click OK to close Advanced Settings.
Click OK to close Connect To Target.

The volume is now connected. However, we only have 1 session between the first NIC of the server and the first NIC of the storage appliance. We do not have a fault-tolerant connection enabled:

Click Properties in the Targets dialogue to edit the properties of the volume connection.
Click Add Session.
Check the box to Enable Multi-Path.
Click Advanced.
Select Microsoft iSCSI Initiator in Local Adapter. Select the second iSCSI NIC of the server in Initiator IP and the second NIC of the storage appliance in Target Portal IP.

Click OK a bunch of times.

If you open Disk Management, your new volume(s) should appear. You can right-click a disk or volume that you connected, select properties, and browse to MPIO. From there, you should see the paths and the MPIO customizable policies that are being used by this disk.

I left the load balancing algo to Round Robin, as Noted from here:

MCS

Fail Over Only – This policy utilizes one path as the active path and designates all other paths as standby. Upon failure of the active path the standby paths are enumerated in a round robin fashion until a suitable path is found.
Round Robin – This policy will attempt to balance incoming requests evenly against all paths.
Round Robin With Subset – This policy applies the round robin technique to the designated active paths. Upon failure standby paths are enumerated round robin style until a suitable path is found.
Least Queue Depth – This policy determines the load on each path and attempts to re direct I\O to paths that are lighter in load.
Weighted Paths – This policy allows the user to specify the path order by using weights. The larger the number assigned to the path the lower the priority.
MPIO

As above plus

Least Blocks – This policy sends requests to the path with the least number of pending I\O blocks.

Now did it actually work?

Seems like it.. performance is still not as good as I expected. must keep optimizing!

Hope this helps someone…