Wireless ESXi Host

The Story

So, the other day I pondered an idea. I wanted to start making some special art pieces out of old motherboards, and then I started to wonder: could I actually make such an art piece… and have it be functional?

I took apart my old build, a 1U server I made from an old PA-500 and a motherboard I repurposed from a colleague who gifted me their old broken system. Since it was a 1U system, I had purchased 2 special pieces to make it work: a special CPU heatsink (completely solid copper, with a side blower fan) and a 300 watt 1U PSU, both of which made lots of noise.

I also have another project going called “Operation Shut the fuck up” in which all the noisy servers I run will be either shut down or modified to make zero noise. With that project I also hope to reduce my overall power consumption.

So I started by simply benching the mobo and working off that, which spurred a whole interest in open case computer designs. I managed to find some projects on Thingiverse for 2020 extrusions and corner braces, cable ties… the works. The build was coming along swimmingly. There was just one thing that kept bugging me about the build… The wires…

Now I know the power cable will be required regardless, but my hope was to install an outlet at the level where the art piece was going to be placed, nicely nested behind the piece to hide it. That leaves the network cable, and there were a couple of ways to resolve this.

  1. Use an Ethernet over Power (Powerline) adapter to use the existing copper power lines already installed in the house. (Not to be confused with PoE).
    There was just one problem with this: my existing Powerline kit died right when I wanted to use it for this purpose. (Looking inside, it looks like the fuse soldered to the board blew. It might be as simple as replacing that, but it could be that a component behind the fuse failed, and replacing it would simply blow the new fuse.)
    *This is still a very solid option, as the default physical port can be used and no other software/configuration/hackery needs to be done (Plug n Play).
  2.  The next best option would be to use one of these RJ45 to Wireless adapters:
    Wireless Portable WiFi Repeater/Bridge/AP Modes, VONETS VAP11G-300.
    VONETS VAP11G-500S Industrial 2.4GHz Mini WiFi Bridge Wireless Repeater/Router Ethernet to WiFi Adapter
    This option is not as good, since signal quality over wireless is not as good as a physical link, even a Powerline one. However, much like the Powerline option, it allows the use of the default NIC; only the device itself would need to be preconfigured using another system, and otherwise no software/configuration/hackery needs to be done.
  3.  Straight up use a WiFi Adapter on the ESXi host.

Now if you look up this option you’ll see many different responses from:

  1. It can’t be done at all. But USB NICs have community drivers.
    This is true and I’ve used it for ESXi hosts that didn’t have enough NICs for the different networks that were available (and VLAN was not a viable option for the network design). But I digress; that’s not what we are after. WiFi ESXi, yes?
  2.  It can’t be done. But option 1, powerline is mentioned, as well as option 2 to use a WiFi bridge to connect to the physical port.
  3.  Can’t be done, use a bridge (option 2 specified above). And finally…
  4.  Yeah, ESXi doesn’t support Wifi (as mentioned many times) but….. If you pass the WiFi hardware to a VM, then use the vSwitching on the host.. Maybe…

As directly quoted by.. “deleted” – “I mean….if you can find a wifi card that capable, or you make a VM such as pfsense that has a wifi card passed through and that has drivers and then you router all traffic through some internal NIC thats connected to pfsense….”

It was this guy’s comment that made me run with this crazy idea to see if it could be done…. Spoiler alert: yes, that’s why I’m writing this blog post.

The Tasks

The Caveats

While going through this project I was hit with one pretty big hiccup, which really sucks, but I was able to work past it. That is… it won’t be possible to bridge the WAN/LAN network segments in OPNsense/pfSense with this setup. It really sucked that I had to find this out the hard way… as mentioned by pfSense’s parent company here:

“BSS and IBSS wireless and Bridging

Due to the way wireless works in BSS mode (Basic Service Set, client mode) and IBSS mode (Independent Basic Service Set, Ad-Hoc mode), and the way bridging works, a wireless interface cannot be bridged in BSS or IBSS mode. Every device connected to a wireless card in BSS or IBSS mode must present the same MAC address. With bridging, the MAC address passed is the actual MAC of the connected device. This is normally a desirable facet of how bridging works. With wireless, the only way this can function is if all the devices behind that wireless card present the same MAC address on the wireless network. This is explained in depth by noted wireless expert Jim Thompson in a mailing list post.

As one example, when VMware Player, Workstation, or Server is configured to bridge to a wireless interface, it automatically translates the MAC address to that of the wireless card. Because there is no way to translate a MAC address in FreeBSD, and because of the way bridging in FreeBSD works, it is difficult to provide any workarounds similar to what VMware offers. At some point pfSense® software may support this, but it is not currently on the roadmap.”

Cool, what does that mean? It means that if you are running a flat /24 network (most home networks run a private subnet such as 192.168.0.0/24), the ESXi host behind the WiFi link will not be able to communicate in the layer 2 broadcast domain. The good news is ESXi doesn’t need to work within, or utilize features of, the broadcast domain. It does however mean that we will need to manage routes, as communication with the host using this method will have to be on its own dedicated subnet and be routed accordingly based on your network infrastructure. If you have no idea what I’m talking about here then it’s probably best not to continue on with this blog post.
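For anyone who wants to sanity check their own addressing before going further, here is a minimal Python sketch using only the standard `ipaddress` module. It assumes the flat 192.168.0.0/24 LAN from above and the 192.168.68.0/24 management subnet used later in this post; the function name `is_usable_mgmt_subnet` is just something I made up for illustration, not part of any tool.

```python
import ipaddress


def is_usable_mgmt_subnet(flat_lan: str, mgmt_subnet: str) -> bool:
    """Return True when the management subnet does not overlap the flat LAN.

    Because the WiFi side cannot be bridged, the ESXi management network
    has to be a separate, routed subnet rather than a slice of the LAN.
    """
    lan = ipaddress.ip_network(flat_lan)
    mgmt = ipaddress.ip_network(mgmt_subnet)
    return not lan.overlaps(mgmt)


# A distinct subnet that can be routed behind the WiFi VM.
print(is_usable_mgmt_subnet("192.168.0.0/24", "192.168.68.0/24"))   # True
# A range carved out of the existing LAN, which would need bridging.
print(is_usable_mgmt_subnet("192.168.0.0/24", "192.168.0.128/25"))  # False
```

It only checks for overlap; it says nothing about whether your gateway actually has a route for the new subnet.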

Let’s get started. Oh, another thing: at the time of this writing a physical port is still required to get this set up, as lots of initial configuration still needs to take place on the ESXi host via the Web GUI, which is initially only accessible via the physical port. Maybe when I’m done I can make a micro image of the ESXi HDD with the required VM, but even then the passthrough would have to be configured… ignore this rambling, I’m just thinking stupid things…

Step 1) Have an ESXi host with a PCI-e based WiFi card.

I’ve tested this with both a desktop mobo with a PCI-e WiFi card and a laptop with a built-in WiFi card; in both cases this process worked.

As you can see here I have a very basic ESXi server with some old hardware, but otherwise still perfectly useable. For this setup it will be ESXi on a USB stick, and for fun I made a datastore on the remaining space on the USB stick since it was a 64 Gig stick. This is generally a bad idea, since USB sticks are not good at HIGH random I/O, and persistent I/O on top of that, but since this whole blog post is about getting an ESXi host managed via WiFi, which is also frowned upon, why not just go the extra mile and really piss everyone off.

Again, I could have done everything on the existing SATA based SSD and avoided so many potential future issues…. but here I am… anyway…

You may also note that at this point in the post I am connecting to a physical adapter on the ESXi host, as noted by the IP addresses… once complete these IP addresses will not be used but will remain bound to the physical NIC.

Step 2) Create VM to manage the WiFi.

Again I’m choosing to use OPNsense cause they are awesome in my opinion.

I found I was able to get away with 1 GB of memory (even though the stated minimum is 2) and a 16 GB HDD. When I tried 8 GB the OPNsense installer would fail, even though it claims to be able to install on 4 GB SD cards.

Also note I manually changed boot from BIOS to EFI, which has long been supported. At this stage also check off the option to boot into the EFI menu; this allows the VMRC tool to connect to ISO images from the desktop machine that I’m using to manage the ESXi host at this time.

Installing OPNsense

Now this would be much faster had I simply used the SSD, but since I’m doing everything the dumbest way possible, the max speed here will be roughly 8 MB/s… I know this from the extensive testing I’ve done on these USB drives from the ESXi install. (The install caused me so much grief hahah).

Wow 22 MB/s amazing, just remember though that this will be the HDD for just the OPNsense server that won’t need storage I/O, it’ll simply boot and manage the traffic over the WiFi card.

And much like how ESXi is installed on the exact same USB drive, we are going to configure OPNsense to not burn out the drive by following the suggestions in this thread.

Configuring OPNsense

Much like the ESXi host itself, at this point I have this VM connected to the same VMPG that connects to my flat 192.168 network. This will allow us to gain access to the web interface to configure the OPNsense server in exactly the same manner we are currently configuring the ESXi host. However, for some reason the main interface, while it will be assigned to LAN by default, won’t be configured for DHCP and assumes a 192.168.1.1/24 IP… cool. So log into the console and configure the LAN IP address to be reachable per your config; in my case I’m going to give it an IP address in my 192.168.0.0/24 network.

Again this IP will be temporary to configure the VM via the Web GUI. Technically the next couple steps can be done via the CLI but this is just a preference for me at this time, if you know what you are doing feel free to configure these steps as you see fit.

I’m in! At this point I configure SSH access and allow root and password login. Since this is a WiFi bridged VM and not one acting as a firewall between my private network and the public facing internet, this is fine for me and allows more management access. Change these as you see fit.

At this point, I skipped the GUI wizard, then configured the settings per the thread linked above.

With only 1 GB of memory defined for the VM, I wondered if this would cause any issues. Reboot… system seems to have come up fine… moving on.

Holy crap we finally have the pre-reqs in place. All we have to do now is configure the WiFi card for PCI passthrough, give it to the VM, and reconfigure the network stacks. Let’s go!

Locate WiFi card and Configure Passthrough

So back on the ESXi web interface go to … Host -> Manage -> Hardware and configure the device for passthrough, until you find all devices are greyed out? What the… I’ve done this 3 times, what happened….

All PCI Passthrough devices grayed out on ESXi 6.7U3 : r/vmware (reddit.com)

FFS. OK, I should have mentioned this in the pre-reqs, but I guess in all my previous test builds this setting must have been enabled and available on the boards I was using… I hope I’m not hooped here yet again in this dang project…

Great. I went into the BIOS and could find nothing specific for VT-d or VT-x (kind of amazed VMs were working on this thing the whole time). I found one option called XD bit or something; it was enabled, I changed it to disabled, and it caused the system to go into a boot loop. It would start the ESXi boot up and then halfway in randomly reboot. I changed the setting back and it works just fine again.

I’m trying super hard right now not to get angry cause everything I have tried to get this server up and running while not having to use the physical NIC has failed… even though I know it’s possible cause I did this 2 other times successfully and now I’m hung cause of another STUPID ****ING technicality.

K I have one other dumb idea up my ass… I have a USB based WiFi NIC, maybe just maybe I can pass that to OPNsense…

VMware seems to possibly allow it: Add USB Devices from an ESXi Host to a Virtual Machine (vmware.com)

OPNsense… Maybe? compatible USB Wifi (opnsense.org)

Here goes my last and final attempt at this hardware….

Attempting USB WiFi Passthrough

Add device, USB Controller 2.0.

Add Device, Find USB device on host from drop down menu.

Boot VM….. (my heart’s racing right now, cause I’m in a HAB (Heightened Anger Baseline) and I have no idea if this final workaround is going to work or not).

Damn it doesn’t seem to be showing under interfaces… checking dmesg on the shell…

I mean, it’s there… it has the same name as the PCI-e based WiFi card I was trying to use, but that card is 1) pulled from the machine, and 2) we couldn’t pass it through, and dmesg shows it’s on usbus1… that has to be it… but why can’t I see it in the OPNsense GUI?

OMG… I think this worked… I went to Interfaces -> Wireless, then added the run0 I saw in dmesg….

I then added as an available interface….

For some weird reason it gave it a weird assignment as WifIBridge… I went back into the console and selected option 2 to assign interfaces:

Yay, now I can see an assignable interface for WAN. I pick run0.

Now back into OPNsense GUI… OMG… there we go I think we can move forward!

Once you see this we can FINALLY start to configure the wireless connection that will drive this whole design! Time for a quick break.

Configuring WiFi on OPNsense

No matter if you did PCI-e passthrough or USB passthrough, you should now have OPNsense accessible via LAN, with the WiFi device interface assigned to WAN. Now we need to get WAN connected to the actual WiFi.

So… Step 1) remove all blocking options to prevent any network issues; again, this is an internal bridge/router, and not an Edge Firewall/NAT.

Uncheck Block Private Networks (Since we will be assigning the WAN interface a Private IP), and uncheck Block bogon networks.

Step 2) Define your IP info. In my case I’m going to be providing it a static IP. I want to give it the one that is currently being used to access it, which is bound to the vNIC, but since that one is already bound and in use we’ll give it another IP in the same subnet and move the IP once it’s released from the other interface. For now we will also leave it as a /32 to prevent a network overlap with the interface bound on LAN that’s configured for a /24.

No IPv6.

Step 3) Define SSID to connect to and Password.

I did this and clicked apply and to my dismay.. I couldn’t get a ping response… I ssh’d into the device by the current VMX nic IP and even the device itself couldn’t ping it (interface is down, something is wrong).

Checking the OPNsense GUI under Interface Assignments I noticed 2 WiFi interfaces (I guess from me creating it above, and then running the wizard on the console?).

Dang, I wanted to grab a snip, but after picking the main one (the other one was called a clone), it was removed from the dropdown, and after picking that one the pings started working!

Not sure what to say here, but at this point you should have an OPNsense server accessible by LAN (192.168.0.x) and WAN (192.168.0.x). Next we need to make the Web interface accessible via the WAN (wireless) interface.

Basically, something as horrendous as this drawing here:

Anyway… the first goal is to see if the WiFi holds up. To test this I simply unplug the physical cable from the beautiful diagram above, and make sure the pings to the WAN interface stay up… and they both went down….

This happened to me on my first go around on testing this setup… I know I fixed it.. I just can’t remember how… maybe a reboot of the VM, replug in physical cable. Before I reboot this device I’ll configure a gateway as well.

Interesting, so yup, that fixed the WiFi issue. OPNsense now came up clean and the WiFi still responds to pings even when the physical NIC is unplugged from the ESXi host… we are gonna make it!

Interesting, the LAN IP did not come up and disappeared. But that’s OK cause I can access the Web GUI via the WAN IP (wirelessly).

OK, we finally have our wireless connection. Now we just need to create a new vSwitch and MGMT network on the ESXi host that we will connect to OPNsense on the VMX0 side (LAN), which you can see is free to reconfigure. This also freed the IP address I wanted to use for the WAN, but since I’ve had so many issues… I’m just going to keep the one I got working and move on.

Configure the Special Management network.

I’m going to go on record and say I’m doing it this way simply cause I got this way to work; if you can make it work using the existing vSwitch and MGMT interfaces, by all means giver! I’m keeping my existing IPs and MGMT interfaces on the default vSwitch0 and creating a new one for the wireless connection, simply so that if I want to physically connect to the existing connection.. I simply plug in the cable.

Having said that on the ESXi host it’s time to create a new vSwitch:

Now create the new VMK. The IP given here is in the new subnet that will be routed behind the OPNsense WAN. In my example I created a new subnet, 192.168.68.0/24, which will be routed to the WAN IP address given to OPNsense, in my example 192.168.0.33. (Outside the scope of this blog post, I have created routes for this on my gateway devices. Also, since my machine is in the same subnet as the OPNsense WAN IP, but the OPNsense WAN IP address is not my subnet’s gateway IP, this can cause what is known as asymmetric routing. To resolve this you simply have to add the same route I just mentioned to the machine managing the devices. You have been warned, design your stuff better than I’m doing here… this is all simply for educational purposes… don’t do this ever in production.)
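To make the route and the asymmetric routing bit a little less hand-wavy, here is a small Python sketch. It takes the subnet and next hop from my example (192.168.68.0/24 via 192.168.0.33) and prints the static route command for a Linux or Windows management machine. The management machine address 192.168.0.50/24 and the default gateway 192.168.0.1 are made-up values for illustration, as are the function names.

```python
import ipaddress


def static_route_commands(dest_subnet: str, next_hop: str) -> dict:
    """Build the static route command for Linux (ip route) and Windows (route add)."""
    net = ipaddress.ip_network(dest_subnet)
    hop = ipaddress.ip_address(next_hop)
    return {
        "linux": f"ip route add {net.with_prefixlen} via {hop}",
        "windows": f"route add {net.network_address} mask {net.netmask} {hop}",
    }


def needs_host_route(manager_if: str, next_hop: str, default_gw: str) -> bool:
    """True when the next hop is on the manager's own subnet but is not its
    default gateway. In that case requests detour through the gateway while
    replies come straight back, which is the asymmetric path described above."""
    iface = ipaddress.ip_interface(manager_if)
    hop = ipaddress.ip_address(next_hop)
    return hop in iface.network and hop != ipaddress.ip_address(default_gw)


cmds = static_route_commands("192.168.68.0/24", "192.168.0.33")
# 192.168.0.50/24 and 192.168.0.1 are hypothetical values for the managing machine.
if needs_host_route("192.168.0.50/24", "192.168.0.33", "192.168.0.1"):
    print(cmds["linux"])    # ip route add 192.168.68.0/24 via 192.168.0.33
    print(cmds["windows"])  # route add 192.168.68.0 mask 255.255.255.0 192.168.0.33
```

Neither command is persistent across reboots as written; how you make it stick depends on the OS, so treat this as a starting point only.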

Now we need to create a VMPG for the VM to connect its VMX0 interface into the new vSwitch, to provide the gateway IP for that new subnet (192.168.68.1/24).

Now we can finally configure the vNIC on the OPNsense VM to this new VMPG:

Before we configure the OPNsense box to have this new IP address let’s configure the ESXi gateway to be that:

OK finally back on the OPNsense side let’s configure the IP address…

Now to validate this it should simply be making sure the ESXi host can ping this IP…

All I should have to do now is configure the route on my machine doing all this work and I should also be able to ping it…

More success… final step.. unplug the physical NIC; do the pings stay up?? OMG and they do!!! hahaha:

As you can see the physical NIC IP drops but the new secret MGMT IPs behind the WiFi stay up! There’s one final thing we need to do though.

Configure Auto Start of OPNsense

This is a critical step in the design setup as the OPNsense needs to come up automatically in order to be able to manage the ESXi host if there is ever a reboot of the host.

Then simply configure the auto start setting for this VM:

I also go in and change the auto start delay to 30 seconds.

Summary

And there you have it… an ESXi host completely managed via WiFi….

There are a ton of limitations:

  1. No Bridging so you can’t keep a flat layer 2 broadcast domain. Thus:
  2. Requires dedicated routes and complex networking.
  3. All VM traffic is best handled directly on an internal vSwitch, otherwise all other VM traffic will share the same WiFi gateway, providing a terrible experience.
  4. The Web interface will become sluggish when the network interface is under load.
  5.  However it is overall actually possible.
  6. * Using PCI-e passthrough disallows snapshots/vMotions of the OPNsense VM, but USB does allow it. When doing a storage vMotion the VM crashed on me, and for some reason auto start got disabled too, so I had to manually start the VM back up. (I did this by re-IPing the ESXi server via console and plugging in a physical cable.)
  7. With a USB WiFi NIC, connections can be connected/disconnected from the host, but with PCI-e passthrough these options are disabled.
  8. With a USB NIC you can add more vNICs to OPNsense and configure them; it just brings down the network overall for about 4-5 min, but be patient, it does work.

Here’s a Speedtest from a Windows Virtual Machine on the ESXi host.

Hope you all enjoyed this blog post. See ya all next time!

*UPDATE* Remember when I stated I wanted to keep those VMKs in place in case I ever wanted to plug the physical cable back in? Yeah, that burnt me pretty hard. If you want a backup physical IP, make it something different than your existing network subnets and write it down on the NIC…

For some really strange reason HTTPS would work but all other connections such as SSH would time out, very similar to an asymmetric routing issue, and actually that’s because it kind of was one. I’m kinda shocked that HTTPS even managed to work… huh…

Here’s a conversation I had with others on the VMware IRC channel trying to troubleshoot the issue. Man, I felt so dumb when I finally figured out what was going on.

*Update 2* I noticed that the CPU usage on the OPNsense VM would be very high when traffic was passing through it (and not even high bandwidth either) AND with the pffilter service disabled, meaning it was working in pure routing mode.

High CPU load with 600Mbit (opnsense.org)

Poor speeds and high CPU usage when going through OPNsense?

“Furthermore, set the CPU to 1 core and 4 sockets. Make sure you use VirtIO nics and set Multiqueue to 4 or 8. There is some debate going on if it should be 4 or 8. By my understanding, setting it to 4 will force the amount of queues to 4, which in this case matches your amount of CPU cores. Setting it to 8 will make OPNsense/FreeBSD select the correct amount.” Says Mars

“In this case this is also comparing a linux-based router to a BSD based one. Linux will be able to scale throughput much easily with less CPU power required when compared to the available BSD-based routers. Hopefully with FreeBSD 13 we’ll see more optimization in this regard and maybe close the gap a bit compared to what Linux can do.” Says opnfwb

Mhmmm ok I guess first thing I can try is upping the CPU core count. But this VM also hosts the connection I need to manage it… Seems others have hit this problem too…

Can you add CPU cores to VM at next restart? : r/vmware (reddit.com)

While the script is decent, the comment by cowherd is exactly what I was thinking of doing here: “Could you clone the firewall, add cores to the clone, then start it powering up and immediately hard power off the original?”

I’ll test this out when time permits and hopefully provide some charts and stats.

Using Fake PMem to Scratch That Itch

Sooooo… let’s say you have an ESXi server set up with lots of memory, you have local storage, and you could simply install ESXi on that and be done with it… but let’s just say you wanna be a hard ass and use the old USB install method. VMware’s ESXi will allow such an install but will also warn you that no scratch location (USB drives are not meant for heavy sustained I/O and fail due to this, thus they disable scratch here) and no core dump location (no way it could write the data fast enough, or be reliable when there’s a system crash) are configured.

So you wanna stick to your guns and have ESXi installed on USB, and you already disabled the annoying alert for core dumps cause this is a lab host and you simply don’t care…. alright….

Even then the pestering “System logs on host localhost.localdomain are stored on non-persistent storage. Consult product documentation to configure a syslog server or a scratch partition.” shows up. In my previous post I simply added a note…

Note* Option 2 was still required to get rid of another message: System logs are stored on non-persistent storage (2032823) (vmware.com)

That being “Option 2 – Set Syslog.global.logDir”. For this special beast of a lab setup though, while we DO have an SSD planned to be a datastore, and again I could simply have installed ESXi there and moved on with my life, I want this to be the most complex ESXi host ever, and instead have full separation of the SSD and the ESXi host OS install, in a way that the SSD can be fully unplugged and the system would still boot fine (given the USB drive is still alive).

So now I discovered you can create a datastore in memory. What a perfect place for a scratch partition (I know it won’t actually persist, as noted by William himself, and it won’t mean anything cause any core dumps won’t exist if the system experiences a power failure, but in my use case I simply don’t care, I just need scratch, and what better place than memory…. Yes, I’m aware scratch is supposed to be for when memory is low, but if it’s reserved it’ll be there even when the host thinks there’s no memory to flow over to)… Anyway…

Where the fuck was I… oh Yeah PMem… Let’s see if this works in ESXi 7…

How to simulate Persistent Memory (PMem) in vSphere 6.7 for educational purposes?  (williamlam.com)

“Disclaimer: This is not officially supported by VMware. Unlike a real physical PMem device where your data will be persisted upon a reboot, the simulated method will NOT persist your data. Please use this at your own risk and do not place important or critical VMs using this method.

In ESXi 6.7, there is an advanced boot option which enables you to simulate or “fake” PMem by consuming a percentage of your physical ESXi hosts memory and allocating that to form a PMem datastore. You can append this boot option during the ESXi boot up process (e.g. Control+O) or you can easily manage it using ESXCLI which is my preferred method of choice.

Run the following command and replace the value with the desired percentage for PMem allocation:”

esxcli system settings kernel set -s fakePmemPct -v 5

“Note: To disable fake PMem, simply set the value to 0

You can also verify whether fake PMem is enabled or its current configured value is by running the following command:

esxcli system settings kernel list -o fakePmemPct

For the changes to go into affect, you obviously will need to reboot your ESXi host.”
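Since the setting is a percentage of host memory rather than a size, here is a tiny Python sketch for working out what value to plug in. It assumes the setting takes a whole-number percentage (as in the quoted example with 5) and rounds up so you get at least the size you asked for. The host sizes in the example and the function names are my own, purely for illustration.

```python
def fake_pmem_percent(host_memory_mib: int, wanted_mib: int) -> int:
    """Smallest whole percentage of host memory that yields at least wanted_mib."""
    if host_memory_mib <= 0 or wanted_mib <= 0:
        raise ValueError("memory sizes must be positive")
    if wanted_mib > host_memory_mib:
        raise ValueError("cannot reserve more than the host has")
    # Integer ceiling division avoids floating point rounding surprises.
    return -(-wanted_mib * 100 // host_memory_mib)


def fake_pmem_command(percent: int) -> str:
    """Format the esxcli command quoted above with the computed percentage."""
    return f"esxcli system settings kernel set -s fakePmemPct -v {percent}"


# Hypothetical host with 32 GiB of RAM, aiming for roughly 2 GiB of fake PMem.
pct = fake_pmem_percent(32 * 1024, 2 * 1024)
print(pct)                     # 7
print(fake_pmem_command(pct))  # esxcli system settings kernel set -s fakePmemPct -v 7
```

Whatever you reserve this way is memory your VMs no longer get, so on a small lab host keep the number modest.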

Man I fought with getting my install to work for hours cause of a faulty USB drive.. I really should just install on the SSD… nah! I want the difficult way!

… Reboot and…. finally! it shows 🙂

I went to change the scratch location, but providing the path with the escape character didn’t work…. but when I specified it directly, the GUI took it.

You can see the old path here, but after a reboot I got the error again; looks like it didn’t like the path…

That’s great…. the datastore name changes, and so does its mounted path, which explains why the error popped up again.. OI

So much for that idea

 

PVE Hosts Won’t Boot, Missing Drive

This is pretty dumb… every other hypervisor I’ve ever played with, if the boot drive is fine… the OS boots… period….

Yet the other day I tried to boot my PVE host and it just wouldn’t boot. It would get stuck, stating that a datastore (nothing that’s a dependency for the OS to actually boot) had failed, and that stopped the OS from booting….

I found this PVE thread that was more recent with a comment that worked for the OP.

“if you created the partition via the gui, we create a systemd-mount unit under /etc/systemd/system

(e.g. mnt-datastore-foo.mount)

you can disable that unit with ‘systemctl disable <unit>’

or delete the file

we’ll improve the docs in the near future and have planned to make the gui disk management a bit easier in regards to removing/reusing”

This was posted in 2021, yet when I checked that path (booting into Recovery mode) it didn’t contain any file with an ending of .mount… So not sure what this is about. I did however find this thread, which was exactly my problem… and funny enough the OP literally posted their own answer (which is the answer here as well) and no other comments were made on the post, which was created in 2018…

[SOLVED] – Reboot Proxmox Host now will not boot | Proxmox Support Forum

“So I had upgraded some packages and the proxmox host recommended rebooting the system. After rebooting the system hangs at the screen showing [DEPEND] in yellow for 3 lines:
Dependency failed for /mnt/Media
Dependency failed for local file system
Dependency failed for File system check on /dev/Data-Storage/Media

I tried running control-D to continue but it does not continue.

I’m guessing I need to clean up the entries how can i do that? I’m assuming I just need to boot into emergency and edit /etc/fstab and remove the entries?

OK yes removing those from /etc/fstab fixed it and now it boots.”

This is exactly what I did as well… I saw the offending entry, which was a BTRFS storage I had configured in the past, and that storage unit had been shut down. (I thought I blogged this, but I only blogged about using LVM over iSCSI.. Configuring shared LVM over iSCSI on Proxmox – Zewwy’s Info Tech Talks)

Anyway, removing the entry from fstab and rebooting.. bam PVE host came right up.
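If you would rather find these landmines before the next reboot, here is a rough Python sketch that scans fstab text and lists the non-root, non-swap entries that don’t carry the `nofail` mount option. Without `nofail`, systemd treats the mount as required for the local file system target, which is how you end up at that “Dependency failed” screen. The sample fstab below is made up (the /mnt/Media line mirrors the forum post quoted above), and `risky_fstab_entries` is just a name I picked.

```python
def risky_fstab_entries(fstab_text: str) -> list:
    """Return mount points of non-root, non-swap entries lacking 'nofail'."""
    risky = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        _device, mount_point, fs_type, options = fields[:4]
        if mount_point == "/" or fs_type == "swap":
            continue  # the root file system and swap are expected to be required
        if "nofail" not in options.split(","):
            risky.append(mount_point)
    return risky


# Hypothetical fstab contents for illustration only.
SAMPLE = """\
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pve/root / ext4 errors=remount-ro 0 1
/dev/pve/swap none swap sw 0 0
/dev/Data-Storage/Media /mnt/Media ext4 defaults 0 2
UUID=1234-ABCD /mnt/backup btrfs defaults,nofail 0 0
"""

print(risky_fstab_entries(SAMPLE))  # ['/mnt/Media']
```

To check a real host you would feed it the contents of /etc/fstab. It only flags candidates for review; some mounts really are required at boot, so don’t slap `nofail` on everything blindly.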

Constructive criticism for PVE: yes, any knowledgeable Linux sysadmin will figure out how to fix this, as I just did here. However, how about NOT having the boot process fail simply cause a configured storage is not available… like all other hypervisors… BOOT the host and show the storage as failed in the management UI to clean it up that way…. Just.. food for thought….

PA VM in bizarre state… by Design

So today I had some weird stuff happening (a Fedora download was slow, 300 KB/s)… I thought it was the mirror, but no matter what mirror I picked I had the same results. I asked a buddy to verify my findings and they could download Fedora at a good speed… Long story short, I thought maybe it was my firewall, and my colleague mentioned the same. Since this is a lab setup it would be nice to get a perpetual license for learning purposes, but PAN clearly doesn’t work like that. I was pretty sure my license had expired, so I decided to first quickly find out what happens when a license expires: What Happens When Licenses Expire? (paloaltonetworks.com)…

Threat Prevention
Alerts appear in the System Log indicating that the license has expired.
You can still:
  • Use signatures that were installed at the time the license expired, unless you install a new Applications-only content update either manually or as part of an automatic schedule. If you do, the update will delete your existing threat signatures and you will no longer receive protection against them.
  • Use and modify Custom App-ID™ and threat signatures.
You can no longer:
  • Install new signatures.
  • Roll signatures back to previous versions.

Good to know, nothing that would cause the issue I’m experiencing….

DNS Security
You can still:
  • Use local DNS signatures if you have an active Threat Prevention license.
You can no longer:
  • Get new DNS signatures.

nope… and…

Advanced URL Filtering / URL Filtering
You can still:
  • Enforce policy using custom URL categories.
You can no longer:
  • Get updates to cached PAN-DB categories.
  • Connect to the PAN-DB URL filtering database.
  • Get PAN-DB URL categories.
  • Analyze URL requests in real-time using advanced URL filtering.
WildFire
You can still:
  • Forward PEs for analysis.
  • Get signature updates every 24-48 hours if you have an active Threat Prevention subscription.
You can no longer:
  • Get five-minute updates through the WildFire public and private clouds.
  • Forward advanced file types such as APKs, Flash files, PDFs, Microsoft Office files, Java Applets, Java files (.jar and .class), and HTTP/HTTPS email links contained in SMTP and POP3 email messages.
AutoFocus
You can still:
  • Use an external dynamic list with AutoFocus data for a grace period of three months.
You can no longer:
  • Access the AutoFocus portal.
Cortex Data Lake
You can still:
  • Store log data for a 30-day grace period, after which it is deleted.
  • Forward logs to Cortex Data Lake until the end of the 30-day grace period.
GlobalProtect
You can still:
  • Use the app for endpoints running Windows and macOS.
  • Configure single or multiple internal/external gateways.
You can no longer:
  • Access the Linux OS app and mobile app for iOS, Android, Chrome OS, and Windows 10 UWP.
  • Use IPv6 for external gateways.
  • Run HIP checks.
  • Enforce split tunneling based on destination domain, client process, and video streaming application.

All a bunch of nope…

VM-Series
Support
You can no longer:
  • Receive software updates.
  • Download VM images.
  • Benefit from technical support.

This is a VM series yes… so what does that link mean….

VM-Series
You can still:
You can continue to configure and use the firewall you deployed prior to the license expiring with no change in session capacity. The firewall won’t reboot automatically and cause a disruption in traffic.
However, if the firewall reboots for any reason, the firewall enters an unlicensed state. While unlicensed, a firewall supports a maximum of 1,200 sessions. No other management plane features or configuration options are restricted.

OK… Maybe… I did reboot the unit (cloned it, powered off the OG, powered on the clone, etc.), but I’m sure downloading a single file doesn’t take over 1,200 sessions…

All other things are the same as posted above… Then I noticed some really weird things….

  1. Checking for updates doesn’t state anything about license status, just tries and quietly fails.
  2. Checking support status shows “Device not found on this update server”
  3. Dynamic Updates do not show a “currently installed” version. Normally you would see:
    1. The currently installed version, with Review Policies and Review Apps under Action.
    2. The previously installed version, with the same plus a Revert action.
    3. A downloaded version, with an Install action.
    4. All others seen since the last communication with PAN, with a Download action.
  4. Retrieving licenses from the license server returns “Failed to install features. The device is not found.”
  5. Finally the smoking gun… Serial Number on the Dashboard will be listed as unknown.

So, I ended up Googling this and found not one, but TWO KBs!!!

Serial number becomes “unknown” after changing the instance typ… – Knowledge Base – Palo Alto Networks

and

Serial number becomes “unknown” upon rebooting PA-VM – Knowledge Base – Palo Alto Networks

After reading these, it all made sense… and it’s all rather dumb… to paraphrase it simply….

It’s due to DRM. The DRM derives the serial number from two IDs, the CPUID and the UUID… and when you migrate a PAN VM the CPU is different, cause of the different host it now resides on… which in turn breaks the licensing.
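To make the failure mode concrete, here is a minimal Python sketch. The real derivation isn't something the KBs spell out, so `derive_serial`, the hash, and the example CPUID/UUID values are all made up for illustration; the only thing taken from the KBs is that the serial depends on both IDs.

```python
import hashlib

def derive_serial(cpuid: str, uuid: str) -> str:
    """Hypothetical stand-in: mix both IDs into one short identifier."""
    digest = hashlib.sha256(f"{cpuid}|{uuid}".encode()).hexdigest()
    return digest[:15].upper()

# Made-up values: same VM UUID, two different host CPUs.
vm_uuid = "420f3c1e-0000-4000-8000-000000000001"
serial_on_old_host = derive_serial("0x000306F2", vm_uuid)
serial_on_new_host = derive_serial("0x000906EA", vm_uuid)

# Same VM, different host CPU: the derived value no longer matches.
print(serial_on_old_host == serial_on_new_host)
```

Same UUID, different CPUID, different result, so whatever was licensed against the old value no longer lines up.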

*Standing Ovation*

What’s PAN’s solution… Open a support ticket… that’s right.. instead of coming up with a technical solution to make the DRM work while still retaining the ability to migrate the VM (the most important and valuable reason why you want to run it as a VM anyway)….

Instead of having a way to edit the CPUID and UUID in the PAN portal to fix this yourself…..

No, they want you to waste their tech support personnel’s time….

This ….. IS……. DUMB!!!!!

Rebuild FreeNAS/TrueNAS

So, the boot drive died in my FreeNAS server that was running 11.1u1. Good build. I figured since it died now would be a time to try TrueNAS Core. My SAN didn’t seem to be booting the installer. I tried different USB sticks and eventually just ended up using my IODD device, but none of them would show the installer (albeit over an xterm/serial connection, since the SAN is headless).

I decided to run the installer on a laptop and install onto …a….USB stick… wait a second….. When FreeNAS was first failing I tried moving the system log files off of there, since USB drives aren’t really meant for high R/W operations. That location is set under System -> System Dataset. What I think happened is that the drive probably could’ve lasted a while longer, but the log files were still configured for the default “boot-pool” (aka the USB boot stick), I had configured iSCSI with Proxmox, and only at the end did I discover the log flooding problem… I bet it was this log flooding, along with the writing and log rotating on the USB boot device, that killed it…. Just wow man….

Anyway, I installed onto a USB stick on a laptop, picking the USB drive as a source and picking the BIOS boot option. I plugged the device into my SAN and booted up. Watching the boot via the serial console window (after verifying the boot order was correct), the console got stuck after the system info at “Press any key to continue”. The light on the USB drive indicated that something was happening, and the NIC lights suggested something was going on too, but they also seemed a bit pattern-like and I wasn’t sure if anything was actually working. I decided to quickly check my DHCP server for any new entries and sure enough there it was….

I navigated to the address on my browser and sure enough, it booted!

There it is, the dataset we need to change. I don’t have any other pools defined yet so I can’t change it just yet, but let’s make this our first priority so that if another bug from Proxmox creeps in it won’t break my SAN. To do that I’m going to need to import the ZFS pools I had on this SAN when FreeNAS died.

Storage -> Pools -> Add

What a clean look. Import, No encryption. Looking for pools, this took a while.

There’s my pools! Sweet:

Next… exciting stuff…

Click Import.

Waiting….. waiting…. Let’s go!

What’s this notification…

I’ll leave that for now… can I change my system dataset? Oh no way it did it automatically….

Well… even though I managed to import the pools successfully, I can’t remember how the extents were made. I think they were file based, but no matter whether I make them file based or device based, the ESXi hosts see the drive but don’t automatically import the datastore, probably cause they can’t see the FS that’s supposed to be on the iSCSI disk being presented (the extents). I should have simply kept them device based and used the root pool. To add icing to the cake, the one server I wanted to recover was the only one I didn’t run the backup copy job on after the USB drive that was hosting those files went belly up and I had to rerun all the jobs, AND the Veeam server’s main backup repo was on one of the extents I can’t recover… so even though I had the 3-2-1 rule in place I still ended up losing this server….

this is too much for me today, I’m goin to bed…

A new day, but end of it, so won’t finish this yet today either but there is hope.

I found this, which led to this, which led to this, and there’s a sign of hope: I’m seeing the same thing in my own vmkernel log file.

So there it is. I ran the commands as specified in the threads and KB:

tail /var/log/vmkernel.log

Which showed me the volume not mounting due to snapshot.

esxcli storage vmfs snapshot list

Which showed me my old Datastore information.

esxcli storage vmfs snapshot resignature -l "iOSlow"

Which mounted my datastore under a new name:

Sure enough, I connected the backup drive to my Veeam server and restored my missing VM. Now I just need to move the rest of the data and create a clean datastore. Or I guess I could simply rename it, but it didn’t auto mount on the other ESXi server.

Remove Orphaned Datastore in vCenter Again

Story

I did this once before, but that time was due to rebuilding an ESXi host and not removing the old datastore. This time however it’s due to the storage server failing.

The Precursor

I did NOT want to write this blog post. This post comes from the fact that VMware is not perfect and I’m here to air some dirty laundry…. Let’s get started.

What Happened?

The short answer is I don’t fully know. All I know is that the backend storage server (FreeNAS 11.1u7) running iSCSI started showing weird signs of problems (Reporting Graphs not rendering). Since I wanted to possibly do some Frankenstein surgery on the unit (iOmega px12-350r), I started to vMotion the primary VMs I needed onto local ESXi storage.

Even though I checked the logs, I can’t determine what is causing all the services to not start.  Trying to manually start it, just showed gibberish in the system log.

The Problem

Since I couldn’t get it back up they show as inaccessible in vCenter:

Attempting to unmount them results in an error:

Not sure what that means. I even put the host in maintenance mode and it gives the same error. Attempting to remove the iSCSI configuration that hosts those datastores also errors out with:

Strange, how can there be active sessions when it’s literally dead?

I tried following my old blog post on a similar case, but I was only able to successfully unmount the datastore via esxcli but the Web GUI would still show them…

esxcli storage filesystem list
esxcli storage filesystem unmount -u UID

Any attempt to set them as offline failed, as their status was already dead anyway…

As you can see no diff:

Solutions?

I attempted to look up solutions, I found one post of a similar nature here:

How to remove unmounted/inaccessible datastore from ESXi Host (tomaskalabis.com)

When I attempted to run the command,

esxcli storage core device detached remove -d naa.ID

it sadly failed for me:

I was at a dead end… I could see the dead devices with no files or I/O bound to them, but I couldn’t seem to remove them.. they show as detached…

esxcli storage core device detached list

As a last ditch effort I rescanned one last time and then ran the command to check for devices.

esxcli storage core adapter rescan --all
esxcli storage core device list

Checking the Web GUI I could see the datastores were gone but the iSCSI config was still there, and attempting to remove it would result in the same error as above. Then I realized there were still static records defined. Once I deleted them, everything was finally clean on the host.

Do It Again!

Since this seems to be a per-host thing, let’s see if we can fix it without maintenance mode or moving VMs. Test host… broken datastores, check:

Turns out it’s even easier… just remove the static iSCSI targets, remove the dynamic target, then rescan storage and adapters:
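The same order of operations can be sketched as CLI calls. This is a rough Python sketch that only builds the command lines and runs nothing; the adapter name `vmhba64`, the portal address and the IQN are placeholders, and the `esxcli iscsi adapter discovery` flag spellings are from memory, so treat them as assumptions and check them with `--help` on your build first.

```python
def iscsi_cleanup_commands(adapter, portal, static_targets):
    """Build the esxcli calls in the order that worked: static, dynamic, rescan."""
    commands = []
    # 1. Remove every static target still pointing at the dead storage server.
    for iqn in static_targets:
        commands.append([
            "esxcli", "iscsi", "adapter", "discovery", "statictarget", "remove",
            f"--adapter={adapter}", f"--address={portal}", f"--name={iqn}",
        ])
    # 2. Remove the dynamic (send targets) discovery entry.
    commands.append([
        "esxcli", "iscsi", "adapter", "discovery", "sendtarget", "remove",
        f"--adapter={adapter}", f"--address={portal}",
    ])
    # 3. Rescan all adapters (this one is the same command used earlier in the post).
    commands.append(["esxcli", "storage", "core", "adapter", "rescan", "--all"])
    return commands

# Placeholder values, not taken from my environment.
for cmd in iscsi_cleanup_commands("vmhba64", "192.0.2.10:3260",
                                  ["iqn.2005-10.org.example.ctl:target0"]):
    print(" ".join(cmd))
```

Printing the commands first instead of executing them is deliberate; on a host with live VMs I'd rather read them twice before pasting.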

I guess sometimes you just overthink things and get led down rabbit holes when a simple solution already exists. I followed these simple steps on the final host and oddly one datastore lingered:

Well let’s enable SSH and see what’s going on here…

esxcli storage filesystem list
esxcli storage filesystem unmount -u 643e34da-56b15cb2-0373-288023d8f36f

esxcli storage core device list
esxcli storage core device set -d naa.6589cfc0000005e95e5e4104f101a307 --state=off

“Unable to set device’s status. Error was: Unable to change device state, the device is marked as ‘busy’ by the VMkernel.: Busy”

Mhmmm, different than last time, which might explain why it wasn’t auto removed.

esxcli storage core device world list -d naa.6589cfc0000005e95e5e4104f101a307

It’s hostd-worker, and when I run the command to list VM processes it doesn’t show up, which makes me think it’s the old scratch/core dump location…

I’m not sure what restarting hostd does, so I’ll move critical VMs off just to be safe and then test restarting that service to see if it releases its stranglehold…

/etc/init.d/hostd restart

After this the host did show as disconnected from vCenter for a short while, then came back, and the old datastore was gone.

Although the datastore was gone.. the disk remained, and I couldn’t get rid of it.

I don’t get it… do I have to reboot this host….

ughh reboot worked… what a pain though.

If you want to know what datastore/UUID is linked to what disk run

esxcli storage vmfs extent list

Now for G9-SSD2. I tried to remove it since it showed signs of being on the way out, and I couldn’t… seems like an ongoing story here. I could only unmount it from the CLI.

Weird. I deleted G9-SSD3 normally, then I detached the disk containing G9-SSD2. Then when I recreated G9-SSD3, G9-SSD2 just disappeared. The drive still shows as unconsumed and detached.

Now I have to go rebuild my shared storage server…

Getting A’s for my Site

Secure HTTP(S)

Yes, securing the secure…

First up SSL(The Certificates serving this website):  SSL Server Test

Old Score: B

Since I use the HAProxy plugin for OPNsense, the path is: OPNsense HTTP Admin Page -> Services (left hand nav) -> HA Proxy -> Settings -> Global Parameters -> SSL Default Settings (enable) -> Min Version TLS1.2 -> Cipher List

Old list:  ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256

Remove the last entry: ECDHE-RSA-AES128-SHA256.

Apply. New Score now gives A. Yay.
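If you'd rather not hand-edit that long string, the change can be sketched in a few lines of Python. The cipher list is copied from above; `drop_cipher` is just a helper name made up for this sketch, not anything HAProxy or OPNsense ships.

```python
# Cipher list as it was before the change (copied from the post).
OLD_CIPHERS = (
    "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:"
    "ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:"
    "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:"
    "ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:"
    "ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256"
)

def drop_cipher(cipher_list, name):
    """Return the colon-separated cipher list without the named entry."""
    return ":".join(c for c in cipher_list.split(":") if c != name)

NEW_CIPHERS = drop_cipher(OLD_CIPHERS, "ECDHE-RSA-AES128-SHA256")
print(NEW_CIPHERS)  # paste this into the Cipher List field
```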

What about your short(Security) game?
What are you talking about… oh… HTTP headers… oh man here we goo…

HTTP Header Security

Site to test: https://securityheaders.com/

results:

Phhh get real… an F… c’mon, so this took me WAY longer than I’d like to admit and I went down several rabbit holes before finally coming to the answer. We’ll cover one at a time:

Referrer Policy

What is it?

Well we got two sources… W3C and Mozilla which is a bit more readable.. in short:

“The Referrer-Policy HTTP header controls how much referrer information (sent with the Referer header) should be included with requests. Aside from the HTTP header, you can set this policy in HTML.”

Bunch of tracking rubbish it seems like.

What types can be configured?

Referrer-Policy: no-referrer
Referrer-Policy: no-referrer-when-downgrade
Referrer-Policy: origin
Referrer-Policy: origin-when-cross-origin
Referrer-Policy: same-origin
Referrer-Policy: strict-origin
Referrer-Policy: strict-origin-when-cross-origin
Referrer-Policy: unsafe-url

Which is the safest?

(As in most browser compatible.) From checking the Mozilla site it seems like “strict-origin-when-cross-origin”, but this type seems to give you an F grade on security. I’m assuming “no-referrer” is the most secure (our goal).

What’s the impact?

Site working on different web browsers, old browsers may stop working.

“When a user leaves your website from a link that points elsewhere, it may be useful for the destination server to know where the user came from (your website). It might also be more appropriate that you don’t tell them any information about your website. The referrer header that is sent is typically a string that includes the URL of the page that the user clicked the link to the destination. There are multiple ways to configure if and what information is sent, but things to keep in mind are referrers may be necessary to properly configure web advertisements, analytics, and some authentication platforms. You can also ensure that an HTTPS URL is not leaked into HTTP headers (and consequently leaking website path information unencrypted across the internet).”

How do you configure it?

This is a bit of a loaded question, which took me a while to figure out. If you are behind a load balancer, do you configure it on the load balancer side, or on the backend server that actually serves the web content? (Turns out you configure it on the backend server.)

In my case HA Proxy is the service that serves the website externally, while the backend server that hosts the web content is actually Apache (subject to change), but at the time of this writing that’s the hosting back end. Now it was actually a TurnKey Linux appliance, but maintained manually (OS patches and updates).

Now, I found many guides online stating how to apply it, but some failed to mention all the pre-reqs. (Whether it’s apache2 or httpd depends on the Linux distro; in our case it’s apache2.) What is that pre-req, you might be asking… well, it’s the headers module. You can verify it’s available by checking for: /etc/apache2/mods-available/headers.load

To enable it:

a2enmod headers

Failure to do so will cause the service to not start when calling the service restart command. If you decided to use the .htaccess method instead, the service will restart successfully, but when you try to navigate to the website you’ll be greeted with an internal server error page.

After that I finally added this to my Apache config file (which is another loaded statement, since which file that is gets referenced as so many different things on the internet; in my case it was /etc/apache2/sites-enabled/wordpress.conf):

Header always set Referrer-Policy: "no-referrer"

I also added the no downgrade as a root option, but the no referrer was set under both VirtualHosts (even though the snippet below shows just port 80 host being configured).

Even though, for some reason, I can’t explain, the root will never obey the policy change defined:

Just ignore it… we got it completed on the scan 🙂 (Double checking: if you ignore “General” and look down at “Response Headers” you can actually see it take effect there.)

Finally one down.. and a D… a solid D if you know what I mean…

Content Security Policy

What is it?

Content-Security-Policy is a security header that can (and should) be included on communication from your website’s server to a client. When a user goes to your website, headers are used for the client and server to exchange information about the browsing session. This is typically all done in the background unbeknownst to the user. Some of those headers can change the user experience, and some, such as the Content-Security-Policy affect how the web-browser will handle loading certain resources (like CSS files, javascript, images, etc) on the web page.

Content-Security-Policy tells the web-browser what resource locations are trusted by the web-server and is okay to load. If a resource from an untrusted location is added to the webpage by a MiTM or in dynamic code, the browser will know that the resource isn’t trusted and will fail to process that resource.

Pretty much, you know what your site uses for external dependencies and you strictly allow only what you know you should be serving. If someone tampers with your pages and tries to do drive-by downloads from an alternative domain, this should block it.

For more details: Content-Security-Policy (CSP) Header Quick Reference

 How to enable?

Again, subjective but in our case just added to our apache/httpd conf file:

Header always set Content-Security-Policy: "default-src 'self' zewwy.ca"

What’s the Impact?

This one is actually a bigger PITA than I originally thought, keep reading to see.

The only thing I noticed failing was calls to stripe.com. Who are they… mhmmm… the official source states:

“Stripe.js (and its iOS and Android SDK counterparts) is a JavaScript library that businesses use to integrate Stripe and accept online payments. Once Stripe.js is added to a site or mobile app, fraud signals are used to differentiate legitimate behavior from fraudulent behavior.”

In preparation for this I did see a call out to Facebook, and I know I use a plugin “Ultimate Social Media PLUS” that manages the social links and buttons on my site (I guess those might have to be added, not sure how hyperlinks go in this regard). Anyway, here’s the snip, and I simply disabled the button for that social link and it was gone from my home page. Here’s a snip of what it looked like:

K I was about to get into another rabbit hole and test and validate an assumption, as the only plugin I have that would connect to something like that is my donations button/plugin. However while attempting to manage it I kept getting a pop up about disabling a plugin on my WordPress admin page:

(Funny cause as I was testing this externally, I also forgot about my image hosting provider imgur so my pictures weren’t rendering. Will have to add them too.)

I temp reverted the config as it was causing havoc on my website.

Now let’s try and deal with all the things…:

  1. Stripe… so after reading this guy’s blog post, it seems to be true. Pretty invasive for something I’m not using. I’m using a PayPal donate button but not Stripe. There’s an option in the plugin I’m using, but turning it off still shows the js call being made on my homepage with nothing on it. Only by deactivating the plugin entirely does the call go away. So be it I guess, even though I just fixed my donations button…
    Meh, took this button link and saved it on my homepage for now.
  2. When navigating to my blog directory, I noticed one network connection I was unaware of “s.w.org”

Not knowing what this was, I started looking in the HTML for where it might be:

What is that? 9 years ago, and this is still the case?!?

Following a more modern guide, I simply added the following to my theme functions.php file:

/**
 * Disable the emoji's
 */
function disable_emojis() {
    remove_action( 'wp_head', 'print_emoji_detection_script', 7 );
    remove_action( 'admin_print_scripts', 'print_emoji_detection_script' );
    remove_action( 'wp_print_styles', 'print_emoji_styles' );
    remove_action( 'admin_print_styles', 'print_emoji_styles' );
    remove_filter( 'the_content_feed', 'wp_staticize_emoji' );
    remove_filter( 'comment_text_rss', 'wp_staticize_emoji' );
    remove_filter( 'wp_mail', 'wp_staticize_emoji_for_email' );
    add_filter( 'tiny_mce_plugins', 'disable_emojis_tinymce' );
    add_filter( 'wp_resource_hints', 'disable_emojis_remove_dns_prefetch', 10, 2 );
}
add_action( 'init', 'disable_emojis' );

/**
 * Filter function used to remove the tinymce emoji plugin.
 *
 * @param array $plugins
 * @return array Difference between the two arrays
 */
function disable_emojis_tinymce( $plugins ) {
    if ( is_array( $plugins ) ) {
        return array_diff( $plugins, array( 'wpemoji' ) );
    } else {
        return array();
    }
}

/**
 * Remove emoji CDN hostname from DNS prefetching hints.
 *
 * @param array $urls URLs to print for resource hints.
 * @param string $relation_type The relation type the URLs are printed for.
 * @return array Difference between the two arrays.
 */
function disable_emojis_remove_dns_prefetch( $urls, $relation_type ) {
    if ( 'dns-prefetch' == $relation_type ) {
        /** This filter is documented in wp-includes/formatting.php */
        $emoji_svg_url = apply_filters( 'emoji_svg_url', 'https://s.w.org/images/core/emoji/2/svg/' );

        $urls = array_diff( $urls, array( $emoji_svg_url ) );
    }

    return $urls;
}

Restarted Apache and all was good.

3. Imgur, legit, all the pictures on my site, let’s add it to the policy.

Adding just imgur.com didn’t work, but since I see them all coming from i.imgur.com, I added that and it seems to be working now. What I can’t understand is how this policy is making my icons from this one plugin change size…:


Normal


With Policy enabled. Besides that after all that freaking work do I get a reward?!

Yes, but I had to turn it off again cause the plugin deactivation problem.

I’ve seen that unsafe-inline in a reference somewhere… but I didn’t want to use it as then it felt it made the content security policy useless? This thread implies the same thing

What I don’t know is what plugin is having an issue. I did notice I forgot to include gravatar for user icons, I don’t think that would be the one though.

I fixed the images, by defining a separate images part of the policy. Then to resolve the icon size and the plugin pop up alert I also added a style-src. So now it looks like this:

Header always set Content-Security-Policy: "default-src 'self' zewwy.ca;img-src 'self' *.imgur.com secure.gravatar.com; script-src 'self';style-src 'self' 'unsafe-inline'"
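Since this header is getting long enough to fat-finger, here is a small Python sketch that assembles it from a dict, plus a deliberately simplified host check that shows why plain imgur.com never covered i.imgur.com. Both helper names are made up for this sketch, and the matcher ignores schemes, ports and paths, so it is nowhere near a full CSP implementation.

```python
def build_csp(directives):
    """Join {directive: [sources]} into a single CSP header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )

def host_source_matches(source, host):
    """Simplified host matching: exact host, or '*.' meaning subdomains only."""
    source, host = source.lower(), host.lower()
    if source.startswith("*."):
        # '*.imgur.com' covers i.imgur.com but not the bare imgur.com.
        return host.endswith(source[1:])
    return host == source

policy = build_csp({
    "default-src": ["'self'", "zewwy.ca"],
    "img-src": ["'self'", "*.imgur.com", "secure.gravatar.com"],
    "script-src": ["'self'"],
    "style-src": ["'self'", "'unsafe-inline'"],
})
print(policy)

print(host_source_matches("imgur.com", "i.imgur.com"))    # the entry that didn't work
print(host_source_matches("*.imgur.com", "i.imgur.com"))  # the entry that did
```

Keeping the sources in a dict also makes it easier to add the next host a plugin turns out to need without breaking the quoting.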

It still breaks my Classic WordPress editor though, and the charts on the dashboard don’t work, but I guess I can enable it when I’m not working on managing my server or writing these blog posts, as a temp work around until I can figure out how to properly define the CSP.

Strict Transport Security

What is it?

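A PITA is what it is. Have you ever SSH’d into a server, and then had the server’s keys change? Then when you go to SSH it fails cause the fingerprint of the server changed, so you have to go and delete the old fingerprint in your .ssh path. This is in that spirit, but for web browsers/web sites, with one difference: the browser doesn’t remember the certificate, it remembers that the site must only be reached over HTTPS.

I guess a decade back you were vulnerable to MitM (someone downgrading you to plain HTTP), but with browsers defaulting to trying https first this is not so much the case anymore. Possibly you still are, but from my understanding HSTS only works if:

  1. You connect to the legit server over HTTPS the first time (that’s when the browser sees the header).
  2. The browser remembers the policy for max-age seconds, and for that long it refuses plain HTTP and won’t let you click through certificate errors.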

How to enable it?

Pretty much like my first example. But alright let’s configure it anyway.

Header always set Strict-Transport-Security: "max-age=31536000; includeSubDomains"
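That max-age number is in seconds, which is not obvious at a glance. A quick Python sketch to pull the header value apart; `parse_hsts` is a name made up for this sketch and it only understands the two directives used above.

```python
def parse_hsts(value):
    """Parse a Strict-Transport-Security value into max-age and includeSubDomains."""
    max_age = None
    include_subdomains = False
    for part in value.split(";"):
        token = part.strip()
        if not token:
            continue
        name, _, arg = token.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            max_age = int(arg.strip().strip('"'))
        elif name == "includesubdomains":
            include_subdomains = True
    return {"max_age": max_age, "include_subdomains": include_subdomains}

policy = parse_hsts("max-age=31536000; includeSubDomains")
print(policy)
print(policy["max_age"] / 86400, "days")  # 86400 seconds in a day
```

So 31536000 works out to 365 days that a browser will hold on to the policy after the last time it saw the header.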

Rescan… got it:

Still a D cause the content policy was turned off… we’ll get there.

What is the impact?

Someone has to access the site for the first time, at which point the browser saves the HSTS policy (not a copy of the certificate) for the max-age period. After that, if someone plays man in the middle and presents a certificate the browser doesn’t trust, the user will get an error message and the website simply will not load, and there won’t be a button to click through and load the page anyway.

This can also bite you on a legit site if its certificate becomes invalid, for example because the old one expired and wasn’t renewed in time. Simply switching to a new valid certificate (a renewal, or a different certificate provider) is fine, since nothing is pinned.

In that case the real fix is a valid certificate on the server. Clearing the browser’s stored HSTS entry for the site (or an incognito window that hasn’t seen the header yet, if your browser keeps that state separate) only gets you back to the normal warning page.

Permission Policy

What is it?

According to Mozilla:

“Permissions Policy provides mechanisms for web developers to explicitly declare what functionality can and cannot be used on a website. You define a set of “policies” that restrict what APIs the site’s code can access or modify the browser’s default behavior for certain features. This allows you to enforce best practices, even as the codebase evolves — as well as more safely compose third-party content.”

What’s the Impact?

I don’t know. If your apps have geolocation or require access to camera or microphone it might affect that.

I just want a checkbox while having my site still work… so…

How to enable?

Well I saw nothing particular about iframes, so let’s just block geolocation and see what happens?

Header always set Permissions-Policy: "geolocation=()"
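If you end up blocking more than one feature, the value is a comma-separated list of `feature=(allowlist)` entries. A minimal Python sketch of building it; `build_permissions_policy` is a made-up helper, and anything beyond geolocation in the example is there for illustration only, not what I actually set.

```python
def build_permissions_policy(features):
    """Turn {feature: [allowlist entries]} into a Permissions-Policy value.

    An empty allowlist renders as feature=() which disables the feature.
    Entries are written as given, e.g. self, or a quoted origin string.
    """
    return ", ".join(
        f"{feature}=({' '.join(allowlist)})" for feature, allowlist in features.items()
    )

# What the post actually uses:
print(build_permissions_policy({"geolocation": []}))

# Illustration only: also restrict camera to the site itself.
print(build_permissions_policy({"geolocation": [], "camera": ["self"]}))
```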

alright… C baby!

X Frame Options

What is it?

The X-Frame-Options header (RFC), or XFO header, protects your visitors against clickjacking attacks. An attacker can load up an iframe on their site and set your site as the source, it’s quite easy:

 <iframe src="https://zewwy.ca"></iframe>.

Using some crafty CSS they can hide your site in the background and create some genuine looking overlays. When your visitors click on what they think is a harmless link, they’re actually clicking on links on your website in the background. That might not seem so bad until we realize that the browser will execute those requests in the context of the user, which could include them being logged in and authenticated to your site! Troy Hunt has a great blog on Clickjack attack – the hidden threat right in front of you. Valid values include DENY meaning your site can’t be framed, SAMEORIGIN which allows you to frame your own site or ALLOW-FROM https://example.com/ which lets you specify sites that are permitted to frame your own site.

I get it, using HTML trickery to hide the actual link behind something else, in Troy’s case the assumption is made that people don’t log out of their banking website, and have active session cookies. Blah blah blah…

How do you enable it?

For all options go to Hardening your HTTP response headers (scotthelme.co.uk)

Since I’m using Apache:

Header always set X-Frame-Options "DENY"

What’s the Impact?

Since I don’t have user logins, or self reference my site using frames, and I have no plans to have any collaboration in which I would allow someone to frame my site, DENY is perfectly fine and there’s been zero impact on my site, other than my now B Grade. 😀

X-Content-Type-Options

What is it?

Nice and easy to configure, this header only has one valid value, nosniff. It prevents Google Chrome and Internet Explorer from trying to mime-sniff the content-type of a response away from the one being declared by the server. It reduces exposure to drive-by downloads and the risks of user uploaded content that, with clever naming, could be treated as a different content-type, like an executable.

*Smiles n nods*  Yup, mhmmm, whatever you say Scotty.

How do you enable it?

Again I’m using Apache so:

Header always set X-Content-Type-Options "nosniff"

What’s the Impact?

Nothing I can tell so far. Even with the heaviest hitter disabled (due to its pain in the ass impact), which I still have yet to tune properly… I finally got that dang A… Wooooooo!
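Instead of rescanning on the website after every Apache tweak, the six headers covered in this post can be checked locally. A rough Python sketch: `fetch_headers` and `missing_security_headers` are names made up for this sketch, and the list is simply the headers walked through above, not necessarily the scanner's exact grading rules.

```python
import urllib.request

# The six headers covered in this post, in a fixed order.
EXPECTED_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Permissions-Policy",
]

def missing_security_headers(response_headers):
    """Return the expected headers absent from a {name: value} mapping."""
    present = {name.lower() for name in response_headers}  # names are case-insensitive
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]

def fetch_headers(url):
    """Fetch a URL and return its response headers as a dict (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return dict(response.headers.items())

# Offline example with a hand-written header set:
print(missing_security_headers({"X-Frame-Options": "DENY",
                                "X-Content-Type-Options": "nosniff"}))
```

Against a live site it would be `missing_security_headers(fetch_headers("https://zewwy.ca"))`, which I left out of the snippet so it runs without a network.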

Summary

I’ll enable the CSP header when I’m not working on my site, aka writing these blog posts. Then hopefully I can tune it so I can leave it on all the time. For now I’ll re-enable it once this post is done and I’ll check my score.

*Update* OK, I temp disabled the Content Security Policy cause it was breaking my floating Table of Contents, and while I do love keeping a site as simple as humanly possible I do like having some cool features. However I got this nice snip of an A+ before I turned it back off 😉

Hope all this helps someone.

VMware Patches May 2024

Yup this shit never ends:

VMSA-2024-0011:VMware ESXi, Workstation, Fusion and vCenter Server updates address multiple security vulnerabilities

Patching vCenter

Log in to VAMI, let’s see what I’m on:

Here’s the fix Matrix:

Can you tell if I’m good? No, cause the Matrix uses a different version coding (7.0 u3q) vs the version shown in VAMI (7.0.3.01700). You can either look it up by googling the version (which I did, and it’s 7.0 u3o), or click the link in the KB and check the build number.
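Once you have the translation, the comparison itself is simple enough to sketch in Python. The lookup table holds only the single pairing I found (7.0.3.01700 is 7.0 u3o); the helper names are made up, and the sketch assumes update letters within the same update level sort alphabetically.

```python
import re

# Only the one pairing mentioned in this post; anything else has to be looked up.
VAMI_TO_RELEASE = {"7.0.3.01700": "7.0 U3o"}

def parse_release(name):
    """Split a name like '7.0 U3q' into (7, 0, 3, 'q') so tuples compare in order."""
    match = re.fullmatch(r"(\d+)\.(\d+)\s*U(\d+)([a-z]?)", name.strip(),
                         flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised release name: {name!r}")
    major, minor, update, letter = match.groups()
    return (int(major), int(minor), int(update), letter.lower())

def needs_patch(vami_version, fixed_release):
    """True if the VAMI version maps to a release older than the fixed one."""
    current = VAMI_TO_RELEASE[vami_version]  # KeyError if the build isn't in the table
    return parse_release(current) < parse_release(fixed_release)

print(needs_patch("7.0.3.01700", "7.0 u3q"))  # True: u3o is older than u3q
```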

VMware: constructive criticism.. make the Matrix have the same versioning syntax as VAMI so it’s easy to know, and verify.

Anyway, in VAMI click update. there it is….

Accept the EULA, Pass pre-update checks, Installing…

It’s chugging along…

At this point the regular vCenter web interface was unresponsive, and I had to use the host that was running the VCSA to get the CPU usage. However, as you can see VAMI appears to be up and showing status just fine.

45 Minutes later…

alright… 1% woo, woo, woo! Why does this seem oddly familiar…. mhmm anyway. After about an hour…

Re-log into VAMI.

Looks good, going to the main mgmt page… mhmm shows 404, but by the time I wanted to get a snip, it refreshed to show the FBA page, so I logged in like normal.

Yay it worked.

Patching ESXi

In vCenter, go to the host, pick updates, then baseline, and check compliance.

On the two baselines, select them and pick remediate.

Server went into maintenance mode, and after about 20 min it was done (I think it rebooted; I didn’t have an active ping on it, so I’m not sure. I’ll check on the next one).

My PA-ESXi is a special beast, it for some reason needs a helping hand during boot, so we’ll know if it reboots this time…

yup… it rebooted.

Fun times had by all.

Configuring shared LVM over iSCSI on Proxmox

So, I’ve been recently playing with Proxmox for virtualization. It’s pretty nice, but in my cluster (which consisted of two old laptops) whenever I would migrate VMs or containers it would have to migrate the storage over the network as well. Since they are just old laptops, everything connects together at 1 Gbps to switches with the same rated ports.

I’m used to iSCSI so I checked the Proxmox storage guidance to see what I could use.

I was interested in ZFS over iSCSI. However, I temporarily gave up on this cause for some reason… you have to allow root access to the FreeNAS box over SSH, on the same network that the iSCSI is for….

“First of all we need to setup SSH keys to the freenas box, the SSH connection needs to be on the same subnet as the iSCSI Portal, so if you are like me and have a separate VLAN and subnet for iSCSI the SSH connection needs to be established to the iSCSI Portal IP and not to the LAN/Management IP on the FreeNAS box.
The SSH connection is only used to list the ZFS pools”

Also mentioned in this guide.

This was further verified when I attempted to set up ZFS on an iSCSI disk and got this error message:

Since I didn’t want to configure my NAS to have root access over SSH on the iSCSI network, I was still curious what the point of iSCSI was for PVE if you can’t use a shared drive… Reviewing the chart above, and this comment: “i guess the best way to do it, is to create a iscsi storage via the gui and then an lvm storage also via the gui (if you want to use lvm to manage the disks) or directly use the luns (they have to be managed on the storage server side)”

I ended up using LVM on the disk “3: It is possible to use LVM on top of an iSCSI or FC-based storage. That way you get a shared LVM storage”

However, using this model you can’t use snapshots. 🙁
You can use LVM-Thin but that’s not shared.

Step 1) Setup Storage Server

In my case I’m using a FreeNAS server, with spare drive ports, so for this test I took a 2TB drive (3.5″), plugged it in and wiped it from the web UI.

After this I configured a new extent as a raw device share.

I created the associated targets and portals. Once this was done, my ESXi hosts discovered the disk too (since I had dynamic discovery enabled on them). I left them be; it’s probably best to have separate networks… but I’ll admit, I was lazy.

Step 2) Configure PVE hosts

In my case I had to add the iSCSI network (VLAN tagged) to my hosts. This is easy enough: Host -> System -> Network -> Create Linux VLAN

OK, so where in ESXi you simply add an iSCSI adapter, in PVE you have to install it first? Sure, OK, let’s do that… Turns out it was already installed.
After reading that and seeing what my ESXi did, I edited my /etc/pve/storage.cfg and added:

iscsi: freenas
        portal 172.16.69.2
        target iqn.2005-10.org.freenass.ctl:proxhdd
        content none

To my surprise… it showed as a storage unit on both my PVE hosts. :O

Mhmm, doing a df -h I don’t see anything… but doing an fdisk -l, sure enough, I see the drive. So cool 🙂
So now that I got both hosts to see the same disk, I guess it simply comes down to creating a file system on the raw disk.
Or not… when I try to create a ZFS using the WebUI it just says no disks are available.
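If you would rather confirm from the shell that a node really sees the target and the LUN behind it, something along these lines should work. This is only a sketch: the portal IP is the one from the storage.cfg entry, the check_iscsi function name is made up for illustration, and it assumes iscsiadm (from the open-iscsi package) and lsblk are present on the node.

```shell
#!/bin/sh
# Sketch: confirm a PVE node can see the iSCSI target and the LUN behind it.
# PORTAL is the portal IP from the storage.cfg entry; check_iscsi is a
# hypothetical helper name, not a Proxmox tool.
PORTAL=172.16.69.2

check_iscsi() {
    if ! command -v iscsiadm >/dev/null 2>&1; then
        echo "iscsiadm not found; install the open-iscsi package first"
        return 1
    fi
    # Ask the portal which targets it offers.
    iscsiadm -m discovery -t sendtargets -p "$1"
    # Show the iSCSI sessions this node currently has open.
    iscsiadm -m session
    # List SCSI disks; the FreeNAS LUN should appear in this list.
    lsblk -S
}

check_iscsi "$PORTAL" || echo "check did not complete on this machine"
```

Run it on each node; both should list the same target and the same disk.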

Step 3) Setup LVM

However, adding an LVM works:

After setting up LVM the data source should show up on all nodes in the cluster that have access to the disk. On one of my nodes it wasn’t showing as accessible until I rebooted the node that had no problems accessing it. ¯\_(ツ)_/¯
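For reference, a rough CLI equivalent of adding the LVM through the WebUI might look like the sketch below. The device path /dev/sdX, the volume group name vg_iscsi and the storage ID shared-lvm are placeholders, not values from this setup. The script only prints the commands unless APPLY is changed to 1, since pvcreate will happily eat whatever disk you point it at.

```shell
#!/bin/sh
# Sketch: create a volume group on the iSCSI LUN and register it in PVE as
# shared LVM storage. DISK, VG and STORAGE are placeholder values.
DISK=/dev/sdX        # the LUN as seen in fdisk -l (placeholder)
VG=vg_iscsi          # volume group name (placeholder)
STORAGE=shared-lvm   # PVE storage ID (placeholder)
APPLY=0              # 0 = only print the commands, 1 = actually run them

run() {
    if [ "$APPLY" = "1" ]; then "$@"; else echo "$@"; fi
}

setup_shared_lvm() {
    # Initialise the LUN as an LVM physical volume.
    run pvcreate "$DISK"
    # Create a volume group on top of it.
    run vgcreate "$VG" "$DISK"
    # Register the volume group cluster-wide and mark it as shared.
    run pvesm add lvm "$STORAGE" --vgname "$VG" --shared 1 --content images,rootdir
}

setup_shared_lvm
```

The pvcreate and vgcreate steps only need to happen once, on one node; the pvesm entry lands in /etc/pve/storage.cfg, which the whole cluster shares.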

So, there’s no option to pick storage when migrating a VM, you have to go into the VM’s hardware settings and “move the disk”.

When I went to do my first live VM migration, I got an error:

I soon realized this was my own mistake: I hadn’t selected “delete source”, so when “moving the disk” it converted the disk from qcow2 to raw but left the old qcow2 file behind. So I simply deleted it, then tried again…

and it worked! Now the only problem is no snapshots. I attempted to create an LVM-Thin on top of the LVM, and it did create it, but as noted in the chart both my hosts could not access it at the same time, so not shared.
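The disk move and the live migration can also be driven from the shell with qm. A minimal sketch, assuming a VM with ID 100, a disk called scsi0, a storage ID of shared-lvm and a second node named pve2 (all four are placeholders). As written it only prints the commands; set APPLY to 1 to run them.

```shell
#!/bin/sh
# Sketch: move a VM disk onto the shared LVM storage, then live migrate.
# VMID, DISK, STORAGE and TARGET are placeholder values.
VMID=100
DISK=scsi0
STORAGE=shared-lvm
TARGET=pve2
APPLY=0   # 0 = only print the commands, 1 = actually run them

run() {
    if [ "$APPLY" = "1" ]; then "$@"; else echo "$@"; fi
}

move_and_migrate() {
    # --delete 1 removes the source image after the copy, which is the
    # "delete source" checkbox in the WebUI.
    run qm move_disk "$VMID" "$DISK" "$STORAGE" --delete 1
    # With the disk on shared storage only the RAM has to cross the wire.
    run qm migrate "$VMID" "$TARGET" --online
}

move_and_migrate
```

Newer PVE releases also spell the first command as qm disk move.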

Guess I’ll have to see how Ceph works. That’ll be a post for another day. Cheers.

*Update* I’ll have to implement a filter on FreeNAS because Proxmox, I guess, won’t implement a fix that was given to them for free.

https://forum.proxmox.com/threads/iscsi-reconnecting-every-10-seconds-to-freenas-solution.21205/#post-163412

https://bugzilla.proxmox.com/show_bug.cgi?id=957

*UPDATE May 2025*

Ohhh, looks like they may have finally got off their butts and implemented a fix…

Fedrich: “As Victor notes, their patch is applied and available in libpve-storage-perl >= 8.3.4, which is part of Proxmox VE 8.4. Thanks for your contribution!

One thing I want to point out is that, even with this patch, the Proxmox VE node will still perform a connection check (via TCP ping) when there is no active session (yet) on some occasions, e.g., when first logging in or after boot. However, with this patch it will not do TCP pings against a portal if there is an active session to the portal, and this should get rid of the large majority of (recurring) TCP pings against portals.”

I have not personally had a chance to test or verify this, however.
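A quick way to see whether a given node already carries the patched package is to compare its libpve-storage-perl version against the 8.3.4 from the quote. This is a sketch only: version_ge is a made-up helper that leans on GNU sort -V, which is close to but not exactly Debian's version ordering. On the node itself, dpkg --compare-versions is the exact tool.

```shell
#!/bin/sh
# Sketch: is the installed libpve-storage-perl new enough to have the fix?
REQUIRED="8.3.4"   # version named in the quoted comment

# version_ge A B: succeeds when version A >= version B (uses GNU sort -V).
# Hypothetical helper for illustration.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v dpkg-query >/dev/null 2>&1; then
    installed="$(dpkg-query -W -f='${Version}' libpve-storage-perl 2>/dev/null || true)"
else
    installed=""
fi

if [ -z "$installed" ]; then
    echo "libpve-storage-perl not found (is this a PVE node?)"
elif version_ge "$installed" "$REQUIRED"; then
    echo "installed $installed >= $REQUIRED: patch should be present"
else
    echo "installed $installed < $REQUIRED: patch not present yet"
fi
```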

Delete Root Certificate from vCenter

In my last two posts, we renewed the Root Certificate on the VCSA.

We then renewed the STS certificate.

But we were left with the old Root certificate on the VCSA. How do we remove it?

You can use the Certificate Management vCenter Trusted Root Chains interface to add, delete and read trusted root certificate chains. This use case demonstrates how to delete a root certificate or certificate chain from the trusted root store of your vCenter Server system.

Deleting certificates is not available through the vSphere Client and you can only do this by using the vSphere Automation API or the CLI tools.

Caution:
Deleting a root certificate or certificate chain that is in use might cause breakage of your systems. Proceed to delete a root certificate only if you are sure it is not in use by your vCenter Server or any connected systems.

The above link may have a good warning, but the steps in it were useless and didn’t work for me, possibly because I didn’t have the “vSphere Automation API server” or something; I’m not sure. Putting the GET into a browser simply prompted for creds and didn’t accept them.

So, you can also use PowerCLI or vecs-cli. Let’s try the latter.

1) List the certificates using vecs-cli.

/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text | less

2) Find the Certificate you wish to remove and make a note of the Alias and the X509v3 Subject Key Identifier.

My case:
Alias : 9eadf42a18387ee983d3dfa4f607eee91a3e5b67
X509v3 Subject Key Identifier: 0B:62:2D:98:7B:28:34:2A:14:81:CD:34:AC:46:40:06:80:DA:84:3E

3) List the trusted certs published to the VMware Directory Service using the following command (administrator@vsphere.local password required). This command is in the same location as vecs-cli:
Windows:
C:\Program Files\VMware\vCenter Server\vmafdd>dir-cli trustedcert list

Appliance (VCSA):
/usr/lib/vmware-vmafd/bin/dir-cli trustedcert list

This will output a list of Certificates published to VMDIR. It will look similar to the following output:

4) Locate the Certificate’s CN (thumbprint) which matches the Key Identifier from Step 2 above. In this example, the Certificate will be the first one in the list with the following CN:

0B622D987B28342A1481CD34AC46400680DA843E

5) Using the ID located in Step 4, run the following command (substitute your own ID):

/usr/lib/vmware-vmafd/bin/dir-cli trustedcert get --id 0B622D987B28342A1481CD34AC46400680DA843E --login administrator@vsphere.local --outcert /tmp/oldcert.cer

6) Un-publish the CA Certificate from VMDIR by running the following command:

/usr/lib/vmware-vmafd/bin/dir-cli trustedcert unpublish --cert /tmp/oldcert.cer

7) Delete the Certificate from VECS utilizing the Alias located in Step 2 by running the following command:

/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store TRUSTED_ROOTS --alias 9eadf42a18387ee983d3dfa4f607eee91a3e5b67

8) Confirm that the Certificate was deleted by running the following command:

/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text | grep Alias

9) Force a refresh of VECS by running the following command. This will ensure updates are pushed to the other PSCs in the environment if there is more than one.

/usr/lib/vmware-vmafd/bin/vecs-cli force-refresh

10) Restart all services on the PSCs and on the vCenter Servers and ensure that all services start and respond normally and that you can log in and manage the environment. (aka giver a reboot)
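To cut down on copy and paste mistakes, steps 5 to 9 can be generated from the two values noted in step 2. The sketch below assumes, as in this example, that the VMDIR ID is just the Subject Key Identifier with the colons removed. ski_to_id and print_removal_commands are made-up helper names, and the script only prints the commands so they can be reviewed before running anything on the VCSA.

```shell
#!/bin/sh
# Sketch: derive the VMDIR cert ID from the Subject Key Identifier and print
# the commands for steps 5 to 9. Nothing is executed here.
BIN=/usr/lib/vmware-vmafd/bin
CERT_ALIAS=9eadf42a18387ee983d3dfa4f607eee91a3e5b67                 # step 2
CERT_SKI=0B:62:2D:98:7B:28:34:2A:14:81:CD:34:AC:46:40:06:80:DA:84:3E  # step 2

# Strip colons and spaces, and uppercase the hex digits.
ski_to_id() {
    printf '%s' "$1" | tr -d ': ' | tr 'a-f' 'A-F'
}

# print_removal_commands ALIAS SKI
print_removal_commands() {
    id="$(ski_to_id "$2")"
    echo "$BIN/dir-cli trustedcert get --id $id --login administrator@vsphere.local --outcert /tmp/oldcert.cer"
    echo "$BIN/dir-cli trustedcert unpublish --cert /tmp/oldcert.cer"
    echo "$BIN/vecs-cli entry delete --store TRUSTED_ROOTS --alias $1"
    echo "$BIN/vecs-cli entry list --store TRUSTED_ROOTS --text | grep Alias"
    echo "$BIN/vecs-cli force-refresh"
}

print_removal_commands "$CERT_ALIAS" "$CERT_SKI"
```

Still compare the generated ID against the dir-cli trustedcert list output from step 3 before un-publishing anything.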

Logged in just fine, and certs are now clean as a whistle:

Looks like Root certs are good for 10 years, STS certs are good for 10 years, and the machine cert is good for 2 years.

Hope these last couple posts help someone.