I’ll try to keep this one short. The other night I was installing updates on some computers, I like to see system resources when doing this, as Windows is a heavy, HEAVY bitch. As I was scorlling through my VMs I noticed my DC’s hovering around 18%. While that may not sound like much, I know it was high lol for what they do. So I went to check the vSphere logs to see how long has this been going on… 1 day… mhmm. 1 week…. mgmmm 1 month… could this be normal? I don’t think so I just have caught it…. and looking at the Year view showed the increase a couple months back… but what could it be….
I noticed it on my other DC’s as well… all the same time frame.
After a while I noticed it was the svchost process. Using Mark’s ProcExp.exe I narrowed it down to svchost (DHCP/ nethost / eventlog)… I decided after many other failed searches to view this particular process having CPU issues. Then I found this, exactly what I was experiencing…. Funky CPU in Taskmgr, that process. all of it. and his answer:
“TL;DR: EventLog file was full. Overwriting entries is expensive and/or not implemented very well in Windows Server 2008.”
just as he mentions in detail in his answer, the security log was at max and being overwritten. Now I know there isn’t much happening at these times of the day so how did the log get filled so fast and being overwritten to cause CPU spikes. looking at the Log (Palo Alto User Agent Log on, and Logoff events) lots of them. I haven’t blogged yet in my series with Palo Alto about User mappings when it comes to the monitor area of the Palo ALto Firewalls, but you can configure Palo Alto to use Server monitoring directly instead of a user-ID agent server, which you can install on a dedicated windows server which will use SMI to query client devices on behalf of the Palo Alto firewall to determine what IP address is being used by whom…
“In most cases, the majority of your network users will have logins to your monitored domain services. For these users, the Palo Alto Networks User-ID agent monitors the servers for login events and performs the IP address to username mapping.”
Now I can’t find a good Palo Alto Networks source on it, but when you configure the Monitoring Servers which “enable the User-ID agent to map IP addresses to usernames by searching for logon events in the security event logs of servers, configure the settings described in the following table.”
which is all good and great however, the default for this is:
Server Log Monitor Frequency (sec)
Specify the frequency in seconds at which the firewall will query Windows server security logs for user mapping information (range is 1-3600; default is 2)
and apparently this process is not session based itself, so every 2 seconds the firewalls were hitting the DC’s looking to see who’s got what IP based on the logon events, and this in itself was creating a logon, and logoff event every 2 seconds. That apparently not only filled the log, but is enough garbage to flood the security log and cause the overwrite function on eventviewer to cause CPU “spikes”.
The solution was to increase the frequency of this lookup. This obviously reduces the accuracy of the mapping, but when you have long lease times on your DHCP settings, and users don’t change networks (like almost ever) this is a low risk, while still retaining user field information the Palo Alto Monitoring section. This along with a backup and clearing of the security event and the systems all went back to low CPU usage.
Happy happy joy joy