imfile: wd # already in wdmap!

If you’re here, chances are you’re seeing “imfile: wd 25 already in wdmap!” in your rsyslog logs. You know, the logs, for your logging service that also logs logs, would you like some logs with that… anyway where was I, oh right.. logs…

Step #1) Know you’ve got a problem.

Generally, this comes in one of three ways:

  1. You’re monitoring your systems system resources and usage and found an anomaly, you ran top or htop and find rsyslog is the culprit.
  2. You’re checking the service and noticed the output of rsyslog service.
  3. You’re actually looking at the rsyslog logs for some other reason.

In all cases you see this:

journalctl -u rsyslog -b

 

Step #2) Determine what files are the culprit.

This is easier said than done, cause:

  1. It depends how your config files are structured.
  2. The default logging, for some reason will never just simply tell you what file was already defined in the wdmap.

before making any system changes you can get a general idea of what might be the problem by checking what files the service has open handles on:

So, get the PID of rsyslog “service rsyslog status”

lsof -p PID

You can use the FD and TYPE columns to distinguish read/writes and to what files. In my particular case study, the problematic turd comes from vCenter.  This alone will not telling you anything about the wd problem. Though I have used this to determine other underlying issue which sure enough stemmed from the config files.

Step 2.1) be confused by the config parser…

rsyslog stops logging to vmdird and messages in vCenter Server 7.0

nm, this is why. for the error below anyway, not the wdmap issue.

rsyslogd -N1
rsyslogd: version 8.2001.0, config validation run (level 1), master config /etc/rsyslog.conf
rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 94: STOP is followed by unreachable statements! [v8.2001.0 try https://www.rsyslog.com/e/2207 ]

To actually figure this out you have to stop rsyslog and run it for a short while in debug mode saving to a custom log file:

systemctl stop rsyslog && rsyslogd -dn > /var/log/rsyslog.debug 2>&1

Then to get context run:

grep -n "err.*already in wdmap" /var/log/rsyslog.debug | cut -d: -f1 | while read n; do sed -n "$((n-5)),$((n+1))p" /var/log/rsyslog.debug; echo "----"; done

Now we got some context based on the errors we saw above:

For the above, I cleared out a manual entry I created for a single file:
File=”/storage/log/vmware/vtsdb/postgresql-22.log”

After restarting I only had what was left below (I had another manually created duplicate (on purpose for this post) but pointing to

root@vCenter [ ~ ]# grep -n "err.*already in wdmap" /var/log/rsyslog.debug | cut -d: -f1 | while read n; do sed -n "$((n-3)),$((n+1))p" /var/log/rsyslog.debug; echo "----"; done
6330.967073425:imfile.c : imfile.c: act_obj_add: edge 0x5639e15e7800, name '/var/log/vmware/vtsdb/postgresql-11.log' (source '---')
6330.967076621:imfile.c : imfile.c: need to add new active object '/var/log/vmware/vtsdb/postgresql-11.log' in '/var/log/vmware/vtsdb/postgresql-*.log' - checking if accessible
6330.967082358:imfile.c : imfile.c: add new active object '/var/log/vmware/vtsdb/postgresql-11.log' in '/var/log/vmware/vtsdb/postgresql-*.log'
6330.967092728:imfile.c : errmsg.c: Called LogMsg, msg: imfile: wd 23 already in wdmap!
6330.967096479:imfile.c : operatingstate.c: osf: MSG imfile: wd 23 already in wdmap!: signaling new internal message via SIGTTOU: 'imfile: wd 23 already in wdmap! [v8.2001.0 try https://www.rsyslog.com/e/2175 ]'
----
6331.013269259:imfile.c : imfile.c: act_obj_add: edge 0x5639e15e3dd0, name '/storage/log/vmware/vsan-health' (source '---')
6331.013272732:imfile.c : imfile.c: need to add new active object '/storage/log/vmware/vsan-health' in '/var/log/vmware/wcp' - checking if accessible
6331.013278242:imfile.c : imfile.c: add new active object '/storage/log/vmware/vsan-health' in '/var/log/vmware/wcp'
6331.013289760:imfile.c : errmsg.c: Called LogMsg, msg: imfile: wd 82 already in wdmap!
6331.013293415:imfile.c : operatingstate.c: osf: MSG imfile: wd 82 already in wdmap!: signaling new internal message via SIGTTOU: 'imfile: wd 82 already in wdmap! [v8.2001.0 try https://www.rsyslog.com/e/2175 ]'
----
6331.036189358:imfile.c : imfile.c: act_obj_add: edge 0x5639e15e3dd0, name '/storage/log/vmware/vpxd' (source '---')
6331.036193267:imfile.c : imfile.c: need to add new active object '/storage/log/vmware/vpxd' in '/var/log/vmware/wcp' - checking if accessible
6331.036201104:imfile.c : imfile.c: add new active object '/storage/log/vmware/vpxd' in '/var/log/vmware/wcp'
6331.036210037:imfile.c : errmsg.c: Called LogMsg, msg: imfile: wd 88 already in wdmap!
6331.036213414:imfile.c : operatingstate.c: osf: MSG imfile: wd 88 already in wdmap!: signaling new internal message via SIGTTOU: 'imfile: wd 88 already in wdmap! [v8.2001.0 try https://www.rsyslog.com/e/2175 ]'
----
root@vCenter [ ~ ]# grep -n "add new active object '/storage/log/vmware/vsan-health'" /var/log/rsyslog.debug | cut -d: -f1 | while read n; do sed -n "$((n-3)),$((n+1))p" /var/log/rsyslog.debug; echo "----"; done
6331.013260502:imfile.c : main Q: queue.c: MultiEnqObj advised worker start
6331.013265898:imfile.c : imfile.c: process_symlink: adding parent '/storage/log/vmware/vsan-health' of target '/storage/log/vmware/vsan-health/vsanvcmgmtd-13.log'
6331.013269259:imfile.c : imfile.c: act_obj_add: edge 0x5639e15e3dd0, name '/storage/log/vmware/vsan-health' (source '---')
6331.013272732:imfile.c : imfile.c: need to add new active object '/storage/log/vmware/vsan-health' in '/var/log/vmware/wcp' - checking if accessible
6331.013278242:imfile.c : imfile.c: add new active object '/storage/log/vmware/vsan-health' in '/var/log/vmware/wcp'
----
6331.013265898:imfile.c : imfile.c: process_symlink: adding parent '/storage/log/vmware/vsan-health' of target '/storage/log/vmware/vsan-health/vsanvcmgmtd-13.log'
6331.013269259:imfile.c : imfile.c: act_obj_add: edge 0x5639e15e3dd0, name '/storage/log/vmware/vsan-health' (source '---')
6331.013272732:imfile.c : imfile.c: need to add new active object '/storage/log/vmware/vsan-health' in '/var/log/vmware/wcp' - checking if accessible
6331.013278242:imfile.c : imfile.c: add new active object '/storage/log/vmware/vsan-health' in '/var/log/vmware/wcp'
6331.013289760:imfile.c : errmsg.c: Called LogMsg, msg: imfile: wd 82 already in wdmap!

Pay attention to the end of the line like “add new active object ‘/storage/log/vmware/vsan-health’ in ‘/var/log/vmware/wcp'” that’s the line that exists in a config file somewhere. To help find that:

root@vCenter [ ~ ]# grep -RniE ".*/var/log/vmware/wcp" /etc/rsyslog.conf /etc/rsyslog.d /etc/vmware-syslog 2>/dev/null
/etc/vmware-syslog/vmware-services-wcpsvc.conf:3: File="/var/log/vmware/wcp/wcpsvc.log"
cat /etc/vmware-syslog/vmware-services-wcpsvc.conf
#wcpsvc log
input(type="imfile"
File="/var/log/vmware/wcp/wcpsvc.log"
Tag="wcpsvc"
Severity="info"
Facility="local0")

well that’s the file, but it doesn’t explain the duplicate entry attempt.

I cleared out the other duplicate file entry I manually created, and restarted the service, it appeared to be clean, but I was skeptical. Seems you may need to manually kill the debug instance. Verify with:

ps  aux | grep rsyslog

Well I manually created duplicate entries, so that I could figure out how to track it down.. now I have legit entries happening, not of my own making.. Oiii this sucks… is what is is I guess? or this?

/storage/log filling up with imfile-state files | rsyslogd

*Update* So that link above, simply denotes many imfile-state files, but once you know what imfile-state files are, its easy understand that they themselves are not the problem. What are imfile-state files, you ask?

imfile‑state files are state‑tracking metadata files created by rsyslog’s imfile module. The imfile module monitors regular text files (like log files) and forwards new lines to rsyslog. To avoid re‑reading the same data after a restart, it needs to remember how far it got.

That’s exactly what these files store.

What’s inside an imfile‑state file?

Each state file contains things like:

  • The offset (byte position) rsyslog last read from the monitored file
  • The inode of the file
  • Timestamps and other tracking metadata

This allows rsyslog to resume reading from the correct position after a restart.

Why they exist

According to rsyslog documentation, the state file is used to track the read position for each monitored file so rsyslog can resume correctly after restarts.

TLDR, they are simply files to keep track of where the read operation should continue if the service (rsyslog) is ever restarted.. so if you have “too many imfile-state files”, clearly the reason for that is that there are simply a lot of files being monitored (read).

So, one thing I noticed… in my test example I deliberately created two duplicating entries in my rsyslog config files from the snip above I’ll show you the direct files I created duplicate entries for:

What I find interesting is all the other entries of duplicates that emerged in the logs, especially baffling are the folder paths that were reported as being duplicate entries instead of specific files. To add to this confusion I checked the service again just now (after clearing those two manual entries I created to simulate this error), and all errors are now gone… weird:

As you can see from this snip, it’s now clean (ignore the error about files not existing, the vmware rsyslog config, by default, has a ton of log files coded to be parsed, many of which won’t exist, thus the errors).

Anyway… I’m starting to have a weird suspicion that the manual file enteries I created, were, somehow, responsible for all the other path already exists in wdpath errors… hmmm one way to find out. Before I continue, I need to note a couple things about my tests above.

  1. /var/log/vmware is a syslink to /storage/log/vmware
  2. all rsyslog entries pointing to files at /var/log/vmware
  3. The second manual entry I created for testing was pointing to a file in /storage/log/vmware path

From some reading I have done, this MIGHT be part of the problem… but I will again test by going through step by step to see if we can replicate the issue again.

Clean Slate

So first off, is it possible for me to start with a clean slate? Well let me remove the syslog setting in VAMI and see if it causes rsyslog to go back to not loading (parsing/read) any log files… K, deleted the syslog server entry.. and, wow look at that…

It did…

I read that running “tdnf reinstall vmware-syslog” would reinstall that service and factory reset it, so I gave it a shot to see what would happen…

Good start, till it just hung here, and it seem the system has CPU spike now..

Just when I went to start another session to check what was going on (roughly 10-15 min after starting the command), it finally spat back at my that package doesn’t exist.. really? it took nearly 100% cpu usage for over 10 minutes just to tell me that package doesn’t exist.. that deserves a standing ovation.

the command appears to be “tdnf reinstall rsyslog” and this time it was fast.

ok dokie.. I don’t know if this would factory reset the conf files that originally come with vCenter (I’d assume not, since it’s just the underlying service, and VMware specific). However, at this time I can’t think of a way to prove or disprove this, until I muck with the files and attempt this again to see.

Step #1) Configure Syslog server

So again, in VAMI -> syslog -> configure an IP to send logs to.

Then let’s check the service, which should have started it log file parsing…

man, are you ****in’ kidding me… this was supposed to be clean!

This is some royal bullshit…. K may have finally got my clean slate…

find /var/log/vmware -type f -exec truncate -s 0 {} \;
rm -f /var/log/vmware/rsyslogd/imfile-state:*

So, what I did here was, I basically cleared all the logs, by replacing their file contents with nothing, cleared the imfile-state files (since all logs should be blank now), then restarted the rsyslog service.. it’s been a lil while and so far, it looks clean. but it’s always after a reboot.. *double taps chest* REBOOT

classic…. there it is again.. well lets check it out again.

time to go nuclear:

lsof -p 7387 -Ftn | grep '^tREG' -A1 | grep '^n/' | sed 's/^n//' | grep -Ev '^(/dev/|/usr/)' | sort -u
lsof -p 7387 -Ftn | grep '^tREG' -A1 | grep '^n/' | sed 's/^n//' | grep -Ev '^(/dev/|/usr/)' | sort -u | xargs -r rm --
lsof -p 7387 -Ftn | grep '^tREG' -A1 | grep '^n/' | sed 's/^n//' | grep -Ev '^(/dev/|/usr/)' | sort -u

What I did here was find the PID of rsyslog and list all the files it was parsing… then… well deleted them… after restarting the service…

K it’s staying clean.. but will it survive a reboot?

Nope… Theory about the system links.. change all paths.

So, I picked one of the many entries and I singled out rhttpproxy entry.

So first, it was determined that this service logs are not rotated by logrotate, but it’s own config rotates it. AND, it turns out that the main file in the inital conf “rhttpproxy.log” is literally syslinked to the rotated log file:

After enough back n worth and many attempts to clear this (specially after a vCenter reboot), I managed to get it cleared by modifying the rsyslog conf file for it to this:

cat /etc/vmware-syslog/vmware-services-rhttpproxy.conf
#rhttpproxy log
input(type="imfile"
File="/storage/log/vmware/rhttpproxy/rhttpproxy-*.log"
Tag="rhttpproxy-main"
Severity="info"
Facility="local0"
followSymlink="on"
readMode="2"
freshStartTail="on"
wildcard="on")

Now the only problem is.. all the other services.. are they all messed like this?

Well.. lets pick another one and see if we can fix it…

vsan health… but what file, lets extend our query back (change 3 to a 5)

and sure enough there’s a syslink on the OG file, that is the target in the rsyslog config…

Lets change the conf file from this:

to this:

after reboot, issue gone. After more testing I’d recommend not to configure:

readMode=”2″
freshStartTail=”on”

on the vpxd, it’s cause rsyslog to crash and segment fault in debug mode. Not a fun time.

Summary

  • mixed rotation models require split rules
  • symlink targets cause wdmap loops
  • duplicate watches cause wdmap loops
  • wildcard rules must be precise
  • rsyslog imfile is fragile around symlinks and directories
  • VMware uses inconsistent log rotation models
  • To find out the source files, requires debug mode
    systemctl stop rsyslog && rsyslogd -dn > /var/log/rsyslog.debug 2>&1

    Then to get context run:

    grep -n "err.*already in wdmap" /var/log/rsyslog.debug | cut -d: -f1 | while read n; do sed -n "$((n-5)),$((n+1))p" /var/log/rsyslog.debug; echo "----"; done

Leave a Reply

Your email address will not be published. Required fields are marked *