Florian Forster edited this page Nov 21, 2023 · 1 revision
Name: Tail plugin
Type: read
Callbacks: config, init, read, shutdown
Status: supported
FirstVersion: 4.4
Copyright: 2008 Florian octo Forster
License: MIT License
Manpage: collectd.conf(5)
See also: List of Plugins

Description

The Tail plugin can be used to “tail” log files, i.e. follow them as tail -F does. Each line is passed to one or more “matches”, each of which uses a POSIX extended regular expression to test whether the line is relevant for a statistic. For example, you could count the number of failed SSH login attempts by applying the following regular expression to the /var/log/auth.log log file:

\<sshd[^:]*: Invalid user [^ ]+ from\>

But counting lines that match is only the simplest application of this plugin. Take, for example, a daemon that writes the current number of users to a file periodically. You could collect this information with a regular expression like this:

There are currently ([0-9]+) users

As you can see, the actual number of users is captured by the first “sub match”. collectd can then use this value as a gauge value.

And there's even more: By default, Exim logs the size of each email in its log file. You can match this size and add up all the values. You end up with a typical octet counter, which you can use with the ipt_bytes type, for example. Such a regular expression would look like this:

\<S=([0-9]+)\>
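
The three uses described above (counting matching lines, taking a sub match as a gauge, and adding sub matches up) can be sketched in a few lines of Python. This is only an illustration, not the plugin's code: Python's re module is not a POSIX extended regular expression engine, so \b stands in for the \< and \> word boundaries and [0-9] is used for digits. The function names and all sample log lines are made up for this example; the comments name the DSType values from collectd.conf(5) that each function roughly corresponds to.

```python
import re

# Python equivalents of the three expressions discussed above (assumption:
# \b replaces \< and \>, because Python's re module is not POSIX ERE).
INVALID_USER = re.compile(r"\bsshd[^:]*: Invalid user [^ ]+ from\b")
USERS = re.compile(r"There are currently ([0-9]+) users")
SIZE = re.compile(r"\bS=([0-9]+)\b")


def counter_inc(lines, regex):
    """Count the matching lines (roughly DSType "CounterInc")."""
    return sum(1 for line in lines if regex.search(line))


def gauge_last(lines, regex):
    """Return the first sub match of the last matching line (roughly "GaugeLast")."""
    value = None
    for line in lines:
        match = regex.search(line)
        if match:
            value = float(match.group(1))
    return value


def counter_add(lines, regex):
    """Add up the first sub match of every matching line (roughly "CounterAdd")."""
    return sum(int(m.group(1)) for m in map(regex.search, lines) if m)


# Invented sample lines, for illustration only.
auth_log = [
    "Nov 21 10:00:01 host sshd[1234]: Invalid user admin from 192.0.2.10",
    "Nov 21 10:00:02 host sshd[1234]: Accepted publickey for octo from 192.0.2.20",
    "Nov 21 10:00:03 host sshd[1240]: Invalid user test from 192.0.2.10 port 4242",
]
users_log = ["There are currently 3 users", "There are currently 7 users"]
exim_log = [
    "2023-11-21 10:00:00 1r5abc-0001-AA <= alice@example.org H=mx.example.org [192.0.2.1] P=esmtp S=1234",
    "2023-11-21 10:00:05 1r5abd-0002-BB <= bob@example.org H=mx.example.org [192.0.2.2] P=esmtp S=2048 id=x@example.org",
    "2023-11-21 10:00:06 1r5abd-0002-BB => carol <carol@example.net> R=local_user T=maildir",
]

print(counter_inc(auth_log, INVALID_USER))  # failed logins seen
print(gauge_last(users_log, USERS))         # most recent user count
print(counter_add(exim_log, SIZE))          # total bytes logged
```

In collectd itself the counter values are turned into rates (per second) by the usual counter handling; the sketch only shows how each line contributes to the value.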

This plugin is a generic plugin, i.e. it cannot work without configuration, because there is no reasonable default behavior. Please read the Plugin tail section of the collectd.conf(5) manual page for an in-depth description of the plugin's configuration.

“Following” files

To “follow” files, the Tail plugin does the following each interval:

  1. Read and handle each line of an already opened file descriptor until the end of the file is reached.
  2. Check if the file has been truncated. If so, seek to the beginning of the file and start processing the file from there.
  3. Retrieve the inode number associated with the file name that should be followed. This number is compared to the inode of the currently open file descriptor.
  4. If the inodes differ, the originally opened file has been moved or replaced. The file descriptor is closed and the file name is (re)opened.
  5. If no file had been open in step 1 (usually only true on the first iteration), open the file name, seek to the end but do not handle any lines.
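
The steps above can be sketched as a small Python class. This is a minimal illustration under stated assumptions, not collectd's actual C implementation: the class name Follower and its poll method are hypothetical, one call to poll stands for one interval, and a POSIX system with meaningful inode numbers is assumed.

```python
import os


class Follower:
    """Hypothetical helper mimicking the steps above; not collectd's C code."""

    def __init__(self, path):
        self.path = path
        self.fh = None  # no file is open before the first poll

    def poll(self):
        """Return the lines that appeared since the previous call."""
        if self.fh is None:
            # Step 5: first iteration. Open the file, seek to the end
            # and do not handle any lines.
            try:
                self.fh = open(self.path, "rb")
            except FileNotFoundError:
                return []
            self.fh.seek(0, os.SEEK_END)
            return []

        # Step 1: read from the open descriptor up to the end of the file.
        raw = self.fh.readlines()

        # Step 2: a file shorter than our read offset has been truncated.
        if os.fstat(self.fh.fileno()).st_size < self.fh.tell():
            self.fh.seek(0)
            raw += self.fh.readlines()

        # Steps 3 and 4: compare the inode behind the file name with the
        # inode of the open descriptor; reopen the name if they differ.
        try:
            by_name = os.stat(self.path)
            opened = os.fstat(self.fh.fileno())
            if (by_name.st_dev, by_name.st_ino) != (opened.st_dev, opened.st_ino):
                new_fh = open(self.path, "rb")  # read from the start next time
                self.fh.close()
                self.fh = new_fh
        except FileNotFoundError:
            pass  # the name is currently missing; keep the old descriptor

        return [line.decode("utf-8", "replace").rstrip("\n") for line in raw]
```

A caller would invoke poll once per interval and hand each returned line to the configured matches. The sketch does not buffer partial lines, and like the procedure it mirrors, it cannot detect a truncation if the file has already grown past the old read offset again.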

To fully understand what is going on, you need a basic understanding of UNIX file systems. In particular: An inode is a number that identifies a file's data on the disk (or whatever storage medium is in use). It's similar to an IP address, for example. A file name is basically a human-readable name for an inode, similar to a domain name. A file descriptor is the representation of an opened file to a running program. To complete the analogy, it's similar to a TCP connection.

Once a file is opened (a file descriptor has been created), changes to the file name no longer affect the running program directly. The file name can be removed or renamed, but since it is only an alias for an inode, the running program won't notice. By the way: This is why you have to notify some daemons when “rotating” the log file. collectd's LogFile plugin doesn't keep the file descriptor open, though. (Analogy: Once a TCP connection is established, changes to the DNS won't be noticed.)
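
This behavior is easy to observe. The following Python sketch, which assumes a POSIX system and uses made-up file names, plays the role of a daemon that keeps its log file open while the file is “rotated” underneath it:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "daemon.log")  # hypothetical log file name

    with open(path, "w") as writer:
        writer.write("first\n")
        writer.flush()
        inode_before = os.fstat(writer.fileno()).st_ino

        # "Rotate" the log: rename the old file and create a new, empty
        # file under the old name.
        os.rename(path, path + ".1")
        open(path, "w").close()

        # The descriptor that is still open keeps writing to the old inode.
        writer.write("second\n")
        writer.flush()
        same_inode = os.fstat(writer.fileno()).st_ino == inode_before
        name_moved_on = os.stat(path).st_ino != inode_before

    with open(path + ".1") as old, open(path) as new:
        rotated_content = old.read()
        new_content = new.read()

print(same_inode, name_moved_on)
print(repr(rotated_content), repr(new_content))
```

Both lines end up in the renamed file and the new file stays empty, which is why such a daemon has to be told to reopen its log file after rotation.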

Because changes to the file name aren't noticed automatically, the inode currently open is compared to the (newly looked up) inode the file name points to. If they differ, the file name now points to a new piece of data on the disk. We then assume that the file has been rotated, open the new file name and start reading from the beginning. (Analogy: Re-resolve the host name and compare IP addresses.)

Last but not least: In the previous interval, we read to the end of the file. If the file has been shortened in the meantime, our file descriptor now points somewhere past the end of the file. We assume that the file has been truncated to length zero. Some daemons do this rather than change the file name and create a new file with the old name. The Tail plugin will therefore start processing this file from the beginning.

Synopsis

 <Plugin "tail">
  <File "/var/log/exim4/mainlog">
   Instance "exim"
   <Match>
    Regex "S=([1-9][0-9]*)"
    DSType "CounterAdd"
    Type "ipt_bytes"
    Instance "total"
   </Match>
   <Match>
    Regex "\\<R=local_user\\>"
    DSType "CounterInc"
    Type "counter"
    Instance "local_user"
   </Match>
  </File>
 </Plugin>

Example graphs

Invalid SSH login attempts

Number of failed SSH login attempts (per second) due to invalid user names. As you can see in the graph, there's a brute-force attack going on with two tries per second.

Plugin-tail-invalid-user.png

The configuration used when creating this graph was:

 <Plugin "tail">
   <File "/var/log/auth.log">
     Instance "auth"
     <Match>
       Regex "\\<sshd[^:]*: Invalid user [^ ]+ from\\>"
       DSType "CounterInc"
       Type "counter"
       Instance "sshd-invalid_user"
     </Match>
   </File>
 </Plugin>

Emails received for local recipients

Number of emails (per second) for local recipients, i.e. mail that is passed to a mail delivery agent (MDA). The graph shows about 3.5 hours of heavy spam activity. This graph is from a private box, for which four mails per second is a lot.

Plugin-tail-email-type.png

The configuration used when creating this graph was:

 <Plugin "tail">
   <File "/var/log/exim4/mainlog">
     Instance "exim"
     <Match>
       Regex "\\<R=local_user\\>"
       DSType "CounterInc"
       Type "email_type"
       Instance "incoming"
     </Match>
   </File>
 </Plugin>

Emails blocked by a real-time block list

Number of emails rejected because the sender's IP address is listed in a real-time block list.

Plugin-tail-rbl-block.png

Dependencies

  • none

See also
