Skip to content
This repository has been archived by the owner on May 21, 2024. It is now read-only.

RFC3164: square brackets in the message screw parsing of ProcID #31

Open
dimonomid opened this issue Mar 1, 2020 · 3 comments
Open
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@dimonomid
Copy link

Parser created as follows:

		p := rfc3164.NewParser(
			rfc3164.WithYear(rfc3164.Year{YYYY: 2020}),
			rfc3164.WithRFC3339(),
		)

Example input:

<0>Mar  1 09:38:48 myhost myapp[12345]: foo [bar] baz

Output (take a look at ProcID):

{
  "Facility": 0,
  "Severity": 0,
  "Priority": 0,
  "Timestamp": "2020-03-01T09:38:48Z",
  "Hostname": "myhost",
  "Appname": "myapp",
  "ProcID": "foo [bar",
  "MsgID": null,
  "Message": "foo [bar] baz"
}
@russorat russorat added bug Something isn't working help wanted Extra attention is needed labels Mar 2, 2020
@russorat
Copy link

russorat commented Mar 2, 2020

@dimonomid thanks again!

@russorat russorat assigned leodido, russorat and goller and unassigned russorat Mar 2, 2020
@goller goller removed their assignment Apr 13, 2020
@z3bra
Copy link

z3bra commented Aug 2, 2023

I also encounter a variant of this issue (within telegraf) with the following message:

Aug  2 11:54:06 hostname e Random message

Which is parsed by telegraf as:

{
  "name": "syslog",
  "fields": {
    "facility_code": 3,
    "message": "Random message",
    "procid": "servicename[48737",
    "severity_code": 5,
    "timestamp": 1690979105000000000
  },
  "tags": {
    "appname": "catsitd",
    "facility": "daemon",
    "host": "localhost",
    "hostname": "myhost",
    "severity": "notice",
    "source": "X.X.X.X"
  },
  "timestamp": 1690971905
}

Notice how the message is trimmed. It should be servicename[48737]: Random message. Either the procid field pattern is too greedy (unlikely, as it would still include the original PID otherwise), or the parsing code takes the rightmost match as the ProcID.

The relevant code seems to be in rfc3164/machine.go.rl:

# The first not alphanumeric character starts the content (usually containing a PID) part of the message part
contentval = !alnum @err(err_contentstart) >mark print* %set_content @err(err_content);

content = '[' contentval ']'; # todo(leodido) > support ':' and ' ' too. Also they have to match?

Here, contentval seems to also match any separator that would fit between the [ ] pair. I'm not familiar with this lexer though so I have no idea how to fix it.

@z3bra
Copy link

z3bra commented Aug 2, 2023

So I've quickly read about Ragel SM parsing, and could come up with a solution which doesn't break any test.
I've also added a test case with my previous example, to confirm it's working.

In the code, the content field is apparently only used for parsing the procid. You can tell by the content = '[' contentval ']', which limits matching to what's inside the square brackets. The RFC3164 states that the CONTENT should be everything that immediately follows the TAG. In this case the ProcID is only a small part of the CONTENT, which had me confused when I first read this parser.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants