Add alternate mechanism for parsing log lines #494

msakrejda · 2023-12-27T18:39:40Z

Our current log line parsing mechanism relies on matching each log
line to one of a set of recognized prefix patterns, then pulling
relevant information out of the prefix "manually" according to their
regex groups.

This works, but is difficult to maintain and requires customers to use
one of our prefixes. Occasionally, we add prefixes (e.g., to support a
provider that does not allow changing log_line_prefix), but this is
error-prone and requires the customer to wait for a collector release.

This patch takes a different approach: we inspect the log_line_prefix,
generate a regular expression to match lines coming with that specific
prefix pattern, and use that to pull relevant metadata out of the log
stream.
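
To illustrate the idea (this is a minimal sketch, not the code in this patch), the translation from a log_line_prefix to a regular expression could look roughly like the following. The names `buildPrefixRegexp`, `matchFields`, and `escapeFields`, the capture group names, the per-field patterns, and the trailing level/content match are all assumptions made for the example. Only a subset of escapes is handled, and padding modifiers such as `%-10u` are rejected as unsupported.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// field describes how one log_line_prefix escape is matched.
type field struct {
	name    string // capture group name (hypothetical)
	pattern string // deliberately lenient pattern for the field
}

// escapeFields covers only a subset of the escapes Postgres supports.
// Names and patterns are illustrative assumptions, not taken from the patch.
var escapeFields = map[rune]field{
	't': {"time", `\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+`},
	'm': {"time", `\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \S+`},
	'p': {"pid", `\d+`},
	'l': {"line", `\d+`},
	'u': {"user", `.*?`},
	'd': {"db", `.*?`},
	'a': {"app", `.*?`},
	'h': {"host", `\S*?`},
	'e': {"sqlstate", `[0-9A-Z]{5}`},
}

// buildPrefixRegexp turns a log_line_prefix into a single anchored regexp.
// Literal text is quoted, known escapes become named groups, and everything
// after %q is wrapped in an optional group, because non-session processes
// stop printing the prefix at that point.
func buildPrefixRegexp(prefix string) (*regexp.Regexp, error) {
	var sb strings.Builder
	sb.WriteString("^")
	optional := false // true once %q has opened the optional group
	runes := []rune(prefix)
	for i := 0; i < len(runes); i++ {
		if runes[i] != '%' {
			sb.WriteString(regexp.QuoteMeta(string(runes[i])))
			continue
		}
		i++
		if i == len(runes) {
			return nil, fmt.Errorf("dangling %% at end of prefix")
		}
		switch esc := runes[i]; esc {
		case '%':
			sb.WriteString("%")
		case 'q':
			if !optional {
				sb.WriteString("(?:")
				optional = true
			}
		default:
			f, ok := escapeFields[esc]
			if !ok {
				return nil, fmt.Errorf("unsupported escape %%%c", esc)
			}
			fmt.Fprintf(&sb, "(?P<%s>%s)", f.name, f.pattern)
		}
	}
	if optional {
		sb.WriteString(")?")
	}
	// After the prefix, expect a severity label and the message itself.
	sb.WriteString(`(?P<level>[A-Z]+[0-9]?):\s+(?P<content>.*)$`)
	return regexp.Compile(sb.String())
}

// matchFields returns the named groups for a line, or nil if it does not match.
func matchFields(re *regexp.Regexp, line string) map[string]string {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}
```

With a prefix such as `%m [%p] %q%u@%d `, the lazy `user` group stops at the literal `@` from the prefix rather than at the first space, which is what lets a role like "Admin User" match in this sketch.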

Currently, this patch is a proof of concept, with only
ParseLogLineWithPrefix implemented. However, it passes all the tests
of our existing ParseLogLineWithPrefix function. Given it uses a
single purpose-built regex, it can also be more lenient with patterns
for specific fields in a match (e.g., roles with spaces like "Admin
User").

TODO:

- Get server's log_line_prefix and store where it can be passed to
  ParseLogLineWithPrefix
- Avoid re-compiling the prefix regex on every call to
  ParseLogLineWithPrefix
- Add handling for syslog log line format
- Support calling both old and new ParseLogLineWithPrefix via
  collector setting to have an escape hatch in case of issues
  - we could do this via a db_log_line_prefix setting or similar, and
    set that to 'auto' (the new behavior), 'legacy' (the current behavior)
    or a literal prefix (the new behavior but prefix can't be read from
    settings for whatever reason)
- Fix naming (right now, the core method is just called
  ParseLogLineWithPrefix2, but that's not ideal)
- Add tests for untested prefixes (some of the existing
  allegedly-supported prefixes do not have tests in the old code,
  so it's not clear if the new code supports them; they should just
  work, but it'd be good to verify this.)
- Warn when prefix does not include database or role
- Benchmark (this is almost certainly faster, but it'd be good to
  confirm)

I'm not sure about the best path forward for these. I think we could
store log_line_prefix along with the LogTimezone we are already
storing on state.Server. If we follow the same pattern, it would be
easy to refresh it on every full snapshot. We may want to keep the
regex on state.Server as well, so we can recompile it as necessary (if
the prefix changes) but otherwise leave it alone. We'd need to keep
the matchers on the state.Server object as well.
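
A rough sketch of that caching, assuming a hypothetical `serverLogState` struct standing in for the relevant fields on state.Server, might look like this. The field names, the `regexpFor` method, and the `compilePrefix` placeholder (which only quotes the prefix literally instead of translating escapes) are invented for the example. The `compiles` counter exists only to make the caching observable.

```go
package main

import (
	"regexp"
	"sync"
)

// serverLogState is a hypothetical stand-in for the log-related fields that
// could live on state.Server next to the log timezone.
type serverLogState struct {
	mu           sync.Mutex     // guards the fields below
	prefix       string         // last log_line_prefix seen for this server
	prefixRegexp *regexp.Regexp // regexp compiled from prefix
	compiles     int            // number of compilations, for illustration only
}

// compilePrefix is a placeholder for the real prefix-to-regexp translation.
// Here it only matches the prefix text literally.
func compilePrefix(prefix string) (*regexp.Regexp, error) {
	return regexp.Compile("^" + regexp.QuoteMeta(prefix))
}

// regexpFor returns the cached regexp when the prefix is unchanged and
// recompiles only when a different prefix is observed, for example after a
// full snapshot refreshes the setting.
func (s *serverLogState) regexpFor(prefix string) (*regexp.Regexp, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.prefixRegexp != nil && s.prefix == prefix {
		return s.prefixRegexp, nil
	}
	re, err := compilePrefix(prefix)
	if err != nil {
		return nil, err
	}
	s.prefix, s.prefixRegexp = prefix, re
	s.compiles++
	return re, nil
}
```

Callers on the log parsing path would ask for the regexp on every line, but the compile cost would be paid only when the prefix actually changes.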

Support for calling both the old and new code may be unnecessary, but
since this is a rewrite of a fundamental piece of collector code, it's
probably wise.
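
As a sketch of how the 'auto' / 'legacy' / literal setting from the TODO list could be resolved (the `parseMode` type, the `resolveLogLinePrefix` function, and treating an unset value the same as 'auto' are assumptions for illustration, not decisions made in this patch):

```go
package main

// parseMode says which parsing path to use for a server (hypothetical type).
type parseMode int

const (
	modeAuto    parseMode = iota // new parser, prefix read from the server
	modeLegacy                   // existing prefix-matching parser
	modeLiteral                  // new parser, prefix supplied in the setting
)

// resolveLogLinePrefix maps the proposed db_log_line_prefix setting to a mode
// and the prefix the new parser should use. serverPrefix is whatever
// log_line_prefix was read from the server, if any.
func resolveLogLinePrefix(setting, serverPrefix string) (parseMode, string) {
	switch setting {
	case "", "auto": // assumption: an unset value behaves like 'auto'
		return modeAuto, serverPrefix
	case "legacy":
		return modeLegacy, ""
	default:
		// Anything else is taken as a literal log_line_prefix.
		return modeLiteral, setting
	}
}
```

One consequence of this shape is that a literal prefix can never be spelled "auto" or "legacy", which seems acceptable since neither is a useful log_line_prefix.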

I'd love to hear any thoughts or concerns.

msakrejda · 2023-12-27T18:44:36Z

Oh, another thing not covered is parsing lines received through syslog. I don't understand how syslog interacts with log_line_prefix--do we only support a single prefix when syslog is used?

msakrejda · 2023-12-27T19:59:40Z

I should also add the case from #449; I think this fixes it.

Add alternate mechanism for parsing log lines

b56081a

msakrejda added 2 commits January 9, 2024 14:58

Rename LogTimezoneMutex to LogSettingsMutex

e066f57

In preparation for also storing log_line_prefix.

Update new log parsing interface

e401680

msakrejda force-pushed the log-line-parsing-experiment branch from 3b55ab6 to f38dbd3 Compare January 10, 2024 18:50

Integrate new log line prefix parsing into log pipelines

6363114

msakrejda force-pushed the log-line-parsing-experiment branch from f38dbd3 to 6363114 Compare January 10, 2024 20:28
