
Option to disable tail'ed processing of an analyzer's logs #44

Open
philrz opened this issue Apr 20, 2021 · 2 comments

philrz commented Apr 20, 2021

Repro is with Brimcap commit 1fa5fc4 and https://archive.wrccdc.org/pcaps/2018/wrccdc.2018-03-23.010014000000000.pcap.gz (uncompressed) as my test data.

In my verification steps in #16 (comment), I first tried this unsuccessful approach to work around https://redmine.openinfosecfoundation.org/issues/4106, mistakenly thinking that all I needed to do was leave behind only valid logs to be subject to Zed processing.

$ cat /tmp/mysuricata 
#!/bin/bash
# Run Suricata on the pcap that Brimcap streams in on stdin.
suricata -r /dev/stdin
# Re-serialize eve.json with jq, which drops the duplicate object keys.
cat eve.json | jq -c . > deduped-eve.json
# Delete everything except the cleaned-up log (extglob enables the !() pattern).
shopt -s extglob
rm !("deduped-eve.json")

$ brimcap analyze -Z -config variant.yml ~/pcap/wrccdc.pcap > wrccdc.zson
{"type":"error","error":"duplicate field subject"}

@mattnibs explained to me what went wrong here. The "ztail" functionality in Brimcap starts performing Zed processing on the logs generated by the analyzer processes even before those processes have finished, since this lets users begin querying partial output early. Because of this, Brimcap ended up choking on the partially-built eve.json (which contained the duplicate field names) before my wrapper script had a chance to delete it.
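For context on why the `jq -c .` pass in my wrapper fixes the logs: jq keeps only the last value when an object repeats a key, so re-serializing a record removes the duplicate field that Zed rejects. A minimal sketch (filenames are just for illustration):

```shell
# jq retains the last value for a duplicated key, so re-serializing
# the record drops the duplicate field.
printf '{"tls":{"subject":"a","subject":"b"}}\n' > eve.json
jq -c . eve.json > deduped-eve.json
cat deduped-eve.json   # {"tls":{"subject":"b"}}
```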

This led me to learn about and start using the globs parameter in the Brimcap config YAML so that ztail would only tail the deduped-eve.json file, and I was all set. Having gone through the experience, though, I now recognize it would still be convenient to have a way to disable this ztail behavior entirely when processing an analyzer's generated logs, for two reasons I can think of:

  1. Whereas the post-processing I was doing here with jq lent itself to output you could still "tail", some kinds of post-processing may not (e.g., they might rely on making an entire pass through a generated log once the complete output is present)
  2. Some users may know they don't want to query partial results and therefore don't want to burn CPU cycles on the incremental Zed processing, instead preferring to wait until all logs have finished being output
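To illustrate reason 1, here's a hypothetical post-processing step (the timestamp sort is my own example, not something from the thread) whose output can't meaningfully be tailed, since nothing valid is emitted until the whole input has been read:

```shell
# jq -s slurps the entire file before sorting, so sorted-eve.json only
# becomes valid once the input is complete -- tailing it mid-run is useless.
# (The printf stands in for analyzer output; a real wrapper would run
# suricata first, as in my script above.)
printf '{"timestamp":"2020-01-02","x":1}\n{"timestamp":"2020-01-01","x":2}\n' > eve.json
jq -cs 'sort_by(.timestamp)[]' eve.json > sorted-eve.json
```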
@philrz philrz added this to the Data MVP0 milestone Apr 21, 2021
@philrz philrz modified the milestones: Data MVP0, Data MVP1 May 5, 2021
@mattnibs
Collaborator

@philrz I'm not quite clear on what this ticket calls for. Is this a mode for an analyzer process that would wait to start reading records until the process has successfully exited?


philrz commented Jul 16, 2021

@mattnibs: Yes, that was the essence.

Reading the text again now, my filing it at the time was partly a reflection of Brimcap's newness and my not yet being completely familiar with its bells & whistles. Revisiting it now that it's been around longer and we've documented it more fully, I don't see it as urgent. Perhaps most importantly, the Custom Brimcap Configuration article discloses a couple of key points:

  1. It states how Brimcap assumes an analyzer "writes to log outputs only by appending", so anyone who's read & absorbed the article should not be surprised by the "tailing" behavior.
  2. The NetFlow example shows how the globs parameter can be used to isolate files that have been post-processed and hence avoid the ones that are unsafe to tail while the analyzer is still running.
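For anyone landing here later, a sketch of what that globs isolation looks like in the config YAML (field names per my reading of the Brimcap docs; the paths are illustrative and match my wrapper script above):

```yaml
analyzers:
  - cmd: /tmp/mysuricata      # wrapper that post-processes eve.json
    globs:
      - deduped-eve.json      # ztail follows only the post-processed log
```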

As long as best practices are followed, it seems users can accomplish pretty much whatever they need without this option. Granted, if I use my imagination, I can see a future where it would still be handy. For instance, there are formats like Parquet that (as I understand it) can't be read until they're fully written. However, Brimcap doesn't have a way to directly import those formats right now (#80), so it's kind of moot.

If it's ok, I think I'll drop the MVP1 marker off this one but keep it open in the Deep Freeze so it's easy to find if a use case does surface again.

@philrz philrz removed this from the Data MVP1 milestone Jul 16, 2021