Option to analyze/load local paths, not just stdin #94

Open
philrz opened this issue Jun 8, 2021 · 0 comments


philrz commented Jun 8, 2021

While drafting the "Custom Brimcap Configuration" article in #72, I found myself having to create tiny wrapper scripts because a Brimcap analyzer expects its pcap input to be streamed on stdin. For instance, my config YAML looked like:

analyzers:
  - cmd: /usr/local/bin/zeek-wrapper.sh
  - cmd: /usr/local/bin/suricata-wrapper.sh

And those wrapper scripts looked like:

$ cat zeek-wrapper.sh
#!/bin/bash
exec /opt/zeek/bin/zeek -C -r - --exec "event zeek_init() { Log::disable_stream(PacketFilter::LOG); Log::disable_stream(LoadedScripts::LOG); }" local

$ cat suricata-wrapper.sh 
#!/bin/bash -e
exec /usr/local/bin/suricata -r /dev/stdin

If the user's intent is just to run brimcap load or brimcap analyze on pcap file paths on their local workstation (as I expect will be most common), this extra layer of indirection isn't buying them much. What follows is just a straw man proposal, but I imagined we could add an option in the YAML so the full analyzer command line could be brought in, with the provided file path substituted for a placeholder, e.g.:

analyzers:
  - cmd: /opt/zeek/bin/zeek -C -r %PCAPPATH% --exec "event zeek_init() { Log::disable_stream(PacketFilter::LOG); Log::disable_stream(LoadedScripts::LOG); }" local
    inputmode: filepath
  - cmd: /usr/local/bin/suricata -r %PCAPPATH%
    inputmode: filepath
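
To make the straw man a bit more concrete, here is a minimal shell sketch of what the substitution step could amount to. It is only an illustration under stated assumptions: the inputmode option and the %PCAPPATH% token are part of the proposal above and do not exist in Brimcap, and the function name expand_pcap_path is hypothetical. A real implementation would also need to decide how to quote paths that contain spaces.

```shell
#!/bin/bash
# Hypothetical helper: expand the proposed %PCAPPATH% token in an analyzer
# command template. Neither the token nor this function exists in Brimcap.
expand_pcap_path() {
  local template=$1 pcap_path=$2
  local token='%PCAPPATH%'
  local expanded
  # Quoting "$token" makes bash match it literally rather than as a pattern.
  expanded=${template//"$token"/"$pcap_path"}
  printf '%s\n' "$expanded"
}

# Example:
#   expand_pcap_path '/usr/local/bin/suricata -r %PCAPPATH%' /tmp/sample.pcap
#   prints: /usr/local/bin/suricata -r /tmp/sample.pcap
```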

The possible advantages I see with offering this approach:

  1. It keeps the config consolidated by avoiding the proliferation of wrapper scripts
  2. For analyzers that aren't prepared to accept input on stdin (such as the NetFlow example shown in the same article, or off-the-shelf Suricata on Windows, for which we maintain a separate build exclusively to add the stdin support), the user would avoid needing to create wrapper scripts that push stdin to a tmpfile just to pass it off to the analyzer
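
For reference, the kind of stdin-to-tmpfile wrapper mentioned in the second point might look roughly like the sketch below. The function name run_with_tmpfile and the my-analyzer command in the usage comment are hypothetical placeholders, not part of Brimcap or any analyzer.

```shell
#!/bin/bash
# Hypothetical wrapper helper: spool the pcap arriving on stdin to a temporary
# file, then run an analyzer that can only read its input from a path.
run_with_tmpfile() {
  local tmp status
  tmp=$(mktemp) || return 1
  # Copy all of stdin to the temp file before the analyzer starts.
  cat > "$tmp" || { rm -f "$tmp"; return 1; }
  # The temp file path is appended as the analyzer's final argument.
  "$@" "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example use inside a wrapper script, where "my-analyzer -r" stands in for a
# real analyzer command line:
#   run_with_tmpfile my-analyzer -r
```

Note that this approach has to buffer the entire pcap on disk before the analyzer starts, which is exactly the overhead a path-based input mode would avoid for pcaps that are already local files.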

I bounced some of this off @mattnibs, and he had some valid rebuttals about why we wouldn't want to make this our only approach. One of the advantages he pointed out about being stream-focused is that it offers the user the ability to analyze pcaps large enough that they'd be unwieldy to download in full before analysis. For instance, if my Brim app is running locally, this is a way to turn an S3-stored pcap into Zeek+Suricata logs and load those logs directly to a Pool in the Zed Lake behind my app, all without an explicit download of the pcap to a local file:

$ aws s3 cp s3://brim-sampledata/wrccdc.pcap - | brimcap analyze - | zapi load -p wrccdc -
1tgSXaWvlzFDG4dcKfeI2nWo3Ax committed

He also noted the efficiency of a single pcap stream being forked to multiple analyzers, rather than each having to open and analyze a file separately.
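
The general idea of forking one stream to several consumers can be illustrated in plain shell with named pipes and tee. This is only a sketch of the concept and not how Brimcap does it internally. The function name fan_out is hypothetical, and each argument is assumed to be a command string that reads the stream on its stdin.

```shell
#!/bin/bash
# Illustrative only: copy one stdin stream to several commands at once using
# named pipes and tee, so the input is read a single time.
fan_out() {
  local dir fifo cmd i=0
  local fifos=() pids=()
  dir=$(mktemp -d) || return 1
  for cmd in "$@"; do
    fifo="$dir/fifo$i"
    mkfifo "$fifo"
    # Each command reads its own copy of the stream from its FIFO.
    sh -c "$cmd" < "$fifo" &
    pids+=("$!")
    fifos+=("$fifo")
    i=$((i + 1))
  done
  # tee reads stdin once and writes the same bytes to every FIFO.
  tee "${fifos[@]}" > /dev/null
  # Wait for every command to finish before cleaning up.
  wait "${pids[@]}"
  rm -rf "$dir"
}

# Example, reusing the wrapper scripts shown earlier:
#   fan_out /usr/local/bin/zeek-wrapper.sh /usr/local/bin/suricata-wrapper.sh < sample.pcap
```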

All that said, I do still see value in avoiding the proliferation of wrapper scripts if a user is truly working with local pcaps and doesn't need the full efficiency benefits of the streamed approach, so I'm filing this one to possibly reconsider in the future.

@philrz philrz added this to the Data MVP1 milestone Jun 21, 2021
@philrz philrz removed this from the ETL Lake milestone Oct 25, 2022