Skip to content
This repository has been archived by the owner on Nov 5, 2020. It is now read-only.

Read directly from network device #13

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

blipp
Copy link

@blipp blipp commented Jul 29, 2015

I did some initial development to enable pac-driver to read directly from a network device using libpcap. I would like to propose this as a pull request. In the following, I summarize the work I did, and at the end I list some questions regarding this implementation that should probably be discussed before considering a merge.

The reason I implemented this was because I need to analyse Ethernet packets different from TCP, UDP and ICMP, which are the only protocols that can be analyzed with the Bro plugin. On the long run, it would surely be better to adapt Bro, but for now this seemed too much to dig in.

I introduce a new parameter -n that expects the network device as argument. If it is set, then instead of the normal parseSingleInput call, libpcap is used with a callback function processPacket for every packet that arrives. The code of this function is inspired by the existing code of pac-driver.

I did not implement support for -e and -i because this had no immediate benefit for me, but I think it could be possible. I added this restriction to the help output.

Because grammars usually need to know the length of the captured content, this is done by prepending this information as uint32. Thus, this leads us to some kind of a header before the beginning of the actual Ethernet packet, like in pcap.pac2. I added an example grammar with networkinterface.pac2. After the header it immediately refers to the pcap grammar for the handling of Ethernet packets.

I added a test for this new functionality, which uses tcpreplay to replay dns.trace, which is already included in the source tree. The implementation of the test is kind of ugly, because the grammar does not compile with hilti-build because of #3. As soon as this gets fixed, there is no need for this ugly timer-based approach.

pac-driver seems to be able to process packets quite fast with this addition, although I have no statistics to show.

Questions:

  • At the end of processPacket, there is the commented line GC_DTOR(input, hlt_bytes, ctx);. This line is also present in parseSingleInput and that's why I copied it. The problem is that pac-driver segfaults at the processing of the second packet if this line is executed and that's why I commented it out. I suspect this of being a memory leak. What do you think?
  • Generally, what do you think of processPacket? Do you see possible optimizations?
  • If debugging output is enabled, all captured bytes are printed, which is quite a lot of output. Should this be changed to print just the first x bytes? Or introduce another command line parameter?

Uses libpcap and hands each packet to the parser.
Grammars often depend on the knowledge of the length of the content.
This is especially the case for Ethernet packets. Thus, the modified
pac-driver needs to provide this information to Hilti. The caplen
provided by libpcap as uint32_t is added at the very beginning to the
input of Hilti. In a grammar, this is the first data to be interpreted.
By default, a grammar expects big endian; thus, caplen is converted
accordingly.
@blipp
Copy link
Author

blipp commented Jul 31, 2015

Although this could be derived from the test I wrote, I should provide the commands to test this feature:

pac-driver -n lo -g tests/binpac/parsers/networkinterface/test.pac2 libbinpac/parsers/networkinterface.pac2

tcpreplay --intf1=lo bro/tests/Traces/dns.trace

Execute the first one and wait a little bit until --- pac-driver: opened network device 'lo'. appears. Then execute the second one in another shell. While tcprelay is generating packets, 6 lines with IPV4 should appear one after another in pac-driver's shell.

@rsmmr
Copy link
Owner

rsmmr commented Jul 31, 2015

Nice patch, thanks. Couple thoughts:

  • would be nice to have an offline version too that reads from a trace; that's better suited for testing than going through tcpreplay.
  • I see why you added the Packet type receiving the capture length, but I'm wondering if there's a better way to do that that doesn't require prepending a header to the data. Actually I'm wondering why EthernetPacket requires the capture length in the first place; that's a question for @0xxon. :)

Regarding your questions:

  • Yes, I would expect the GC_DTOR to have to run to release the input object; that DTOR would be the release corresponding to the hlt_bytes_new call. But it depends on what happens in between; something else might take ownership of the object (I haven't look closely enough yet). If it's a memory leak, the test case should tell you, if you hadn't disabled it by setting HILTI_DEBUG. :-)
  • for debugging output, printing just a few bytes per packets sounds good.
  • haven't looked more closely at ProcessPacket() yet

@0xxon
Copy link
Contributor

0xxon commented Aug 2, 2015

Just a bit of context about EtherPacket and why it requires a length. Basically -- the Ethernet frame itself does not contain any length information about its payload. And the payload of the protocols it contains (like, e.g. IP) does not have to be equivalent to the actual payload of the packet (e.g. when someone adds frame-checksums or timestamp information like Arista can do to it). In those case, you would not read the entire packet if you don't have the outside information that tells you the packet length :)

Note - currently I think that length is not really used to make sure that really all data has been read - but it can and should be used for that in the future.

Chosen such that it should only take up one line,
this keeps the output clearer.
Add a new command line option -f. The file is read using libpcap. This
shares a lot of code with listening on a network interface.

The grammar networkinterface.pac2 was renamed to libpcap.pac2 because it
is now used for both reading from pcap files and network devices.

There is now an ambiguity with parsers/libpcap.pac2 and
parsers/pcap.pac2, there should be a way to resolve this.
@blipp
Copy link
Author

blipp commented Aug 3, 2015

Thanks for your feedback!

  • I added the possibility to read directly from a pcap file with a new command line option -f. Thanks for this idea, I think this is really useful. Now I wonder why I did not think of it... It uses the same code than for listening on a network interface, just the specification of the source is different. Example usage:
    • pac-driver -f bro/tests/Traces/dns.trace -g tests/binpac/parsers/libpcap/pcapfile.pac2 libbinpac/parsers/libpcap.pac2 (note that I changed some file names!)
    • If - is given as file name, libpcap reads from stdin: cat bro/tests/Traces/dns.trace | pac-driver -f - -g tests/binpac/parsers/libpcap/pcapfile.pac2 libbinpac/parsers/libpcap.pac2.
  • With @0xxon's comment I think it's clear that the caplen should stay. Thanks for the information!
  • I changed the debug output to only print the first 14 bytes. This way the output should fit into one line.
  • I renamed networkinterface.pac2 to libpcap.pac2 because this grammar is now used by both reading from pcap files and listening on network interfaces. This name is very similar to pcap.pac2 which could lead to confusion. I am not sure though how to resolve this. Both files have their reason, one for parsing pcap files as they are (pcap.pac2), and the other for things read by libpcap (libpcap.pac2). Any ideas? Maybe rename pcap.pac2 to pcapfile.pac2?
  • Regarding memory leaks, I did not disable the test for leaks when starting to write the test and it showed memory leaks. But this was at the time when I tried to get the test running at the first place, so I disabled it. Still have to look at it. If you have some time I would be glad if you could look at it, because I am not very experienced with Hilti's memory management yet.

When using libpcap to read from a network interface or a pcap file,
hand over the packet capture time and also the total length of the
packet; not only the captured length.

Currently, the timestamps are handled as being 64 bit. This has to be
generalized to cover platforms where time_t and suseconds_t are 32 bit,
as well.
@blipp
Copy link
Author

blipp commented Aug 21, 2015

I enhanced the patch so now the packet capture time and the actual length of the packet is given to the grammar as well. Tests and grammars still have to be adapted.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants