Dropped packets when capturing from multiple interfaces #1220

Open
solemnwarning opened this issue Sep 7, 2023 · 7 comments

@solemnwarning

I'm having issues with capturing from multiple devices in the same process using libpcap on Linux.

Unfortunately I haven't been able to reduce this to a simple test case - I can only reproduce it as part of a big regression test suite for some other software, but as far as I've been able to figure out, when pcap (via the Net::Pcap Perl module) is opened on multiple devices concurrently, not all of the captures actually receive packets.

This was introduced in libpcap 1.5.0 and is still present on master, specifically this commit:

commit 8ada1d5b98ac62c4ae9acbecb0639beeebd8a359 (refs/bisect/bad)
Author: Gabor Tatarka <gabor.tatarka@ericsson.com>
Date:   Thu Oct 17 15:12:09 2013 +0200

    Added TPACKET_V3 support.

So I'm not really sure where the problem is - whether it's the TPACKET_V3 support in the kernel, the TPACKET_V3 support in libpcap or some other behaviour in the library that was changed by the same commit.

I'm hoping someone more familiar with pcap might know what's happening here. Happy to test any patches/theories.

@infrastation
Member

Since you mention TPACKET_V3, this must be Linux. Does the setup use the "any" pseudo-interface or parallel independent captures, each on a separate interface? If it is the latter, does the software drain the buffers in a multi-threaded or a single-threaded fashion?

I also wonder if the immediate delivery mode and the buffer size are factors here.

@solemnwarning
Author

It's all single-threaded. It uses a separate capture for each interface, kicks off the processes which will generate the traffic, sleeps to let things settle and then reads in each capture's buffer with pcap_dispatch().

On Linux, with previous releases of libpcap, capture devices are always in immediate mode; however, in 1.5.0 and later, they are, by default, not in immediate mode, so if pcap_set_immediate_mode() is available, it should be used.

That sounds like it could be relevant. It's weird that it doesn't seem to affect all the capture interfaces at the same time, but I'll do some testing with it tonight.

@guyharris
Member

It's all single-threaded. It uses a separate capture for each interface,

Presumably means it opens a separate pcap_t (or whatever Net::Pcap object has a pcap_t) for each interface.

and then reads in each capture's buffer with pcap_dispatch().

Does this mean it does something such as

for (each capture device handle)
    pcap_dispatch(that handle);

i.e., that it proceeds sequentially through all the interfaces, processing them one at a time?

@solemnwarning
Author

@guyharris yes to both

@guyharris
Member

yes to both

Libpcap doesn't guarantee that will work.

In particular, the pcap_dispatch() call could block for a long period of time if no packets arrive on that interface for a long period of time.

What you should do is, first, to put all of the pcap_ts into non-blocking mode.

Then, do something such as (C-style pseudo-code):

create an empty set of file descriptors.
for (each capture device handle) {
    get the result of `pcap_get_selectable_fd()` on that handle;
    if (that result is not -1)
        add it to the set of file descriptors;
}
for (;;) {
    set a `struct timeval` to a huge timeout;
    for (each capture device handle) {
        get the results of `pcap_get_required_select_timeout()` on that handle;
        if it's not NULL {
            if it's less than the value in the aforementioned `struct timeval`
                set that `struct timeval` to this value;
        }
    }

    do a `select()`/`poll()`/`epoll()`/etc. on the specified set of file descriptors, checking for readability, and using the `struct timeval`'s value as the timeout if it's not the very large amount of time you set it to;
    if a timeout occurred
        call `pcap_dispatch()` on all the capture device handles that returned a non-null required select timeout;
    else {
        for (all file descriptors that are readable)
            call `pcap_dispatch()` on the capture device handle with that descriptor as its selectable file descriptor;
    }
}

@solemnwarning
Author

@infrastation thanks for the pointer, it was the immediate delivery mode. I patched a call to pcap_set_immediate_mode() into Net::Pcap and all is well. Net::Pcap only exposes the pcap_open_live() API atm, so I'll take this up over there.

@guyharris what doesn't libpcap guarantee here? The capture is already in non-blocking mode so that isn't a problem, and the buffer is more than large enough to accommodate the whole capture.

@guyharris
Member

guyharris commented Sep 9, 2023

[UPDATED: fixed the last sentence to say pcap_setnonblock() rather than pcap_set_immediate_mode().]

I patched a call to pcap_set_immediate_mode() into Net::Pcap and all is well.

There's a tradeoff between immediate and non-immediate mode. Immediate mode delivers packets immediately, so you get one wakeup per packet, so that's one system call per packet and, in systems where packet data is copied from the kernel to userland (Linux isn't such a system unless you have a really really old kernel and an old version of libpcap), that's one kernel-to-user copy per packet. Without immediate mode, packets can be delivered in batches, with one wakeup and one system call (and, without memory-mapped capture, one copy) per batch, which is more efficient.

This means that running in immediate mode could increase the chances of packet drops if you're getting high traffic.

The capture is already in non-blocking mode

In other words, you've called, for each of the capture handles, whatever Net::Pcap call results in calling pcap_setnonblock() on that handle?
