Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple filter of 1000 hosts takes O(n²) to compile #1255

Open
ole-tange opened this issue Jan 20, 2024 · 3 comments
Open

Simple filter of 1000 hosts takes O(n²) to compile #1255

ole-tange opened this issue Jan 20, 2024 · 3 comments
Labels
optimizer Issues with the filter compiler's optimizer

Comments

@ole-tange
Copy link

ole-tange commented Jan 20, 2024

When giving tcpdump a filter of 1000 hosts to ignore, the compilation of the filter takes at least O(n²) time and RAM.

I would have expected it to take O(n) or worst case O(n log n) for such a simple filter.

The memory usage is also drastic in the compilation phase (8GB for 3000 hosts, 1 GB for 1000 hosts).

After compilation of the filter both CPU usage and memory usage are as expected.

Build

$ git clone https://github.com/the-tcpdump-group/libpcap
$ cd libpcap
$ ./autogen.sh 
$ ./configure 
$ make -j5
$ sudo make install
$ cd ..

$ git clone https://github.com/the-tcpdump-group/tcpdump
$ cd tcpdump
$ ./autogen.sh 
$ ./configure 
$ make -j5
$ sudo make install
$ cd ..

Versions

$ tcpdump  --version
tcpdump version 5.0.0-PRE-GIT
libpcap version 1.11.0-PRE-GIT (with TPACKET_V3)
OpenSSL 3.0.2 15 Mar 2022
$ uname -a
Linux travel 6.5.0-14-lowlatency #14.1~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov 22 16:24:11 UTC x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

MCVE

$ filter() {
    # $1 = How many entries?                                                             
    perl -e '                                                                            
        $pre = "not (";                                                                  
        $post = ")";                                                                     
        $join = " or ";                                                                 
        sub hostport {                                                                   
          $host = sprintf "%d.%d.%d.%d", rand()*255,rand()*255,rand()*255,rand()*255;    
          $port = sprintf "%d", rand()*65535;                                            
          return "(host $host and port $port)";                                          
        }                                                                                
        print $pre, join($join,map { hostport() } 1..shift), $post;                      
    ' $1
}
$ export -f filter
$ sudo time -v tcpdump -c 1 `filter 300`
# Around 1s 100M
$ sudo time -v tcpdump -c 1 `filter 1000`
# Around 15s 1000M
$ sudo time -v tcpdump -c 1 `filter 3000`
# Around 160s 8000M

Measure the time and memory to compile the filter for 300, 1000, 3000 hosts.

@guyharris guyharris transferred this issue from the-tcpdump-group/tcpdump Jan 20, 2024
@infrastation
Copy link
Member

Reproduces as described. Consumption of RAM seems to be specific to the BPF optimizer (on by default).

@guyharris guyharris added the optimizer Issues with the filter compiler's optimizer label Jan 21, 2024
@mcr
Copy link
Member

mcr commented Jan 21, 2024

You are certainly right that it could be better: send code!
eBPF would let you put the list of hosts into some kind of map, but we don't have that in userspace, or assume it exists.

@ole-tange
Copy link
Author

ole-tange commented Jan 25, 2024

Reproduces as described. Consumption of RAM seems to be specific to the BPF optimizer (on by default).

Disabling the optimizer (-O) does not solve the issue. See graph 2 on https://unix.stackexchange.com/questions/767162/tcpdump-takes-on%C2%B2-time-to-parse-filter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Issues with the filter compiler's optimizer
Development

No branches or pull requests

4 participants