
UDP packet drops and strange core usage behavior #152

Open
vigodeltoro opened this issue Mar 31, 2023 · 16 comments
Labels
performance Related to application performance

Comments

@vigodeltoro

Hi Louis,

we are seeing some strange behavior that we can't explain; maybe you can help.
We are missing IPFIX data compared to another system in our company that receives the same data with a different type of IPFIX collector.

Further investigation led us to UDP receive errors / drops at the UDP receive queue on our GoFlow2 IPFIX collector server (monitored with dropwatch):

8 drops at tpacket_rcv+5f (0xffffffff973642df)
464 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
988 drops at tpacket_rcv+5f (0xffffffff973642df)
5 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
10 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
3 drops at tpacket_rcv+5f (0xffffffff973642df)
8 drops at tpacket_rcv+5f (0xffffffff973642df)
7 drops at tpacket_rcv+5f (0xffffffff973642df)
11 drops at tpacket_rcv+5f (0xffffffff973642df)
7 drops at tpacket_rcv+5f (0xffffffff973642df)
7 drops at tpacket_rcv+5f (0xffffffff973642df)
6 drops at tpacket_rcv+5f (0xffffffff973642df)
8 drops at tpacket_rcv+5f (0xffffffff973642df)
7 drops at tpacket_rcv+5f (0xffffffff973642df)
1049 drops at tpacket_rcv+5f (0xffffffff973642df)
434 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
5 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
3 drops at tcp_v4_rcv+87 (0xffffffff972cc147)

The system is a 12-core server with 16 GB RAM running Oracle Linux (kernel 3.10.0-1160.71.1.el7.x86_64):
net.core.rmem_max=16777216
net.core.rmem_default=212992
net.core.wmem_max=16777216
net.core.rmem_default=212992
net.core.netdev_max_backlog = 8000
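
As a cross-check next to dropwatch, the same losses also show up in the kernel's UDP counters (InErrors / RcvbufErrors in /proc/net/snmp). A minimal Go sketch to watch them, purely illustrative and not part of goflow2:

// watchudp.go: prints the kernel's UDP InErrors and RcvbufErrors counters
// from /proc/net/snmp every few seconds. Hypothetical helper, not goflow2 code.
package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
    "time"
)

func udpCounters() (map[string]string, error) {
    f, err := os.Open("/proc/net/snmp")
    if err != nil {
        return nil, err
    }
    defer f.Close()

    var header, values []string
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        fields := strings.Fields(sc.Text())
        if len(fields) == 0 || fields[0] != "Udp:" {
            continue
        }
        if header == nil {
            header = fields[1:] // first "Udp:" line holds the column names
        } else {
            values = fields[1:] // second "Udp:" line holds the counter values
        }
    }
    counters := map[string]string{}
    for i, name := range header {
        if i < len(values) {
            counters[name] = values[i]
        }
    }
    return counters, sc.Err()
}

func main() {
    for {
        c, err := udpCounters()
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Printf("InErrors=%s RcvbufErrors=%s\n", c["InErrors"], c["RcvbufErrors"])
        time.Sleep(5 * time.Second)
    }
}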

Our goflow2 runs in a Docker container and is built from the branch with the Nokia fix you did for us last year (#105, #106).

The strange behavior: when I run the container, I see only 8 cores used by goflow2, along with the drops mentioned above. Usage of those 8 cores averages 55-60% at 130-150k IPFIX messages/s (protobuf to Kafka).

Our compose config:

version: "3"
services:
  goflow:
    build:
      context: ../../
      dockerfile: Dockerfile
    network_mode: host
    ports:
      - IP:8080:8080
      - IP:2055:2055/udp
    restart: always
    command:
      - -reuseport
      - -format=pb
      - -format.protobuf.fixedlen=true
      - -listen=netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055
      - -mapping=/etc/mapping/mapping.yml
      - -transport=kafka
      - -transport.kafka.brokers=IP:PORT
      - -transport.kafka.topic=ipfix
      - -transport.kafka.hashing=true
      - -format.hash=SrcMac
    volumes:
      - ./mapping:/etc/mapping
      - ./logs:/tmp/logs

Our next idea was to update the goflow2 version, but because of the camel-case fixes the last version we can use is commit f542b64.

So we tested it both as a Docker container and as a directly compiled goflow2 without Docker.
With the "new" Docker container I have the same core usage issue, a lower usage percentage, fewer read IPFIX packets (around 80-100k in the Grafana goflow2 metrics), and about 10 times more drops.

With the directly compiled process all cores are used, which is interesting, but the numbers are the same: 80-100k packets read and about 10 times more drops. To reach that I have to set workers to 12; with fewer than 12 workers, fewer cores are used.

command to run:
./goflow2 -reuseport -format=pb -format.protobuf.fixedlen=true -listen=netflow://IP:2055?count=12 -mapping=/root/goflow2/compose/kcg/mapping/mapping.yml -transport=kafka -transport.kafka.brokers=IP:9094 -transport.kafka.topic=ipfix -transport.kafka.hashing=true -format.hash=SrcMac -workers=12
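
(For context on what -reuseport and the repeated listen addresses do: each listener binds its own UDP socket to the same port with SO_REUSEPORT set, so the kernel hashes incoming datagrams across the sockets. A minimal sketch of that pattern, as an illustration only and not goflow2's actual code:)

// reuseport.go: opens several UDP sockets bound to the same address with
// SO_REUSEPORT, so the kernel spreads incoming datagrams across them.
// Illustration of the socket option, not goflow2's implementation.
package main

import (
    "context"
    "fmt"
    "net"
    "syscall"

    "golang.org/x/sys/unix"
)

func listenReusePort(addr string) (net.PacketConn, error) {
    lc := net.ListenConfig{
        Control: func(network, address string, c syscall.RawConn) error {
            var serr error
            if err := c.Control(func(fd uintptr) {
                serr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
            }); err != nil {
                return err
            }
            return serr
        },
    }
    return lc.ListenPacket(context.Background(), "udp", addr)
}

func main() {
    const sockets = 12
    for i := 0; i < sockets; i++ {
        conn, err := listenReusePort("0.0.0.0:2055")
        if err != nil {
            panic(err)
        }
        go func(id int, c net.PacketConn) {
            buf := make([]byte, 9000)
            for {
                n, src, err := c.ReadFrom(buf)
                if err != nil {
                    return
                }
                fmt.Printf("socket %d: %d bytes from %s\n", id, n, src)
            }
        }(i, conn)
    }
    select {} // block forever while the receive goroutines run
}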

Maybe you have an idea of what could explain this strange behavior.

Thanks a lot and best regards
Christian

@lspgn
Member

lspgn commented Mar 31, 2023

Hi @vigodeltoro
I am working on major optimizations at the moment (app/refactor branch, see #150) which I am hoping will simplify the sockets and workers setup. Additionally reducing the amount of buffer allocations via sync.Pool.
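
For illustration, the buffer reuse referred to here looks roughly like this (a generic sync.Pool sketch, not the actual app/refactor code):

// Generic sketch: reuse receive buffers via sync.Pool instead of allocating
// a fresh slice per datagram. Not the actual app/refactor implementation.
package collector

import (
    "net"
    "sync"
)

var bufPool = sync.Pool{
    New: func() any { b := make([]byte, 9000); return &b },
}

func receiveLoop(conn net.PacketConn, out chan<- []byte) {
    for {
        bp := bufPool.Get().(*[]byte)
        n, _, err := conn.ReadFrom(*bp)
        if err != nil {
            bufPool.Put(bp)
            return
        }
        // Hand off only the bytes read; the large backing buffer is returned
        // to the pool instead of being garbage collected.
        pkt := make([]byte, n)
        copy(pkt, (*bp)[:n])
        bufPool.Put(bp)
        out <- pkt
    }
}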

Could you confirm with ss -ulpn that the expected number of sockets is opened?

If I'm reading correctly, a single machine is currently processing 150k flows? Any possibility to scale horizontally (eg: with ECMP)?

Are you running the Docker version with host networking? Or is it built with Docker? Because the Dockerfile uses Alpine, the issue could be coming from the musl library.

With IPFIX it could be the templates lock, if traffic bursts regularly?
Could you try to double net.core.rmem_max?
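
For background on why rmem_max matters: the kernel clamps whatever per-socket receive buffer a process requests to net.core.rmem_max, so a higher limit gives each UDP socket more room to absorb bursts before udp_queue_rcv_skb starts dropping. A generic sketch of requesting a bigger buffer (not goflow2 code):

// Generic sketch: request a large per-socket receive buffer. The effective
// size is capped by net.core.rmem_max and can be inspected with ss -ulmpn.
package main

import (
    "log"
    "net"
)

func main() {
    addr, err := net.ResolveUDPAddr("udp", "0.0.0.0:2055")
    if err != nil {
        log.Fatal(err)
    }
    conn, err := net.ListenUDP("udp", addr)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // Ask for a 16 MiB receive buffer; the kernel clamps it to rmem_max.
    if err := conn.SetReadBuffer(16 * 1024 * 1024); err != nil {
        log.Fatal(err)
    }

    buf := make([]byte, 9000)
    for {
        n, src, err := conn.ReadFromUDP(buf)
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("%d bytes from %s", n, src)
    }
}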

@vigodeltoro
Author

vigodeltoro commented Mar 31, 2023

Hi Louis, I'm on holiday for one week starting this evening, but I pinged my colleagues.
If they have time they will take care of it, and I will have a look as well when I'm back.

Thanks :) for the fast reply
best
Christian

@vigodeltoro
Author

Hi Louis,

Okay... I couldn't keep my fingers still ;)

> I am working on major optimizations at the moment (app/refactor branch, see #150) which I am hoping will simplify the sockets and workers setup. Additionally reducing the amount of buffer allocations via sync.Pool.

That sounds very promising..

Yes, at the moment we are trying to run 150k on one server. Horizontal scaling is an option but needs a bit more time to set up.
And yes, we are using the Docker version with host networking.
The ss command says (12 listeners, 12 workers configured, Docker container with host network):
State Recv-Q Send-Q Local Address:Port Peer Address:Port
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=93011,fd=19))
UNCONN 2176 0 IP:2055 : users:(("goflow2",pid=93011,fd=18))
UNCONN 180608 0 IP:2055 : users:(("goflow2",pid=93011,fd=17))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=93011,fd=13))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=93011,fd=16))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=93011,fd=15))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=93011,fd=14))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=93011,fd=12))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=93011,fd=11))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=93011,fd=9))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=93011,fd=8))
UNCONN 60928 0 IP:2055 : users:(("goflow2",pid=93011,fd=7))

So, 12 sockets are opened, and I now see 11 goflow2 processes (htop), but core usage (~70%) on only 8 cores. Drop counts:

382 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
1 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
2 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
263 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
1 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
2 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
306 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
4 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
8 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
2 drops at tpacket_rcv+5f (0xffffffff973642df)
1 drops at tcp_v4_rcv+87 (0xffffffff972cc147)
2462 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
1 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
3 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
1425 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
2 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
5 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
258 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
2 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
4 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
3620 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
2 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
5 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
10901 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
1 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)

If I double net.core.rmem_max, I see 12 goflow2 processes:
[screenshot]

dropwatch:
4 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
1 drops at __udp4_lib_rcv+bb (0xffffffff972d7e2b)
391 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
2 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
4 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
464 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
8 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
17 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
237 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
2 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
4 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
1 drops at __udp4_lib_rcv+bb (0xffffffff972d7e2b)
403 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
3 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
6 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
211 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
3 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
6 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
231 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
2 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
4 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
319 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
2 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
4 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
351 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
6 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
12 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
226 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
1 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
2 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)
292 drops at udp_queue_rcv_skb+3df (0xffffffff972d7c6f)
3 drops at tcp_rcv_state_process+1bc (0xffffffff972c096c)
7 drops at tcp_v4_do_rcv+80 (0xffffffff972cafc0)

Last test without Docker: native goflow2 (commit f542b64).

./goflow2 -reuseport -format=pb -format.protobuf.fixedlen=true -listen=netflow://IP:2055?count=12 -mapping=/root/goflow2/compose/kcg/mapping/mapping.yml -transport=kafka -transport.kafka.brokers=IP:9094 -transport.kafka.topic=ipfix -transport.kafka.hashing=true -format.hash=SrcMac -workers=12

net.core.rmem_max=33554432
12 cores used and load on all cores, but:
[screenshots]

ss -ulpn
State Recv-Q Send-Q Local Address:Port Peer Address:Port
UNCONN 211072 0 IP:2055 : users:(("goflow2",pid=96865,fd=8))

Only one socket, that's interesting.

Trying that command:

./goflow2 -reuseport -format=pb -format.protobuf.fixedlen=true -listen=netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055,netflow://IP:2055 -mapping=/root/goflow2/compose/kcg/mapping/mapping.yml -transport=kafka -transport.kafka.brokers=IP:9094 -transport.kafka.topic=ipfix -transport.kafka.hashing=true -format.hash=SrcMac -workers=12

$ ss -ulpn
State Recv-Q Send-Q Local Address:Port Peer Address:Port
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=98959,fd=19))
UNCONN 21760 0 IP:2055 : users:(("goflow2",pid=98959,fd=18))
UNCONN 8704 0 IP:2055 : users:(("goflow2",pid=98959,fd=17))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=98959,fd=16))
UNCONN 32640 0 IP:2055 : users:(("goflow2",pid=98959,fd=15))
UNCONN 47872 0 IP:2055 : users:(("goflow2",pid=98959,fd=14))
UNCONN 17408 0 IP:2055 : users:(("goflow2",pid=98959,fd=13))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=98959,fd=11))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=98959,fd=12))
UNCONN 4352 0 IP:2055 : users:(("goflow2",pid=98959,fd=9))
UNCONN 0 0 IP:2055 : users:(("goflow2",pid=98959,fd=8))
UNCONN 19584 0 IP:2055 : users:(("goflow2",pid=98959,fd=7))

[screenshots]

dropwatch

[screenshot]

Seems to be fewer drops.

That is the most promising setup so far.

best
Christian

@lspgn
Member

lspgn commented Apr 16, 2023

Would you be able to run the tests using #150 ?

./goflow2 -listen sflow://:6343?count=10,netflow://:2055?count=10

@vigodeltoro
Author

vigodeltoro commented Apr 20, 2023

Hi,

I tried to run the test, but I'm struggling to install the new release. After compiling the new flow.proto file with "make proto" and the goflow2 binary with "make build", I try to start goflow2 but get:

INFO[0000] Starting GoFlow2
INFO[0000] Starting collection count=1 hostname=XXX.XXX.XXX.XXX port=2057 scheme=netflow
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x95e7f6]

goroutine 38 [running]:
sync.(*RWMutex).RLock(...)
/usr/lib/golang/src/sync/rwmutex.go:61
github.com/netsampler/goflow2/utils.(*StateNetFlow).DecodeFlow(0xc0002c2280, {0xac6480?, 0xc000122360?})
/root/goflow2-1.3.1/goflow2-1.3.1/utils/netflow.go:92 +0x216
github.com/netsampler/goflow2/decoders.Worker.Start.func1()
/root/goflow2-1.3.1/goflow2-1.3.1/decoders/decoder.go:48 +0x138
created by github.com/netsampler/goflow2/decoders.Worker.Start
/root/goflow2-1.3.1/goflow2-1.3.1/decoders/decoder.go:39 +0xb7

I'm on CentOS 7.9 Kernel: 3.10.0-1160.71.1.el7.x86_64
go version go1.18.9 linux/amd64

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/golang"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.9"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2372343291=/tmp/go-build -gno-record-gcc-switches"

The previous goflow versions were functional.

If I use the prebuilt RPM from this repo I get:

goflow2: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by goflow2)
goflow2: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by goflow2)

I tried this knowing it is only a test, since I need to add our modified flow.proto for a fully functional goflow2.

Do you have a clue what's wrong?

Best and thanks
Christian

@lspgn lspgn mentioned this issue Apr 20, 2023
@lspgn
Member

lspgn commented Apr 20, 2023

Hi @vigodeltoro,
I introduced a regression in #49 but fixed it in #156. Sorry about that!
Could you test with 1.3.2?

@vigodeltoro
Author

Hi @lspgn
Ah, no problem. I got it up and running :) thanks.
I will be back with more feedback once our pipeline is running again.
We are facing issues with the protobuf schema changes :/ which caused us a lot of work. I hope that won't happen too often in the future.

@lspgn
Member

lspgn commented Apr 27, 2023

Thank you for confirming :)

> We are facing issues with the protobuf schema changes :/ which caused us a lot of work. I hope that won't happen too often in the future.

I tried to keep as many fields as possible while clearing out ones that could be replaced by a custom mapping.
Was it the time fields that went from seconds to nanoseconds?
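
(If it is the timestamps: converting a nanosecond epoch value back to seconds on the consumer side is a one-liner; the field name below is only an example, not an actual protobuf field:)

// Sketch: turning a nanosecond epoch timestamp back into seconds / time.Time
// on the consumer side. timeReceivedNs is an illustrative name only.
package main

import (
    "fmt"
    "time"
)

func main() {
    var timeReceivedNs uint64 = 1682607600123456789 // example value
    t := time.Unix(0, int64(timeReceivedNs))
    fmt.Println(t.UTC(), "=", timeReceivedNs/1_000_000_000, "seconds since epoch")
}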

@vigodeltoro
Author

Hi,

so it took a little longer, sorry for that, but we had major problems fixing our pipeline with the new protobuf schema.
It is working now, but I have to say we had to roll back the update.

With the goflow2-f542b64 commit and 12 cores we have nearly 0 UDP receive errors at 200-250k messages/s, but with the new 1.3.3 release we lose around 40% of the messages to UDP receive errors. I don't have a clue why; maybe you have an idea.

These are our kernel values, which run smoothly with the old commit:

net.core.netdev_budget = 300
net.core.netdev_max_backlog = 8000
net.core.optmem_max = 20480
net.core.rmem_default = 102400000
net.core.rmem_max = 102400000
net.core.rps_sock_flow_entries = 0
net.core.somaxconn = 128
net.core.warnings = 1
net.core.wmem_default = 102400000
net.core.wmem_max = 102400000

We split the stream to 2 sockets (with both goflow2 versions) and start goflow2 with the following parameters:

Old commit (start commands):
./goflow2 -reuseport -format=pb -format.protobuf.fixedlen=true -listen=netflow://xxx.xxx.xxx.xxx:2056,netflow://xxx.xxx.xxx.xxx:2056,netflow://xxx.xxx.xxx.xxx:2056,netflow://xxx.xxx.xxx.xxx:2056,netflow://xxx.xxx.xxx.xxx:2056,netflow://xxx.xxx.xxx.xxx:2056,netflow://xxx.xxx.xxx.xxx:2056,netflow://xxx.xxx.xxx.xxx:2056 -mapping=/etc/goflow2/mapping.yml -transport=kafka -transport.kafka.brokers=xxx.xxx.xxx.xxx:9094 -transport.kafka.topic=ipfix -transport.kafka.hashing=true -transport.kafka.version=3.4.0 -format.hash=SrcMac -metrics.addr=xxx.xxx.xxx.xxx:8082 -workers=8

./goflow2 -reuseport -format=pb -format.protobuf.fixedlen=true -listen=netflow://xxx.xxx.xxx.xxx:2055,netflow://xxx.xxx.xxx.xxx:2055,netflow://xxx.xxx.xxx.xxx:2055,netflow://xxx.xxx.xxx.xxx:2055 -mapping=/etc/goflow2/mapping.yml -transport=kafka -transport.kafka.brokers=xxx.xxx.xxx.xxx:xxx -transport.kafka.topic=ipfix -transport.kafka.hashing=true -transport.kafka.version=3.4.0 -format.hash=SrcMac -metrics.addr=xxx.xxx.xxx.xxx:8081

v1.3.3 start commands:

./goflow2 -reuseport -format=pb -format.protobuf.fixedlen=true -listen=netflow://xxx.xxx.xxx.xxx:2056?count=8 -mapping=/etc/goflow2/mapping.yml -transport=kafka -transport.kafka.brokers=xxx.xxx.xxx.xxx:xxx -transport.kafka.topic=ipfix -transport.kafka.hashing=true -transport.kafka.version=3.4.0 -format.hash=SrcMac -metrics.addr=xxx.xxx.xxx.xxx:8082 -workers 8

./goflow2 -reuseport -format=pb -format.protobuf.fixedlen=true -listen=netflow://xxx.xxx.xxx.xxx:2055?count=4 -mapping=/etc/goflow2/mapping.yml -transport=kafka -transport.kafka.brokers=xxx.xxx.xxx.xxx:xxx -transport.kafka.topic=ipfix -transport.kafka.hashing=true -transport.kafka.version=3.4.0 -format.hash=SrcMac -metrics.addr=xxx.xxx.xxx.xxx:8081 -workers 4

BTW: even when the IPFIX packets read by goflow2 are written to /dev/null, I still get the UDP receive errors, so I can exclude a back-pressure issue with Kafka.

Best and thanks

@lspgn
Member

lspgn commented May 10, 2023

Hi @vigodeltoro
Really sorry for what you're experiencing :(

Would you be able to test other versions?
For instance 5529d49.

In the case of v1.3.3, have you tried without count=4 and just repeating the listen parameters again?

Regarding v2, were the issues with ClickHouse? I shuffled the schema quite a bit; is the timestamp posing an issue?

@vigodeltoro
Author

vigodeltoro commented May 10, 2023

Hi @lspgn

no problem; we have a lot of traffic, so it's hard to test with that amount of load.
I will test, but I have to wait for an additional server, which I hope to get next week.
When I have the additional system I will test again. At the moment I only have the production IPFIX collector and am waiting for a test server with the same specs.

Stay tuned, I will come back with new results.

@lspgn
Member

lspgn commented Jun 8, 2023

Hi @vigodeltoro !
Let me know if you're still experiencing the issue

@vigodeltoro
Author

Hi @lspgn
I was not able to work on it because of the missing server and other things, but I'm back on the topic now.
I will test 1.3.4 and the commit you mentioned in the next few days :)

@vigodeltoro
Author

Hi @lspgn

I tested on my new server (8 cores, 8 GiB RAM) with the following kernel parameters:

net.core.rmem_max=2048000000
net.core.rmem_default=2048000000
net.core.wmem_max=2048000000
net.core.wmem_default=2048000000

I'm on it, but I need to ask for a more powerful test server (like the one we are using for production now), because I'm not able to run the old version (commit goflow2-f542b64) on the new VM without packet loss. The CPU of the new server is slower and we have fewer cores :/

I write to /dev/null for testing:
./goflow2 -reuseport -format=pb -format.protobuf.fixedlen=true -listen=netflow://xxx.xxx.xxx.xxx:2055,netflow://xxx.xxx.xxx.xxx:2055,netflow://xxx.xxx.xxx.xxx:2055,netflow://xxx.xxx.xxx.xxx:2055,netflow://xxx.xxx.xxx.xxx:2055,netflow://xxx.xxx.xxx.xxx:2055,netflow://xxx.xxx.xxx.xxx:2055,netflow://xxx.xxx.xxx.xxx:2055 -mapping=/etc/default/goflow2-mapping.yml -transport=file -transport.file=/dev/null -format.hash=SrcMac -workers=8

A first careful result:
goflow2-f542b64 has less packet loss than goflow2-1.1.1-8-g5529d49-linux-x86_64, and
goflow2-1.1.1-8-g5529d49-linux-x86_64 has less than goflow2-1.3.4-0-ge5696f1-linux-x86_64.

But handle these results carefully; I will repeat the tests once I have more compute and can run goflow2-f542b64 without packet loss.

Best regards
and thanks

@vigodeltoro
Author

Hi @lspgn

I now have 16 cores (Intel(R) Xeon(R) CPU E5-2630L v4 @ 1.80GHz).
net.core.rmem_max=3072000000
net.core.rmem_default=3072000000
net.core.wmem_max=3072000000
net.core.wmem_default=3072000000

But still packet loss, even with the old version.

We are trying to split the traffic per switch with iptables and NAT. Let's see.

BTW: what exactly is the worker parameter? The documentation is not really clear about it.

Thanks a lot..

best

@lspgn
Member

lspgn commented Jun 23, 2023

@vigodeltoro

> BTW: what exactly is the worker parameter? The documentation is not really clear about it.

I use a worker-pool design where a worker is allocated to decode a sample (in v2 there is one worker per "socket").
You can try increasing the count.
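
Roughly the shape of that design, for reference (a generic worker-pool sketch, not the exact goflow2 code):

// Generic worker-pool sketch: N goroutines pull raw datagrams from a shared
// channel and decode them. Not the exact goflow2 implementation.
package pool

import "sync"

type Packet struct {
    Src  string
    Data []byte
}

// Run starts `workers` decoding goroutines and blocks until `in` is closed.
func Run(workers int, in <-chan Packet, decode func(Packet)) {
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for pkt := range in {
                decode(pkt) // e.g. NetFlow/IPFIX decoding plus serialization
            }
        }()
    }
    wg.Wait()
}

More workers only help when the receiving sockets can hand packets off fast enough, which is why the worker count and the listener/count setting interact.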

Regarding performance: the only way forward might be to provide patches and do live testing.
You could also attempt pprof, but I'm not sure it will show where the function mostly waits.
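
For the pprof route, exposing Go's standard profiling endpoint is enough to pull CPU and block profiles while the collector is under load (standard net/http/pprof usage, nothing goflow2-specific):

// Standard net/http/pprof usage: serves /debug/pprof/ so profiles can be
// pulled with `go tool pprof` while the collector is running.
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof handlers
)

func main() {
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    // ... start the collector here ...
    select {}
}

A 30-second CPU profile can then be fetched with: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30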

lspgn added the performance label Sep 1, 2023