Less copying, more batch optimisation #820

droe · 2024-03-13T22:12:56Z

Switch from writing each packet into a single buffer and then copying that buffer into the batch, to letting the probe modules write directly into the batch. To that end, split the probe module's thread_initialize callback into thread_initialize and prepare_packet, allowing the latter to be called on each buffer in a batch.
Remove the almost unused ip field from batch->packets[], as it was only used on an error path of BSD's send code, and the IP can also be read from the packet data. While removing the ip field, align the packet data in batch such that the IP headers start at a 32-bit boundary, speeding up 32-bit header field access.

Combined, these give me a send rate improvement of about 2.9%, two thirds of which comes from the first change. Again, I realize some or all of this may be arguable.

The change in probe module interface could also be a step towards potential future work of letting probe modules write directly into mapped NIC memory in netmap mode (tho I suspect that might be more intrusive than it's worth).

This is a revised version of #818, now without breaking --dryrun. Tested on macOS Sonoma, FreeBSD 14 and Ubuntu 23.10; smoke tested various probe modules; test-shard.sh and test_big_group.sh succeed.

Introduce new prepare_packet callback for the initialization of send buffers; contrary to the per-thread initialization callback where this was done previously, prepare_packet can be called multiple times, once for each send buffer. Make use of this to prepare packets to send directly in the batch buffers instead of copying them over.
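To illustrate the callback split described above, here is a minimal sketch in C. It is not ZMap's actual code: every `sketch_` and `toy_` name, the callback signatures, the batch layout, and the sizes are assumptions made up for this example. It only shows the idea that `prepare_packet` runs once per send buffer, so each probe can then be built in place in the batch rather than in a staging buffer that is copied over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BATCH_SIZE 4   /* hypothetical batch size */
#define SKETCH_PACKET_SIZE 64 /* hypothetical per-packet buffer size */

/* Hypothetical, simplified batch: one buffer per packet slot. */
struct sketch_batch {
	uint8_t bufs[SKETCH_BATCH_SIZE][SKETCH_PACKET_SIZE];
	size_t lens[SKETCH_BATCH_SIZE];
	size_t used;
};

/* Hypothetical probe module interface after the split. */
struct sketch_probe_module {
	/* Once per send thread: per-thread state only, no packet buffer. */
	int (*thread_initialize)(void **thread_state);
	/* Once per send buffer: write the static part of the packet. */
	int (*prepare_packet)(void *buf, void *thread_state);
	/* Once per probe: fill in the per-target fields. */
	int (*make_packet)(void *buf, size_t *len, uint32_t dst_ip,
			   void *thread_state);
};

/* Toy module: 4 static template bytes followed by a 4-byte target. */
static const uint8_t toy_template[4] = {0xde, 0xad, 0xbe, 0xef};

static int toy_thread_initialize(void **thread_state)
{
	*thread_state = NULL; /* the toy module keeps no per-thread state */
	return 0;
}

static int toy_prepare_packet(void *buf, void *thread_state)
{
	(void)thread_state;
	memset(buf, 0, SKETCH_PACKET_SIZE);
	memcpy(buf, toy_template, sizeof(toy_template));
	return 0;
}

static int toy_make_packet(void *buf, size_t *len, uint32_t dst_ip,
			   void *thread_state)
{
	(void)thread_state;
	memcpy((uint8_t *)buf + sizeof(toy_template), &dst_ip, sizeof(dst_ip));
	*len = sizeof(toy_template) + sizeof(dst_ip);
	return 0;
}

static const struct sketch_probe_module toy_module = {
	toy_thread_initialize, toy_prepare_packet, toy_make_packet,
};

/* Run prepare_packet once for every buffer in the batch. */
static int sketch_batch_prepare(struct sketch_batch *b,
				const struct sketch_probe_module *m,
				void *state)
{
	b->used = 0;
	for (size_t i = 0; i < SKETCH_BATCH_SIZE; i++) {
		if (m->prepare_packet(b->bufs[i], state) != 0)
			return -1;
	}
	return 0;
}

/* Build the next probe in place in the batch: no staging buffer, no copy. */
static int sketch_batch_add(struct sketch_batch *b,
			    const struct sketch_probe_module *m,
			    uint32_t dst_ip, void *state)
{
	assert(b->used < SKETCH_BATCH_SIZE);
	size_t i = b->used;
	if (m->make_packet(b->bufs[i], &b->lens[i], dst_ip, state) != 0)
		return -1;
	b->used++;
	return 0;
}
```

In this sketch a send thread would call `thread_initialize` once, `sketch_batch_prepare` once per batch buffer set, and `sketch_batch_add` once per target; the static template bytes survive across probes because only the per-target fields are rewritten.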

The ip field was still used only in send-bsd, and there only on a failure path for logging, which does not seem like a strong justification for keeping it around, especially given that it can always be read from the packet data. While removing the ip field, align buf such that the IP header starts at a 32-bit-aligned address, improving performance of IP header field access.
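The alignment point can be sketched as follows, assuming Ethernet framing with a 14-byte header and IPv4. Since 14 is not a multiple of 4, a buffer that is itself 4-byte aligned would put the IP header at offset 14, which is misaligned; two bytes of leading padding move it to offset 16. The same sketch shows how the destination address can be recovered from the packet data instead of a separate ip field. The struct layout and all `sketch_`/`SKETCH_` names are hypothetical and not taken from ZMap's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_HDR_LEN 14u /* Ethernet header without VLAN tag */
#define SKETCH_MAX_PACKET 64u  /* hypothetical, kept small for the sketch */
/* Leading padding such that (pad + 14) is a multiple of 4; evaluates to 2. */
#define SKETCH_IP_ALIGN_PAD ((4u - (SKETCH_ETH_HDR_LEN % 4u)) % 4u)
/* Offset of the destination address within an IPv4 header. */
#define SKETCH_IPV4_DST_OFFSET 16u

/* Hypothetical batch slot; note that it has no separate ip field. */
struct sketch_slot {
	union {
		uint32_t force_align; /* gives raw the alignment of uint32_t */
		uint8_t raw[SKETCH_IP_ALIGN_PAD + SKETCH_MAX_PACKET];
	} u;
	uint32_t len;
};

/* Start of the Ethernet frame, i.e. what would be handed to the send call. */
static uint8_t *sketch_slot_frame(struct sketch_slot *s)
{
	return s->u.raw + SKETCH_IP_ALIGN_PAD;
}

/* Start of the IP header; lands on a 4-byte boundary thanks to the padding. */
static uint8_t *sketch_slot_ip(struct sketch_slot *s)
{
	return sketch_slot_frame(s) + SKETCH_ETH_HDR_LEN;
}

/* Read the IPv4 destination (network byte order) from the packet data,
 * e.g. for logging on an error path. */
static uint32_t sketch_slot_dst_ip(struct sketch_slot *s)
{
	uint32_t dst;
	memcpy(&dst, sketch_slot_ip(s) + SKETCH_IPV4_DST_OFFSET, sizeof(dst));
	return dst;
}
```

When sending at the IP layer without an Ethernet header, no padding would be needed under the same assumptions, since the IP header would then start at the beginning of the buffer.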

zakird

Looks sane to me. @phillip-stephens can you take a deeper pass?

phillip-stephens

LGTM! I wasn't able to test the --iplayer option, but that's due to issue #821, so I'm good to merge.

droe added 2 commits March 13, 2024 19:29

zakird approved these changes Mar 13, 2024

phillip-stephens approved these changes Mar 14, 2024

phillip-stephens merged commit 68792f4 into zmap:main Mar 14, 2024
11 checks passed

phillip-stephens mentioned this pull request Mar 15, 2024

Fix being unable to open an IP layer socket on Linux #824

Merged

droe deleted the droe/send-less-copying branch March 25, 2024 23:32
