Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Less copying, more batch optimisation #820

Merged
merged 2 commits into from Mar 14, 2024

Conversation

droe
Copy link
Contributor

@droe droe commented Mar 13, 2024

  • Switch from writing each packet in a single buffer, and copying that buffer into the batch, to directly letting the probe modules write into the batch. To that end, split probe module's thread_initialize callback into thread_initialize and prepare_packet, allowing the latter to be called on each buffer in a batch.
  • Remove the almost unused ip field from batch->packets[], as it was only used on an error path of BSD's send code, and the IP can also be read from the packet data. While removing the ip field, align the packet data in batch such that the IP headers start at a 32-bit boundary, speeding up 32-bit header field access.

These combined get me ~ 2.9 % send rate improvement, two thirds of which coming from the first change. Again, I realize some or all of this may be arguable.

The change in probe module interface could also be a step towards potential future work of letting probe modules write directly into mapped NIC memory in netmap mode (tho I suspect that might be more intrusive than it's worth).

This is a revised version of #818, now without breaking --dryrun. Tested on macOS Sonoma, FreeBSD 14 and Ubuntu 23.10; smoke tested various probe modules; test-shard.sh and test_big_group.sh succeed.

Introduce new prepare_packet callback for the initialization of send
buffers; contrary to the per-thread initialization callback where this
was done previously, prepare_packet can be called multiple times, once
for each send buffer.

Make use of this to prepare packets to send directly in the batch
buffers instead of copying them over.
The ip field was only still used in send-bsd and only on a failure path
for logging, which does not seem like a strong justification for keeping
it around, especially given that it can always be read from packet data.

While removing the ip field, align buf such that the IP header is going
to start at a 32 bit aligned address, improving perf of IP header field
access.
Copy link
Member

@zakird zakird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sane to me. @phillip-stephens can you take a deeper pass?

Copy link
Contributor

@phillip-stephens phillip-stephens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I wasn't able to test the --iplayer option but that's due to issue #821 , so I'm good to merge.

@phillip-stephens phillip-stephens merged commit 68792f4 into zmap:main Mar 14, 2024
11 checks passed
@droe droe deleted the droe/send-less-copying branch March 25, 2024 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants