Reg range support #381

Open: 2 commits to merge into base branch master

borislavmatvey
Contributor

While following issue #355 I saw this comment from @giuseppelettieri, which matched an idea I had been contemplating for a while: being able to register a range of rings with NetMap. Such a change would achieve:

  • easy monitoring of multiple RX queues through a single fd, instead of creating one fd per ring and managing all of them.
  • easy spreading of rings across threads in pkt-gen: you just register a range instead of a single ring when you call nm_open, and the threading code should just work from there on.

Thanks to the clever design of NetMap, the patch needed to support this feature is surprisingly small, and this PR is an attempt at it. It contains the following changes:

  • netmap_interp_ringid parses a new registration option, NR_REG_RANGE_NIC, which updates the np_qfirst and np_qlast fields accordingly.
  • The start of the range is passed in nr_ringid, as it is now, and the end of the range is passed in the 12 MSBs of nr_flags.
  • nm_open can parse an interface name in the format ethX-2-5, which registers rings [2, 5] on the descriptor.
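The encoding of the range end in the 12 MSBs of nr_flags can be sketched as below. This is only an illustration, not the code from the PR: it assumes nr_flags is a 32-bit word, and the macro and function names (RANGE_END_SHIFT, range_end_pack, range_end_unpack) are hypothetical and do not exist in the netmap headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (not from the netmap headers): the top 12 bits of a
 * 32-bit flags word carry the index of the last ring in the range. */
#define RANGE_END_BITS  12u
#define RANGE_END_SHIFT (32u - RANGE_END_BITS)         /* 20 */
#define RANGE_END_MAX   ((1u << RANGE_END_BITS) - 1u)  /* 4095 */
#define RANGE_END_MASK  (RANGE_END_MAX << RANGE_END_SHIFT)

/* Store range_end in the 12 MSBs of *flags, leaving the low 20 bits intact.
 * Returns 0 on success, -1 if range_end does not fit in 12 bits. */
static int range_end_pack(uint32_t *flags, uint32_t range_end)
{
    assert(flags != NULL);
    if (range_end > RANGE_END_MAX)
        return -1;
    *flags = (*flags & ~RANGE_END_MASK) | (range_end << RANGE_END_SHIFT);
    return 0;
}

/* Recover the range end from the 12 MSBs of flags. */
static uint32_t range_end_unpack(uint32_t flags)
{
    return flags >> RANGE_END_SHIFT;
}
```

With this layout the range end is limited to 4095, and whatever already occupies the low 20 bits of the flags word is preserved by the pack step.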

I played for a while with pkt-gen, sending and receiving traffic on different ranges of queues, so maybe a natural next step is the modification mentioned in #355. But first, as I'm not too familiar with the NetMap code, I would like to ask the NetMap developers:

  • Is range registration something that would make NetMap easier to use, and would it be accepted?
  • Is there a better way to pass the end of the range to the kernel?
  • Are there any obvious issues that could arise with the current code?

Add new reg NR_REG_RANGE_NIC option which registers a range [start, end]
of rings to the fd on which NIOCREGIF ioctl is issued. _start_ is
provided by the nr_ringid field and _end_ is taken from the 12 MSBs in
nr_flags. The latter was chosen as it results in the least change to
pass the _end_ of the range.

This suffix will register range [s, e] of rings on interface ethX. This
way pkt-gen and all nm_open users can request working with a range
of rings on one fd instead of using just 1 or all rings.
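Splitting an ethX-s-e style name into an interface name and an inclusive ring range could look roughly like the sketch below. It is not the parser from this PR: parse_ring_range and parse_ring are hypothetical helpers, the 4095 upper bound is an assumption tied to the 12-bit end field, and the handling of hyphens inside the interface name (the last two '-' are treated as the range separators) is a guess at reasonable behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse one decimal ring index occupying exactly [s, end). */
static int parse_ring(const char *s, const char *end, unsigned *out)
{
    char *stop;
    unsigned long v;

    if (s == end || *s < '0' || *s > '9')
        return -1;
    errno = 0;
    v = strtoul(s, &stop, 10);
    if (errno != 0 || stop != end || v > 4095)  /* assumed 12-bit limit */
        return -1;
    *out = (unsigned)v;
    return 0;
}

/* Split "<ifname>-<start>-<end>" into its parts.
 * Returns 0 on success, -1 on malformed input or start > end. */
static int parse_ring_range(const char *spec, char *ifname, size_t ifname_len,
                            unsigned *start, unsigned *end)
{
    const char *dash1 = NULL, *dash2 = NULL, *p;
    size_t n;

    /* Remember the last two '-' characters; p ends on the terminating NUL. */
    for (p = spec; *p != '\0'; p++) {
        if (*p == '-') {
            dash1 = dash2;
            dash2 = p;
        }
    }
    if (dash1 == NULL || dash1 == spec)
        return -1;
    n = (size_t)(dash1 - spec);
    if (n >= ifname_len)
        return -1;
    if (parse_ring(dash1 + 1, dash2, start) != 0 ||
        parse_ring(dash2 + 1, p, end) != 0)
        return -1;
    if (*start > *end)
        return -1;
    memcpy(ifname, spec, n);
    ifname[n] = '\0';
    return 0;
}
```

For example, parsing "eth0-2-5" with this sketch yields the name "eth0" and the range [2, 5], while "eth0" alone or a reversed range such as "eth0-5-2" is rejected.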
@giuseppelettieri
Collaborator

giuseppelettieri commented Oct 19, 2017

Hi @borislavmatvey. Thanks for the code.

We have considered this possibility before, and I think @tbarbette once mentioned that he has an implementation of it.

Unfortunately, there is a FreeBSD limitation that prevents an efficient implementation of this: in the poll/select callback (netmap_poll() in our case) you can put one thread on at most two wait queues (i.e., you can call nm_os_selrecord() at most two times). Since you may need to do that once for read events and once for write events, you are left with just one wait queue per direction, meaning that you can choose to be woken up for just one kind of "event".

In netmap, threads may choose to be woken up if any ring is ready, or if a specific ring is ready. This can be implemented easily by allocating one wait queue per ring and a single wait queue for all rings (for each direction). In the current code, the latter queue is used whenever you want to listen on more than one ring (see nm_si_user()). For your implementation, this means that all the threads listening on ranges (of size at least 2) would be registering on the same queue, and therefore they would wake up whenever a ring becomes ready in any range. Not incorrect, but certainly wasteful and undesirable. This is why we only allow a thread to register for all rings, or for a single ring.
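The queue choice described above can be modelled in a few lines. This is a simplified illustration for one direction only, not the actual netmap code: struct port_model, struct waitq and pick_waitq are hypothetical names, and the half-open range [qfirst, qlast) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_RINGS 8  /* illustrative ring count */

struct waitq { int unused; };  /* stand-in for a kernel wait queue */

struct port_model {
    struct waitq ring_q[NUM_RINGS];  /* one wait queue per ring */
    struct waitq all_q;              /* single queue shared by all rings */
};

/* Return the wait queue a thread registered on rings [qfirst, qlast)
 * would sleep on: its own ring's queue if it listens on exactly one
 * ring, the shared queue otherwise. */
static struct waitq *pick_waitq(struct port_model *p,
                                unsigned qfirst, unsigned qlast)
{
    assert(p != NULL);
    assert(qfirst < qlast && qlast <= NUM_RINGS);
    if (qlast - qfirst == 1)
        return &p->ring_q[qfirst];
    return &p->all_q;
}
```

In this model, a thread on rings [0, 2) and another on rings [4, 6) both get the shared queue back, so activity on any ring wakes both of them even though their ranges do not overlap.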

Note that the queue-number limitation does not exist on Linux, but the current code would still do the same thing even there.

I once thought of a different solution, involving a new kind of netmap port (in addition to pipes, VALE ports, monitors, etc.): a "subset" port, created on-the-fly as needed, whose rings are a subset of the rings of another port. This would also give you more flexibility, such as the ability to attach single rings to VALE switches. This never went beyond the whiteboard, though, for lack of use cases that are really worth the trouble.

@borislavmatvey
Contributor Author

Thank you for the elaborate answer.

It's a pity. I played with it a little more and really like how natural it is to spread different processes across multiple RX queues, which you get out of the box with nm_open. But that's life: sometimes it's not so beautiful :)

@borislavmatvey
Contributor Author

borislavmatvey commented Feb 14, 2018

Hello, @giuseppelettieri,

I'm still playing with this change from time to time and I really like it. I'm willing to write an efficient implementation of wake-ups for a range of rings on Linux.

Maybe, so as not to hinder BSD performance, this whole feature should be disabled on non-Linux platforms.

If I succeed in this effort on Linux, would you reconsider merging the whole change?
