Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POC output with URring #2357

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

POC output with URring #2357

wants to merge 17 commits into from

Conversation

ser-0xff
Copy link
Contributor

Use URing on Linux.

URing on Linux provides an API to demultiplex events and perform socket operations.
NIO originally designed to demultiplex events and perform all IO in a single thread, while URing socket operations are asynchronous. It requires more significant changes in the NIO taking into account the requirement to additionally manage memory buffers for pending read and write operations.

The one idea was to try to minimise PR as possible, so that PR send outbound data using URing, but still read inbound data using base socket API. Reading inbound data with URing is a next step.

Some highlights regarding changes in the PR

  1. The URing API using on Linux can be enabled by the conditional compilation.
    In some places I saw '#if defined(SWIFTNIO_USE_IO_URING) && (os(Linux) || os(Android))
    As of now we have no possibility even to compile it for Android, not saying to test, so all URing related code is under just os(Linux). Let's decide later what can we do with Android.

  2. NIO used some buffers stored in the EventLoop (iovecs, storageRefs etc) which are shared across all channels registered in the event loop. That approach does not work with URing because of its asynchronous nature. There are not too many ways to workaround it for URing, so I moved them to the channel, so with URing each channel has own iovecs and storageRefs buffers. Probably we could cache them... not clear what is better... let's discuss...

  3. URing has no analogue for the sendfile() system call.
    Instead they suggest to splice() the file via intermediate pipe. It works, but requires an intermediate pipe. There are not too many options how to manage intermediate pipes, but I can't see a good one. So, as of now we just cache intermediate pipes and reuse them...

  4. URing has no analogue for sendmmsg() system call.
    Unfortunately did not found a good way to workaround that. Currently just send messages one by one. One possible optimisation here is to collect all messages with the same destination address and same ancillary data and send them with one sendmsg(), but did not found such obvious optimisation in the NIO, so probably it is not a case.

  5. I had to disable few tests which checks the order of system calls. Have no idea how we could make them work for URing, because URing is asynchronous. Let's discuss them later. All other tests works on my environment, but I will not be surprised if they will not work somewhere else.

Will add some information later within comments to PR if will recall something important.

@swift-server-bot
Copy link

Can one of the admins verify this patch?

11 similar comments
@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this patch! I've left a few structural notes in the diff. Note that I haven't done a detailed code review yet, I'm just laying out some thoughts. As we work on the structure I'll continue to re-review the implementation. I've also not deeply looked into the changes to SelectorUring and LinuxUring yet, those are my next round of review. Generally though, this is looking really solid!

Sources/NIOPosix/SelectorGeneric.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/PendingWritesManager.swift Show resolved Hide resolved
let storageRefs = eventLoop.storageRefs
let controlMessageStorage = eventLoop.controlMessageStorage
#endif
self.pendingWrites = PendingDatagramWritesManager(msgs: msgs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love having the pending writes managers take ownership of pointers allocated outside their scope. Can we have two different initialisers, one where they allocate the pointers themselves and one where they don't, and then keep track of what happened with a flag? Then we can drop most of the compile time conditionals, which will make our lives a lot easier.

Copy link
Contributor Author

@ser-0xff ser-0xff Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not love that either...
Had an impression having a different initialisers will require bigger changes in tests...
But may be we could use something different?
For example we could introduce a cache of IOVector buffers in the EventLoop, and then each channel could use that cache. Without URing that cache will always have a one buffer. With URing - probably more than one but less than total number of channels. We could save some memory in that case. And the buffer management logic will not require any conditional compilation at all in that case. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m certainly open to that approach. I wonder if it’s worth landing that cache as a separate PR by itself, actually.

Copy link
Contributor Author

@ser-0xff ser-0xff Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could probably try to implement that cache in a separate branch,
then we will merge it, and then I will merge into uring-out branch, and then we will continue with it.
What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that seems like a good idea to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that's #2358 - just commenting to tie it together with this comment chain.

}
}

private func _debugPrint(_ s: @autoclosure () -> String) {
#if SWIFTNIO_IO_URING_DEBUG
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we need this conditional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That condition was before I started to work on it, just left it...

Copy link
Contributor

@hassila hassila Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a remnant of the original bringup to not having to define out every debugPrint that is in place IIRC (~40).

There are additionally three more specific defines for more in-depth debugging, but they were in few places and are completely defined out (https://github.com/search?q=repo%3Aapple%2Fswift-nio%20SWIFTNIO_IO_URING_DEBUG&type=code) .

Perhaps there's a better way to do this (maybe with the macro support coming up? Basically, one would like to be able to build with debug logging during development but to completely remove traces of it for production as this may be a fairly performance sensitive thing).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, ok.

let storageRefs = eventLoop.storageRefs
let controlMessageStorage = eventLoop.controlMessageStorage
#endif
self.pendingWrites = PendingDatagramWritesManager(msgs: msgs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m certainly open to that approach. I wonder if it’s worth landing that cache as a separate PR by itself, actually.

Sources/NIOPosix/PendingWritesManager.swift Show resolved Hide resolved
Sources/NIOPosix/SelectorGeneric.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/LinuxUring.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/LinuxUring.swift Outdated Show resolved Hide resolved
Sources/NIOPosix/LinuxUring.swift Outdated Show resolved Hide resolved
@ser-0xff
Copy link
Contributor Author

ser-0xff commented Feb 9, 2023

I got the point regarding the conditional compilation.
How do you think should we move the conditional compilation in that place more deeply leaving the async API available even for a case if the backend is synchronous?

@Lukasa
Copy link
Contributor

Lukasa commented Feb 13, 2023

I'm inclined to suggest that we have most backends implement the functions with fatalError, and only have the uring selector implement them. Then we can ensure that at runtime we only call those functions when we know we have the uring selector, which we should know either as a compile-time static or as event loop state.

@ser-0xff
Copy link
Contributor Author

Could you have a look at PR?
Is it appropriate way to remove conditional compilation in that particular case from your point of view?

@Lukasa
Copy link
Contributor

Lukasa commented Feb 16, 2023

Yes, I think that works fine.

Remove conditional compilation.
private var bufferPool: Pool<PooledBuffer>
private var buffer: PooledBuffer?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the local buffer reference?

Copy link
Contributor Author

@ser-0xff ser-0xff Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 reasons for that:

  1. we will need to put that pooled buffer back to the pool when async operation will be completed
  2. we should keep a memory block which is used for the async operation while the operation will not be completed

And we also have 2 options how to do it:

  1. tie that pooled buffer to the asynchronous write request
  2. keep it locally, so when async request will be completed use local stored object to put it back to the pool

@ser-0xff
Copy link
Contributor Author

ser-0xff commented Feb 17, 2023

Could you have a look at PR?
I tried to disable possibility to send files with URing, but end up with a lot of conditional compilation in tests.
As far as I understand test runner parse swift code and generate a code calling tests, so I have no better idea how could disable those tests when running with URing.
What do you think?

@Lukasa
Copy link
Contributor

Lukasa commented Feb 24, 2023

I'm entirely happy with us adding this conditional compilation into the tests, as it's small and clear. 👍

@hassila
Copy link
Contributor

hassila commented Sep 15, 2023

So what would be good next steps if we want to move this forward?

@ser-0xff
Copy link
Contributor Author

ser-0xff commented Sep 22, 2023

I merged most recent changes from the main into the feature branch and stabilised tests, so all tests (except skipped are passed now).
Had a lot of fun with the test stabilisation,
it still seems to me the idea to have both SYNC and ASYNC code at the same time in the source code is not good and it would be much better separate them with #ifdefs. The point here is that on runtime the only one approach should work, but codebase has a lot of conditional branches, and sometimes it happen that SYNC IO is triggered when ASYNC is supposed to because source code was not fixed properly. It would not even happen if we would completely move the SYNC functionality under #ifdef. Not sure I found all places where SYNC IO can be triggered in ASYNC mode.
But still, all tests pass now, would be nice if you could have a look at PR.

@ser-0xff ser-0xff marked this pull request as ready for review September 22, 2023 08:08
@hassila hassila mentioned this pull request Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants