New zero alloc, no_std compatible implementation #19

h33p · 2023-03-30T17:24:56Z

This is the same as #14, but I originally pushed to master on my fork, so I renamed it.

My intention is to benchmark this against master, and if results are promising, make a proposal to rust standard library to make the way this PR works safe to use.

PR description copied from #14:

Thanks a lot for this great crate! I wanted to see if one could make pollster work in no_std environments, since for one of my upcoming async libraries having option to run without standard library is key, and I noticed that there already is a PR getting rid of complex synchronization mechanisms like condition variables in #9.

The use of thread_locals to avoid locking was really interesting, but that made pollster even harder to support no_std, so I took inspiration from that implementation and looked for a way to have the best of all worlds - no dependency on std or alloc.

This implementation relies on the fact that the Thread is cloneable and layout-compatible with a pointer (in fact, it's an Arc), thus one may just build a Waker out of the thread without unnecessary wrapping behind Arcs.

Is this safe? It depends on you, but my view is that it's safe enough, because:

a) Thread relies on having constant memory location and I see no reason why the layout should change anytime soon.
b) Even if the layout changed, it would not lead to unsafety. Instead, the compiler would refuse to compile the crate with std feature enabled due to sizes in mem::transmute mismatching.

But I'm not sure what you think, it's a hack, and it would ideally be nice to work towards support for taking a "thread pointer" and construct a thread handle from that in the standard library, but right now there are no such facilities available, thus it is as good as it gets.

h33p · 2023-03-30T18:05:05Z

Okay, I added benchmarks in #20 and the results are significant:

Benchmarking poll_ready
Benchmarking poll_ready: Warming up for 3.0000 s
Benchmarking poll_ready: Collecting 100 samples in estimated 5.0000 s (265M iterations)
Benchmarking poll_ready: Analyzing
poll_ready              time:   [18.763 ns 18.785 ns 18.813 ns]
                        change: [-25.293% -23.515% -22.146%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

Benchmarking wakeup_and_pending
Benchmarking wakeup_and_pending: Warming up for 3.0000 s
Benchmarking wakeup_and_pending: Collecting 100 samples in estimated 5.0000 s (149M iterations)
Benchmarking wakeup_and_pending: Analyzing
wakeup_and_pending      time:   [33.639 ns 33.716 ns 33.807 ns]
                        change: [-52.716% -52.534% -52.359%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  10 (10.00%) high severe

Benchmarking wait_for_thread
Benchmarking wait_for_thread: Warming up for 3.0000 s
Benchmarking wait_for_thread: Collecting 100 samples in estimated 5.0027 s (2.0M iterations)
Benchmarking wait_for_thread: Analyzing
wait_for_thread         time:   [2.4659 µs 2.4837 µs 2.5059 µs]
                        change: [-14.645% -13.540% -12.475%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  10 (10.00%) high severe

I think having ability to get raw pointer to thread in standard library would help a lot with simple executors like pollster.

zesterer · 2023-03-31T20:05:55Z

These are some impressive results! I think there's a good case for std such an approach here given the benchmark numbers.

zesterer · 2023-03-31T20:09:59Z

Regarding your question in the description:

Is this safe?

No, it's not. And not in a 'maybe, depending on your perspective' way. It is simply UB, but UB that happens to work today with current versions of std and the current way the compiler chooses to apply padding to the struct. Unfortunately this isn't something I'd be willing to merge as-is, and it would be irresponsible to do so given the number of users that this crate has. While this happens to work today, there's no reason why future versions of std couldn't change the representation of Thread, breaking this crate entirely for users of future versions of Rust: in terms of semver (the crate may suddenly fail to compile due to a size mismatch), in terms of functionality (users may experience segfaults, panics, etc.) and in terms of security (once UB is invoked, there's no telling what will happen: you're giving the compiler permission to throw your firstborn off a cliff).

That said, I don't want to detract from the work you've put into it: I think that's a great case for adding a safe way to do this into std and I really hope it encourages action on that front. When this can be made safe (when I say 'safe' I don't mean 'without the unsafe keyword', I mean 'not producing undefined behaviour') I'd be very happy to merge it.

h33p · 2023-04-01T19:04:30Z

Regarding your question in the description:

Is this safe?

No, it's not.

Oh, totally, I completely agree with this! It was just copied and pasted from the previous PR. Now, I think introduction of Parkable is still a valuable addition for helping one use pollster in no_std envs. So here's what I did:

Target Arc<Thread> as default for std targets.

This change brings slight performance regression in Poll::ready cases, and still significant speedup in cases where code is actually async. Here's benchmark ran against #20 as baseline:

     Running benches/main.rs (target/release/deps/main-270be7310c1a1c15)
Benchmarking poll_ready
Benchmarking poll_ready: Warming up for 3.0000 s
Benchmarking poll_ready: Collecting 100 samples in estimated 5.0001 s (168M iterations)
Benchmarking poll_ready: Analyzing
poll_ready              time:   [29.710 ns 29.805 ns 29.905 ns]
                        change: [+21.732% +22.298% +22.828%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking wakeup_and_pending
Benchmarking wakeup_and_pending: Warming up for 3.0000 s
Benchmarking wakeup_and_pending: Collecting 100 samples in estimated 5.0000 s (117M iterations)
Benchmarking wakeup_and_pending: Analyzing
wakeup_and_pending      time:   [41.906 ns 41.987 ns 42.073 ns]
                        change: [-41.377% -41.189% -41.004%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  1 (1.00%) high severe

Benchmarking wait_for_thread
Benchmarking wait_for_thread: Warming up for 3.0000 s
Benchmarking wait_for_thread: Collecting 100 samples in estimated 5.0090 s (1.9M iterations)
Benchmarking wait_for_thread: Analyzing
wait_for_thread         time:   [2.5141 µs 2.5394 µs 2.5695 µs]
                        change: [-19.440% -16.494% -13.789%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

To be fair, I'm not entirely sure why Arc::new(Thread::current()) is slower than Arc::new(Signal::new())...

Add experimental --cfg pollster_unstable flag, to target Thread as default Parkable type for std targets.

This has to be manually opted in by the user. They need to pass RUSTFLAGS="--cfg pollster_unstable", or set rustflags in their .cargo/config.toml file. This takes inspiration from how tokio handles unstable features: https://docs.rs/tokio/latest/tokio/#unstable-features

If pollster_unstable is not set, no UB code is emitted during compilation. Would that be sufficient for inclusion? I can remove it altogether if you wish, though. Once Thread has necessary functions for safe transmutation, it can be stabilized.

By the way, tracking of the rust std changes:
ACP - rust-lang/libs-team#200
Implementation - rust-lang/rust#97524

So that's about it. What do you think?

zesterer · 2023-04-03T12:21:22Z

Thanks for the details.

If pollster_unstable is not set, no UB code is emitted during compilation. Would that be sufficient for inclusion?

Unfortunately I still wouldn't be happy to include it. UB is pervasive and dangerous. We might as well add a --cfg please-miscompile-my-code flag. There is no circumstance (at all, not even in cases where safety "doesn't matter") where this is a correct and valid thing to do. I can understand if you're coming from the C/C++ worlds that this might seem like a very strong position for me to take, but 'almost watertight' submarines kill people, and UB comes with similar risks due to its pervasive nature. If one small corner of your codebase contains UB, your entire codebase contains UB. I'm particularly careful about UB because, as a (hobby) compiler developer, I've implemented optimisers that leverage it and I know just how dangerous it can be to abuse it.

That said, I'm happy for the code to exist in the codebase, provided it's commented out for now, perhaps with a comment explaining the motivation for this. In time, and if std grows the ability to translate Thread into an Arc/pointer-like, then we can add support for that and I'll be very happy.

This implementation adds generic `Parkable` trait that can be implemented on any type. This enables to opt out from the default `Parkable` implementation on `Thread` with `default-features = false`.

h33p · 2023-04-03T17:50:30Z

Thanks for the details.

If pollster_unstable is not set, no UB code is emitted during compilation. Would that be sufficient for inclusion?

Unfortunately I still wouldn't be happy to include it. UB is pervasive and dangerous. We might as well add a --cfg please-miscompile-my-code flag. There is no circumstance (at all, not even in cases where safety "doesn't matter") where this is a correct and valid thing to do. I can understand if you're coming from the C/C++ worlds that this might seem like a very strong position for me to take, but 'almost watertight' submarines kill people, and UB comes with similar risks due to its pervasive nature. If one small corner of your codebase contains UB, your entire codebase contains UB. I'm particularly careful about UB because, as a (hobby) compiler developer, I've implemented optimisers that leverage it and I know just how dangerous it can be to abuse it.

That said, I'm happy for the code to exist in the codebase, provided it's commented out for now, perhaps with a comment explaining the motivation for this. In time, and if std grows the ability to translate Thread into an Arc/pointer-like, then we can add support for that and I'll be very happy.

Coolio, I've removed pollster_unstable, and I added a comment above Arc<Thread> implementation that references this and the rust-lang PR. I think it should be ready for review now.

zesterer · 2023-04-04T08:21:57Z

Thanks! I'll try to find the time to review this over the next few days.

zesterer · 2023-04-18T07:19:24Z

src/lib.rs

+/// The interface models that of `std::thread`, and many functions, such as
+/// [`current`](Parkable::current), [`park`](Parkable::park), and [`unpark`](Parkable::unpark)
+/// map to `std::thread` equivalents.
+pub trait Parkable: Sized + Clone {


It's not clear to me that this is an API worth revealing, at least for now. The two implementations it has differ only in whether std is present, which probably implies that it's better to have an std feature to selectively enable the two implementations.

zesterer · 2023-04-18T07:20:36Z

src/lib.rs

-impl Wake for Signal {
-    fn wake(self: Arc<Self>) {
-        self.notify();
+impl Parkable for () {


I would probably use a newtype here like

pub struct Spin;

zesterer · 2023-04-18T07:21:37Z

src/lib.rs

+// - https://github.com/rust-lang/libs-team/issues/200
+// - https://github.com/rust-lang/rust/pull/97524
+#[cfg(feature = "std")]
+impl Parkable for Arc<Thread> {


Revealing this in a public API seems like a mistake: it's something we'd need to support into the future, even if the associated std feature is merged.

notgull

I don't think spinning on futures should be exposed, at least not without ample documentation explaining why you shouldn't do it and an API that makes it non-trivial to instantiate (e.g. Spin::new_consider_something_else()). Even then, I would still consider removing it entirely.

Spinning on a future is almost never what you want. On actual operating systems it leads to priority inversion issues, and even on embedded systems there is usually a specific instruction or other that lets you spin/block in a less CPU-intensive way.

I've avoiding adding a spinning block_on to futures-lite for this reason, and direct people to the spin_on crate if they really need it. I don't think it's a same pattern to expose.

zesterer · 2023-04-24T07:48:58Z

I'd tend to agree. We also provide a similar warning in the README of spin for these reasons.

That said, I do see utility here. There are several applications in embedded and OS development that effectively just require spinning to wait for some hardware to initialise and an interrupt to be triggered, or the like. In these cases, pollster's utility is limited, but it does save a fair bit of verbose manual polling. Provided we carefully document this and heavily dissuade other uses, I think I'd be happy to accept it, on the assumption that the API is designed to make it difficult to accidentally use the wrong implementation.

and even on embedded systems there is usually a specific instruction or other that lets you spin/block in a less CPU-intensive way.

This is what core::hint::spin_loop provides, and is available in an embedded context. Hence, this spin implementation here would be using it too, so that's not a reason not to include this, by itself.

h33p · 2023-11-17T21:11:50Z

Hey, sorry for dropping this for too long. After working on a similar system simply using park/unpark, I found out that the system is not robust using FuturesUnordered, which also means it would subtly break in other systems. Basically, the following happens:

Spawn 5 unordered futures.
1 of them completes, calls thread::unpark().
Some blocking operation occurs, clearing the thread's token.
3 more futures complete, but they no longer call thread::unpark(), because FuturesUnordered already called the waker before, in the same poll!
There's one more future left, on the next poll, it would be ready, and FuturesUnordered called the waker, so we should poll immediately.
The runtime instead blocks forever, because at step 3 we performed a blocking operation, clearing the thread's token.

Taken this into account, I'm closing the PR. This also means #8 is broken, and probably only #9 is viable.

EDIT: actually, #9 is not viable either, it would need to expose a condvar.

h33p mentioned this pull request Mar 30, 2023

ACP: add Thread::into_raw and Thread::from_raw rust-lang/libs-team#200

Open

h33p force-pushed the zero-alloc branch from 46c36a9 to 983d859 Compare April 1, 2023 18:48

New no_std compatible implementation

2a76a93

This implementation adds generic `Parkable` trait that can be implemented on any type. This enables to opt out from the default `Parkable` implementation on `Thread` with `default-features = false`.

h33p force-pushed the zero-alloc branch from 983d859 to 2a76a93 Compare April 3, 2023 17:48

zesterer reviewed Apr 18, 2023

View reviewed changes

notgull reviewed Apr 18, 2023

View reviewed changes

h33p closed this Nov 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New zero alloc, no_std compatible implementation #19

New zero alloc, no_std compatible implementation #19

h33p commented Mar 30, 2023

h33p commented Mar 30, 2023

zesterer commented Mar 31, 2023

zesterer commented Mar 31, 2023 •

edited

h33p commented Apr 1, 2023 •

edited

zesterer commented Apr 3, 2023 •

edited

h33p commented Apr 3, 2023

zesterer commented Apr 4, 2023

zesterer Apr 18, 2023

zesterer Apr 18, 2023

zesterer Apr 18, 2023

notgull left a comment

zesterer commented Apr 24, 2023

h33p commented Nov 17, 2023 •

edited

New zero alloc, no_std compatible implementation #19

New zero alloc, no_std compatible implementation #19

Conversation

h33p commented Mar 30, 2023

h33p commented Mar 30, 2023

zesterer commented Mar 31, 2023

zesterer commented Mar 31, 2023 • edited

h33p commented Apr 1, 2023 • edited

zesterer commented Apr 3, 2023 • edited

h33p commented Apr 3, 2023

zesterer commented Apr 4, 2023

zesterer Apr 18, 2023

Choose a reason for hiding this comment

zesterer Apr 18, 2023

Choose a reason for hiding this comment

zesterer Apr 18, 2023

Choose a reason for hiding this comment

notgull left a comment

Choose a reason for hiding this comment

zesterer commented Apr 24, 2023

h33p commented Nov 17, 2023 • edited

zesterer commented Mar 31, 2023 •

edited

h33p commented Apr 1, 2023 •

edited

zesterer commented Apr 3, 2023 •

edited

h33p commented Nov 17, 2023 •

edited