
Stop re-implementing slice iterators in vec::IntoIter #124421

Open · scottmcm wants to merge 8 commits into master from slice-drain-iter
Conversation

scottmcm (Member, Author):

Today, vec::IntoIter has essentially an inlined copy of an out-of-date version of slice::Iter, including all the distinct T::IS_ZST checks and everything. That's because it can't actually use slice::Iter<'a, T> due to the lifetime and the references.

But it's silly that it needs to reimplement all that logic, and that we thus end up needing changes like #114205 to redo optimizations on vec::IntoIter after they've already been made on the slice iterators.

This PR addresses that as follows:

  • Makes a new internal slice::NonNullIter<T>, implemented via the same iterator! macro that implements Iterator for Iter and IterMut, so it picks up all the same optimizations without duplicating code, but without the problematic lifetime and Stacked Borrows/Tree Borrows (SB/TB) implications of references.
  • Wraps that in a slice::DrainRaw<T> that moves out/drops the items in the iterated range.
  • Implements both array::Drain<'a, T> and vec::IntoIter<T> in terms of DrainRaw, so neither needs its own "read the pointer on next, drop_in_place on drop" logic.

Thus vec::IntoIter's implementation gets a bunch shorter, since it no longer has to worry about things like the best way to iterate a slice of ZSTs. It can just focus on ensuring that deallocation happens in the correct place and order. (A simplified sketch of this layering follows below.)

(This intentionally leaves slice::Iter and slice::IterMut almost entirely alone, as the details there are very hot.)
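
A minimal sketch of that layering, using hypothetical simplified stand-ins for the PR's types. This is illustrative only: the real NonNullIter is generated by the iterator! macro and handles ZSTs, which this sketch ignores.

use std::ptr::{drop_in_place, NonNull};

// Hypothetical stand-in for slice::NonNullIter<T>: yields NonNull<T>
// pointers rather than references, so it carries no lifetime.
struct NonNullIter<T> {
    ptr: NonNull<T>,
    end: NonNull<T>,
}

impl<T> Iterator for NonNullIter<T> {
    type Item = NonNull<T>;
    fn next(&mut self) -> Option<NonNull<T>> {
        if self.ptr == self.end {
            None
        } else {
            let current = self.ptr;
            // SAFETY: ptr != end, so stepping one element stays inside
            // the same allocation. (The real code also handles T::IS_ZST.)
            self.ptr = unsafe { NonNull::new_unchecked(current.as_ptr().add(1)) };
            Some(current)
        }
    }
}

// Hypothetical stand-in for slice::DrainRaw<T>: moves items out on
// next and drops whatever remains when it is dropped, which is exactly
// the logic vec::IntoIter and array::Drain previously duplicated.
struct DrainRaw<T>(NonNullIter<T>);

impl<T> Iterator for DrainRaw<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        // SAFETY: each pointer is yielded exactly once, so each element
        // is moved out at most once.
        self.0.next().map(|p| unsafe { p.as_ptr().read() })
    }
}

impl<T> Drop for DrainRaw<T> {
    fn drop(&mut self) {
        for p in &mut self.0 {
            // SAFETY: these elements were never moved out.
            unsafe { drop_in_place(p.as_ptr()) }
        }
    }
}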

rustbot (Collaborator) commented Apr 26, 2024:

r? @m-ou-se

rustbot has assigned @m-ou-se.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot added labels on Apr 26, 2024: S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties) and T-libs (Relevant to the library team, which will review and decide on the PR/issue)

/// Iterator over all the `NonNull<T>` pointers to the elements of a slice.
#[must_use = "iterators are lazy and do nothing unless consumed"]
pub struct NonNullIter<T> {
scottmcm (Member, Author):

This might be useful to expose one day, like @dtolnay's #91390 explored before, but for this PR it's just pub(in core::slice) since it's never needed outside there and doesn't have an ACP.


scottmcm force-pushed the slice-drain-iter branch 3 times, most recently from 16a13b6 to 01617bb on April 27, 2024 at 01:49

Comment on lines +213 to +215
// FIXME: for some reason, just `self.drain.__iterator_get_unchecked(i)`
// never worked for me. If you know a way to fix that, please do.

Reviewer (Member):

Due to the Copy bound? try_get_unchecked would probably fix that... which is not public.
But reimplementing the pointer logic seems simple enough.

scottmcm (Member, Author) commented Apr 27, 2024:

I tried impl TRANC for IntoIter where DrainRaw: TRANC (TRANC here being the TrustedRandomAccessNoCoerce trait), which I thought would have made it work and saved the extra NonDrop copy, but it didn't :(

I agree it's no big deal since the logic is simple.
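
For illustration, the "pointer logic" being discussed amounts to something like the following hypothetical sketch (not the PR's actual code). The Copy bound matters because TrustedRandomAccess-style access does not advance the iterator, so non-Copy elements could otherwise be read, and later dropped, twice.

// Hypothetical sketch: indexed unchecked access over the remaining
// elements, instead of forwarding to __iterator_get_unchecked.
unsafe fn get_unchecked_sketch<T: Copy>(start: *const T, i: usize) -> T {
    // SAFETY (caller's obligation): i is in bounds of the remaining
    // range; T: Copy means reading without consuming is sound.
    unsafe { start.add(i).read() }
}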

the8472 (Member) left a comment:

I think it'd make sense to run the microbenchmarks we have for alloc and core here. They're quite noisy, so you'll have to quiet your system (disabling turbo boost can help), but they cover more than the codegen tests do.

And could DrainRaw help with vec::Drain and vec::Splice too?
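
For reference, those microbenchmarks are ordinary nightly #[bench] functions. A hypothetical one exercising vec::IntoIter might look like this (the name and body are illustrative, not taken from the actual library/alloc benchmark suite):

#![feature(test)]
extern crate test;
use test::{black_box, Bencher};

// Hypothetical microbenchmark in the style of library/alloc/benches.
#[bench]
fn bench_vec_into_iter_sum(b: &mut Bencher) {
    let v: Vec<u32> = (0..1024).collect();
    b.iter(|| black_box(v.clone()).into_iter().sum::<u32>());
}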

Comment on lines 245 to 246
// SAFETY: If they're accessing things in random order we don't know when to drop
// things, so only allow this for `Copy` things where that doesn't matter.
Reviewer (Member):
It's not the random order that causes the problem; it's that the accesses don't update the pointers, and the TRA (TrustedRandomAccess) contract doesn't require them to, as the NonDrop wrapper already notes.

Comment on lines +155 to +183

if len >= N {
    // SAFETY: If we have more elements than were requested, they can be
    // read directly because arrays need no extra alignment.
    Ok(unsafe { to_copy.cast::<[T; N]>().read() })
} else {
    let mut raw_ary = MaybeUninit::uninit_array();
    // SAFETY: If we don't have enough elements left, then copy all the
    // ones we do have into the local array, which cannot overlap because
    // new locals are always distinct storage.
    Err(unsafe {
        MaybeUninit::<T>::slice_as_mut_ptr(&mut raw_ary)
            .copy_from_nonoverlapping(to_copy.as_mut_ptr(), len);
        array::IntoIter::new_unchecked(raw_ary, 0..len)
    })
}
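
For context, this hunk is DrainRaw's next_chunk body (the mangled name in the LLVM IR below confirms it): return a whole array when enough elements remain, otherwise an iterator over the leftovers. A small usage example of the unstable API (nightly iter_next_chunk feature):

#![feature(iter_next_chunk)]

fn main() {
    let mut it = vec![1, 2, 3, 4, 5].into_iter();
    // Enough elements remain: a whole array comes back.
    assert_eq!(it.next_chunk::<2>().unwrap(), [1, 2]);
    // Fewer than 8 remain: Err is an array::IntoIter over the leftovers.
    let rest = it.next_chunk::<8>().unwrap_err();
    assert_eq!(rest.as_slice(), &[3, 4, 5]);
}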
Reviewer (Member):

Have you checked the codegen of this? When I wrote the vec::IntoIter one, I found that copying into the same uninit array in both arms helped a little. But it's possible that it's no longer necessary.

scottmcm (Member, Author):

Looks like both are codegen'ing way better than I remember, actually. I think the new dead_on_unwind flag on the sret is helping a ton for things like this. (It's in https://releases.llvm.org/18.1.0/docs/LangRef.html but not in https://releases.llvm.org/17.0.1/docs/LangRef.html, so new in nightly as of #120055 in Feb.)

I'll add a codegen test.

scottmcm (Member, Author):

For example, this PR produces the following:

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn uwtable
define void @vec_iter_next_chunk_long(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([152 x i8]) align 8 dereferenceable(152) %_0, ptr noalias nocapture noundef align 8 dereferenceable(32) %it) unnamed_addr #3 personality ptr @__CxxFrameHandler3 {
start:
  %_2 = getelementptr inbounds i8, ptr %it, i64 8
  tail call void @llvm.experimental.noalias.scope.decl(metadata !54)
  tail call void @llvm.experimental.noalias.scope.decl(metadata !57)
  %self1.i.i = getelementptr inbounds i8, ptr %it, i64 16
  %end.i.i = load ptr, ptr %self1.i.i, align 8, !alias.scope !59, !noalias !54, !nonnull !5, !noundef !5
  %subtracted.i.i = load ptr, ptr %_2, align 8, !alias.scope !57, !noalias !54, !nonnull !5, !noundef !5
  %0 = ptrtoint ptr %end.i.i to i64
  %1 = ptrtoint ptr %subtracted.i.i to i64
  %2 = sub nuw i64 %0, %1
  %_0.sroa.0.0.sroa.speculated.i.i = tail call noundef i64 @llvm.umin.i64(i64 %2, i64 123)
  %_13.i.i = getelementptr inbounds i8, ptr %subtracted.i.i, i64 %_0.sroa.0.0.sroa.speculated.i.i
  store ptr %_13.i.i, ptr %_2, align 8, !alias.scope !62, !noalias !54
  %_7.i = icmp ugt i64 %2, 122
  br i1 %_7.i, label %bb2.i, label %bb3.i

bb3.i:                                            ; preds = %start
  %_11.sroa.5.0..sroa_idx.i = getelementptr inbounds i8, ptr %_0, i64 24
  tail call void @llvm.memcpy.p0.p0.i64(ptr nonnull align 8 %_11.sroa.5.0..sroa_idx.i, ptr nonnull align 1 %subtracted.i.i, i64 %2, i1 false), !noalias !57
  %3 = getelementptr inbounds i8, ptr %_0, i64 8
  store i64 0, ptr %3, align 8, !alias.scope !54, !noalias !57
  %_11.sroa.4.0..sroa_idx.i = getelementptr inbounds i8, ptr %_0, i64 16
  store i64 %2, ptr %_11.sroa.4.0..sroa_idx.i, align 8, !alias.scope !54, !noalias !57
  br label %"_ZN96_$LT$core..slice..drain..DrainRaw$LT$T$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$10next_chunk17h4db823eaa1311d9bE.exit"

bb2.i:                                            ; preds = %start
  %4 = getelementptr inbounds i8, ptr %_0, i64 1
  tail call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 1 dereferenceable(123) %4, ptr noundef nonnull align 1 dereferenceable(123) %subtracted.i.i, i64 123, i1 false), !noalias !57
  br label %"_ZN96_$LT$core..slice..drain..DrainRaw$LT$T$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$10next_chunk17h4db823eaa1311d9bE.exit"

"_ZN96_$LT$core..slice..drain..DrainRaw$LT$T$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$10next_chunk17h4db823eaa1311d9bE.exit": ; preds = %bb3.i, %bb2.i
  %.sink.i = phi i8 [ 0, %bb2.i ], [ 1, %bb3.i ]
  store i8 %.sink.i, ptr %_0, align 8, !alias.scope !54, !noalias !57
  ret void
}

which is nearly identical to https://rust.godbolt.org/z/zEddbzcYG.

The difference is that LLVM now hoists the "move the start pointer forward" store into the header instead of having each arm do it. That seems at least plausibly better: a cmov for something we already have a data dependency on anyway is probably better than a different store in each branch.

scottmcm (Member, Author):

Oh, no, maybe it's way newer than I'd realized: #121298

Comment on lines 90 to 96

#[inline]
pub(crate) fn make_shortlived_slice<'b>(&'b self) -> &'b [T] {
    // SAFETY: Everything expanded with this macro is readable while
    // the iterator exists and is unchanged, so we can force a shorter
    // lifetime than `'a` to make this safe to call.
    unsafe { self.make_nonnull_slice().as_ref() }
}
Reviewer (Member):
I don't get why the explicit lifetime here is necessary. &self borrows are usually already shorter than Self<'a> lifetimes.

scottmcm (Member, Author):

It's definitely not necessary. I just wanted to be explicit about it, since as_ref returns an unbounded lifetime and there's an 'a in scope from the impl. I've updated the comment to hopefully communicate that better.
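
A stand-alone illustration of the point, under assumed simplified types (the names mirror the PR, but this is not its code): a reference conjured from a raw pointer has an unconstrained lifetime, so spelling out 'b ties it to the &self borrow instead of letting inference pick the longer 'a.

use std::ptr::NonNull;

// Hypothetical owner of a raw range, standing in for the iterator.
struct Raw<T> {
    ptr: NonNull<T>,
    len: usize,
}

impl<T> Raw<T> {
    // The explicit 'b pins the returned slice to this borrow of self,
    // even though from_raw_parts would happily hand back any lifetime.
    fn make_shortlived_slice<'b>(&'b self) -> &'b [T] {
        // SAFETY (assumed invariant): ptr..ptr+len is initialized and
        // stays valid, unmodified, for as long as self is borrowed.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}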

(Commit message excerpt from the subsequent push:)

Rather than encoding things as method calls or token expansions.

This isn't directly useful for Iter/IterMut, but it sets this up for the next commit, which adds another invocation of the macro.
scottmcm force-pushed the slice-drain-iter branch 2 times, most recently from ffaafa4 to f4fbb48 on April 27, 2024 at 19:45
scottmcm (Member, Author) commented Apr 27, 2024:

> And could DrainRaw help with vec::Drain and vec::Splice too?

Good idea. Looking into that a bit, I end up at

            // drop_ptr comes from a slice::Iter which only gives us a &[T] but for drop_in_place
            // a pointer with mutable provenance is necessary.

which makes it look quite promising that this will help simplify things.
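
A sketch of the provenance point in that comment (hypothetical code, not from the PR): under Stacked/Tree Borrows a pointer derived from a &[T] only carries read permission, so using it for drop_in_place is dubious, whereas a raw pointer retained from the original allocation, like DrainRaw's NonNull, keeps mutable provenance.

use std::ptr::{drop_in_place, slice_from_raw_parts_mut, NonNull};

// Dubious: the *mut is laundered out of a shared reference, whose
// provenance only permits reads under Stacked/Tree Borrows.
unsafe fn drop_via_shared_ref<T>(s: &[T]) {
    unsafe { drop_in_place(slice_from_raw_parts_mut(s.as_ptr().cast_mut(), s.len())) }
}

// Fine: the pointer never passed through a shared reference, so it
// retains mutable provenance.
unsafe fn drop_via_raw<T>(p: NonNull<[T]>) {
    unsafe { drop_in_place(p.as_ptr()) }
}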
