Interest for an Iterators.nth(x, n) API? #54454

Open
ghyatzo opened this issue May 13, 2024 · 4 comments
Labels
kind:feature Indicates new feature / enhancement requests

Comments

@ghyatzo
Contributor

ghyatzo commented May 13, 2024

Hello,

After searching far and wide in issues, PRs and on Discourse, I could not find any discussion about adding an Iterators.nth(x, n) API for ease of use and simplicity. This is the only other reference to this possibility I could find.

I have played with this a little in various past projects and ended up with a slight evolution of the basic version mentioned by @stevengj in the linked post, which I carry around when needed:

_inbounds_nth(itr, n) = getindex(iterate(Base.Iterators.drop(itr, n-1)), 1)
_safe_nth(itr, n) = begin
	y = iterate(Base.Iterators.drop(itr, n-1))
	isnothing(y) ? nothing : getindex(y, 1)
end
nth(itr, n; skip_checks=false) = skip_checks ? _inbounds_nth(itr, n) : _safe_nth(itr, n)

simple_nth(itr, n) = first(Iterators.drop(itr, n-1))

This version offers the option to skip the bounds check (skip_checks=true), at the expense of an error instead of just returning nothing when n is out of range:

julia> itr = collect(1:10000)
julia> _safe_nth(itr, 10001)

julia> _inbounds_nth(itr, 10001)
ERROR: MethodError: no method matching getindex(::Nothing, ::Int64)
Stacktrace:
 [1] _inbounds_nth(itr::Vector{Int64}, n::Int64)
   @ Main .\REPL[198]:1
 [2] top-level scope
   @ REPL[205]:1

In exchange, it offers decent performance benefits, although we can't escape the O(n) complexity without extra assumptions (at least, not that I know of):

julia> @btime _inbounds_nth(itr, 9999) setup=(itr=collect(1:10000))
  151.222 ns (0 allocations: 0 bytes)
9999

julia> @btime _safe_nth(itr, 9999) setup=(itr=collect(1:10000))
  4.400 μs (0 allocations: 0 bytes)
9999

julia> @btime simple_nth(itr, 9999) setup=(itr=collect(1:10000))
  4.414 μs (0 allocations: 0 bytes)
9999

(By the way, simple_nth also throws an error when called out of bounds.)
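To make the "extra assumptions" point concrete, here is a minimal sketch (nth_sketch is a hypothetical name, not a proposed API) that keeps the O(n) drop-based walk as the generic fallback and adds a method for AbstractArray, assuming O(1) indexing and that array iteration order matches linear index order. Both methods return nothing when the iterator is too short.

```julia
# Hypothetical helper, not an existing Base function.
# Generic fallback: walks past the first n-1 elements, so it is O(n) for any iterator.
function nth_sketch(itr, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1, got $n"))
    y = iterate(Iterators.drop(itr, n - 1))
    return y === nothing ? nothing : y[1]
end

# Extra assumption for arrays: O(1) indexing, and iteration follows linear
# index order, so we can jump straight to the nth element.
function nth_sketch(A::AbstractArray, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1, got $n"))
    i = firstindex(A) + n - 1                       # linear index of the nth element
    return checkbounds(Bool, A, i) ? A[i] : nothing  # no exception when out of range
end
```

With this, a Vector or a range goes through a single indexing operation, while something like Iterators.filter(isodd, 1:10) still uses the O(n) fallback.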

Instead of opening a PR straight away, I wanted to check whether there is any interest in this kind of small quality-of-life code.
More importantly, I wanted to run a couple of doubts past much more knowledgeable people:

  • Is it better to just throw an error or to return nothing for these kinds of APIs?
  • Instead of a skip_checks keyword argument, would it be possible to "retrofit" the @inbounds macro to support calls like
    @inbounds Iterators.nth(itr, n)? Is that even a good idea?
  • Related to the question above: is there a better way to avoid the branch in the call to begin with, or is it optimized away automatically?
  • Should such an API just return the whole iteration tuple, including the state, and let the user deal with it (though that doesn't feel right to me)?
  • I don't know enough about all the possible edge cases of what "an iterable" can be, and my naive inbounds version only offers a performance benefit in this particular case with a vector. Is making such a distinction even worth it?
  • Could this be useful for #10092 (make KeyIterator and ValueIterator more array-like)?
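On the @inbounds question: for arrays, Base already has a standard mechanism, namely @boundscheck blocks combined with Base.@propagate_inbounds. Below is a minimal sketch with the hypothetical name nth_boundscheck, restricted to AbstractArray since a generic iterator has no bounds to check. The check is only removed when the call is inlined into code under @inbounds (and bounds checking is not forced on with --check-bounds=yes); otherwise it throws a BoundsError.

```julia
# Hypothetical array-only nth whose bounds check can be elided with @inbounds.
# @propagate_inbounds lets an @inbounds at the call site reach the @boundscheck
# below (once inlined); without it, the check runs and throws a BoundsError.
Base.@propagate_inbounds function nth_boundscheck(A::AbstractArray, n::Integer)
    i = firstindex(A) + n - 1       # linear index of the nth element in iteration order
    @boundscheck checkbounds(A, i)  # throws BoundsError when i is out of range
    return A[i]
end

# Usage: a caller that already knows the indices are valid can opt out of the check.
function sum_every_other(A::AbstractArray)
    s = zero(eltype(A))
    for n in 1:2:length(A)
        s += @inbounds nth_boundscheck(A, n)
    end
    return s
end
```

This answers the "retrofit" question only for indexable collections; for lazy iterators, "out of bounds" can only be discovered by iterating, so there is nothing for @inbounds to skip.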
@Tortar
Contributor

Tortar commented May 14, 2024

I'd just note that

_safe_nth(itr, n) = begin
    y = iterate(Base.Iterators.drop(itr, n-1))
    ifelse(isnothing(y), nothing, getindex(y, 1))
end

is as fast as your _inbounds_nth.

julia> @btime _safe_nth(itr, 9999) setup=(itr=collect(1:10000))
  161.977 ns (0 allocations: 0 bytes)
9999

Actually, I'm a bit confused that the normal branch has such a high cost.

@ghyatzo
Contributor Author

ghyatzo commented May 14, 2024

That is great, I didn't know about ifelse!
The performance disparity might be because ifelse is an ordinary function call, so it evaluates all of its arguments beforehand, which might help eliminate the branch altogether?

At this point there isn't really a reason to have separate "safe" and "unsafe" versions. We might as well always check for nothing and have the best of both worlds.
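One caveat that follows from ifelse evaluating all of its arguments: getindex(y, 1) is also evaluated when y is nothing, so out of range the ifelse variant throws the same MethodError as _inbounds_nth instead of returning nothing. A small comparison, with hypothetical function names:

```julia
# ifelse is an ordinary function: getindex(y, 1) runs even when y === nothing.
nth_ifelse(itr, n) = begin
    y = iterate(Iterators.drop(itr, n - 1))
    ifelse(isnothing(y), nothing, getindex(y, 1))
end

# The ternary operator evaluates only the selected branch.
nth_ternary(itr, n) = begin
    y = iterate(Iterators.drop(itr, n - 1))
    isnothing(y) ? nothing : getindex(y, 1)
end
```

In range both return the element; out of range only nth_ternary returns nothing. That may also be part of why the ifelse version benchmarks like _inbounds_nth, although confirming it would mean comparing the generated code (for example with @code_llvm).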

@Tortar
Contributor

Tortar commented May 14, 2024

Actually, I think the performance gain is just an edge-case optimization. Consider this with your original version:

julia> itr = Iterators.filter(x -> x != 10, 1:10000);

julia> @btime _inbounds_nth($itr, 9999);
  7.086 μs (0 allocations: 0 bytes)

julia> @btime _safe_nth($itr, 9999);
  7.083 μs (0 allocations: 0 bytes)

In any case, I think returning only the element, and not an iterator that continues from there, is not ideal, because one usually wants to keep iterating afterwards. So I would consider something like:

julia> nth(itr, n) = Iterators.peel(Iterators.drop(itr, n-1))

julia> @btime nth($itr, 9999);
  7.086 μs (0 allocations: 0 bytes)

But at the same time, it is just a one-liner, so I'm not sure it is worth adding.
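A quick check of what the peel-based version returns (nth_peel is a hypothetical name for the same one-liner): the nth element together with an iterator over the remaining elements, or nothing when the iterator is too short, which is how Iterators.peel behaves on Julia 1.7 and later.

```julia
# Same one-liner as above, under a hypothetical name.
nth_peel(itr, n) = Iterators.peel(Iterators.drop(itr, n - 1))

evens = Iterators.filter(iseven, 1:10)   # lazily yields 2, 4, 6, 8, 10
x, remaining = nth_peel(evens, 2)        # x is the 2nd element; remaining continues after it
```

The trade-off is that every caller has to handle a possible nothing before destructuring the result.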

@inkydragon added the kind:feature label on May 15, 2024
@ghyatzo
Contributor Author

ghyatzo commented May 15, 2024

I actually think that a function such as nth(itr, n) is more of an endpoint in the lifetime of an iterator: when you call nth, you get the end result, not a continuation of the iterator.
It also matches the intuitive action of "get me the nth element" without forcing the user to deal with the rest or the state at every call site of nth, which follows the principle of least surprise.

For most intents and purposes, I see nth(itr, n) as a generalisation of the first(itr) function in Base:

nth(itr, n) = begin
    y = iterate(Base.Iterators.drop(itr, n-1))
    ifelse(isnothing(y), nothing, getindex(y, 1))
end

function first(itr)
    x = iterate(itr)
    x === nothing && throw(ArgumentError("collection must be non-empty"))
    x[1]
end

# it could become just this
# (not backward compatible and slower, I know; it's just to showcase)
first(itr) = nth(itr, 1)
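If nth is meant as a generalisation of first, one option is to also follow the error convention of the generic first(itr) quoted above and throw instead of returning nothing. A minimal sketch (nth_or_throw is a hypothetical name, and the error messages are illustrative):

```julia
# Hypothetical nth that throws like the generic first(itr) instead of returning nothing.
function nth_or_throw(itr, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1, got $n"))
    y = iterate(Iterators.drop(itr, n - 1))   # O(n): walks past n-1 elements
    y === nothing && throw(ArgumentError("collection must have at least $n elements"))
    return y[1]
end
```

With this convention, nth_or_throw(itr, 1) behaves like the generic first(itr), including throwing an ArgumentError when the iterator is empty; the cost is that callers who expect a possibly short iterator need a try/catch or a separate length check.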

In my opinion, the number of lines of code shouldn't matter when talking about APIs: if it's just a one-liner, all the better, but that shouldn't be a reason for leaving something out. Just for reference, this is the implementation of first(itr, n) and last(itr, n) in Base:

first(itr, n::Integer) = collect(Iterators.take(itr, n))
last(itr, n::Integer) = reverse!(collect(Iterators.take(Iterators.reverse(itr), n)))

Development

No branches or pull requests

3 participants