
perf improvements to MultinomialSampler #1834

Closed · wants to merge 8 commits

Conversation

adienes
Contributor

@adienes adienes commented Jan 31, 2024

From Slack:

regarding https://github.com/JuliaStats/Distributions.jl/pull/1831/files
I realized that the portion

# Use an alias table
fill!(x, zero(eltype(x)))
a = s.alias
for i = 1:n
    x[rand(rng, a)] += 1
end

in _rand!(rng::AbstractRNG, s::MultinomialSampler, x::AbstractVector{<:Real})
can be further improved by making the internal call to rand(rng, s.alias) generate the indices and acceptance probabilities as a batch, something like

function _rand2!(rng::AbstractRNG, x, s::AliasTable, n, r_idx, r_acc)
    fill!(x, zero(eltype(x)))
    rand!(rng, r_idx, 1:length(s.alias))
    rand!(rng, r_acc)

    @inbounds for i = 1:n
        i2 = r_idx[i]
        x[ifelse(r_acc[i] < s.accept[i2], i2, s.alias[i2])] += 1
    end
end

since Base.rand! is more efficient when filling a buffer in one batch than when called repeatedly in a loop.
However, this will either introduce a new allocation (which doesn't really impact performance, but going from 0 to >0 allocations feels bad) or require scratch space in the (placeholder) r_idx and r_acc buffers.
Would it be appropriate here to include some keyword to designate scratch space? Where should that go? I'd also prefer to be able to add API to call the alias-table sampling in a loop directly, since currently the branch for multinom_rand! never seems worth it unless n >> k.
In case this is motivating information: with both changes together, multinomial sampling can be about 4x faster than what is currently live. (edited)
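The batching idea above can be sketched self-contained. This is a hedged illustration, not the PR's code: the `accept`/`alias` vectors stand in for alias-table internals, and a degenerate table (accept everything) is used so the tally is checkable.

```julia
using Random

# Sketch of the batched pattern: draw all column indices and all acceptance
# uniforms up front with `rand!` into preallocated scratch buffers, then
# consume them in a tight loop. `accept` and `alias` are placeholder fields,
# not the actual Distributions.jl alias-table layout.
function batched_tally!(rng, x, accept, alias, n, r_idx, r_acc)
    fill!(x, zero(eltype(x)))
    rand!(rng, r_idx, 1:length(alias))  # all n column indices in one call
    rand!(rng, r_acc)                   # all n acceptance uniforms in one call
    @inbounds for i in 1:n
        j = r_idx[i]
        # Accept column j, or fall through to its alias.
        x[ifelse(r_acc[i] < accept[j], j, alias[j])] += 1
    end
    return x
end

k, n = 4, 1000
# Degenerate alias table: accept probability 1 everywhere, so column j
# always yields category j and the total tally must equal n.
accept = ones(Float64, k)
alias = collect(1:k)
x = batched_tally!(Random.default_rng(), zeros(Int, k), accept, alias, n,
                   Vector{Int}(undef, n), Vector{Float64}(undef, n))
```

The per-call `rand(rng, a)` loop in the current implementation does the same index/acceptance lookup one draw at a time; the batched form just hoists the RNG calls out of the loop.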

@adienes adienes changed the title from "more improvements to batch randomness" to "perf improvements to MultinomialSampler" on Jan 31, 2024
@devmotion
Member

Can't we just use Random.Sampler(..., Val(Inf)) to get similar optimizations for repeated sampling as rand! calls into a vector (see https://docs.julialang.org/en/v1/stdlib/Random/#Random.Sampler)?

@adienes
Contributor Author

adienes commented Jan 31, 2024

in this case, it's not really "repeated sampling" in the same sense as most other samplers: this represents a single draw from a Multinomial, which just happens to be implemented by repeated sampling from a Categorical via an AliasTable. If you call rand! on a MultinomialSampler, it wants a vector of vectors, one for each draw.

in my main use-case for this I will be calling _rand! directly; I think the interfaces are not really well-prepared for the random return values themselves being mutable, as most are scalars.
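The single-draw structure described above can be sketched in a few lines. This is an illustrative stand-in, not the Distributions.jl internals: `categorical_draw` below uses a simple inverse-CDF scan in place of the package's alias-table sampler.

```julia
using Random

# One categorical draw via inverse-CDF lookup (stand-in for the alias table).
function categorical_draw(rng::AbstractRNG, p::AbstractVector{<:Real})
    u = rand(rng)
    c = zero(eltype(p))
    for (i, pi) in enumerate(p)
        c += pi
        u <= c && return i
    end
    return length(p)  # guard against floating-point round-off
end

# A single Multinomial(n, p) draw = n categorical draws tallied into x.
function multinomial_draw!(rng::AbstractRNG, x::AbstractVector{<:Integer}, n::Int, p)
    fill!(x, 0)
    for _ in 1:n
        x[categorical_draw(rng, p)] += 1
    end
    return x
end

x = multinomial_draw!(Random.default_rng(), zeros(Int, 3), 100, [0.2, 0.3, 0.5])
```

This makes the distinction concrete: the inner loop is repeated categorical sampling, but the whole tally `x` is one sample of the Multinomial, so a `rand!` over a MultinomialSampler fills one such vector per draw.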

@adienes
Contributor Author

adienes commented Jan 31, 2024

if the idea looks OK I'll go fix all the tests and such; I just wanted to hold off on that in case this cannot be merged for other reasons

@adienes adienes marked this pull request as draft February 2, 2024 23:16
@adienes
Contributor Author

adienes commented Feb 2, 2024

[benchmark image]

benchmarks of 4 different approaches to multinomial sampling. The status quo is more or less equivalent to the binom column here, which calls multinom_rand!

so there is definitely an opportunity to do a lot better at some ranges of input. Marking this as draft so I can do more thorough work.

note that direct_binsearch includes the time to compute cdf(probs); if this can somehow be computed in advance, it's much faster. Similarly, sample_at has to construct an AliasTable.

@adienes
Contributor Author

adienes commented Feb 5, 2024

a challenge here is that the "best" choice can be any of 4-5 different algorithms (with significant performance implications), depending not only on n and k but also on the number of times (say, M) the sampler will be drawn from, since the cost of constructing the sampler itself can be large.

It seems the convention is ordinarily to assume any cost of construction is small compared to the cost of repeated sampling, i.e. M >> n, k. But I think this is unlikely to be the case, especially when one of n or k is large.

What's a good path forward? Maybe to implement each approach separately, document the performance tradeoffs, and throw heuristics into a MultinomialPolySampler?
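The kind of heuristic dispatch being floated here could look something like the sketch below. Everything in it is hypothetical: the strategy names, the function name `choose_multinomial_algorithm`, and the thresholds are illustrative placeholders, not tuned values from this PR.

```julia
# Hypothetical sketch: pick a multinomial sampling strategy from the number
# of trials n, the number of categories k, and an estimate M of how many
# times the sampler will be reused. The thresholds are made up; the point is
# only the shape of the cost model (setup cost vs. amortized per-draw cost).
function choose_multinomial_algorithm(n::Int, k::Int, M::Int)
    if M * n < 10 * k
        # Total work is small relative to O(k) table construction:
        # sample categories directly without building anything.
        return :direct
    elseif n > 100 * k
        # Many trials per category: conditional binomial sampling tends
        # to win since it does O(k) work per draw regardless of n.
        return :binomial
    else
        # Otherwise amortize the alias-table setup over the M draws.
        return :alias_table
    end
end

choose_multinomial_algorithm(10, 1000, 1)      # setup-dominated regime
choose_multinomial_algorithm(1_000_000, 10, 100)  # n >> k regime
```

A dispatcher like this makes the M-dependence explicit, which is exactly the parameter the usual "construction cost is negligible" convention assumes away.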

@adienes adienes marked this pull request as ready for review February 19, 2024 17:52
@adienes
Contributor Author

adienes commented Feb 19, 2024

cleaned up the test failures, so this is ready for review. Just as a reminder of the motivation: performance is improved across the board for all parameter choices, and for some parts of the range n < k, this implementation is up to 20-100x faster than the status quo.

I think I should walk back some of the attempt at richer algorithm heuristics: when the number of samples is large (as would be expected from a Sampler), one of the two choices, AliasTable vs. Binomial sampling, will be optimal.

So the overall functionality is now basically identical to the status quo, except that the parameter that switches between alias-table and binomial sampling is better tuned, and there is a specific optimization to make use of vectorized rand calls.

The only downside is that sampler(::Multinomial) is now type-unstable, but IMO that is well worth it.

@codecov-commenter

codecov-commenter commented Feb 19, 2024

Codecov Report

Attention: Patch coverage is 78.94737%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 85.97%. Comparing base (818814f) to head (5f39ae0).

Files Patch % Lines
src/samplers/multinomial.jl 70.00% 3 Missing ⚠️
src/multivariate/multinomial.jl 88.88% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1834   +/-   ##
=======================================
  Coverage   85.96%   85.97%           
=======================================
  Files         144      144           
  Lines        8647     8653    +6     
=======================================
+ Hits         7433     7439    +6     
  Misses       1214     1214           


@adienes
Contributor Author

adienes commented Feb 28, 2024

@devmotion I'm sorry to pester; I know your attention is probably stretched thin as one of only a small number of maintainers for a package as popular as this.

But is a change like this desired at all? If there is not a path to merge, I am happy to close the PR and just put the logic into my local project.

@ViralBShah
Contributor

ViralBShah commented Feb 28, 2024

@andreasnoack is probably also equally stretched. Pinging him here as well. @devmotion is everywhere and certainly stretched thin, I would imagine!

We should also see if we can grow the number of maintainers for this package, and if there are specific folks we should give commit access to, I would love to at least facilitate the conversation.

@devmotion
Member

Ah, I didn't realize that the status of the PR was changed to "ready for review".

Some high-level comments right away:

  • I'm all for better performance, provided the improvement is demonstrated (which seems to be the case?) and justifies the possible increase in code complexity
  • sampler(::Distribution) is already type-unstable for a few other distributions (exactly to choose an optimized sampling algorithm), so this seems fine to me (in rand we have a function barrier that minimizes the practical effect of the type instability)
  • IIRC the typical pattern is to branch in sampler(::Distribution); the abstract supertype and its "constructor" seem to be an approach that is not used by other samplers in Distributions. I suggest following the example of the other samplers and removing the abstract supertype.
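The function-barrier pattern mentioned in the second bullet can be illustrated with a minimal sketch. All names here (`mysampler`, `SmallSampler`, `BigSampler`, `draw`) are hypothetical, not Distributions.jl API; the point is only how the instability is contained.

```julia
using Random

# Two concrete sampler types with a shared sampling protocol.
struct SmallSampler; vals::Vector{Int}; end
struct BigSampler; hi::Int; end

sample_one(rng, s::SmallSampler) = s.vals[rand(rng, 1:length(s.vals))]
sample_one(rng, s::BigSampler) = rand(rng, 1:s.hi)

# Type-unstable constructor, analogous to a branching sampler(::Distribution):
# the return type depends on a runtime value.
mysampler(hi::Int) = hi <= 4 ? SmallSampler(collect(1:hi)) : BigSampler(hi)

# Function barrier: once `s` is passed in, this method is compiled for the
# concrete typeof(s), so the hot loop is fully specialized. The instability
# costs one dynamic dispatch at the call site, not per iteration.
draw(rng, s, m::Int) = [sample_one(rng, s) for _ in 1:m]

xs = draw(Random.default_rng(), mysampler(3), 10)
```

This is why an unstable sampler(::Multinomial) is cheap in practice: rand hands the sampler across such a barrier before the per-draw loop runs.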

@adienes
Contributor Author

adienes commented Feb 28, 2024

the abstract super type and its "constructor" seems to be an approach that is not used by other samplers in Distributions.

fair point, happy to address this

the only question being: what should happen to the existing name MultinomialSampler? I guess it is neither exported nor has a docstring, so it's OK to remove?

@devmotion
Member

If it's not exported, it should be fine to remove it. But it would probably be a bit safer to deprecate the type, e.g. with

Base.@deprecate_binding MultinomialSampler MultinomialSamplerSequential false

(by default the macro exports the deprecated type; with false it is not exported). See, e.g., https://invenia.github.io/blog/2022/06/17/deprecating-in-julia/

@adienes
Contributor Author

adienes commented Mar 3, 2024

thanks for the comments -- let me know if the most recent commit addresses them appropriately

@adienes
Contributor Author

adienes commented Mar 16, 2024

friendly bump

alias::AliasTable
scratch_idx::Vector{Int}
scratch_acc::Vector{T}
Member

acc because of "accept"? Maybe the name could be a bit clearer.

function _rand!(rng::AbstractRNG, s::MultinomialSamplerSequential, x::AbstractVector{<:Real})
fill!(x, zero(eltype(x)))
at = s.alias
rand!(rng, s.scratch_idx, 1:length(at.alias))
Member

Maybe more generically (and safer given that you use @inbounds below):

Suggested change
rand!(rng, s.scratch_idx, 1:length(at.alias))
rand!(rng, s.scratch_idx, eachindex(at.alias, at.accept))

rand!(rng, s.scratch_idx, 1:length(at.alias))
rand!(rng, s.scratch_acc)

@inbounds for i = 1:s.n
Member

Wouldn't

Suggested change
@inbounds for i = 1:s.n
@inbounds for (i, acc) in zip(s.scratch_idx, s.scratch_acc)

be simpler?

rand!(rng, s.scratch_acc)

@inbounds for i = 1:s.n
i2 = s.scratch_idx[i] % Int
Member

s.scratch_idx[i] is always an Int, isn't it?

Contributor Author

yes, it should be. Though I had just followed what was in the existing rand(::AliasTable) implementation:

i = rand(rng, 1:length(s.alias)) % Int

will remove this % Int

@adienes
Contributor Author

adienes commented Mar 31, 2024

I hope I have addressed all comments; the benchmarks still look good on my end, so nothing has regressed (as expected)

@adienes
Contributor Author

adienes commented Apr 20, 2024

took advantage of the latest changes to AliasTable

the approach

    rand!(rng, s.scratch_alias_rng)

    for r in s.scratch_alias_rng
        x[AliasTables.sample(r, s.alias.at)] += 1
    end

of batch-sampling the seeds is slightly faster (5-10%) than the existing x[rand(s.alias.at)] loop

@adienes
Contributor Author

adienes commented May 2, 2024

another friendly bump :)

@adienes
Contributor Author

adienes commented May 19, 2024

I'm sorry to be a nag, but bumping again.

@adienes adienes closed this May 24, 2024