
perf improvements to MultinomialSampler #1834

Closed · wants to merge 8 commits

Conversation

adienes
Contributor

@adienes adienes commented Jan 31, 2024

From Slack:

regarding https://github.com/JuliaStats/Distributions.jl/pull/1831/files
I realized that the portion

# Use an alias table
fill!(x, zero(eltype(x)))
a = s.alias
for i = 1:n
    x[rand(rng, a)] += 1
end

in _rand!(rng::AbstractRNG, s::MultinomialSampler, x::AbstractVector{<:Real})
can be further improved by making the internal call to rand(rng, s.alias) generate the indices and acceptance probabilities as a batch, something like

function _rand2!(rng::AbstractRNG, x, s::AliasTable, n, r_idx, r_acc)
    fill!(x, zero(eltype(x)))
    rand!(rng, r_idx, 1:length(s.alias))
    rand!(rng, r_acc)

    @inbounds for i = 1:n
        i2 = r_idx[i]
        x[ifelse(r_acc[i] < s.accept[i2], i2, s.alias[i2])] += 1
    end
end

since Base.rand! is more efficient when filling a buffer in one batch than when called repeatedly in a loop.
However, this will either introduce a new allocation (which doesn't really impact performance, but going from 0 to >0 allocations feels bad) or require scratch space in the (placeholder) r_idx and r_acc buffers.
Would it be appropriate here to include some keyword to designate scratch space? Where should that go? I'd also prefer to be able to add API to call the alias-table sampling in a loop directly, since currently the branch for multinom_rand! never seems worth it unless n >> k.
In case this is motivating information: with both changes together, multinomial sampling can be about 4x faster than what is currently live. (edited)
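The batching idea above can be sketched self-contained. This is a hedged illustration, not the PR's code: the `accept`/`alias` vectors stand in for alias-table internals, and a degenerate table (accept everything) is used so the tally is checkable.

```julia
using Random

# Sketch of the batched pattern: draw all column indices and all acceptance
# uniforms up front with `rand!` into preallocated scratch buffers, then
# consume them in a tight loop. `accept` and `alias` are placeholder fields,
# not the actual Distributions.jl alias-table layout.
function batched_tally!(rng, x, accept, alias, n, r_idx, r_acc)
    fill!(x, zero(eltype(x)))
    rand!(rng, r_idx, 1:length(alias))  # all n column indices in one call
    rand!(rng, r_acc)                   # all n acceptance uniforms in one call
    @inbounds for i in 1:n
        j = r_idx[i]
        # Accept column j, or fall through to its alias.
        x[ifelse(r_acc[i] < accept[j], j, alias[j])] += 1
    end
    return x
end

k, n = 4, 1000
# Degenerate alias table: accept probability 1 everywhere, so column j
# always yields category j and the total tally must equal n.
accept = ones(Float64, k)
alias = collect(1:k)
x = batched_tally!(Random.default_rng(), zeros(Int, k), accept, alias, n,
                   Vector{Int}(undef, n), Vector{Float64}(undef, n))
```

The per-call `rand(rng, a)` loop in the current implementation does the same index/acceptance lookup one draw at a time; the batched form just hoists the RNG calls out of the loop.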

@adienes adienes changed the title from "more improvements to batch randomness" to "perf improvements to MultinomialSampler" on Jan 31, 2024
@devmotion
Member

Can't we just use Random.Sampler(..., Val(Inf)) to get similar optimizations for repeated sampling as rand! calls into a vector (see https://docs.julialang.org/en/v1/stdlib/Random/#Random.Sampler)?

@adienes
Contributor Author

adienes commented Jan 31, 2024

in this case, it's not really "repeated sampling" in the same sense as most other samplers: this represents a single draw from a Multinomial, which just happens to be implemented by repeated sampling from a Categorical via an AliasTable. If you call rand! on a MultinomialSampler, it wants a vector of vectors, one for each draw.

in my main use-case for this I will be calling _rand! directly; I think the interfaces are not really well-prepared for the random return values themselves being mutable, as most are scalars.
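The single-draw structure described above can be sketched in a few lines. This is an illustrative stand-in, not the Distributions.jl internals: `categorical_draw` below uses a simple inverse-CDF scan in place of the package's alias-table sampler.

```julia
using Random

# One categorical draw via inverse-CDF lookup (stand-in for the alias table).
function categorical_draw(rng::AbstractRNG, p::AbstractVector{<:Real})
    u = rand(rng)
    c = zero(eltype(p))
    for (i, pi) in enumerate(p)
        c += pi
        u <= c && return i
    end
    return length(p)  # guard against floating-point round-off
end

# A single Multinomial(n, p) draw = n categorical draws tallied into x.
function multinomial_draw!(rng::AbstractRNG, x::AbstractVector{<:Integer}, n::Int, p)
    fill!(x, 0)
    for _ in 1:n
        x[categorical_draw(rng, p)] += 1
    end
    return x
end

x = multinomial_draw!(Random.default_rng(), zeros(Int, 3), 100, [0.2, 0.3, 0.5])
```

This makes the distinction concrete: the inner loop is repeated categorical sampling, but the whole tally `x` is one sample of the Multinomial, so a `rand!` over a MultinomialSampler fills one such vector per draw.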

@adienes
Contributor Author

adienes commented Jan 31, 2024

if the idea looks OK I'll go fix all the tests and such; I just wanted to hold off on that in case this cannot be merged for other reasons

@adienes adienes marked this pull request as draft February 2, 2024 23:16
@adienes
Contributor Author

adienes commented Feb 2, 2024

[benchmark image]

benchmarks of 4 different approaches to multinomial sampling. The status quo is more or less equivalent to the binom column here, which calls multinom_rand!

so there is definitely an opportunity to do a lot better at some ranges of input. Marking this as draft so I can do more thorough work.

note that direct_binsearch includes the time to compute cdf(probs); if this can somehow be computed in advance, it's much faster. Similarly, sample_at has to construct an AliasTable.

@adienes
Contributor Author

adienes commented Feb 5, 2024

a challenge here is that the "best" choice can be any of 4-5 different algorithms (with significant performance implications), depending not only on n and k but also on the number of times (say, M) the sampler will be drawn from, since the cost of constructing the sampler itself can be large.

It seems the convention is ordinarily to assume any cost of construction is small compared to the cost of repeated sampling, i.e. M >> n, k. But I think this is unlikely to be the case, especially when one of n or k is large.

What's a good path forward? Maybe to implement each approach separately, document the performance tradeoffs, and throw heuristics into a MultinomialPolySampler?
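The kind of heuristic dispatch being floated here could look something like the sketch below. Everything in it is hypothetical: the strategy names, the function name `choose_multinomial_algorithm`, and the thresholds are illustrative placeholders, not tuned values from this PR.

```julia
# Hypothetical sketch: pick a multinomial sampling strategy from the number
# of trials n, the number of categories k, and an estimate M of how many
# times the sampler will be reused. The thresholds are made up; the point is
# only the shape of the cost model (setup cost vs. amortized per-draw cost).
function choose_multinomial_algorithm(n::Int, k::Int, M::Int)
    if M * n < 10 * k
        # Total work is small relative to O(k) table construction:
        # sample categories directly without building anything.
        return :direct
    elseif n > 100 * k
        # Many trials per category: conditional binomial sampling tends
        # to win since it does O(k) work per draw regardless of n.
        return :binomial
    else
        # Otherwise amortize the alias-table setup over the M draws.
        return :alias_table
    end
end

choose_multinomial_algorithm(10, 1000, 1)      # setup-dominated regime
choose_multinomial_algorithm(1_000_000, 10, 100)  # n >> k regime
```

A dispatcher like this makes the M-dependence explicit, which is exactly the parameter the usual "construction cost is negligible" convention assumes away.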

@adienes adienes marked this pull request as ready for review February 19, 2024 17:52
@adienes
Contributor Author

adienes commented Feb 19, 2024

cleaned up the test failures, so this is ready for review. Just as a reminder of the motivation: performance is improved across the board for all parameter choices, and for some parts of the range n < k, this implementation is up to 20-100x faster than the status quo.

I think I should walk back some of the attempt at richer algorithm heuristics: when the number of samples is large (as would be expected from a Sampler), one of the two choices, AliasTable vs. Binomial sampling, will be optimal.

So the overall functionality is now basically identical to the status quo, except that the parameter that switches between alias-table and binomial sampling is better tuned, and there is a specific optimization to make use of vectorized rand calls.

The only downside is that sampler(::Multinomial) is now type-unstable, but IMO that is well worth it.

@codecov-commenter

codecov-commenter commented Feb 19, 2024

Codecov Report

Attention: Patch coverage is 78.94737%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 85.97%. Comparing base (818814f) to head (5f39ae0).

Files Patch % Lines
src/samplers/multinomial.jl 70.00% 3 Missing ⚠️
src/multivariate/multinomial.jl 88.88% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1834   +/-   ##
=======================================
  Coverage   85.96%   85.97%           
=======================================
  Files         144      144           
  Lines        8647     8653    +6     
=======================================
+ Hits         7433     7439    +6     
  Misses       1214     1214           


@adienes
Contributor Author

adienes commented Feb 28, 2024

@devmotion I'm sorry to pester; I know your attention is probably stretched thin as one of only a small number of maintainers for a package as popular as this.

But is a change like this desired at all? If there is not a path to merge, I am happy to close the PR and just put the logic into my local project.

@ViralBShah
Contributor

ViralBShah commented Feb 28, 2024

@andreasnoack is probably also equally stretched. Pinging him here as well. @devmotion is everywhere and certainly stretched thin, I would imagine!

We should also see if we can grow the number of maintainers for this package, and if there are specific folks we should give commit access to, I would love to at least facilitate the conversation.

@devmotion
Member

Ah, I didn't realize that the status of the PR was changed to "ready for review".

Some high-level comments right away:

  • I'm all for better performance, provided the improvement is demonstrated (which seems to be the case?) and justifies the possible increase in code complexity
  • sampler(::Distribution) is already type-unstable for a few other distributions (exactly to choose an optimized sampling algorithm), so this seems fine to me (in rand we have a function barrier that minimizes the practical effect of the type instability)
  • IIRC the typical pattern is to branch in sampler(::Distribution); the abstract supertype and its "constructor" seem to be an approach that is not used by other samplers in Distributions. I suggest following the example of the other samplers and removing the abstract supertype.
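The function-barrier pattern mentioned in the second bullet can be illustrated with a minimal sketch. All names here (`mysampler`, `SmallSampler`, `BigSampler`, `draw`) are hypothetical, not Distributions.jl API; the point is only how the instability is contained.

```julia
using Random

# Two concrete sampler types with a shared sampling protocol.
struct SmallSampler; vals::Vector{Int}; end
struct BigSampler; hi::Int; end

sample_one(rng, s::SmallSampler) = s.vals[rand(rng, 1:length(s.vals))]
sample_one(rng, s::BigSampler) = rand(rng, 1:s.hi)

# Type-unstable constructor, analogous to a branching sampler(::Distribution):
# the return type depends on a runtime value.
mysampler(hi::Int) = hi <= 4 ? SmallSampler(collect(1:hi)) : BigSampler(hi)

# Function barrier: once `s` is passed in, this method is compiled for the
# concrete typeof(s), so the hot loop is fully specialized. The instability
# costs one dynamic dispatch at the call site, not per iteration.
draw(rng, s, m::Int) = [sample_one(rng, s) for _ in 1:m]

xs = draw(Random.default_rng(), mysampler(3), 10)
```

This is why an unstable sampler(::Multinomial) is cheap in practice: rand hands the sampler across such a barrier before the per-draw loop runs.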

@adienes
Contributor Author

adienes commented Feb 28, 2024

the abstract super type and its "constructor" seems to be an approach that is not used by other samplers in Distributions.

fair point, happy to address this

the only question being: what should happen to the existing name MultinomialSampler? I guess it is neither exported nor has a docstring, so it's OK to remove?

@devmotion
Member

If it's not exported, it should be fine to remove it. But it would probably be a bit safer to deprecate the type, e.g. with

Base.@deprecate_binding MultinomialSampler MultinomialSamplerSequential false

(by default the macro exports the deprecated type; with false it is not exported). See, e.g., https://invenia.github.io/blog/2022/06/17/deprecating-in-julia/

@adienes
Contributor Author

adienes commented Mar 3, 2024

thanks for the comments -- let me know if the most recent commit addresses them appropriately

@adienes
Contributor Author

adienes commented Mar 16, 2024

friendly bump

alias::AliasTable
scratch_idx::Vector{Int}
scratch_acc::Vector{T}
Member

acc because of "accept"? Maybe the name could be a bit clearer.

function _rand!(rng::AbstractRNG, s::MultinomialSamplerSequential, x::AbstractVector{<:Real})
fill!(x, zero(eltype(x)))
at = s.alias
rand!(rng, s.scratch_idx, 1:length(at.alias))
Member

Maybe more generically (and safer given that you use @inbounds below):

Suggested change
rand!(rng, s.scratch_idx, 1:length(at.alias))
rand!(rng, s.scratch_idx, eachindex(at.alias, at.accept))

rand!(rng, s.scratch_idx, 1:length(at.alias))
rand!(rng, s.scratch_acc)

@inbounds for i = 1:s.n
Member

Wouldn't

Suggested change
@inbounds for i = 1:s.n
@inbounds for (i, acc) in zip(s.scratch_idx, s.scratch_acc)

be simpler?

rand!(rng, s.scratch_acc)

@inbounds for i = 1:s.n
i2 = s.scratch_idx[i] % Int
Member

s.scratch_idx[i] is always an Int, isn't it?

Contributor Author

yes, it should be. Though I had just followed what was in the existing rand(::AliasTable) implementation:

i = rand(rng, 1:length(s.alias)) % Int

will remove this % Int

@adienes
Contributor Author

adienes commented Mar 31, 2024

I hope I have addressed all comments; the benchmarks still look good on my end, so nothing has regressed (as expected)

@adienes
Contributor Author

adienes commented Apr 20, 2024

took advantage of the latest changes to AliasTable

the approach

    rand!(rng, s.scratch_alias_rng)

    for r in s.scratch_alias_rng
        x[AliasTables.sample(r, s.alias.at)] += 1
    end

of batch-sampling the seeds is slightly faster (5-10%) than the existing x[rand(s.alias.at)] loop

@adienes
Contributor Author

adienes commented May 2, 2024

another friendly bump :)

@adienes
Contributor Author

adienes commented May 19, 2024

I'm sorry to be a nag, but bumping again.

@adienes adienes closed this May 24, 2024