Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Computed field len and rand now returns type T in Uniform{T} #1796

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

f-ij
Copy link

@f-ij f-ij commented Nov 6, 2023

Added computed field len which is the length of the interval to be sampled from. Also replaced all instances of (d.b-d.a) by the new field d.len.

Fixed an issue where rand(::Uniform{Float32}) (and other, non Float64 types) did not in fact return a Float32, but a Float64, which caused some performance issues. This was done by modifiying the method rand(rng::AbstractRNG, d::Uniform) to include the subtype of Uniform and pass it to rand.

This was discussed on discourse

Added computed field len which is the length of the interval to be sampled from. Also replaced all instances of (d.b-d.a) by the new field d.len.
@devmotion
Copy link
Member

The rand method follows the current design, currently almost all distributions return samples of type Float64 and Int and intentionally the random variate type is independent of the parameter types (there are many issues and discussions in the repo). I would like to improve this design, and I've discussed a better design with some other contributors, but this requires more and different changes. We should not introduce an additional inconsistency by making rand of Uniform special.

@devmotion
Copy link
Member

Regarding len: IMO we should not add such a field since it is not an actual parameter, could easily cause inconsistencies and I think your example should be fixed differently. As in Random, you should construct a sampler when you perform repeated sampling (unfortunately, the syntax is a bit different in Distributions, possibly because it predates (?) the interface in Random). len should only be computed when constructing such a sampler.

@@ -151,7 +152,7 @@ Base.:*(c::Real, d::Uniform) = Uniform(minmax(c * d.a, c * d.b)...)

#### Sampling

rand(rng::AbstractRNG, d::Uniform) = d.a + (d.b - d.a) * rand(rng)
rand(rng::AbstractRNG, d::Uniform{T}) where T = d.a + d.len * rand(T, rng)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rand(rng::AbstractRNG, d::Uniform{T}) where T = d.a + d.len * rand(T, rng)
rand(rng::AbstractRNG, d::Uniform{T}) where T = d.a + d.len * rand(float(T), rng)

You need to promote T to a floating-point type.

@stevengj
Copy link

stevengj commented Nov 6, 2023

In particular, this corresponds to issue #1783.

@devmotion
Copy link
Member

With the current design, the "correct" fix would be to always return Float64 as the variates are supposed to be of type Float64. But as said, I'd rather fix this design. I'll try to write up our suggestions and draft a PR within the next few days.

@simsurace
Copy link
Contributor

Why would one expect a Float64 result when sampling a distribution with parameters only specified to Float32 precision, especially if the parameters are from the same space as the random variable?

I think that API to specify the output type would be nice, but in the absence of such an initiative and given the fact that there is no promise regarding Float64 output, the proposed change looks reasonable to me.

@f-ij
Copy link
Author

f-ij commented Nov 6, 2023

The rand method follows the current design, currently almost all distributions return samples of type Float64 and Int and intentionally the random variate type is independent of the parameter types (there are many issues and discussions in the repo). I would like to improve this design, and I've discussed a better design with some other contributors, but this requires more and different changes. We should not introduce an additional inconsistency by making rand of Uniform special.

I haven't used Distributions extensively, but currently calling rand on Uniform{Float32} does also return a Float32. Regarding your comment on the len field, I don't see why this would be a problem. For most design patterns in Julia we may assume a user won't directly change any struct fields, right? If so then inconsistencies shouldn't be a problem.

@stevengj
Copy link

stevengj commented Nov 6, 2023

currently calling rand on Uniform{Float32} does also return a Float32

@f-ij, was this a typo? It returns a Float64 currently:

julia> u = Uniform(1.0f0, 2.0f0)
Uniform{Float32}(a=1.0f0, b=2.0f0)

julia> rand(u)
1.8944514300368434

julia> typeof(ans)
Float64

@f-ij
Copy link
Author

f-ij commented Nov 6, 2023

currently calling rand on Uniform{Float32} does also return a Float32

@f-ij, was this a typo? It returns a Float64 currently:

julia> u = Uniform(1.0f0, 2.0f0)
Uniform{Float32}(a=1.0f0, b=2.0f0)

julia> rand(u)
1.8944514300368434

julia> typeof(ans)
Float64

Sorry that was indeed a typo! I meant Normal{Float32}..

@devmotion
Copy link
Member

Normal is the inconsistent one that does not follow the design. eltype is supposed to specify the element type of the random variates, partype is the type of the parameters. By design, they are intended to be distinct and eltype is Float64 for continuous distributions and Int for discrete distributions by default.

Changing rand for Uniform as suggested in this PR is not a fix since it just introduces another inconsistent distribution.

There are clear ideas for a better design and a push for making these changes.

@simsurace
Copy link
Contributor

What is the issue/PR to track/contribute to? Sorry @f-ij for sending you in the wrong direction. I was under the false impression that this would be uncontroversial.

@devmotion
Copy link
Member

There are two different aspects: The len precomputation for repeated calls of rand should be addressed separately from the general rand interface design question.

IMO the most natural way to address the first one is to create a Distributions.Sampler type for Uniform that can be used for repeated sampling, since the same approach is also used in Random: If you want to perform repeated sampling with the same object, you use

spl = sampler(dist)
x = rand(spl)
for _ in 1:1_000_000
    x += rand(spl)
end

instead of

x = rand(dist)
for _ in 1:1_000_000
    x += rand(dist)
end

(FYI these samplers are already used internally when you create an array of samples; if there exists no sampler for a distribution, sampler(dist) just falls back to dist, ie. no special pre-computations are performed.) AFAIK there exists no issue or other PR for this possible performance improvement yet.

The design question is discussed in multiple issues and even PRs (#1433), but it's a bit scattered since multiple duplicate issues have been opened and discussions started from scratch or recircled to already existing explanations.

@simsurace
Copy link
Contributor

I did not know about sampler yet, but that's a very good point.

@f-ij
Copy link
Author

f-ij commented Nov 7, 2023

What is the issue/PR to track/contribute to? Sorry @f-ij for sending you in the wrong direction. I was under the false impression that this would be uncontroversial.

Not a problem at all, it was not much work and I gained me some insight out of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants