Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fitting does not respect type parameters #1544

Open
gdalle opened this issue May 7, 2022 · 11 comments · May be fixed by #1823
Open

Fitting does not respect type parameters #1544

gdalle opened this issue May 7, 2022 · 11 comments · May be fixed by #1823

Comments

@gdalle
Copy link

gdalle commented May 7, 2022

Hi!
In my current project, I would like to fit distributions in reduced precision to speed up my code. I was therefore wondering if the following behavior is intentional:

julia> using Distributions

julia> fit(Normal{Float32}, [1, 2, 3])
Normal{Float64}=2.0, σ=0.816496580927726)
@devmotion
Copy link
Member

It's known, the type only depends on the data. If you use Float32[1, 2, 3] the type should be Normal{Float32}.

@devmotion
Copy link
Member

Of course, the question is still if this should be changed. I think it would be reasonable.

@gdalle
Copy link
Author

gdalle commented May 7, 2022

Thanks for your answer. Indeed, the behavior you describe seems logical too, since sufficient statistics are computed from the sample.
But I agree that both options can make sense, and it is hard to decide which one should prevail

@gdalle
Copy link
Author

gdalle commented May 7, 2022

Hum, after testing it doesn't quite go the way you said it would 😅

julia> using Distributions

julia> x, w = Float32[1, 2, 3], Float32[4, 5, 6];

julia> fit(Normal{Float32}, x)
Normal{Float64}=2.0, σ=0.816496580927726)

julia> fit(Normal{Float32}, x, w)
ERROR: suffstats is not implemented for (Normal{Float32}, Vector{Float32}, Vector{Float32}).
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] suffstats(::Type{Normal{Float32}}, ::Vector{Float32}, ::Vector{Float32})
   @ Distributions ~/.julia/packages/Distributions/O4ZJg/src/genericfit.jl:5
 [3] fit_mle(dt::Type{Normal{Float32}}, x::Vector{Float32}, w::Vector{Float32})
   @ Distributions ~/.julia/packages/Distributions/O4ZJg/src/genericfit.jl:28
 [4] fit(::Type{Normal{Float32}}, ::Vector{Float32}, ::Vector{Float32})
   @ Distributions ~/.julia/packages/Distributions/O4ZJg/src/genericfit.jl:34
 [5] top-level scope
   @ REPL[62]:1

@devmotion
Copy link
Member

Well, it seems there are some hardcoded Float64 arguments, e.g. in https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/normal.jl#L238. It would be easy to change these type restrictions at least in a first step.

@gdalle
Copy link
Author

gdalle commented May 7, 2022

I think one would also need to make sufficient stats parametric, for instance here:

struct NormalStats <: SufficientStats
s::Float64 # (weighted) sum of x
m::Float64 # (weighted) mean of x
s2::Float64 # (weighted) sum of (x - μ)^2
tw::Float64 # total sample weight
end

@matbesancon
Copy link
Member

yeah there are still a lot of F64 hardcoded sprinkled in the package :/

@timholy
Copy link
Contributor

timholy commented Jan 13, 2024

In case it helps resolve difficult design decisions,

help?> fit_mle
search: fit_mle

  fit_mle(D, x)

  Fit a distribution of type D to a given data set x.

From this, there's no doubt that the type parameters of D should take precedence over the eltype of the data. Either the docs need changing or the behavior should match the docs.

@gdalle
Copy link
Author

gdalle commented Jan 13, 2024

I had given it a first shot months ago in this PR:

#1560

Looking back at it, my main issue is that we need to accommodate both

  • fit(Normal, x), in which case the eltype of the data is the right choice
  • fit(Normal{Float64}, x), in which case the eltype of the distribution is the right choice (maybe?)

timholy added a commit to timholy/Distributions.jl that referenced this issue Jan 13, 2024
This redesigns the constructors of `MvNormal` to:
- check argument consistency in the inner constructor
- support "coercion," for example `MvNormal{Float32}(μ, Σ)`
  will enforce that the result has eltype `Float32`
- cascade coercion among the multiple type parameters

The benefits of this design are subtle but significant, as it
allows the user to specify intent, allows the constructors to
accept input arguments that are not (yet) consistent with
the desired output types (e.g., `MvNormal{Float64}(Any[1,2], I)`),
and makes it easy for developers to "propagate" consistent
types across the code base (e.g., fixes JuliaStats#1544).
@timholy timholy linked a pull request Jan 13, 2024 that will close this issue
@timholy
Copy link
Contributor

timholy commented Jan 13, 2024

I think something like #1823 would make it an easy fix, as the final line of any code that builds distributions just becomes D(args...) and if the user specified the eltype of D then it will be respected. I.e.,

function fit(::Type{D}, x) where D<:Normal
    μ = mean(x)
    σ = std(x)
    return D(μ, σ)
end

Just Works™.

@gdalle
Copy link
Author

gdalle commented Jan 13, 2024

I think your attempt and mine fix different parts of the problem?
Even once we have internal coherence in the types and coercion thanks to #1823, we will still need to ensure that sufficient statistics are also type-generic, as well as all the code in between. We also have to enforce the conversion at the end of fit, which #1560 does

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants