Native implementations for distrs #37

Nosferican · 2018-03-13T21:21:07Z

dmbates · 2018-03-14T18:11:05Z

In conversion from libRmath how should invalid parameter values be handled? The libRmath code returns NaN in most of these cases. Julia code could do the same or could throw an ArgumentError, which I think is the better choice.

Nosferican · 2018-03-14T18:12:14Z

I am team @assert cond "Condition violated: cond".

dmbates · 2018-03-14T18:19:58Z

For example, I would write

function poislogpdf(λ::T, x::T) where {T <: Real}
    λ < 0 && throw(ArgumentError("λ = $λ should be non-negative"))
    isinteger(x) && 0 ≤ x || return T(-Inf)
    iszero(λ) && return iszero(x) ? zero(T) : T(-Inf)
    xlogy(x, λ) - λ - lgamma(x + 1)
end

Is the change in behavior justified?

dmbates · 2018-03-14T18:26:43Z

@Nosferican I think of an assertion as stating a condition that must be true if the algorithm/code is functioning properly. Passing an out-of-bounds parameter value is, to me, pretty much the definition of an ArgumentError. It is not that a condition that should not occur has occurred. It is just an invalid value for the argument.

Also, a programmer may wish to catch the error and take appropriate action. I don't think you can catch an assertion.

nalimilan · 2018-03-14T18:36:37Z

DomainError would be even more appropriate for these cases.

dmbates · 2018-03-14T18:37:43Z

@nalimilan Yes, of course. Thank you.

Nosferican · 2018-03-14T18:46:55Z

function poispdf(λ::Real, k::Real)
    if λ < zero(λ)
        throw(DomainError("Invalid support: λ = $λ should be non-negative real number."))
    elseif !isinteger(k)
        return zero(λ)
    elseif iszero(λ)
        if iszero(k)
            return one(λ)
        else
            return zero(λ)
        end
    else
        return λ^k * exp(-λ) / factorial(k) # Replaced with refined implementation.
    end
end

However, doing a quick run, I noticed DomainError does not print the informative message :( at least in Julia 0.6. Is that the desired behavior? I believe that will change in Julia 0.7 through Julia PR 22751

dmbates · 2018-03-14T19:05:05Z

@Nosferican I disagree with throwing a DomainError for non-integer x. To me there is not a problem with returning 0 (or T(-Inf) for poislogpdf) for all x except the non-negative integers.

I guess it is a matter of how closely you want to follow measure-theoretic probability and the idea that a probability mass function is a density with respect to counting measure. Because we tend to promote types I think of the pdf functions as being defined for all real x.

By the way, in general we should evaluate probabilities on the logarithmic scale first then, if desired, transform back to the probability or probability density scale. And you definitely don't want to call factorial(x) because it overflows for even moderate values of x

julia> factorial(21)
ERROR: OverflowError()
Stacktrace:
 [1] factorial_lookup(::Int64, ::Array{Int64,1}, ::Int64) at ./combinatorics.jl:31
 [2] factorial(::Int64) at ./combinatorics.jl:39

I think the general approach should be to use promote to collect arguments as the same type then evaluate logpdf then, if desired, exponentiate to pdf. Like

poispdf(λ::Real, x::Real) = exp(poislogpdf(λ, x))

function poislogpdf(λ::T, x::T) where {T <: Real}
    λ < 0 && throw(DomainError("λ = $λ must be non-negative"))
    isinteger(x) && 0 ≤ x || return T(-Inf)
    iszero(λ) && return iszero(x) ? zero(T) : T(-Inf)
    xlogy(x, λ) - λ - lgamma(x + 1)
end    

poislogpdf(λ::Number, x::Number) = poislogpdf(promote(float(λ), x)...)

Nosferican · 2018-03-14T19:10:47Z

I agree. I was thinking of dispatching on whether the arguments were in the support or not à la what I believe Distributions.jl does (this could be handled with obtaining a DomainError which leads to zero). The factorial was just a dummy for sake of completion, the idea was to determine how to handle the invalid arguments. If promote leads to more efficient implementations and is feasible I am all for it. As for first computing the logpdf, again if it leads to better implementations I all for it as well.

dmbates · 2018-03-14T19:14:31Z

@Nosferican I see that you are correct about DomainError not taking an explanatory string. In that case I think it may be better to use ArgumentErrror. @nalimilan what do you think? The options are

julia> poislogpdf(-1.f0, 1.f0)
ERROR: DomainError:
Stacktrace:
 [1] poislogpdf(::Float32, ::Float32) at /home/bates/.julia/v0.6/StatsFuns/src/distrs/pois.jl:19

and

julia> poislogpdf(-1.f0, 1.f0)
ERROR: ArgumentError: λ = -1.0 must be non-negative
Stacktrace:
 [1] poislogpdf(::Float32, ::Float32) at /home/bates/.julia/v0.6/StatsFuns/src/distrs/pois.jl:19

nalimilan · 2018-03-14T20:23:05Z

That's fixed on 0.7 thanks to JuliaLang/julia#22751. But until then ArgumentError is reasonable too.

Ellipse0934 · 2018-03-16T14:00:54Z

I'd also like to bring up the discussion on unit tests. Current tests will not hold up. We could at first compare our implementation to the Rmath one but eventually I think we need to write our own tests to get an understanding of the function accuracy and precision and inform users of the same.

jonathanBieler · 2018-03-16T15:24:44Z

Another thing to keep in mind when typing the function is that it would be nice for them to be compatible with ForwardDiff and such, which I think means T <: Real.

This seems too restrictive for example:

https://github.com/JuliaStats/StatsFuns.jl/pull/33/files#diff-eb7a8b837bc7832e705af52fca59cc7bR25

dmbates · 2018-03-16T15:47:38Z

@Ellipse0934 Keep in mind that the code in src/distrs/* is temporary. At least I believe this is the case - @simonbyrne or @andreasnoack or @ararslan are more authoritative voices on this.

I think that once a complete set of libRmath-free "d-p-q-r" functions for a distribution is available, the implementation should be moved to the Distributions package the the distr*pdf and distr*cdf functions in this package deprecated. See #42 for example.

Thus extensive unit tests on coverage and accuracy should be part of the Distributions tests. Another type of test, checking the native Julia implementations versus libRmath implementations could, I think, go into a separate package that requires Rmath and Distributions.

Testing a native implementation versus a libRmath implementation in StatsFuns is awkward because the libRmath implementation has a rather general method signature. They tend to be a bit greedy, with unintended consequences. For example, one of the best ways of checking accuracy on, say, a Float64 method is to use a generic method with BigFloat then convert the result to Float64. Except this doesn't work in many cases because the libRmath version grabs the BigFloat argument and converts it to Float64 before you even get started. The tests/generic.jl file uses ForwardDiff.Dual to get around this but that approach has its own problems. For one thing, it introduces a spurious test dependency on the ForwardDiff package.

bdeonovic · 2018-05-14T15:15:22Z

Any updates on this?

Currently the fallback implementation of beta pdf has different behavior than Rmath (returning NaN instead of -Inf for out of domain values) which causes frustrations in some of my code.

There are commits in the promotions branch which address this and fix it, but I'm worried its just going to sit in this branch.

andreasnoack · 2022-02-18T12:33:33Z

With #113 we are now pretty far here. What remains is mostly the [x]invlog(c)cdf definitions. We'd probably need versions of the inverse log regularized incomplete gamma and beta functions. Maybe they belong in SpecialFunctions.

simsurace · 2023-09-07T14:55:08Z

Could someone post an updated list?
EDIT: I'll do it

simsurace · 2023-09-07T15:07:17Z

dmbates mentioned this issue Mar 16, 2018

Import implementations of norm*pdf and norm*cdf from StatsFuns. JuliaStats/Distributions.jl#708

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Native implementations for distrs #37

Native implementations for distrs #37

Nosferican commented Mar 13, 2018 •

edited

dmbates commented Mar 14, 2018

Nosferican commented Mar 14, 2018

dmbates commented Mar 14, 2018

dmbates commented Mar 14, 2018 •

edited

nalimilan commented Mar 14, 2018

dmbates commented Mar 14, 2018

Nosferican commented Mar 14, 2018 •

edited

dmbates commented Mar 14, 2018

Nosferican commented Mar 14, 2018 •

edited

dmbates commented Mar 14, 2018

nalimilan commented Mar 14, 2018

Ellipse0934 commented Mar 16, 2018

jonathanBieler commented Mar 16, 2018

dmbates commented Mar 16, 2018

bdeonovic commented May 14, 2018

andreasnoack commented Feb 18, 2022

simsurace commented Sep 7, 2023 •

edited

simsurace commented Sep 7, 2023

Native implementations for distrs #37

Native implementations for distrs #37

Comments

Nosferican commented Mar 13, 2018 • edited

dmbates commented Mar 14, 2018

Nosferican commented Mar 14, 2018

dmbates commented Mar 14, 2018

dmbates commented Mar 14, 2018 • edited

nalimilan commented Mar 14, 2018

dmbates commented Mar 14, 2018

Nosferican commented Mar 14, 2018 • edited

dmbates commented Mar 14, 2018

Nosferican commented Mar 14, 2018 • edited

dmbates commented Mar 14, 2018

nalimilan commented Mar 14, 2018

Ellipse0934 commented Mar 16, 2018

jonathanBieler commented Mar 16, 2018

dmbates commented Mar 16, 2018

bdeonovic commented May 14, 2018

andreasnoack commented Feb 18, 2022

simsurace commented Sep 7, 2023 • edited

simsurace commented Sep 7, 2023

Nosferican commented Mar 13, 2018 •

edited

dmbates commented Mar 14, 2018 •

edited

Nosferican commented Mar 14, 2018 •

edited

Nosferican commented Mar 14, 2018 •

edited

simsurace commented Sep 7, 2023 •

edited