Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native implementations for distrs #37

Open
1 of 14 tasks
Nosferican opened this issue Mar 13, 2018 · 18 comments
Open
1 of 14 tasks

Native implementations for distrs #37

Nosferican opened this issue Mar 13, 2018 · 18 comments

Comments

@Nosferican
Copy link

Nosferican commented Mar 13, 2018

Making this issue to track the progress in implementing the code for distrs natively

@dmbates
Copy link
Contributor

dmbates commented Mar 14, 2018

In conversion from libRmath how should invalid parameter values be handled? The libRmath code returns NaN in most of these cases. Julia code could do the same or could throw an ArgumentError, which I think is the better choice.

@Nosferican
Copy link
Author

I am team @assert cond "Condition violated: cond".

@dmbates
Copy link
Contributor

dmbates commented Mar 14, 2018

For example, I would write

function poislogpdf::T, x::T) where {T <: Real}
    λ < 0 && throw(ArgumentError("λ =  should be non-negative"))
    isinteger(x) && 0  x || return T(-Inf)
    iszero(λ) && return iszero(x) ? zero(T) : T(-Inf)
    xlogy(x, λ) - λ - lgamma(x + 1)
end

Is the change in behavior justified?

@dmbates
Copy link
Contributor

dmbates commented Mar 14, 2018

@Nosferican I think of an assertion as stating a condition that must be true if the algorithm/code is functioning properly. Passing an out-of-bounds parameter value is, to me, pretty much the definition of an ArgumentError. It is not that a condition that should not occur has occurred. It is just an invalid value for the argument.

Also, a programmer may wish to catch the error and take appropriate action. I don't think you can catch an assertion.

@nalimilan
Copy link
Member

DomainError would be even more appropriate for these cases.

@dmbates
Copy link
Contributor

dmbates commented Mar 14, 2018

@nalimilan Yes, of course. Thank you.

@Nosferican
Copy link
Author

Nosferican commented Mar 14, 2018

function poispdf(λ::Real, k::Real)
    if λ < zero(λ)
        throw(DomainError("Invalid support: λ = $λ should be non-negative real number."))
    elseif !isinteger(k)
        return zero(λ)
    elseif iszero(λ)
        if iszero(k)
            return one(λ)
        else
            return zero(λ)
        end
    else
        return λ^k * exp(-λ) / factorial(k) # Replaced with refined implementation.
    end
end

However, doing a quick run, I noticed DomainError does not print the informative message :( at least in Julia 0.6. Is that the desired behavior? I believe that will change in Julia 0.7 through Julia PR 22751

@dmbates
Copy link
Contributor

dmbates commented Mar 14, 2018

@Nosferican I disagree with throwing a DomainError for non-integer x. To me there is not a problem with returning 0 (or T(-Inf) for poislogpdf) for all x except the non-negative integers.

I guess it is a matter of how closely you want to follow measure-theoretic probability and the idea that a probability mass function is a density with respect to counting measure. Because we tend to promote types I think of the pdf functions as being defined for all real x.

By the way, in general we should evaluate probabilities on the logarithmic scale first then, if desired, transform back to the probability or probability density scale. And you definitely don't want to call factorial(x) because it overflows for even moderate values of x

julia> factorial(21)
ERROR: OverflowError()
Stacktrace:
 [1] factorial_lookup(::Int64, ::Array{Int64,1}, ::Int64) at ./combinatorics.jl:31
 [2] factorial(::Int64) at ./combinatorics.jl:39

I think the general approach should be to use promote to collect arguments as the same type then evaluate logpdf then, if desired, exponentiate to pdf. Like

poispdf::Real, x::Real) = exp(poislogpdf(λ, x))

function poislogpdf::T, x::T) where {T <: Real}
    λ < 0 && throw(DomainError("λ =  must be non-negative"))
    isinteger(x) && 0  x || return T(-Inf)
    iszero(λ) && return iszero(x) ? zero(T) : T(-Inf)
    xlogy(x, λ) - λ - lgamma(x + 1)
end    

poislogpdf::Number, x::Number) = poislogpdf(promote(float(λ), x)...)

@Nosferican
Copy link
Author

Nosferican commented Mar 14, 2018

I agree. I was thinking of dispatching on whether the arguments were in the support or not à la what I believe Distributions.jl does (this could be handled with obtaining a DomainError which leads to zero). The factorial was just a dummy for sake of completion, the idea was to determine how to handle the invalid arguments. If promote leads to more efficient implementations and is feasible I am all for it. As for first computing the logpdf, again if it leads to better implementations I all for it as well.

@dmbates
Copy link
Contributor

dmbates commented Mar 14, 2018

@Nosferican I see that you are correct about DomainError not taking an explanatory string. In that case I think it may be better to use ArgumentErrror. @nalimilan what do you think? The options are

julia> poislogpdf(-1.f0, 1.f0)
ERROR: DomainError:
Stacktrace:
 [1] poislogpdf(::Float32, ::Float32) at /home/bates/.julia/v0.6/StatsFuns/src/distrs/pois.jl:19

and

julia> poislogpdf(-1.f0, 1.f0)
ERROR: ArgumentError: λ = -1.0 must be non-negative
Stacktrace:
 [1] poislogpdf(::Float32, ::Float32) at /home/bates/.julia/v0.6/StatsFuns/src/distrs/pois.jl:19

@nalimilan
Copy link
Member

That's fixed on 0.7 thanks to JuliaLang/julia#22751. But until then ArgumentError is reasonable too.

@Ellipse0934
Copy link

I'd also like to bring up the discussion on unit tests. Current tests will not hold up. We could at first compare our implementation to the Rmath one but eventually I think we need to write our own tests to get an understanding of the function accuracy and precision and inform users of the same.

@jonathanBieler
Copy link

Another thing to keep in mind when typing the function is that it would be nice for them to be compatible with ForwardDiff and such, which I think means T <: Real.

This seems too restrictive for example:

https://github.com/JuliaStats/StatsFuns.jl/pull/33/files#diff-eb7a8b837bc7832e705af52fca59cc7bR25

@dmbates
Copy link
Contributor

dmbates commented Mar 16, 2018

@Ellipse0934 Keep in mind that the code in src/distrs/* is temporary. At least I believe this is the case - @simonbyrne or @andreasnoack or @ararslan are more authoritative voices on this.

I think that once a complete set of libRmath-free "d-p-q-r" functions for a distribution is available, the implementation should be moved to the Distributions package the the distr*pdf and distr*cdf functions in this package deprecated. See #42 for example.

Thus extensive unit tests on coverage and accuracy should be part of the Distributions tests. Another type of test, checking the native Julia implementations versus libRmath implementations could, I think, go into a separate package that requires Rmath and Distributions.

Testing a native implementation versus a libRmath implementation in StatsFuns is awkward because the libRmath implementation has a rather general method signature. They tend to be a bit greedy, with unintended consequences. For example, one of the best ways of checking accuracy on, say, a Float64 method is to use a generic method with BigFloat then convert the result to Float64. Except this doesn't work in many cases because the libRmath version grabs the BigFloat argument and converts it to Float64 before you even get started. The tests/generic.jl file uses ForwardDiff.Dual to get around this but that approach has its own problems. For one thing, it introduces a spurious test dependency on the ForwardDiff package.

@bdeonovic
Copy link

Any updates on this?

Currently the fallback implementation of beta pdf has different behavior than Rmath (returning NaN instead of -Inf for out of domain values) which causes frustrations in some of my code.

There are commits in the promotions branch which address this and fix it, but I'm worried its just going to sit in this branch.

@andreasnoack
Copy link
Member

With #113 we are now pretty far here. What remains is mostly the [x]invlog(c)cdf definitions. We'd probably need versions of the inverse log regularized incomplete gamma and beta functions. Maybe they belong in SpecialFunctions.

@simsurace
Copy link

simsurace commented Sep 7, 2023

Could someone post an updated list?
EDIT: I'll do it

@simsurace
Copy link

  • beta 8/10 (Didonato and Morris 1992)
  • binom 6/10
  • chisq
  • fdist
  • gamma 8/10
  • hyper 0/10
  • nbeta 0/10
  • nbinom 0/10
  • nchisq 0/10
  • nfdist 0/10
  • norm
  • ntdist 0/10
  • pois 6/10
  • srdist 0/8
  • tdist (via fdist, ndist)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants