Extend `gradlogpdf` to MixtureModels #1827

rmsrosa · 2024-01-21T01:19:40Z

This extends gradlogpdf to MixtureModels, both univariate and multivariate, at least for those whose components have gradlogpdf implemented.

I haven't implemented the inplace and the component-wise methods yet, but this should be a good start.

I should say I am having trouble with the docs, thought. I have added docstrings and added the methods to the docs src, but they are not showing up. I am not sure what to do.

codecov-commenter · 2024-01-21T01:36:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c1705a3) 85.94% compared to head (53e0977) 86.16%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1827      +/-   ##
==========================================
+ Coverage   85.94%   86.16%   +0.22%     
==========================================
  Files         144      144              
  Lines        8658     8704      +46     
==========================================
+ Hits         7441     7500      +59     
+ Misses       1217     1204      -13

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

sethaxen · 2024-01-21T09:22:11Z

Relates #1788

rmsrosa · 2024-01-21T10:05:29Z

Relates #1788

Oh, I missed that. Sorry.

I thought about using pweights, but I was afraid it would allocate. I should benchmark both to make sure.

src/common.jl

src/mixtures/mixturemodel.jl

rmsrosa · 2024-01-23T15:35:18Z

But I feel this new version is not optimized. It essentially computes the PDF twice. I should unroll the loop.

src/mixtures/mixturemodel.jl

rmsrosa · 2024-01-27T12:58:52Z

But I am not sure about type stability. Even pdf itself is not always type stable, e.g.

julia> @code_warntype pdf(MixtureModel([Normal(1, 2), Beta(2, 3)], [6/10, 4/10]), 1.0)
MethodInstance for Distributions.pdf(::MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, ::Float64)
  from pdf(d::UnivariateMixture, x::Real) @ Distributions ~/.julia/packages/Distributions/UaWBm/src/mixtures/mixturemodel.jl:362
Arguments
  #self#::Core.Const(Distributions.pdf)
  d::MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}
  x::Float64
Body::Any
1 ─ %1 = Distributions._mixpdf1(d, x)::Any
└──      return %1

with

@code_warntype Distributions._mixpdf1(MixtureModel([Normal(1, 2), Beta(2, 3)], [6/10, 4/10]), 1.0)
MethodInstance for Distributions._mixpdf1(::MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, ::Float64)
  from _mixpdf1(d::AbstractMixtureModel, x) @ Distributions ~/.julia/packages/Distributions/UaWBm/src/mixtures/mixturemodel.jl:286
Arguments
  #self#::Core.Const(Distributions._mixpdf1)
  d::MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}
  x::Float64
Locals
  #607::Distributions.var"#607#609"
  #606::Distributions.var"#606#608"{MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, Float64}
  p::Vector{Float64}
Body::Any
1 ─       (p = Distributions.probs(d))
│   %2  = Distributions.:(var"#606#608")::Core.Const(Distributions.var"#606#608")
│   %3  = Core.typeof(d)::Core.Const(MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}})
│   %4  = Core.typeof(x)::Core.Const(Float64)
│   %5  = Core.apply_type(%2, %3, %4)::Core.Const(Distributions.var"#606#608"{MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, Float64})
│         (#606 = %new(%5, d, x))
│   %7  = #606::Distributions.var"#606#608"{MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, Float64}
│         (#607 = %new(Distributions.:(var"#607#609")))
│   %9  = #607::Core.Const(Distributions.var"#607#609"())
│   %10 = Distributions.enumerate(p)::Base.Iterators.Enumerate{Vector{Float64}}
│   %11 = Base.Filter(%9, %10)::Base.Iterators.Filter{Distributions.var"#607#609", Base.Iterators.Enumerate{Vector{Float64}}}
│   %12 = Base.Generator(%7, %11)::Base.Generator{Base.Iterators.Filter{Distributions.var"#607#609", Base.Iterators.Enumerate{Vector{Float64}}}, Distributions.var"#606#608"{MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, Float64}}
│   %13 = Distributions.sum(%12)::Any
└──       return %13

rmsrosa · 2024-01-28T13:24:38Z

Hmm, I thought I had added another comment on benchmark, but maybe the connection broke and it didn't go through. Here it goes again.

I also implemented another version with the for loop detaching the indices of both prior and components.

function gradlogpdf(d::UnivariateMixture, x::Real)
    ps = probs(d)
    cs = components(d)
    ps1 = first(ps)
    cs1 = first(cs)
    pdfx1 = pdf(cs1, x)
    pdfx = ps1 * pdfx1
    glp = pdfx * gradlogpdf(cs1, x)
    if iszero(ps1) || iszero(pdfx)
        glp = zero(glp)
    end
    @inbounds for (psi, csi) in Iterators.drop(zip(ps, cs), 1)
        if !iszero(psi)
            pdfxi = pdf(csi, x)
            if !iszero(pdfxi)
                pipdfxi = psi * pdfxi
                pdfx += pipdfxi
                glp += pipdfxi * gradlogpdf(csi, x)
            end
        end
    end
    if !iszero(pdfx) # else glp is already zero
        glp /= pdfx
    end 
    return glp
end

When mixing distributions of the same type, gradlogpdf with the while loop is a tad faster, but, when mixing different types of distributions, then pdf and logpdf are type unstable, making gradlogpdf also type unstable and in this case gradlogpdf with the for loop is faster. (Same thing for the multivariate versions of the functions). (I used @btime and @benchmark to make sure.) Here are some examples.

d1 = MixtureModel([Normal(1, 2), Normal(2, 3), Normal(1.5), Normal(1.0, 0.2)], [0.3, 0.2, 0.3, 0.2])

d2 = MixtureModel([Beta(1, 2), Beta(2, 3), Beta(4, 2)], [3/10, 4/10, 3/10])

d3 = MixtureModel([Normal(1, 2), Beta(2, 3), Exponential(3/2)], [3/10, 4/10, 3/10]) # type unstable

n = 10
m = 20
d4 = MixtureModel([MvNormal(rand(n), rand(n, n) |> A -> (A + A')^2) for _ in 1:m], rand(m) |> w -> w / sum(w))

d5 = MixtureModel([MvNormal([1.0, 2.0], [0.4 0.2; 0.2 0.5]), MvTDist(5., [1., 2.], [1. 0.1; 0.1 1.])], [0.4, 0.6]) # type unstable

We get

[ Info: d1: gradlogpdf with for loop
  75.841 ns (0 allocations: 0 bytes)
[ Info: d1: gradlogpdf with while loop
  73.626 ns (0 allocations: 0 bytes)

[ Info: d2: gradlogpdf with for loop
  295.552 ns (0 allocations: 0 bytes)
[ Info: d2: gradlogpdf with while loop
  296.826 ns (0 allocations: 0 bytes)

[ Info: d3: gradlogpdf with for loop
  686.398 ns (19 allocations: 304 bytes)
[ Info: d3: gradlogpdf with while loop
  974.368 ns (21 allocations: 368 bytes)

[ Info: d4: gradlogpdf with for loop
  33.258 μs (136 allocations: 16.62 KiB)
[ Info: d4: gradlogpdf with while loop
  33.233 μs (136 allocations: 16.62 KiB)

[ Info: d5: gradlogpdf with for loop
  3.282 μs (26 allocations: 1.42 KiB)
[ Info: d5: gradlogpdf with while loop
  3.471 μs (28 allocations: 1.53 KiB)

Ok, I think this is it. I will leave it with the while loop for your review.

devmotion

The performance difference is a bit surprising since for loops are rewritten using the iterate anyway (you can see this e.g. by inspecting the output of @code_lowered).

My gut feeling is also that one cannot expect too much performance wise in the type unstable case anyway. In this use-case it might be better to work with tuples (which IIRC currently is not supported).

src/mixtures/mixturemodel.jl

devmotion · 2024-01-28T19:56:17Z

src/mixtures/mixturemodel.jl

+    if !iszero(pdfx) # else glp is already zero
+        glp /= pdfx
+    end 


Is it correct to return a gradlogpdf of zero if x is not in the support of the mixture distribution?

I wondered about that, but decided to follow the current behavior already implemented. For example:

julia> insupport(Beta(0.5, 0.5), -1) false julia> logpdf(Beta(0.5, 0.5), -1) -Inf julia> gradlogpdf(Beta(0.5, 0.5), -1) 0.0

I don't know. If it is constant -Inf, then the derivative is zero (except that (-Inf) - (-Inf) is not defined, but what matters is that the rate of change is zero...)

Most implementations return zero outside the support:
https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/beta.jl#L126
https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/chi.jl#L99
https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/chisq.jl#L94
https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/exponential.jl#L97

and so on (Frechet, Gamma, ...), but some (Arcsin and Kumaraswamy) seem to just extend what happens at the edges of the support:
https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/arcsine.jl#L97
https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/kumaraswamy.jl#L91

while LogitNormal seems to be the only one that yields NaN:
https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/laplace.jl#L101

Meanwhile, Laplace seems to be the only one to error at a singular point:
https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/laplace.jl#L101

while Uniform, for example, is zero everywhere including the singular points.
https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/uniform.jl#L86

I think giving an error is a bit drastic, but yielding either zero or NaN (outside the support or at the singular points) seems reasonable to me.

Maybe we should make this a separate issue and hopefully standardize the behavior.

src/mixtures/mixturemodel.jl

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

rmsrosa · 2024-01-29T16:53:16Z

Concerning if iszero(psi) || iszero(pdfx), I meant it to be if iszero(psi) || iszero(pdfx1), but that is actually a delicate edge case.

I now think the case in which pdfx1 is zero should be handled implicitly by what the gradlogpdf of the component yields, because x might be in the exterior of the support or on the boundary, and in this case the behavior would depend on the component distribution itself, so I will change it to simply if iszero(psi).

However, if pdfx1 is zero and the Mixture is a single distribution (e.g. MixtureModel([Beta(0.5, 0.5), [1.0])) then this would end up begin different than gradlogpdf of the distribution itself.

I added some more tests for this edge case with single mixture that will fail. I marked the PR as draft so I can look into that more carefully.

rmsrosa added 6 commits January 20, 2024 12:59

gradlogpdf for mixtures

94cf6c0

make them work on array of multiple samples

e7dd8ef

use iszero instead of > 0

3930983

tests with a zero prior

65e3210

update docs

af7dde6

no docstring

6ed9476

devmotion reviewed Jan 21, 2024

View reviewed changes

src/common.jl Outdated Show resolved Hide resolved

src/mixtures/mixturemodel.jl Outdated Show resolved Hide resolved

rmsrosa added 2 commits January 23, 2024 12:51

remove implicit broadcasting of gradlogpdf

fe2f778

make gradlogpdf type stable... hopefully

56860a1

make gradlogpdf type stable

b50f5eb

devmotion reviewed Jan 26, 2024

View reviewed changes

src/mixtures/mixturemodel.jl Outdated Show resolved Hide resolved

use iterate

1d3c281

rmsrosa added 3 commits January 27, 2024 10:05

update docstring for gradlogpdf

7ccc7dc

increase coverage

2c42066

remove unnecessary test

c87d948

devmotion reviewed Jan 28, 2024

View reviewed changes

Apply suggestions from code review

53e0977

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

rmsrosa marked this pull request as draft January 29, 2024 11:08

add tests for single distribution "mixtures"

4d4af7e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend `gradlogpdf` to MixtureModels #1827

Extend `gradlogpdf` to MixtureModels #1827

rmsrosa commented Jan 21, 2024

codecov-commenter commented Jan 21, 2024 •

edited

sethaxen commented Jan 21, 2024

rmsrosa commented Jan 21, 2024

rmsrosa commented Jan 23, 2024

rmsrosa commented Jan 27, 2024

rmsrosa commented Jan 28, 2024

devmotion left a comment

devmotion Jan 28, 2024

rmsrosa Jan 29, 2024 •

edited

rmsrosa Jan 29, 2024

rmsrosa Jan 29, 2024

rmsrosa commented Jan 29, 2024

Extend gradlogpdf to MixtureModels #1827

Are you sure you want to change the base?

Extend gradlogpdf to MixtureModels #1827

Conversation

rmsrosa commented Jan 21, 2024

codecov-commenter commented Jan 21, 2024 • edited

Codecov Report

sethaxen commented Jan 21, 2024

rmsrosa commented Jan 21, 2024

rmsrosa commented Jan 23, 2024

rmsrosa commented Jan 27, 2024

rmsrosa commented Jan 28, 2024

devmotion left a comment

Choose a reason for hiding this comment

devmotion Jan 28, 2024

Choose a reason for hiding this comment

rmsrosa Jan 29, 2024 • edited

Choose a reason for hiding this comment

rmsrosa Jan 29, 2024

Choose a reason for hiding this comment

rmsrosa Jan 29, 2024

Choose a reason for hiding this comment

rmsrosa commented Jan 29, 2024

Extend `gradlogpdf` to MixtureModels #1827

Extend `gradlogpdf` to MixtureModels #1827

codecov-commenter commented Jan 21, 2024 •

edited

rmsrosa Jan 29, 2024 •

edited