Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend gradlogpdf to MixtureModels #1827

Draft
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

rmsrosa
Copy link

@rmsrosa rmsrosa commented Jan 21, 2024

This extends gradlogpdf to MixtureModels, both univariate and multivariate, at least for those whose components have gradlogpdf implemented.

I haven't implemented the inplace and the component-wise methods yet, but this should be a good start.

I should say I am having trouble with the docs, thought. I have added docstrings and added the methods to the docs src, but they are not showing up. I am not sure what to do.

@codecov-commenter
Copy link

codecov-commenter commented Jan 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c1705a3) 85.94% compared to head (53e0977) 86.16%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1827      +/-   ##
==========================================
+ Coverage   85.94%   86.16%   +0.22%     
==========================================
  Files         144      144              
  Lines        8658     8704      +46     
==========================================
+ Hits         7441     7500      +59     
+ Misses       1217     1204      -13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sethaxen
Copy link
Contributor

Relates #1788

@rmsrosa
Copy link
Author

rmsrosa commented Jan 21, 2024

Relates #1788

Oh, I missed that. Sorry.

I thought about using pweights, but I was afraid it would allocate. I should benchmark both to make sure.

src/common.jl Outdated Show resolved Hide resolved
src/mixtures/mixturemodel.jl Outdated Show resolved Hide resolved
@rmsrosa
Copy link
Author

rmsrosa commented Jan 23, 2024

But I feel this new version is not optimized. It essentially computes the PDF twice. I should unroll the loop.

@rmsrosa
Copy link
Author

rmsrosa commented Jan 27, 2024

But I am not sure about type stability. Even pdf itself is not always type stable, e.g.

julia> @code_warntype pdf(MixtureModel([Normal(1, 2), Beta(2, 3)], [6/10, 4/10]), 1.0)
MethodInstance for Distributions.pdf(::MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, ::Float64)
  from pdf(d::UnivariateMixture, x::Real) @ Distributions ~/.julia/packages/Distributions/UaWBm/src/mixtures/mixturemodel.jl:362
Arguments
  #self#::Core.Const(Distributions.pdf)
  d::MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}
  x::Float64
Body::Any
1%1 = Distributions._mixpdf1(d, x)::Any
└──      return %1

with

@code_warntype Distributions._mixpdf1(MixtureModel([Normal(1, 2), Beta(2, 3)], [6/10, 4/10]), 1.0)
MethodInstance for Distributions._mixpdf1(::MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, ::Float64)
  from _mixpdf1(d::AbstractMixtureModel, x) @ Distributions ~/.julia/packages/Distributions/UaWBm/src/mixtures/mixturemodel.jl:286
Arguments
  #self#::Core.Const(Distributions._mixpdf1)
  d::MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}
  x::Float64
Locals
  #607::Distributions.var"#607#609"
  #606::Distributions.var"#606#608"{MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, Float64}
  p::Vector{Float64}
Body::Any
1 ─       (p = Distributions.probs(d))
│   %2  = Distributions.:(var"#606#608")::Core.Const(Distributions.var"#606#608")
│   %3  = Core.typeof(d)::Core.Const(MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}})
│   %4  = Core.typeof(x)::Core.Const(Float64)
│   %5  = Core.apply_type(%2, %3, %4)::Core.Const(Distributions.var"#606#608"{MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, Float64})
│         (#606 = %new(%5, d, x))%7  = #606::Distributions.var"#606#608"{MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, Float64}
│         (#607 = %new(Distributions.:(var"#607#609")))%9  = #607::Core.Const(Distributions.var"#607#609"())%10 = Distributions.enumerate(p)::Base.Iterators.Enumerate{Vector{Float64}}%11 = Base.Filter(%9, %10)::Base.Iterators.Filter{Distributions.var"#607#609", Base.Iterators.Enumerate{Vector{Float64}}}%12 = Base.Generator(%7, %11)::Base.Generator{Base.Iterators.Filter{Distributions.var"#607#609", Base.Iterators.Enumerate{Vector{Float64}}}, Distributions.var"#606#608"{MixtureModel{Univariate, Continuous, Distribution{Univariate, Continuous}, Categorical{Float64, Vector{Float64}}}, Float64}}
│   %13 = Distributions.sum(%12)::Any
└──       return %13

@rmsrosa
Copy link
Author

rmsrosa commented Jan 28, 2024

Hmm, I thought I had added another comment on benchmark, but maybe the connection broke and it didn't go through. Here it goes again.

I also implemented another version with the for loop detaching the indices of both prior and components.

function gradlogpdf(d::UnivariateMixture, x::Real)
    ps = probs(d)
    cs = components(d)
    ps1 = first(ps)
    cs1 = first(cs)
    pdfx1 = pdf(cs1, x)
    pdfx = ps1 * pdfx1
    glp = pdfx * gradlogpdf(cs1, x)
    if iszero(ps1) || iszero(pdfx)
        glp = zero(glp)
    end
    @inbounds for (psi, csi) in Iterators.drop(zip(ps, cs), 1)
        if !iszero(psi)
            pdfxi = pdf(csi, x)
            if !iszero(pdfxi)
                pipdfxi = psi * pdfxi
                pdfx += pipdfxi
                glp += pipdfxi * gradlogpdf(csi, x)
            end
        end
    end
    if !iszero(pdfx) # else glp is already zero
        glp /= pdfx
    end 
    return glp
end

When mixing distributions of the same type, gradlogpdf with the while loop is a tad faster, but, when mixing different types of distributions, then pdf and logpdf are type unstable, making gradlogpdf also type unstable and in this case gradlogpdf with the for loop is faster. (Same thing for the multivariate versions of the functions). (I used @btime and @benchmark to make sure.) Here are some examples.

d1 = MixtureModel([Normal(1, 2), Normal(2, 3), Normal(1.5), Normal(1.0, 0.2)], [0.3, 0.2, 0.3, 0.2])

d2 = MixtureModel([Beta(1, 2), Beta(2, 3), Beta(4, 2)], [3/10, 4/10, 3/10])

d3 = MixtureModel([Normal(1, 2), Beta(2, 3), Exponential(3/2)], [3/10, 4/10, 3/10]) # type unstable

n = 10
m = 20
d4 = MixtureModel([MvNormal(rand(n), rand(n, n) |> A -> (A + A')^2) for _ in 1:m], rand(m) |> w -> w / sum(w))

d5 = MixtureModel([MvNormal([1.0, 2.0], [0.4 0.2; 0.2 0.5]), MvTDist(5., [1., 2.], [1. 0.1; 0.1 1.])], [0.4, 0.6]) # type unstable

We get

[ Info: d1: gradlogpdf with for loop
  75.841 ns (0 allocations: 0 bytes)
[ Info: d1: gradlogpdf with while loop
  73.626 ns (0 allocations: 0 bytes)

[ Info: d2: gradlogpdf with for loop
  295.552 ns (0 allocations: 0 bytes)
[ Info: d2: gradlogpdf with while loop
  296.826 ns (0 allocations: 0 bytes)

[ Info: d3: gradlogpdf with for loop
  686.398 ns (19 allocations: 304 bytes)
[ Info: d3: gradlogpdf with while loop
  974.368 ns (21 allocations: 368 bytes)

[ Info: d4: gradlogpdf with for loop
  33.258 μs (136 allocations: 16.62 KiB)
[ Info: d4: gradlogpdf with while loop
  33.233 μs (136 allocations: 16.62 KiB)

[ Info: d5: gradlogpdf with for loop
  3.282 μs (26 allocations: 1.42 KiB)
[ Info: d5: gradlogpdf with while loop
  3.471 μs (28 allocations: 1.53 KiB)

Ok, I think this is it. I will leave it with the while loop for your review.

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance difference is a bit surprising since for loops are rewritten using the iterate anyway (you can see this e.g. by inspecting the output of @code_lowered).

My gut feeling is also that one cannot expect too much performance wise in the type unstable case anyway. In this use-case it might be better to work with tuples (which IIRC currently is not supported).

src/mixtures/mixturemodel.jl Outdated Show resolved Hide resolved
src/mixtures/mixturemodel.jl Outdated Show resolved Hide resolved
src/mixtures/mixturemodel.jl Outdated Show resolved Hide resolved
src/mixtures/mixturemodel.jl Show resolved Hide resolved
Comment on lines +401 to +403
if !iszero(pdfx) # else glp is already zero
glp /= pdfx
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct to return a gradlogpdf of zero if x is not in the support of the mixture distribution?

Copy link
Author

@rmsrosa rmsrosa Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that, but decided to follow the current behavior already implemented. For example:

julia> insupport(Beta(0.5, 0.5), -1)
false

julia> logpdf(Beta(0.5, 0.5), -1)
-Inf

julia> gradlogpdf(Beta(0.5, 0.5), -1)
0.0

I don't know. If it is constant -Inf, then the derivative is zero (except that (-Inf) - (-Inf) is not defined, but what matters is that the rate of change is zero...)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should make this a separate issue and hopefully standardize the behavior.

src/mixtures/mixturemodel.jl Show resolved Hide resolved
src/mixtures/mixturemodel.jl Outdated Show resolved Hide resolved
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@rmsrosa rmsrosa marked this pull request as draft January 29, 2024 11:08
@rmsrosa
Copy link
Author

rmsrosa commented Jan 29, 2024

Concerning if iszero(psi) || iszero(pdfx), I meant it to be if iszero(psi) || iszero(pdfx1), but that is actually a delicate edge case.

I now think the case in which pdfx1 is zero should be handled implicitly by what the gradlogpdf of the component yields, because x might be in the exterior of the support or on the boundary, and in this case the behavior would depend on the component distribution itself, so I will change it to simply if iszero(psi).

However, if pdfx1 is zero and the Mixture is a single distribution (e.g. MixtureModel([Beta(0.5, 0.5), [1.0])) then this would end up begin different than gradlogpdf of the distribution itself.

I added some more tests for this edge case with single mixture that will fail. I marked the PR as draft so I can look into that more carefully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants