Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Wrapped distribution wrapper #1724

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from
Draft

Conversation

sethaxen
Copy link
Contributor

@sethaxen sethaxen commented May 18, 2023

Adds a Wrapped distribution that wraps an original distribution around some interval. Optionally a parameter k is used to indicate that it should be multiply-wrapped, i.e. that the resulting wrapped distribution should have a k-fold periodicity in the interval.

Fixes #1716 and #1715

@sethaxen
Copy link
Contributor Author

This version of Wrapped allows for any range and periodicity to be specified as fields, but it comes with the trade-off that for distributions like Wrapped Cauchy, where we have algorithms to fit it well, we're not able to because the usual fit(::Type{<:Distribution}, x) interface doesn't allow specifying the upper and lower bounds or k. It would be nice if there was an alternate interface for wrapper distributions like these.

@codecov-commenter
Copy link

Codecov Report

Patch coverage has no change and project coverage change: -5.14 ⚠️

Comparison is base (ef42afb) 85.89% compared to head (3da8e84) 80.76%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1724      +/-   ##
==========================================
- Coverage   85.89%   80.76%   -5.14%     
==========================================
  Files         139      144       +5     
  Lines        8389     7122    -1267     
==========================================
- Hits         7206     5752    -1454     
- Misses       1183     1370     +187     
Impacted Files Coverage Δ
src/Distributions.jl 100.00% <ø> (ø)
src/wrapped.jl 0.00% <0.00%> (ø)
src/wrapped/cauchy.jl 0.00% <0.00%> (ø)
src/wrapped/exponential.jl 0.00% <0.00%> (ø)
src/wrapped/normal.jl 0.00% <0.00%> (ø)

... and 131 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@sethaxen
Copy link
Contributor Author

Also #1665 adds a WrappedCauchy distribution. This implementation is a little more general because it also allows for k-wrapping and allows other intervals than those of length 2pi to be used.

@sethaxen
Copy link
Contributor Author

Before I spend more time polishing this and adding a test suite, a few questions:

  • Is the generalization from intervals of length 2π to arbitrary intervals welcome?
  • Is the generalization to build distributions with k-fold symmetry welcome?
  • For the Wrapped Normal, Wrapped Cauchy, and Wrapped Exponential, is this approach of overloading the functions for Wrapped preferred, or is it preferred to defined special named distributions as in Wrapped cauchy distribution #1665?
  • How to define fit methods? (see Add Wrapped distribution wrapper #1724 (comment))

@devmotion @ararslan

@ararslan
Copy link
Member

I should preface this by saying that I know almost nothing about wrapped distributions, so any opinions expressed here are not strongly held and may be uninformed.

Is the generalization from intervals of length 2π to arbitrary intervals welcome?

I don't see why not. ¯\_(ツ)_/¯ I assume that's something that's not uncommon in practice and/or would be difficult to achieve with a simple data transformation or similar?

Is the generalization to build distributions with k-fold symmetry welcome?

I don't see why not. ¯\_(ツ)_/¯ Same general question though.

For the Wrapped Normal, Wrapped Cauchy, and Wrapped Exponential, is this approach of overloading the functions for Wrapped preferred, or is it preferred to defined special named distributions as in #1665?

Defining methods for the Wrapped wrapper has prior art with e.g. Truncated. In fact, there used to be a TruncatedNormal that was deprecated in favor of Truncated{Normal}. So I think your approach here is more consistent and extensible.

How to define fit methods?

Not all distributions have fit (or rather, fit_mle) methods, notably including Truncated-wrapped distributions, so I personally think it would be okay to punt on a decision for now and add it at a later time after some more extended design discussion.

This version of Wrapped allows for any range and periodicity to be specified as fields, but it comes with the trade-off that for distributions like Wrapped Cauchy, where we have algorithms to fit it well, we're not able to because the usual fit(::Type{<:Distribution}, x) interface doesn't allow specifying the upper and lower bounds or k.

Could k be made a type parameter, similar to the dimensionality parameter for Array? Then the fit method could be something like fit(Wrapped{Cauchy,1}, x). That doesn't solve the bounds though.

l::T,
u::T;
tol::Real=sqrt(eps(float(T))),
characteristic::Bool=_wrapped_use_characteristic(d, l, u, tol),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an odd thing to store in a field and expose to the user. Naively, it seems like an implementation detail that an average user would not need to necessarily care about or modify. I would expect that whether the characteristic function is used would be up to methods defined for particular Ds in Wrapped{D}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this design seems non-ideal. But this field isn't so much for the user as it is for a developer who has some some analysis of Wrapped{D} for some specific D. In short, the two series representations of pdf have different convergence properties depending on D and its parameters. The one that sums the shifted pdf, etc of D converges quickly when the tails of the distribution quickly decay. The one using the characteristic function decays quickly when D is not very peaked. These conditions are often opposites.

Ideally, this would be automatically chosen for every distribution based on its parameters, but this requires someone to sit down and work out bounds for the number of terms needed for each series representation. Having a field allows this to be done at construction time instead of every time pdf is called. See e.g. Wrapped{Normal}, where we use known bounds to switch between them.

For cases where bounds are not known but the user knows one converges faster than the other for their specific D and its parameters, the field can be useful, but this is not the primary use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that makes sense, thanks. Would it be reasonable then, do you think, to not document the field and remove it altogether once someone has gone through to work out the numbers of terms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the question. There is no one number of terms. For a given wrapped distribution dist with values of params, one or both of the series will have fewer terms. But if you change any of the params values or change the dist, then the other series might now have fewer terms. The difference can be quite extreme. For wrapped normal, for example, if σ=1e+3, the series using cf needs only 1 term, while the series using pdf needs >2,000 terms. But for σ=1e-3, the series using cf needs >6,000 terms, while the one using pdf needs 1 term. So we need to choose whether to use the pdf series or cf series. Options:

  • always use one or the other. This is bad for the reasons noted above
  • store a field like we do here. The user can select which strategy they want to use, or a developer can overloaded wrapped(::SomeDistribution, ...) if they have a heuristic for choosing the strategy
  • Instead of storing a field, the strategy coul be picked at every evaluation of pdf, cdf, etc. Likely wasteful
  • Instead of choosing, every function evaluates both series in parallel until one of them converges. Potentially runs into issues when cf is not implemented, and does twice as much works as might be necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the question. There is no one number of terms.

Sorry, I was referring to what you had said in the preceding message:

this requires someone to sit down and work out bounds for the number of terms needed for each series representation.

_wrapped_use_characteristic(d::UnivariateDistribution, l, u, tol) = false

minimum(d::Wrapped) = d.lower
maximum(d::Wrapped) = d.upper - eps(float(d.upper))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Wrapped{D,S,Int}, this will produce different types for minimum and maximum, which seems weird to me. In other cases we use the limit, e.g. when it's unbounded, which I think (again naively) would be acceptable here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a wrapped distribution, d.lower and d.upper are identified as the same point, so for a discrete distribution, including them both will always give the wrong answer if d.upper is in the support of the wrapped distribution. Ideally maximum(d) would return the largest point strictly less than d.upper, but AFAICT there is no function for this in the Distributions API.

See for example the wrapped Poisson example in #1716 (comment). Because 0 and 15 are the same point, the PMF sums to greater than 1 because maximum(d) returned d.upper.

A compromise could be returning d.upper for continuous distributions and this result for discrete ones.

Copy link
Member

@ararslan ararslan May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally maximum(d) would return the largest point strictly less than d.upper

Distributions defines the support of a DiscreteUnivariateDistribution to be round(Int, minimum(d)):round(Int, maximum(d)), which means that it doesn't support supports (🙃) with non-unit steps. That implies that the largest point that's strictly less than d.upper is ceil(Int, d.upper) - 1. Would that work here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distributions defines the support of a DiscreteUnivariateDistribution to be round(Int, minimum(d)):round(Int, maximum(d)), which means that it doesn't support supports (upside_down_face) with non-unit steps.

Oh really? But this doesn't seem to be consistently enforced, e.g.

julia> d = Binomial(10, 0.5) * 1//2;

julia> support(d)
0//1:1//2:5//1

julia> d = DiscreteNonParametric(randn(5), normalize(rand(5), 1));

julia> support(d)
5-element Vector{Float64}:
 -0.9108476704164078
 -0.5194957770100691
 -0.024410217492521903
  0.15370458561285605
  1.338024968131234

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, amazing. 🤦 I was going based on the generic definition for DiscreteUnivariateDistribution, I had forgotten about LocationScale and I don't think I even knew DiscreteNonParametric existed. To be fair, I did say I was uninformed!

@sethaxen
Copy link
Contributor Author

Is the generalization from intervals of length 2π to arbitrary intervals welcome?

I don't see why not. ¯_(ツ)_/¯ I assume that's something that's not uncommon in practice and/or would be difficult to achieve with a simple data transformation or similar?

It's certainly useful. e.g. one might want to use [0, 2π) or [-π, π), or one might want to use degrees or days of the year. To support discrete distributions like wrapped Poisson it becomes necessary.

Is the generalization to build distributions with k-fold symmetry welcome?

I don't see why not. ¯_(ツ)_/¯ Same general question though.

I did some searching, and the only mention I can find of k-times wrapping is in Directional Statistics by Mardia and Jupp, where they give no references for papers that use it. I suspect the most likely version to use besides k=1 is k=2, which turns any circular distribution into an axial one.

Could k be made a type parameter, similar to the dimensionality parameter for Array? Then the fit method could be something like fit(Wrapped{Cauchy,1}, x). That doesn't solve the bounds though.

Yes I think this is a reasonable solution. For bounds, I propose the fallback wrapped(d::ContinuousDistribution) = wrapped(d, -π, π) (taking care to not promote types unnecessarily; too bad there's no irrational). Then if one has data with a different period than 2π, they can scale it before fitting. Not perfect, but works.

If we want Wrapped{<:Cauchy} and Wrapped{<:Normal} to be like VonMises, where the support is [μ-π, μ+π), we can change the default e.g. wrapped(d::Normal) = wrapped(d, d.μ-π, d.μ+π)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generically supporting wrapped distributions
3 participants