-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Wrapped distribution wrapper #1724
base: master
Are you sure you want to change the base?
Conversation
This version of |
Codecov ReportPatch coverage has no change and project coverage change:
Additional details and impacted files@@ Coverage Diff @@
## master #1724 +/- ##
==========================================
- Coverage 85.89% 80.76% -5.14%
==========================================
Files 139 144 +5
Lines 8389 7122 -1267
==========================================
- Hits 7206 5752 -1454
- Misses 1183 1370 +187
☔ View full report in Codecov by Sentry. |
Also #1665 adds a |
Before I spend more time polishing this and adding a test suite, a few questions:
|
I should preface this by saying that I know almost nothing about wrapped distributions, so any opinions expressed here are not strongly held and may be uninformed.
I don't see why not. ¯\_(ツ)_/¯ I assume that's something that's not uncommon in practice and/or would be difficult to achieve with a simple data transformation or similar?
I don't see why not. ¯\_(ツ)_/¯ Same general question though.
Defining methods for the
Not all distributions have
Could |
l::T, | ||
u::T; | ||
tol::Real=sqrt(eps(float(T))), | ||
characteristic::Bool=_wrapped_use_characteristic(d, l, u, tol), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an odd thing to store in a field and expose to the user. Naively, it seems like an implementation detail that an average user would not need to necessarily care about or modify. I would expect that whether the characteristic function is used would be up to methods defined for particular D
s in Wrapped{D}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this design seems non-ideal. But this field isn't so much for the user as it is for a developer who has some some analysis of Wrapped{D}
for some specific D
. In short, the two series representations of pdf have different convergence properties depending on D
and its parameters. The one that sums the shifted pdf, etc of D
converges quickly when the tails of the distribution quickly decay. The one using the characteristic function decays quickly when D
is not very peaked. These conditions are often opposites.
Ideally, this would be automatically chosen for every distribution based on its parameters, but this requires someone to sit down and work out bounds for the number of terms needed for each series representation. Having a field allows this to be done at construction time instead of every time pdf
is called. See e.g. Wrapped{Normal}
, where we use known bounds to switch between them.
For cases where bounds are not known but the user knows one converges faster than the other for their specific D
and its parameters, the field can be useful, but this is not the primary use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes sense, thanks. Would it be reasonable then, do you think, to not document the field and remove it altogether once someone has gone through to work out the numbers of terms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the question. There is no one number of terms. For a given wrapped distribution dist
with values of params
, one or both of the series will have fewer terms. But if you change any of the params
values or change the dist
, then the other series might now have fewer terms. The difference can be quite extreme. For wrapped normal, for example, if σ=1e+3
, the series using cf
needs only 1 term, while the series using pdf needs >2,000 terms. But for σ=1e-3
, the series using cf
needs >6,000 terms, while the one using pdf
needs 1 term. So we need to choose whether to use the pdf
series or cf
series. Options:
- always use one or the other. This is bad for the reasons noted above
- store a field like we do here. The user can select which strategy they want to use, or a developer can overloaded
wrapped(::SomeDistribution, ...)
if they have a heuristic for choosing the strategy - Instead of storing a field, the strategy coul be picked at every evaluation of
pdf
,cdf
, etc. Likely wasteful - Instead of choosing, every function evaluates both series in parallel until one of them converges. Potentially runs into issues when
cf
is not implemented, and does twice as much works as might be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the question. There is no one number of terms.
Sorry, I was referring to what you had said in the preceding message:
this requires someone to sit down and work out bounds for the number of terms needed for each series representation.
_wrapped_use_characteristic(d::UnivariateDistribution, l, u, tol) = false | ||
|
||
minimum(d::Wrapped) = d.lower | ||
maximum(d::Wrapped) = d.upper - eps(float(d.upper)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Wrapped{D,S,Int}
, this will produce different types for minimum
and maximum
, which seems weird to me. In other cases we use the limit, e.g. when it's unbounded, which I think (again naively) would be acceptable here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a wrapped distribution, d.lower
and d.upper
are identified as the same point, so for a discrete distribution, including them both will always give the wrong answer if d.upper
is in the support of the wrapped distribution. Ideally maximum(d)
would return the largest point strictly less than d.upper
, but AFAICT there is no function for this in the Distributions API.
See for example the wrapped Poisson example in #1716 (comment). Because 0 and 15 are the same point, the PMF sums to greater than 1 because maximum(d)
returned d.upper
.
A compromise could be returning d.upper
for continuous distributions and this result for discrete ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally
maximum(d)
would return the largest point strictly less thand.upper
Distributions defines the support of a DiscreteUnivariateDistribution
to be round(Int, minimum(d)):round(Int, maximum(d))
, which means that it doesn't support supports (🙃) with non-unit steps. That implies that the largest point that's strictly less than d.upper
is ceil(Int, d.upper) - 1
. Would that work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Distributions defines the support of a
DiscreteUnivariateDistribution
to beround(Int, minimum(d)):round(Int, maximum(d))
, which means that it doesn't support supports (upside_down_face) with non-unit steps.
Oh really? But this doesn't seem to be consistently enforced, e.g.
julia> d = Binomial(10, 0.5) * 1//2;
julia> support(d)
0//1:1//2:5//1
julia> d = DiscreteNonParametric(randn(5), normalize(rand(5), 1));
julia> support(d)
5-element Vector{Float64}:
-0.9108476704164078
-0.5194957770100691
-0.024410217492521903
0.15370458561285605
1.338024968131234
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, amazing. 🤦 I was going based on the generic definition for DiscreteUnivariateDistribution
, I had forgotten about LocationScale
and I don't think I even knew DiscreteNonParametric
existed. To be fair, I did say I was uninformed!
It's certainly useful. e.g. one might want to use
I did some searching, and the only mention I can find of k-times wrapping is in Directional Statistics by Mardia and Jupp, where they give no references for papers that use it. I suspect the most likely version to use besides
Yes I think this is a reasonable solution. For bounds, I propose the fallback If we want |
Adds a
Wrapped
distribution that wraps an original distribution around some interval. Optionally a parameterk
is used to indicate that it should be multiply-wrapped, i.e. that the resulting wrapped distribution should have ak
-fold periodicity in the interval.Fixes #1716 and #1715