HDIs give unrealistic values #2207
I think the solution to this is not doing any interpolation along the "x" dimension. If we do any smoothing, it should happen when computing the hdi itself, in a similar way to how quantiles are computed (this is the discussion in the other issue you linked; the main con is that it is not clear how to go about this for hdi, at least to me). I would support not smoothing by default.

Also always in favour of adding better docs. The whole smoothing procedure, along with any potential warnings, should ideally be documented in the notes section. See https://numpydoc.readthedocs.io/en/latest/format.html#notes for reference on the documentation style we aim to follow. Very happy to provide support if you can help with that.

I am not sure it is possible to compute things in the log-odds domain and then transform these back. Computing quantiles gives the same result before and after a monotonic transformation, but even for monotonic transformations, the mean of the transformed samples is not the same as the transformation of the mean, and the same happens for the hdi.
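The invariance point above can be checked numerically. A minimal sketch, assuming NumPy and SciPy are available: quantiles commute with a monotonic map like the logit (up to tiny interpolation error), while the mean does not.

```python
import numpy as np
from scipy.special import expit, logit  # inverse-logit and logit

rng = np.random.default_rng(0)
p = rng.beta(2, 5, size=10_000)  # samples on (0, 1)

# Quantiles commute with a monotonic transform (up to interpolation error):
q_direct = np.quantile(p, 0.9)
q_via_logit = expit(np.quantile(logit(p), 0.9))
# q_direct and q_via_logit agree to several decimal places

# The mean does not commute:
m_direct = p.mean()
m_via_logit = expit(logit(p).mean())
# m_direct and m_via_logit differ noticeably
print(q_direct, q_via_logit, m_direct, m_via_logit)
```

The same asymmetry is why an equal-tailed interval can be computed on either scale, but an HDI cannot.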
(Sorry @OriolAbril, I missed this when you replied originally.)
Here is an example to try and illustrate the different interpolations:

```python
rng = np.random.default_rng(42)
a = rng.normal([0, 0, 0, 1, 1, 1, 1, 1, 1], size=(10, 9))[:, [0, 6, 7]]
# a now has shape (10, 3)
```

We will take the first dimension to be equivalent to the draw/sample one; the 2nd one is the x dimension, with x taking fixed values.

When we use numpy's quantile function, we get values that are not exactly any of the values in our array. It interpolates between the two closest values and returns a value between them. It is therefore not possible for quantile to return values larger than the max in the array.

```python
np.quantile(a, 0.9, axis=0)
# array([0.61544513, 1.62616018, 1.49426994])
```

Here are the two largest values in each column for comparison:

```python
np.sort(a, axis=0)[-2:, :]
# array([[0.58622233, 1.62559039, 1.48074666],
#        [0.8784503 , 1.63128823, 1.61597942]])
```

HDI, on the other hand, returns exactly a value from the array:

```python
az.hdi(a[None, ...])[:, 1]
# array([0.8784503 , 1.63128823, 1.61597942])
```

This makes HDI less "stable" and leads to the "wigglier" plot shown in #2168 (comment). To try and fix this a bit, plot_hdi smooths/interpolates along x. However, this interpolation is radically different from the one in quantile. Here we take the values returned by HDI and their corresponding x values and find a polynomial function that goes through those points. With 3 points we can make an exact quadratic interpolation (this is not the method used by plot_hdi, but it is illustrative). It can clearly be seen that the resulting curve goes over the maximum values present in the array, which is something I think we don't want to happen, and it can break model constraints.

Referencing #2168 (comment) again, the 2nd plot looks much smoother, but there is no interpolation along the x axis. If we compute the quantiles column-wise we'll get the exact same values, and they will always preserve the constraints imposed by the samples. This is why I was advocating for a similar interpolation scheme in hdi, so we can get smoother-looking results without the need for interpolation/smoothing along the x dimension.
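The overshoot can be reproduced with the HDI upper bounds quoted in this thread. A sketch, assuming x takes the values 0, 1, 2 (the actual x values were not preserved above):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])  # assumed x values for the 3 columns
upper = np.array([0.8784503, 1.63128823, 1.61597942])  # HDI upper bounds from above

# Exact quadratic through the three (x, upper) points
coeffs = np.polyfit(x, upper, deg=2)
xs = np.linspace(0, 2, 201)
curve = np.polyval(coeffs, xs)

print(curve.max())  # ~1.72, above the largest sample value 1.6313
```

The quadratic peaks around x = 1.48 at roughly 1.72, i.e. above any value present in the samples, which is exactly the constraint violation described.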
Yes, I see your point. Just a clarification: in #2168 I question whether using the standard t-distribution approximation is preferable to using percentile CIs. Which of the two would you consider more appropriate? (Or neither?)
Describe the bug
HDI smoothed intervals can result in impossible values. Furthermore, the smoothness parameter can have an unexpectedly large effect.
To Reproduce
Bambi code (somewhat irrelevant for the issue)
Plotting the bands now. Default behaviour in red.
Expected behavior
Not have the HDI upper limit go above 1.0 when using smoothing. Maybe do not smooth by default?
I appreciate that the HDIs should probably be computed in the log-odds domain, but here we are. This is likely going to be the case for many "back-transformed" estimates. Would it be reasonable to add a "link" or a "family" argument, so the HDI calculations are done in their relevant domain?
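For what it's worth, the quantile-based version of this idea can be sketched with plain NumPy/SciPy (variable names hypothetical): compute an equal-tailed interval on the log-odds scale and back-transform the endpoints, which by construction stays inside (0, 1). Note this equivalence holds for quantile intervals but not for the HDI itself, where the choice of scale genuinely changes the interval.

```python
import numpy as np
from scipy.special import expit, logit

rng = np.random.default_rng(1)
p_draws = rng.beta(20, 2, size=4000)  # posterior draws piled up near 1.0

# Equal-tailed 94% interval computed on the log-odds scale, then back-transformed.
# Because quantiles commute with monotonic maps, this matches the direct interval
# and is guaranteed to respect the (0, 1) constraint.
lo, hi = expit(np.quantile(logit(p_draws), [0.03, 0.97]))
print(lo, hi)  # both strictly inside (0, 1)
```

A hypothetical `link` argument could do exactly this internally: transform the draws, compute the interval, apply the inverse link to the endpoints.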
(Secondary point, I was a bit surprised that smoothing with a lower order polynomial results in so visibly different HDIs - maybe have a comment about it in docs?)
Issue is loosely related to #2168.