-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shortest Probability Interval support #136
Comments
Ah at the moment I probably don't have time to investigate it, though I don't in principle object if it is a better variant of HDIs. How does it differ from |
This thread by @bwiernik seems relevant. |
Ah hrm. So it's also based on density estimators? I guess I'm not sure the advantage without doing more comparison (or seeing comparisons others have done if there are some). One nice property of hdci is that it doesn't require a density estimator. |
The major advantage of SPI over HDI is reduced Monte Carlo error error. https://doi.org/10.1007/s11222-015-9563-8 Both HDI via empirical shortest distance algorithm and SPI via the method linked above are implemented in bayetestR if you wanted to swap in that for HPDInterval |
Yeah the reason I'm interested specifically in comparisons of SPI to HDCI (not HDI) is that HDCI uses the CDF approach from HDInterval, so if those two methods are similar I can (lazily) just leave things as they are ;) |
What's HDCI stand for? |
@DominiqueMakowski I think you had some simulation code for comparing intervals set up? Could you compare 80% and 95% intervals for HDI, HDCI, and SPI? |
"hdci yields the highest-density continuous interval" https://mjskay.github.io/ggdist/reference/point_interval.html |
I should probably state more clearly what hdci is in the docs. It uses the algorithm for a continuous interval from HDInterval, which basically uses the ECDF to find the smallest interval of the specified mass: https://github.com/mikemeredith/HDInterval/blob/main/R/hdiVector.R It is nice because it is simple, fast, and does not rely on picking a density estimator. But if the SPI algorithm has lower error maybe I'd switch it to that. Or a better option might be for me to make sure |
I can't seem to find it, but it was basically a big loop generating various distributions of different sizes and computing the different types of CIs. I'm not sure how we can reliably assess the "error" 🤔 Unless it already exists, it would be quite interesting to create a gist/vignette/paper that would 1) describe and explain the advantages/caveats of each interval method, 2) provide clean, community-contributed, R code to compute them (that could be re-used in other packages), and 3) provide some benchmarking on efficiency & behavior (PS: I will be co-editing a special issue in Mathematics on Advances in Statistical Computing and I'm pretty sure such work would be a great fit...) |
This is a great idea! I would be happy to help with something like this but don't really have the time to lead it. Maybe I could see if one of my students is interested... |
good idea! also tagging |
Sounds great! I'm in (: |
Doesn't the current implementation of |
Ah, I see this is a different algo to estimate the same thing as HDCI. Cool! |
Yeah, I'm interested in supporting a paper, too! |
Just need someone that actually does the job now 😁 |
Alright I started playing with this for my own curiosity. Some comparisons here: https://github.com/mjskay/interval-estimators/blob/main/interval-estimators.md (source Rmd in same repo). I can see three broad classes worth investigating:
I have been looking at a few quantities of interest:
Also to look at: bias. One intriguing thing so far is that SPI doesn't perform much differently on these metrics compared to the empirical shortest interval implementations (with the exception of having coverage closer to nominal), which is surprising as the SPI paper says SPI should have lower error than HDCI. I'm not sure if that's an implementation issue or what. The other thing I've found so far is that amongst HDI algorithms, See e.g. this chart: these are 80% ribbons; there are 100 groups along the x axis, each a sample of size 10,000 from N(0,1). As far as I'm concerned qi and (barely) hdr are the only two with acceptable variance here. The other three are different algorithms for shortest intervals. Given that 10,000 tends to be the upper end of effective sample sizes people tend to generate from MCMC, unless there's a better algorithm out there for shortest intervals we aren't aware of, I'm not sure I'd recommend using them... |
I find it very surprising that even with 500 samples the methods don't differ that much (and that SPI seems to have relatively larger variance, if ever so slightly...). Would it be interesting to compare these methods with |
I think the comparison could make sense for single distributions. For multiple distributions, |
Hi Matthew,
Do you plan to support the Shortest Probability Interval (further details here) with functions like ggdist::median_spi?
Thanks!
The text was updated successfully, but these errors were encountered: