Use `ggdist::curve_interval` for plotting curves in Bayesian models #455

mattansb · 2021-08-14T21:48:31Z

This gives an overall smoother line / interval (less "lumpy").

Linear regression plot

m <- rstanarm::stan_glm(mpg ~ hp, data = mtcars)

plot(modelbased::estimate_relation(m, length = 100))

Looks lumpy...

library(ggplot2)

emt <- emmeans::emmeans(m, ~ hp, at = list(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 100)))

tidybayes::gather_emmeans_draws(emt) |>
  ggdist::curve_interval(.value, .along = hp, .width = 0.95)  |>
  ggplot(aes(hp, .value)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper), alpha = 0.4) +
  geom_line()

Mmmmm... smoothhhhh

JN-plot smooth() plot

formula <- brms::brmsformula(
  mpg ~ s(wt),
  sigma ~ s(wt),
  beta ~ s(wt),
  family = brms::exgaussian()
)

m <- brms::brm(formula, data = mtcars, chains = 1, iter = 200)

plot(modelbased::estimate_slopes(m, trend = "wt", at = "wt", length = 100))

library(ggplot2)

emt <- emmeans::emtrends(m, ~ wt, "wt", at = list(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100)))

tidybayes::gather_emmeans_draws(emt) |>
  ggdist::curve_interval(.value, .along = wt, .width = 0.95)  |>
  transform(.sig = .lower > 0 | .upper < 0) |>
  transform(.part.group = {function(x) rep(seq_along(x$lengths), x$length)}(rle(.sig))) |> 
  ggplot(aes(wt, .value)) + 
  geom_hline(yintercept = 0) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper, group = .part.group,  fill = .sig), alpha = 0.4) +
  geom_line()

DominiqueMakowski · 2021-08-15T01:28:10Z

Mmh interesting, I'll try to include that... I suppose it will go somewhere here:

https://github.com/easystats/modelbased/blob/eed1a1572fd71197aaa979dbcb857c6d0be5f459/R/visualisation_recipe.estimate_predicted.R#L292-L293

this one looks funny tho

DominiqueMakowski · 2021-08-15T01:29:25Z

oh actually no, this should be something that must be done at the level of bayestestR, when the computation of the CI is done. Maybe we could add an option to describe_posterior like smooth_ci or something?

DominiqueMakowski · 2021-08-15T01:30:06Z

or in ci_method add "curve" for curve_interval method

mattansb · 2021-08-15T04:30:50Z

Yeah, that sounds right (:

mattansb · 2021-08-15T04:31:37Z

this one looks funny tho

A very hurtful thing to say to the poor graph....

mattansb · 2021-08-15T04:34:53Z

Actually, do we need this implemented in bayestestR? It would essentially be a wrapper around ggdist::curve_interval, no?

DominiqueMakowski · 2021-08-15T04:48:29Z

Well it seems to be like it belongs there, because if I understand it's a different method of computing CIs, different from hdi, quantile etc. Perhaps mostly used for plotting, true, but still

Moreover, modelbased now simply runs insight:

https://github.com/easystats/modelbased/blob/3ae98f03d5311b883276c67c18dd2cf5b49836ee/R/estimate_predicted.R#L219-L226

which in turn calls bayestestR ^^

https://github.com/easystats/insight/blob/f4e5c6d312f96fd671bc1cdc91d274bc567ad41e/R/get_predicted_ci.R#L435-L437

so yeah for generalizability (for people that use insight::get_predicted directly) it'd be convenient to have that feature in bayestestR

bwiernik · 2021-08-15T05:08:17Z

I think it's okay if the curve estimation and the tabling have slightly different estimation. It would be okay to restrict the ggdist calls to see

mattansb · 2021-08-15T06:44:37Z

Another example with moderation:

m <- rstanarm::stan_glm(mpg ~ factor(cyl) * hp, data = mtcars,
                        chains = 1, iter = 200, refresh = 0)

em_cyl <- emmeans::emmeans(m, ~ cyl + hp,
                           at = list(hp = seq(50, 350, by = 15))) |>
  tidybayes::gather_emmeans_draws()

library(ggplot2)

em_cyl |>
  ggdist::median_hdci(.width = 0.9) |>
  transform(cyl = factor(cyl)) |>
  ggplot(aes(hp, .value, ymin = .lower, ymax = .upper, 
             fill = cyl, color = cyl, group = cyl)) + 
  geom_ribbon(alpha = 0.2, color = NA) + 
  geom_line()

em_cyl |>
  ggdist::curve_interval(.width = 0.9) |>
  transform(cyl = factor(cyl)) |>
  ggplot(aes(hp, .value, ymin = .lower, ymax = .upper, 
             fill = cyl, color = cyl, group = cyl)) + 
  geom_ribbon(alpha = 0.2, color = NA) + 
  geom_line()

^{Created on 2021-08-15 by the reprex package (v2.0.0)}

mjskay · 2021-08-18T06:47:18Z

Chiming in since @DominiqueMakowski pinged me on mjskay/ggdist#95. I'm not sure in what context you're planning to use curve_interval, but it's worth pointing out a few things you may want to ponder:

It is giving a joint band rather than pointwise intervals, which may or may not be what you want. I'm not sure where in your API this sits and if users are expecting pointwise or joint intervals there in general.
I hesitate to suggest it being the default for anything at the moment as it is a bit experimental, though I'd certainly be happy if folks put it through its paces. Unlike pointwise intervals, getting good joint bands is nontrivial and can quite possibly give weird results on some datasets. I'll say that I think what I have in curve_interval is more robust than anything else I've seen, but I wouldn't be surprised if someone came along with a dataset that caused it to do weird stuff.
Internally it currently requires {posterior} (only required as Suggests) so you might need to convince me to rewrite part of it to not use posterior, rewrite it for me (I wouldn't object!), or accept {posterior} as a dep in Suggests.
A slightly more reliable calculation of band depth (one step in the algorithm for finding the simultaneous bands) as implemented in {fda} can be used with curve_interval via .interval = "bd-mbd" but this requires the {fda} package, which pulls in a bunch of dependencies. Worth pondering if you are thinking about defaults.

mjskay · 2021-08-19T04:45:03Z

One last thought from me: I was trying to figure out why your pointwise intervals were so noisy looking in the first place and realized you are using HDIs. HDIs can be very noisy, as you have discovered. If you use quantile intervals instead you'll find the pointwise bands are much smoother, besides having other benefits (like being invariant under transformation). FWIW if I were picking a default pointwise interval type I would lean in that direction.

DominiqueMakowski transferred this issue from easystats/modelbased Aug 18, 2021

DominiqueMakowski mentioned this issue Aug 18, 2021

curve_interval: a numeric/dataframe method? mjskay/ggdist#95

Closed

strengejacke added enhancement 💥 Implemented features can be improved or revised feature idea 🔥 New feature or request labels Aug 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `ggdist::curve_interval` for plotting curves in Bayesian models #455

Use `ggdist::curve_interval` for plotting curves in Bayesian models #455

mattansb commented Aug 14, 2021

DominiqueMakowski commented Aug 15, 2021

DominiqueMakowski commented Aug 15, 2021 •

edited

DominiqueMakowski commented Aug 15, 2021

mattansb commented Aug 15, 2021

mattansb commented Aug 15, 2021

mattansb commented Aug 15, 2021

DominiqueMakowski commented Aug 15, 2021

bwiernik commented Aug 15, 2021

mattansb commented Aug 15, 2021

mjskay commented Aug 18, 2021

mjskay commented Aug 19, 2021

Use ggdist::curve_interval for plotting curves in Bayesian models #455

Use ggdist::curve_interval for plotting curves in Bayesian models #455

Comments

mattansb commented Aug 14, 2021

Linear regression plot

JN-plot smooth() plot

DominiqueMakowski commented Aug 15, 2021

DominiqueMakowski commented Aug 15, 2021 • edited

DominiqueMakowski commented Aug 15, 2021

mattansb commented Aug 15, 2021

mattansb commented Aug 15, 2021

mattansb commented Aug 15, 2021

DominiqueMakowski commented Aug 15, 2021

bwiernik commented Aug 15, 2021

mattansb commented Aug 15, 2021

mjskay commented Aug 18, 2021

mjskay commented Aug 19, 2021

Use `ggdist::curve_interval` for plotting curves in Bayesian models #455

Use `ggdist::curve_interval` for plotting curves in Bayesian models #455

DominiqueMakowski commented Aug 15, 2021 •

edited