New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for data frames with an rvar
column
#604
Comments
I'm not quite sure what the function is doing, isn't there an easy way to convert |
Nah, ignore all that gross code, I copied it from something else I had. We definitely can convert the rvar into a data frame, but I want to preserve the other columns and return them as well. |
It could be roughly something like library(brms)
library(posterior)
library(bayestestR)
mod <- brm(mpg ~ factor(cyl) + hp, data = mtcars,
backend = "cmdstanr", cores = 4, refresh = 0)
grid <- expand.grid(
cyl = unique(mtcars$cyl),
hp = mean(mtcars$hp)
)
grid$epred <- posterior_epred(mod, grid) |> rvar()
x <- grid
x$epred <- NULL
rbind(
bayestestR::describe_posterior(x),
bayestestR::describe_posterior(grid$epred)
)
#> Summary of Posterior Distribution
#>
#> Parameter | Median | 95% CI | pd | ROPE | % in ROPE
#> ------------------------------------------------------------------------
#> cyl | 6.00 | [ 4.10, 7.90] | 100% | [-0.10, 0.10] | 0%
#> hp | 146.69 | [146.69, 146.69] | 100% | [-0.10, 0.10] | 0%
#> x[1] | 19.16 | [ 16.55, 21.76] | 100% | [-0.10, 0.10] | 0%
#> x[2] | 25.12 | [ 22.25, 28.03] | 100% | [-0.10, 0.10] | 0%
#> x[3] | 16.56 | [ 13.87, 19.21] | 100% | [-0.10, 0.10] | 0% with some nicer printing. Not sure what |
x <- rnorm(4000)
head(x)
#> [1] -0.627925890 0.675107948 1.375053570 1.195942455 -0.006494341 0.894640065
rvar(x)
#> rvar<4000>[1] mean ± sd:
#> [1] 0.024 ± 0.99 Or instead of having a matrix of posterior samples x estimates, you can have a vector of rvars: m <- matrix(rnorm(4000*4), 4000, 4)
head(m)
#> [,1] [,2] [,3] [,4]
#> [1,] -0.8296439 -0.2135017 0.35422961 1.0199753
#> [2,] -0.7488206 -0.2938458 0.67141688 1.1986600
#> [3,] -0.7133707 1.0533782 0.20289016 -0.8099680
#> [4,] -0.4385958 -0.4919160 0.07978666 0.5154708
#> [5,] -0.5045125 1.5209865 0.58179130 0.8505250
#> [6,] -0.1048083 -0.7765123 -1.12228048 -0.9841341
rvar(m)
#> rvar<4000>[4] mean ± sd:
#> [1] 0.0049 ± 1.00 0.0151 ± 1.00 0.0038 ± 1.02 -0.0186 ± 0.99 You can read more about it here: https://mc-stan.org/posterior/articles/rvar.html describe_posterior.data.frame <- function(posteriors, ...) {
is_rvar <- which(sapply(posteriors, inherits, "rvar"))
if (length(is_rvar) == 0L) {
# Do usual stuff...
return(bayestestR:::describe_posterior.data.frame(posteriors, ...))
} else if (length(is_rvar) > 1L) {
insight::format_error("Cannot use more than 1 rvar")
}
# describe_posterior(<rvar>)
out <- describe_posterior(posteriors[[is_rvar]], ...)
out$Parameter <- NULL
# append other data frame columns to the beginning of the output
df <- posteriors[,-is_rvar, drop = FALSE]
df_nms <- colnames(df)
out[df_nms] <- df
out <- datawizard::data_relocate(out, select = df_nms, before = 1)
return(out)
} Here's an example of typical workflow: library(brms)
library(posterior)
library(bayestestR)
mod <- brm(mpg ~ factor(cyl) + hp, data = mtcars,
backend = "cmdstanr", cores = 4, refresh = 0)
#> Compiling Stan program...
#> Start sampling
#> Running MCMC with 4 parallel chains...
#>
#> Chain 1 finished in 0.3 seconds.
#> Chain 2 finished in 0.3 seconds.
#> Chain 3 finished in 0.2 seconds.
#> Chain 4 finished in 0.2 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.3 seconds.
#> Total execution time: 0.5 seconds.
#>
grid <- expand.grid(
cyl = unique(mtcars$cyl),
hp = mean(mtcars$hp)
)
grid$epred <- posterior_epred(mod, grid) |> rvar()
grid
#> #> cyl hp epred
#> #> 1 6 146.6875 19 ± 1.3
#> #> 2 4 146.6875 25 ± 1.4
#> #> 3 8 146.6875 17 ± 1.3
describe_posterior(grid)
#> Summary of Posterior Distribution
#>
#> cyl | hp | Median | 95% CI | pd | ROPE | % in ROPE
#> --------------------------------------------------------------------------
#> 6.00 | 146.69 | 19.16 | [16.63, 21.63] | 100% | [-0.10, 0.10] | 0%
#> 4.00 | 146.69 | 25.10 | [22.20, 27.79] | 100% | [-0.10, 0.10] | 0%
#> 8.00 | 146.69 | 16.65 | [14.10, 19.29] | 100% | [-0.10, 0.10] | 0% |
The
rvar
class is a super convenient way to wrangle posteriors.Here is a basic workflow I've adopted:
Unfortunately, this data frame structure does not work with
bayestestR
functions (here is an example withdescribe_posterior()
):And passing the
rvar
column loses all the grid data:What I'd like is for the
foo.data.frame()
methods to check if there is anrvar
column - if there is (only 1), it should be processed and added to the grid. Like so:Should be easy enough to implement...
The text was updated successfully, but these errors were encountered: