Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Posterior predictive simulations #43

Open
bwiernik opened this issue Feb 16, 2021 · 5 comments
Open

Posterior predictive simulations #43

bwiernik opened this issue Feb 16, 2021 · 5 comments
Labels
feature feature request

Comments

@bwiernik
Copy link
Contributor

Classification:

Feature Request

Summary

I'd like to be able to simulate posterior draws and posterior predictive draws ala arm::sim() and bayesplot::pp_check(). I wrote a quick and dirty function to draw coefficients/tau or yi samples assuming that the REML and knha is used. I can generalize it, but I'd like to verify you'd be interested and get your input on API and coding style.

https://gist.github.com/bwiernik/be04c8817046ea0ffea9e9838928ea96

@wviechtb
Copy link
Owner

Hi Brenton. Not sure if you have seen the simulate.rma() method. Is this what you are trying to do via your function?

library(metafor)

dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
res <- rma(yi, vi, data=dat)
res

sim <- simulate(res, nsim=1000)
sav <- lapply(sim, function(x) rma(x, vi, data=dat))
sapply(sav, function(x) x$beta)
t(sapply(sav, function(x) c(x$beta, x$tau2)))

@bwiernik
Copy link
Contributor Author

As far as I can tell, the simulate.rma() function doesn't incorporate any uncertainty in either the mean function or the tau2 estimate, similar to the stats::simulate.lm() function?

@wviechtb
Copy link
Owner

wviechtb commented Feb 16, 2021

Correct, it just simulates new data based on the estimated fixed effects and the marginal var-cov matrix of the estimates. Can you point me to a paper describing the theory as to what you are trying to accomplish?

To add to this: I took a look at getMethod("sim", "lm") from arm. I see what they are doing there. But doing something similar is going to be very difficult for meta-analytic models. In simple regression models, the distribution of sigma^2 is a scaled chi-square distribution but this doesn't apply to tau^2 in random-effects models. For iterative estimators, such as REML, the exact distribution is not even known.

@bwiernik
Copy link
Contributor Author

This is the original paper on posterior predictive checks: http://www.stat.columbia.edu/~gelman/research/published/A6n41.pdf
See also https://cran.r-project.org/web/packages/bayesplot/vignettes/graphical-ppcs.html and Andrew Gelman and Jennifer Hill. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

So, the function I linked to does two things:

  1. Draws samples of model parameters based on the fitted model and uncertainty (parametric bootstrap), analogous to posterior draws in Bayesian estimation. Implemented as in arm::sim() using the sampling distributions of beta and tau2.
  2. Optionally uses those sampled parameters to draw samples of individual effect sizes (ala predict()). Compared to simulate() or predict(), each row of the sample matrix contains its own mean computed from a draw of the simulated betas and its own SD computed from a draw of the simulated tau2s and the sample vi.

The two main applications here are:

  1. Model checking--do the simulated effect size distributions resemble the observed distribution or are there major assumption violations?
  2. Uncertainty visualization incorporating uncertainty in the tau2 value (e.g., as we might use for power analysis for complex models).

@wviechtb
Copy link
Owner

If you know the distribution of the parameter estimates, then you can indeed use this information to directly sample them. However, in random-effects models, this is not the case (except for some very special cases). Once you move to rma.mv() models, it gets even more tricky.

However, if you simulate new data from the estimated parameters and then refit the same model to each simulated dataset, you should be accomplishing the same thing or something close to it. The former is of course faster, but the latter is completely general as it does not require that we figure out for every possible model what the distribution of the parameter estimates is.

I examined this with an lm() model:

library(arm)
library(RcppEigen)

res <- lm(mpg ~ cyl + hp + wt, data = mtcars)
sim1 <- sim(res, n.sims = 1000000)

X <- model.matrix(res)

tmp <- simulate(res, nsim = 1000000)
sav <- lapply(tmp, function(y) fastLmPure(X, y)) # much faster than lm()
sim2 <- data.frame(t(sapply(sav, coef)))
sim2$sigma <- sapply(sav, function(x) x$s)

par(mfrow=c(5,1))

for (i in 1:4) {
   plot(ecdf(sim1@coef[,i]), col="red")
   plot(ecdf(sim2[,i]), add=TRUE, col="blue")
}

plot(ecdf(sim1@sigma), col="red")
plot(ecdf(sim2$sigma), add=TRUE, col="blue")

summary(sim1@sigma)
summary(sim2$sigma)
sd(sim1@sigma)
sd(sim2$sigma)

For the fixed effects, the distributions appear to be essentially the same. For sigma, they are not, but they are similar. I don't know the theory that would say under what conditions these two approaches converge to the same distributions (maybe at least asymptotically?). But based on this, I would be hesitant to embark on the attempt to do something like arm::sim() for meta-analytic models, when the machinery to do something analogous is in essence already available (and again, right now we don't even have the theory to implement something like arm::sim() for random-effects models anyway).

@wviechtb wviechtb added the feature feature request label Sep 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature feature request
Projects
None yet
Development

No branches or pull requests

2 participants