Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept data.frame in predict() #48

Open
bwiernik opened this issue Aug 28, 2021 · 8 comments
Open

Accept data.frame in predict() #48

bwiernik opened this issue Aug 28, 2021 · 8 comments
Labels
enhancement enhancement suggestion

Comments

@bwiernik
Copy link
Contributor

Classification:

Enhancement Suggestion

Summary

I'm trying to fit an rma.mv() model with a fairly complex moderator structure (factor variable with several levels and splines for a continuous variable, interaction effects to model the different outcomes). Given this complexity, I'm having a really hard time trying to construct the model matrix for newmods by hand. It would be much easier and less error-prone if predict.rma() accepted a data frame for the newmods argument the same way most R modeling functions do.

Reproducible Example (if applicable)

library(metafor)
#> Loading required package: Matrix
#> 
#> Loading the 'metafor' package (version 3.1-3). For an
#> introduction to the package please type: help(metafor)
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
mod <- rma(yi ~ ablat + year, vi, data=dat)
predict(mod, newmods = dat[1:4,])
#> Error in predict.rma(mod, newmods = dat[1:4, ]): Argument 'newmods' should be a vector or matrix, but is of class 'escalc'.Argument 'newmods' should be a vector or matrix, but is of class 'data.frame'.

cf. predict.lm():

predict(lm(mpg ~ disp, data = mtcars), newdata = mtcars[1:5,])
#>         Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive 
#>          23.00544          23.00544          25.14862          18.96635 
#> Hornet Sportabout 
#>          14.76241
@wviechtb
Copy link
Owner

predict.rma() is such an unholy mess that I barely dare to touch it 🤯

But this aside, there are actually two things here:

First, dat[1:4,] contains variables that are not actually part of the moderators, so those should then be (silently?) ignored. So, at the moment, while

predict(mod, newmods = as.matrix(dat[1:4,c(3,8)]))

works,

predict(mod, newmods = as.matrix(dat[1:4,c(1,3,8)]))

also throws an error. One can debate this implementation. It is less convenient, but it can also be argued to provide a certain 'safeguard' (one should only put moderators into newmods that are actually in the model). I will have to think about how things are implemented in predict.rma() to see whether this requirement can be dropped though.

The other thing is the class of what you put into newmods. That indeed must be a matrix (or vector) at the moment. I can't just do a if (is.data.frame(newmods)) newmods <- as.matrix(newmods) since as.matrix(dat[1:4,]) would just end up being an all character matrix.

As a general workflow that deals with these two issues automatically, you could do however:

mod <- rma(yi, vi, mods = ~ ablat + year, data=dat)
predict(mod, newmods = model.matrix(formula(mod), dat[1:4,])[,-1])

That is also how you can deal with the fact that predict.rma() does not create the model matrix for you -- the user has to do this directly (that's a design decision that I regret, but it's way too late to change this now). So, for example, one has to do:

mod <- rma(yi, vi, mods = ~ ablat + year + I(ablat^2), data=dat)
predict(mod, newmods = c(44,1948,44^2))

(which gets really tedious when you have interactions, factors, and other 'derived' terms in your model), but one can also just do:

predict(mod, newmods = model.matrix(formula(mod), dat[1,])[,-1])

(the [,-1] is to get rid of the (Intercept) term, since that gets added by predict.rma() and shouldn't be in newmods -- another regrettable design decision).

Note that for splines, you have to be careful. The knot positions that were used in the model should obviously be the same as those used in newmods, so you should check carefully that model.matrix() is doing that. See also this example to illustrate this point: https://www.metafor-project.org/doku.php/tips:non_linear_meta_regression#spline_model

@bwiernik
Copy link
Contributor Author

The way predict works for most R modeling functions is that it takes the data frame supplied and then evaluates the model formula or terms with that data frame as the environment. So, for example, I would absolutely expect to be able to provide a data frame with unused variables or a factor variable that is not already dummy coded, etc.

@wviechtb
Copy link
Owner

Yeah, that's just not how things work in metafor. When I wrote rma() (this goes back to code I wrote around 2000), I did not have such a deep understanding of things like model frames, terms, and so on, and hence did not make use of this 'machinery'. As I said, it's a regrettable design choice I made a long time ago and I cannot really fix this now or break tons of code. Using model.matrix(formula(<model>), <some data frame>)[,-1] is a workaround.

@bwiernik
Copy link
Contributor Author

Hey @wviechtb I've pulled together some functions to help construct a model matrix from a data frame in newmods https://gist.github.com/bwiernik/9602623e092662bcc70e9b37f78c838b

Now that the modeling objects return the data argument (bwiernik@1c57165), I could simplify these a bit, and they could work natively inside predct.rma(). Would you be open to a PR for that?

Related to this, it would be really helpful if the rma object could also store xlev and contrasts slots for factor variables in the model.

@wviechtb
Copy link
Owner

Hi @bwiernik. Thanks for those suggestions. Some thoughts:

  1. get_tau_ci() essentially does confint()$random[2,]. Don't quite see the added benefit, except that one doesn't have to dig into the object returned by confint() to figure out how to pull the stuff out of it. I would rather stick to known idioms and possibly consider an as.data.frame method for confint.rma objects that works appropriately not just for rma.uni objects but also when used for example on rma.mv objects.
  2. clean_rma_predictions(predict(res)) made me aware that as.data.frame(predict(res)) was also returning the cr.lb and cr.ub elements, which are now superseded by pi.lb and pi.ub. Thanks! I will update the as.data.frame() method to not do that, so that takes care of that.
  3. Most importantly, create_newmods(). I really can't change how predict.rma() works at this point as this would break tons of code. As a helper function, this is something to consider, although not at the cost of adding two dependencies.

@bwiernik
Copy link
Contributor Author

bwiernik commented Feb 23, 2022

Let me clarify a few things:

  1. Yes, this is a just a helper for my students

  2. Great!

  3. If I were to make a pull request, I would of course write it without the dependencies. I am not exactly following your concern about breaking predict.rma(). Incorporating create_newmods() wouldn't require changing any of the core functioning of predict.rma(). This could be added at the top of the function, where currently there is a check that newmods is a matrix. Rather than erroring, the function could check if newmods is a data frame and then construct the needed matrix. This shouldn't break any other code because it is all happening upstream of the actual function operations, which would still only ever see the expected matrix.

An even safer alternative might be to add a newdata argument that is explicitly intended to be a data frame, from which newmods and newscale can be constructed. That would definitely not collide with any existing code.

Besides those, what do you think of adding the xlev and contrasts information to the rma object?

@wviechtb
Copy link
Owner

wviechtb commented Feb 23, 2022

  1. Ok, but then I would rather go with a as.data.frame method.
  2. Done.
  3. Good point. Okay, I see this as a possibility now. I also just realized that formula() with type="mods" (the default) should always work, even if the formula is specified via yi (i.e., it should then just get the part after ~). Then one wouldn't have to do the is.null(ma_obj$formula.yi) check. I just fixed this. Adding xlevels and contrasts should also be possible, although how this is done requires some thought. Subsetting/removal of missings can affect things. Also, what if the user doesn't use data or has specified the model matrix directly to mods? This will all require some thought.

@wviechtb
Copy link
Owner

Also 1. is done now. Will eventually tackle 3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement enhancement suggestion
Projects
None yet
Development

No branches or pull requests

2 participants