feature request – predict.brms: allow new values for random effects fit with splines #1554

kat3stewart · 2023-10-27T15:19:55Z

The predict.brmsfit() function includes an argument allow_new_levels, which allows one to predict on new data containing group levels that were not present in the original data used to fit the model. However, this functionality does not extend to models whose random effects are fit with splines (as stated in the comments section of this post: https://discourse.mc-stan.org/t/predict-error-with-nested-random-effect-structure/4817/13).

If I understand correctly, the work-around for this issue is to re-run the model without a spline on the random effect. However, in some cases, this is not an ideal situation (e.g., when someone is explicitly interested in explicitly modelling the non-linear effects of group levels). For me, personally, I would much prefer to keep the splines on my random effects. Thus, is it possible to add this functionality to the predict function?

pminasandra · 2023-10-27T16:17:15Z

Definitely. This feature would be extremely helpful!

paul-buerkner · 2023-10-29T12:25:06Z

Can you post a minimal reproducible example here to explain what you have in mind?

kat3stewart · 2023-10-30T06:53:07Z

My model takes several weeks to run so it's likely more efficient if I privately send you the fit model as well as the data I would like to predict on. Would this be okay?

But, so others who view this post have an idea of the situation, I have trained the following model with 70% of my data:

m1 <- brm(bf(y|trials(1)~s(x1,k=6,bs="tp", m=2) + 
                s(x2, k=12, bs="cc",m=2) + 
                x3 + s(x2, by=x3, bs="cc", k=12, m=1) + 
                s(x4, bs="re") + s(x4, x2, bs="re")), 
           data=df, family=zero_inflated_binomial(link="logit",link_zi="logit"), 
           knots = list(x2 = c(0.5,12.5)),  prior=prior1, 
           cores=4, chains=4, thin=10, iter=4000, save_pars = save_pars(all = TRUE),
           control = list(adapt_delta=0.99), backend = "cmdstanr")

where y = a binomial response (1,0)
x1 and x2 = continuous predictors
x3 = a factor with 59 levels
x4 = >1000 individuals sampled monthly across the 59 levels of the factor

What I would now like to do is predict on the remaining 30% of data using predict.brms(). However, when I try to do this, I get the error that new levels are not allowed. This is because I have new individuals for my x4 variable. I have tried adding the argument allow_new_levels = TRUE, but I still get the same error. From my understanding, this is because I have fit the random effects with splines.

paul-buerkner · 2023-10-30T08:15:15Z

perhaps you can post a simple version of your model on simulated data here that only contains the core aspects relevant for this feature? I won't be able to run the model for several weeks either :-D Kathrine Stewart ***@***.***> schrieb am Mo., 30. Okt. 2023, 08:07:

…

Reopened #1554 <#1554>. — Reply to this email directly, view it on GitHub <#1554 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADCW2AHD6ZTDFIPK3NICGN3YB5G3PAVCNFSM6AAAAAA6S7TKWWVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJQHAYDGNBRG4ZTGMI> . You are receiving this because you commented.Message ID: ***@***.***>

kat3stewart · 2023-10-31T08:36:05Z

Apologies, it took me some time to figure out how to simulate data that captures the core features of my actual data set, but please find below a minimally reproducible example. The model was run with brms version 2.20.4; cmdstanr version 0.5.3 and rstan version 2.32.3.

### (1) Simulate predictor variables

library(tidyverse)

df <- data.frame(
  x1=rep(c(1,2,3),each=1200), 
  x2=rep(c(1:12),each=100,times=3),
  grp=rep(c(letters[1:10]),each=10,times=36),
  ind=as.character(rep(c(1:100),times=36))
)

# where:
# x1 = year
# x2 = month
# grp = group id
# ind = sampled individuals

#add unique individual id/ per year
df$id <- paste(df$ind, df$x1, sep="_")

### (2) Add binomial response variable with distinct seasonal trend per group

ids <- unique(df$id)     #list of unique individual-by-year ids
my_grps <- letters[1:10] #list of groups

#list of probabilities by group
probs <- data.frame(
  grp=my_grps,
  pr=c(0.2, 0.2, 0.3, 0.2, 0.4, 0.3, 0.2, 0.2, 0.6, 0.3) 
)

newdata=c() #create empty data frame to store loop output
count=1

for(i in 1:length(ids)){
  temp=df[which(df$id==ids[i]),]
  temp$resp <- 0 #add empty response column
  grp=unique(temp$grp)
  pr=probs$pr[which(probs$grp==grp)] #probability of getting a '1' based on group
  m=rbinom(n = 12, size = 1, prob = pr) #randomly draw monthly responses 
  s=as.numeric(sum(m)-1) #number of successes-1
  #create seasonal pattern of successes 
  p=as.numeric(unique(match(grp,my_grps))) #first month according to group letter
  v=rep(0,12) #vector of zeroes (1 per month)
  diff=(12-(p+s)) 
  d=ifelse(diff<0, (abs(diff)),0)
  v[p:((p+s)-d)] <- 1 #replace the 0s with 1s (start with first month 'p' and add 's' months)
  v[0:d] <- 1 #if applicable, add remaining sampled 1s to start of vector
  temp$resp=v #fill response column with values in 'v'
  newdata[count]=list(temp)
  count=count+1
}
df <- do.call(rbind, newdata)

### (3) Split data into training/validation sets
b <- sample(letters[1:10], replace=F, size=3) #randomly sample three grps
df70=df[!(df$grp %in% b),] #training set
df30=df[df$grp %in% b,]    #validation set
  
### (4) Run Model
library(brms)
library(rstan)
library(cmdstanr)

#set prior
prior1 <- c(prior(normal(0, 1.5), class = Intercept),
            prior(normal(0, 1.5), class = b),
            prior(exponential(1), class = sds))

options(mc.cores = parallel::detectCores())

#fit model
m1 <- brm(bf(resp|trials(1)~s(x1,k=3,bs="tp", m=2) + 
                s(x2, k=12, bs="cc",m=2) + 
                grp + s(x2, by=grp, bs="cc", k=12, m=1) + 
                s(ind, bs="re") + s(ind, x2, bs="re")), 
           data=df70, family=zero_inflated_binomial(link="logit",link_zi="logit"), 
           knots = list(x2 = c(0.5,12.5)),  prior=prior1, 
           cores=4, chains=4, thin=10, iter=2000, save_pars = save_pars(all = TRUE),
           control = list(adapt_delta=0.99), backend = "cmdstanr")

#(5) Predict on new data
pred1 <- predict(m1, newdata = df30, allow_new_levels=TRUE)

When I run the prediction, I get the error:
Error: New factor levels are not allowed.
Levels allowed: 'a', 'b', 'c', 'e', 'f', 'g', 'i'
Levels found: 'd', 'h', 'j'
m1.zip

Please find attached a zip file with the model object. Let me know if you have any issues opening it or need anything else. Cheers!

paul-buerkner · 2023-10-31T12:10:01Z

Thank you! This is way for detailed than I expected. I would have been happy with a much simpler example, but this of course works well too. I will take a look.

kat3stewart · 2023-10-31T13:02:05Z

Great, thank-you. I really appreciate it.

sealavi · 2023-11-17T04:34:12Z

I have also encountered multiple occasions recently where this feature would be extremely beneficial. Would be much appreciated if it could be implemented in the next release!

paul-buerkner added the feature label Oct 29, 2023

kat3stewart closed this as completed Oct 30, 2023

kat3stewart reopened this Oct 30, 2023

kat3stewart closed this as completed Oct 31, 2023

kat3stewart reopened this Oct 31, 2023

paul-buerkner added this to the 2.21.0 milestone Oct 31, 2023

paul-buerkner removed this from the 2.21.0 milestone Mar 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature request – predict.brms: allow new values for random effects fit with splines #1554

feature request – predict.brms: allow new values for random effects fit with splines #1554

kat3stewart commented Oct 27, 2023

pminasandra commented Oct 27, 2023

paul-buerkner commented Oct 29, 2023

kat3stewart commented Oct 30, 2023 •

edited

paul-buerkner commented Oct 30, 2023 via email

kat3stewart commented Oct 31, 2023 •

edited

paul-buerkner commented Oct 31, 2023

kat3stewart commented Oct 31, 2023

sealavi commented Nov 17, 2023

feature request – predict.brms: allow new values for random effects fit with splines #1554

feature request – predict.brms: allow new values for random effects fit with splines #1554

Comments

kat3stewart commented Oct 27, 2023

pminasandra commented Oct 27, 2023

paul-buerkner commented Oct 29, 2023

kat3stewart commented Oct 30, 2023 • edited

paul-buerkner commented Oct 30, 2023 via email

kat3stewart commented Oct 31, 2023 • edited

paul-buerkner commented Oct 31, 2023

kat3stewart commented Oct 31, 2023

sealavi commented Nov 17, 2023

kat3stewart commented Oct 30, 2023 •

edited

kat3stewart commented Oct 31, 2023 •

edited