Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

weights in qgam #39

Open
LisaHuelsmann opened this issue Apr 13, 2021 · 4 comments
Open

weights in qgam #39

LisaHuelsmann opened this issue Apr 13, 2021 · 4 comments

Comments

@LisaHuelsmann
Copy link

Hi Matteo,
I am running a two-step analysis:

  1. I estimate average marginal effects (i.e. derivatives at the response scale) from several binomial model (fitted with mgcv::gam()).
  2. I want to estimate how these average marginal effects are affected by two predictors. The distribution of the average marginal effects has very heavy tails, so I thought quantile regression (i.e. qgam with qu = 0.5) may be a good option.

Now, I am struggling with how I can accurately propagate the uncertainty in these estimates. Option 1 would be to weight with 1/var, but I don't think this is accurate in a quantile setting. Option 2 would be to use the distribution of the posterior samples of the average marginal effects directly that I simulated to get the variances. So I tested this with qgam but then I realized that qgam doesnt downweight the repeated samples per average marginal effect when using weight = 1/nsamples. I tested this in glmmTMB, where the weights are directly acting on the likelihood, which would be what I would need for qgam too.

Any idea what I could do instead? Thanks for you help...

Here an illustrative example of what I am looking for:

# data generation ---------------------------------------------------------
set.seed(1)
dat = mgcv::gamSim(1, n = 30)
dat1 = dat[rep(1:nrow(dat), each = 10), ]

I simply repeat each observation 10 times in the second dataset.

# qgam --------------------------------------------------------------------

mod = qgam(y ~ x0 + x1 + x2
           , data = dat
           , qu = 0.5
           )
mod1 = qgam(y ~ x0 + x1 + x2
          , data = dat1
          , qu = 0.5
          , argGam = list(weights = rep(1/10, nrow(dat1)))
          )
summary(mod)
summary(mod1)

My hope was that both models result in the same effective sample size, but apperently they don't and the CI are more narrow for mod1.

# glmmTMB -----------------------------------------------------------------

mod = glmmTMB(y ~ x0 + x1 + x2
              , data = dat
)
mod1 = glmmTMB(y ~ x0 + x1 + x2
               , data = dat1
               , weights = rep(1/10,  nrow(dat1)) 
)
summary(mod)
summary(mod1)

For glmmTMB, the downweighting works and both models produce the same result.

@mfasiolo
Copy link
Owner

Hi Lisa,

I am not entirely getting what you are trying to do. Note that here qgam behaves in the same way as mgcv::gam, so probably this would require changing mgcv.
I don't understand why you want to use a data set where an observation is repeated several times and then you want to down weight the repeated observations. Can't you just remove the duplicates? Obviously I am missing something!

@Waschoi
Copy link

Waschoi commented Jul 12, 2023

I have the problem as well, that I can use weights in mgcv but not in qgam. In my case I am using survey weights.
If helpfull, I can give yu a reprex

@mfasiolo
Copy link
Owner

Hi, yes please, a (minimal) reprex would be helpful.

@Waschoi
Copy link

Waschoi commented Aug 17, 2023

Hello,
sorry it took me so long to reply.
Here is an example. Data simulation is not my strong suit, but it should show the problem.
Normally we have survey weights, for example younger people are weighted higher there than older people, because they participate less often in surveys. Accordingly, there are different results.

library(tidyverse)
library(mgcv)
library(qgam)

n <- 3000
data <- tribble(~price, ~group,
                rbeta(n,20,2), "middle",
                rbeta(n,40,1), "new",
                rbeta(n,10,7), "old"
                     )%>%
  mutate(age = map(.x = group, ~case_when(
    .x == "old" ~ rbinom(n, size = 10, prob = .5),
    .x == "middle" ~ rbinom(n, size = 5, prob = .3),
    .x == "new" ~ rbinom(n, size = 4, prob = .2)
  )))%>%
  mutate(price = map(.x = price, ~sort(.x, )))%>%
  mutate(age = map(.x = age, ~sort(.x, decreasing = T)))%>%
  unnest()%>%
  mutate(price = round(price*10, 2))%>%
  mutate(weight = map_dbl(.x = group, ~case_when(
    .x == "old" ~ abs(rnorm(1, mean = 2, sd = .5)),
    .x == "middle" ~ 1,
    .x == "new" ~ abs(rnorm(1, mean = 1, sd = .5))
  )))%>%
  mutate(weight = case_when(
    price < 4 & age > 8 ~ weight*2,
    price > 9 & age > 3 ~ weight/2,
    price > 5 & age > 3 ~ weight*2,
    price > 10 & age > 2 ~ weight*2,
    T ~ weight
  ))%>%
  mutate(group = as.factor(group))




data%>%
  ggplot(aes(x = age, y = price, col = group, group = group))+
  geom_smooth(method = "loess")


gam(price ~ s(age, by = group) + group, data = data) -> gam_wo_weight
gam(price ~ s(age, by = group) + group, data = data, weights = weight) -> gam_w_weight

qgam(price ~ s(age, by = group) + group, data = data, qu = .5) -> qgam_wo_weight
#not working
#qgam(price ~ s(age, by = group) + group, data = data, weights = weight, .5) -> qgam_w_weight

library(ggeffects)
#library(gratia)
library(patchwork)

ggpredict(gam_wo_weight, terms = c("age", "group"))%>%
  plot()+
  scale_y_continuous(limits = c(-2, 12))+
  labs(title = "gam without weights")+
ggpredict(gam_w_weight, terms = c("age", "group"))%>%
  plot()+
  scale_y_continuous(limits = c(-2, 12))+
  labs(title = "gam with weights")+
ggpredict(qgam_wo_weight, terms = c("age", "group"))%>%
  plot()+
  scale_y_continuous(limits = c(-2, 12))+
  labs(title = "qgam without weights")+
  plot_layout(guides = 'collect') &
  theme(legend.position = "bottom")

sjPlot::tab_model(gam_wo_weight, gam_w_weight, qgam_wo_weight,
                  dv.labels = c("gam without weights", "gam with weights", "qgam without weights"))


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants