Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dichotomous data? #87

Closed
wlandau opened this issue Apr 4, 2024 · 7 comments
Closed

Dichotomous data? #87

wlandau opened this issue Apr 4, 2024 · 7 comments

Comments

@wlandau
Copy link
Collaborator

wlandau commented Apr 4, 2024

Should we extend brms.mmrm to support dichotomous data? I think brms straightforwardly accepts nonlinear link functions, but I am wondering how this would affect #53 and #40.

@wlandau
Copy link
Collaborator Author

wlandau commented Apr 15, 2024

For the time being, I restricted brm_model() to continuous families with identity links: a9cbfaa.

@wlandau
Copy link
Collaborator Author

wlandau commented Apr 15, 2024

There are two main challenges for dichotomous endpoints:

  1. Estimated marginal means: Explain and verify how marginal means are computed #53
  2. Conditional means priors: Conditional means priors #40

For (2), the worst-case scenario is that non-continuous non-identity-link conditional means priors are not possible in brms, in which case we can disable conditional means priors for the non-continuous non-identity-link case. For (1), emmeans just reports marginal means on the scale of the link and makes no attempt to transform them to the scale of the response (see details below). In the case of our Bayesian model, all we have to do is invert the link function (apply the mean function) on the link-scale marginal means we already calculate. We might have to add custom support for each one, but this does not seem too hard.

``` r suppressPackageStartupMessages({ library(brms) library(coda) library(emmeans) library(mmrm) library(posterior) library(tidyverse) library(zoo) }) emm_options(sep = "|")

FEV data from the mmrm package, using LOCF and then LOCF reversed

to impute responses.

data(fev_data, package = "mmrm")
data <- fev_data %>%
mutate(FEV1_CHG = FEV1 - FEV1_BL, USUBJID = as.character(USUBJID)) %>%
select(-FEV1) %>%
group_by(USUBJID) %>%
complete(
AVISIT,
fill = as.list(.[1L, c("ARMCD", "FEV1_BL", "RACE", "SEX", "WEIGHT")])
) %>%
ungroup() %>%
arrange(USUBJID, AVISIT) %>%
group_by(USUBJID) %>%
mutate(FEV1_CHG = na.locf(FEV1_CHG, na.rm = FALSE)) %>%
mutate(FEV1_CHG = na.locf(FEV1_CHG, na.rm = FALSE, fromLast = TRUE)) %>%
ungroup() %>%
filter(!is.na(FEV1_CHG)) %>%
mutate(FEV1_CHG = as.integer(FEV1_CHG > mean(FEV1_CHG))) # dichotomous data
summary_data <- data %>%
group_by(ARMCD, AVISIT) %>%
summarize(
source = "1_data",
mean = mean(FEV1_CHG),
lower = mean(FEV1_CHG) - qnorm(0.975) * sd(FEV1_CHG) / sqrt(n()),
upper = mean(FEV1_CHG) + qnorm(0.975) * sd(FEV1_CHG) / sqrt(n()),
.groups = "drop"
)

lm + emmeans

model <- glm(
FEV1_CHG ~ FEV1_BL + FEV1_BL:AVISIT + ARMCD + ARMCD:AVISIT + AVISIT +
RACE + SEX + WEIGHT,
data = data,
family = binomial(link = "logit")
)
summary_emmeans <- emmeans(
object = model,
specs = ~ARMCD:AVISIT,
wt.nuis = "proportional",
nuisance = c("USUBJID", "RACE", "SEX")
) %>%
as.data.frame() %>%
as_tibble() %>%
select(ARMCD, AVISIT, emmean) %>%
rename(emmeans = emmean)

custom marginal means from lm draws using a custom mapping

from model parameters to marginal means.

proportional_factors <- model.matrix(
object = FEV1_CHG ~ 0 + SEX + RACE,
data = data
) %>%
colMeans() %>%
t()
grid <- data %>%
mutate(FEV1_BL = mean(FEV1_BL), FEV1_CHG = 0, WEIGHT = mean(WEIGHT)) %>%
distinct(ARMCD, AVISIT, FEV1_BL, WEIGHT, FEV1_CHG)
parameters <- model %>%
coef() %>%
t() %>%
as_tibble()
mapping <- model.matrix(
object = FEV1_CHG ~ FEV1_BL + FEV1_BL:AVISIT + ARMCD + ARMCD:AVISIT +
AVISIT + WEIGHT,
data = grid
) %>%
bind_cols(proportional_factors)
stopifnot(all(colnames(parameters) %in% colnames(mapping)))
mapping <- as.matrix(mapping)[, colnames(parameters)]
rownames(mapping) <- paste(grid$ARMCD, grid$AVISIT, sep = "|")
custom <- as.matrix(parameters) %*% t(mapping) %>%
as.data.frame() %>%
as_tibble()
summary_custom <- custom %>%
pivot_longer(everything()) %>%
separate("name", c("ARMCD", "AVISIT")) %>%
group_by(ARMCD, AVISIT) %>%
summarize(custom = mean(value), .groups = "drop")

Compare results

summary <- summary_custom %>%
left_join(y = summary_emmeans, by = c("ARMCD", "AVISIT")) %>%
mutate(difference = custom - emmeans)
print(summary)
#> # A tibble: 8 × 5
#> ARMCD AVISIT custom emmeans difference
#>
#> 1 PBO VIS1 -1.40 -1.40 0
#> 2 PBO VIS2 -1.43 -1.43 0
#> 3 PBO VIS3 -0.0198 -0.0198 -1.39e-17
#> 4 PBO VIS4 0.971 0.971 2.22e-16
#> 5 TRT VIS1 -0.568 -0.568 0
#> 6 TRT VIS2 0.131 0.131 1.11e-16
#> 7 TRT VIS3 1.08 1.08 0
#> 8 TRT VIS4 1.73 1.73 2.22e-16


<sup>Created on 2024-04-15 with [reprex v2.1.0](https://reprex.tidyverse.org)</sup>
</details>

@chstock
Copy link
Collaborator

chstock commented Apr 16, 2024

We may want to focus on "unconditional treatment effects" in the sense of the recently updated FDA guideline on Adjusting for Covariates in Randomized Clinical Trials (see section III C. on "Nonlinear Models").

@wlandau
Copy link
Collaborator Author

wlandau commented Apr 16, 2024

If I understand correctly, I think we are mostly using unconditional treatment effects, i.e. regress on RACE as a covariate but report non-RACE-specific marginal means and treatment effects. We only calculate conditional treatment effects when formally declaring a factor as a subgroup. Seems like this would carry over to from the continuous outcome case to the dichotomous outcome scenario just fine. Please let me know if I am missing something.

@wlandau
Copy link
Collaborator Author

wlandau commented Apr 22, 2024

I wonder, if we allow a non-identity link and dichotomous data, would we have to change the package name to something like brms.glmm?

@wlandau
Copy link
Collaborator Author

wlandau commented Apr 29, 2024

And would it make sense anymore to regress on baseline?

@wlandau
Copy link
Collaborator Author

wlandau commented May 20, 2024

If we use a link function for dichotomous data, we need a distributional family for dichotomous data, such as bernoulli. If we use a different family, the parameters change, and there could be any number of distributional parameters instead of the current sigma. This not only affects the "effect size" calculation in brm_marginal_draws() (we would just need to exclude it from the output), but also the way any non-sigma distributional parameters are specified in brm_formula(). And of course the Stan code is very different because the model is completely different. From make_stancode(), I see brms even models the unscaled residuals as actual parameters for the Bernoulli and binomial cases. Supporting any one family is a lot to ask, and supporting every family in brms is not feasible.

I am favor of excluding non-Gaussian families from the package, or at least waiting until we know exactly which family or families to bring within scope.

Transferring this issue thread to a discussion thread.

@wlandau wlandau removed the order: 4 label May 20, 2024
@openpharma openpharma locked and limited conversation to collaborators May 20, 2024
@wlandau wlandau converted this issue into discussion #106 May 20, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants