Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative modeling workflows for specific parameterizations and informative prior cases #96

Closed
wlandau opened this issue Apr 22, 2024 · 9 comments · Fixed by #100
Closed
Labels

Comments

@wlandau
Copy link
Collaborator

wlandau commented Apr 22, 2024

Following up on #93, we could have different end-to-end workflows for specific informative prior use cases, such as successive differences (e.g. #92). We could begin with brm_data(), have specific functions like brm_data_sdif() to create case-specific classed datasets, and then make functions like brm_formula() and brm_transform_marginal() could become S3 generics with case-specific methods. We could also have a new S3 generic for informative prior specification which would only apply to those specific cases.

@wlandau
Copy link
Collaborator Author

wlandau commented Apr 22, 2024

Would be good to have cell-means-like sdif, ordinary cell means, and the treatment effect case.

@wlandau wlandau changed the title Alternative modeling workflows for specific informative prior cases Alternative modeling workflows for specific parameterizations and informative prior cases Apr 22, 2024
@wlandau wlandau added order: 1 and removed order: 2 labels Apr 23, 2024
@wlandau
Copy link
Collaborator Author

wlandau commented Apr 23, 2024

I think the term "informative prior scenario" fits here, with functions like brm_scenario_successive() and brm_scenario_cells() which do the reparameterization and reference level adjustment, then return special classed data frames with the transformed data.

@wlandau
Copy link
Collaborator Author

wlandau commented Apr 26, 2024

In branch 96, I implemented a new brm_scenario_successive_cells() function to create the cell-means-like successive differences scenario for both subgroup and non-subgroup cases. Say we have this simulated dataset:

set.seed(0L)
data <- brm_simulate_outline(
  n_group = 2,
  n_patient = 100,
  n_time = 4,
  rate_dropout = 0,
  rate_lapse = 0
) |>
  dplyr::mutate(response = rnorm(n = dplyr::n())) |>
  brm_data_change() |>
  brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
  brm_simulate_categorical(
    names = c("status1", "status2"),
    levels = c("present", "absent")
  ) |>
  dplyr::mutate(response = rnorm(n = dplyr::n()))
dplyr::select(
  data,
  group,
  time,
  patient,
  starts_with("biomarker"),
  starts_with("status")
)
#> # A tibble: 600 × 7
#>    group   time   patient     biomarker1 biomarker2 status1 status2
#>    <chr>   <chr>  <chr>            <dbl>      <dbl> <chr>   <chr>  
#>  1 group_1 time_2 patient_001     -1.42      -0.287 absent  present
#>  2 group_1 time_3 patient_001     -1.42      -0.287 absent  present
#>  3 group_1 time_4 patient_001     -1.42      -0.287 absent  present
#>  4 group_1 time_2 patient_002     -1.67       1.84  absent  present
#>  5 group_1 time_3 patient_002     -1.67       1.84  absent  present
#>  6 group_1 time_4 patient_002     -1.67       1.84  absent  present
#>  7 group_1 time_2 patient_003      1.38      -0.157 absent  absent 
#>  8 group_1 time_3 patient_003      1.38      -0.157 absent  absent 
#>  9 group_1 time_4 patient_003      1.38      -0.157 absent  absent 
#> 10 group_1 time_2 patient_004     -0.920     -1.39  present present

brm_scenario_successive_cells(data) creates a new classed dataset with extra columns.

scenario <- brm_scenario_successive_cells(data)
scenario
#> # A tibble: 600 × 23
#>    x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2 x_group_2_time_3
#>  *            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
#>  1                1                0                0                0                0
#>  2                1                1                0                0                0
#>  3                1                1                1                0                0
#>  4                1                0                0                0                0
#>  5                1                1                0                0                0
#>  6                1                1                1                0                0
#>  7                1                0                0                0                0
#>  8                1                1                0                0                0
#>  9                1                1                1                0                0
#> 10                1                0                0                0                0
#> # ℹ 590 more rows
#> # ℹ 18 more variables: x_group_2_time_4 <dbl>, nuisance_biomarker1 <dbl>, nuisance_biomarker2 <dbl>,
#> #   nuisance_baseline <dbl>, nuisance_status1absent <dbl>, nuisance_status1present <dbl>,
#> #   nuisance_status2present <dbl>, patient <chr>, time <chr>, change <dbl>, missing <lgl>,
#> #   baseline <dbl>, group <chr>, biomarker1 <dbl>, biomarker2 <dbl>, status1 <chr>, status2 <chr>,
#> #   response <dbl>
#> # ℹ Use `print(n = ...)` to see more rows

All original columns from the dataset are in there, and there are a bunch of new columns that will be modeled later. There are binary columns to encode successive differences on the factors of interest (group and time, and subgroup if it exists):

scenario[, attr(scenario, "brm_scenario_interest")]
#> # A tibble: 600 × 6
#>    x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2 x_group_2_time_3 x_group_2_time_4
#>               <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
#>  1                1                0                0                0                0                0
#>  2                1                1                0                0                0                0
#>  3                1                1                1                0                0                0
#>  4                1                0                0                0                0                0
#>  5                1                1                0                0                0                0
#>  6                1                1                1                0                0                0
#>  7                1                0                0                0                0                0
#>  8                1                1                0                0                0                0
#>  9                1                1                1                0                0                0
#> 10                1                0                0                0                0                0
#> # ℹ 590 more rows
#> # ℹ Use `print(n = ...)` to see more rows

The nuisance variables are one-hot-encoded and centered at their means so that the factors of interest are not implicitly conditional on a subset of the data:

scenario[, attr(scenario, "brm_scenario_nuisance")]
#> # A tibble: 600 × 6
#>    nuisance_biomarker1 nuisance_biomarker2 nuisance_baseline nuisance_status1absent nuisance_status1present nuisance_status2present
#>                  <dbl>               <dbl>             <dbl>                  <dbl>                   <dbl>                   <dbl>
#>  1              -1.37               -0.231            1.30                     0.52                   -0.52                   0.585
#>  2              -1.37               -0.231            1.30                     0.52                   -0.52                   0.585
#>  3              -1.37               -0.231            1.30                     0.52                   -0.52                   0.585
#>  4              -1.61                1.90             0.450                    0.52                   -0.52                   0.585
#>  5              -1.61                1.90             0.450                    0.52                   -0.52                   0.585
#>  6              -1.61                1.90             0.450                    0.52                   -0.52                   0.585
#>  7               1.43               -0.101            0.0297                   0.52                   -0.52                  -0.415
#>  8               1.43               -0.101            0.0297                   0.52                   -0.52                  -0.415
#>  9               1.43               -0.101            0.0297                   0.52                   -0.52                  -0.415
#> 10              -0.865              -1.33            -1.11                    -0.48                    0.48                   0.585
#> # ℹ 590 more rows
#> # ℹ Use `print(n = ...)` to see more rows

Lastly, there is a "parameterization" object which maps the generated factors of interest with the actual levels in the data. This will be important for assigning informative priors without having to guess how brms and model.matrix() generate the names of columns and fixed effects.

attr(scenario, "brm_scenario_parameterization")
#> # A tibble: 6 × 3
#>   group   time   variable        
#>   <chr>   <chr>  <chr>           
#> 1 group_1 time_2 x_group_1_time_2
#> 2 group_1 time_3 x_group_1_time_3
#> 3 group_1 time_4 x_group_1_time_4
#> 4 group_2 time_2 x_group_2_time_2
#> 5 group_2 time_3 x_group_2_time_3
#> 6 group_2 time_4 x_group_2_time_4

@wlandau
Copy link
Collaborator Author

wlandau commented Apr 26, 2024

Going forward, I plan to make functions brm_formula() and brm_transform_marginal() S3 generics with scenario-specific methods. Then, I plan to add a brm_prior() function to specify informative priors for scenarios.

@wlandau
Copy link
Collaborator Author

wlandau commented Apr 30, 2024

I think "archetype" is a better name than "scenario" or "parameterization" for these specialized cases like sdif.

@wlandau
Copy link
Collaborator Author

wlandau commented Apr 30, 2024

Changed to "archetype" in https://github.com/openpharma/brms.mmrm/tree/96

@wlandau
Copy link
Collaborator Author

wlandau commented May 2, 2024

Here is a preview of the workflow for informative prior archetypes. I think it is coming together nicely.

library(brms.mmrm)

# Simulate the data.
set.seed(0L)
data <- brm_simulate_outline(
  n_group = 2,
  n_patient = 100,
  n_time = 4,
  rate_dropout = 0,
  rate_lapse = 0
) |>
  dplyr::mutate(response = rnorm(n = dplyr::n())) |>
  brm_data_change() |>
  brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
  brm_simulate_categorical(
    names = c("status1", "status2"),
    levels = c("present", "absent")
  )
dplyr::select(
  data,
  group,
  time,
  patient,
  starts_with("biomarker"),
  starts_with("status")
)
#> # A tibble: 600 × 7
#>    group   time   patient     biomarker1 biomarker2 status1 status2
#>    <chr>   <chr>  <chr>            <dbl>      <dbl> <chr>   <chr>  
#>  1 group_1 time_2 patient_001     -1.42      -0.287 absent  present
#>  2 group_1 time_3 patient_001     -1.42      -0.287 absent  present
#>  3 group_1 time_4 patient_001     -1.42      -0.287 absent  present
#>  4 group_1 time_2 patient_002     -1.67       1.84  absent  present
#>  5 group_1 time_3 patient_002     -1.67       1.84  absent  present
#>  6 group_1 time_4 patient_002     -1.67       1.84  absent  present
#>  7 group_1 time_2 patient_003      1.38      -0.157 absent  absent 
#>  8 group_1 time_3 patient_003      1.38      -0.157 absent  absent 
#>  9 group_1 time_4 patient_003      1.38      -0.157 absent  absent 
#> 10 group_1 time_2 patient_004     -0.920     -1.39  present present
#> # ℹ 590 more rows

# Create a successive differences archetype.
archetype <- brm_archetype_successive_cells(data)
archetype
#> # A tibble: 600 × 23
#>    x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2
#>  *            <dbl>            <dbl>            <dbl>            <dbl>
#>  1                1                0                0                0
#>  2                1                1                0                0
#>  3                1                1                1                0
#>  4                1                0                0                0
#>  5                1                1                0                0
#>  6                1                1                1                0
#>  7                1                0                0                0
#>  8                1                1                0                0
#>  9                1                1                1                0
#> 10                1                0                0                0
#> # ℹ 590 more rows
#> # ℹ 19 more variables: x_group_2_time_3 <dbl>, x_group_2_time_4 <dbl>,
#> #   nuisance_biomarker1 <dbl>, nuisance_biomarker2 <dbl>,
#> #   nuisance_status1_absent <dbl>, nuisance_status2_present <dbl>,
#> #   nuisance_baseline <dbl>, nuisance_baseline.timetime_2 <dbl>,
#> #   nuisance_baseline.timetime_3 <dbl>, patient <chr>, time <chr>,
#> #   change <dbl>, missing <lgl>, baseline <dbl>, group <chr>, …
formula <- brm_formula(archetype)
formula
#> change ~ 0 + x_group_1_time_2 + x_group_1_time_3 + x_group_1_time_4 + x_group_2_time_2 + x_group_2_time_3 + x_group_2_time_4 + nuisance_biomarker1 + nuisance_biomarker2 + nuisance_status1_absent + nuisance_status2_present + nuisance_baseline + nuisance_baseline.timetime_2 + nuisance_baseline.timetime_3 + unstr(time = time, gr = patient) 
#> sigma ~ 0 + time

# Put priors on intercepts and successive differences.
label <- brm_prior_label(
  code = "normal(1, 1)",
  group = "group_1",
  time = "time_2"
) |>
  brm_prior_label("normal(1, 2)", group = "group_1", time = "time_3") |>
  brm_prior_label("normal(1, 3)", group = "group_1", time = "time_4") |>
  brm_prior_label("normal(2, 1)", group = "group_2", time = "time_2") |>
  brm_prior_label("normal(2, 2)", group = "group_2", time = "time_3") |>
  brm_prior_label("normal(2, 3)", group = "group_2", time = "time_4")
label
#> # A tibble: 6 × 3
#>   code         group   time  
#>   <chr>        <chr>   <chr> 
#> 1 normal(1, 1) group_1 time_2
#> 2 normal(1, 2) group_1 time_3
#> 3 normal(1, 3) group_1 time_4
#> 4 normal(2, 1) group_2 time_2
#> 5 normal(2, 2) group_2 time_3
#> 6 normal(2, 3) group_2 time_4
prior <- brm_prior_archetype(archetype, label)
prior
#>         prior class             coef group resp dpar nlpar   lb   ub source
#>  normal(1, 1)     b x_group_1_time_2                       <NA> <NA>   user
#>  normal(1, 2)     b x_group_1_time_3                       <NA> <NA>   user
#>  normal(1, 3)     b x_group_1_time_4                       <NA> <NA>   user
#>  normal(2, 1)     b x_group_2_time_2                       <NA> <NA>   user
#>  normal(2, 2)     b x_group_2_time_3                       <NA> <NA>   user
#>  normal(2, 3)     b x_group_2_time_4                       <NA> <NA>   user
class(prior)
#> [1] "brmsprior"  "data.frame"

# Run the model on the archetype with the prior.
model <- brm_model(
  data = archetype,
  formula = formula,
  prior = prior,
  refresh = 0
)

# Our custom informative prior was actually used for successive differences.
brms::prior_summary(model)
#>                 prior    class                         coef group resp  dpar
#>                (flat)        b                                              
#>                (flat)        b            nuisance_baseline                 
#>                (flat)        b nuisance_baseline.timetime_2                 
#>                (flat)        b nuisance_baseline.timetime_3                 
#>                (flat)        b          nuisance_biomarker1                 
#>                (flat)        b          nuisance_biomarker2                 
#>                (flat)        b      nuisance_status1_absent                 
#>                (flat)        b     nuisance_status2_present                 
#>          normal(1, 1)        b             x_group_1_time_2                 
#>          normal(1, 2)        b             x_group_1_time_3                 
#>          normal(1, 3)        b             x_group_1_time_4                 
#>          normal(2, 1)        b             x_group_2_time_2                 
#>          normal(2, 2)        b             x_group_2_time_3                 
#>          normal(2, 3)        b             x_group_2_time_4                 
#>                (flat)        b                                         sigma
#>                (flat)        b                   timetime_2            sigma
#>                (flat)        b                   timetime_3            sigma
#>                (flat)        b                   timetime_4            sigma
#>  lkj_corr_cholesky(1) Lcortime                                              
#>  nlpar lb ub       source
#>                   default
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>                      user
#>                      user
#>                      user
#>                      user
#>                      user
#>                      user
#>                   default
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>                   default

# Plot the marginal means against the data.
draws <- brm_marginal_draws(
  data = archetype,
  formula = formula,
  model = model
)
summaries_model <- brm_marginal_summaries(draws)
summaries_data <- brm_marginal_data(data)
brm_plot_compare(model = summaries_model, data = summaries_data)

@wlandau
Copy link
Collaborator Author

wlandau commented May 2, 2024

I think I have a reasonable print method. The cool thing about it is that it uses the marginal mean transformation matrix, which means we don't need a separate method for each individual scenario.

# This tibble is an informative prior archetype in brms.mmrm.
# The fixed effect parameters of interest express the marginal means
# as follows (on the link scale):
# 
#    group_1:time_2 = x_group_1_time_2
#    group_1:time_3 = x_group_1_time_2 + x_group_1_time_3
#    group_1:time_4 = x_group_1_time_2 + x_group_1_time_3 + x_group_1_time_4
#    group_2:time_2 = x_group_2_time_2
#    group_2:time_3 = x_group_2_time_2 + x_group_2_time_3
#    group_2:time_4 = x_group_2_time_2 + x_group_2_time_3 + x_group_2_time_4
# 
# The contents of the tibble are as follows:
# 
# A tibble: 600 × 23
   x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2 x_group_2_time_3
 *            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
 1                1                0                0                0                0
 2                1                1                0                0                0
 3                1                1                1                0                0
 4                1                0                0                0                0
 5                1                1                0                0                0
 6                1                1                1                0                0
 7                1                0                0                0                0
 8                1                1                0                0                0
 9                1                1                1                0                0
10                1                0                0                0                0
# ℹ 590 more rows
# ℹ 18 more variables: x_group_2_time_4 <dbl>, nuisance_biomarker1 <dbl>,
#   nuisance_biomarker2 <dbl>, nuisance_status1_absent <dbl>,
#   nuisance_status2_present <dbl>, nuisance_baseline <dbl>,
#   nuisance_baseline.timetime_2 <dbl>, nuisance_baseline.timetime_3 <dbl>, patient <chr>,
#   time <chr>, change <dbl>, missing <lgl>, baseline <dbl>, group <chr>, biomarker1 <dbl>,
#   biomarker2 <dbl>, status1 <chr>, status2 <chr>
# ℹ Use `print(n = ...)` to see more rows

@wlandau
Copy link
Collaborator Author

wlandau commented May 2, 2024

hmmm maybe it's better as a summary method after all, it sure takes up a lot of space for large datasets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant