Alternative modeling workflows for specific parameterizations and informative prior cases #96

wlandau · 2024-04-22T19:04:29Z

Following up on #93, we could have different end-to-end workflows for specific informative prior use cases, such as successive differences (e.g. #92). We could begin with brm_data(), have specific functions like brm_data_sdif() to create case-specific classed datasets, and then make functions like brm_formula() and brm_transform_marginal() could become S3 generics with case-specific methods. We could also have a new S3 generic for informative prior specification which would only apply to those specific cases.

The text was updated successfully, but these errors were encountered:

wlandau · 2024-04-22T19:05:42Z

Would be good to have cell-means-like sdif, ordinary cell means, and the treatment effect case.

wlandau · 2024-04-23T15:40:44Z

I think the term "informative prior scenario" fits here, with functions like brm_scenario_successive() and brm_scenario_cells() which do the reparameterization and reference level adjustment, then return special classed data frames with the transformed data.

wlandau · 2024-04-26T19:39:25Z

In branch 96, I implemented a new brm_scenario_successive_cells() function to create the cell-means-like successive differences scenario for both subgroup and non-subgroup cases. Say we have this simulated dataset:

set.seed(0L)
data <- brm_simulate_outline(
  n_group = 2,
  n_patient = 100,
  n_time = 4,
  rate_dropout = 0,
  rate_lapse = 0
) |>
  dplyr::mutate(response = rnorm(n = dplyr::n())) |>
  brm_data_change() |>
  brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
  brm_simulate_categorical(
    names = c("status1", "status2"),
    levels = c("present", "absent")
  ) |>
  dplyr::mutate(response = rnorm(n = dplyr::n()))
dplyr::select(
  data,
  group,
  time,
  patient,
  starts_with("biomarker"),
  starts_with("status")
)
#> # A tibble: 600 × 7
#>    group   time   patient     biomarker1 biomarker2 status1 status2
#>    <chr>   <chr>  <chr>            <dbl>      <dbl> <chr>   <chr>  
#>  1 group_1 time_2 patient_001     -1.42      -0.287 absent  present
#>  2 group_1 time_3 patient_001     -1.42      -0.287 absent  present
#>  3 group_1 time_4 patient_001     -1.42      -0.287 absent  present
#>  4 group_1 time_2 patient_002     -1.67       1.84  absent  present
#>  5 group_1 time_3 patient_002     -1.67       1.84  absent  present
#>  6 group_1 time_4 patient_002     -1.67       1.84  absent  present
#>  7 group_1 time_2 patient_003      1.38      -0.157 absent  absent 
#>  8 group_1 time_3 patient_003      1.38      -0.157 absent  absent 
#>  9 group_1 time_4 patient_003      1.38      -0.157 absent  absent 
#> 10 group_1 time_2 patient_004     -0.920     -1.39  present present

brm_scenario_successive_cells(data) creates a new classed dataset with extra columns.

scenario <- brm_scenario_successive_cells(data)
scenario
#> # A tibble: 600 × 23
#>    x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2 x_group_2_time_3
#>  *            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
#>  1                1                0                0                0                0
#>  2                1                1                0                0                0
#>  3                1                1                1                0                0
#>  4                1                0                0                0                0
#>  5                1                1                0                0                0
#>  6                1                1                1                0                0
#>  7                1                0                0                0                0
#>  8                1                1                0                0                0
#>  9                1                1                1                0                0
#> 10                1                0                0                0                0
#> # ℹ 590 more rows
#> # ℹ 18 more variables: x_group_2_time_4 <dbl>, nuisance_biomarker1 <dbl>, nuisance_biomarker2 <dbl>,
#> #   nuisance_baseline <dbl>, nuisance_status1absent <dbl>, nuisance_status1present <dbl>,
#> #   nuisance_status2present <dbl>, patient <chr>, time <chr>, change <dbl>, missing <lgl>,
#> #   baseline <dbl>, group <chr>, biomarker1 <dbl>, biomarker2 <dbl>, status1 <chr>, status2 <chr>,
#> #   response <dbl>
#> # ℹ Use `print(n = ...)` to see more rows

All original columns from the dataset are in there, and there are a bunch of new columns that will be modeled later. There are binary columns to encode successive differences on the factors of interest (group and time, and subgroup if it exists):

scenario[, attr(scenario, "brm_scenario_interest")]
#> # A tibble: 600 × 6
#>    x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2 x_group_2_time_3 x_group_2_time_4
#>               <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
#>  1                1                0                0                0                0                0
#>  2                1                1                0                0                0                0
#>  3                1                1                1                0                0                0
#>  4                1                0                0                0                0                0
#>  5                1                1                0                0                0                0
#>  6                1                1                1                0                0                0
#>  7                1                0                0                0                0                0
#>  8                1                1                0                0                0                0
#>  9                1                1                1                0                0                0
#> 10                1                0                0                0                0                0
#> # ℹ 590 more rows
#> # ℹ Use `print(n = ...)` to see more rows

The nuisance variables are one-hot-encoded and centered at their means so that the factors of interest are not implicitly conditional on a subset of the data:

scenario[, attr(scenario, "brm_scenario_nuisance")]
#> # A tibble: 600 × 6
#>    nuisance_biomarker1 nuisance_biomarker2 nuisance_baseline nuisance_status1absent nuisance_status1present nuisance_status2present
#>                  <dbl>               <dbl>             <dbl>                  <dbl>                   <dbl>                   <dbl>
#>  1              -1.37               -0.231            1.30                     0.52                   -0.52                   0.585
#>  2              -1.37               -0.231            1.30                     0.52                   -0.52                   0.585
#>  3              -1.37               -0.231            1.30                     0.52                   -0.52                   0.585
#>  4              -1.61                1.90             0.450                    0.52                   -0.52                   0.585
#>  5              -1.61                1.90             0.450                    0.52                   -0.52                   0.585
#>  6              -1.61                1.90             0.450                    0.52                   -0.52                   0.585
#>  7               1.43               -0.101            0.0297                   0.52                   -0.52                  -0.415
#>  8               1.43               -0.101            0.0297                   0.52                   -0.52                  -0.415
#>  9               1.43               -0.101            0.0297                   0.52                   -0.52                  -0.415
#> 10              -0.865              -1.33            -1.11                    -0.48                    0.48                   0.585
#> # ℹ 590 more rows
#> # ℹ Use `print(n = ...)` to see more rows

Lastly, there is a "parameterization" object which maps the generated factors of interest with the actual levels in the data. This will be important for assigning informative priors without having to guess how brms and model.matrix() generate the names of columns and fixed effects.

attr(scenario, "brm_scenario_parameterization")
#> # A tibble: 6 × 3
#>   group   time   variable        
#>   <chr>   <chr>  <chr>           
#> 1 group_1 time_2 x_group_1_time_2
#> 2 group_1 time_3 x_group_1_time_3
#> 3 group_1 time_4 x_group_1_time_4
#> 4 group_2 time_2 x_group_2_time_2
#> 5 group_2 time_3 x_group_2_time_3
#> 6 group_2 time_4 x_group_2_time_4

wlandau · 2024-04-26T19:40:34Z

Going forward, I plan to make functions brm_formula() and brm_transform_marginal() S3 generics with scenario-specific methods. Then, I plan to add a brm_prior() function to specify informative priors for scenarios.

wlandau · 2024-04-30T17:22:12Z

I think "archetype" is a better name than "scenario" or "parameterization" for these specialized cases like sdif.

wlandau · 2024-04-30T19:19:09Z

Changed to "archetype" in https://github.com/openpharma/brms.mmrm/tree/96

wlandau · 2024-05-02T17:43:39Z

Here is a preview of the workflow for informative prior archetypes. I think it is coming together nicely.

library(brms.mmrm)

# Simulate the data.
set.seed(0L)
data <- brm_simulate_outline(
  n_group = 2,
  n_patient = 100,
  n_time = 4,
  rate_dropout = 0,
  rate_lapse = 0
) |>
  dplyr::mutate(response = rnorm(n = dplyr::n())) |>
  brm_data_change() |>
  brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
  brm_simulate_categorical(
    names = c("status1", "status2"),
    levels = c("present", "absent")
  )
dplyr::select(
  data,
  group,
  time,
  patient,
  starts_with("biomarker"),
  starts_with("status")
)
#> # A tibble: 600 × 7
#>    group   time   patient     biomarker1 biomarker2 status1 status2
#>    <chr>   <chr>  <chr>            <dbl>      <dbl> <chr>   <chr>  
#>  1 group_1 time_2 patient_001     -1.42      -0.287 absent  present
#>  2 group_1 time_3 patient_001     -1.42      -0.287 absent  present
#>  3 group_1 time_4 patient_001     -1.42      -0.287 absent  present
#>  4 group_1 time_2 patient_002     -1.67       1.84  absent  present
#>  5 group_1 time_3 patient_002     -1.67       1.84  absent  present
#>  6 group_1 time_4 patient_002     -1.67       1.84  absent  present
#>  7 group_1 time_2 patient_003      1.38      -0.157 absent  absent 
#>  8 group_1 time_3 patient_003      1.38      -0.157 absent  absent 
#>  9 group_1 time_4 patient_003      1.38      -0.157 absent  absent 
#> 10 group_1 time_2 patient_004     -0.920     -1.39  present present
#> # ℹ 590 more rows

# Create a successive differences archetype.
archetype <- brm_archetype_successive_cells(data)
archetype
#> # A tibble: 600 × 23
#>    x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2
#>  *            <dbl>            <dbl>            <dbl>            <dbl>
#>  1                1                0                0                0
#>  2                1                1                0                0
#>  3                1                1                1                0
#>  4                1                0                0                0
#>  5                1                1                0                0
#>  6                1                1                1                0
#>  7                1                0                0                0
#>  8                1                1                0                0
#>  9                1                1                1                0
#> 10                1                0                0                0
#> # ℹ 590 more rows
#> # ℹ 19 more variables: x_group_2_time_3 <dbl>, x_group_2_time_4 <dbl>,
#> #   nuisance_biomarker1 <dbl>, nuisance_biomarker2 <dbl>,
#> #   nuisance_status1_absent <dbl>, nuisance_status2_present <dbl>,
#> #   nuisance_baseline <dbl>, nuisance_baseline.timetime_2 <dbl>,
#> #   nuisance_baseline.timetime_3 <dbl>, patient <chr>, time <chr>,
#> #   change <dbl>, missing <lgl>, baseline <dbl>, group <chr>, …
formula <- brm_formula(archetype)
formula
#> change ~ 0 + x_group_1_time_2 + x_group_1_time_3 + x_group_1_time_4 + x_group_2_time_2 + x_group_2_time_3 + x_group_2_time_4 + nuisance_biomarker1 + nuisance_biomarker2 + nuisance_status1_absent + nuisance_status2_present + nuisance_baseline + nuisance_baseline.timetime_2 + nuisance_baseline.timetime_3 + unstr(time = time, gr = patient) 
#> sigma ~ 0 + time

# Put priors on intercepts and successive differences.
label <- brm_prior_label(
  code = "normal(1, 1)",
  group = "group_1",
  time = "time_2"
) |>
  brm_prior_label("normal(1, 2)", group = "group_1", time = "time_3") |>
  brm_prior_label("normal(1, 3)", group = "group_1", time = "time_4") |>
  brm_prior_label("normal(2, 1)", group = "group_2", time = "time_2") |>
  brm_prior_label("normal(2, 2)", group = "group_2", time = "time_3") |>
  brm_prior_label("normal(2, 3)", group = "group_2", time = "time_4")
label
#> # A tibble: 6 × 3
#>   code         group   time  
#>   <chr>        <chr>   <chr> 
#> 1 normal(1, 1) group_1 time_2
#> 2 normal(1, 2) group_1 time_3
#> 3 normal(1, 3) group_1 time_4
#> 4 normal(2, 1) group_2 time_2
#> 5 normal(2, 2) group_2 time_3
#> 6 normal(2, 3) group_2 time_4
prior <- brm_prior_archetype(archetype, label)
prior
#>         prior class             coef group resp dpar nlpar   lb   ub source
#>  normal(1, 1)     b x_group_1_time_2                       <NA> <NA>   user
#>  normal(1, 2)     b x_group_1_time_3                       <NA> <NA>   user
#>  normal(1, 3)     b x_group_1_time_4                       <NA> <NA>   user
#>  normal(2, 1)     b x_group_2_time_2                       <NA> <NA>   user
#>  normal(2, 2)     b x_group_2_time_3                       <NA> <NA>   user
#>  normal(2, 3)     b x_group_2_time_4                       <NA> <NA>   user
class(prior)
#> [1] "brmsprior"  "data.frame"

# Run the model on the archetype with the prior.
model <- brm_model(
  data = archetype,
  formula = formula,
  prior = prior,
  refresh = 0
)

# Our custom informative prior was actually used for successive differences.
brms::prior_summary(model)
#>                 prior    class                         coef group resp  dpar
#>                (flat)        b                                              
#>                (flat)        b            nuisance_baseline                 
#>                (flat)        b nuisance_baseline.timetime_2                 
#>                (flat)        b nuisance_baseline.timetime_3                 
#>                (flat)        b          nuisance_biomarker1                 
#>                (flat)        b          nuisance_biomarker2                 
#>                (flat)        b      nuisance_status1_absent                 
#>                (flat)        b     nuisance_status2_present                 
#>          normal(1, 1)        b             x_group_1_time_2                 
#>          normal(1, 2)        b             x_group_1_time_3                 
#>          normal(1, 3)        b             x_group_1_time_4                 
#>          normal(2, 1)        b             x_group_2_time_2                 
#>          normal(2, 2)        b             x_group_2_time_3                 
#>          normal(2, 3)        b             x_group_2_time_4                 
#>                (flat)        b                                         sigma
#>                (flat)        b                   timetime_2            sigma
#>                (flat)        b                   timetime_3            sigma
#>                (flat)        b                   timetime_4            sigma
#>  lkj_corr_cholesky(1) Lcortime                                              
#>  nlpar lb ub       source
#>                   default
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>                      user
#>                      user
#>                      user
#>                      user
#>                      user
#>                      user
#>                   default
#>              (vectorized)
#>              (vectorized)
#>              (vectorized)
#>                   default

# Plot the marginal means against the data.
draws <- brm_marginal_draws(
  data = archetype,
  formula = formula,
  model = model
)
summaries_model <- brm_marginal_summaries(draws)
summaries_data <- brm_marginal_data(data)
brm_plot_compare(model = summaries_model, data = summaries_data)

wlandau · 2024-05-02T18:26:32Z

I think I have a reasonable print method. The cool thing about it is that it uses the marginal mean transformation matrix, which means we don't need a separate method for each individual scenario.

# This tibble is an informative prior archetype in brms.mmrm.
# The fixed effect parameters of interest express the marginal means
# as follows (on the link scale):
# 
#    group_1:time_2 = x_group_1_time_2
#    group_1:time_3 = x_group_1_time_2 + x_group_1_time_3
#    group_1:time_4 = x_group_1_time_2 + x_group_1_time_3 + x_group_1_time_4
#    group_2:time_2 = x_group_2_time_2
#    group_2:time_3 = x_group_2_time_2 + x_group_2_time_3
#    group_2:time_4 = x_group_2_time_2 + x_group_2_time_3 + x_group_2_time_4
# 
# The contents of the tibble are as follows:
# 
# A tibble: 600 × 23
   x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2 x_group_2_time_3
 *            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
 1                1                0                0                0                0
 2                1                1                0                0                0
 3                1                1                1                0                0
 4                1                0                0                0                0
 5                1                1                0                0                0
 6                1                1                1                0                0
 7                1                0                0                0                0
 8                1                1                0                0                0
 9                1                1                1                0                0
10                1                0                0                0                0
# ℹ 590 more rows
# ℹ 18 more variables: x_group_2_time_4 <dbl>, nuisance_biomarker1 <dbl>,
#   nuisance_biomarker2 <dbl>, nuisance_status1_absent <dbl>,
#   nuisance_status2_present <dbl>, nuisance_baseline <dbl>,
#   nuisance_baseline.timetime_2 <dbl>, nuisance_baseline.timetime_3 <dbl>, patient <chr>,
#   time <chr>, change <dbl>, missing <lgl>, baseline <dbl>, group <chr>, biomarker1 <dbl>,
#   biomarker2 <dbl>, status1 <chr>, status2 <chr>
# ℹ Use `print(n = ...)` to see more rows

wlandau · 2024-05-02T18:31:57Z

hmmm maybe it's better as a summary method after all, it sure takes up a lot of space for large datasets.

wlandau changed the title ~~Alternative modeling workflows for specific informative prior cases~~ Alternative modeling workflows for specific parameterizations and informative prior cases Apr 22, 2024

wlandau added the order: 2 label Apr 22, 2024

This was referenced Apr 22, 2024

Conditional means priors #40

Closed

A single unifying attribute in the formula to detect subgroup usage #98

Closed

wlandau added order: 1 and removed order: 2 labels Apr 23, 2024

wlandau mentioned this issue Apr 29, 2024

Ensure model matrices are full-rank #99

Closed

wlandau mentioned this issue May 2, 2024

Informative prior archetypes #100

Merged

wlandau closed this as completed in #100 May 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alternative modeling workflows for specific parameterizations and informative prior cases #96

Alternative modeling workflows for specific parameterizations and informative prior cases #96

wlandau commented Apr 22, 2024

wlandau commented Apr 22, 2024

wlandau commented Apr 23, 2024

wlandau commented Apr 26, 2024

wlandau commented Apr 26, 2024

wlandau commented Apr 30, 2024 •

edited

wlandau commented Apr 30, 2024

wlandau commented May 2, 2024 •

edited

wlandau commented May 2, 2024

wlandau commented May 2, 2024

Alternative modeling workflows for specific parameterizations and informative prior cases #96

Alternative modeling workflows for specific parameterizations and informative prior cases #96

Comments

wlandau commented Apr 22, 2024

wlandau commented Apr 22, 2024

wlandau commented Apr 23, 2024

wlandau commented Apr 26, 2024

wlandau commented Apr 26, 2024

wlandau commented Apr 30, 2024 • edited

wlandau commented Apr 30, 2024

wlandau commented May 2, 2024 • edited

wlandau commented May 2, 2024

wlandau commented May 2, 2024

wlandau commented Apr 30, 2024 •

edited

wlandau commented May 2, 2024 •

edited