-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternative modeling workflows for specific parameterizations and informative prior cases #96
Comments
Would be good to have cell-means-like sdif, ordinary cell means, and the treatment effect case. |
I think the term "informative prior scenario" fits here, with functions like |
In branch 96, I implemented a new set.seed(0L)
data <- brm_simulate_outline(
n_group = 2,
n_patient = 100,
n_time = 4,
rate_dropout = 0,
rate_lapse = 0
) |>
dplyr::mutate(response = rnorm(n = dplyr::n())) |>
brm_data_change() |>
brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
brm_simulate_categorical(
names = c("status1", "status2"),
levels = c("present", "absent")
) |>
dplyr::mutate(response = rnorm(n = dplyr::n()))
dplyr::select(
data,
group,
time,
patient,
starts_with("biomarker"),
starts_with("status")
)
#> # A tibble: 600 × 7
#> group time patient biomarker1 biomarker2 status1 status2
#> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 group_1 time_2 patient_001 -1.42 -0.287 absent present
#> 2 group_1 time_3 patient_001 -1.42 -0.287 absent present
#> 3 group_1 time_4 patient_001 -1.42 -0.287 absent present
#> 4 group_1 time_2 patient_002 -1.67 1.84 absent present
#> 5 group_1 time_3 patient_002 -1.67 1.84 absent present
#> 6 group_1 time_4 patient_002 -1.67 1.84 absent present
#> 7 group_1 time_2 patient_003 1.38 -0.157 absent absent
#> 8 group_1 time_3 patient_003 1.38 -0.157 absent absent
#> 9 group_1 time_4 patient_003 1.38 -0.157 absent absent
#> 10 group_1 time_2 patient_004 -0.920 -1.39 present present
scenario <- brm_scenario_successive_cells(data)
scenario
#> # A tibble: 600 × 23
#> x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2 x_group_2_time_3
#> * <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 0 0 0
#> 2 1 1 0 0 0
#> 3 1 1 1 0 0
#> 4 1 0 0 0 0
#> 5 1 1 0 0 0
#> 6 1 1 1 0 0
#> 7 1 0 0 0 0
#> 8 1 1 0 0 0
#> 9 1 1 1 0 0
#> 10 1 0 0 0 0
#> # ℹ 590 more rows
#> # ℹ 18 more variables: x_group_2_time_4 <dbl>, nuisance_biomarker1 <dbl>, nuisance_biomarker2 <dbl>,
#> # nuisance_baseline <dbl>, nuisance_status1absent <dbl>, nuisance_status1present <dbl>,
#> # nuisance_status2present <dbl>, patient <chr>, time <chr>, change <dbl>, missing <lgl>,
#> # baseline <dbl>, group <chr>, biomarker1 <dbl>, biomarker2 <dbl>, status1 <chr>, status2 <chr>,
#> # response <dbl>
#> # ℹ Use `print(n = ...)` to see more rows All original columns from the dataset are in there, and there are a bunch of new columns that will be modeled later. There are binary columns to encode successive differences on the factors of interest (group and time, and subgroup if it exists): scenario[, attr(scenario, "brm_scenario_interest")]
#> # A tibble: 600 × 6
#> x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2 x_group_2_time_3 x_group_2_time_4
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 0 0 0 0
#> 2 1 1 0 0 0 0
#> 3 1 1 1 0 0 0
#> 4 1 0 0 0 0 0
#> 5 1 1 0 0 0 0
#> 6 1 1 1 0 0 0
#> 7 1 0 0 0 0 0
#> 8 1 1 0 0 0 0
#> 9 1 1 1 0 0 0
#> 10 1 0 0 0 0 0
#> # ℹ 590 more rows
#> # ℹ Use `print(n = ...)` to see more rows The nuisance variables are one-hot-encoded and centered at their means so that the factors of interest are not implicitly conditional on a subset of the data: scenario[, attr(scenario, "brm_scenario_nuisance")]
#> # A tibble: 600 × 6
#> nuisance_biomarker1 nuisance_biomarker2 nuisance_baseline nuisance_status1absent nuisance_status1present nuisance_status2present
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -1.37 -0.231 1.30 0.52 -0.52 0.585
#> 2 -1.37 -0.231 1.30 0.52 -0.52 0.585
#> 3 -1.37 -0.231 1.30 0.52 -0.52 0.585
#> 4 -1.61 1.90 0.450 0.52 -0.52 0.585
#> 5 -1.61 1.90 0.450 0.52 -0.52 0.585
#> 6 -1.61 1.90 0.450 0.52 -0.52 0.585
#> 7 1.43 -0.101 0.0297 0.52 -0.52 -0.415
#> 8 1.43 -0.101 0.0297 0.52 -0.52 -0.415
#> 9 1.43 -0.101 0.0297 0.52 -0.52 -0.415
#> 10 -0.865 -1.33 -1.11 -0.48 0.48 0.585
#> # ℹ 590 more rows
#> # ℹ Use `print(n = ...)` to see more rows Lastly, there is a "parameterization" object which maps the generated factors of interest with the actual levels in the data. This will be important for assigning informative priors without having to guess how attr(scenario, "brm_scenario_parameterization")
#> # A tibble: 6 × 3
#> group time variable
#> <chr> <chr> <chr>
#> 1 group_1 time_2 x_group_1_time_2
#> 2 group_1 time_3 x_group_1_time_3
#> 3 group_1 time_4 x_group_1_time_4
#> 4 group_2 time_2 x_group_2_time_2
#> 5 group_2 time_3 x_group_2_time_3
#> 6 group_2 time_4 x_group_2_time_4 |
Going forward, I plan to make functions |
I think "archetype" is a better name than "scenario" or "parameterization" for these specialized cases like sdif. |
Changed to "archetype" in https://github.com/openpharma/brms.mmrm/tree/96 |
Here is a preview of the workflow for informative prior archetypes. I think it is coming together nicely. library(brms.mmrm)
# Simulate the data.
set.seed(0L)
data <- brm_simulate_outline(
n_group = 2,
n_patient = 100,
n_time = 4,
rate_dropout = 0,
rate_lapse = 0
) |>
dplyr::mutate(response = rnorm(n = dplyr::n())) |>
brm_data_change() |>
brm_simulate_continuous(names = c("biomarker1", "biomarker2")) |>
brm_simulate_categorical(
names = c("status1", "status2"),
levels = c("present", "absent")
)
dplyr::select(
data,
group,
time,
patient,
starts_with("biomarker"),
starts_with("status")
)
#> # A tibble: 600 × 7
#> group time patient biomarker1 biomarker2 status1 status2
#> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 group_1 time_2 patient_001 -1.42 -0.287 absent present
#> 2 group_1 time_3 patient_001 -1.42 -0.287 absent present
#> 3 group_1 time_4 patient_001 -1.42 -0.287 absent present
#> 4 group_1 time_2 patient_002 -1.67 1.84 absent present
#> 5 group_1 time_3 patient_002 -1.67 1.84 absent present
#> 6 group_1 time_4 patient_002 -1.67 1.84 absent present
#> 7 group_1 time_2 patient_003 1.38 -0.157 absent absent
#> 8 group_1 time_3 patient_003 1.38 -0.157 absent absent
#> 9 group_1 time_4 patient_003 1.38 -0.157 absent absent
#> 10 group_1 time_2 patient_004 -0.920 -1.39 present present
#> # ℹ 590 more rows
# Create a successive differences archetype.
archetype <- brm_archetype_successive_cells(data)
archetype
#> # A tibble: 600 × 23
#> x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2
#> * <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 0 0
#> 2 1 1 0 0
#> 3 1 1 1 0
#> 4 1 0 0 0
#> 5 1 1 0 0
#> 6 1 1 1 0
#> 7 1 0 0 0
#> 8 1 1 0 0
#> 9 1 1 1 0
#> 10 1 0 0 0
#> # ℹ 590 more rows
#> # ℹ 19 more variables: x_group_2_time_3 <dbl>, x_group_2_time_4 <dbl>,
#> # nuisance_biomarker1 <dbl>, nuisance_biomarker2 <dbl>,
#> # nuisance_status1_absent <dbl>, nuisance_status2_present <dbl>,
#> # nuisance_baseline <dbl>, nuisance_baseline.timetime_2 <dbl>,
#> # nuisance_baseline.timetime_3 <dbl>, patient <chr>, time <chr>,
#> # change <dbl>, missing <lgl>, baseline <dbl>, group <chr>, …
formula <- brm_formula(archetype)
formula
#> change ~ 0 + x_group_1_time_2 + x_group_1_time_3 + x_group_1_time_4 + x_group_2_time_2 + x_group_2_time_3 + x_group_2_time_4 + nuisance_biomarker1 + nuisance_biomarker2 + nuisance_status1_absent + nuisance_status2_present + nuisance_baseline + nuisance_baseline.timetime_2 + nuisance_baseline.timetime_3 + unstr(time = time, gr = patient)
#> sigma ~ 0 + time
# Put priors on intercepts and successive differences.
label <- brm_prior_label(
code = "normal(1, 1)",
group = "group_1",
time = "time_2"
) |>
brm_prior_label("normal(1, 2)", group = "group_1", time = "time_3") |>
brm_prior_label("normal(1, 3)", group = "group_1", time = "time_4") |>
brm_prior_label("normal(2, 1)", group = "group_2", time = "time_2") |>
brm_prior_label("normal(2, 2)", group = "group_2", time = "time_3") |>
brm_prior_label("normal(2, 3)", group = "group_2", time = "time_4")
label
#> # A tibble: 6 × 3
#> code group time
#> <chr> <chr> <chr>
#> 1 normal(1, 1) group_1 time_2
#> 2 normal(1, 2) group_1 time_3
#> 3 normal(1, 3) group_1 time_4
#> 4 normal(2, 1) group_2 time_2
#> 5 normal(2, 2) group_2 time_3
#> 6 normal(2, 3) group_2 time_4
prior <- brm_prior_archetype(archetype, label)
prior
#> prior class coef group resp dpar nlpar lb ub source
#> normal(1, 1) b x_group_1_time_2 <NA> <NA> user
#> normal(1, 2) b x_group_1_time_3 <NA> <NA> user
#> normal(1, 3) b x_group_1_time_4 <NA> <NA> user
#> normal(2, 1) b x_group_2_time_2 <NA> <NA> user
#> normal(2, 2) b x_group_2_time_3 <NA> <NA> user
#> normal(2, 3) b x_group_2_time_4 <NA> <NA> user
class(prior)
#> [1] "brmsprior" "data.frame"
# Run the model on the archetype with the prior.
model <- brm_model(
data = archetype,
formula = formula,
prior = prior,
refresh = 0
)
# Our custom informative prior was actually used for successive differences.
brms::prior_summary(model)
#> prior class coef group resp dpar
#> (flat) b
#> (flat) b nuisance_baseline
#> (flat) b nuisance_baseline.timetime_2
#> (flat) b nuisance_baseline.timetime_3
#> (flat) b nuisance_biomarker1
#> (flat) b nuisance_biomarker2
#> (flat) b nuisance_status1_absent
#> (flat) b nuisance_status2_present
#> normal(1, 1) b x_group_1_time_2
#> normal(1, 2) b x_group_1_time_3
#> normal(1, 3) b x_group_1_time_4
#> normal(2, 1) b x_group_2_time_2
#> normal(2, 2) b x_group_2_time_3
#> normal(2, 3) b x_group_2_time_4
#> (flat) b sigma
#> (flat) b timetime_2 sigma
#> (flat) b timetime_3 sigma
#> (flat) b timetime_4 sigma
#> lkj_corr_cholesky(1) Lcortime
#> nlpar lb ub source
#> default
#> (vectorized)
#> (vectorized)
#> (vectorized)
#> (vectorized)
#> (vectorized)
#> (vectorized)
#> (vectorized)
#> user
#> user
#> user
#> user
#> user
#> user
#> default
#> (vectorized)
#> (vectorized)
#> (vectorized)
#> default
# Plot the marginal means against the data.
draws <- brm_marginal_draws(
data = archetype,
formula = formula,
model = model
)
summaries_model <- brm_marginal_summaries(draws)
summaries_data <- brm_marginal_data(data)
brm_plot_compare(model = summaries_model, data = summaries_data) |
I think I have a reasonable print method. The cool thing about it is that it uses the marginal mean transformation matrix, which means we don't need a separate method for each individual scenario. # This tibble is an informative prior archetype in brms.mmrm.
# The fixed effect parameters of interest express the marginal means
# as follows (on the link scale):
#
# group_1:time_2 = x_group_1_time_2
# group_1:time_3 = x_group_1_time_2 + x_group_1_time_3
# group_1:time_4 = x_group_1_time_2 + x_group_1_time_3 + x_group_1_time_4
# group_2:time_2 = x_group_2_time_2
# group_2:time_3 = x_group_2_time_2 + x_group_2_time_3
# group_2:time_4 = x_group_2_time_2 + x_group_2_time_3 + x_group_2_time_4
#
# The contents of the tibble are as follows:
#
# A tibble: 600 × 23
x_group_1_time_2 x_group_1_time_3 x_group_1_time_4 x_group_2_time_2 x_group_2_time_3
* <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 0 0
2 1 1 0 0 0
3 1 1 1 0 0
4 1 0 0 0 0
5 1 1 0 0 0
6 1 1 1 0 0
7 1 0 0 0 0
8 1 1 0 0 0
9 1 1 1 0 0
10 1 0 0 0 0
# ℹ 590 more rows
# ℹ 18 more variables: x_group_2_time_4 <dbl>, nuisance_biomarker1 <dbl>,
# nuisance_biomarker2 <dbl>, nuisance_status1_absent <dbl>,
# nuisance_status2_present <dbl>, nuisance_baseline <dbl>,
# nuisance_baseline.timetime_2 <dbl>, nuisance_baseline.timetime_3 <dbl>, patient <chr>,
# time <chr>, change <dbl>, missing <lgl>, baseline <dbl>, group <chr>, biomarker1 <dbl>,
# biomarker2 <dbl>, status1 <chr>, status2 <chr>
# ℹ Use `print(n = ...)` to see more rows |
hmmm maybe it's better as a summary method after all, it sure takes up a lot of space for large datasets. |
Following up on #93, we could have different end-to-end workflows for specific informative prior use cases, such as successive differences (e.g. #92). We could begin with
brm_data()
, have specific functions likebrm_data_sdif()
to create case-specific classed datasets, and then make functions likebrm_formula()
andbrm_transform_marginal()
could become S3 generics with case-specific methods. We could also have a new S3 generic for informative prior specification which would only apply to those specific cases.The text was updated successfully, but these errors were encountered: