psboonstra/RankModeling

Penalized multistage models for ordered data

Current Suggested Citation

Boonstra, Philip S. and Krauss, John C., "Inferring a consensus problem list using penalized multistage models for ordered data" (October 2019) The University of Michigan Department of Biostatistics Working Paper Series. Working Paper 126. https://biostats.bepress.com/umichbiostat/paper126

See also:

Krauss, John C., Boonstra, Philip S., Vantsevich, Anna V., and Friedman, Charles P., "Is the problem list in the eye of the beholder? An exploration of consistency across physicians" Journal of the American Medical Informatics Association (2016); 23(5), 859--865 https://doi.org/10.1093/jamia/ocv211

Executive Summary

The function penRank_path, contained in the file functions_bpl.R, is the primary statistical contribution of the manuscript. It estimates the solution path for a Benter-Plackett-Luce (BPL) model penalized with seamless L0 penalties.

Further details

In more detail, there are eleven files included in this repository (in addition to this README and the authors' version of the manuscript): three CSV files (ending in .csv) and eight R scripts (ending in .R). The results reported in the manuscript were run using commit 23.

CSV files

The files caseX_20Dec2014.csv, where X = 23, 83, or 111, contain the problem list data as ranked lists. Case 23 = Case A; Case 111 = Case B; Case 83 = Case C.

R files

functions_bpl.R provides all of the necessary functions to fit the BPL methods described in the paper

functions_ldrbo.R provides all of the necessary functions to calculate the consensus LDRBO reported in this paper and in Krauss et al. (2016). See the file ldrbo_vignette.pdf for more details.

gather_data.R reads in the problem list data from the .csv files and converts them from ranked lists to ordered lists. Case 23 = Case A; Case 111 = Case B; Case 83 = Case C.
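To illustrate what converting a ranked list to an ordered list can look like, here is a minimal sketch. It assumes that a ranked list stores, for each candidate problem, the rank a physician assigned to it (NA if the problem was not listed), and that an ordered list is the sequence of listed problems from most to least important. The function name ranked_to_ordered and the example problems are hypothetical and are not taken from gather_data.R, whose actual data layout may differ.

```r
# Hypothetical example: ranks[j] is the rank one physician gave to problem j
# (NA means the problem was not listed by that physician).
ranks <- c(diabetes = 2, hypertension = 1, anemia = NA, asthma = 3)

# Hypothetical helper: return the listed problems sorted by assigned rank.
ranked_to_ordered <- function(ranks) {
  listed <- ranks[!is.na(ranks)]   # drop problems that were not listed
  names(listed)[order(listed)]     # names sorted from rank 1 upward
}

ordered_list <- ranked_to_ordered(ranks)
print(ordered_list)
```

Problems with no rank are dropped here, so ordered lists can differ in length across physicians.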

fit_model_problists.R calls the previous three scripts, calculates the solution paths, and creates the tables and figures reported in the manuscript.

run_rank_sims.R is the top-level script for conducting the simulation study. To select a simulation scenario, either set the value of array_id yourself on line 13 or let the SLURM scheduler provide it on line 15. There are 36 scenarios, numbered 1 to 36; any positive integer is accepted and is mapped to 1, ..., 36 via modular arithmetic. Running the script once for each of array_id = 1, ..., 972 (that is, 27 runs of each scenario) reproduces the entire simulation study reported in the manuscript.
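The mapping from array_id to a scenario might look like the sketch below. This is an assumption for illustration only: the helper map_to_scenario is hypothetical, and the exact formula used in run_rank_sims.R is not shown in this README. SLURM_ARRAY_TASK_ID is the environment variable that SLURM sets for each task of a job array.

```r
# Hypothetical helper: map any positive integer array_id onto scenarios 1..36.
n_scenarios <- 36
map_to_scenario <- function(array_id, n = n_scenarios) {
  stopifnot(all(array_id >= 1), all(array_id == round(array_id)))
  ((array_id - 1) %% n) + 1
}

# Use SLURM's array index if it is set; otherwise fall back to a manual value.
slurm_id <- Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "")
array_id <- if (nzchar(slurm_id)) as.integer(slurm_id) else 1L
scenario <- map_to_scenario(array_id)

# Across array_id = 1, ..., 972, each scenario is visited 972 / 36 = 27 times.
visits <- table(map_to_scenario(1:972))
```

Under this mapping, array_id values 1, 37, 73, ... all select scenario 1, so submitting a job array over 1 to 972 gives every scenario the same number of runs.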

generate_params.R is called by run_rank_sims.R to create the individual data generating mechanisms used in the simulation study

function_simulations.R is also called by run_rank_sims.R and contains all of the code for doing the simulation study

process_results.R should be run after the simulations are complete. Each run of run_rank_sims.R saves two files, simX_performance.csv and simX_bpl.csv, where X is the array_id used. Assuming these files are all collected in a folder called 'out', process_results.R collates the results and creates the tables and figures reported in the manuscript.
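As a rough sketch of the collation step, the following reads every simX_performance.csv (or simX_bpl.csv) file in a folder and stacks them into one data frame. The function collate_results is hypothetical and is not the code in process_results.R. It assumes that all files of a given type share the same columns, and the array_id column it adds is also an assumption for illustration.

```r
# Hypothetical collation helper; the real process_results.R may differ.
collate_results <- function(dir = "out", suffix = "performance") {
  pattern <- paste0("^sim[0-9]+_", suffix, "\\.csv$")
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) stop("No files matching ", pattern, " in ", dir)
  pieces <- lapply(files, function(f) {
    dat <- read.csv(f, stringsAsFactors = FALSE)
    # Recover the array_id from the file name, e.g. sim12_performance.csv -> 12
    dat$array_id <- as.integer(sub("^sim([0-9]+)_.*$", "\\1", basename(f)))
    dat
  })
  # Stack the per-run data frames; assumes identical columns across files.
  do.call(rbind, pieces)
}
```

For example, collate_results("out", "bpl") would stack the simX_bpl.csv files, provided the folder exists and contains at least one matching file.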

Acknowledgments

This work was supported by the National Institutes of Health (UL1TR002240)