
kaz-yos/mw


Matching weights simulation study R code

What is this?

The code in this repository was used to conduct the simulation study described in the presentation at the Epidemiology Congress of the Americas 2016 (Miami, Florida, USA). The simulation study compared matching weights (Li et al. 2013; Li et al. in Pan & Bai 2015), three-way matching (Rassen et al. 2013), and inverse probability of treatment weighting (Robins et al. 2000) in a three-level categorical point-exposure setting. The corresponding online-first manuscript is available in Epidemiology. Please email me at kazukiyoshida@mail.harvard.edu if you have difficulty obtaining the paper. A tutorial on using matching weights in an empirical study is available on RPubs. Franklin et al. is another simulation study of propensity score methods, including matching weights. A recent application of matching weights can be found in Sauer et al.
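For orientation, the two weighting schemes can be written in terms of the generalized propensity scores. This is a sketch of the standard definitions, assuming the three-group form of matching weights (Li et al. 2013) and unstabilized inverse probability of treatment weights (Robins et al. 2000). The groups are labelled 0, 1, 2 here only for illustration, and the implementation in function_definitions/ may differ in details such as stabilization or normalization.

```latex
e_k(X_i) = \Pr(Z_i = k \mid X_i), \quad k \in \{0, 1, 2\}

w_i^{\mathrm{IPTW}} = \frac{1}{e_{Z_i}(X_i)},
\qquad
w_i^{\mathrm{MW}} = \frac{\min\{e_0(X_i),\, e_1(X_i),\, e_2(X_i)\}}{e_{Z_i}(X_i)}
```

Because the numerator of the matching weight never exceeds its denominator, every matching weight lies in (0, 1], whereas an IPTW weight grows without bound as e_{Z_i}(X_i) approaches 0.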

Files and folders

  • *.R: Main R script files for generating simulation data, analyzing data, and reporting results. Executing each file generates a plain-text report file named *.R.txt. Only 03.Report.R.txt is kept in this repository.
  • *_Lsf.sh: Example parallelization shell scripts for the Linux LSF batch job system. These are designed for Harvard Medical School’s Orchestra cluster specifically, and are not expected to work without modification elsewhere.
  • *_Slurm.sh: Example parallelization shell scripts for the Linux SLURM batch job system. These are designed for Harvard University’s Odyssey cluster specifically, and are not expected to work without modification elsewhere.
  • data/: Folder for simulation data. Due to file size constraints, only the analysis result files needed to run 03.Report.R are kept.
  • figures/: Folder for figure PDFs generated by the reporting scripts.
  • function_definitions/: Folder for R function definitions used by the main R scripts.
  • rassen_toolbox/: Folder for Rassen et al.’s Pharmacoepidemiology Toolbox. rassen_toolbox/java/pharmacoepi.jar is required for three-way matching.

How to replicate simulation

The scripts were written on macOS and, for the most part, executed on Linux high-performance computing clusters. Execution is computationally intensive, so parallelized execution on a computer cluster is required in practice, particularly for the bootstrapping part.

Setting up environment

The following command installs the packages required by the simulation study.

Rscript ./00.InstallDependencies.R
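Before installing, it may help to confirm that the tools this README depends on are present. The function below is a hypothetical pre-flight check, not part of the repository. It looks for Rscript and, on the assumption that the Pharmacoepidemiology Toolbox jar needs a Java runtime, for java and for rassen_toolbox/java/pharmacoepi.jar. It lists anything missing on stderr and returns a non-zero status.

```shell
# Hypothetical pre-flight check (not part of the repository).
# Optional argument: path to the Toolbox jar (defaults to the README's location).
check_prereqs() {
    jar_path="${1:-rassen_toolbox/java/pharmacoepi.jar}"
    missing=0
    # Rscript runs every numbered script (00 through 05).
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "missing: Rscript (install R)" >&2
        missing=$((missing + 1))
    fi
    # Assumption: a Java runtime is needed to use the Toolbox jar.
    if ! command -v java >/dev/null 2>&1; then
        echo "missing: java (assumed needed for three-way matching)" >&2
        missing=$((missing + 1))
    fi
    # The jar itself is required for three-way matching.
    if [ ! -f "$jar_path" ]; then
        echo "missing: $jar_path" >&2
        missing=$((missing + 1))
    fi
    # Exit status 0 only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

Usage: check_prereqs && Rscript ./00.InstallDependencies.R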

Generate data

The simulated datasets must be generated first. The following command generates 48 scenario data files (e.g., Scenario001_R1000.RData), each containing 1000 iterations, in the data subfolder.

Rscript ./01.DataGenerator.R
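A quick way to confirm that data generation finished is to count the resulting files. The hypothetical count_scenarios function below is not part of the repository; it assumes every file follows the Scenario<NNN>_R1000.RData naming of the example above.

```shell
# Hypothetical sanity check (not part of the repository): prints the number of
# scenario files in the given folder (default ./data).
count_scenarios() {
    data_dir="${1:-./data}"
    n=0
    for f in "$data_dir"/Scenario*_R1000.RData; do
        # If the glob matches nothing, the literal pattern is returned; skip it.
        [ -e "$f" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}
```

Usage: [ "$(count_scenarios ./data)" -eq 48 ] || echo "data generation looks incomplete" >&2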

Conduct analyses

Matching weights, three-way matching, and inverse probability of treatment weighting analyses are conducted by specifying a scenario data file. The analysis must be invoked on one file at a time. For example, to analyze the first scenario file, use the following command.

Rscript ./02.RunSimulation.R ./data/Scenario001_R1000.RData
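Without a cluster, the scenarios can still be processed one after another. The sketch below is a hypothetical driver, not one of the repository's scripts: it calls 02.RunSimulation.R once per scenario file and stops at the first failure. The RSCRIPT variable is an added assumption for dry runs; setting RSCRIPT="echo Rscript" prints each command instead of executing it.

```shell
# Hypothetical sequential driver (not part of the repository).
# Runs 02.RunSimulation.R on each scenario file in turn, one file per call.
run_all_scenarios() {
    data_dir="${1:-./data}"
    rscript="${RSCRIPT:-Rscript}"
    for f in "$data_dir"/Scenario*.RData; do
        [ -e "$f" ] || continue          # glob matched nothing: nothing to do
        # $rscript is unquoted on purpose so RSCRIPT="echo Rscript" splits into words.
        $rscript ./02.RunSimulation.R "$f" || return 1
    done
}
```

Usage: run_all_scenarios ./data (expect a long runtime, since each analysis is computationally intensive).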

Parallelization

This step can be parallelized by dispatching the analysis of each scenario file to a separate node of a computer cluster. Example batch scripts for the LSF and SLURM job dispatch systems are included; they must be modified for your local cluster system before they are of any use.

LSF job dispatch system

./02.RunSimulation_Lsf.sh ./data/Scenario*

SLURM job dispatch system

./02.RunSimulation_Slurm.sh ./data/Scenario*
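On a single multicore workstation without a job scheduler, several scenario analyses can be run at once with xargs -P. This is a swapped-in technique, not the repository's LSF or SLURM scripts, and the JOBS (concurrency limit) and RSCRIPT (dry-run hook) variables are hypothetical. xargs -P is provided by GNU and BSD xargs but is not required by POSIX.

```shell
# Hypothetical local alternative (not part of the repository) to the
# LSF/SLURM scripts: runs up to JOBS scenario analyses concurrently.
# Assumes file paths contain no whitespace (xargs splits on blanks).
run_scenarios_parallel() {
    data_dir="${1:-./data}"
    jobs="${JOBS:-4}"
    rscript="${RSCRIPT:-Rscript}"
    set -- "$data_dir"/Scenario*.RData
    [ -e "$1" ] || return 0              # glob matched nothing
    # -n 1 passes exactly one file per call, as 02.RunSimulation.R expects.
    printf '%s\n' "$@" | xargs -n 1 -P "$jobs" $rscript ./02.RunSimulation.R
}
```

For example, JOBS=8 run_scenarios_parallel ./data runs up to eight analyses at a time; assuming each analysis is single-threaded, the number of available cores is a reasonable starting value for JOBS.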

Report results

After analyses have been conducted on all scenarios, the following script generates the report. Figures are written to the figures folder.

Rscript ./03.Report.R

Bootstrapping

Bootstrapping within a simulation study is highly computationally intensive, so this part is kept separate from the rest of the simulation. The script processes one tenth of a scenario at a time, with the part selected by the second argument (1 through 10). For example, the first line below runs the first part of the first scenario, and the second line runs the last part.

Rscript ./04.Bootstrap.R ./data/Scenario001_R1000.RData 1
Rscript ./04.Bootstrap.R ./data/Scenario001_R1000.RData 10
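To run all ten parts of one scenario in sequence, a loop over the part number is enough. This is a hypothetical helper, not part of the repository. With 1000 iterations per scenario, each part presumably covers about 100 iterations; the exact split is determined by 04.Bootstrap.R and is only an assumption here. Setting RSCRIPT="echo Rscript" (a hypothetical dry-run hook) prints the commands instead of running them.

```shell
# Hypothetical helper (not part of the repository): runs parts 1..10 of the
# bootstrap for one scenario file, stopping at the first failure.
bootstrap_scenario() {
    file="$1"
    rscript="${RSCRIPT:-Rscript}"
    if [ -z "$file" ]; then
        echo "usage: bootstrap_scenario <scenario file>" >&2
        return 2
    fi
    part=1
    while [ "$part" -le 10 ]; do
        # $rscript is unquoted on purpose so a dry-run value splits into words.
        $rscript ./04.Bootstrap.R "$file" "$part" || return 1
        part=$((part + 1))
    done
}
```

Usage: bootstrap_scenario ./data/Scenario001_R1000.RData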

Parallelization

This step can also be parallelized: the following scripts dispatch each tenth of each scenario to a separate node. Again, these shell scripts must be modified for your local cluster system before they are of any use.

LSF job dispatch system

./04.Bootstrap_Lsf.sh ./data/Scenario*

SLURM job dispatch system

./04.Bootstrap_Slurm.sh ./data/Scenario*
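Because every (scenario, part) combination is independent, the full bootstrap amounts to 48 × 10 = 480 jobs. The hypothetical function below is a local xargs -P alternative, not the repository's LSF or SLURM scripts: it enumerates these pairs and runs up to JOBS of them concurrently. JOBS (concurrency limit) and RSCRIPT (dry-run hook) are hypothetical variables, and xargs -P is a GNU/BSD extension rather than POSIX.

```shell
# Hypothetical local alternative (not part of the repository): treats each
# (scenario file, part) pair as one job and runs up to JOBS at a time.
# Assumes file paths contain no whitespace (xargs splits on blanks).
bootstrap_all_parallel() {
    data_dir="${1:-./data}"
    jobs="${JOBS:-4}"
    rscript="${RSCRIPT:-Rscript}"
    set -- "$data_dir"/Scenario*.RData
    [ -e "$1" ] || return 0                  # glob matched nothing
    for f in "$@"; do
        part=1
        while [ "$part" -le 10 ]; do
            printf '%s %s\n' "$f" "$part"    # one "file part" pair per line
            part=$((part + 1))
        done
    done | xargs -n 2 -P "$jobs" $rscript ./04.Bootstrap.R
}
```

Usage: JOBS=8 bootstrap_all_parallel ./data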

Bootstrap reporting

After bootstrapping has been conducted on all scenarios, the following script generates the bootstrap report. The figure is written to the figures folder.

Rscript ./05.BootstrapReport.R

Version history

  • 2017-02-06: Add online-first article link and additional reading links.
  • 2016-07-26: Add manuscript status and tutorial link
  • 2016-07-16: Initial upload

About

Simulation scripts for matching weights in three-group studies (Epidemiology 2017)
