Note: To reproduce this analysis, you need the R package separation from GitHub, which you can install using devtools::install_github("carlislerainey/separation"). See below for details about the other R packages.
To reproduce the analysis, first clear all created files. Run:
make cleanALL
Then reproduce the analysis. Run:
make
The Makefile describes the structure of the code and allows the user to
reproduce the entire analysis or portions of it.
- make all or just make reproduces the entire analysis, including the simulations, which take about five hours.
- make dag reproduces a DAG that shows the structure of the dependencies in the Makefile.
- make sims reproduces the .rds files of the simulations (created by both simulations.R and sample-size-simulations.R) and saves them in the simulations directory. You can monitor the progress of simulations.R and sample-size-simulations.R with simulations.log and sample-size-simulations.log, respectively. This takes about five hours.
- make simplots reproduces the figures summarizing the simulations (Figures 2-5 and 8 from the main paper as well as figures for the appendix) and saves them in the manuscript/figs directory.
- make ge reproduces our re-analysis of George and Epstein (1992) and saves the figures to the manuscript/figs directory.
- make weisiger reproduces our re-analysis of Weisiger (2014) and saves the figures to the manuscript/figs directory.
- make manuscript recompiles the LaTeX manuscript small.tex into small.pdf and small-appendix.tex into small-appendix.pdf. It automatically handles the bibliography.
- make readme knits this document from the .Rmd file.
- make computedvalues knits computed-values.pdf, which creates tables of the numeric quantities reported in the text.
You can clear any files produced by the code by running make clean*, where * is any of the phony targets above. For example, make cleandag removes the figure makefile-dag.png. To clean the entire directory, run make cleanALL (I put ALL in caps to remind myself of the consequences).
We used a combination of ggplot and Apple Keynote to create Figure 1 (manuscript/figs/illustrate-bias-annotated.pdf), which illustrates the source of the small sample bias. The R script R/illustrate-bias.R creates the underlying plot manuscript/figs/illustrate-bias.pdf, but we added the annotations manually in Keynote.
- The simulations take about five hours. We’ve set them up to run in parallel on four clusters. You might speed this up with a change here.
- The code automatically stores the packages used in the last run in the file session-info.txt.
- There is no log file, but all the figures are created and saved in the manuscript/figs directory. All quantities reported in the manuscript are computed and/or reported in the file computed-values.pdf.
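As a rough illustration of where the number of parallel workers enters a simulation, here is a minimal sketch. It uses base R's parallel package rather than the doParallel/foreach setup listed among the project's packages, and the function run_toy_sims, its arguments, and the toy statistic are all hypothetical; the project's actual simulation scripts may be organized quite differently.

```r
library(parallel)

# Hypothetical helper (not part of this project): run `n_sims` toy
# simulations spread across `n_workers` worker processes.
run_toy_sims <- function(n_sims, n_workers = 4, seed = 1234) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)   # always shut the workers down
  clusterSetRNGStream(cl, iseed = seed)  # give each worker its own RNG stream
  parSapply(cl, seq_len(n_sims), function(i) mean(rnorm(100)))
}

# How many cores this machine reports (may be NA on some platforms).
available <- detectCores()

# Two workers keeps the example light; on a larger machine you might
# raise n_workers toward `available - 1`.
est <- run_toy_sims(n_sims = 20, n_workers = 2)
```

Note that with this approach the random draws depend on the number of workers, so changing the worker count will generally change the simulated values even with the same seed.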
We ran the analysis using the system below.
R.version
##                _
## platform       x86_64-apple-darwin15.6.0
## arch           x86_64
## os             darwin15.6.0
## system         x86_64, darwin15.6.0
## status
## major          3
## minor          6.1
## year           2019
## month          07
## day            05
## svn rev        76782
## language       R
## version.string R version 3.6.1 (2019-07-05)
## nickname       Action of the Toes
To reproduce the analysis, you need several R packages, which you can install with the following code:
# list of packages on CRAN used in this project (excluding base packages)
pkg <- c("brglm",
         "brglm2",
         "clusterGeneration",
         "devtools",
         "doParallel",
         "doRNG",
         "foreach",
         "ggraph",
         "gridExtra",
         "igraph",
         "kableExtra",
         "logistf",
         "quantreg",
         "scoring",
         "texreg",
         "tidyverse",
         "xtable")
To install these packages, you can run the code above along with the command below.
install.packages(pkg, repos = "http://cran.rstudio.com")
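If you would rather install only what is missing, something like the following sketch may help. The helper missing_packages is hypothetical (it is not part of this project), and the two package names in the usage lines are just a sample from the list above.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
# Side effect: requireNamespace() loads the namespace of each package it finds.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage: report which of a few packages still need installing.
to_install <- missing_packages(c("brglm2", "logistf"))
if (length(to_install) > 0) {
  message("Still to install: ", paste(to_install, collapse = ", "))
}
```

You could then pass to_install, instead of the full pkg vector, to install.packages().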
You also need the package separation from GitHub, which you can install with the command below.
devtools::install_github("carlislerainey/separation")
We recommend using the latest version of each package, but the versions we used are saved to the file package-versions.csv.
library(tidyverse)
library(kableExtra)
devtools::package_info(pkgs = c(pkg, "separation"), dependencies = TRUE) %>%
  select(package, version = ondiskversion, date, source) %>%
  write_csv("package-versions.csv")
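To see how your installed packages differ from the recorded ones, a comparison along these lines might be useful. This is only a sketch: the helper compare_versions is hypothetical, and it assumes the CSV has the package and version columns written by the code above.

```r
# Hypothetical helper: compare recorded package versions with installed ones.
compare_versions <- function(path = "package-versions.csv") {
  # Read everything as character so versions like "1.0" are not turned into numbers.
  recorded <- read.csv(path, colClasses = "character")
  installed <- vapply(recorded$package, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_  # package not installed
    }
  }, character(1))
  data.frame(
    package   = recorded$package,
    recorded  = recorded$version,
    installed = unname(installed),
    same      = !is.na(installed) & unname(installed) == recorded$version,
    stringsAsFactors = FALSE
  )
}

# Usage: only runs if the file is present in the working directory.
if (file.exists("package-versions.csv")) {
  print(compare_versions())
}
```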
- To create makefile-dag.png, run the R script makefile-dag.R.
- To do the simulations for Figures 2-5 and store them as simulations/simulations.rds, run the R script R/simulations.R.
- To create the figures based on the simulations above, run the R script R/plot-simulations.R. These figures are stored as .pdfs in manuscript/figs.
- To perform the sample size simulations for Figure 8 and store them as simulations/sample-size-simulations.rds, run the R script R/sample-size-simulations.R.
- To create the figures based on the sample size simulations above, run R/plot-sample-size-simulations.R. These figures are stored as .pdfs in manuscript/figs.
- To reproduce the George and Epstein re-analysis, run the R script ge-replication/R/analysis.R. Figures are stored as .pdfs in manuscript/figs.
- To reproduce the Weisiger re-analysis, run the R script weisiger-replication/R/analysis.R. Figures are stored as .pdfs in manuscript/figs.
- To compile the manuscript and appendix, compile manuscript/small.tex and manuscript/small-appendix.tex, respectively, with pdftex and bibtex.
- To render README.md, compile README.Rmd with knitr.
- To render computed-values.pdf, compile its source file with knitr.
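The R steps above could also be run in sequence from a single R session. The sketch below is one way to do that, under the assumptions that the working directory is the project root and that each script can be sourced from there. The helper run_scripts is hypothetical, and the DAG, manuscript, and knitr steps are left out.

```r
# Script paths as listed above, in the order described.
analysis_scripts <- c(
  "R/simulations.R",
  "R/plot-simulations.R",
  "R/sample-size-simulations.R",
  "R/plot-sample-size-simulations.R",
  "ge-replication/R/analysis.R",
  "weisiger-replication/R/analysis.R"
)

# Hypothetical helper: source each script that exists, each in its own
# environment. With dry_run = TRUE it only reports what it would run.
run_scripts <- function(scripts, dry_run = TRUE) {
  done <- character(0)
  for (s in scripts) {
    if (!file.exists(s)) {
      message("Skipping (not found): ", s)
      next
    }
    message(if (dry_run) "Would run: " else "Running: ", s)
    if (!dry_run) source(s, local = new.env(parent = globalenv()))
    done <- c(done, s)
  }
  invisible(done)
}

# Dry run by default; use dry_run = FALSE to actually run the scripts
# (the simulations take about five hours).
run_scripts(analysis_scripts)
```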