Targeted Learning for the Sample Average Treatment Effect on Treated Units (SATT)

2017 Atlantic Causal Inference Conference Data Analysis Challenge (pdf)

Authors: Jonathan Levy, Chris J. Kennedy, Caleb H. Miles, Ivana Malenica, Nima Hejazi, Andre Kurepa Waschka, and Alan E. Hubbard.

Description: Targeted minimum loss-based estimation (TMLE) was implemented using a weighted logistic regression fluctuation. The pooled outcome regression and treatment mechanism were modeled using super learning, with a library consisting of logistic regression, gradient boosted machines (6 configurations), multivariate adaptive regression splines, random forest, neural networks, lasso, elastic net, and Bayesian additive regression trees (BART). Covariates supplied to the SuperLearner were pre-screened based on their univariate association with the outcome.
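The description does not say which univariate test or threshold was used for pre-screening, so the following is only a minimal base-R sketch of the idea. The helper name `screen_univariate`, the Pearson correlation test, the 0.1 p-value cutoff, and the `min_keep` fallback are all assumptions made for illustration. The SuperLearner package ships screening wrappers of this flavor (for example `screen.corP`), but whether the entry used one of them is not stated here.

```r
# Hypothetical helper (not from the repository): keep covariates whose
# univariate correlation test with the outcome has a p-value at or below
# `max_p`, always retaining at least `min_keep` columns.
screen_univariate <- function(Y, X, max_p = 0.1, min_keep = 2) {
  p_values <- vapply(X, function(x) {
    # Constant columns carry no information; give them the worst p-value.
    if (stats::sd(x) == 0) return(1)
    stats::cor.test(x, Y)$p.value
  }, numeric(1))
  keep <- p_values <= max_p
  # Fall back to the smallest p-values if too few columns pass the cutoff.
  n_fallback <- min(min_keep, length(p_values))
  if (sum(keep) < n_fallback) {
    keep[order(p_values)[seq_len(n_fallback)]] <- TRUE
  }
  names(X)[keep]
}

# Simulated example: only x1 drives the binary outcome.
set.seed(1)
n <- 250
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
Y <- rbinom(n, 1, plogis(2 * X$x1))
selected <- screen_univariate(Y, X)
print(selected)
```

The screened column names could then be used to subset the covariate data frame before it is handed to the learners.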

Acknowledgments: We thank Susan Gruber for theoretical inspiration and for sharing the source code from her and Mark van der Laan's 2016 competition entry. We also thank Mark van der Laan for helpful discussions.

Expected runtime: 160 seconds per dataset of 250 observations and 58 covariates.

Notes: We assume no missing data in the datasets. We do not include inference for the unit-level causal estimates as those are not asymptotically linear within the targeted learning framework.
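For readers unfamiliar with the estimand in the title, the SATT is conventionally written as below. This uses standard potential-outcomes notation that is not taken from this README: $A_i$ is the treatment indicator, $Y_i(1)$ and $Y_i(0)$ are the potential outcomes of unit $i$, and $n_1$ is the number of treated units. The unit-level causal effects mentioned in the notes are the individual differences $Y_i(1) - Y_i(0)$.

```latex
\mathrm{SATT} = \frac{1}{n_1} \sum_{i \,:\, A_i = 1} \bigl( Y_i(1) - Y_i(0) \bigr),
\qquad n_1 = \sum_{i=1}^{n} A_i
```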

Requirements

R 3.2 or later, R 3.3+ recommended.
Java JDK for rJava
R Packages:
- CRAN: bartMachine, caret, devtools, doMC, earth, ggplot2, glmnet, kernlab, mgcv, nnet, randomForest, ranger, RhpcBLASctl, xgboost
- GitHub: ecpolley/SuperLearner, ck37/ck37r
Hardware assumptions: 4 CPU cores available for multi-threaded algorithms (BART, Ranger, XGBoost), 16GB+ RAM, and a UNIX-based operating system.
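The requirements above can be checked before running anything. The following is a rough pre-flight sketch in base R. The function names `missing_packages` and `check_environment` are hypothetical and are not part of setup.R.

```r
# Package names copied from the requirements list above.
cran_packages <- c("bartMachine", "caret", "devtools", "doMC", "earth",
                   "ggplot2", "glmnet", "kernlab", "mgcv", "nnet",
                   "randomForest", "ranger", "RhpcBLASctl", "xgboost")
github_packages <- c("SuperLearner", "ck37r")

# Hypothetical helper: return the subset of `packages` that is not installed.
missing_packages <- function(packages) {
  installed <- rownames(utils::installed.packages())
  setdiff(packages, installed)
}

# Hypothetical helper: summarize R version, core count, OS type, and
# missing packages against the stated requirements.
check_environment <- function(min_r = "3.2.0", min_cores = 4) {
  cores <- parallel::detectCores()  # may be NA on some platforms
  list(
    r_ok = getRversion() >= min_r,
    cores_ok = isTRUE(cores >= min_cores),
    unix = .Platform$OS.type == "unix",
    missing = missing_packages(c(cran_packages, github_packages, "rJava"))
  )
}

status <- check_environment()
str(status)
```

Available RAM is not checked here because base R has no portable way to query it.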

How to run

Minimal:

Make sure the Java JDK is installed and R can load the rJava and bartMachine packages.
Run setup.R to install other necessary packages: make setup
Modify targeted_learning.R settings at the top of the file if necessary.
./targeted_learning.R inputData outfile1 outfile2
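The last step passes three positional arguments to an executable R script. As a generic illustration of how such a script can read them, and not a reproduction of what targeted_learning.R actually does, a sketch might look like this. The helper name `parse_cli_args` and the file names in the example call are made up.

```r
# Hypothetical sketch of positional-argument handling; this is NOT the
# actual contents of targeted_learning.R.
parse_cli_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) != 3) {
    stop("Usage: ./targeted_learning.R inputData outfile1 outfile2")
  }
  list(input = args[1], outfile1 = args[2], outfile2 = args[3])
}

# Example call with made-up file names.
cli <- parse_cli_args(c("inbound/example.csv", "out1.csv", "out2.csv"))
print(cli$input)
```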

Analysis of 2016 or 2017-pre data:

Unzip 2017 data into inbound/pre_data/
Unzip 2016 data into inbound/data-2016/
Run import-2016.R to import the 2016 data: make import-2016
Run test-2016.R to conduct a single test analysis of 2016: make test-2016
Run analyze-2016.R to analyze all 2016 files using targeted_learning.R: make analyze-2016
Run import-2017.R to import the 2017 data.

Subdirectory layout

Data - working RData files generated during analysis, not tracked via git.
Exports - exported files (CSVs, TSVs, etc.) that are not tracked via git.
Inbound - input datasets that are not tracked via git.
Lib - R source code that defines functions; all .R files are loaded.
Output - log output files from Savio jobs etc.
Scripts - shell (BASH) scripts.
Simulations - simulation studies.

Troubleshooting

Please feel free to post any issues to the issue queue or email us.

rJava issues

There can be issues installing and using rJava for bartMachine. If necessary, one workaround for cluster usage, suggested by Vince Dorie, is to manually load libjvm.so:

# Update this path to the appropriate one for your system.
dyn.load("/usr/lib/jvm/java-1.8.0-ibm-1.8.0.3.10-1jpp.2.el7_2.x86_64/jre/lib/amd64/compressedrefs/libjvm.so")
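To find the right path for the `dyn.load()` call and to confirm that rJava can start a JVM afterwards, a small diagnostic along the following lines may help. The helper names `find_libjvm` and `check_rjava` are hypothetical, and the sketch assumes that `JAVA_HOME` points at the Java installation, which may not hold on every cluster.

```r
# Hypothetical helper: search a Java installation directory for libjvm.so.
find_libjvm <- function(java_home = Sys.getenv("JAVA_HOME")) {
  if (!nzchar(java_home) || !dir.exists(java_home)) return(character(0))
  list.files(java_home, pattern = "^libjvm\\.so$",
             recursive = TRUE, full.names = TRUE)
}

# Hypothetical helper: try to start a JVM through rJava and return the Java
# version string, or a message describing what went wrong.
check_rjava <- function() {
  if (!requireNamespace("rJava", quietly = TRUE)) {
    return("rJava could not be loaded")
  }
  tryCatch({
    rJava::.jinit()
    rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  }, error = function(e) paste("rJava failed:", conditionMessage(e)))
}

print(find_libjvm())
print(check_rjava())
```

If `find_libjvm()` returns several candidates, the one matching the JVM that rJava was compiled against is the one to pass to `dyn.load()`.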

References

Balzer, L. B., Petersen, M. L., & van der Laan, M. J. (2016). Targeted estimation and inference for the sample average treatment effect in trials with and without pair‐matching. Statistics in Medicine, 35(21), 3717-3732.

Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), 266-298.

Dorie, V., Hill, J., Shalit, U., Scott, M., & Cervone, D. (2017). Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition. arXiv preprint arXiv:1707.02641.

Green, D. P., & Kern, H. L. (2012). Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public opinion quarterly, nfs036.

Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1), 217-240.

Hubbard, A. E., Jewell, N. P., & van der Laan, M. J. (2011). Direct effects and effect among the treated. In Targeted Learning (pp. 133-143). Springer New York.

Kapelner, A., & Bleich, J. (2014). bartMachine: Machine learning with Bayesian additive regression trees. arXiv preprint arXiv:1312.2171.

Luedtke, A. R., & van der Laan, M. J. (2016). Super-learning of an optimal dynamic treatment rule. The international journal of biostatistics, 12(1), 305-332.

Polley, E., LeDell, E., Kennedy, C., Lendle, S., & van der Laan, M. J. (2017). R Package ‘SuperLearner’. Development version 2.0-22.

Polley, E., & van der Laan, M. (2009). Selecting optimal treatments based on predictive factors. Design and Analysis of Clinical Trials with Time-to-Event Endpoints, 441-454.

van der Laan, M. J., & Gruber, S. (2016). "One-Step Targeted Minimum Loss-based Estimation Based on Universal Least Favorable One-Dimensional Submodels". U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 347.

van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical applications in genetics and molecular biology, 6(1).

van der Laan, M. J., & Rose, S. (2011). Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media.

License

The contents of this repository are distributed under the MIT license. See file LICENSE for details.
