Repository for my work on the Flu Shot Learning competition on DrivenData. DrivenData profile: apoirel. Work in progress.
Requirements
snakemake
conda
Creating the environment by hand is only necessary if you intend to experiment with or modify the code. To create a conda environment with all the required packages:
conda env create -f environment.yml
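In this directory, run the full workflow with conda-managed environments:
snakemake --use-conda all
Alternatively, with Singularity support enabled as well:
snakemake --use-singularity --use-conda all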
├── environment.yml <- The file defining the conda Python environment.
├── Snakefile <- Definition of the full workflow for reproducing the analysis.
├── LICENSE
├── README.md <- The top-level README.
├── data
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
├── output
│ ├── models <- Serialized models, predictions, model summaries.
│ └── figures <- Graphics created during analysis.
├── paper <- Generated analysis as PDF, LaTeX.
└── src <- Source code for this project.
    ├── notebooks <- Jupyter notebooks.
    └── __init__.py <- Makes this a Python module.
- 0.8342 AUC on hidden test set, 181/948 on leaderboard (baseline LR)
- 0.8462 AUC on hidden test set, 133/953 on leaderboard (tuned random forest)
- 0.8473 AUC on hidden test set, 130/953 on leaderboard (xgboost baseline)
- 0.8530 AUC on hidden test set, 112/953 on leaderboard (moderately tuned xgboost)
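For readers unfamiliar with the metric, here is a minimal, dependency-free sketch of how an AUC score like the ones above can be computed. It assumes the competition score is the mean ROC AUC over two binary targets; the target names `h1n1_vaccine` and `seasonal_vaccine` and the helper names `roc_auc` and `mean_auc` are illustrative assumptions, not code from this repository.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    Tied scores receive the average rank of their tie group.
    """
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_auc(targets, scores):
    """Average the per-target AUC (assumed competition metric)."""
    return sum(roc_auc(targets[k], scores[k]) for k in targets) / len(targets)


# Toy data only; the column names are assumptions.
toy_targets = {"h1n1_vaccine": [0, 0, 1, 1], "seasonal_vaccine": [0, 1]}
toy_scores = {"h1n1_vaccine": [0.1, 0.4, 0.35, 0.8], "seasonal_vaccine": [0.2, 0.9]}
print(mean_auc(toy_targets, toy_scores))
```

In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead; the hand-written version is only meant to show what the number measures.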
Todo: refactor the notebook code for the last two models into proper Python scripts.
This project is distributed under the MIT license.