A cookiecutter template for reproducible research projects using Python, Snakemake, and Pandoc.

cookiecutter-reproducible-research

This repository provides cookiecutter templates for reproducible research projects. The templates do not attempt to be generic, but have a clear and opinionated focus.

Projects built with these templates aim at full automation, and use Python 3.11, mamba/conda, Git, Snakemake, and pandoc to create an HTML report out of raw data, code, and Markdown text. Fork, clone, or download this repository on GitHub if you want to change any of these.

The template includes a few lines of code as a demo, so you can create an HTML report out of made-up simulation results right away. Read the README.md in the generated repository to see how.

Template types

default

This generates the basic structure of a reproducible workflow.

cluster

The cluster template extends the basic template by adding infrastructure to support running on a compute cluster.

Getting Started

Make sure you have cookiecutter installed; if not, install it with conda:

conda install cookiecutter -c conda-forge

Then create a repository using:

cookiecutter gh:timtroendle/cookiecutter-reproducible-research --directory=[default/cluster]
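
The `[default/cluster]` placeholder stands for one of the two template types described above. As a minimal sketch, the helper below assembles the same command for a chosen type. The function name `build_cookiecutter_command` is hypothetical and is not part of cookiecutter or of this template.

```python
import shlex

TEMPLATE = "gh:timtroendle/cookiecutter-reproducible-research"
TEMPLATE_TYPES = ("default", "cluster")


def build_cookiecutter_command(template_type: str) -> list[str]:
    """Return the cookiecutter command line for one of the two template types.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    if template_type not in TEMPLATE_TYPES:
        raise ValueError(
            f"template_type must be one of {TEMPLATE_TYPES}, got {template_type!r}"
        )
    return ["cookiecutter", TEMPLATE, f"--directory={template_type}"]


if __name__ == "__main__":
    # Print the command for the cluster template.
    # Pass the list to subprocess.run(...) to actually execute it.
    print(shlex.join(build_cookiecutter_command("cluster")))
```

Building the arguments as a list avoids shell quoting issues if the command is later run through `subprocess.run`.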

You will be asked for the following parameters:

| Parameter | Description |
| --- | --- |
| project_name | The name of your project, used in the documentation and report. |
| project_short_name | An abbreviation, used for environments and such. Avoid special characters and whitespace. |
| author | Your name. |
| institute | The name of your institute, used for report metadata. |
| short_description | A short description of the project, used for documentation and report. |
| path_to_conda_envs | The path to the directory hosting your conda envs (leave untouched for Snakemake default). |
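
The advice for `project_short_name` can be checked before running the template. The sketch below assumes a conservative rule (ASCII letters, digits, hyphens, and underscores only). The template does not state an exact character set, so both the rule and the helper name `is_valid_short_name` are assumptions made for illustration.

```python
import re

# Assumed rule: ASCII letters, digits, hyphens, and underscores only.
# The template only says to avoid special characters and whitespace.
_SHORT_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_short_name(name: str) -> bool:
    """Return True if name is non-empty and matches the assumed rule."""
    return _SHORT_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("my-project", "my project", "proj!"):
        print(candidate, is_valid_short_name(candidate))
```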

The cluster template requires the following parameter values in addition:

| Parameter | Description |
| --- | --- |
| cluster_url | The address of the cluster to allow syncing to and from the cluster. |
| cluster_base_dir | The base path for the project on the cluster (default: ~/<project-short-name>). |
| cluster_type | The type of job scheduler used on the cluster. Currently, only LSF is supported. |

Project Structure

The generated repository will have the following structure:

├── config                  <- Configuration files, e.g., for your model if needed.
│   └── default.yaml        <- Default set of configuration parameter values.
├── data                    <- Raw input data.
├── envs                    <- Execution environments.
│   ├── default.yaml        <- Default execution environment.
│   ├── report.yaml         <- Environment for compilation of the report.
│   └── test.yaml           <- Environment for executing tests.
├── profiles                <- Snakemake profiles.
│   └── default             <- Default Snakemake profile folder.
│       └── config.yaml     <- Default Snakemake profile.
├── report                  <- All files creating the final report, usually text and figures.
│   ├── apa.csl             <- Citation style definition to be used in the report.
│   ├── literature.yaml     <- Bibliography file for the report.
│   ├── report.md           <- The report in Markdown.
│   └── pandoc-metadata.yaml<- Metadata for the report.
├── rules                   <- The place for all your Snakemake rules.
├── scripts                 <- Scripts go in here.
│   ├── model.py            <- Demo file.
│   └── vis.py              <- Demo file.
├── tests                   <- Automatic tests of the source code go in here.
│   └── test_model.py       <- Demo file.
├── .editorconfig           <- Editor agnostic configuration settings.
├── .ruff                   <- Linter and formatter settings for ruff.
├── .gitignore
├── environment.yaml        <- A file to create an environment to execute your project in.
├── LICENSE.md              <- MIT license description
├── Snakefile               <- Description of all computational steps to create results.
└── README.md

Projects generated from the cluster template additionally contain the following files:

├── envs
│   └── shell.yaml              <- An environment for shell rules.
├── profiles
│   └── cluster                 <- Cluster Snakemake profile folder.
│       └── config.yaml         <- Cluster Snakemake profile.
├── rules
│   └── sync.yaml               <- Snakemake rules to sync to and from the cluster.
├── .syncignore-receive         <- Build files to ignore when receiving from the cluster.
└── .syncignore-send            <- Local files to ignore when sending to the cluster.

License

Some ideas for this cookiecutter template are taken from cookiecutter-data-science and mkrapp/cookiecutter-reproducible-science. This template itself is MIT licensed.
