* Use pathlib in Snakefile
* Add logdir config param. Get tired because Snakemake doesn't support Path objects as input or log files.
* Use pathlib for all paths. Add version printout to Snakefile
* Add info about pathlib use
* Add details on branching structure to CONTRIBUTING.md
* Bump docs version
* Fix dbdir typo in error message of remove_human
* Enable BBMap to multiple databases (#35)
* Add support for mapping to multiple Bowtie2 databases (#36)
Closes #33
* First draft of CHANGELOG.md (#38)
Closes #37
* Rackham profile (#39)
* Fix issues with local rules not being local on Rackham
* New Slurm profile for Rackham based on Snakemake-Profiles/slurm
Closes #32, at least for now. It might be reopened at a later date.
* Bump version to 0.1.2-dev, update CHANGELOG
* Add note about --conda-prefix and editing rackham.yaml to set slurm project
* [docs] add info about mapping to multiple databases
* Update docs (#40)
* Add note about --conda-prefix and editing rackham.yaml to set slurm project
* [docs] add info about mapping to multiple databases
Closes #34 #33
* Add HUMAnN2 functional profiling
* Add rules/functional_profiling to gitignore
* Update CHANGELOG, README
* Make MetaPhlAn2 dependency for HUMAnN2 rule explicit, and enforce even if user sets metaphlan2:False in config.yaml
* Add mention about MPA2 always being run if HUMAnN2 is enabled
* Add wording about HUMAnN2 and MetaPhlAn2 in changelog.
* Add download_humann2_databases to docs
* Make note in docs about updating config.yaml after downloading databases
* Change count table rule docstring to Bowtie2
* Minor modifications to CONTRIBUTING
* Change sketch.sh cpus to 4
* Specify lineage in Kaiju summary reports
* Replace BBMap to MEGARes with groot for ARGene profiling (#51)
* Replace MEGARes with Groot for ARGene profiling
* Remove MEGARes stuff from config.yaml
* Add note about groot to changelog
* Fix output folder issues for groot
* Swap plots and graph directories
* Update remove_human resource requirements in rackham profile
* Add hierarchical clustered heatmap to sketch compare (#53)
Closes #48
* Conditionally include output files (#55)
* Add first draft overview graph of StaG
* Updated long-term vision overview flowchart
* Change interface for plot filenames in sketch_compare
* Change rule inclusions to outfile inclusion. Fix metaphlan2 invocation
* Update docs. Add overview graph
* Update CHANGELOG
* Update CONTRIBUTING
* Check for length of SAMPLES. Closes #45
* Update README
* Add Kraken2 rules, and docs
* Fix kraken2 logging
* Add note about download_minikraken2
* Fix double include of metaphlan2.smk
* Add first test of report functionality
* Add report to changelog
* Add report subsection to Running section of docs
* Expand workflow intro paragraph in report
* Replace stag.html with report.html in docs
* Fix stag-mwc link target in workflow.rst
* Add onstart, onsuccess, onerror handlers, and email messages
* Add email notifications to CHANGELOG
* Add automatic report generation
* Remove removed metaphlan2 double inclusion from CHANGELOG.
* Change version number to 0.3.0-beta
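
The commit message above notes that pathlib is used for all paths, but that Snakemake did not accept Path objects as input or log files at the time. A minimal sketch of the resulting pattern, in plain Python, could look like the following. The directory names, the `{sample}_R1.fastq.gz` filename pattern, and the helper `as_pattern` are assumptions made for illustration and are not taken from the repository.

```python
from pathlib import Path

# Hypothetical directory values; in the workflow they would come from config.yaml.
INPUTDIR = Path("input")
OUTDIR = Path("output_dir")
LOGDIR = OUTDIR / "logs"


def as_pattern(path: Path) -> str:
    """Convert a Path containing a {wildcard} into the plain string Snakemake expects."""
    return str(path)


# Compose paths with the / operator, then convert to str at the point where
# they would be handed to Snakemake (an expand() call or a log declaration).
fastq_pattern = as_pattern(INPUTDIR / "{sample}_R1.fastq.gz")
log_pattern = as_pattern(LOGDIR / "remove_human" / "{sample}.log")

# Plain-Python stand-in for what expand() would produce for a single wildcard.
samples = ["sampleA", "sampleB"]
fastq_files = [fastq_pattern.format(sample=s) for s in samples]
log_files = [log_pattern.format(sample=s) for s in samples]
```

Keeping the conversion at the boundary means the rest of a rule file can keep using Path arithmetic, and only the values passed to Snakemake are strings.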
2. Single `yaml` or `json` [cluster config files](http://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html?highlight=cluster-config#cluster-configuration).
 
+
 ### docs
 The documentation for the project is built automatically by
 [readthedocs](www.readthedocs.org) upon every commit. The HTML documentation is
@@ -72,6 +78,7 @@ documentation, but avoid committing anything but source documents to the repo.
 The documentation is written using Sphinx, so all documentation sources are
 written in [reStructuredText](http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html).
 
+
 ### envs
 The `envs` folder contains conda environments for the workflow. The ambition is
 that all dependencies should be included in the main `stag-mwc.yaml`
@@ -80,31 +87,37 @@ of conda environments in total. It is absolutely preferable if all tools used
 in the workflow are available via conda (either default channels, or bioconda,
 conda-forge, etc.).
 
+
 ### rules
 All workflow rules are organized in the `rules` folder. It contains a directory
 hierarchy organized by overall function in the workflow, e.g., the subfolder
 `taxonomic_profiling` contains rules for all taxonomic profiling tools. It is
 recommended to keep one file per logical unit or tool, so they can be easily
-toggled by a simple if-statement in the main Snakefile.
+added by a single ``include:`` in the main Snakefile.
 
 The overall concept of StaG-mwc is that analyses are performed on trimmed/cleaned
 reads that have had human sequences removed, so rules should generally start
-with the clean FASTQ files output from the `remove_human` step. This is of course
-only a general recommendation, and some tools require the raw reads for their
-analysis.
-
-Each rule file should define the expected output files of that module and add
-them to the `all_outputs` object, defined in the main Snakefile. This is
-designed to allow some inclusion logic in the main Snakefile, so components can
-be turned on or off without too much trouble. Output should typically be in a
+with the clean FASTQ files output from the `remove_human` step. This is of
+course only a general recommendation, and some tools naturally require the raw
+reads for their analysis.
+
+Each rule file should define the expected output files of that module and
+conditionally add them to the `all_outputs` object defined in the main
+Snakefile. Wrap adding of the files to the ``all_outputs`` list in an
+if-statement conditioned on the booleans defined in ``config.yaml`` under the
+``Pipeline steps included`` section. This is the preferred way, as it makes
+Snakemake aware of all rules, and uses its own dependency resolution engine to
+figure out the rule graph to produce the desired output files. This way, users
+can easily change which output files they want in ``config.yaml`` in an easy
+way, and Snakemake figures out the rest. Output should typically be in a
 subfolder inside the overall `outdir` folder. `outdir` is available as a string
 in all rule files, as it is defined in the main Snakefile based on the value
 set in `config.yaml`.
 
 Declare paths to input, output and log files using the pathlib Path objects
 INPUTDIR, OUTDIR, and LOGDIR. Note that Snakemake is not yet fully pathlib
 compatible so convert Path objects to strings inside `expand` statements and
-log file declarations.
+log file declarations. In future versions of Snakemake this will not be necessary.
 
 Tools that require databases or other reference material to work can be
 confusing or annyoing to users of the workflow. To minimize the amount of
@@ -121,10 +134,12 @@ The `scripts` folder contains all scripts required by workflow rules. These
 are typically read summarization or plotting scripts, but anything that is
 used by rules that aren't specifically rules themselves should go in here.
 
+
 ### utils
 The `utils` folder contains auxiliary scripts or tools that are useful in the
 context of StaG-mwc, but are not necessarily used directly by the workflow.
 
+
 ### config.yaml
 The configuration file is the main point of configuration of StaG-mwc. It
 should include reasonable default values for all important settings for the
@@ -141,6 +156,7 @@ The following sections reflect the folder structure inside the `rules` folder,
 and are organized by tool name. If the same tool is used in several steps, it
 is recommended to choose a more descriptive name.
 
+
 ### Snakefile
 `Snakefile` is the main workflow script. This is where all the different rules
 defined in the `rules` folder are included into the overall Snakemake workflow.
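
The `rules` section of the diff above asks each rule file to define its expected outputs and add them to `all_outputs` only when the matching boolean in `config.yaml` is set. A rough sketch of that convention in plain Python could look like this. The config keys, sample names, file suffixes, and the helper `expected_outputs` are hypothetical and chosen only to illustrate the pattern; the actual rule files may be organized differently.

```python
from pathlib import Path

# Hypothetical stand-in for the parsed config.yaml; key names are illustrative.
config = {
    "outdir": "output_dir",
    "kaiju": True,      # a boolean from the "Pipeline steps included" section
    "kraken2": False,
}

OUTDIR = Path(config["outdir"])
SAMPLES = ["sampleA", "sampleB"]
all_outputs = []  # described in the text as being defined in the main Snakefile


def expected_outputs(tool, suffix):
    """List one expected output file per sample for a tool (hypothetical layout)."""
    return [str(OUTDIR / tool / f"{sample}.{suffix}") for sample in SAMPLES]


# Each rule file works out its expected outputs unconditionally...
kaiju_outputs = expected_outputs("kaiju", "kaiju.summary")
kraken2_outputs = expected_outputs("kraken2", "kreport")

# ...but only requests them when the step is enabled in the config.
if config["kaiju"]:
    all_outputs.extend(kaiju_outputs)
if config["kraken2"]:
    all_outputs.extend(kraken2_outputs)
```

Because every rule file is still included, Snakemake knows about all rules, and the contents of `all_outputs` alone decide which parts of the rule graph need to run.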