Commit 1bf9516

Release 0.3 (#59)
* Use pathlib in Snakefile
* Add logdir config param. Get tired because Snakemake doesn't support Path objects as input or log files.
* Use pathlib for all paths. Add version printout to Snakefile
* Add info about pathlib use
* Add details on branching structure to CONTRIBUTING.md
* Bump docs version
* Fix dbdir typo in error message of remove_human
* Enable BBMap to multiple databases (#35)
* Add support for mapping to multiple Bowtie2 databases (#36) Closes #33
* First draft of CHANGELOG.md (#38) Closes #37
* Rackham profile (#39)
* Fix issues with local rules not being local on Rackham
* New Slurm profile for Rackham based on Snakemake-Profiles/slurm Closes #32, at least for now. It might be reopened a later date.
* Bump version to 0.1.2-dev, update CHANGELOG
* Add note about --conda-prefix and editing rackham.yaml to set slurm project
* [docs] add info about mapping to multiple databases
* Update docs (#40)
* Add note about --conda-prefix and editing rackham.yaml to set slurm project
* [docs] add info about mapping to multiple databases Closes #34 #33
* Add HUMAnN2 functional profiling
* Add rules/functional_profiling to gitignore
* Update CHANGELOG, README
* Make MetaPhlAn2 dependency for HUMAnN2 rule explicit, and enforce even if user sets metaphlan2:False in config.yaml
* Add mention about MPA2 always being run if HUMAnN2 is enabled
* Add wording about HUMAnN2 and MetaPhlAn2 in changelog.
* Add download_humann2_databases to docs
* Make note in docs about updating config.yaml after downloading databases
* Change count table rule docstring to Bowtie2
* Minor modifications to CONTRIBUTING
* Change sketch.sh cpus to 4
* Specify lineage in Kaiju summary reports
* Replace BBMap to MEGARes with groot for ARGene profiling (#51)
* Replace MEGARes with Groot for ARGene profiling
* Remove MEGARes stuff from config.yaml
* Add note about groot to changelog
* Fix output folder issues for groot
* Swap plots and graph directories
* Update remove_human resource requirements in rackham profile
* Add hierarchical clustered heatmap to sketch compare (#53) Closes #48
* Conditionally include output files (#55)
* Add first draft overview graph of StaG
* Updated long-term vision overview flowchart
* Change interface for plot filenames in sketch_compare
* Change rule inclusions to outfile inclusion. Fix metaphlan2 invocation
* Update docs. Add overview graph
* Update CHANGELOG
* Update CONTRIBUTING
* Check for length of SAMPLES. Closes #45
* Update README
* Add Kraken2 rules, and docs
* Fix kraken2 logging
* Add note about download_minikraken2
* Fix double include of metaphlan2.smk
* Add first test of report functionality
* Add report to changelog
* Add report subsection to Running section of docs
* Expand workflow intro paragraph in report
* Replace stag.html with report.html in docs
* Fix stag-mwc link target in workflow.rst
* Add onstart, onsuccess, onerror handlers, and email messages
* Add email notifications to CHANGELOG
* Add automatic report generation
* Remove removed metaphlan2 double inclusion from CHANGELOG.
* Change version number to 0.3.0-beta
1 parent 56e700d commit 1bf9516

41 files changed (+2892, -664 lines)

.gitignore

Lines changed: 5 additions & 1 deletion
@@ -1,8 +1,8 @@
 *
-!config.yaml
 !cluster_configs
 !cluster_configs/*
 !cluster_configs/*/*
+!config.yaml
 !docs
 !docs/Makefile
 !docs/source
@@ -17,12 +17,16 @@
 !.gitignore
 !LICENSE.md
 !README.md
+!report
+!report/*
 !resources
 !resources/*
 !rules
 !rules/*
 !rules/antibiotic_resistance
 !rules/antibiotic_resistance/*
+!rules/functional_profiling
+!rules/functional_profiling/*
 !rules/mappers
 !rules/mappers/*
 !rules/preproc

CHANGELOG.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+
+The format is inspired by [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
+and this project loosely adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+The version numbering scheme consists of three numbers separated by dots:
+`major.minor.patch`. Major versions are incremented when a substantial change
+in overall functionality of StaG-mwc is introduced. Minor versions are
+incremented for any modifications to the interface or output files (i.e.
+changes that would likely lead to different output for end-users when
+re-running an analysis, either by giving error messages or changing output
+files), and the patch version is typically incremented for any set of changes
+committed to the master branch that does not trigger any of the aforementioned
+situations.
+
+## [Unreleased]
+### Added
+- Added CHANGELOG.md
+- New functionality to run mappers several times against different databases,
+based on a list of reference databases to map against in the config file.
+- Functional profiling using HUMAnN2. Will automatically run all
+MetaPhlAn2-associated rules to produce taxonomic profiles for use in HUMAnN2.
+- Added Overview page to documentation that includes a draft of a simplified
+graph overview of the workflow (including some unfinished parts).
+- Added rules to run Kraken2.
+- Added onstart, onerror, and onsuccess messages.
+- Added `email` functionality. The workflow can now automatically send an email
+after a successful or failed run.
+- Added automatic report generation upon successful workflow completion.
+
+### Changed
+- Substantial improvements to Rackham Slurm profile, focusing on better Slurm
+log handling.
+- A few low-impact rules that can be run locally are now declared as localrules.
+- Replaced MEGARes antibiotic resistance gene mapping with GROOT resistance
+gene profiling using gene variation graphs, using a default database based on
+arg-annot.
+- Added clustered sketch comparison output heatmap.
+- Updated MetaPhlAn2 to version 2.7.8, with corresponding changes to config file.
+
+### Fixed
+- Fixed error handling if hg19 database is missing for the remove human step.
+
+### Removed
+
+
+## [0.1.1-dev] - 2018-04-30
+### Added
+
+### Changed
+- Started using Python's pathlib module for Snakefile rule input, output, and
+log file declarations. Some unsightly explicit string conversions still remain,
+due to Snakemake not being fully compatible with pathlib (yet).
+- Add details about branching structure/strategy to CONTRIBUTING.md
+
+### Removed
+
+
+## [0.1.0-dev] - 2018-04-30
+First public release
+
+### Added
+- First public release of StaG-mwc!
+- Snakemake workflow capable of read preprocessing, rudimentary sequencing
+depth assessment using kmer uniqueness counting, naive sample comparison using
+MinHash sketches, mapping to user-defined databases using BBMap or Bowtie2
+(with customizable read count summarization per annotated feature), taxonomic
+profiling using Centrifuge, Kaiju, or MetaPhlAn2, and basics required for
+antibiotic resistance gene detection using MEGARes.
+- First public draft of docs, available at https://stag-mwc.readthedocs.org.

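Note on the versioning scheme in CHANGELOG.md: the `major.minor.patch` rules in the changelog header can be illustrated with a small helper. This is only a sketch and not part of the commit. The function name `bump_version`, resetting the lower parts to zero (the usual Semantic Versioning convention), and keeping a suffix such as `-dev` unchanged are all assumptions made for illustration.

```python
import re


def bump_version(version: str, part: str) -> str:
    """Increment one part of a major.minor.patch version (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(-[0-9A-Za-z.]+)?", version)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups()[:3])
    suffix = match.group(4) or ""  # e.g. "-dev"; kept as-is (assumption)

    if part == "major":    # substantial change in overall functionality
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":  # change to the interface or to output files
        minor, patch = minor + 1, 0
    elif part == "patch":  # any other set of changes merged to master
        patch += 1
    else:
        raise ValueError(f"unknown version part: {part!r}")
    return f"{major}.{minor}.{patch}{suffix}"


print(bump_version("0.1.1-dev", "patch"))
print(bump_version("0.1.2-dev", "minor"))
```

The first call reproduces the "Bump version to 0.1.2-dev" step mentioned in the commit message.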
CONTRIBUTING.md

Lines changed: 39 additions & 23 deletions
@@ -8,8 +8,8 @@ bug reports, feature requests, or general improvement discussion topics.
 The main branch of StaG-mwc should always be stable and reliable. All
 development should be based on the develop branch. Please create new feature
 branches from the develop branch. The develop branch is then merged into the
-master branch when enough improvements have accrued. The typical procedure to
-develop new features or fix bugs in StaG-mwc looks something like this:
+master branch when enough improvements have accumulated. The typical procedure
+to develop new features or fix bugs in StaG-mwc looks something like this:
 
 1. Fork or clone the repository.
 2. Checkout the develop branch and create a new feature branch from there.
@@ -23,23 +23,27 @@ develop new features or fix bugs in StaG-mwc looks something like this:
 config.yaml file.
 4. If a new feature has been added, document it in the Sphinx documentation.
 4. Commit changes to your fork/clone.
-5. Create a pull request (PR) with some descriptions of the work you have
-done and possibly some explanations for potentially tricky bits.
-6. When the feature is considered complete, we bump the version number and
-merge the PR back to the develop branch.
+5. Create a pull request (PR) to the develop branch with some descriptions of
+the work you have done and possibly some explanations for potentially tricky
+bits.
+6. When the feature is considered complete, we bump the version number depending
+on the size and impact of the PR before merging the PR to the develop branch.
+
 
 ### Releases
-New releases are made whenever enough new features have accrued on the develop
-branch. Before creating a new release, ensure the following things have been
-taken care of:
+New releases are made whenever enough new features have accumulated on the
+develop branch. Before creating a new release, create a staging branch off of
+the develop branch, and ensure the following things have been taken care of:
 
 * All pending features that should be included in the upcoming release are
-merged into the develop branch.
-* Double check that documentation is up-to-date for implemented features.
+merged.
+* Double check that documentation is available and up-to-date for implemented
+features.
 * Check that the version number in the documentation matches the Snakefile.
 
-Then, merge the develop branch into master, squashing all commits, and tag
-the new release.
+Then, merge the staging branch into master, squashing all commits, and tag
+the new release. Afterwards, merge the staging branch back to develop so all
+changes in the staging branch are present in develop.
 
 
 ## Code organization
@@ -58,12 +62,14 @@ git repository:
 README.md # The README shown in the github repo
 Snakefile # The main workflow script
 
+
 ### cluster_configs
 The `cluster_configs` directory should contain either:
 
 1. Folders representing entire [Snakemake cluster profiles](https://snakemake.readthedocs.io/en/stable/executable.html#profiles).
 2. Single `yaml` or `json` [cluster config files](http://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html?highlight=cluster-config#cluster-configuration).
 
+
 ### docs
 The documentation for the project is built automatically by
 [readthedocs](www.readthedocs.org) upon every commit. The HTML documentation is
@@ -72,6 +78,7 @@ documentation, but avoid committing anything but source documents to the repo.
 The documentation is written using Sphinx, so all documentation sources are
 written in [reStructuredText](http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html).
 
+
 ### envs
 The `envs` folder contains conda environments for the workflow. The ambition is
 that all dependencies should be included in the main `stag-mwc.yaml`
@@ -80,31 +87,37 @@ of conda environments in total. It is absolutely preferable if all tools used
 in the workflow are available via conda (either default channels, or bioconda,
 conda-forge, etc.).
 
+
 ### rules
 All workflow rules are organized in the `rules` folder. It contains a directory
 hierarchy organized by overall function in the workflow, e.g., the subfolder
 `taxonomic_profiling` contains rules for all taxonomic profiling tools. It is
 recommended to keep one file per logical unit or tool, so they can be easily
-toggled by a simple if-statement in the main Snakefile.
+added by a single ``include:`` in the main Snakefile.
 
 The overall concept of StaG-mwc is that analyses are performed on trimmed/cleaned
 reads that have had human sequences removed, so rules should generally start
-with the clean FASTQ files output from the `remove_human` step. This is of course
-only a general recommendation, and some tools require the raw reads for their
-analysis.
-
-Each rule file should define the expected output files of that module and add
-them to the `all_outputs` object, defined in the main Snakefile. This is
-designed to allow some inclusion logic in the main Snakefile, so components can
-be turned on or off without too much trouble. Output should typically be in a
+with the clean FASTQ files output from the `remove_human` step. This is of
+course only a general recommendation, and some tools naturally require the raw
+reads for their analysis.
+
+Each rule file should define the expected output files of that module and
+conditionally add them to the `all_outputs` object defined in the main
+Snakefile. Wrap adding of the files to the ``all_outputs`` list in an
+if-statement conditioned on the booleans defined in ``config.yaml`` under the
+``Pipeline steps included`` section. This is the preferred way, as it makes
+Snakemake aware of all rules, and uses its own dependency resolution engine to
+figure out the rule graph to produce the desired output files. This way, users
+can easily change which output files they want in ``config.yaml`` in an easy
+way, and Snakemake figures out the rest. Output should typically be in a
 subfolder inside the overall `outdir` folder. `outdir` is available as a string
 in all rule files, as it is defined in the main Snakefile based on the value
 set in `config.yaml`.
 
 Declare paths to input, output and log files using the pathlib Path objects
 INPUTDIR, OUTDIR, and LOGDIR. Note that Snakemake is not yet fully pathlib
 compatible so convert Path objects to strings inside `expand` statements and
-log file declarations.
+log file declarations. In future versions of Snakemake this will not be necessary.
 
 Tools that require databases or other reference material to work can be
 confusing or annyoing to users of the workflow. To minimize the amount of
@@ -121,10 +134,12 @@ The `scripts` folder contains all scripts required by workflow rules. These
 are typically read summarization or plotting scripts, but anything that is
 used by rules that aren't specifically rules themselves should go in here.
 
+
 ### utils
 The `utils` folder contains auxiliary scripts or tools that are useful in the
 context of StaG-mwc, but are not necessarily used directly by the workflow.
 
+
 ### config.yaml
 The configuration file is the main point of configuration of StaG-mwc. It
 should include reasonable default values for all important settings for the
@@ -141,6 +156,7 @@ The following sections reflect the folder structure inside the `rules` folder,
 and are organized by tool name. If the same tool is used in several steps, it
 is recommended to choose a more descriptive name.
 
+
 ### Snakefile
 `Snakefile` is the main workflow script. This is where all the different rules
 defined in the `rules` folder are included into the overall Snakemake workflow.

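Note on the `rules` section of CONTRIBUTING.md: the pattern it describes (build paths with pathlib, convert them to strings, and add outputs to `all_outputs` only when the matching config boolean is set) might look roughly like the sketch below. It is plain Python so that it runs without Snakemake. The config keys, tool names, file suffixes, sample names, and the helper `expected_files` are hypothetical and not taken from the repository. In a real rule file, the list comprehension would typically be Snakemake's `expand(str(pattern), sample=SAMPLES)`.

```python
from pathlib import Path

# Hypothetical stand-in for the parsed config.yaml; real key names may differ.
config = {
    "outdir": "output_dir",
    "kaiju": True,     # step enabled
    "kraken2": False,  # step disabled
}
SAMPLES = ["sample1", "sample2"]  # hypothetical sample names

OUTDIR = Path(config["outdir"])
all_outputs = []  # defined once in the main Snakefile


def expected_files(tool: str, suffix: str) -> list:
    """List one expected output file per sample for a tool (hypothetical helper)."""
    # Build the pattern with pathlib, then convert it to a string before
    # filling in the wildcard, as the guidelines ask for `expand` statements.
    pattern = str(OUTDIR / tool / ("{sample}" + suffix))
    return [pattern.format(sample=sample) for sample in SAMPLES]


# Each rule file defines its outputs, then adds them conditionally.
kaiju_reports = expected_files("kaiju", ".kaiju.summary.txt")
if config["kaiju"]:
    all_outputs.extend(kaiju_reports)

kraken2_reports = expected_files("kraken2", ".kreport")
if config["kraken2"]:
    all_outputs.extend(kraken2_reports)

print(all_outputs)
```

Because only the requested files end up in `all_outputs`, the rules for disabled steps stay defined but are never scheduled, which matches the dependency-resolution argument in the text.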
README.md

Lines changed: 12 additions & 12 deletions
@@ -1,7 +1,7 @@
 # StaG Metagenomic Workflow Collaboration (mwc)
 
-[![Snakemake](https://img.shields.io/badge/snakemake-≥3.12.0-brightgreen.svg)](https://snakemake.bitbucket.io)
-[![Build Status](https://travis-ci.org/snakemake-workflows/mwc.svg?branch=master)](https://travis-ci.org/snakemake-workflows/mwc)
+[![Snakemake](https://img.shields.io/badge/snakemake-≥4.8.1-brightgreen.svg)](https://snakemake.bitbucket.io)
+<!--[![Build Status](https://travis-ci.org/snakemake-workflows/mwc.svg?branch=master)](https://travis-ci.org/snakemake-workflows/mwc) -->
 ![StaG mwc logo](docs/source/img/stag_head_text.png "StaG mwc")
 
 This repo contains the code for a Snakemake workflow of the StaG Metagenomic
@@ -26,12 +26,8 @@ base environment. Conda will automatically install the required versions of
 all tools required to run StaG-mwc.
 
 ### Step 1: Install workflow
-<!--download and extract the [latest release](https://github.com/snakemake-workflows/mwc/releases). -->
-
-If you simply want to use this workflow, clone the repository: `git clone
-git@github.com:boulund/mwc`. If you intend to modify or further develop this
-workflow, you are welcome to fork this reposity. Please consider sharing
-potential improvements via a pull request.
+To use StaG-mwc, you need a local copy of the workflow repository. Start by
+making a clone of the repository: `git clone git@github.com:boulund/stag-mwc`.
 
 If you use StaG-mwc in a publication, please credit the authors by citing
 the URL of this repository and, when available, its DOI. Also, don't forget to
@@ -72,13 +68,17 @@ previously downloaded databases are reused. See the
 
 ## Testing
 Tests are currently not implemented. The ambition is that mwc will contain
-extensive tests to verify functionality. They should be executed via continuous
-integration with Travis CI.
+extensive tests to verify functionality. We plan to implement automated linting
+and testing on a small test data set via continuous integration.
 
 
 ## Contributing
-Refer to the contributing guidelines in `CONTRIBUTING.md` for instructions on how to
-contribute to StaG-mwc.
+Refer to the contributing guidelines in `CONTRIBUTING.md` for instructions on
+how to contribute to StaG-mwc.
+
+If you intend to modify or further develop this workflow, you are welcome to
+fork this reposity. Please consider sharing potential improvements via a pull
+request.
 
 # Logo attribution
 <a href="https://www.freepik.com/free-photos-vectors/animal">Animal vector created by Patrickss - Freepik.com</a>
