ctmrbio
diff --git a/.gitignore b/.gitignore
Lines changed: 5 additions & 1 deletion
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 70 additions & 0 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 39 additions & 23 deletions
diff --git a/README.md b/README.md
Lines changed: 12 additions & 12 deletions
@@ -1,8 +1,8 @@
 *
-!config.yaml
 !cluster_configs
 !cluster_configs/*
 !cluster_configs/*/*
+!config.yaml
 !docs
 !docs/Makefile
 !docs/source
@@ -17,12 +17,16 @@
 !.gitignore
 !LICENSE.md
 !README.md
+!report
+!report/*
 !resources
 !resources/*
 !rules
 !rules/*
 !rules/antibiotic_resistance
 !rules/antibiotic_resistance/*
+!rules/functional_profiling
+!rules/functional_profiling/*
 !rules/mappers
 !rules/mappers/*
 !rules/preproc
 
@@ -0,0 +1,70 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+
+The format is inspired by [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
+and this project loosely adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).  
+The version numbering scheme consists of three numbers separated by dots:
+`major.minor.patch`. Major versions are incremented when a substantial change
+in overall functionality of StaG-mwc is introduced. Minor versions are
+incremented for any modifications to the interface or output files (i.e.
+changes that would likely lead to different output for end-users when
+re-running an analysis, either by giving error messages or changing output
+files), and the patch version is typically incremented for any set of changes
+committed to the master branch that does not trigger any of the aforementioned
+situations.
+
+## [Unreleased]
+### Added
+- Added CHANGELOG.md
+- New functionality to run mappers several times against different databases,
+  based on a list of reference databases to map against in the config file.
+- Functional profiling using HUMAnN2. Will automatically run all
+  MetaPhlAn2-associated rules to produce taxonomic profiles for use in HUMAnN2.
+- Added Overview page to documentation that includes a draft of a simplified
+  graph overview of the workflow (including some unfinished parts).
+- Added rules to run Kraken2.
+- Added onstart, onerror, and onsuccess messages.
+- Added `email` functionality. The workflow can now automatically send an email
+  after a successful or failed run.
+- Added automatic report generation upon successful workflow completion.
+
+### Changed
+- Substantial improvements to Rackham Slurm profile, focusing on better Slurm
+  log handling.
+- A few low-impact rules that can be run locally are now declared as localrules.
+- Replaced MEGARes antibiotic resistance gene mapping with GROOT resistance
+  gene profiling using gene variation graphs, using a default database based on
+  arg-annot.
+- Added clustered sketch comparison output heatmap.
+- Updated MetaPhlAn2 to version 2.7.8, with corresponding changes to config file.
+
+### Fixed
+- Fixed error handling if hg19 database is missing for the remove human step.
+
+### Removed
+
+
+## [0.1.1-dev] - 2018-04-30
+### Added
+
+### Changed
+- Started using Python's pathlib module for Snakefile rule input, output, and
+  log file declarations. Some unsightly explicit string conversions still remain,
+  due to Snakemake not being fully compatible with pathlib (yet).
+- Added details about branching structure/strategy to CONTRIBUTING.md.
+
+### Removed
+
+
+## [0.1.0-dev] - 2018-04-30 
+First public release
+
+### Added
+- First public release of StaG-mwc! 
+- Snakemake workflow capable of read preprocessing, rudimentary sequencing
+  depth assessment using kmer uniqueness counting, naive sample comparison using
+  MinHash sketches, mapping to user-defined databases using BBMap or Bowtie2
+  (with customizable read count summarization per annotated feature), taxonomic
+  profiling using Centrifuge, Kaiju, or MetaPhlAn2, and basics required for
+  antibiotic resistance gene detection using MEGARes.
+- First public draft of docs, available at https://stag-mwc.readthedocs.org.
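The `major.minor.patch` scheme described in the changelog preamble can be illustrated with a minimal Python sketch. The `Change` enum and `bump_version` function are hypothetical names introduced only for this illustration and are not part of StaG-mwc. Resetting the lower components to zero follows the usual Semantic Versioning convention, which is an assumption here, since the changelog only says the project loosely adheres to it.

```python
from enum import Enum


class Change(Enum):
    """Kinds of change named in the changelog preamble (hypothetical helper)."""
    SUBSTANTIAL_FUNCTIONALITY = "major"  # substantial change in overall functionality
    INTERFACE_OR_OUTPUT = "minor"        # interface or output files change
    OTHER = "patch"                      # anything else committed to master


def bump_version(version: str, change: Change) -> str:
    """Return the next version string, keeping a suffix such as '-dev'."""
    core, sep, suffix = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if change is Change.SUBSTANTIAL_FUNCTIONALITY:
        # Assumption: lower components reset to zero, as in Semantic Versioning.
        major, minor, patch = major + 1, 0, 0
    elif change is Change.INTERFACE_OR_OUTPUT:
        minor, patch = minor + 1, 0
    else:
        patch += 1
    return f"{major}.{minor}.{patch}{sep}{suffix}"


print(bump_version("0.1.0-dev", Change.OTHER))
```

Under these assumptions, a change that alters output files would take `0.1.1-dev` to `0.2.0-dev`.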
@@ -8,8 +8,8 @@ bug reports, feature requests, or general improvement discussion topics.
 The main branch of StaG-mwc should always be stable and reliable. All
 development should be based on the develop branch. Please create new feature
 branches from the develop branch. The develop branch is then merged into the
-master branch when enough improvements have accrued. The typical procedure to
-develop new features or fix bugs in StaG-mwc looks something like this:
+master branch when enough improvements have accumulated. The typical procedure
+to develop new features or fix bugs in StaG-mwc looks something like this:
 
 1. Fork or clone the repository.
 2. Checkout the develop branch and create a new feature branch from there.
@@ -23,23 +23,27 @@ develop new features or fix bugs in StaG-mwc looks something like this:
    config.yaml file.
 4. If a new feature has been added, document it in the Sphinx documentation.
 4. Commit changes to your fork/clone.
-5. Create a pull request (PR) with some descriptions of the work you have
-   done and possibly some explanations for potentially tricky bits.
-6. When the feature is considered complete, we bump the version number and
-   merge the PR back to the develop branch.
+5. Create a pull request (PR) to the develop branch with some descriptions of
+   the work you have done and possibly some explanations for potentially tricky
+   bits.
+6. When the feature is considered complete, we bump the version number depending
+   on the size and impact of the PR before merging the PR to the develop branch.
+
 
 ### Releases
-New releases are made whenever enough new features have accrued on the develop
-branch. Before creating a new release, ensure the following things have been
-taken care of:
+New releases are made whenever enough new features have accumulated on the
+develop branch. Before creating a new release, create a staging branch off of
+the develop branch, and ensure the following things have been taken care of:
 
 * All pending features that should be included in the upcoming release are
-  merged into the develop branch.
-* Double check that documentation is up-to-date for implemented features.
+  merged.
+* Double check that documentation is available and up-to-date for implemented
+  features.
 * Check that the version number in the documentation matches the Snakefile.
 
-Then, merge the develop branch into master, squashing all commits, and tag
-the new release.
+Then, merge the staging branch into master, squashing all commits, and tag
+the new release. Afterwards, merge the staging branch back to develop so all
+changes in the staging branch are present in develop.
 
 
 ## Code organization
@@ -58,12 +62,14 @@ git repository:
 	README.md           # The README shown in the github repo
 	Snakefile           # The main workflow script
 
+
 ### cluster_configs
 The `cluster_configs` directory should contain either:
 
 1. Folders representing entire [Snakemake cluster profiles](https://snakemake.readthedocs.io/en/stable/executable.html#profiles).
 2. Single `yaml` or `json` [cluster config files](http://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html?highlight=cluster-config#cluster-configuration).
 
+
 ### docs 
 The documentation for the project is built automatically by
 [readthedocs](https://readthedocs.org) upon every commit. The HTML documentation is
@@ -72,6 +78,7 @@ documentation, but avoid committing anything but source documents to the repo.
 The documentation is written using Sphinx, so all documentation sources are
 written in [reStructuredText](http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html).
 
+
 ### envs
 The `envs` folder contains conda environments for the workflow. The ambition is
 that all dependencies should be included in the main `stag-mwc.yaml`
@@ -80,31 +87,37 @@ of conda environments in total. It is absolutely preferable if all tools used
 in the workflow are available via conda (either default channels, or bioconda,
 conda-forge, etc.).
 
+
 ### rules
 All workflow rules are organized in the `rules` folder. It contains a directory
 hierarchy organized by overall function in the workflow, e.g., the subfolder
 `taxonomic_profiling` contains rules for all taxonomic profiling tools. It is
 recommended to keep one file per logical unit or tool, so they can be easily
-toggled by a simple if-statement in the main Snakefile.
+added by a single `include:` in the main Snakefile.
 
 The overall concept of StaG-mwc is that analyses are performed on trimmed/cleaned
 reads that have had human sequences removed, so rules should generally start
-with the clean FASTQ files output from the `remove_human` step. This is of course
-only a general recommendation, and some tools require the raw reads for their
-analysis.
-
-Each rule file should define the expected output files of that module and add
-them to the `all_outputs` object, defined in the main Snakefile. This is
-designed to allow some inclusion logic in the main Snakefile, so components can
-be turned on or off without too much trouble. Output should typically be in a
+with the clean FASTQ files output from the `remove_human` step. This is of
+course only a general recommendation, and some tools naturally require the raw
+reads for their analysis.
+
+Each rule file should define the expected output files of that module and
+conditionally add them to the `all_outputs` object defined in the main
+Snakefile. Wrap the code that adds files to the `all_outputs` list in an
+if-statement conditioned on the booleans defined in `config.yaml` under the
+`Pipeline steps included` section. This is the preferred way, as it makes
+Snakemake aware of all rules and lets its own dependency resolution engine
+figure out the rule graph needed to produce the desired output files. Users
+can then choose which output files they want by editing `config.yaml`, and
+Snakemake figures out the rest. Output should typically be in a
 subfolder inside the overall `outdir` folder. `outdir` is available as a string
 in all rule files, as it is defined in the main Snakefile based on the value
 set in `config.yaml`.
 
 Declare paths to input, output and log files using the pathlib Path objects
 INPUTDIR, OUTDIR, and LOGDIR. Note that Snakemake is not yet fully pathlib
 compatible so convert Path objects to strings inside `expand` statements and
-log file declarations.
+log file declarations. This may become unnecessary in future versions of Snakemake.
 
 Tools that require databases or other reference material to work can be
 confusing or annoying to users of the workflow. To minimize the amount of
@@ -121,10 +134,12 @@ The `scripts` folder contains all scripts required by workflow rules. These
 are typically read summarization or plotting scripts, but anything that is
 used by rules that aren't specifically rules themselves should go in here.
 
+
 ### utils
 The `utils` folder contains auxiliary scripts or tools that are useful in the
 context of StaG-mwc, but are not necessarily used directly by the workflow.
 
+
 ### config.yaml
 The configuration file is the main point of configuration of StaG-mwc. It
 should include reasonable default values for all important settings for the
@@ -141,6 +156,7 @@ The following sections reflect the folder structure inside the `rules` folder,
 and are organized by tool name. If the same tool is used in several steps, it
 is recommended to choose a more descriptive name. 
 
+
 ### Snakefile
 `Snakefile` is the main workflow script. This is where all the different rules
 defined in the `rules` folder are included into the overall Snakemake workflow. 
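
The conditional `all_outputs` pattern described in the updated `rules` section of CONTRIBUTING.md can be sketched in plain Python, outside of Snakemake. This is only an illustration under stated assumptions: the config keys (`taxonomic_profile`, `kaiju`, `kraken2`), the sample names, and the output file names are hypothetical and do not come from the actual `config.yaml`. The sketch also shows the explicit `str()` conversion of pathlib `Path` objects that the guidelines ask for.

```python
from pathlib import Path

# Stand-in for the parsed config.yaml; all key names here are hypothetical.
config = {
    "outdir": "output_dir",
    "taxonomic_profile": {"kaiju": True, "kraken2": False},
}
samples = ["sampleA", "sampleB"]  # hypothetical sample names

OUTDIR = Path(config["outdir"])
all_outputs = []  # in the workflow, this list is defined in the main Snakefile

# In a rule file: always declare the expected outputs of the module,
# but only add them to all_outputs when the step is enabled in the config.
kaiju_outputs = [str(OUTDIR / "kaiju" / f"{sample}.kaiju") for sample in samples]
if config["taxonomic_profile"]["kaiju"]:
    all_outputs.extend(kaiju_outputs)

kraken2_outputs = [str(OUTDIR / "kraken2" / f"{sample}.kreport") for sample in samples]
if config["taxonomic_profile"]["kraken2"]:
    all_outputs.extend(kraken2_outputs)

print(all_outputs)
```

Because every rule file is always included, a workflow engine sees all rules, and the contents of `all_outputs` alone decide which of them need to run. Here only the Kaiju outputs are requested, since the hypothetical `kraken2` flag is `False`.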
 
@@ -1,7 +1,7 @@
 # StaG Metagenomic Workflow Collaboration (mwc)
 
-[![Snakemake](https://img.shields.io/badge/snakemake-≥3.12.0-brightgreen.svg)](https://snakemake.bitbucket.io)
-[![Build Status](https://travis-ci.org/snakemake-workflows/mwc.svg?branch=master)](https://travis-ci.org/snakemake-workflows/mwc)
+[![Snakemake](https://img.shields.io/badge/snakemake-≥4.8.1-brightgreen.svg)](https://snakemake.bitbucket.io)
+<!--[![Build Status](https://travis-ci.org/snakemake-workflows/mwc.svg?branch=master)](https://travis-ci.org/snakemake-workflows/mwc) -->
 ![StaG mwc logo](docs/source/img/stag_head_text.png "StaG mwc")
 
 This repo contains the code for a Snakemake workflow of the StaG Metagenomic
@@ -26,12 +26,8 @@ base environment. Conda will automatically install the required versions of
 all tools required to run StaG-mwc.
 
 ### Step 1: Install workflow
-<!--download and extract the [latest release](https://github.com/snakemake-workflows/mwc/releases). -->
-
-If you simply want to use this workflow, clone the repository: `git clone
-git@github.com:boulund/mwc`. If you intend to modify or further develop this
-workflow, you are welcome to fork this reposity. Please consider sharing
-potential improvements via a pull request.
+To use StaG-mwc, you need a local copy of the workflow repository. Start by
+making a clone of the repository: `git clone git@github.com:boulund/stag-mwc`. 
 
 If you use StaG-mwc in a publication, please credit the authors by citing
 the URL of this repository and, when available, its DOI. Also, don't forget to
@@ -72,13 +68,17 @@ previously downloaded databases are reused. See the
 
 ## Testing
 Tests are currently not implemented. The ambition is that mwc will contain
-extensive tests to verify functionality. They should be executed via continuous
-integration with Travis CI. 
+extensive tests to verify functionality. We plan to implement automated linting
+and testing on a small test data set via continuous integration.
 
 
 ## Contributing
-Refer to the contributing guidelines in `CONTRIBUTING.md` for instructions on how to
-contribute to StaG-mwc.
+Refer to the contributing guidelines in `CONTRIBUTING.md` for instructions on
+how to contribute to StaG-mwc.
+
+If you intend to modify or further develop this workflow, you are welcome to
+fork this repository. Please consider sharing potential improvements via a pull
+request.
 
 # Logo attribution
 <a href="https://www.freepik.com/free-photos-vectors/animal">Animal vector created by Patrickss - Freepik.com</a>