Two pass benchmarking pipeline:

A snakemake pipeline to process the data for the two-pass alignment paper.

Workflow without read simulation:

Workflow with read simulation:

Install and run:

To run the pipeline you first need conda. Conda environments for each rule will be created automatically by snakemake. The environment yamls are in rules/env_yamls. If you want to use the same conda environments for all pipeline runs, rather than building them for each run, you can use the --conda-prefix parameter of snakemake to set a shared location for environments to be stored in.

To run the benchmarking pipeline for the Arabidopsis DRS data will require c.a. 75GB of disk space.

Something like:

# need to clone recursively to get submodules
git clone --recursive https://github.com/bartongroup/two_pass_alignment_pipeline.git
cd two_pass_alignment_pipeline

# a specific minimap2 bugfix is required which is not yet on bioconda
# this is contained as a submodule, which needs building
cd scripts/minimap2 && make
cd ../..

# make snakemake environment
conda env create -f 2passpipeline.yml
conda activate 2passpipeline

# download data and annotations for e.g. Arabidopsis DRS benchmarking
cd annotations/arabidopsis
./fetch_annotations.sh
cd ../../pipeline/arabidopsis_drs
./fetch_data.sh
cd ..

# set up the conda environments required to run the pipeline
snakemake -d arabidopsis_drs \
  --use-conda \
  --create-envs-only \
  --conda-prefix <shared conda env location>

# run the pipeline on Arabidopsis DRS data, using SGE cluster
# note that the cluster settings may need to be altered for your setup
snakemake -d arabidopsis_drs \
  --use-conda \
  --conda-prefix <shared conda env location> \
  --cluster "qsub -V -cwd -pe smp {threads}" \
  -j 999

# To run the pipeline without a cluster i.e. in serial:
snakemake -d arabidopsis_drs \
  --use-conda \
  --conda-prefix <shared conda env location> \
  -j 999

Requirements:

Before running:

conda
snakemake

Installed using conda/snakemake

All of these things should be installed in their own environments by snakemake so you shouldn't have to worry about them:

For 2passtools:

2passtools
python 3 (tested with 3.6)
numpy
scipy
pysam
ncls
scikit-learn
click
click-log

For yanosim:

yanosim
python 3
numpy
scipy
pysam
click

For pipeline:

minimap2 >= 2.17
samtools
stringtie >= 2.0
bedtools
gffcompare

To run FLAIR:

flair
kerneltree

TODO:

Make annotation optional so pipeline can be run on organisms without annotations.

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
annotations		annotations
images		images
notebooks		notebooks
pipeline		pipeline
rules		rules
scripts		scripts
.gitignore		.gitignore
.gitmodules		.gitmodules
2passpipeline.yml		2passpipeline.yml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

annotations

annotations

images

images

notebooks

notebooks

pipeline

pipeline

rules

rules

scripts

scripts

.gitignore

.gitignore

.gitmodules

.gitmodules

2passpipeline.yml

2passpipeline.yml

README.md

README.md

Repository files navigation

Two pass benchmarking pipeline:

Install and run:

Requirements:

Before running:

Installed using conda/snakemake

TODO:

About

Releases 1

Packages

Languages

bartongroup/two_pass_alignment_pipeline

Folders and files

Latest commit

History

Repository files navigation

Two pass benchmarking pipeline:

Install and run:

Requirements:

Before running:

Installed using conda/snakemake

TODO:

About

Resources

Stars

Watchers

Forks

Languages