Pappenheim ONT-SARS-CoV-2

Pipeline for sequencing of SARS-CoV-2 on local workstations with realtime-basecalling (GPU). This pipeline is based on snakemake and uses conda. All intermediary outputs are long-pivoted.

Below is the dag for the pipeline, for a single sample:

What it does

Pappenheim is designed to trail a ONT-minknow sequencing run. When you have started sequencing in minknow, you can start this pipeline and it will progress through the following steps:

On a batch level:

Input validation of the sample sheet
Starting rampart on the rundir for realtime sequencing oversight

On a sample level:

When the sequencing and basecalling is complete, the following steps are completed as well:

Filtering of reads using artic guppyplex
Consensus calling using artic minion via nanopolish
- Using the artic nCov-2019 V3 reference scheme
Variant typing using pangolin and nextclade
Optionally: Upload the variant data to a central server which can integrate patient data (pappenheim-receiver).

When pappenheim runs, it automatically checks and installs the newest versions of pangolin and nextclade. Given that you have a working internet connection at the time of starting the pipeline.

Usage

Pappenheim needs information on the samples you are analysing and the rundir directory where the raw data is stored. Below is a description of why and how these arguments must be formatted for pappenheim.

CLI-argument: `samplesheet`

Have a .xlsx-sample sheet ready for input to the pipeline. This samplesheet should have at least to columns present: "barcode" and "sample id". The barcode column should contain barcode names in the EXP-NBD104 and EXP-NBD114 kits. The "sample id" column has specific requirements to the formatting, but please be aware that spaces will be truncated. Positive- and negative controls are recognized with the following specification: Positive controls must have a sample id starting with "seqpos". Negative controls must have a sample id ending with "neg". For both types of controls the recognition is case insensitive.

CLI-argument: `rundir`

Because pappenheim is designed to trail the output of minknow, you must first have started the sequencing (including high accuracy basecalling) in the minknow user interface and decided where the output should be written. Pappenheim needs to know this rundir such that it can start taking the correct reads for downstream analysis.

Starting the pipeline

When you have a sample sheet ready and know the rundir, you can start the pappenheim pipeline:

Because pappenheim starts rampart automatically based on the given rundir, you should start pappenheim just after minknow has started sequencing.

cd ~/pappenheim
snakemake --config samplesheet='path/to/samplesheet.csv' rundir='path/to/minknow-output/'

First, the pappenheim pipeline validates the sample sheet and checks that the necessary columns exists and are correctly formatted. It also checks that the barcodes are unique. It then proceeds to check that the rundir exists. If it doesn't, pappenheim waits a few minutes and tries again.

When the sequencing is done in minknow, minknow writes a specific file to the rundir: "sequencing_summary_*.txt". This file is necessary for consensus calling and thus pappenheim can only start when minknow is done basecalling.

Installation

Install conda (if you haven't already):

cd ~
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh

We recommend to use mamba (inside conda) for installing packages onto the miniconda environment.

Hint: mamba is simply faster than the stock conda dependency solver and installer.
```
conda install mamba -n base -c conda-forge
```

Clone this repository

# install pappenheim in the home directory
cd ~

# Make sure you clone recursively, in order for the artic and pangolin submodules to be included in the download.
git clone --recurse-submodules https://github.com/KMA-Aarhus/pappenheim.git

Install snakemake and other dependencies needed for pappenheim:

We recommend that you install pappenheim in its own enclosed environment. The following command call creates a conda environment, and install pappenheim into it.
```
conda create -n pappenheim && conda activate pappenheim && mamba env update --file ~/pappenheim/environment.yaml 
```

Please create an issue (in this repo) if you encounter any problems or unanswered questions during installation or use.

Caveats

There is a few updater-rules (pangolin_updater and nextclade_updater), which run once each time the pipeline is started. The idea is to update the environment-yaml file that the internal conda handler uses to build conda environments for the jobs. Unfortunately, snakemake only decides whether to update conda environments before building the job-graph. This means that even though the environment-yaml files are updated before the pangolin and nextclade jobs are run, the actual environments are only updated next time the pipeline is run. The worst case scenario is, that if you have an installation which you haven't used for a long time, the pangolin/nextclade calls might be outdated. A fix would be to somehow force snakemake to reinstall those specific conda environments each time the pipeline is run.

Affiliations

Pappenheim is developed at Department of Clinical Microbiology (KMA) at Aarhus Universityhospital, Denmark, in response to the political decision of sequencing all positive SARS-CoV-2 samples. We are highly open to involving collaborators from other regions.

Name		Name	Last commit message	Last commit date
Latest commit History 368 Commits
CoverMon @ 96009c0		CoverMon @ 96009c0
artic-ncov2019 @ 7e359da		artic-ncov2019 @ 7e359da
configs		configs
default		default
documentation		documentation
envs		envs
logs		logs
opentest		opentest
pangolin @ a8ee773		pangolin @ a8ee773
primer_schemes/ONT-Midnight-nCoV-2019/V1		primer_schemes/ONT-Midnight-nCoV-2019/V1
rampart		rampart
scripts		scripts
test_data		test_data
workstation_setup		workstation_setup
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md
config.yaml		config.yaml
data		data
environment.yaml		environment.yaml
snakefile		snakefile

KMA-Aarhus/pappenheim

Folders and files

Latest commit

History

Repository files navigation

Pappenheim ONT-SARS-CoV-2

What it does

On a batch level:

On a sample level:

Usage

CLI-argument: samplesheet

CLI-argument: rundir

Starting the pipeline

Installation

Caveats

Affiliations

pap2

About

Topics

Resources

Stars

Watchers

Forks

Languages

CLI-argument: `samplesheet`

CLI-argument: `rundir`