Ancient metagenome assembly pipeline

Introduction

This is basic pipeline to de-novo assemble metagenomic sequencing data. It is written in Snakemake and it was adapted to include ancient DNA specific steps, such as the evaluation of the presence of ancient DNA damage using PyDamage that are usually absent in metagenome assembly pipelines. This pipeline heavily borrows ideas from the Nextflow pipeline MADMAN, which was deprecated and its ideas further developed in the Nextflow pipeline nf-core/mag.

Quick start

To be able to run the pipeline, Snakemake with a minimal version of 7.0 is necessary. The easiest way to install the dependencies of this program and to have reproducible results is to create a new conda environment using the environment file provided with it.

wget https://raw.githubusercontent.com/alexhbnr/ancient_metagenome_assembly/main/environment.yml
conda env create -f environment.yml

After activating the environment using

conda activate ancient_metagenome_assembly

the necessary Python environment has been created.

Next, the pipeline itself needs to be downloaded. This can be easily down by cloning this repository to your computer via git

git clone https://github.com/alexhbnr/ancient_metagenome_assembly.git

or by downloading the zip file and extracting this

wget -O ancient_metagenome_assembly.zip https://github.com/alexhbnr/ancient_metagenome_assembly/archive/refs/heads/main.zip
unzip ancient_metagenome_assembly.zip

Finally, the configuration file and sample table have to be provided. Templates for these can be found in config/config.yaml for the configuration file and in test/samples.tsv for the sample table.

To run a test case using the sequencing data of the Chimpanzee dental calculus sample CDC010 published by James Fellow Yates et al. (PNAS, 2021; doi:10.1073/pnas.2021655118), we will need to download the FastQ files from ENA:

wget -O test/ERR3579712_1.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/ERR357/002/ERR3579712/ERR3579712_1.fastq.gz
wget -O test/ERR3579712_2.fastq.gz http://ftp.sra.ebi.ac.uk/vol1/fastq/ERR357/002/ERR3579712/ERR3579712_2.fastq.gz

To start the pipeline, we run

snakemake --use-conda --conda-prefix conda -j 8

This will automatically evaluate the entries in the configuration file config/config.yaml and use the sample CDC10 as input. The temporary files are written into the folder tmp and the results in the folder results.

By activating the options --use-conda, snakemake will download and install the programs necessary to run the pipeline via conda and store them into the folder conda. This step will only happen once, at the first time or when you change the folder for storing these conda environments using --conda-prefix.

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
config		config
test		test
workflow		workflow
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

config

config

test

test

workflow

workflow

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

environment.yml

environment.yml

Repository files navigation

Ancient metagenome assembly pipeline

Introduction

Quick start

Overview of the pipeline

Input files

Parameters to adjust the pipeline

Output files

About

Releases

Packages

Languages

License

alexhbnr/ancient_metagenome_assembly

Folders and files

Latest commit

History

Repository files navigation

Ancient metagenome assembly pipeline

Introduction

Quick start

Overview of the pipeline

Input files

Parameters to adjust the pipeline

Output files

About

Resources

License

Stars

Watchers

Forks

Languages