francesccoll/assembly_pipeline

assembly_pipeline

The script assembly_pipeline.py is a computational pipeline for de novo assembly of bacterial genomes from Illumina paired-end reads. It is built on the SPAdes assembler and the improve_assembly pipeline, which improves the SPAdes assembly by scaffolding and gap filling.

Docker Installation

The easiest and recommended way to install and run assembly_pipeline.py is via its Docker implementation.

The Docker image is available on: https://hub.docker.com/r/francesccoll/assembly_pipeline/

Local Installation

assembly_pipeline.py is a Python script that will run provided that all required dependencies below (both Python modules and software) are installed on your local machine.

Required dependencies

Software

Python Modules

Usage

usage: assembly_pipeline.py [-h] -1 FASTQ1_FILE -2 FASTQ2_FILE -i SAMPLE_ID -r
                            RESULTS_DIR [-d DELETE_TMP] [--version]
                            [-t THREADS] [-s SPADES_DIR] [-m IMPROVED_DIR]

Pipeline for bacterial de novo assembly using Spades and improve_assembly from
paired Illumina data

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  -1 FASTQ1_FILE, --forward_reads FASTQ1_FILE
                        fastq file with forward reads
  -2 FASTQ2_FILE, --reverse_reads FASTQ2_FILE
                        fastq file with reverse reads
  -i SAMPLE_ID, --sample_id SAMPLE_ID
                        sample id used as prefix to name output files
  -r RESULTS_DIR, --results_dir RESULTS_DIR
                        directory to store pipeline's final assembly

optional arguments:
  -d DELETE_TMP, --delete_tmp DELETE_TMP
                        delete assembly files (except for contigs.fa)
  --version             show program's version number and exit

spades arguments (optional):
  -t THREADS, --spades_threads THREADS
                        number of threads used by Spades
  -s SPADES_DIR, --spades_dir SPADES_DIR
                        directory to store Spades resulting files
  -m IMPROVED_DIR, --improved_dir IMPROVED_DIR
                        directory to store improve_assembly resulting files
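
For batch runs, the command line documented above can be assembled programmatically. The following is a minimal sketch, assuming assembly_pipeline.py is on the PATH. The helper name build_assembly_command is hypothetical and not part of the pipeline; it uses only flags listed in the help text.

```python
import shlex


def build_assembly_command(fastq1, fastq2, sample_id, results_dir, threads=None):
    """Return the argument list for one assembly_pipeline.py run.

    Hypothetical helper: it only uses flags shown in the help text above.
    """
    cmd = [
        "assembly_pipeline.py",
        "--forward_reads", fastq1,
        "--reverse_reads", fastq2,
        "--sample_id", sample_id,
        "--results_dir", results_dir,
    ]
    # --spades_threads is optional, so add it only when a value is given.
    if threads is not None:
        cmd += ["--spades_threads", str(threads)]
    return cmd


cmd = build_assembly_command(
    "sampleId_1.fastq.gz", "sampleId_2.fastq.gz", "sampleId", "sampleId/", threads=8
)
# Print the command as a single shell-quoted string for inspection.
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems with unusual file names.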

Usage using the Docker image

docker run --volume=/path/to/fastq/files/:/data francesccoll/assembly_pipeline:amd64 assembly_pipeline.py --forward_reads /data/sampleId_1.fastq.gz --reverse_reads /data/sampleId_2.fastq.gz --sample_id sampleId --spades_threads 8 --results_dir /data/sampleId/

NOTE: --results_dir must be specified when running the Docker image for the output assembly files to be saved locally
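
The path mapping in the Docker command above can be sketched in Python. This is an illustrative sketch, not part of the pipeline: the helper name build_docker_command is hypothetical, the _1.fastq.gz and _2.fastq.gz file naming is an assumption taken from the example command, and the image tag is copied from it. The results directory is placed under the mounted /data path, as in the example, so that output written inside the container lands in the host directory.

```python
import posixpath


def build_docker_command(host_dir, sample_id, threads=8,
                         image="francesccoll/assembly_pipeline:amd64"):
    """Build a docker run argument list mirroring the example command.

    Hypothetical helper; assumes reads are named <sample_id>_1.fastq.gz
    and <sample_id>_2.fastq.gz inside host_dir.
    """
    container_dir = "/data"  # mount point used in the example command
    return [
        "docker", "run",
        f"--volume={host_dir}:{container_dir}",
        image,
        "assembly_pipeline.py",
        "--forward_reads", posixpath.join(container_dir, f"{sample_id}_1.fastq.gz"),
        "--reverse_reads", posixpath.join(container_dir, f"{sample_id}_2.fastq.gz"),
        "--sample_id", sample_id,
        "--spades_threads", str(threads),
        # Keep results under the mounted directory so they persist on the host.
        "--results_dir", posixpath.join(container_dir, sample_id) + "/",
    ]


docker_cmd = build_docker_command("/path/to/fastq/files/", "sampleId")
print(" ".join(docker_cmd))
```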

License

assembly_pipeline.py is free software, licensed under the GNU General Public License v3.0.

Feedback/Issues

Use the issues page to report installation and usage problems.

Citations

Not available yet
