garcia-nacho/FHI_SC2_Pipeline_Nanopore

FHI's SARS-CoV-2 Nanopore Pipeline

Bioinformatic pipeline for SARS-CoV-2 sequence analysis used at Folkehelseinstituttet (FHI, the Norwegian Institute of Public Health)

Description

Docker-based solution for sequence analysis of SARS-CoV-2 Nanopore samples

Primer schemes supported

ArticV3
ArticV4
Midnight

Installation

git clone https://github.com/garcia-nacho/FHI_SC2_Pipeline_Nanopore
cd FHI_SC2_Pipeline_Nanopore
docker build -t garcianacho/fhisc2:Nanopore .

Note that building the image for the first time can take up to two hours.

Alternatively, it is possible to pull updated builds from Docker Hub:

docker pull garcianacho/fhisc2:Nanopore

Running the pipeline

ArticV4:
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Docker_V12.sh ArticV4

ArticV3:
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Docker_V12.sh ArticV3

Midnight:
docker run -it --rm -v $(pwd):/home/docker/Fastq garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Docker_V12.sh Midnight

Note that older versions of Docker might require the --privileged flag, and multi-user systems might require the -u 1000 flag to run.
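
As an illustration of where those optional flags go, here is a minimal sketch of a wrapper that prints the docker command for a chosen primer scheme. The function name fhi_run_cmd is hypothetical and not part of the pipeline; the image name, script name, mount point and scheme names are taken from the commands above. It only prints the command and does not run it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): prints the docker command
# for one of the supported primer schemes, with optional extra docker flags.
fhi_run_cmd() {
  local scheme="$1"; shift
  case "$scheme" in
    ArticV3|ArticV4|Midnight) ;;
    *) echo "unknown primer scheme: $scheme" >&2; return 1 ;;
  esac
  # Any remaining arguments (e.g. --privileged, -u 1000) go to 'docker run'.
  echo docker run -it --rm "$@" -v "$(pwd):/home/docker/Fastq" \
    garcianacho/fhisc2:Nanopore SARS-CoV-2_Nanopore_Docker_V12.sh "$scheme"
}

# Example: print the Midnight command for an older, multi-user Docker host.
fhi_run_cmd Midnight --privileged -u 1000
```

Whether either flag is needed depends on the host, so it is probably best to try the plain command first and add flags only if the run fails.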

The script expects the following folder structure, where the fastq.gz files are placed in a separate folder for each sample:

./_   
  |-ExperimentXX.xlsx
  |-GridXXX
     |-OppsettXXX
           |-XXXXXXXXFAXXXXXXXXXX
               |-sequencing_summary_FAXXXXX.txt
               |-fastq_pass
                     |-barcode1
                            |-XXXX_pass_barcode01_XXXX.fastq
                            |-YYYY_pass_barcode01_YYYY.fastq
                     |-barcode2
                     |-barcode3
                     |-....
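
Before starting a long run, it may help to check that the working directory roughly matches the layout above. The following is a minimal sketch; the function name check_layout is hypothetical, and it only tests the two things the tree shows (a top-level .xlsx file and at least one barcode folder under a fastq_pass directory), not what the pipeline itself validates.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the pipeline).
# Usage: check_layout [run_directory]   (defaults to the current directory)
check_layout() {
  local root="${1:-.}" status=0
  # An .xlsx sheet is expected at the top level of the run directory.
  if ! find "$root" -maxdepth 1 -type f -name '*.xlsx' | grep -q .; then
    echo "missing: .xlsx file in $root" >&2
    status=1
  fi
  # At least one barcode folder is expected under a fastq_pass directory.
  if ! find "$root" -type d -path '*/fastq_pass/barcode*' | grep -q .; then
    echo "missing: fastq_pass/barcode* folders under $root" >&2
    status=1
  fi
  return "$status"
}
```

Calling check_layout . from the directory that will be mounted into the container returns a non-zero status and names what is missing if either element is absent.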

The script also expects an .xlsx file that contains the position of each sample on a 96-well plate, the links between barcodes and sequence IDs, and the DNA concentration (alternatively, this column can be used for the Ct values). It is possible to download a template of the .xlsx file here

Outputs

-Summary including mutations found, pangolin lineage, number of reads, coverage, depth, etc.
-Bam files
-Consensus sequences
-Aligned consensus sequences
-Consensus nucleotide sequence for gene S
-Indels and frameshift identification run against FHI's frameshift database
-Quality-control plot for the plate to detect possible contaminations
-Phylogenetic-tree plot of the samples
-Noise during variant calling across the genome
-Quality-control for contaminations/low-quality samples
-Amplicon efficacy of the selected primer-set for all the samples