UPHL-BioNGS/Donut_Falls

Donut Falls

Named after the beautiful Donut Falls

Location: 40.630°N 111.655°W, Elevation: 7,942 ft (2,421 m), Hiking level: easy

(Image credit: User submitted photos at alltrails.com)

More information about the trail leading up to this landmark can be found at utah.com/hiking/donut-falls

Donut Falls is a Nextflow workflow developed by @erinyoung at the Utah Public Health Laboratory for long-read nanopore sequencing of microbial isolates. It is built to run on Linux-based operating systems. Additional config options are needed for cloud batch usage.

Donut Falls is also included in the staphb-toolkit.

We made a wiki, please read it!

Wiki table of contents:

Getting started

Install dependencies

Quick start

nextflow run UPHL-BioNGS/Donut_Falls -profile <singularity or docker> --sample_sheet <sample_sheet.csv>

Sample Sheets

The sample sheet is a CSV file with one row per sample, giving the sample name and the corresponding nanopore fastq.gz file under the headers sample and fastq. When Illumina fastq files are available for polishing or hybrid assembly, they are added to the end of each row under the column headers fastq_1 and fastq_2.

Option 1 : just nanopore reads

sample,fastq
test,long_reads_low_depth.fastq.gz

Option 2 : nanopore reads and at least one sample has Illumina paired-end fastq files

sample,fastq,fastq_1,fastq_2
sample1,sample1.fastq.gz,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,sample2.fastq.gz,,
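
For many samples it can be easier to generate the sheet than to type it. The following is a minimal Python sketch, not part of Donut Falls itself: the helper name write_sample_sheet is hypothetical, and the only things taken from this README are the column headers and the rule that Illumina columns are left empty for samples without paired-end reads.

```python
import csv
import io


def write_sample_sheet(rows, handle):
    """Write sample dicts as a sample sheet in the layout described above.

    The fastq_1/fastq_2 columns are only emitted when at least one
    sample provides Illumina reads (Option 2); otherwise the sheet has
    just the sample and fastq columns (Option 1).
    """
    has_illumina = any(r.get("fastq_1") or r.get("fastq_2") for r in rows)
    header = ["sample", "fastq"]
    if has_illumina:
        header += ["fastq_1", "fastq_2"]
    writer = csv.DictWriter(handle, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    for r in rows:
        # Missing Illumina files become empty fields, as for sample2 above.
        writer.writerow({col: r.get(col, "") for col in header})


rows = [
    {"sample": "sample1", "fastq": "sample1.fastq.gz",
     "fastq_1": "sample1_R1.fastq.gz", "fastq_2": "sample1_R2.fastq.gz"},
    {"sample": "sample2", "fastq": "sample2.fastq.gz"},
]
buffer = io.StringIO()
write_sample_sheet(rows, buffer)
print(buffer.getvalue(), end="")
```

Run as written, this prints the same three lines shown under Option 2. Writing to a real file instead of the in-memory buffer only requires passing an open file handle.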

Switching assemblers

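There are currently several options for assembly: flye (the default), raven, and unicycler (hybrid assembly, which requires both nanopore and Illumina reads).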

These are specified with the assembler parameter. If Illumina reads are found, then flye and raven assemblies will be polished with those reads.

Note: more than one assembler can be chosen (e.g. params.assembler = 'flye,raven'). This will run the input files on each assembler listed. Listing an assembler more than once will not create additional assemblies with that tool (e.g. params.assembler = 'flye,flye,flye' will still only run the input files through flye once).
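
The behaviour described in the note can be mimicked in a few lines of Python. This is only an illustration of the documented outcome, not the workflow's own parsing code: the function parse_assemblers and the KNOWN_ASSEMBLERS tuple are hypothetical, and rejecting unknown names is an assumption made for the sketch.

```python
# Assembler names mentioned in this README; treated here as the full set.
KNOWN_ASSEMBLERS = ("flye", "raven", "unicycler")


def parse_assemblers(value):
    """Split a comma-separated assembler string into a unique, ordered list."""
    selected = []
    for name in value.split(","):
        name = name.strip()
        if not name:
            continue
        if name not in KNOWN_ASSEMBLERS:
            # Assumption for this sketch: unknown names are an error.
            raise ValueError(f"unknown assembler: {name}")
        if name not in selected:
            # Repeats are ignored, so each assembler runs at most once.
            selected.append(name)
    return selected


print(parse_assemblers("flye,raven"))
print(parse_assemblers("flye,flye,flye"))
```

The first call yields both assemblers and the second yields flye once, matching the two cases in the note.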

Reading the sequencing summary file

Although not used for anything else, the sequencing summary file can be read in and put through nanoplot to visualize the quality of a sequencing run. This is an optional file and can be set with 'params.sequencing_summary'.

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sequencing_summary <sequencing summary file>
  • WARNING : Does not work with older versions of the summary file.

Examples

# nanopore assembly with flye followed by polishing if illumina files are supplied
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet sample_sheet.csv

# or with docker and specifying the assembler
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet sample_sheet.csv --assembler flye

# hybrid assembly with unicycler where both nanopore and illumina files are required
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet sample_sheet.csv --assembler unicycler

# assembling with all three assemblers
# specifying the results to be stored in 'donut_falls_test_results' instead of 'donut_falls'
# using docker instead of singularity
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet sample_sheet.csv --assembler unicycler,flye,raven --outdir donut_falls_test_results


# using some test files (requires internet connection)
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet sample_sheet.csv --test

# same as above
nextflow run UPHL-BioNGS/Donut_Falls -profile docker,test --sample_sheet sample_sheet.csv
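
Because the unicycler example requires both nanopore and Illumina files, it can help to check a sample sheet before launching a run. The sketch below is a hypothetical pre-flight check written for this README, not something the workflow ships; samples_missing_illumina is an invented name, and how Donut Falls itself treats such samples is not covered here.

```python
import csv
import io


def samples_missing_illumina(sheet_text):
    """Return the names of samples lacking fastq_1 or fastq_2 entries."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    missing = []
    for row in reader:
        # An absent column gives None and an empty field gives "";
        # both count as "no Illumina reads" for this check.
        if not (row.get("fastq_1") and row.get("fastq_2")):
            missing.append(row["sample"])
    return missing


sheet = """sample,fastq,fastq_1,fastq_2
sample1,sample1.fastq.gz,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,sample2.fastq.gz,,
"""
print(samples_missing_illumina(sheet))
```

For the Option 2 sheet used here, only sample2 is reported, so that sample would need Illumina reads added before a unicycler-only run.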

Credits

Donut Falls would not be possible without:

  • bandage : visualize gfa files
  • busco : assessment of assembly quality
  • bwa : aligning reads for polypolish
  • circulocov : read depth per contig
  • dnaapler : rotation
  • fastp : cleaning illumina reads (default values) and nanopore reads (minimum length = 1,000 & minimum Q = 12)
  • flye : de novo assembly (default assembler)
  • gfastats : assessment of assembly
  • medaka : polishing with nanopore reads
  • multiqc : amalgamation of results
  • nanoplot : fastq file QC visualization
  • polypolish : reduces sequencing artefacts through polishing with Illumina reads
  • pypolca : reduces sequencing artefacts through polishing with Illumina reads
  • rasusa : subsampling nanopore reads to 150X depth
  • raven : de novo assembly option (params.assembler = 'raven')
  • unicycler : hybrid assembly option (params.assembler = 'unicycler')
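
The 150X figure in the rasusa entry is a read depth, meaning total sequenced bases divided by genome size. The arithmetic can be sketched as below. The function names and the 5 Mb genome are made-up illustration values, and this is not how rasusa is implemented or invoked; it only shows what a 150X target means in bases.

```python
def read_depth(total_bases, genome_size):
    """Average depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def target_bases(genome_size, target_depth=150):
    """Number of bases that corresponds to the target depth."""
    return genome_size * target_depth


def keep_fraction(total_bases, genome_size, target_depth=150):
    """Approximate share of bases retained when subsampling to the target."""
    current = read_depth(total_bases, genome_size)
    if current <= target_depth:
        return 1.0  # already at or below the target, nothing to drop
    return target_depth / current


genome_size = 5_000_000        # hypothetical 5 Mb bacterial genome
total_bases = 1_500_000_000    # hypothetical 1.5 Gb of nanopore reads

print(read_depth(total_bases, genome_size))     # 300.0
print(target_bases(genome_size))                # 750000000
print(keep_fraction(total_bases, genome_size))  # 0.5
```

In this made-up case the run has 300X depth, so roughly half of the sequenced bases would remain after subsampling to 150X.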