Tpallidum WGS

This pipeline is intended for assembly and annotation of Treponema pallidum whole genomes.

This pipeline takes gzipped fastq files and outputs consensus fastas annotated with Prokka. Running on the cloud is recommended due to memory-intensive mapping steps.

Installation

Install nextflow.
- Make sure you move nextflow to a directory in your PATH variable.
Install docker.
If running on the cloud, set up nextflow tower.
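A quick way to confirm the installation steps above took effect is to check that each tool resolves on your PATH. The sketch below is illustrative only: `check_tools` is a hypothetical helper, not part of this pipeline, and it assumes the executables are named `nextflow` and `docker`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any required command-line tool that is not on PATH.
# Prints one "missing: <tool>" line per absent tool and returns 1 if any are missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools named in the installation steps.
check_tools nextflow docker || echo "Install the tools listed above before running the pipeline."
```

If nothing is printed, both tools were found.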

Warning

Newer versions of Java may have issues with SPAdes. Java 17.0.5 has been tested and works.
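To see which Java version is active before a run, you can parse the banner that `java -version` prints. This is a minimal sketch: `java_major` is a hypothetical helper, and it assumes the usual banner format in which the version appears in double quotes after the word `version`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major version from a `java -version` banner.
# Example banner line: openjdk version "17.0.5" 2022-10-18
java_major() {
    local banner="$1"
    local version
    # Pull the quoted version string from the first line that has one.
    version=$(printf '%s\n' "$banner" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$version" in
        1.*) version=${version#1.} ;;   # legacy scheme: 1.8.0_292 means Java 8
    esac
    # Keep only the leading component before the first '.', '_' or '-'.
    printf '%s\n' "${version%%[._-]*}"
}

# `java -version` writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
    echo "Java major version: $(java_major "$(java -version 2>&1)")"
fi
```

A result above 17 means you are on a newer release than the tested one.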

Usage

This pipeline takes the location of gzipped fastqs as its input; no metadata file is required.

Note

Fastqs must be gzipped

In your fastq directory run

gzip *.fastq

to gzip all fastqs.

Note

For paired fastqs, the correct naming format is `{Base}_R1.fastq.gz` for Read 1 and `{Base}_R2.fastq.gz` for Read 2.
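Before launching a run it can help to confirm that every Read 1 file has a matching Read 2 file under this naming format. The following is a sketch only; `find_unpaired` is a hypothetical helper and is not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list {Base}_R1.fastq.gz files that have no
# matching {Base}_R2.fastq.gz in the same directory.
find_unpaired() {
    local dir="${1:-.}"
    local r1 r2
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        [ -e "$r2" ] || echo "$r1"
    done
}
```

Running `find_unpaired Example_Fastq` prints nothing when all pairs are complete.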

Options

List of commands for the pipeline:

Command	Description
`--INPUT`	Input folder where gzipped fastqs are located. For current directory, `./` can be used.
`--OUTDIR`	Output folder for files produced from pipeline.
`-resume`	nextflow will pick up where it left off if the previous command was interrupted for some reason.
`-with-trace`	Outputs a trace.txt that shows which processes end up in which work/ folders.
`--REFERENCE`	Reference genome that samples are mapped to. Default is SS14 (NC_021508). Options are:
	`SS14` (NC_021508)
	`Nichols` (NC_021490)
	`Endemicum` (NZ_CP007548)
	`Pertenue` (NC_016842)
`--SKIP_DENOVO`	For runs off the cloud (local): skips de novo assembly and generates the fasta by mapping reads to the reference.
`-profile`	`standard`: For local runs on less computationally powerful systems; not recommended.
	`Cloud`: For running on the cloud; adds more computational power for memory-intensive steps. Recommended.
`-c`	Add your nextflow config file to access the cloud.
`-with-tower`	Monitor your run with nextflow tower.
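The `--REFERENCE` names and accessions in the table above can be captured in a small lookup, which is handy for validating a value in a wrapper script before calling nextflow. This is a sketch under assumptions: `reference_accession` is a hypothetical helper, and treating the names as case-sensitive is an assumption, not something this README states.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a --REFERENCE name from the options table
# to its accession. Returns 1 for names that are not in the table.
reference_accession() {
    case "$1" in
        SS14)      echo "NC_021508" ;;
        Nichols)   echo "NC_021490" ;;
        Endemicum) echo "NZ_CP007548" ;;
        Pertenue)  echo "NC_016842" ;;
        *)         echo "unknown reference: $1" >&2; return 1 ;;
    esac
}
```

For example, `reference_accession Nichols` prints `NC_021490`.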

Example Cloud:

nextflow run greninger-lab/Tpallidum_WGS_Pipeline -r main \
	--INPUT Example_Fastq/ \
	--OUTDIR Example_Output/ \
	-c your_nextflow_aws.config \
	-profile Cloud \
	-with-tower

Example Cloud and custom reference:

nextflow run greninger-lab/Tpallidum_WGS_Pipeline -r main \
	--INPUT Example_Fastq/ \
	--OUTDIR Example_Output/ \
	--REFERENCE Nichols \
	-c your_nextflow_aws.config \
	-profile Cloud \
	-with-tower

Example Local with skip denovo:

nextflow run greninger-lab/Tpallidum_WGS_Pipeline -r main \
	--INPUT Example_Fastq/ \
	--OUTDIR Example_Output/ \
	-profile standard \
	--SKIP_DENOVO

Workflow

graph TD;
    trimReads-->filterTp;
    filterTp-->mapUnmatchedReads;
    mapUnmatchedReads-->moreFiltering;
    moreFiltering-->mapReads;
    mapReads-->samToBam;
    samToBam-->removeDuplicates;
    removeDuplicates-->callVariants;
    callVariants-->deNovoAssembly;
    deNovoAssembly-->mergeAssemblyMapping;
    mergeAssemblyMapping-->remapReads;
    remapReads-->pilonPolishing;
    pilonPolishing-->remapPilon;
    remapPilon-->generatePilonConsensus;
    generatePilonConsensus-->annotatePilonConsensus;
    annotatePilonConsensus-->annotateVCFs;
    callVariants-- SKIP_DENOVO Option -->remapReads;

Example

Install sratoolkit

On Mac you can use brew:

brew install sratoolkit

Make an Example_Fastq folder

mkdir Example_Fastq

Enter the Example_Fastq folder

cd Example_Fastq

Download sample SRR24317982 from SRA and place it in the Example_Fastq folder.

fasterq-dump SRR24317982 --split-files

Rename Files

mv SRR24317982_1.fastq SRR24317982_R1.fastq
mv SRR24317982_2.fastq SRR24317982_R2.fastq
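When several samples have been downloaded, the same rename can be applied in a loop. The sketch below assumes the files follow the `_1.fastq` / `_2.fastq` pattern shown above; `rename_split_files` is a hypothetical helper, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename <acc>_1.fastq and <acc>_2.fastq in a directory
# to <acc>_R1.fastq and <acc>_R2.fastq.
rename_split_files() {
    local dir="${1:-.}"
    local f base n
    for f in "$dir"/*_[12].fastq; do
        [ -e "$f" ] || continue        # glob matched nothing
        base="${f%_[12].fastq}"        # path without the _1.fastq/_2.fastq suffix
        n="${f#"$base"_}"              # "1.fastq" or "2.fastq"
        mv -- "$f" "${base}_R${n}"
    done
}
```

Files already named with `_R1` or `_R2` do not match the pattern, so running `rename_split_files .` a second time leaves them unchanged.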

Gzip files

gzip *.fastq

Exit the folder and use one of the example commands from the Options section to run the example

cd ..

Output

Example_Output/
├── TPA_filtered_fastqs                            # Unmatched reads from rRNA filter mapped to TPA genome
├── VCF_Annotations                                # Annotations of snps with Snippy
│   └── SRR24317982
│       └── reference
│           ├── genomes
│           └── ref
├── deduped_bams                                    # Bams deduplicated with Picard
├── deduped_fastqs                                  # Fastqs deduplicated with Picard 
├── extra_filtered_fastqs_for_denovo                # fastqs filtered to map to tp reference with rRNA filter
├── finalconsensus_pilon_prokka_annotations         # Assembled fastas annotated with Prokka
├── finalconsensus_v2                               # Final assembled fastas
├── firstmap_sorted_bam                             # Reads that map to reference after filtering as bam file
├── mapSams                                         # Reads that map to reference after filtering as sam file
├── merged_assembly_mapping_consensus               # Merges assembly and mapping to make consensus sequence
├── pilon                                           # assemblies run through Pilon to fix misassemblies
├── rRNA_filtered_fastqs                            # First round of trimming mapping to rRNA
├── remapped_bams                                   # remap reads to assembly
│   └── SRR24317982
├── remapped_pilon_bams                             # remap reads to pilon assembly
│   └── SRR24317982
├── scaffold_bams                                   # Merges assembly and mapping to make consensus sequence 
├── trimmed_fastq                                   # Adapter trimmed fastqs using trimmomatic
├── unicycler_output                                # De novo assemble matched reads with Unicycler
│   └── SRR24317982
└── vcfs                                            # Variant annotation
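After a run finishes, you may want to confirm that the folders you care about were created. The sketch below is illustrative: `missing_outputs` is a hypothetical helper, and which folders appear for a given run (for example with `--SKIP_DENOVO`) is not specified here, so pass only the names you expect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each expected subfolder name that does not
# exist under the given output directory.
missing_outputs() {
    local outdir="$1"
    shift
    local name
    for name in "$@"; do
        [ -d "$outdir/$name" ] || echo "$name"
    done
}
```

For example, `missing_outputs Example_Output finalconsensus_v2 finalconsensus_pilon_prokka_annotations vcfs` prints nothing when all three folders are present.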

Note

If running this pipeline for Greninger Lab purposes please reference the guidelines for naming conventions and storage of fastqs and assemblies available in this directory: DRAFT Guidelines for TP sample and data storage.docx
