Skip to content

greninger-lab/Tpallidum_WGS_Pipeline

Repository files navigation

Tpallidum WGS

This pipeline is intended for assembly and annotation of Treponema pallidum whole genomes.

This pipeline takes gzipped fastq files and outputs consensus fastas annotated with Prokka. Running on the cloud is recommended due to memory-intensive mapping steps.

Table of Contents

Installation

  1. Install nextflow.
    • Make sure you move nextflow to a directory in your PATH variable.
  2. Install docker.
  3. If running on the cloud setup setup nextflow tower

Warning

Newer versions of java may have issues with spades, java 17.0.5 has been tested and works.

Usage

This pipeline takes the location of gzipped fastqs as the input, no metadatafile required.

Note

Fastqs must be gzipped

In your fastq directory run

gzip *.fastq

to gzip all fastqs.

Note

For paired fastqs the correct naming format is Read 1: {Base}_R1.fastq.gz Read 2: {Base}_R2.fastq.gz

Options

List commands for the pipline:

Command Description
--INPUT Input folder where gzipped fastqs are located. For current directory, ./ can be used.
--OUTDIR Output folder for files produced from pipeline.
-resume nextflow will pick up where it left off if the previous command was interrupted for some reason.
-with-trace Outputs a trace.txt that shows which processes end up in which work/ folders.
--REFERENCE Reference used to map samples to, default is SS14 (NC_021508), options are:
SS14 (NC_021508)
Nichols (NC_021490)
Endemicum (NZ_CP007548)
Pertenue (NC_016842)
--SKIP_DENOVO If running off the cloud, skips denovo assembly and generates fasta from mapping reads to reference
-profile standard: For less computationally intensive systems run locally, not reccommended
Cloud: For running on the cloud adds more computational power for memory intensive steps, recommended
-c Add you nextflow config file to access cloud
-with-tower Monitor your run with nextflow tower

Example Cloud:

nextflow run greninger-lab/Tpallidum_WGS_Pipeline -r main \
	--INPUT Example_Fastq/ \
	--OUTDIR Example_Output/ \
	-c your_nextflow_aws.config \
	-profile Cloud \
	-with-tower 

Example Cloud and custom reference:

nextflow run greninger-lab/Tpallidum_WGS_Pipeline -r main \
	--INPUT Example_Fastq/ \
	--OUTDIR Example_Output/ \
	--REFERENCE Nichols \
	-c your_nextflow_aws.config \
	-profile Cloud \
	-with-tower 

Example Local with skip denovo:

nextflow run greninger-lab/Tpallidum_WGS_Pipeline -r main \
	--INPUT Example_Fastq/ \
	--OUTDIR Example_Output/ \
	-profile standard \
	--SKIP_DENOVO

Workflow

graph TD;
    trimReads-->filterTp;
    filterTp-->mapUnmatchedReads;
    mapUnmatchedReads-->moreFiltering;
    moreFiltering-->mapReads;
    mapReads-->samToBam;
    samToBam-->removeDuplicates;
    removeDuplicates-->callVariants;
    callVariants-->deNovoAssembly;
    deNovoAssembly-->mergeAssemblyMapping;
    mergeAssemblyMapping-->remapReads;
    remapReads-->pilonPolishing;
    pilonPolishing-->remapPilon;
    remapPilon-->generatePilonConsensus;
    generatePilonConsensus-->annotatePilonConsensus;
    annotatePilonConsensus-->annotateVCFs;
    callVariants-- SKIP_DENOVO Option -->remapReads; 

Example

Install sratoolkit

On Mac you can use brew:

brew install sratoolkit

Make an Example_Fastq folder

mkdir Example_Fastq 

Enter Example folder

cd Example_Fastq 

Download sample SRR24317982 from SRA and place it in the Example_Fastq folder.

fasterq-dump SRR24317982 --split-files

Rename Files

 mv SRR24317982_1.fastq SRR24317982_R1.fastq
 mv SRR24317982_2.fastq SRR24317982_R2.fastq

gzip files

gzip *.fastq

Exit folder and use one of the example workflows to run example

cd ..

Output

Example_Output/
├── TPA_filtered_fastqs                            # Unmatched reads from rRNA filter mapped TPA genome
├── VCF_Annotations                                # Annotations of snps with Snippy
│   └── SRR24317982
│       └── reference
│           ├── genomes
│           └── ref
├── deduped_bams                                    # Bams deduplicated with Picard
├── deduped_fastqs                                  # Fastqs deduplicated with Picard 
├── extra_filtered_fastqs_for_denovo                # fastqs filtered to map to tp reference with rRNA filter
├── finalconsensus_pilon_prokka_annotations         # Assembled fastas annotated with Prokka
├── finalconsensus_v2                               # Final assembled fastas
├── firstmap_sorted_bam                             # Reads that map to reference after filtering as bam file
├── mapSams                                         # Reads that map to reference after filtering as sam file
├── merged_assembly_mapping_consensus               # Merges assembly and mapping to make consensus sequence
├── pilon                                           # assemblies run through Pilon to fix misassemblies
├── rRNA_filtered_fastqs                            # First round of trimming mapping to rRNA
├── remapped_bams                                   # remap reads to assembly
│   └── SRR24317982
├── remapped_pilon_bams                             # remap reads to pilon assembly
│   └── SRR24317982
├── scaffold_bams                                   # Merges assembly and mapping to make consensus sequence 
├── trimmed_fastq                                   # Adapter trimmed fastqs using trimmomatic
├── unicycler_output                                # De novo assemble matched reads with Unicycler
│   └── SRR24317982
└── vcfs                                            # Variant annotation 

Note

If running this pipeline for Greninger Lab purposes please reference the guidelines for naming conventions and storage of fastqs and assemblies available in this directory: DRAFT Guidelines for TP sample and data storage.docx

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published