gallardo-seq/MrHAMER: High accuracy single molecule Nanopore reads using the MrHAMER pipeline

Scripts for generating high accuracy single molecule Nanopore reads with the MrHAMER (Multi-read Hairpin Mediated Error-correction Reaction) pipeline.

Dependencies

We recommend installing the dependencies in a Python virtual environment or a Conda environment so that the versions listed below are the ones used. The pipeline is compatible with Ubuntu 16 and 18 LTS.

Python 2.7.12
Python 3.5.2
Guppy basecaller v3.6.0
Porechop v0.2.4
Filtlong v0.2.0
minimap2 v2.17-r954-dirty
Racon v1.4.3
samtools v1.9 (with htslib 1.9)
Medaka v1.0.1
Pomoxis v0.2.3
CoVaMa v0.7
MAFFT v7.471
CliqueSNV v1.5.4
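
Before running the pipeline it can help to confirm that the command-line tools are reachable. The sketch below only checks whether executables are on PATH; it does not verify versions. The executable names are assumptions based on each tool's usual command name, and Pomoxis, CoVaMa and CliqueSNV are left out because they are not invoked through a single executable of that name.

```python
import shutil

# Assumed executable names for the tools listed above (not taken from this repository).
REQUIRED_TOOLS = [
    "guppy_basecaller",
    "porechop",
    "filtlong",
    "minimap2",
    "racon",
    "samtools",
    "medaka",
    "mafft",
]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All checked tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.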

Installation

To install the MrHAMER scripts, run the following command:

git clone --recursive https://github.com/gallardo-seq/MrHAMER.git

Once the scripts have downloaded successfully, a folder named MrHAMER will appear in the current working directory.

Usage

MrHAMER is used as follows:

Combine all sequencing reads into a single FASTQ file.
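
No command is given for this step, so here is a minimal sketch of one way to do it. It assumes the basecalled reads are uncompressed `*.fastq` files in a single directory and that each file ends with a newline; the function name `combine_fastq` and the paths in the usage line are hypothetical.

```python
import glob
import os
import shutil


def combine_fastq(input_dir, combined_path):
    """Concatenate every *.fastq file in input_dir into combined_path.

    Returns the number of input files that were concatenated.
    Assumes uncompressed FASTQ files that each end with a newline.
    """
    combined_abs = os.path.abspath(combined_path)
    # Collect inputs first, skipping the output file if it lives in input_dir.
    fastq_paths = sorted(
        path
        for path in glob.glob(os.path.join(input_dir, "*.fastq"))
        if os.path.abspath(path) != combined_abs
    )
    with open(combined_path, "wb") as combined:
        for path in fastq_paths:
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, combined)
    return len(fastq_paths)
```

For example, `combine_fastq("guppy_output/pass", "combined.fastq")` would produce the `[combined.fastq]` file used in the Porechop command.
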
Use Porechop to segment the concatemers based on the presence of the MrHAMER hairpin sequence. This requires a custom adapters.py file; a template is included in this repository.
```
 porechop -i [combined.fastq] -o [porechop.output] -t [threads] --extra_middle_trim_bad_side 0 --extra_middle_trim_good_side 0
```

Filter the Porechop output with Filtlong, discarding segments shorter than 4000 bases:

 filtlong --min_length 4000 [porechop.output] > [filtlong.output]
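
To make the effect of `--min_length` concrete, the following sketch applies the same length criterion in plain Python. It is an illustration only, not Filtlong's implementation, and it assumes a simple FASTQ layout with exactly four lines per record; the function names are hypothetical.

```python
def read_fastq(path):
    """Yield (header, sequence, separator, quality) from a four-line-per-record FASTQ file."""
    with open(path) as handle:
        while True:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:
                break  # end of file (or a blank line where a header was expected)
            yield tuple(record)


def filter_by_length(in_path, out_path, min_length=4000):
    """Write only reads of at least min_length bases; return how many were kept."""
    kept = 0
    with open(out_path, "w") as out:
        for header, seq, sep, qual in read_fastq(in_path):
            if len(seq) >= min_length:
                out.write("\n".join((header, seq, sep, qual)) + "\n")
                kept += 1
    return kept
```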

Demultiplex the reads that were processed with Porechop and filtered with Filtlong, keeping only single molecule concatemers that contain at least a minimum number of repetitive units. This produces a folder of individual FASTQ files, each containing the multiple repetitive units used for error correction in the next step.
```
 python2 ./qfilesplitterV3.1.py -i [filtlong.output] -o [output path] -b [min. number of repetitive units]

 python2 qfilesplitterV3.1.py [Arguments]
 
     Arguments:
     -i input file
     -o output path
     -b block size cutoff: minimum number of repetitive units [optional]
```
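
The idea behind this step can be sketched as grouping the segmented units by the read they came from and keeping only groups that are large enough. This is not the logic of qfilesplitterV3.1.py, which is not shown here. It assumes that each unit's read name is the parent read name followed by `_<number>`, and that the FASTQ has four lines per record; all function names are hypothetical.

```python
import os
from collections import defaultdict


def parent_read_id(header):
    """Strip a trailing _<number> from the read name (assumed naming scheme)."""
    name = header[1:].split()[0]
    base, sep, suffix = name.rpartition("_")
    return base if sep and suffix.isdigit() else name


def split_by_parent(fastq_path, out_dir, min_units=1):
    """Write one FASTQ per parent read that has at least min_units units.

    Returns the list of files written.
    """
    groups = defaultdict(list)
    with open(fastq_path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():
                break  # end of file or trailing blank line
            groups[parent_read_id(record[0])].append("".join(record))

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for parent, units in groups.items():
        if len(units) >= min_units:
            out_path = os.path.join(out_dir, parent + ".fastq")
            with open(out_path, "w") as out:
                out.writelines(units)
            written.append(out_path)
    return written
```

Here `min_units` plays the role of the `-b` cutoff: concatemers with fewer units are dropped before polishing.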

Run parallel instances of minimap2 > Racon > Medaka to polish each FASTQ file, producing high accuracy single molecule sequences. This step has been optimized for a system with 40 threads.

 python3 protocolV3.3.py -q [path to output folder from previous step] -r [path to reference sequence] -m r941_min_high_g360
 
 python3 protocolV3.3.py [Arguments]
 
 Arguments:
 -q path to the folder of FASTQ files
 -r reference sequence
 -n number of iterations [Default 1]
 -m model for Medaka [Default r941_min_high]
 -noMedaka if present, exclude Medaka from the process
 -noRacon if present, exclude Racon from the process
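
For orientation, the minimap2 > Racon > Medaka chain for a single FASTQ file might look like the sketch below. The exact options used by protocolV3.3.py are not documented here, so the `map-ont` preset, the file names, the worker count and all function names are assumptions; only the general shape (align, polish with Racon, polish with Medaka, several files in parallel) follows the description above.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def polish_commands(fastq, reference, work_dir, model="r941_min_high_g360", threads=1):
    """Return [(argv, stdout_path or None), ...] for one FASTQ file."""
    paf = os.path.join(work_dir, "overlaps.paf")
    racon_fasta = os.path.join(work_dir, "racon.fasta")
    medaka_dir = os.path.join(work_dir, "medaka")
    return [
        # Align the repetitive units to the reference (PAF written to stdout).
        (["minimap2", "-x", "map-ont", "-t", str(threads), reference, fastq], paf),
        # Racon arguments: <sequences> <overlaps> <target sequences>.
        (["racon", "-t", str(threads), fastq, paf, reference], racon_fasta),
        # Medaka polishes the Racon output using the same reads.
        (["medaka_consensus", "-i", fastq, "-d", racon_fasta,
          "-o", medaka_dir, "-m", model, "-t", str(threads)], None),
    ]


def polish_one(fastq, reference, out_root, model="r941_min_high_g360"):
    """Run the three tools in order for one FASTQ file."""
    name = os.path.splitext(os.path.basename(fastq))[0]
    work_dir = os.path.join(out_root, name)
    os.makedirs(work_dir, exist_ok=True)
    for argv, stdout_path in polish_commands(fastq, reference, work_dir, model):
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
    return os.path.join(work_dir, "medaka")


def polish_all(fastq_paths, reference, out_root, workers=4):
    """Polish several FASTQ files concurrently; threads only wait on subprocesses."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda fq: polish_one(fq, reference, out_root), fastq_paths))
```

Keeping command construction in `polish_commands` separate from execution makes it possible to inspect the commands without having the tools installed.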

High accuracy single molecule sequences are written to a new directory called "medaka_output", where they are concatenated into a single medaka_consensus.fasta file.
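
A quick way to check the result is to count the sequences in medaka_consensus.fasta and look at their lengths. The helper below is a small sketch with a hypothetical name; it returns a list rather than a dictionary because records in a concatenated file may share the same name.

```python
def fasta_lengths(path):
    """Return [(record name, sequence length), ...] in file order."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                records.append([line[1:], 0])  # start a new record
            elif line and records:
                records[-1][1] += len(line)  # sequence may span several lines
    return [(name, length) for name, length in records]


if __name__ == "__main__":
    import os

    consensus = os.path.join("medaka_output", "medaka_consensus.fasta")
    if os.path.exists(consensus):
        lengths = fasta_lengths(consensus)
        print("{} single molecule sequences".format(len(lengths)))
```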

**A note about the reference sequence used for Step 5 (polishing):** This pipeline is optimized for reference-based alignment. For a de novo approach, the outputs of Step 4 (demultiplexing) can be used with the "medaka smolecule" module, which uses SPOA to generate a reference assembly for each originating FASTQ file (https://github.com/nanoporetech/medaka).

Contact information

For additional information, help, and bug reports, please send an email to christian.gallardo@seattlechildrens.org

Acknowledgment

This work was supported by the National Institute of Allergy and Infectious Diseases [U54AI150472 to BET and ALR, P30AI036214-26 to BET, SJL and DMS]; the National Human Genome Research Institute [R01HG009622 to BET]; the Scripps Translational Science Institute [UL1TR001114-03 to BET]; and the University of Texas System Rising STARs Award to ALR.
