Fast-polyvalent trimmer used for several applications of next-generation sequencing

guillaume-gricourt/HmnTrimmer

HmnTrimmer

A trimmer for NGS reads, designed for common applications such as genomics, transcriptomics, targeted metagenomics and shotgun metagenomics.

Install

Conda (recommended)

conda install -c bioconda hmntrimmer

Docker

# From docker hub
docker pull hmntrimmer:<VERSION>
# From github
docker pull ghcr.io/guillaume-gricourt/hmntrimmer:<VERSION>

Manual

Prerequisites
On Debian-based systems, install the following packages:

  • yasm
  • build-essential
  • zlib1g-dev

The GCC version used for compilation must be > 4 and < 9.

Test software

  • python3

Create statistic report
With conda:

  • python3 django matplotlib seaborn packaging

With Ubuntu/Debian, using pip:

  • python3-pip
  • django matplotlib seaborn packaging

Compile

First, build igzip:
hmndir=./HmnTrimmer
cd ./lib/igzip-042/igzip && make slib0c
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD

Then build HmnTrimmer from the repository root:
make

Test

make test

Getting started

The software is invoked as:
HmnTrimmer [OPTIONS] [TRIMMERS]

Minimal example:

./HmnTrimmer \
  --input-fastq-forward INPUT_FILE \
  --output-fastq-forward OUTPUT_FILE \
  --length-min 50

Commands

Input/Output

Input and output files are specified with the following options:

  --input-fastq-forward INPUT_FILE
  --input-fastq-reverse INPUT_FILE
  --input-fastq-interleaved INPUT_FILE

  --output-fastq-forward OUTPUT_FILE
  --output-fastq-reverse OUTPUT_FILE
  --output-fastq-interleaved OUTPUT_FILE

Discarded sequences can optionally be written with the following option. For paired-end data, the file produced is interleaved.

  --output-fastq-discard OUTPUT_FILE
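
For paired-end data, the options above can be combined in a single call. The sketch below wraps that call in a small shell function. The function name `trim_paired`, the `HMNTRIMMER` variable and all file names are illustrative assumptions; only the option names come from the lists above.

```shell
# Binary to call; override HMNTRIMMER to point at another path (assumption).
HMNTRIMMER="${HMNTRIMMER:-HmnTrimmer}"

# trim_paired R1_IN R2_IN R1_OUT R2_OUT DISCARD_OUT
# Hypothetical helper: trims a read pair and writes discarded reads
# to a single file (interleaved, since the input is paired).
trim_paired() {
  "$HMNTRIMMER" \
    --input-fastq-forward "$1" \
    --input-fastq-reverse "$2" \
    --output-fastq-forward "$3" \
    --output-fastq-reverse "$4" \
    --output-fastq-discard "$5" \
    --length-min 50
}

# Usage (file names are placeholders):
# trim_paired sample.R1.fastq sample.R2.fastq trimmed.R1.fastq trimmed.R2.fastq discarded.fastq
```

Keeping the binary name in a variable makes it easy to switch between a locally compiled `./HmnTrimmer` and an installed one.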

Trimmers

Trimmers fall into three categories: quality, length and information.
Information-based trimmers are applied first, then quality-based trimmers, and finally length-based trimmers.

Quality Tail
Trims based on a run of successive bases at the end of the read whose quality is below a cutoff.
Three parameters: the quality cutoff; optionally, the number of successive bases below that cutoff (default: 1 base); and, optionally, the minimal percentage of the original length required to keep a read that was truncated (default: truncated reads are not removed).
Format : <int>:<int>:<int>

  --quality-tail STRING

Quality Sliding Window
Trims based on a sliding window of bases, from the end of the read, whose mean quality is below a minimum.
Two parameters: minimal mean quality and window size.
Format : <int>:<int>

  --quality-sliding-window STRING
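
Both quality trimmers take a colon-separated string of integers. As a rough sketch, a wrapper script could check the shape of such a string before calling the tool. The helper `is_spec` and the example values are hypothetical, and the check covers only the fully specified forms shown in the Format lines; whether shortened forms are accepted is not covered here.

```shell
# is_spec FIELDS VALUE
# Succeeds when VALUE is exactly FIELDS colon-separated non-negative integers.
is_spec() {
  fields=$1
  value=$2
  # Builds a pattern such as ^[0-9]+(:[0-9]+){2}$ for three fields.
  printf '%s\n' "$value" | grep -Eq "^[0-9]+(:[0-9]+){$((fields - 1))}\$"
}

# Hypothetical values: quality 20, 3 successive bases, 50 percent of the length.
quality_tail="20:3:50"
# Hypothetical values: mean quality 20 over a window of 4 bases.
sliding_window="20:4"

is_spec 3 "$quality_tail" && echo "quality-tail spec ok: $quality_tail"
is_spec 2 "$sliding_window" && echo "sliding-window spec ok: $sliding_window"
```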

Length Min
Minimal length to keep a read.

  --length-min INTEGER

Information Dust
Based on the DUST score, a measure of low sequence complexity.

  --information-dust INTEGER
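
Trimmers from the three categories can be requested together; as described above, they are applied in the order information, quality, length. The sketch below shows one such combination. The function name `trim_combined`, the `HMNTRIMMER` variable and the threshold values (7, 20:4, 50) are illustrative assumptions, not recommended settings.

```shell
# Binary to call; override HMNTRIMMER to point at another path (assumption).
HMNTRIMMER="${HMNTRIMMER:-HmnTrimmer}"

# trim_combined FASTQ_IN FASTQ_OUT
# Hypothetical helper requesting one trimmer from each category.
trim_combined() {
  "$HMNTRIMMER" \
    --input-fastq-forward "$1" \
    --output-fastq-forward "$2" \
    --information-dust 7 \
    --quality-sliding-window 20:4 \
    --length-min 50
}

# Usage (file names are placeholders):
# trim_combined sample.fastq trimmed.fastq
```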

Performance/Other Options

Report
Optionally save a report containing various statistics, in JSON format.

  --output-report OUTPUT_FILE

Threads
Specify number of threads to use.

  --threads 1..8

Reads batch
Reads are loaded in batches. This option sets the batch size.

  --reads-batch 100..50000000

Verbose
Log level to use.

  --verbose 1..6 (error..trace)
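
The ranges above can be checked in a wrapper script before the tool is launched. This is a minimal sketch: the helper `in_range` and the chosen values are hypothetical, and only the ranges themselves come from the option descriptions.

```shell
# in_range VALUE MIN MAX: succeeds when VALUE is an integer within [MIN, MAX].
in_range() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

threads=4          # documented range: 1..8
reads_batch=10000  # documented range: 100..50000000
verbose=3          # documented range: 1..6

if in_range "$threads" 1 8 &&
   in_range "$reads_batch" 100 50000000 &&
   in_range "$verbose" 1 6; then
  echo "--threads $threads --reads-batch $reads_batch --verbose $verbose"
else
  echo "option value out of range" >&2
fi
```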

Statistic report

To create an HTML report:

# Clone the repository
git clone git@github.com:guillaume-gricourt/HmnTrimmer.git
# Run
HmnTrimmerReport \
  --template-file ./HmnTrimmer/script/template.html \
  --input-file JSON_FILE \
  --output-file HTML_FILE
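
Trimming and report rendering can be chained, since the JSON file written by `--output-report` is the input of `HmnTrimmerReport`. The sketch below assumes both commands are on the `PATH`. The function name `trim_and_report` and the two variables are hypothetical; the options are those shown above.

```shell
# Commands to call; override the variables to use other paths (assumption).
HMNTRIMMER="${HMNTRIMMER:-HmnTrimmer}"
HMNTRIMMER_REPORT="${HMNTRIMMER_REPORT:-HmnTrimmerReport}"

# trim_and_report FASTQ_IN FASTQ_OUT JSON_OUT HTML_OUT TEMPLATE
# Hypothetical helper: trims reads, then renders the JSON report as HTML.
trim_and_report() {
  "$HMNTRIMMER" \
    --input-fastq-forward "$1" \
    --output-fastq-forward "$2" \
    --output-report "$3" \
    --length-min 50 || return 1   # stop if trimming failed
  "$HMNTRIMMER_REPORT" \
    --template-file "$5" \
    --input-file "$3" \
    --output-file "$4"
}

# Usage (file names are placeholders):
# trim_and_report in.fastq out.fastq report.json report.html ./HmnTrimmer/script/template.html
```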

Use docker

Trimming

docker run \
    -it \
    --rm \
    -v $PWD:$PWD \
    hmntrimmer:<VERSION> \
    --input-fastq-forward $PWD/test/GoldInput/BIG.R1.fastq \
    --output-fastq-forward $PWD/test/DockerTest.R1.fastq.gz \
    --output-report $PWD/test/DockerTest.json \
    --length-min 50

Statistic report with docker

docker run \
    -it \
    --rm \
    -v $PWD:$PWD \
    --entrypoint /opt/HmnTrimmer/script/RenderingReportFile.py \
    hmntrimmer:<VERSION> \
    --input-file $PWD/test/DockerTest.json \
    --output-file $PWD/test/DockerTest.html \
    --template-file /opt/HmnTrimmer/script/template.html

Built with these main libraries

  • SeqAn - Essential library to work with HTS files, algorithms
  • rapidjson - Read/Write Json files efficiently
  • spdlog - Nice log manager
  • igzip - Very fast deflate algorithm

Versioning

SemVer is used for versioning.

Authors

  • Guillaume Gricourt