samsa2-nf

Nextflow translation for SAMSA2 metatranscriptomics pipeline. Tailored for running on WEHI Milton. The translation is of the master script in the original pipeline.

Usage

Setup

After following the Samsa2 setup instructions, the working directory structure should look like:

.
├── full_databases
│   ├── New_Bac_Vir_Arc_RefSeq.dmnd
│   ├── New_Bac_Vir_Arc_RefSeq.fa
│   ├── RefSeq_bac.fa
│   ├── subsys_db.dmnd
│   ├── subsys_db.fa
│   └── viral_reference
├── input_files
│   ├── 48E_S68_L001_R1_001.fastq # input files follow *_R{1,2}_* pattern
│   ├── 48E_S68_L001_R2_001.fastq
│   ├── 48F_S69_L001_R1_001.fastq
│   ├── 48F_S69_L001_R2_001.fastq
|   ├── ...other input files...
├── programs
│   ├── diamond
│   ├── diamond-linux64.tar.gz
│   ├── diamond_manual.pdf
│   ├── diamond-sse2
│   ├── pear-0.9.10-linux-x86_64
│   ├── pear-0.9.10-linux-x86_64.tar.gz
│   ├── sortmerna-2.1
│   ├── sortmerna-2.1.tar.gz
│   ├── Trimmomatic-0.36
│   └── Trimmomatic-0.36.zip
└── python_scripts
    ├── DIAMOND_analysis_counter_mp.py # this is a parallel version of DIAMOND_analysis_counter.py
    ├── DIAMOND_analysis_counter.py
    ├── DIAMOND_subsystems_analysis_counter.py
    ├── raw_read_counter.py
    └── subsys_reducer.py

NOTE that samsa2-master.nf makes use of DIAMOND_analysis_counter_mp.py in this repository (not the original DIAMOND_analysis_counter.py). See the modifications section below.

Run the pipeline

nextflow run samsa2-master.nf

Parameters

nextfow run samsa2-master.nf
  --input_files <directory with pairs of reads>
  --python_scripts <directory with python scripts>
  --diamond_database <path to RefSeq db>
  --subsys_database <path to SubSys db>
  --output_dir <directory to store linked output files>

Modifications from original

DIAMOND_analysis_counter_mp.py is modified from the original SAMSA2 pipeline. It's modified to make use of Python's multiprocessing module. Number of processes is set with -t <no. of processes> option. Currently, this produces identical results to the original in -O and -F mode (used in the original master script), and should produce the correct results for -R, but this hasn't been verified. -SO mode has not been checked either.

The DIAMOND blastx step has the -b 12 and -c 1 performance options added. Respectively, these increase the number of block size and decrease the number of chunks from the default. Both of these changes have the net effect of increasing memory usage substantially, but reducing run times substantially. The maximum observed memory usage is 170GB, which Milton's nodes can easily satisfy, but may be problematic on other hardware.

The DIAMOND blastx step is also making use of /vast/scratch/users/$USER/tmp. This should be parameterised in the future.

Output files

The output files by default will be placed in output_files with the step_{1..5}_output folders which should contain the same results as the original master script.

NOTE as Nextflow on WEHI's Milton HPC is configured to use the VAST Scratch as the working directory, to get around the 14-day deletion policy,samsa2-master.nf copies the results from the working directories to the output_files folder.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
.gitignore		.gitignore
DIAMOND_analysis_counter_mp.py		DIAMOND_analysis_counter_mp.py
LICENSE		LICENSE
README.md		README.md
samsa2-master.nf		samsa2-master.nf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitignore

.gitignore

DIAMOND_analysis_counter_mp.py

DIAMOND_analysis_counter_mp.py

LICENSE

LICENSE

README.md

README.md

samsa2-master.nf

samsa2-master.nf

Repository files navigation

samsa2-nf

Usage

Setup

Run the pipeline

Parameters

Modifications from original

Output files

About

Releases

Packages

Languages

License

WEHI-ResearchComputing/samsa2-nf

Folders and files

Latest commit

History

Repository files navigation

samsa2-nf

Usage

Setup

Run the pipeline

Parameters

Modifications from original

Output files

About

Topics

Resources

License

Stars

Watchers

Forks

Languages