guppy_on_slurm

March 2024 - only use the Guppy basecaller if you really need to. Oxford Nanopore has replaced Guppy with the Dorado basecaller.

Splitting and accelerating the Oxford Nanopore CPU basecaller guppy using SLURM.

These scripts move FAST5s into subdirectories, then run CPU guppy on each subdirectory independently using a SLURM cluster.

Performance note on GPU vs CPU: important!

Guppy is very slow on CPU but extremely fast on GPU (100-1000x faster!). After torturing our 1000+ core cluster for multiple days with CPU basecalling of MinION runs, we finally moved to a performant Nvidia GPU for basecalling; we use the script runbatch_gpu_guppy.sh for this. When using the GPU version, keep all your FAST5 files in a single directory, i.e. you do not need to run bash batch_split_to_subdirs.sh to split them into subdirectories.
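For reference, a minimal GPU invocation looks roughly like the sketch below; the directory names and config are placeholders, and the parameters actually used live in runbatch_gpu_guppy.sh:

    guppy_basecaller \
        -i fast5_all/ \
        -s fastq_out/ \
        -c dna_r9.4.1_450bps_hac.cfg \
        -x cuda:0

Here -i and -s are the input and output directories, -c selects a basecalling model matching your flowcell and kit, and -x cuda:0 runs on the first GPU.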

Dr. Colin Davenport, June 2019 - August 2021

Warning: only tested on Ubuntu 16.04 and 20.04 to date (not Windows). Guppy works fine on Ubuntu 20.04; use a Singularity container or the native version from ONT.

Requirements

  • either a working CPU guppy_basecaller (the command guppy_basecaller should produce usage output),
  • or a Singularity container with a CPU guppy_basecaller installed in it,
  • or an Nvidia GPU with CUDA installed.
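A quick sanity check for any of these setups; the image name guppy.sif is just an example:

    guppy_basecaller --version                              # native install on PATH
    singularity exec guppy.sif guppy_basecaller --version   # container install
    nvidia-smi                                              # GPU and CUDA driver visible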

Howto - CPU version:

  1. Clone the repository

  2. Edit the run_guppy_SLURM.sh script with your local SLURM settings:

  • a) queue / partition
  • b) cpus (go easy on your HPC: guppy uses its cores very efficiently, so request slightly more than guppy will run on, e.g. 10 HPC cores if you are going to run guppy on 8 cores; see the sketch below).
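The relevant lines in run_guppy_SLURM.sh will look something like this; the partition name is a placeholder, so check the script for the real values:

    #SBATCH --partition=normal        # your site's queue / partition name
    #SBATCH --cpus-per-task=10        # slightly more cores than guppy will actually use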
  3. Copy the bash scripts into the directory containing your reads.

  4. Make sure the Guppy software from ONT is installed and in your PATH. The following command should print program usage:

    guppy_basecaller

  5. Split FAST5 files into batches for the cluster using the bash script. You may need to tune the batch size to your run and server hardware to get optimal runtimes.

    bash batch_split_to_subdirs.sh
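For orientation, the splitting idea is roughly the following sketch (not the repo script itself; the batch size of 200 and the subdir_ prefix are arbitrary examples):

    # Move FAST5 files into numbered subdirectories of batch_size files each.
    batch_size=200
    i=0
    dir=0
    for f in *.fast5; do
        if (( i % batch_size == 0 )); then
            dir=$((dir + 1))
            mkdir -p "subdir_${dir}"
        fi
        mv "$f" "subdir_${dir}/"
        i=$((i + 1))
    done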

  6. Run guppy to submit an 8-core job (default) for each subdirectory concurrently.

    bash runbatch_singularity_guppy.sh (preferred) or bash runbatch_guppy.sh
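Conceptually, each of these loops over the subdirectories and starts one SLURM job per directory, along these lines (a sketch only; the config name is a placeholder and the actual flags live in the run scripts):

    # One SLURM job per subdirectory:
    for d in subdir_*; do
        sbatch run_guppy_SLURM.sh "$d"
    done
    # Inside each job, guppy is then called roughly like:
    # guppy_basecaller -i "$d" -s "$d.guppy" -c dna_r9.4.1_450bps_hac.cfg \
    #     --num_callers 8 --cpu_threads_per_caller 1 --compress_fastq

Writing output to "$d.guppy" is what makes the *.guppy/pass pattern in the gather step below work.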

  7. Finally, gather the FASTQ data with find/exec/cat. Warning: this combines all barcodes and all pass (but not fail) reads into one file, so it may only be appropriate if you run one sample per flowcell. Warning: the output must be written to a directory above the find search tree, e.g. ../x.fastq.gz for the output file, otherwise find keeps matching its own output and creates an infinite loop.

    srun find *.guppy/pass -name "*.fastq.gz" -exec cat {} > ../guppy_part1.fastq.gz \; &
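An equivalent form hands all matches to a single cat via -exec ... +; the output must still land above the search tree:

    srun find *.guppy/pass -name "*.fastq.gz" -exec cat {} + > ../guppy_part1.fastq.gz &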

Done

If you have any questions please raise an issue in this repository.
