Timing out on minimap-nd tasks #203

wgallin opened this issue Apr 1, 2024 · 4 comments

wgallin commented Apr 1, 2024

My assembly job is failing because some of the minimap2-nd jobs exceed their time limit.

It appears that when the parallel tasks run, the time allocated to each one is shorter than the time it takes to complete.

An example log entry for a single job (about 10 of the 100 submitted jobs appear to have failed) is shown here:

Error message
hostname
+ hostname
cd /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/03.raw_align.sh.work/raw_align100
+ cd /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/03.raw_align.sh.work/raw_align100
( time /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/nextdenovo/2.5.2/bin/minimap2-nd --step 1 -I 3G -t 8 -x ava-ont /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit -o input.seed.004.2bit.99.ovl; )
+ /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/nextdenovo/2.5.2/bin/minimap2-nd --step 1 -I 3G -t 8 -x ava-ont /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.seed.004.2bit -o input.seed.004.2bit.99.ovl
[M::mm_idx_gen::64.686*1.84] collected minimizers
[M::mm_idx_gen::75.200*2.64] sorted minimizers
[M::main::75.200*2.64] loaded/built the index for 107322 target sequence(s)
[M::mm_mapopt_update::77.544*2.59] mid_occ = 1212
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 107322
[M::mm_idx_stat::78.658*2.57] distinct minimizers: 95367629 (42.05% are singletons); average occurrences: 8.194; average spacing: 2.931
[M::worker_pipeline::1280.746*7.56] mapped 25749 sequences
[M::worker_pipeline::2627.600*7.78] mapped 20748 sequences
slurmstepd: error: *** JOB 18227135 ON gra1100 CANCELLED AT 2024-03-30T08:38:49 DUE TO TIME LIMIT ***
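
The progress lines in this log allow a rough estimate of how long the cancelled task would have needed. minimap2 tags each line as [M::function::elapsed-seconds*CPU/wall-clock-ratio], so the two worker_pipeline lines give a mapping rate. The sketch below is a back-of-the-envelope calculation, not part of NextDenovo: the helper name is made up for illustration, and it assumes that the query holds the same 107322 sequences as the target (both arguments are input.seed.004.2bit), that the index fits in a single part, and that later batches map at the same rate as the first two.

```python
import re

# Lines copied from the log above. minimap2 prints
# [M::<function>::<elapsed wall-clock seconds>*<CPU/wall-clock ratio>] <message>
LOG = """\
[M::main::75.200*2.64] loaded/built the index for 107322 target sequence(s)
[M::mm_idx_stat::78.658*2.57] distinct minimizers: 95367629 (42.05% are singletons); average occurrences: 8.194; average spacing: 2.931
[M::worker_pipeline::1280.746*7.56] mapped 25749 sequences
[M::worker_pipeline::2627.600*7.78] mapped 20748 sequences
"""

TAGGED = re.compile(r"\[M::(\w+)::([\d.]+)\*([\d.]+)\]\s*(.*)")
MAPPED = re.compile(r"mapped (\d+) sequences")
TARGETS = re.compile(r"index for (\d+) target sequence")


def estimate_runtime(log_text):
    """Hypothetical helper: extrapolate total wall-clock seconds from progress lines."""
    index_done = 0.0   # elapsed seconds when the last indexing line was printed
    last_batch = 0.0   # elapsed seconds at the last "mapped" line
    mapped = 0         # sequences mapped so far
    total = None       # sequences in the index
    for line in log_text.splitlines():
        m = TAGGED.match(line.strip())
        if not m:
            continue
        elapsed, message = float(m.group(2)), m.group(4)
        batch = MAPPED.search(message)
        targets = TARGETS.search(message)
        if batch:
            mapped += int(batch.group(1))
            last_batch = elapsed
        else:
            index_done = max(index_done, elapsed)
        if targets:
            total = int(targets.group(1))
    rate = mapped / (last_batch - index_done)  # sequences per wall-clock second
    # Assumption: the query has as many sequences as the target (all-vs-all on one file).
    return mapped, rate, index_done + total / rate


mapped, rate, seconds = estimate_runtime(LOG)
print(f"{mapped} sequences mapped at {rate:.1f} seq/s")
print(f"estimated total: {seconds / 60:.0f} min")
```

Under these assumptions the task would need on the order of 100 minutes of wall-clock time. Reads differ in length, so the figure is only indicative, but it gives something to compare against the time limit the jobs are submitted with.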

Genome characteristics
genome size, heterozygous rate, repeat content...

Input data
This is the relevant part of the slurm.out file:

[100999 INFO] 2024-03-30 02:52:07 NextDenovo start...
[100999 INFO] 2024-03-30 02:52:08 version:Unknown logfile:pid100999.log.info
[100999 WARNING] 2024-03-30 02:52:09 Re-write workdir
[100999 INFO] 2024-03-30 02:52:09 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly
[100999 INFO] 2024-03-30 02:52:10 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align
[100999 INFO] 2024-03-30 02:52:10 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/02.cns_align
[100999 INFO] 2024-03-30 02:52:10 mkdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/03.ctg_graph
[100999 INFO] 2024-03-30 02:52:18 Total jobs: 1
[100999 INFO] 2024-03-30 02:52:18 Submitted jobID:[18223332] jobCmd:[/scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/01.db_stat.sh.work/db_stat1/Trial02.sh] in the slurm_cycle.
[100999 INFO] 2024-03-30 02:54:20 db_stat done
[100999 INFO] 2024-03-30 02:54:20 updated options:
rerun: 3
task: all
deltmp: 1
rewrite: 1
read_type: ont
job_type: slurm
input_type: raw
read_cutoff: 1k
pa_correction: 5
seed_cutfiles: 5
parallel_jobs: 32
seed_depth: 38.12
genome_size: 300m
seed_cutoff: 10000
job_prefix: Trial02
blocksize: 983465750
ctg_cns_options: -p 30
nextgraph_options: -a 1
sort_options: -m 50g -t 30 -k 40
minimap2_options_map: -x map-ont
minimap2_options_raw: -t 8 -x ava-ont
input_fofn: /scratch/wgallin/NextDeNovo_Test01/input.fofn
correction_options: -p 30 -max_lq_length 10000 -r ont -min_len_seed 5000
workdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly
minimap2_options_cns: -t 8 -x ava-ont -k 17 -w 17 --minlen 1000 --maxhan1 5000
raw_aligndir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align
cns_aligndir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/02.cns_align
ctg_graphdir: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/03.ctg_graph
[100999 INFO] 2024-03-30 02:54:20 summary of input data:
file: /scratch/wgallin/NextDeNovo_Test01/Trial_02_Ppen_NextDenovo_Assembly/01.raw_align/input.reads.stat
[Read length stat]
Types       Count (#)    Length (bp)
N10         49686        39610
N20         138374       24804
N30         277076       15991
N40         488598       10686
N50         795459       7571
N60         1219406      5562
N70         1792624      4116
N80         2576448      2961
N90         3705002      1970

Types       Count (#)    Bases (bp)      Depth (X)
Raw         7575648      28638422273     95.46
Filtered    1971087      1286477110      4.29
Clean       5604561      27351945163     91.17

*Suggested seed_cutoff (genome size: 300.00Mb, expected seed depth: 45, real seed depth: 38.12): 10000 bp
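
The Depth column is bases divided by genome size, which can be checked directly from the summary above. The snippet below is only an arithmetic check using genome_size = 300m from the config. The per-file seed estimate additionally assumes that the seed reads are split evenly across the seed_cutfiles = 5 subfiles, which the log does not confirm.

```python
GENOME_SIZE = 300_000_000  # genome_size = 300m
SEED_DEPTH = 38.12         # "real seed depth" reported above
SEED_CUTFILES = 5          # seed_cutfiles from the updated options

bases = {
    "Raw": 28_638_422_273,
    "Filtered": 1_286_477_110,
    "Clean": 27_351_945_163,
}

# Depth (X) = bases / genome size
depth = {name: round(n / GENOME_SIZE, 2) for name, n in bases.items()}
for name, d in depth.items():
    print(f"{name:<9}{d:>7.2f}X")

# Assumption: seeds are divided evenly among the seed subfiles.
seed_bases = SEED_DEPTH * GENOME_SIZE
per_file = seed_bases / SEED_CUTFILES
print(f"seed bases: {seed_bases / 1e9:.2f} Gb, ~{per_file / 1e9:.2f} Gb per seed subfile")
```

Under the even-split assumption each seed subfile holds roughly 2.3 Gb, so a larger seed_cutfiles value would shrink the file that each minimap2-nd task has to index and map.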

Config file
[General]
job_type = slurm
job_prefix = Trial02
task = all
rewrite = yes
deltmp = yes
parallel_jobs = 32
input_type = raw
read_type = ont # clr, ont, hifi
input_fofn = input.fofn
workdir = Trial_02_Ppen_NextDenovo_Assembly

[correct_option]
read_cutoff = 1k
genome_size = 300m # estimated genome size
sort_options = -m 50g -t 30
minimap2_options_raw = -t 8
pa_correction = 5
correction_options = -p 30

[assemble_option]
minimap2_options_cns = -t 8
nextgraph_options = -a 1

Operating system
LSB Version: n/a
Distributor ID: Gentoo
Description: Gentoo Base System release 2.6
Release: 2.6
Codename: n/a

GCC
gcc version 9.3.0 (GCC)

Python
3.11

NextDenovo
What version of NextDenovo are you using?
2.5.2


moold commented Apr 2, 2024

Two solutions

  1. It seems that your system limits the running time of each job, so you can reduce blocksize and increase seed_cutfiles to shrink each subfile and speed up each map task. The total running time may be longer, though.
  2. See here or here to adjust the submit command.
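
The first suggestion amounts to two extra entries in run.cfg. A minimal sketch of applying them with Python's configparser follows. It assumes that blocksize and seed_cutfiles belong in [correct_option] (worth checking against the NextDenovo option reference), and the values 491732875 (half the blocksize in the log above) and 10 (double the logged seed_cutfiles) are illustrative starting points, not recommendations from the maintainer. The function name is hypothetical.

```python
import configparser
import io

# Abbreviated copy of the run.cfg posted in this issue.
RUN_CFG = """\
[General]
job_type = slurm
parallel_jobs = 32
read_type = ont # clr, ont, hifi

[correct_option]
read_cutoff = 1k
genome_size = 300m # estimated genome size
sort_options = -m 50g -t 30
minimap2_options_raw = -t 8
pa_correction = 5
correction_options = -p 30
"""


def with_smaller_tasks(cfg_text, blocksize, seed_cutfiles):
    """Hypothetical helper: return cfg_text with smaller per-task inputs."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # the cfg uses trailing "# ..." comments
        interpolation=None,
    )
    parser.optionxform = str  # preserve option-name case
    parser.read_string(cfg_text)
    # Assumed section for both options: [correct_option]
    parser["correct_option"]["blocksize"] = str(blocksize)
    parser["correct_option"]["seed_cutfiles"] = str(seed_cutfiles)
    out = io.StringIO()
    parser.write(out)  # note: inline comments are dropped on write
    return out.getvalue()


new_cfg = with_smaller_tasks(RUN_CFG, blocksize=491_732_875, seed_cutfiles=10)
print(new_cfg)
```

Smaller subfiles mean more, shorter tasks, which matches the caveat that the total running time may increase even though each task is more likely to fit inside the time limit.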

@DaniPaulo

Hi @wgallin. I'm still trying to figure out how to run NextDenovo in an HPC environment using SLURM. Would you be able to share your script.slurm.sh with me?


wgallin commented Apr 22, 2024 via email


DaniPaulo commented Apr 25, 2024

Hi @wgallin,

Thanks for your response. Let's see if I understand.
So basically, you set up your script.slurm.sh to:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 32
#SBATCH --mem 256G
#SBATCH --time 7-00:00:00

# MODULES
module load nextdenovo

# MAIN
nextDenovo run.cfg

And your run.cfg to use job_type = local, a single parallel job, and -t / -p set to 32:

[General]
job_type = local
parallel_jobs = 1

[correct_option]
sort_options = -t 32
minimap2_options_raw = -t 32
pa_correction = 3
correction_options = -p 32

[assemble_option]
minimap2_options_cns = -t 32

Could you please verify this?
I appreciate your help,
Dani.
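
One way to sanity-check a single-node setup like the one described above is to compare the threads the config can request with the CPUs in the allocation. The sketch below is not a NextDenovo feature and its function names are hypothetical. It assumes that with job_type = local up to parallel_jobs tasks may run at once, and that each task uses as many threads as its -t or -p option asks for.

```python
import re


def flag_value(options, flag):
    """Return the integer following `flag` in an option string, or None if absent."""
    m = re.search(rf"(?:^|\s){re.escape(flag)}\s+(\d+)", options)
    return int(m.group(1)) if m else None


def peak_threads(parallel_jobs, option_strings):
    """Assumed worst case: every parallel job uses the largest -t / -p requested."""
    per_job = max(
        v
        for opts in option_strings
        for v in (flag_value(opts, "-t"), flag_value(opts, "-p"))
        if v is not None
    )
    return parallel_jobs * per_job


CPUS_PER_TASK = 32  # from "#SBATCH --cpus-per-task 32"

# Option strings from the run.cfg quoted in the comment above, with parallel_jobs = 1.
local_cfg = peak_threads(1, ["-t 32", "-t 32", "-p 32", "-t 32"])
print(local_cfg, local_cfg <= CPUS_PER_TASK)

# For contrast: the issue's original slurm-mode settings (parallel_jobs = 32),
# if they were reused unchanged inside the same single-node allocation.
original = peak_threads(32, ["-m 50g -t 30", "-t 8", "-p 30", "-t 8"])
print(original, original <= CPUS_PER_TASK)
```

With one parallel job at 32 threads the request matches the 32 allocated CPUs, whereas the original settings would oversubscribe a single node heavily if reused in local mode.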
