Assembly is running around a month and going strong - or is it stalled? #170

Open
000generic opened this issue Mar 21, 2023 · 15 comments

@000generic

000generic commented Mar 21, 2023

Describe the bug
I am unsure whether the assembly of an octopus genome (roughly human-sized) with 43x seed depth is still active or has stalled. It has been running for almost a month on a machine with 500 GB RAM, 60 CPUs, and 2 TB of disk.

Error message
There is no error message. A month ago I used NextDenovo on this machine to successfully assemble a sponge genome 1/10 the size overnight, whereas the current octopus genome is only 10x larger but has been running for a very long time.

The 8 running jobs use ~7 CPUs each, and the memory of each cycles between 3.8 and 5.6 GB over hours, so the run seems like it could be active: it uses a very steady 87% of all CPUs on the machine and 40% of memory. However, Glances and top indicate a stalled status of S while running the minimap2-nd step (see attached screenshots). Every once in a while one of the jobs drops from 7 CPUs to 1 for minutes to maybe an hour, but then returns to 7.

Previous jobs unrelated to NextDenovo sometimes have a status of S but finish no problem - so I wasn't sure how critical the status is - it is a very steady S.
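
For context: on Linux, the `S` shown by top and Glances means interruptible sleep, not stalled. As far as I know, the process-level row reflects the main thread, which can sit in `S` while worker threads are busy, so a steady `S` together with high CPU use is not contradictory. Below is a minimal Python sketch, assuming a Linux `/proc` filesystem, that counts the states of all threads of one PID. The function names `parse_state` and `thread_states` are made up for illustration.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the LAST ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def thread_states(pid):
    """Count the states (R, S, D, ...) of all threads of `pid` (Linux only)."""
    counts = Counter()
    task_dir = "/proc/{}/task".format(pid)
    for tid in os.listdir(task_dir):
        try:
            with open("{}/{}/stat".format(task_dir, tid)) as fh:
                counts[parse_state(fh.read())] += 1
        except (FileNotFoundError, ProcessLookupError):
            pass  # the thread exited between listing and reading
    return counts
```

Calling `thread_states(59246)` for one of the minimap2-nd PIDs would show how many threads are in `R` (running) versus `S` (sleeping) or `D` (waiting on I/O) at that instant.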

I previously restarted the job after 2 weeks, given that its run time was already more than 10x longer than the sponge run at that point. The restart went almost all the way back to the beginning, because the minimap2-nd step had produced no output or update. I then did a fresh start, with a few short initial restarts (a minute or less each), before the current month-long run, so it is a fresh run, independent of the initial 2-week one.

The last pid log readout indicates 36 jobs for cns_align.sh - with the largest job number of the 8 jobs at the start being 59306 (see below). Within a day or so the largest job was 59311 (see screenshot) - suggesting nextDenovo is on the last round of jobs to reach the allotted 36 - but then things have simply stayed here for weeks.

Here are details on this:

[59245 INFO] 2023-02-24 12:04:29 skip step: db_split
[59245 INFO] 2023-02-24 12:04:29 skip step: raw_align
[59245 INFO] 2023-02-24 12:04:29 skip step: sort_align
[59245 INFO] 2023-02-24 12:04:29 skip step: seed_cns
[59245 INFO] 2023-02-24 12:04:29 seed_cns finished, and final corrected reads file:
[59245 INFO] 2023-02-24 12:04:29 /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns*/cns.fasta
[59245 INFO] 2023-02-24 12:04:29 Total jobs: 36
[59245 INFO] 2023-02-24 12:04:29 Submitted jobID:[59246] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align01/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:29 Submitted jobID:[59252] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align02/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:30 Submitted jobID:[59261] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align03/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:30 Submitted jobID:[59270] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align04/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:31 Submitted jobID:[59279] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align05/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:31 Submitted jobID:[59288] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align06/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:32 Submitted jobID:[59297] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align07/nextDenovo.sh] in the local_cycle.
[59245 INFO] 2023-02-24 12:04:32 Submitted jobID:[59306] jobCmd:[/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align08/nextDenovo.sh] in the local_cycle.

RAM usage is 40% and CPU usage is 87%. The general setup is similar to what I did for sponge, rescaled to the new genome size. I wonder if my calculations might have been off and it doesn't have the resources to write output or finish at this point...?

Genome characteristics
Genome size is estimated around 3 Gb - high repeat content - likely high heterozygosity.

Input data
Total base count, sequencing depth, average/N50 read length...

rerun: 3
task: all
deltmp: 1
rewrite: 1
read_type: clr
job_type: local
input_type: raw
parallel_jobs: 8
read_cutoff: 15k
pa_correction: 7
seed_cutfiles: 7
seed_depth: 43.64
genome_size: 2.8g
seed_cutoff: 15001
blocksize: 11726373
job_prefix: nextDenovo
ctg_cns_options: -p 7
nextgraph_options: -a 1
sort_options: -m 70g -t 8 -k 38
minimap2_options_map: -x map-pb
minimap2_options_raw: -t 8 -x ava-pb
correction_options: -p 7 -max_lq_length 1000 -min_len_seed 7500
minimap2_options_cns: -t 7 -x ava-pb -k 17 -w 17 --minlen 1500 --maxhan1 5000
input_fofn: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/input.fofn
workdir: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly
raw_aligndir: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/01.raw_align
cns_aligndir: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align
ctg_graphdir: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/03.ctg_graph
[59245 INFO] 2023-02-24 12:04:29 summary of input data:
file: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/01.raw_align/input.reads.stat
[Read length stat]
Types    Count (#)    Length (bp)
N10         266015          39329
N20         608403          32855
N30        1007493          28710
N40        1458869          25618
N50        1961372          23141
N60        2515311          21068
N70        3122091          19277
N80        3783903          17704
N90        4503641          16293

Types       Count (#)      Bases (bp)    Depth (X)
Raw          28758338    245628751872        87.72
Filtered     23472855    123430622516        44.08
Clean         5285483    122198129356        43.64

*Suggested seed_cutoff (genome size: 2800.00Mb, expected seed depth: 45, real seed depth: 43.64): 15001 bp
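
The Depth (X) column in the summary above appears to be just bases divided by the configured genome size (2800.00 Mb). A small Python check, assuming that reading of the table; the `depth` helper is only for illustration:

```python
def depth(bases, genome_size_bp):
    """Sequencing depth = total bases / genome size."""
    return bases / genome_size_bp


GENOME_SIZE_BP = 2.8e9  # genome_size = 2.8g from run.cfg

# Base counts copied from the input.reads.stat summary
for label, bases in [
    ("Raw", 245628751872),
    ("Filtered", 123430622516),
    ("Clean", 122198129356),
]:
    print("{:<9}{:.2f}x".format(label, depth(bases, GENOME_SIZE_BP)))
```

The Filtered and Clean rows add up to the Raw row in both read count and bases, so the 43.64x seed depth is what remains after the 15001 bp seed cutoff.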

Config file
Please paste the complete content of the Config file (run.cfg) to here.

[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 8 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = clr # clr, ont, hifi
input_fofn = input.fofn
workdir = output/3-nextDenovo-assembly

[correct_option]
read_cutoff = 15k
genome_size = 2.8g # estimated genome size
sort_options = -m 70g -t 8
minimap2_options_raw = -t 8
pa_correction = 7 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -p 7

[assemble_option]
minimap2_options_cns = -t 7
nextgraph_options = -a 1

# see https://nextdenovo.readthedocs.io/en/latest/OPTION.html for a detailed introduction about all the parameters

Operating system
Which operating system and version are you using?
You can use the command lsb_release -a to get it.

Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster

GCC
What version of GCC are you using?
You can use the command gcc -v to get it.

Salk :) gcc -v
Reading specs from /nadata/mnlsc/home/eedsinger/anaconda3/bin/../lib/gcc/x86_64-conda-linux-gnu/7.5.0/specs
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/nadata/mnlsc/home/eedsinger/anaconda3/bin/../libexec/gcc/x86_64-conda-linux-gnu/7.5.0/lto-wrapper
Target: x86_64-conda-linux-gnu
Configured with: /home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/.build/x86_64-conda-linux-gnu/src/gcc/configure --build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu --target=x86_64-conda-linux-gnu --prefix=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/gcc_built --with-sysroot=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/gcc_built/x86_64-conda-linux-gnu/sysroot --enable-languages=c,c++,fortran,objc,obj-c++ --with-pkgversion='crosstool-NG 1.24.0.131_87df0e6_dirty' --enable-__cxa_atexit --disable-libmudflap --enable-libgomp --disable-libssp --enable-libquadmath --enable-libquadmath-support --enable-libsanitizer --enable-libmpx --with-gmp=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/.build/x86_64-conda-linux-gnu/buildtools --with-mpfr=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/.build/x86_64-conda-linux-gnu/buildtools --with-mpc=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/.build/x86_64-conda-linux-gnu/buildtools --with-isl=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/.build/x86_64-conda-linux-gnu/buildtools --enable-lto --enable-threads=posix --enable-target-optspace --enable-plugin --enable-gold --disable-nls --disable-multilib --with-local-prefix=/home/conda/feedstock_root/build_artifacts/ctng-compilers_1596267513165/work/gcc_built/x86_64-conda-linux-gnu/sysroot --enable-long-long --enable-default-pie
Thread model: posix
gcc version 7.5.0 (crosstool-NG 1.24.0.131_87df0e6_dirty)

Python
What version of Python are you using?
You can use the command python --version to get it.

Python 3.8.12

NextDenovo
What version of NextDenovo are you using?
You can use the command nextDenovo -v to get it.

nextDenovo v2.5.0

Screenshot 2023-03-20 at 9 14 21 PM

Screenshot 2023-03-20 at 9 12 32 PM

Any suggestions would be greatly appreciated - NextDenovo did simply fantastic on sponge - just not sure what is going on now with octopus. Some sort of user error but I am just stuck as to what it might be at this point.

Thank you very much :)
Eric

@moold
Member

moold commented Mar 21, 2023

Hi, could you paste the content of some files: /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align*/nextDenovo.sh.e to here?

@000generic
Author

000generic commented Mar 21, 2023

Sure! Here is the last one (09 to 36 are like this one - only an sh script in the folder):

/scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align36/nextDenovo.sh

#!/bin/bash
set -xveo pipefail
hostname
cd /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align36
( time /nadata/mnlsc/home/eedsinger/software/nextdenovo/NextDenovo/bin/minimap2-nd -I 6G --step 2 -t 7 -x ava-pb -k 17 -w 17 --minlen 1500 --maxhan1 5000 /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns8/cns.fasta /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns8/cns.fasta -o cns.filt.dovt.ovl; )
touch /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align36/nextDenovo.sh.done
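
Since each job script ends by touching `nextDenovo.sh.done`, one way to see how far the 36 jobs have progressed is to count those marker files. A minimal Python sketch, assuming the folder layout shown in the script above (`cns_align01` ... `cns_align36` under `02.cns_align.sh.work`); `job_progress` is a hypothetical helper, not part of NextDenovo:

```python
from pathlib import Path
from typing import Tuple


def job_progress(work_dir: str) -> Tuple[int, int]:
    """Return (finished, total) for the cns_align* job folders in work_dir.

    A job counts as finished when its nextDenovo.sh.done marker exists.
    """
    jobs = sorted(p for p in Path(work_dir).glob("cns_align*") if p.is_dir())
    finished = sum((job / "nextDenovo.sh.done").exists() for job in jobs)
    return finished, len(jobs)
```

Running `job_progress(".../02.cns_align/02.cns_align.sh.work")` every so often gives a direct progress count that does not depend on reading job IDs off the process list.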

@000generic
Author

000generic commented Mar 21, 2023

And here is the first one (01-08 are similar to this):

hostname
+ hostname
cd /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align01
+ cd /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align01
( time /nadata/mnlsc/home/eedsinger/software/nextdenovo/NextDenovo/bin/minimap2-nd -I 6G --step 2 -t 7 -x ava-pb -k 17 -w 17 --minlen 1500 --maxhan1 5000 /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns1/cns.fasta /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns1/cns.fasta -o cns.filt.dovt.ovl; )
+ /nadata/mnlsc/home/eedsinger/software/nextdenovo/NextDenovo/bin/minimap2-nd -I 6G --step 2 -t 7 -x ava-pb -k 17 -w 17 --minlen 1500 --maxhan1 5000 /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns1/cns.fasta /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns1/cns.fasta -o cns.filt.dovt.ovl
[M::mm_idx_gen::105.028*1.53] collected minimizers
[M::mm_idx_gen::122.324*2.06] sorted minimizers
[M::main::122.324*2.06] loaded/built the index for 277844 target sequence(s)
[M::mm_mapopt_update::122.951*2.06] mid_occ = 4956
[M::mm_idx_stat] kmer size: 17; skip: 17; is_hpc: 1; #seq: 277844
[M::mm_idx_stat::123.139*2.06] distinct minimizers: 17360146 (11.64% are singletons); average occurrences: 32.766; average spacing: 10.551
[M::worker_pipeline::72105.959*6.97] mapped 24095 sequences
[M::worker_pipeline::123796.841*6.97] mapped 22584 sequences
[M::worker_pipeline::163518.825*6.97] mapped 23730 sequences
[M::worker_pipeline::211273.274*6.97] mapped 23955 sequences
[M::worker_pipeline::314292.484*6.98] mapped 22417 sequences
[M::worker_pipeline::411274.576*6.96] mapped 22807 sequences
[M::worker_pipeline::501517.608*6.97] mapped 23115 sequences
[M::worker_pipeline::585301.013*6.97] mapped 23656 sequences
[M::worker_pipeline::670440.754*6.97] mapped 21757 sequences
[M::worker_pipeline::748560.160*6.97] mapped 23842 sequences
[M::worker_pipeline::819483.171*6.97] mapped 21878 sequences
[M::worker_pipeline::882363.317*6.97] mapped 23942 sequences
[M::worker_pipeline::939265.937*6.97] mapped 23083 sequences
[M::worker_pipeline::995858.955*6.96] mapped 22595 sequences
[M::worker_pipeline::1049528.872*6.96] mapped 23604 sequences
[M::worker_pipeline::1101771.501*6.97] mapped 23865 sequences
[M::worker_pipeline::1151945.558*6.97] mapped 22873 sequences
[M::worker_pipeline::1201137.060*6.97] mapped 21705 sequences
[M::worker_pipeline::1204845.650*6.97] mapped 1746 sequences
[M::mm_idx_gen::1204929.990*6.97] collected minimizers
[M::mm_idx_gen::1204936.822*6.97] sorted minimizers
[M::main::1204936.822*6.97] loaded/built the index for 139405 target sequence(s)
[M::mm_mapopt_update::1204936.822*6.97] mid_occ = 4956
[M::mm_idx_stat] kmer size: 17; skip: 17; is_hpc: 1; #seq: 139405
[M::mm_idx_stat::1204937.118*6.97] distinct minimizers: 15591590 (19.94% are singletons); average occurrences: 18.457; average spacing: 10.557
[M::worker_pipeline::1274102.253*6.96] mapped 24095 sequences
[M::worker_pipeline::1330700.756*6.96] mapped 22584 sequences
[M::worker_pipeline::1330736.564*6.96] mapped 23730 sequences
[M::worker_pipeline::1366275.436*6.96] mapped 23955 sequences
[M::worker_pipeline::1464472.315*6.96] mapped 22417 sequences
[M::worker_pipeline::1558781.587*6.96] mapped 22807 sequences
[M::worker_pipeline::1651975.492*6.96] mapped 23115 sequences
[M::worker_pipeline::1740237.972*6.96] mapped 23656 sequences
[M::worker_pipeline::1839223.678*6.95] mapped 21757 sequences
[M::worker_pipeline::1932349.627*6.96] mapped 23842 sequences
[M::worker_pipeline::2028878.546*6.96] mapped 21878 sequences
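
The log above does show steady progress. minimap2 prefixes its messages with `[M::function::wall_seconds*cpu_ratio]`, and I am assuming minimap2-nd keeps that format. Read that way, the last line reports about 2028878 s, or roughly 23.5 days of wall time, at close to 7 CPUs. Here is a rough Python sketch for pulling those numbers out of a `nextDenovo.sh.e` file; `batch_times` and `summarize` are hypothetical names:

```python
import re

# Matches e.g. "[M::worker_pipeline::72105.959*6.97] mapped 24095 sequences"
MAPPED_RE = re.compile(
    r"\[M::worker_pipeline::(\d+\.\d+)\*(\d+\.\d+)\] mapped (\d+) sequences"
)


def batch_times(log_text):
    """Return (wall_seconds, cpu_ratio, n_sequences) for each 'mapped' line."""
    return [(float(w), float(c), int(n)) for w, c, n in MAPPED_RE.findall(log_text)]


def summarize(log_text):
    """Summarize batches seen so far, or None if no 'mapped' line was found."""
    rows = batch_times(log_text)
    if not rows:
        return None
    return {
        "batches": len(rows),
        "sequences": sum(n for _, _, n in rows),
        "wall_days": rows[-1][0] / 86400,  # elapsed time at the last batch
    }
```

Comparing `wall_days` between two readings a day apart is a quick way to tell a slow job from a stalled one, since a stalled job stops adding `mapped` lines.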

@moold
Member

moold commented Mar 21, 2023

Try increasing -k and -w in minimap2_options_cns, for example: minimap2_options_cns = -t 7 -k 31 -w 17
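
If it helps, here is one way to apply that change to run.cfg without touching the other lines or their inline comments. This is a minimal Python sketch; `set_option` is a hypothetical helper, and editing the file by hand works just as well:

```python
import re


def set_option(cfg_text, key, value):
    """Replace the value of a 'key = ...' line in run.cfg text.

    All other lines are left exactly as they are. Raises ValueError unless
    the key occurs exactly once.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*).*$", flags=re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + value, cfg_text)
    if count != 1:
        raise ValueError("expected exactly one '{}' line, found {}".format(key, count))
    return new_text


cfg = "[assemble_option]\nminimap2_options_cns = -t 7\nnextgraph_options = -a 1\n"
print(set_option(cfg, "minimap2_options_cns", "-t 7 -k 31 -w 17"))
```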

@000generic
Author

Ok! I'll kill things and restart fresh....

@000generic
Author

000generic commented Mar 21, 2023

Actually - rather than starting totally fresh, I updated the run file and just deleted folders 2 and 3 to save a little time and see how things go with your update more quickly.

Given the setup, how long would you expect things to run? Just so I know when it's running over.

@moold
Member

moold commented Mar 21, 2023

I don't know how long it will take, but you can try it first. There is no need to start fresh; just continue running from the breakpoint. You can also set -f 0.0004 in minimap2_options_cns to speed things up.

PS: check the value of mid_occ (currently mid_occ = 4956) in the log files cns_align*/nextDenovo.sh.e. If it is less than 1000, I think it is acceptable.
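
A quick way to run that check across all job folders, sketched in Python. The 1000 threshold is the rule of thumb given above; `mid_occ_values` and `check_logs` are hypothetical helper names:

```python
import re
from pathlib import Path

MID_OCC_RE = re.compile(r"mid_occ = (\d+)")


def mid_occ_values(log_text):
    """Return every mid_occ value reported in a log, in order of appearance."""
    return [int(v) for v in MID_OCC_RE.findall(log_text)]


def check_logs(work_dir, threshold=1000):
    """Map each cns_align*/nextDenovo.sh.e path to (max mid_occ, is it below threshold)."""
    report = {}
    for log in sorted(Path(work_dir).glob("cns_align*/nextDenovo.sh.e")):
        values = mid_occ_values(log.read_text(errors="replace"))
        if values:
            report[str(log)] = (max(values), max(values) < threshold)
    return report
```

Jobs that have not started yet have no `nextDenovo.sh.e` file and are simply skipped.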

@000generic
Author

minimap2-nd started up - first 8 of 36 jobs again - mid_occ is now under 1000 at 427 - jobs are still running under status S but that might be ok.

hostname
+ hostname
cd /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align01
+ cd /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align01
( time /nadata/mnlsc/home/eedsinger/software/nextdenovo/NextDenovo/bin/minimap2-nd -I 6G --step 2 -t 7 -k 31 -w 17 -x ava-pb --minlen 1500 --maxhan1 5000 /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns1/cns.fasta /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns1/cns.fasta -o cns.filt.dovt.ovl; )
+ /nadata/mnlsc/home/eedsinger/software/nextdenovo/NextDenovo/bin/minimap2-nd -I 6G --step 2 -t 7 -k 31 -w 17 -x ava-pb --minlen 1500 --maxhan1 5000 /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns1/cns.fasta /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/01.seed_cns.sh.work/seed_cns1/cns.fasta -o cns.filt.dovt.ovl
[M::mm_idx_gen::145.823*1.38] collected minimizers
[M::mm_idx_gen::155.600*1.59] sorted minimizers
[M::main::155.600*1.59] loaded/built the index for 277836 target sequence(s)
[M::mm_mapopt_update::159.143*1.58] mid_occ = 427
[M::mm_idx_stat] kmer size: 31; skip: 17; is_hpc: 1; #seq: 277836
[M::mm_idx_stat::161.928*1.57] distinct minimizers: 166455590 (41.04% are singletons); average occurrences: 3.201; average spacing: 11.263
[M::worker_pipeline::10681.570*6.89] mapped 24095 sequences
    /scratch2/eedsinger/projects/genomes/zanfona-5x-50x/octopus-sinensis/output/3-nextDenovo-assembly/02.cns_align/02.cns_align.sh.work/cns_align01/nextDenovo.sh.e (END)

@000generic
Author

Well - things haven't advanced past the first 8 jobs after several days now. It feels like it might be similar to before... Is there anything you might suggest I check or try?

@moold
Member

moold commented Mar 25, 2023

You can try increasing -k, -w, -f, --kn, and --wn, or setting --mode 0 or --mode 1, or --cn 1000 in minimap2_options_cns.
BTW, such parameter settings may produce inaccurate results; we have not tested them before.

@000generic
Author

000generic commented Mar 27, 2023

Good news - I left things running and one of the initial 8 jobs finished after 4 days and a second one after 5 - so two new jobs now running - and I'm hoping the next 6 will finish soon to advance through the remaining 26 jobs or so - currently they are at 850 CPU hours each. Seems like it will be 2-3 weeks to finish them all.

@000generic
Author

000generic commented Mar 29, 2023

The remaining 6 jobs of the initial round finished at around 7 days / 1300 CPU hours per 7-CPU job, so the second of 4-5 rounds is now fully underway. I'm not sure if this is a normal time frame for a ~human-sized genome at ~45x coverage with 60 CPUs and half a TB of RAM. I'm estimating 5 weeks for this stage of the pipeline from start to finish.
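
The back-of-the-envelope arithmetic behind that estimate, as a small Python sketch. It assumes every round takes about as long as the first one, which may not hold; `estimate` is just an illustrative name:

```python
import math


def estimate(total_jobs, parallel_jobs, days_per_round, threads_per_job):
    """Rough totals if every round of parallel jobs takes days_per_round."""
    rounds = math.ceil(total_jobs / parallel_jobs)
    return {
        "rounds": rounds,
        "total_days": rounds * days_per_round,
        "cpu_hours_per_job": days_per_round * 24 * threads_per_job,
    }


# 36 cns_align jobs, parallel_jobs = 8, ~7 days per round, -t 7
print(estimate(36, 8, 7, 7))
# {'rounds': 5, 'total_days': 35, 'cpu_hours_per_job': 1176}
```

That gives 5 rounds and about 35 days, in line with the 5-week guess, and about 1176 CPU hours per job at exactly 7 days, the same ballpark as the ~1300 observed.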

As long as it can finish, I'm very happy! If you have ideas for making it more efficient without going outside what is tested / known on your side for the parameters - I'd love to hear - but also I think you might have covered everything.

Thank you for your help on this!

@moold
Member

moold commented Mar 29, 2023

The running time is largely determined by genome complexity and input data size. For a ~human genome, it usually completes within 1-2 days. Obviously, the genome you assembled is highly repetitive (you can check this by k-mer spectrum using short reads), so you can try wtdbg2, which should be able to finish assembly very quickly.

@000generic
Author

Wow - so this is really running long already and still weeks to go.

I'm actually trying to improve on a wtdbg2 assembly - do you think NextDenovo is likely to offer improvement? It did great with the sponge data and was super fast. A different octopus with ONT reads took around 4-5 weeks a few months ago and it seemed reasonably good overall - very good considering the data going in I thought. I'll probably let it finish regardless out of curiosity at this point, as long as the machine isn't needed otherwise.

I can update how it goes! Thanks again.

@moold
Member

moold commented Mar 29, 2023

The assembly result is hard to say, because the genome you assembled is not normal, and the default parameters may not be suitable. But anyway, wait to finish this assembly task first.
