[morecore] insufficient memory #33

Open
mictadlo opened this issue Oct 28, 2019 · 9 comments

@mictadlo

Hi,
Contigs.txt and NbV1ChF.fasta are 2.8 GB and 2.6 GB, respectively. Minimap2 seems to run out of memory even on a machine with 2 TB of RAM.

$ ragoo.py -t 8 -g 100 -s -b -gff augustus.hints_utr.gff3 Contigs.txt NbV1ChF.fasta
Mon Oct 28 01:45:52 2019 --- Running : minimap2 -k19 -w19 -t8 ../NbV1ChF.fasta ../Contigs.txt > contigs_against_ref.paf 2> contigs_against_ref.paf.log
Mon Oct 28 01:49:23 2019 --- Reading alignments
Mon Oct 28 01:52:45 2019 --- Getting gff features
Mon Oct 28 01:53:07 2019 --- Getting contigs
Mon Oct 28 01:53:25 2019 --- Finding interchromosomally chimeric contigs
Mon Oct 28 01:53:25 2019 --- Finding break points and breaking interchromosomally chimeric contigs
Mon Oct 28 01:53:45 2019 --- Running : minimap2 -k19 -w19 -t8 ../../NbV1ChF.fasta Contigs.inter.chimera.broken.fa > inter_contigs_against_ref.paf 2> inter_contigs_against_ref.paf.log
Mon Oct 28 01:57:22 2019 --- Reading interchromosomal chimera broken alignments
Mon Oct 28 02:00:58 2019 --- Finding intrachromosomally chimeric contigs
Mon Oct 28 02:01:52 2019 --- Running : minimap2 -k19 -w19 -t8 ../../NbV1ChF.fasta Contigs.intra.chimera.broken.fa > intra_contigs_against_ref.paf 2> intra_contigs_against_ref.paf.log
Mon Oct 28 02:05:25 2019 --- Reading intrachromosomal chimera broken alignments
Mon Oct 28 02:09:19 2019 --- The total number of interchromasomally chimeric contigs broken is 0
Mon Oct 28 02:09:19 2019 --- The total number of intrachromasomally chimeric contigs broken is 6
Mon Oct 28 02:09:19 2019 --- Assigning contigs
Mon Oct 28 02:09:40 2019 --- Ordering and orienting contigs
Mon Oct 28 02:11:01 2019 --- Creating pseudomolecules
Mon Oct 28 05:26:02 2019 --- Aligning pseudomolecules to reference
Mon Oct 28 05:26:02 2019 --- Running : minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
/bin/bash: line 1: 10389 Aborted                 (core dumped) minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/ragoo/bin/ragoo.py", line 4, in <module>
    __import__('pkg_resources').run_script('RaGOO==1.1', 'ragoo.py')
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/EGG-INFO/scripts/ragoo.py", line 754, in <module>
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/EGG-INFO/scripts/ragoo.py", line 439, in align_pms
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/ragoo_utilities/utilities.py", line 25, in run
RuntimeError: Failed : minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
less pm_contigs_against_ref.sam.log 
[M::mm_idx_gen::55.511*1.79] collected minimizers
[M::mm_idx_gen::60.192*2.17] sorted minimizers
[M::main::60.192*2.17] loaded/built the index for 19 target sequence(s)
[M::mm_mapopt_update::64.517*2.10] mid_occ = 342
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 19
[M::mm_idx_stat::69.857*2.01] distinct minimizers: 143929053 (83.21% are singletons); average occurrences: 1.928; average spacing: 9.884
[M::worker_pipeline::3024.805*3.40] mapped 5 sequences
[M::worker_pipeline::4921.171*3.81] mapped 6 sequences
[M::worker_pipeline::7761.560*3.75] mapped 4 sequences
[morecore] insufficient memory

How important is the alignment from the "Aligning pseudomolecules to reference" step, or can ragoo.fasta be used as is? How much more memory do you think I would need?

Thank you in advance,

Michal

@malonge
Owner

malonge commented Oct 29, 2019

Hmm, that is interesting. First, yes, you can still use ragoo.fasta. The SV calling step is independent.

I am not sure why Minimap2 is running out of memory. For debugging, you can just run the same or similar command outside of RaGOO. What are you assembling?
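A minimal sketch of rerunning the failing step outside RaGOO while recording its memory use, assuming `minimap2` is on `PATH` and a Unix system. `run_with_peak_rss` and `minimap2_cmd` are hypothetical helper names, not part of RaGOO. The peak resident set size comes from `resource.getrusage(resource.RUSAGE_CHILDREN)`, which reports kilobytes on Linux but bytes on macOS.

```python
import resource
import subprocess


def run_with_peak_rss(cmd, stdout_path, stderr_path):
    """Run cmd (an argv list), redirecting stdout and stderr to files.

    Returns (exit_code, peak_rss). peak_rss is ru_maxrss for waited-for
    children: on Linux, the RSS of the largest child so far (in KiB).
    A child killed by a signal (e.g. abort) has a negative exit code.
    """
    with open(stdout_path, "wb") as out, open(stderr_path, "wb") as err:
        proc = subprocess.run(cmd, stdout=out, stderr=err)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak


def minimap2_cmd(ref, query, preset="asm5", threads=8, cs=True, sam=True):
    """Build a minimap2 argv list shaped like the command RaGOO ran."""
    cmd = ["minimap2"]
    # -a writes SAM; -c writes PAF with base-level alignment instead.
    cmd += ["-ax", preset] if sam else ["-cx", preset]
    if cs:
        cmd.append("--cs")
    cmd += [f"-t{threads}", ref, query]
    return cmd
```

For example, `run_with_peak_rss(minimap2_cmd("../../NbV1ChF.fasta", "../ragoo.fasta"), "pm_against_ref.sam", "pm_contigs_against_ref.sam.log")` repeats the command that aborted in the log above, and the returned peak gives a rough idea of how much memory the run reached before failing.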

Thanks

@mictadlo
Author

The reference is a plant genome that we assembled using PacBio and Hi-C data. `Contigs.txt`, on the other hand, is a pure 50x Illumina assembly created by SparseAssembler.

Additionally, I will run minimap2 manually.

Michal

@malonge
Copy link
Owner

malonge commented Oct 29, 2019

At this point, if you would like to call SVs, I suggest you do your SV calling manually. You can either use minimap2/paftools or nucmer/assemblytics. If you use paftools, you might as well just write your minimap2 alignments to a PAF file rather than SAM format. I also have a wiki page about this here, though you would have to generate your own alignments.
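For the minimap2/paftools route, the PAF-based recipe in the minimap2 documentation is: align with `-cx asm5 --cs`, sort by reference name and start (`sort -k6,6 -k8,8n`), then run `paftools.js call`. The Python sketch below strings these steps together. It assumes `minimap2` and `k8` are installed and that `paftools.js` is in the working directory (adjust the `paftools` argument otherwise); `sort_paf_lines` and `call_variants` are hypothetical names, and the in-Python sort stands in for `sort -k6,6 -k8,8n`, comparing names by code point as `LC_ALL=C sort` would.

```python
import subprocess


def _paf_sort_key(record):
    # PAF column 6 is the target (reference) name, column 8 the 0-based target start.
    fields = record.split("\t")
    return fields[5], int(fields[7])


def sort_paf_lines(lines):
    """Return non-empty PAF records (newline stripped) sorted like `sort -k6,6 -k8,8n`."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    return sorted(records, key=_paf_sort_key)


def call_variants(ref, query, prefix, threads=8, paftools="paftools.js"):
    """Hypothetical helper: PAF alignment -> sorted PAF -> paftools.js call output."""
    paf, srt, var = f"{prefix}.paf", f"{prefix}.srt.paf", f"{prefix}.var.txt"
    # --cs is required by paftools.js call.
    with open(paf, "w") as out:
        subprocess.run(["minimap2", "-cx", "asm5", f"-t{threads}", "--cs", ref, query],
                       stdout=out, check=True)
    with open(paf) as fin:
        ordered = sort_paf_lines(fin)
    with open(srt, "w") as fout:
        fout.writelines(record + "\n" for record in ordered)
    with open(var, "w") as out:
        subprocess.run(["k8", paftools, "call", srt], stdout=out, check=True)
    return var
```

Writing PAF directly also skips the much larger SAM file, which is why the PAF route suits paftools.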

@mictadlo
Author

Hi,
I think the problem was that the assembly contains many small contigs that could not be assigned to the reference. Additionally, I asked RaGOO to put 100 N's between the unmapped contigs, which led to a Chr0 size of 550,412,653 bp.

The solution was to run RaGOO with -C, which writes unplaced contigs individually instead of concatenating them into Chr0.
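The Chr0 size follows from simple arithmetic: joining n unplaced contigs of lengths L_1, ..., L_n with a gap of g Ns between neighbours gives sum(L_i) + g(n - 1) bases in one sequence. The sketch below computes this under that assumption (one gap between each pair of adjacent contigs); `concatenated_length` and `gap_bases` are hypothetical helpers, not RaGOO code. With hypothetical numbers, 1,000,000 small contigs and `-g 100` would contribute 99,999,900 Ns on their own.

```python
def concatenated_length(contig_lengths, gap=100):
    """Length of one sequence made by joining contigs with `gap` Ns between neighbours."""
    n = len(contig_lengths)
    if n == 0:
        return 0
    return sum(contig_lengths) + gap * (n - 1)


def gap_bases(n_contigs, gap=100):
    """Number of padding Ns alone when n_contigs contigs are joined."""
    return gap * max(n_contigs - 1, 0)
```

With -C, each unplaced contig stays a separate record, so presumably no single sequence of this size is built or passed on to the alignment step.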

Michal

@lucventurini

@malonge
I have seen this as well when trying to use RaGOO on a species with a large genome (>10 Gbp). In my experience, the out-of-memory problem also happens at the earlier stage of creating pseudomolecules.

I am working on some of this in my fork (https://github.com/lucventurini/RaGOO). When I am done, may I open a pull request?
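One generic way to keep memory proportional to the largest contig rather than to a whole pseudomolecule is to stream each contig to disk as it is joined. The sketch below illustrates that idea only; it is not RaGOO's code and not what the fork mentioned here does. `write_pseudomolecule` is a made-up name, and it assumes contig sequences arrive one at a time from some iterable (for example a lazy FASTA reader).

```python
from typing import Iterable, TextIO


def write_pseudomolecule(out: TextIO, name: str, contigs: Iterable[str],
                         gap: int = 100, width: int = 60) -> int:
    """Stream contigs into one FASTA record, separated by `gap` Ns.

    Only the current contig plus a partial output line is held at a time.
    Returns the total sequence length written.
    """
    out.write(f">{name}\n")
    buf = ""      # bases not yet written as a full line
    total = 0
    first = True
    for seq in contigs:
        chunk = seq if first else "N" * gap + seq
        first = False
        total += len(chunk)
        buf += chunk
        # Write every complete line of `width` bases and keep the remainder.
        full = len(buf) - len(buf) % width
        for i in range(0, full, width):
            out.write(buf[i:i + width] + "\n")
        buf = buf[full:]
    if buf:
        out.write(buf + "\n")
    return total
```

The trade-off is speed: writing line by line is slower than one large write, but peak memory no longer grows with the number of joined contigs.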

@malonge
Owner

malonge commented Nov 20, 2019

@mictadlo, glad to hear you have resolved it. @lucventurini, absolutely, thanks so much for contributing.

@malonge malonge closed this as completed Nov 20, 2019
@cmonat

cmonat commented Apr 20, 2020

Hi,

I'm wondering whether the pull request has been made, and if so, whether RaGOO should now work without problems on large genomes.
Thank you and have a great day
Cheers

C.

@malonge
Owner

malonge commented Apr 20, 2020

Hi there,

I am currently working on v2, which uses pysam to dramatically reduce the memory requirements. In fact, the memory for small and large genomes should be roughly the same.

I am hoping to come out with v2 in the next month or so. I will reopen the issue so that I can send a note when the new version is ready.

Thanks

@malonge malonge reopened this Apr 20, 2020
@malonge
Owner

malonge commented Jun 9, 2020

Hi there,

RagTag, the successor to RaGOO, is now available here:

https://github.com/malonge/RagTag

RagTag now uses pysam to query read coverage, so the memory requirement is dramatically reduced.

Thanks,
Mike

4 participants