Clarification about Block Size and Estimating Memory Usage #704
YasirKusay
started this conversation in
General
Replies: 1 comment
-
It does not mean that this much memory will always be used.
Yes, that makes sense with a bit of memory left for tolerance (it is not a hard upper limit).
-
Hello,
According to your paper, -c refers to the number of chunks the seed index is divided into during processing, and -b refers to the maximum number of index/query sequence letters to load and compare at a time. You give an estimate of the total memory usage as 2(B + 8 × B/C + const).
I have a few questions regarding this. I have a 495 MB FASTA file with 2,243,452 sequences that I aligned against a 98 GB index. When using the parameters -b 6 and -c 1, with 60 GB of memory, the program (as expected) ran out of memory, because 2(6e9 + 8 × 6e9/1) = 108e9 bytes = 108 GB.
However, I have run other alignments with smaller query files (against the same index database, with the same parameters) without any issues, so I am confused about how the block size actually works. Could you explain --block-size in more detail? For example, I would like to know how it handles loading both the query and the index.
Also, as a general rule, should I specify the memory for my job as calculated via the 2(B + 8B/C) formula?
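For reference, here is a small sketch of the estimate I am using. The formula is the one quoted from the paper; the function name, the GB units, and the zero default for the constant term are my own assumptions:

```python
def estimate_memory_gb(block_size_gb: float, index_chunks: int,
                       const_gb: float = 0.0) -> float:
    """Rough peak-memory estimate from the paper's formula 2(B + 8*B/C + const).

    block_size_gb -- the -b value (block size, in billions of letters ~ GB)
    index_chunks  -- the -c value (number of index chunks)
    const_gb      -- constant overhead term; unknown here, assumed 0
    """
    return 2 * (block_size_gb + 8 * block_size_gb / index_chunks + const_gb)

# My failing run: -b 6 -c 1 gives 2 * (6 + 48) = 108 GB, well over my 60 GB job.
print(estimate_memory_gb(6, 1))   # 108.0
# A larger -c shrinks the index term, e.g. -b 6 -c 4: 2 * (6 + 12) = 36 GB.
print(estimate_memory_gb(6, 4))   # 36.0
```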