
fix: One Simple Trick to significantly improve the speed and memory usage of Occ #448

Merged
merged 10 commits into from Aug 23, 2021

Conversation

Daniel-Liu-c0deb0t
Contributor

Occ wastes a lot of memory because, for every sampled position, it reserves a counter for every symbol up to the one with the maximum rank. Without alphabet compression, a symbol's rank is its ASCII value, so a small alphabet like the DNA nucleotides takes roughly 10x more memory than necessary. The FMD index also does not support compressed alphabets, so there is no easy way to sidestep this issue.

The fix is simple: transpose the Occ vector so that the first dimension is the alphabet and the second dimension is the sampled Occ values. Characters not in the alphabet then get an empty second dimension, so no space is wasted on them. This should significantly improve both speed and memory usage.
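A minimal sketch of the transposed layout, assuming a simplified standalone struct. The names `TransposedOcc`, `stored_counters`, and `row_major_counters` are hypothetical and are not rust-bio's actual `Occ` API. The sketch samples cumulative counts every `k` BWT positions, stores them as `occ[symbol][checkpoint]`, and answers a query by adding the nearest preceding sample to a scan of the remaining positions. It assumes the alphabet passed in contains every byte that occurs in the BWT, including the sentinel `$`.

```rust
/// Hypothetical, simplified sampled occurrence table in the transposed layout:
/// `occ[a][i]` holds the number of occurrences of symbol `a` in `bwt[..=i * k]`.
/// Bytes outside the alphabet keep an empty inner `Vec`, so they cost no samples.
pub struct TransposedOcc {
    occ: Vec<Vec<usize>>,
    k: usize,
}

impl TransposedOcc {
    /// Builds the table. Assumes `alphabet` contains every byte occurring in
    /// `bwt` (including the sentinel `$`) and that `k > 0`.
    pub fn new(bwt: &[u8], k: usize, alphabet: &[u8]) -> Self {
        assert!(k > 0, "sampling interval must be positive");
        // Deduplicate so each symbol gets exactly one sample per checkpoint.
        let mut symbols = alphabet.to_vec();
        symbols.sort_unstable();
        symbols.dedup();
        // Outer dimension still spans ranks 0..=max symbol (uncompressed alphabet).
        let m = symbols.last().map_or(0, |&a| a as usize + 1);
        let mut occ: Vec<Vec<usize>> = vec![Vec::new(); m];
        let mut counts = vec![0usize; m];
        for (i, &c) in bwt.iter().enumerate() {
            counts[c as usize] += 1;
            if i % k == 0 {
                // Only symbols in the alphabet receive a sampled counter.
                for &a in &symbols {
                    occ[a as usize].push(counts[a as usize]);
                }
            }
        }
        TransposedOcc { occ, k }
    }

    /// Occurrences of `a` in `bwt[..=r]`: nearest preceding sample plus a scan.
    pub fn get(&self, bwt: &[u8], r: usize, a: u8) -> usize {
        let checkpoint = r / self.k;
        match self.occ.get(a as usize).and_then(|row| row.get(checkpoint)) {
            Some(&sampled) => {
                // Count the positions after the checkpoint, up to and including r.
                let start = checkpoint * self.k + 1;
                sampled + bwt[start..=r].iter().filter(|&&c| c == a).count()
            }
            // Symbol outside the alphabet: by assumption it never occurs in the BWT.
            None => 0,
        }
    }

    /// Number of sampled counters actually stored in the transposed layout.
    pub fn stored_counters(&self) -> usize {
        self.occ.iter().map(Vec::len).sum()
    }
}

/// Counters a row-major layout would store for comparison: one row of `m`
/// counters (maximum rank plus one) per sampled position.
pub fn row_major_counters(n: usize, k: usize, m: usize) -> usize {
    ((n + k - 1) / k) * m
}
```

Comparing `stored_counters()` with `row_major_counters(bwt.len(), k, max_rank + 1)` for a given alphabet shows how much the transposed layout saves: the row-major version pays for every rank below the largest symbol at every sample, while the transposed version pays only for symbols that are actually in the alphabet.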

This will need to be rebased after #447 is merged.

@Daniel-Liu-c0deb0t Daniel-Liu-c0deb0t changed the title One Simple Trick to significantly improve the speed and memory usage of Occ fix: One Simple Trick to significantly improve the speed and memory usage of Occ Aug 23, 2021
@coveralls

Coverage Status

Coverage increased (+0.2%) to 88.087% when pulling 789de1a on Daniel-Liu-c0deb0t:fix-occ into 125bf20 on rust-bio:master.

@thomasmulvaney
Member

Nice!

@pmarks pmarks self-requested a review August 23, 2021 14:12
Contributor

@pmarks pmarks left a comment


Great, thanks Daniel!

@pmarks pmarks merged commit 9aa79cb into rust-bio:master Aug 23, 2021

4 participants