Reducing index size #145

Open
bricoletc opened this issue May 7, 2019 · 1 comment


@bricoletc
Member

The index made by gramtools build is memory-hungry.

Two things:

i) The disk serialisation of the index is significantly smaller than its in-RAM representation

For example, on the TB dataset:
(yoda: /nfs/leia/research/iqbal/bletcher/Gramtools/profiling_gramtools/simulated_reads_150_30_reference_9/gramtools_runs/gram_k10_04157)

Disk      RAM
0.2 GB    1.5 GB

This is most likely simply due to sdsl::util::bit_compress being called on each of the paths, the sa_intervals and the kmer_stats (which allow sa_intervals and paths to be matched up for each instance of a kmer mapping to the graph).

But why not keep them compressed in RAM?
The compression seems only to reduce the number of bits used to represent each integer's value:
http://algo2.iti.kit.edu/gog/docs/html/namespacesdsl_1_1util.html#ad5528f84e3036b9be3faf43a49f15b76
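
To make the idea of "reducing the number of bits per integer" concrete, here is a minimal sketch of fixed-width bit packing. It is not the sdsl implementation: `bits_needed`, `PackedVector` and `pack` are hypothetical names invented for illustration, and the sketch only assumes that every value is stored using the smallest width that fits the largest value in the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Smallest number of bits able to represent `max_value` (at least 1).
inline unsigned bits_needed(std::uint64_t max_value) {
    unsigned width = 1;
    while (width < 64 && (max_value >> width) != 0) ++width;
    return width;
}

// Hypothetical packed container: `size` integers of `width` bits each,
// stored back to back in 64-bit words.
struct PackedVector {
    unsigned width = 1;
    std::size_t size = 0;
    std::vector<std::uint64_t> words;

    // Read element i, stitching it together if it straddles two words.
    std::uint64_t get(std::size_t i) const {
        assert(i < size);
        const std::size_t bit = i * width;
        const std::size_t word = bit / 64;
        const unsigned offset = static_cast<unsigned>(bit % 64);
        std::uint64_t value = words[word] >> offset;
        if (offset + width > 64) value |= words[word + 1] << (64 - offset);
        const std::uint64_t mask =
            (width == 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << width) - 1);
        return value & mask;
    }

    std::size_t bytes() const { return words.size() * sizeof(std::uint64_t); }
};

// Pack `values` using the width required by the largest element.
inline PackedVector pack(const std::vector<std::uint64_t>& values) {
    std::uint64_t max_value = 0;
    for (std::uint64_t v : values) {
        if (v > max_value) max_value = v;
    }
    PackedVector out;
    out.width = bits_needed(max_value);
    out.size = values.size();
    out.words.assign((values.size() * out.width + 63) / 64, 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        const std::size_t bit = i * out.width;
        const std::size_t word = bit / 64;
        const unsigned offset = static_cast<unsigned>(bit % 64);
        out.words[word] |= values[i] << offset;
        // Spill the high bits into the next word when the element straddles.
        if (offset + out.width > 64) out.words[word + 1] |= values[i] >> (64 - offset);
    }
    return out;
}
```

With this scheme, twenty values no larger than 1023 need 10 bits each, so they fit in four 64-bit words instead of twenty. Random access remains possible on the packed form, at the cost of a few shifts and masks per read, which is the trade-off behind the question of keeping the data compressed in RAM.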

ii) Absolute index size

Most of the memory seems to lie in SearchStates (cf. #142)

For the TB genome of 4 MB, we have an index of 1.5 GB in memory

For the Plasmodium genome of 23 MB, we have an index of ~60 GB in memory

Note that in the latter case, it was ~80 GB before narrowing each uint64 in the SearchState struct to uint32.
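
As a rough illustration of why narrowing the integer fields helps, the sketch below compares two hypothetical structs. The field names and layout are assumptions made for the example and are not taken from the actual SearchState definition.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical pair of integer fields, before and after narrowing.
struct IntervalWide {
    std::uint64_t first;
    std::uint64_t second;
};

struct IntervalNarrow {
    std::uint32_t first;
    std::uint32_t second;
};

static_assert(sizeof(IntervalWide) == 16, "two 64-bit fields take 16 bytes");
static_assert(sizeof(IntervalNarrow) == 8, "two 32-bit fields take 8 bytes");

// True if `value` can be stored in a 32-bit field without loss.
constexpr bool fits_in_uint32(std::uint64_t value) {
    return value <= std::numeric_limits<std::uint32_t>::max();
}

// Narrow a 64-bit value, checking (in debug builds) that nothing is lost.
inline std::uint32_t narrow(std::uint64_t value) {
    assert(fits_in_uint32(value));
    return static_cast<std::uint32_t>(value);
}
```

A uint32 can hold values up to about 4.29 billion, so positions in a 23 MB genome fit comfortably. Halving the integer fields took the index from ~80 GB to ~60 GB rather than to ~40 GB, which would be consistent with part of each SearchState being something other than these plain integers, though that is an inference and not something measured here.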

How can we do better?

Ideas:

  • Nesting
  • Tractable enumeration of kmers overlapping variant sites (right now we have to use --all-kmers)
@bricoletc
Member Author

#149 reduced the index size by roughly a factor equal to the average number of alleles per site.
