Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate "sroar": Dgraph's Serialized Roaring Bitmaps #298

Open
lemire opened this issue Mar 16, 2021 · 7 comments
Open

Investigate "sroar": Dgraph's Serialized Roaring Bitmaps #298

lemire opened this issue Mar 16, 2021 · 7 comments

Comments

@lemire
Copy link
Member

lemire commented Mar 16, 2021

Dgraph has made public their "Serialized Roaring Bitmaps" implementation at
https://github.com/dgraph-io/sroar

sroar is a re-written version of Roaring Bitmaps in Go, with the aim to have equality between in-memory representation and on-disk representation. An sroar.Bitmap does not need to be marshalled or unmarshalled, as the underlying represetation is a byte slice. Therefore, it can be written to disk, brought to memory, or shipped over the network immediately.

I would encourage the Roaring community to go have a look.

@lemire
Copy link
Member Author

lemire commented May 18, 2021

This has been an update with competitive benchmark results:
https://github.com/dgraph-io/sroar#benchmarks

@lemire
Copy link
Member Author

lemire commented May 18, 2021

Here are the specific claims:

4x faster
4x less memory
25x fewer allocations (-96% p50).

@evanoberholster
Copy link

I found this library https://github.com/kelindar/bitmap it is 32-bit only. It is backed by a similar in-memory storage concept.

@lemire
Copy link
Member Author

lemire commented Jun 23, 2021

@evanoberholster It is likely similar to https://github.com/bits-and-blooms/bitset in that it implements an uncompressed bitset. The use cases are different.

@comunidadio
Copy link

FYI Dgraph recently posted an article explaining some details of their implementation https://dgraph.io/blog/post/serialized-roaring-bitmaps-golang/

@richardartoul
Copy link
Contributor

Just to provide some random insight, I benchmarked migrating my teams custom storage system from this library to sroar and saw the following:

  1. We had dramatically less allocations when using the sroar library.
  2. We never keep that many bitmaps in memory at once (everything is done streaming) so I didn’t observe any difference in memory usage, although I bet if we needed to keep a lot of them in memory at once our memory usage would have dropped.
  3. The serialized format seemed to be significantly less compressed than this library (our output files grew 4x larger)
  4. I can’t remember if it was faster. I think the actual operations may have been due to way less allocations, but the increased file size killed any performance gains there.

For those reasons we decided to stick with this library :)

That’s just 1 datapoint though! In general I do think that sroars strategy of using a small number of bytes slices to represent the bitmaps and “dirty casting” them as needed is a winning one that this library could benefit from just in terms of generally reducing allocations/pointers which reduces pressure in the GC and allocator.

@Tony-Kwan
Copy link

Dgraph and I face the same situation: In-memory index with high memory usage and high gc pressure.
But the implement and performance of sroar is not good enough in my opinion.
So I am writing a 32bit version sroar library with the concepts of dgraph sorar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants