Skip to content

jeffrey-xiao/probabilistic-collections-rs

Repository files navigation

probabilistic-collections-rs

probabilistic-collections Documentation License: MIT License: Apache 2.0 Pipeline Status Coverage Report

probabilistic-collections contains various implementations of collections that use approximations to improve on running time or memory, but introduce a certain amount of error. The error can be controlled under a certain threshold which makes these data structures extremely useful for big data and streaming applications.

The following types of collections are implemented:

  • Approximate Membership in Set: BloomFilter, PartitionedBloomFilter, CuckooFilter, QuotientFilter
  • Scalable Approximate Membership in Set: ScalableBloomFilter, ScalableCuckooFilter
  • Approximate Membership in Stream: BSBloomFilter, BSSDBloomFilter, RLBSBloomFilter
  • Approximate Item Count: CountMinSketch
  • Approximate Distinct Item Count: HyperLogLog
  • Set similarity: MinHash, SimHash

Usage

Add this to your Cargo.toml:

[dependencies]
probabilistic-collections = "*"

For serde support, include the serde feature:

[dependencies]
probabilistic-collections = { version = "*", features = ["serde"] }

Add this to your crate root if you are using Rust 2015:

extern crate probabilistic_collections;

Caveats

If you are using this crate to create collections to be used across different platforms, you must be careful not to use keys of type [T]. The Rust standard library implementation of Hash for [T] first hashes the length of the slice, then the contents of the slice. The length is platform specific because it is of type usize. Therefore, a collection with keys of type [T] will have unexpected results when used across platforms. For example, if you were to generate and serilize a BloomFilter on i686 compiled code where the keys are of type [u8], the filter will not have the correct results when deserialized on x86_64 compiled code.

The recommended work-around to this problem is to a define a wrapper struct around a [T] with a Hash implementation that only hashes the contents of the slice.

Changelog

See CHANGELOG for more details.

References

License

probabilistic-collections-rs is dual-licensed under the terms of either the MIT License or the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for more details.

About

No description, website, or topics provided.

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Stars

Watchers

Forks

Packages

No packages published