Skip to content

High performance approximate algorithms in Go (e.g. morris counter, count min, etc.)

License

Notifications You must be signed in to change notification settings

kelindar/approx

Repository files navigation

kelindar/approx
Go Version PkgGoDev Go Report Card License Coverage

Probabilistic Data Structures

This Go package provides several data structures for approximate counting and frequency estimation, leveraging Morris's algorithm and other techniques for efficient, probabilistic computations. It is suitable for applications where exact counts are unnecessary or where memory efficiency is paramount. The package includes implementations for 8-bit and 16-bit counters, Count-Min Sketch, and a Top-K frequent elements tracker.

  • 8-bit and 16-bit Approximate Counters: Implements Morris's algorithm to provide approximate counts with low memory usage. Tuned to count up to ~100k (8-bit) and ~2 billion (16-bit) with acceptable error rates.
  • Count-Min Sketch: A probabilistic data structure for estimating the frequency of events in a stream of data. It allows users to specify desired accuracy and confidence levels.
  • Top-K Frequent Elements Tracking: Maintains a list of the top-K frequent elements in a stream, using a combination of Count-Min Sketch for frequency estimation and a min-heap for maintaining the top elements.

Advantages

  • Memory Efficiency: The package uses probabilistic data structures which offer significant memory savings compared to exact counting methods, especially beneficial for large datasets or streams.
  • Performance: Incorporates fast, thread-local random number generation and efficient hash functions, optimizing performance for high-throughput applications.
  • Scalability: Suitable for scaling to large datasets with minimal computational and memory footprint increases.
  • Thread-Safety: Features such as atomic operations ensure thread safety, making the package suitable for concurrent applications.

Disadvantages

  • Probabilistic Nature: As the package relies on probabilistic algorithms, there is an inherent trade-off between memory usage and accuracy. Exact counts are not guaranteed, and there is a specified error margin.

Usage

8-bit and 16-bit Counters

Instantiate a counter and use the Increment method to increase its value probabilistically. The Estimate method returns the current approximate count.

var counter approx.Count8 // or approx.Count16 for 16-bit
counter.Increment()
fmt.Println(counter.Estimate())

Count-Min Sketch

Create a new Count-Min Sketch with default or custom parameters. Update the sketch with observed items and query their estimated frequencies.

cms, err := approx.NewCountMin()
if err != nil {
    log.Fatal(err)
}
cms.UpdateString("example_item")
fmt.Println(cms.CountString("example_item"))

Top-K Frequent Elements

Track the top-K frequent elements in a stream by creating a TopK instance and updating it with observed items.

topK, err := approx.NewTopK(10)
if err != nil {
    log.Fatal(err)
}
topK.UpdateString("example_item")
for _, v := range topK.Values() {
    fmt.Printf("Value: %s, Count: %d\n", v.Value, v.Count)
}

Dependencies

This package depends on external libraries for hashing and HyperLogLog implementations:

  • github.com/zeebo/xxh3: For fast hashing.
  • github.com/axiomhq/hyperloglog: For cardinality estimation in the Top-K tracker.

License

Please review the license agreement for this package before using it in your projects.

About

High performance approximate algorithms in Go (e.g. morris counter, count min, etc.)

Topics

Resources

License

Stars

Watchers

Forks

Sponsor this project

 

Packages

No packages published

Languages