Skip to content

uddsketch data structure for fast and accurate tracking of quantiles in data stream.

License

Notifications You must be signed in to change notification settings

jettify/uddsketch-py

Repository files navigation

uddsketch

GitHub Actions status for master branch

image

image

image

Documentation Status

uddsketch data structure for fast and accurate tracking of quantiles in data streams. Implantation is based on ideas described in DDSketch and UDDSketch papers.

Simple example

from uddsketch import UDDSketch

hist = UDDSketch(initial_error=0.01)

# Populate structure with dummy data
for i in range(0, 100):
    hist.add(0.1 * i)

# Estimate p95 percentile
q = hist.quantile(0.95)
print(f"quantie: {q}")
# quantie: 9.487973051696695

# Estimate median
m = hist.median()
print(f"median: {m}")
# median: 4.903763642930295

Merging Two Sketches

import numpy as np

from uddsketch import UDDSketch


def main():
    # fix seed for reproducible results
    rs = np.random.RandomState(seed=42)
    # generate dummy data
    data = ((rs.pareto(3.0, 5000) + 1) * 2.0).tolist()

    # add first half to first sketch
    hist1 = UDDSketch(initial_error=0.01, max_buckets=128)
    for v in data[:500]:
        hist1.add(v)

    # add second half to other sketch
    hist2 = UDDSketch(initial_error=0.01, max_buckets=128)
    for v in data[500:]:
        hist2.add(v)

    hist1.merge(hist2)
    combined_count = hist1.num_values
    print(f"Num values added to combined sketch: {combined_count}")
    # Num values added to combined sketch: 5000

    # Estimate combined p95 percentile
    q = hist1.quantile(0.95)
    print(f"quantie: {q}")
    # quantie: 5.419515033387234

    # Estimate combined median
    m = hist1.median()
    print(f"median: {m}")
    # median: 2.5344610207788048


if __name__ == "__main__":
    main()

Installation

Installation process is simple, just:

$ pip install uddsketch

References

1. Charles Masson, Jee E. Rim and Homin K. Lee. DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees. [https://www.vldb.org/pvldb/vol12/p2195-masson.pdf]

2. I. Epicoco, C. Melle, M. Cafaro, M. Pulimeno and G. Morleo. UDDSketch: Accurate Tracking of Quantiles in Data Streams. [https://arxiv.org/abs/2004.08604]

About

uddsketch data structure for fast and accurate tracking of quantiles in data stream.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published