Skip to content

hstats is a Rust library for parallelizable online histogram creation and statistical analysis.

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT
Notifications You must be signed in to change notification settings

antimora/hstats

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hstats: Online Statistics and Histograms for Data Streams

GitHub Workflow Status GitHub tag Crates.io Docs.rs

hstats is a streamlined and high-performance library engineered for online statistical analysis and histogram generation from data streams. With a focus on multi-threaded environments, hstats facilitates parallel operations that can later be merged into a single Hstats instance.

During the histogram creation process, the number and width of bins are predetermined. The bin width is calculated using the formula (end - start)/nbins, based on the parameters provided by the user. Values that fall within the range of [start, end) are assigned to the appropriate bins, while values outside this range are counted in underflow and overflow bins, which allows for subsequent adjustments to the histogram's range.

hstats utilizes Welford's algorithm via the rolling-stats library to compute mean and standard deviation statistics. The hstats library is compatible with no_std environments that support alloc.

To simplify the output of statistics and histograms, hstats implements the Display trait for Hstats. This allows users to define the floating-point precision (default is 2) for the printed statistics and choose the character used for the histogram bars (default is ).

Getting Started

Add the following to your Cargo.toml:

[dependencies]
hstats = "0.1.0"

Examples

  1. Single thread example: See examples/single-thread.rs Run the example with:
    time cargo run --example single-thread --release
  2. Multi-thread example: See examples/multi-thread.rs Run the example with:
    time cargo run --example multi-thread --release

Here is a sample output from the multi-thread example:

Number of random samples: 50000000
Number of bins: 30
Start: -8
End: 10
Thread count: 20
Chunk size: 2500000
Number of hstats to merge: 20
Start |  End
------|-------
 -inf | -8.00 |  21553 (0.04%)
-8.00 | -7.40 |  21752 (0.04%)
-7.40 | -6.80 |  40523 (0.08%)
-6.80 | -6.20 | ░ 73078 (0.15%)
-6.20 | -5.60 | ░ 125206 (0.25%)
-5.60 | -5.00 | ░░░ 207593 (0.42%)
-5.00 | -4.40 | ░░░░ 331470 (0.66%)
-4.40 | -3.80 | ░░░░░░░ 508330 (1.02%)
-3.80 | -3.20 | ░░░░░░░░░░░ 745836 (1.49%)
-3.20 | -2.60 | ░░░░░░░░░░░░░░░ 1054228 (2.11%)
-2.60 | -2.00 | ░░░░░░░░░░░░░░░░░░░░░ 1433304 (2.87%)
-2.00 | -1.40 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 1868758 (3.74%)
-1.40 | -0.80 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2339425 (4.68%)
-0.80 | -0.20 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2819448 (5.64%)
-0.20 |  0.40 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3261165 (6.52%)
 0.40 |  1.00 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3623683 (7.25%)
 1.00 |  1.60 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3875841 (7.75%)
 1.60 |  2.20 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3980760 (7.96%)
 2.20 |  2.80 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3928130 (7.86%)
 2.80 |  3.40 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3725868 (7.45%)
 3.40 |  4.00 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3393196 (6.79%)
 4.00 |  4.60 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2971132 (5.94%)
 4.60 |  5.20 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2497847 (5.00%)
 5.20 |  5.80 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2022084 (4.04%)
 5.80 |  6.40 | ░░░░░░░░░░░░░░░░░░░░░░░ 1569143 (3.14%)
 6.40 |  7.00 | ░░░░░░░░░░░░░░░░░ 1171523 (2.34%)
 7.00 |  7.60 | ░░░░░░░░░░░░ 841536 (1.68%)
 7.60 |  8.20 | ░░░░░░░░ 579675 (1.16%)
 8.20 |  8.80 | ░░░░░ 383658 (0.77%)
 8.80 |  9.40 | ░░░ 243884 (0.49%)
 9.40 | 10.00 | ░░ 148977 (0.30%)
10.00 |   inf | ░░ 191394 (0.38%)

Total Count: 50000000 Min: -14.19 Max: 18.04 Mean: 2.00 Std Dev: 3.00


real    0m1.905s
user    0m9.727s
sys     0m0.127s

License

hstats is licensed under your choice of either the Apache License, Version 2.0, or the MIT license.

About

hstats is a Rust library for parallelizable online histogram creation and statistical analysis.

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Stars

Watchers

Forks

Languages