This repository has been archived by the owner on Mar 27, 2021. It is now read-only.

Evaluate Histogram Aggregation speed #677

Open
ao2017 opened this issue Jul 13, 2020 · 1 comment

Comments

@ao2017
Contributor

ao2017 commented Jul 13, 2020

We are evaluating four histogram implementations to improve Heroic's percentile accuracy.
An important factor in the selection process is the aggregation speed.
We will aggregate histograms both before ingestion and during percentile computation.

Histogram implementations under consideration:

  1. t-digest
  2. DDSketch
  3. HdrHistogram
  4. circllhist
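Aggregating histograms, as described above, amounts to merging them. As a minimal sketch of what a mergeable histogram looks like, the hypothetical `LogBucketHistogram` below uses logarithmic buckets loosely modeled on DDSketch's scheme: each bucket covers a fixed ratio of values, so a quantile read from a bucket stays within a chosen relative error, and merging two histograms only adds counts. It is not the API of any of the four libraries above, and it assumes strictly positive values.

```python
import math
from collections import Counter


class LogBucketHistogram:
    """Hypothetical mergeable histogram with logarithmic buckets.

    Loosely modeled on DDSketch's bucketing; not any library's real API.
    Only strictly positive values are supported.
    """

    def __init__(self, relative_accuracy=0.01):
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> number of values
        self.count = 0

    def add(self, value):
        # Bucket i holds values in (gamma**(i - 1), gamma**i].
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def merge(self, other):
        # Aggregation is just summing bucket counts, which is why it is cheap.
        if other.gamma != self.gamma:
            raise ValueError("cannot merge histograms with different accuracy")
        self.buckets.update(other.buckets)
        self.count += other.count

    def quantile(self, q):
        # Walk buckets in ascending order until rank q * (count - 1) is passed.
        if self.count == 0 or not 0 <= q <= 1:
            raise ValueError("need a non-empty histogram and 0 <= q <= 1")
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # Relative midpoint of the bucket: within relative_accuracy
                # of every value that landed in this bucket.
                return 2 * self.gamma ** index / (self.gamma + 1)


# Usage: two hosts aggregate locally, then their histograms are merged.
host_a, host_b = LogBucketHistogram(), LogBucketHistogram()
for v in range(1, 501):
    host_a.add(v)
for v in range(501, 1001):
    host_b.add(v)
host_a.merge(host_b)
print(host_a.count, host_a.quantile(0.99))
```

Because merging never revisits raw values, its cost depends on the number of non-empty buckets rather than on how many data points were recorded.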
project-bot added this to Inbox in Observability Kanban Jul 13, 2020
lmuhlha moved this from Inbox to To do in Observability Kanban Jul 13, 2020
lmuhlha moved this from To do to In progress in Observability Kanban Jul 13, 2020
@ao2017
Contributor Author

ao2017 commented Aug 5, 2020

Our evaluation is completed. Our selection criteria included:

  1. Serialized histogram size
  2. Accuracy
  3. Merge and compute speed

We decided to go with t-digest. Here is how we got there.

We quickly removed circllhist and the DDSketch Java implementation from consideration because neither was ready for our use case: neither has built-in serialization and compression.

When the number of recorded data points is around 1,000, HdrHistogram sketches are smaller than t-digest histograms, but t-digest compares more favorably as the number of recorded data points grows, and its histogram size stays stable at around 2,000 bytes. When it comes to histogram size, t-digest is therefore the better choice for us.
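A t-digest's size levels off because it keeps a bounded number of centroids (weighted means), controlled by a compression parameter, no matter how many values are added. The simplified sketch below illustrates the merge step, assuming the k1 scale function from Dunning and Ertl's t-digest paper. `SimpleTDigest`, its buffer size, and its 16-bytes-per-centroid encoding are hypothetical; this is not the Java t-digest library's API or wire format, and its byte counts are not comparable to the ~2,000 bytes measured above.

```python
import math
import random
import struct


class SimpleTDigest:
    """Hypothetical, simplified merging t-digest (not the Java library's API)."""

    def __init__(self, compression=100.0):
        self.compression = compression
        self.centroids = []  # sorted [mean, weight] pairs
        self.buffer = []     # raw values not yet merged

    def _k(self, q):
        # k1 scale function: narrow centroids at the tails, wide in the middle.
        q = min(max(q, 0.0), 1.0)
        return self.compression / (2 * math.pi) * math.asin(2 * q - 1)

    def _k_inv(self, k):
        # Clamp to the k1 range [-compression/4, compression/4] before inverting.
        limit = self.compression / 4
        k = min(max(k, -limit), limit)
        return (math.sin(2 * math.pi * k / self.compression) + 1) / 2

    def add(self, value):
        self.buffer.append(value)
        if len(self.buffer) >= 10 * self.compression:
            self._compress()

    def _compress(self):
        # Merge buffered values and existing centroids in one sorted pass.
        points = sorted(self.centroids + [[v, 1] for v in self.buffer])
        self.buffer = []
        if not points:
            return
        total = sum(w for _, w in points)
        merged = [list(points[0])]
        q_start = 0.0  # quantile at which the current centroid begins
        q_limit = self._k_inv(self._k(q_start) + 1)
        for mean, weight in points[1:]:
            current = merged[-1]
            if q_start + (current[1] + weight) / total <= q_limit:
                # Fold the point into the current centroid (running weighted mean).
                current[1] += weight
                current[0] += (mean - current[0]) * weight / current[1]
            else:
                # Close the current centroid and start a new one at this point.
                q_start += current[1] / total
                q_limit = self._k_inv(self._k(q_start) + 1)
                merged.append([mean, weight])
        self.centroids = merged

    def to_bytes(self):
        # Flush the buffer, then encode each centroid as (float64 mean, int64 weight).
        self._compress()
        out = struct.pack("<I", len(self.centroids))
        for mean, weight in self.centroids:
            out += struct.pack("<dq", mean, weight)
        return out


# Usage: the centroid count is capped by `compression`, not by the number of values.
rng = random.Random(7)
digest = SimpleTDigest(compression=100)
for _ in range(50_000):
    digest.add(rng.paretovariate(1.5))
blob = digest.to_bytes()
print(len(digest.centroids), len(blob))
```

Each centroid may cover at most one unit of k unless it is a single heavy point, and the k1 range spans `compression / 2` units, so the centroid count (and hence the encoded size) cannot exceed roughly `compression`.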

HdrHistogram and t-digest are both accurate at P99, and even at P99.999, when the dataset is uniformly distributed. On a Pareto-distributed dataset, t-digest did not perform as well as HdrHistogram: its relative error at P99.999 was close to 6.7%, although its P99 estimate was fairly good.
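The relative error quoted above is |estimate - exact| / exact for a given percentile. A minimal sketch of how such a comparison can be set up, assuming nearest-rank percentiles as ground truth (the thread does not say which definition was used); `evaluate` and its `estimator(values, p)` callback are hypothetical names.

```python
import math
from fractions import Fraction


def exact_percentile(sorted_values, p):
    """Nearest-rank percentile for p in (0, 100], computed on a sorted list."""
    # Fraction(str(p)) keeps 99.999 exact, so the rank is not skewed by rounding.
    rank = math.ceil(Fraction(str(p)) / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def relative_error(estimate, exact):
    return abs(estimate - exact) / abs(exact)


def evaluate(estimator, values, percentiles=(99.0, 99.999)):
    """Hypothetical harness: relative error of estimator(values, p) per percentile."""
    ordered = sorted(values)
    return {
        p: relative_error(estimator(values, p), exact_percentile(ordered, p))
        for p in percentiles
    }
```

Under this definition, with 1,000,000 samples P99.999 is the 999,990th smallest value, so only the 10 largest samples lie above it. With a heavy-tailed distribution such as Pareto those few values are spread far apart, which helps explain why extreme percentiles are harder to estimate there than on uniform data.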

Our benchmark shows that t-digest is four times faster than HdrHistogram.
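The thread does not describe how the benchmark was run. One minimal way to time merges, assuming each candidate exposes a merge function that returns a new sketch, is a best-of-N wall-clock harness; `time_merge` and `speedup` are hypothetical helpers, and `collections.Counter` merely stands in for a real histogram in the usage line.

```python
import time
from collections import Counter
from functools import reduce


def time_merge(sketches, merge, repeats=5):
    """Best-of-`repeats` wall-clock seconds to fold all `sketches` into one.

    `merge(a, b)` must return a new sketch and leave its inputs untouched,
    so that every repeat starts from the same state.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        reduce(merge, sketches)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(sketches_a, merge_a, sketches_b, merge_b):
    """How many times faster candidate A merges than candidate B."""
    return time_merge(sketches_b, merge_b) / time_merge(sketches_a, merge_a)


# Usage: Counter stands in for a bucket-count histogram; `+` returns a new Counter.
per_host = [Counter({i % 64: 1}) for i in range(10_000)]
print(f"{time_merge(per_host, lambda a, b: a + b):.4f} s")
```

Taking the best of several runs filters out some noise from scheduling and garbage collection. For the Java libraries themselves, a harness such as JMH would also handle JIT warm-up, which this sketch does not.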

We chose t-digest because we handle large volumes of data, so computation speed and storage cost are very important to us.
