This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Histograms

dilipdevaraj-sfdc edited this page Feb 19, 2020 · 26 revisions

User Guide


Working With Histograms

Histograms are time series entities that contain the raw bucket counts for a given combination of scope, metric name, and tags. When users calculate percentiles such as p50 or p90 on the client side and publish them as metrics, the result is a summary. From a summary it is no longer possible to compute another percentile such as p99, or a percentile aggregated across other tags. When histograms are published with the raw bucket counts, any percentile can be calculated, and the counts can be aggregated across multiple tags.

Use Case

Users are making web requests to our service, hitting a couple of endpoints. We are measuring the request query latency. Our service is deployed with two versions, running on two devices.

endpoint = /api1, /api2

version = v1, v2

device = device1-1, device1-2

Histogram Write

Users construct the bucket information (bucket ranges and corresponding counts) at a regular interval and send it to Argus. The upper and lower bounds of consecutive buckets must overlap: in the payload example, the upper bound of each bucket is the lower bound of the next. For example, suppose histogram data is sent at 5-minute intervals: 11:00am, 11:05am, 11:10am, and so on. The histogram object sent at 11:05am holds the counts for the preceding 5 minutes, 11:01am-11:05am. At 11:10am, a new histogram object is constructed for the counts from 11:06am-11:10am.
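The bucketing step can be sketched in Python as follows. This is a minimal illustration, not Argus client code: the function name build_histogram, the boundary list BOUNDS, and the choice that a bucket includes its lower bound and excludes its upper bound are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries in milliseconds.
# Consecutive buckets share a bound: 0-50, 50-100, 100-300, ...
BOUNDS = [0, 50, 100, 300, 1000, 10000]


def build_histogram(latencies_ms, bounds=BOUNDS):
    """Count the samples of one reporting interval into buckets.

    Samples outside the bucket range are counted in underflow/overflow.
    """
    buckets = {f"{lo},{hi}": 0 for lo, hi in zip(bounds, bounds[1:])}
    underflow = 0
    overflow = 0
    for value in latencies_ms:
        if value < bounds[0]:
            underflow += 1
        elif value >= bounds[-1]:
            overflow += 1
        else:
            # Assumption: a bucket includes its lower bound and
            # excludes its upper bound.
            i = bisect_right(bounds, value) - 1
            buckets[f"{bounds[i]},{bounds[i + 1]}"] += 1
    return {"buckets": buckets, "underflow": underflow, "overflow": overflow}


# Samples collected during one 5-minute interval (made-up values).
interval = build_histogram([0, 12, 50, 75, 250, 999, 1000, 20000, -5])
```

A client would call a function like this once per interval, attach scope, metric, timestamp and tags, and start from zero counts for the next interval.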

Send a POST request to the /collection/histograms endpoint. The following rules apply:

  • A maximum of 100 buckets per histogram is supported.
  • tags are optional, but populating them is preferred (as with metrics).
  • buckets cannot be empty. At least one key-value pair must be supplied.
  • Each buckets key is a tuple <lower bound, upper bound>, each of type float (for example "0,50").
  • Each buckets value is a count of type long.
  • overflow and underflow are optional and of type long. They are populated when values fall outside the bucket range.
Example: Histogram Payload
[
  {
    "scope": "scope",
    "metric": "query.latency",
    "timestamp": 1537804800000,
    "overflow": 1,
    "underflow": 0,
    "buckets": {
      "0,50": 100,
      "50,100": 60,
      "100,300": 25,
      "300,1000": 10,
      "1000,10000": 5
    },
    "tags": {
      "endpoint": "/api1",
      "version": "v1",
      "device": "device1-1"
    }
  }
]

Here "0,50":100 means that 100 requests fell in the 0-50 ms query latency bucket.
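Before sending a payload, a client may want to check it against the rules listed above. The sketch below is a hypothetical client-side helper, not part of Argus. It assumes that counts must be non-negative and reads the requirement that consecutive bucket bounds "overlap" as "the upper bound of one bucket equals the lower bound of the next", as in the example payload.

```python
def validate_histogram(entry, max_buckets=100):
    """Return a list of problems found in one histogram object (empty if none)."""
    problems = []
    buckets = entry.get("buckets") or {}
    if not buckets:
        problems.append("buckets must contain at least one entry")
    if len(buckets) > max_buckets:
        problems.append(f"more than {max_buckets} buckets")

    ranges = []
    for key, count in buckets.items():
        try:
            # A key looks like "0,50": two floats separated by a comma.
            lower, upper = (float(part) for part in key.split(","))
        except ValueError:
            problems.append(f"bucket key {key!r} is not 'lower,upper'")
            continue
        if lower >= upper:
            problems.append(f"bucket {key!r} has lower >= upper")
        # Assumption: counts are non-negative integers.
        if not isinstance(count, int) or count < 0:
            problems.append(f"bucket {key!r} has an invalid count")
        ranges.append((lower, upper))

    # Assumption: consecutive buckets must share a boundary.
    ranges.sort()
    for (_, prev_upper), (next_lower, _) in zip(ranges, ranges[1:]):
        if prev_upper != next_lower:
            problems.append(f"buckets do not meet at {prev_upper} / {next_lower}")

    for field in ("overflow", "underflow"):
        if field in entry and not isinstance(entry[field], int):
            problems.append(f"{field} must be an integer")
    return problems


example = {
    "scope": "scope",
    "metric": "query.latency",
    "timestamp": 1537804800000,
    "overflow": 1,
    "underflow": 0,
    "buckets": {"0,50": 100, "50,100": 60, "100,300": 25,
                "300,1000": 10, "1000,10000": 5},
    "tags": {"endpoint": "/api1", "version": "v1", "device": "device1-1"},
}
problems = validate_histogram(example)
```

Returning a list of problems rather than raising on the first one lets a client log everything that is wrong with a payload in a single pass.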

[Figure: api1-v1-device1-1]

Querying Histograms

histogram-percentiles and histogram-buckets are optional fields for retrieving histogram data. They appear after the optional downsampler field.

  • The pNth histogram percentile returns the average of the bucket in which that percentile falls.
  • Only the sum aggregator is supported for histogram queries.
  • Only the buckets values are used for computing percentiles.
  • The optional overflow and underflow fields are not used in the percentile calculation, since they are generally meant for exceptional cases and should not skew percentile results. If you do want such values to count toward percentiles, make that choice explicit by defining a sentinel bucket for them.
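The calculation described by these rules can be sketched as follows. This is an illustration, not the Argus implementation: the function names are hypothetical, the "bucket's average" is assumed to be the midpoint of the bucket's bounds, and the percentile bucket is assumed to be the first one whose cumulative count reaches p percent of the total.

```python
def merge_buckets(histograms):
    """Sum counts bucket by bucket, as a sum aggregator would."""
    merged = {}
    for histogram in histograms:
        for key, count in histogram["buckets"].items():
            merged[key] = merged.get(key, 0) + count
    return merged


def histogram_percentile(buckets, p):
    """Return the midpoint of the bucket that holds the p-th percentile.

    overflow/underflow are deliberately ignored; only bucket counts are used.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ranges = sorted(
        (tuple(float(part) for part in key.split(",")), count)
        for key, count in buckets.items()
    )
    total = sum(count for _, count in ranges)
    if total == 0:
        raise ValueError("histogram has no counts")
    target = total * p / 100
    running = 0
    for (lower, upper), count in ranges:
        running += count
        if running >= target:
            # Assumption: "bucket's average" means the midpoint of its bounds.
            return (lower + upper) / 2


# Bucket counts from the example payload (200 requests in total).
buckets = {"0,50": 100, "50,100": 60, "100,300": 25,
           "300,1000": 10, "1000,10000": 5}
p50 = histogram_percentile(buckets, 50)   # bucket 0-50     -> 25.0
p90 = histogram_percentile(buckets, 90)   # bucket 100-300  -> 200.0
p99 = histogram_percentile(buckets, 99)   # bucket 1000-10000 -> 5500.0
```

Because the raw counts are kept, histograms with different tags can first be combined with merge_buckets and any percentile computed afterwards, which is not possible with pre-computed p50 or p90 summaries.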
Example: Histogram percentiles and buckets
Get histogram percentile aggregated across endpoints, versions, devices
-1d:-5m:scope:query.latency:sum:1m-sum:histogram-percentiles[50]

Get histogram percentiles (p50 and p90) for an endpoint and version
-1d:-5m:scope:query.latency{endpoint=/api1,version=v1}:sum:1m-sum:histogram-percentiles[50|90]

Get histogram buckets for an endpoint and version
-1d:-5m:scope:query.latency{endpoint=/api1,version=v1}:sum:1m-sum:histogram-buckets
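The three expressions above share one layout, which the following sketch assembles from its parts. The helper histogram_query and its parameters are hypothetical and exist only to show the field order; Argus itself just receives the resulting string.

```python
def histogram_query(start, end, scope, metric, tags=None,
                    downsampler="1m-sum", percentiles=None):
    """Build a histogram query expression.

    With percentiles given, the expression ends in histogram-percentiles[...];
    otherwise it ends in histogram-buckets.
    """
    metric_part = metric
    if tags:
        metric_part += "{" + ",".join(f"{k}={v}" for k, v in tags.items()) + "}"

    # Only the sum aggregator is supported for histogram queries.
    parts = [start, end, scope, metric_part, "sum"]
    if downsampler:
        # The downsampler field is optional.
        parts.append(downsampler)
    if percentiles:
        wanted = "|".join(str(p) for p in percentiles)
        parts.append(f"histogram-percentiles[{wanted}]")
    else:
        parts.append("histogram-buckets")
    return ":".join(parts)


all_tags = histogram_query("-1d", "-5m", "scope", "query.latency",
                           percentiles=[50])
one_endpoint = histogram_query("-1d", "-5m", "scope", "query.latency",
                               tags={"endpoint": "/api1", "version": "v1"},
                               percentiles=[50, 90])
raw_buckets = histogram_query("-1d", "-5m", "scope", "query.latency",
                              tags={"endpoint": "/api1", "version": "v1"})
```

The three calls reproduce the three example expressions in order.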