This repository has been archived by the owner on Apr 2, 2024. It is now read-only.

Improvements to batching #1739

Draft · wants to merge 6 commits into master

Commits on Dec 13, 2022

  1. Add reservation system for metrics

    This change adds a new reservation system for metric batching.
    The reservation system forms a queue of metric batchers for
    copiers to pick up. The queue is ordered by the time a request
    hits Promscale. That ordering lets copiers combine the data
    belonging to a single request more effectively.

    This solves a few problems:
    1. Previously, reservations were delivered over a channel, so their
    order reflected when each batcher first made its reservation. While
    that may correlate with the time the request hit the connector, it is
    a less exact proxy for what we actually want. This change solves the
    problem by ordering explicitly on request start time.

    2. It allows peeking at the info associated with the first reservation,
    which is not possible with a channel-based approach. Peeking is used
    here to ensure that we don't stop batching until a minimum duration has
    passed since the oldest request in the batch hit the system. That acts
    as an alternative backpressure mechanism to the regular one of copiers
    becoming available. In most cases it shouldn't be necessary, because of
    the sharding adjustment Prometheus does, but it can be useful in some
    cases. Note that the minimum latency is measured from the time the
    request hits the connector and NOT from the time spent in the batcher.
    This better controls the latency penalty of this alternate backpressure
    mechanism.

    Peeking can also be used for more advanced batch-sizing logic. For
    example, we may want to batch based on the number of samples. Without
    peeking, you'd have to pop the queue before learning the number of
    samples, at which point it is too late. A sketch of such a peekable,
    time-ordered queue follows this commit entry.
    cevian committed Dec 13, 2022 · 7a56204
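
    A minimal sketch of the peekable, time-ordered reservation queue described
    above, assuming simplified types (reservation, ReservationQueue) that are
    illustrative and not Promscale's actual implementation:

    ```go
    package batching

    import (
        "container/heap"
        "sync"
        "time"
    )

    // reservation ties a metric batcher to the arrival time of the oldest
    // request whose data it is currently holding.
    type reservation struct {
        metric      string
        requestTime time.Time // when the request hit the connector
    }

    // reservationHeap orders reservations by request start time.
    type reservationHeap []*reservation

    func (h reservationHeap) Len() int           { return len(h) }
    func (h reservationHeap) Less(i, j int) bool { return h[i].requestTime.Before(h[j].requestTime) }
    func (h reservationHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
    func (h *reservationHeap) Push(x any)        { *h = append(*h, x.(*reservation)) }
    func (h *reservationHeap) Pop() any {
        old := *h
        n := len(old)
        r := old[n-1]
        *h = old[:n-1]
        return r
    }

    // ReservationQueue is what copiers drain. Ordering by request start time
    // lets a copier combine the data belonging to a single request.
    type ReservationQueue struct {
        mu sync.Mutex
        h  reservationHeap
    }

    func (q *ReservationQueue) Add(r *reservation) {
        q.mu.Lock()
        defer q.mu.Unlock()
        heap.Push(&q.h, r)
    }

    // Peek returns the oldest reservation without removing it, so a copier can
    // check whether the minimum duration has elapsed before it stops batching.
    func (q *ReservationQueue) Peek() (*reservation, bool) {
        q.mu.Lock()
        defer q.mu.Unlock()
        if len(q.h) == 0 {
            return nil, false
        }
        return q.h[0], true
    }

    // Pop removes and returns the oldest reservation.
    func (q *ReservationQueue) Pop() (*reservation, bool) {
        q.mu.Lock()
        defer q.mu.Unlock()
        if len(q.h) == 0 {
            return nil, false
        }
        return heap.Pop(&q.h).(*reservation), true
    }
    ```
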
  2. Add better metrics for the batching system

    Should give us better observability.
    cevian committed Dec 13, 2022 · 0d2461e
  3. Improve batching

    - Significantly increase the number of samples batched per metric. Heavy
      metrics are very likely to come from many requests, so flushing them
      sooner is better; having to wait through multiple flushes is wasteful.
    - To prevent copy batches from getting too large now that each metric may
      carry more samples, change the batch criteria to limit not just the
      number of metrics but also the number of samples (see the sketch after
      this commit entry).
    - Change the preference from "try to flush before batching" to "try to
      batch before flushing". This should increase batch sizes.
    - For two metrics with the same initial request time, flush the bigger
      one first, hoping the smaller one grows before its flush.
    cevian committed Dec 13, 2022 · b93ea2b
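
    A minimal sketch of the dual batch criterion and the flush ordering
    described above; the limit values, names, and types (pendingMetric,
    batchIsFull, lessForFlush) are assumptions, not Promscale's actual code:

    ```go
    package batching

    import "time"

    const (
        maxMetricsPerBatch = 20    // hypothetical limit on metrics per copy batch
        maxSamplesPerBatch = 10000 // hypothetical limit on samples per copy batch
    )

    type pendingMetric struct {
        name        string
        samples     int
        requestTime time.Time // arrival time of the oldest request for this metric
    }

    // batchIsFull reports whether adding next would exceed either the metric
    // count limit or the sample count limit for the current copy batch.
    func batchIsFull(batchMetrics, batchSamples int, next pendingMetric) bool {
        return batchMetrics+1 > maxMetricsPerBatch ||
            batchSamples+next.samples > maxSamplesPerBatch
    }

    // lessForFlush orders metrics for flushing: earlier request time first;
    // when request times tie, the bigger metric goes first so the smaller
    // one has a chance to grow before it is flushed.
    func lessForFlush(a, b pendingMetric) bool {
        if !a.requestTime.Equal(b.requestTime) {
            return a.requestTime.Before(b.requestTime)
        }
        return a.samples > b.samples
    }
    ```
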
  4. Prevent flushing a batch with less than one request

    Previously, when a new set of requests came in, we often flushed a very
    small first batch because we flush as soon as any data becomes available
    in Peek. Since small batches are inefficient, we fix this by waiting
    until the entire first request is batched before flushing (with a
    timeout). A sketch of this wait-with-timeout follows this commit entry.
    cevian committed Dec 13, 2022 · 80eec93
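
    A rough sketch of the wait-with-timeout described above, assuming a
    hypothetical helper that reports whether the oldest request has been
    fully batched; this is illustrative, not the actual implementation:

    ```go
    package batching

    import (
        "context"
        "time"
    )

    // requestTracker is a hypothetical view of the batching state.
    type requestTracker interface {
        // FirstRequestComplete reports whether every metric belonging to the
        // oldest request has been handed to a batcher.
        FirstRequestComplete() bool
    }

    // waitForFullRequest blocks until the first request is fully batched or
    // the timeout expires, whichever comes first, then lets the caller flush.
    func waitForFullRequest(ctx context.Context, t requestTracker, timeout time.Duration) {
        deadline := time.NewTimer(timeout)
        defer deadline.Stop()
        tick := time.NewTicker(time.Millisecond)
        defer tick.Stop()
        for {
            if t.FirstRequestComplete() {
                return
            }
            select {
            case <-ctx.Done():
                return
            case <-deadline.C:
                return // timeout: flush whatever has been batched so far
            case <-tick.C:
                // poll again
            }
        }
    }
    ```
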
  5. trace request logging

    cevian committed Dec 13, 2022 · efdf869
  6. bug fix

    cevian committed Dec 13, 2022 · 4c34323