"limit exhausted" message due to max-series-per-req limit (for untagged requests) #1930

Dieterbe opened this issue Oct 28, 2020 · 3 comments

@Dieterbe
Contributor

The new max-series limit for untagged requests (see #1926, #1929), specifically the limit fanned out to cluster peers (which is proportional to how much data the node has), has an issue: because metrictank has the limit enabled by default, deploying master can result in "limit exhausted" messages well before the number of series is actually hit.

Here's why:
First of all, note that UnpartitionedMemoryIdx.Find takes a limit parameter, which is this proportional limit.
(If PartitionedMemoryIdx.Find is used, it further divides the limit by the number of partitions before it calls each UnpartitionedMemoryIdx.Find.) This Find method relies on UnpartitionedMemoryIdx.findMaybeCached, which uses the find helper with the find cache in front of it. Note also that Find does "from filtering": e.g. for a query from now-1h to now, metric definitions with lastUpdate < from are known to have no data and are not included in the result set.
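
To make the two stages concrete, here is a rough sketch (simplified, hypothetical shapes, not the actual metrictank types) of the 'from' filtering Find does on top of the complete result it gets from findMaybeCached, plus the per-partition limit division:

```go
// Rough sketch only: simplified stand-ins for the real index types.

type Def struct {
	LastUpdate int64
}

type Node struct {
	Path string
	Defs []Def // the series (leaves) under this path; empty for pure branches
}

// fromFilter is what Find conceptually does on top of the complete,
// un-from-filtered result returned by findMaybeCached: defs with
// lastUpdate < from are known to have no data for the query window.
// (Branch-only nodes are omitted here for brevity.)
func fromFilter(nodes []Node, from int64) []Node {
	out := make([]Node, 0, len(nodes))
	for _, n := range nodes {
		kept := make([]Def, 0, len(n.Defs))
		for _, d := range n.Defs {
			if d.LastUpdate >= from {
				kept = append(kept, d)
			}
		}
		if len(kept) > 0 {
			out = append(out, Node{Path: n.Path, Defs: kept})
		}
	}
	return out
}

// PartitionedMemoryIdx.Find conceptually divides the (already proportional)
// limit once more before calling each UnpartitionedMemoryIdx.Find:
//   perPartitionLimit := limit / int64(numPartitions)
```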

But:

  • find() does a complete find (without 'from' filtering), so its results can nicely be cached. If we from-filtered the results, we'd need to add the from to the cache key and would probably cache similar data many times over, because on repeated queries the from typically varies constantly. (On the other hand, if there are typically many definitions that haven't been updated in a while - e.g. heavy churn situations - this approach means our cached results contain a large number of definitions we don't need.)
  • Find() is where we do the 'from' filtering.

We implemented the "series limiter" in find() because we don't want to first assemble the entire result set only to then check the limit, as that would effectively make the limit moot. But implementing the limiter prior to from-filtering makes it impossible to apply correctly, as typically most entries will not be included in the final result set due to the 'from' filtering. This means a benign request that only asks for a reasonable amount of (from-filtered) series may hit the limiter if there are more entries with older lastUpdate timestamps.
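
A condensed sketch of that problematic ordering, reusing the `Def` shape from the sketch above (hypothetical code, not the actual implementation):

```go
import "errors"

var errLimitExhausted = errors.New("limit exhausted")

// The limit is enforced on the unfiltered candidate count inside find(),
// even though most candidates may later be dropped by the lastUpdate >= from
// check that only happens afterwards, in Find().
func findThenFromFilter(all []Def, from, limit int64) ([]Def, error) {
	var candidates []Def
	for _, d := range all { // stand-in for the tree walk / pattern matching
		candidates = append(candidates, d)
		if int64(len(candidates)) > limit {
			// breach on the pre-filter count: a request that would only
			// return a handful of from-filtered series still fails here
			return nil, errLimitExhausted
		}
	}
	// only now does the 'from' filter run, typically shrinking the set a lot
	kept := make([]Def, 0, len(candidates))
	for _, d := range candidates {
		if d.LastUpdate >= from {
			kept = append(kept, d)
		}
	}
	return kept, nil
}
```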

There's actually a 2nd kind of bug, which seems more rare, but is worth noting anyway. Because find() does a breadth-first search, as it progresses down the tree it may collect a lot of branches that match the expression so far. We have to keep these branches while traversing further down the tree (and the pattern) to see whether they (and ultimately the leaves) fully match the path.
Precisely because we want to avoid applying the limit only after we've assembled the full response, we currently trigger a breach condition when the number of "candidate branches" exceeds the limit - but it may well be that those branches would later be dropped.

We want to avoid loading too much data in RAM (apply the limit as early as possible), while caching the find response body in a way that's agnostic wrt the 'from' filter, and in a correct way.

My proposal:

  1. Change the find() algorithm to be depth-first rather than breadth-first. This means we can apply the limit as early as possible (without loading too much data in RAM) and in a correct way, although the limit needs to be moved outside of the main find() algorithm so it can be applied on the from-filtered output (e.g. in Find).
  2. Rather than (in the uncached scenario) first finishing the complete find() - which may result in too much data - before doing the from filtering and applying the limit, change the calling convention to a lockstep approach with an iterator.

find() becomes an iterator which feeds data to its caller (Find), which can do the from filtering and apply the limit while find() is running. The new find() can keep a full copy of its unfiltered output, so that - assuming the iteration is not aborted by its caller - it can add the full output to the cache after the iteration completes.
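
A sketch of what that lockstep calling convention could look like (hypothetical Go, reusing the `Def`/`Node` shapes and `errLimitExhausted` from the sketches above; the real tree walk, pattern matching and cache type are elided):

```go
// find walks the tree depth first and emits every matching node to its
// caller as it goes. If emit returns false the walk is aborted and nothing
// is cached; if the walk completes, the full unfiltered result is cached,
// keyed by pattern alone (no 'from', no limit in the key).
func find(tree []Node, pattern string, cache map[string][]Node, emit func(Node) bool) {
	var full []Node
	for _, n := range tree { // stand-in for the depth-first walk + matching
		full = append(full, n)
		if !emit(n) {
			return // aborted by the caller (e.g. limit breached): don't cache
		}
	}
	cache[pattern] = full
}

// Find consumes the stream in lockstep: it applies 'from' filtering and
// checks the limit against the filtered series only, while find is running.
func Find(tree []Node, cache map[string][]Node, pattern string, from, limit int64) ([]Node, error) {
	var out []Node
	var series int64
	breached := false
	find(tree, pattern, cache, func(n Node) bool {
		kept := make([]Def, 0, len(n.Defs))
		for _, d := range n.Defs {
			if d.LastUpdate >= from {
				kept = append(kept, d)
			}
		}
		series += int64(len(kept))
		if series > limit {
			breached = true
			return false // abort find; it won't cache a partial result
		}
		if len(kept) > 0 {
			out = append(out, Node{Path: n.Path, Defs: kept})
		}
		return true
	})
	if breached {
		return nil, errLimitExhausted
	}
	return out, nil
}
```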

To make the find iteration depth-first rather than breadth-first, we compute all matchers up front (we wouldn't want to recompute them every time we revisit a certain level of the tree); a sketch of this follows after the list below.
This implies:

  • con: we compute all matchers up front; in particular, we may compute matchers needlessly if we would otherwise have found out "early" that a query had no matches
  • pro: we bail out early if a match expression is malformed; no need to traverse a section of the tree before detecting a bad matcher
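
For illustration, a sketch of "compute all matchers up front" (hypothetical helper with a deliberately crude glob-to-regex translation; not metrictank's actual matcher code):

```go
import (
	"regexp"
	"strings"
)

// compileMatchers precompiles one matcher per level of a dot-delimited
// pattern before the tree walk starts, so a depth-first traversal never has
// to recompile a matcher when it revisits a level, and a malformed
// expression is rejected before any of the tree is traversed.
func compileMatchers(pattern string) ([]*regexp.Regexp, error) {
	levels := strings.Split(pattern, ".")
	matchers := make([]*regexp.Regexp, 0, len(levels))
	for _, lvl := range levels {
		// crude glob -> regex translation, just for the sketch
		expr := strings.NewReplacer("*", ".*", "{", "(", "}", ")", ",", "|").Replace(lvl)
		re, err := regexp.Compile("^" + expr + "$")
		if err != nil {
			return nil, err // bail out early: bad matcher, no traversal needed
		}
		matchers = append(matchers, re)
	}
	return matchers, nil
}
```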
@Dieterbe Dieterbe added the bug label Oct 28, 2020
@Dieterbe Dieterbe added this to the sprint-18 milestone Oct 28, 2020
@Dieterbe
Contributor Author

Dieterbe commented Oct 28, 2020

Also, because we added the limit-based filtering in find(), we had to add the limit to the cache key, to cache results by both find pattern and the provided limit (see #1926).
As I now plan to move the limiter to after the from-filtering (which can be arbitrarily aggressive or permissive wrt which defs exist in the index), we can (and should!) revert the findCache to how it used to be (cache only by pattern). But note that upon repeated queries, as long as the limit is being breached, we will not cache, and we re-attempt the find from scratch each time, until we have one that succeeds. We might be able to make this smarter in the future (e.g. track breach failures with their from; if a request comes in with an earlier from it should fail too).
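
A sketch of that possible future optimization (not implemented; hypothetical names): remember, per pattern, the largest 'from' that is known to have breached the limit, and reject requests for the same pattern with an equal or earlier 'from' without redoing the find. In practice entries would need some expiry, since the index changes over time.

```go
// breachTracker is a hypothetical sketch: if a find for a pattern breached
// the limit for a range starting at 'from', then any request with the same
// pattern and an earlier (or equal) 'from' matches at least as many series,
// so it is known to breach as well.
type breachTracker struct {
	byPattern map[string]int64 // pattern -> highest 'from' known to breach
}

func newBreachTracker() *breachTracker {
	return &breachTracker{byPattern: make(map[string]int64)}
}

func (b *breachTracker) knownToBreach(pattern string, from int64) bool {
	f, ok := b.byPattern[pattern]
	return ok && from <= f
}

func (b *breachTracker) record(pattern string, from int64) {
	if f, ok := b.byPattern[pattern]; !ok || from > f {
		b.byPattern[pattern] = from
	}
}
```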

@Dieterbe
Contributor Author

There's more:

  1. Find() (and thus the fanned-out /index/find requests) returns Nodes that can be branches or leaves (or both), but executePlan only iterates the .Defs, i.e. the Nodes that are leaves, as those are the only ones that correspond to actual series. Thus, the limit checking in findSeries is too strict: it should only look at leaves.
  2. Note that when using PartitionedIndex, the number of branches gets amplified: a leaf node can only live in exactly 1 of its partitions, but because leaves that have branches in common are spread out across partitions, many branches will re-occur in many of the partitions. For the PartitionedIndex's proportional limits to work, we must make sure to only limit leaf nodes, not branch nodes (this is the sensible thing to do anyway); see the sketch after this list.
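
As a sketch of the fix for both points (hypothetical helper, reusing the Node shape from the first sketch, where only leaves carry Defs): count only actual series against the limit, so branch nodes - which can re-occur, e.g. once per partition - never contribute to a breach.

```go
// seriesCount counts only actual series (leaf defs), never branch nodes.
// Branch nodes can legitimately re-occur across partitions, so counting them
// against the per-partition proportional limits would breach far too early.
func seriesCount(nodes []Node) int64 {
	var n int64
	for _, node := range nodes {
		n += int64(len(node.Defs)) // branch-only nodes have no defs
	}
	return n
}

// in findSeries (sketch): check the limit against seriesCount(nodes),
// not against len(nodes).
```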

@fkaleo
Contributor

fkaleo commented Nov 4, 2020

@fkaleo will review if Robert is not able to atm
