
index lastUpdated timestamps are only stored on leaf nodes #714

Open
woodsaj opened this issue Aug 29, 2017 · 6 comments

woodsaj commented Aug 29, 2017

When searching the index, we allow users to pass a "from" param. Only series that have been updated since the from timestamp are returned.

However, we only filter leaf nodes, not branch nodes.

So if you have a series a.b.c.d.e that has not been updated in 24 hours, you will still get results when querying everything updated in the last hour with a query expression that matches a branch node.
E.g. metrics/find?from=-1h&query=a.b.c.* will return "a.b.c.d",
but metrics/find?from=-1h&query=a.b.c.d.* won't return anything, as a.b.c.d.e has not been updated for 24 hours.

To resolve this we will need to keep a lastUpdated attribute on branch nodes. This would not be hard to implement, but having to walk down the tree on every update (which happens every time a metric is received) could be quite expensive.
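For illustration, a minimal sketch of what the write-time approach could look like, using a hypothetical Node type and touch function (these are not metrictank's actual index types): every received metric stamps lastUpdated on each node along its path.

```go
package index

import "strings"

// Node is a hypothetical index node, for illustration only; it is not
// metrictank's actual type.
type Node struct {
	LastUpdated int64
	Leaf        bool
	Children    map[string]*Node
}

// touch walks from the root down to the node for the given path and stamps
// lastUpdated on every branch along the way, creating missing nodes as it
// goes. This runs once per received metric, which is the write-time cost
// mentioned above.
func touch(root *Node, path string, now int64) {
	n := root
	n.LastUpdated = now
	for _, part := range strings.Split(path, ".") {
		child, ok := n.Children[part]
		if !ok {
			child = &Node{Children: map[string]*Node{}}
			n.Children[part] = child
		}
		child.LastUpdated = now
		n = child
	}
	n.Leaf = true
}
```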

@Dieterbe

Alternatively, at read time could we not omit all branches that have no children (or only other empty branches)?
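And a minimal sketch of this read-time alternative, reusing the hypothetical Node type from the sketch above (again, not metrictank's actual code): a branch only shows up in results if some leaf below it has been updated since from.

```go
package index

// hasRecentLeaf reports whether n, or any node below it, is a leaf updated
// at or after from. It short-circuits on the first match, but a subtree
// with no recent leaves forces a visit of every node in it, which is the
// read-time cost discussed in the next comment.
func hasRecentLeaf(n *Node, from int64) bool {
	if n.Leaf && n.LastUpdated >= from {
		return true
	}
	for _, c := range n.Children {
		if hasRecentLeaf(c, from) {
			return true
		}
	}
	return false
}
```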


shanson7 commented Aug 29, 2017

This could be considerably more expensive to do at read time.

Think about a tree ten levels deep, where each node has ten children. Doing a DFS on the initial * query, short-circuiting only on finding a leaf within the time range, can end up visiting on the order of 10^10 nodes when a subtree is entirely stale, so the cost is exponential in the depth. (In practice, this might be ok since most data is updated frequently.)

Now, with regards to the update path: I did implement a tree-based memory idx a couple of months back to try to reduce the memory footprint. The performance was comparable (if I recall correctly). I abandoned it because I couldn't get the memory improvement, since the entire series name was stored in the archive object anyway :(

@Dieterbe

@shanson7 :

Just thinking about doing a DFS

What's a DFS?
Our index is ripe for a refactor (best after #729 is merged); we can work out and discuss the goals once we have some experience running #729 live.

Note that this issue can be worked around by using a pattern that includes all required nodes, but it's still annoying and confusing.

Dieterbe added this to the 0.8.1 milestone Sep 30, 2017

shanson7 commented Oct 1, 2017

I just meant that doing a depth-first search to try to find a leaf within the time boundaries could be very expensive for a large enough index.


Dieterbe commented Oct 2, 2017

So are you basically saying that if a subtree hasn't received an update in a long time, we will spend too much effort traversing the entire subtree, only to figure out at the end that it can be left out?
This is true, but it is basically the same as what we did in the past before we had the time filter (traverse the entire subtree), just with the extra cost of some len(slice) calls, which seems to add only slightly more time.
I think in practice this should be infrequent, and it is also alleviated by the index max-stale time, after which entries are deleted (it defaults to 0 = disabled; we currently run 48h in our environments, which can be argued to be too aggressive).

Either way, it seems the tradeoff basically comes down to:

  • filtering at read time is efficient for sufficiently up-to-date data, and inefficient if significant subtrees are stale
  • maintaining timestamps at write time is efficient if points don't come in frequently, and inefficient for workloads with more ingestion than queries

A typical workload has a much higher ingest rate than render request rate, so it seems to me we should do it at read time. Even with a very high max-stale, or with index pruning disabled, this seems the right approach to me.

@Dieterbe

I think this also affects pruning.
Leaf-only nodes (nodes without children) are unproblematic to prune.
Branch-only nodes (nodes that only have children) aren't affected, since they carry no lastUpdated.
But nodes that are branches as well as leaves are affected: if the pruning routine marks such a node for deletion, we will recursively delete it, including all its children / subtrees.
E.g. if foo.bar is outdated but foo.bar.baz has recent data, foo.bar.baz will be deleted as well.
In practice I think we haven't noticed because nodes that are branch+leaf are uncommon, and the leaves would probably be re-added quickly after the delete anyway.
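A minimal sketch of a guard against this, again using the hypothetical Node type from the earlier sketches (not metrictank's actual pruning code): a stale node that is both a branch and a leaf only loses its leaf status, and its subtree survives as long as it still has live descendants.

```go
package index

// prune removes stale leaves below n and reports whether n itself can be
// dropped. A node that is both a branch and a leaf only loses its leaf
// status when stale; its subtree is kept as long as any child survives.
func prune(n *Node, cutoff int64) bool {
	for name, c := range n.Children {
		if prune(c, cutoff) {
			delete(n.Children, name) // child has no live leaves left
		}
	}
	if n.Leaf && n.LastUpdated < cutoff {
		n.Leaf = false // drop the stale series, keep the branch
	}
	return !n.Leaf && len(n.Children) == 0
}
```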

woodsaj pushed a commit that referenced this issue Dec 21, 2017
-  this issue was discovered in discussions of issue #714
-  this also fixes #797
Aergonus pushed a commit to bloomberg/metrictank that referenced this issue Jan 19, 2018
-  this issue was discovered in discussions of issue grafana#714
-  this also fixes grafana#797