This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Use datapoint timestamp to determine persist to cassandra idx #1281

Open
shanson7 opened this issue Apr 12, 2019 · 9 comments
Comments

@shanson7
Collaborator

shanson7 commented Apr 12, 2019

In the cassandra index, the current (wallclock) time is used to determine whether lastUpdate should be saved to the cassandra table. When doing an initial backfill, this means that we need to wait 3h (by default) before publishing the final datapoint to make sure that lastUpdate is properly persisted.

If the datapoint timestamp was used instead, we wouldn't need to worry about it.
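For illustration, here is a minimal Go sketch of the current behaviour as described above. The type and field names are stand-ins, not the actual metrictank code (the real check is in the cassandra index code linked further down this thread):

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative stand-ins for the real metrictank types and names
// (LastUpdate, LastSave, update-interval).
type archive struct {
	LastUpdate int64  // data time: timestamp of the newest datapoint seen
	LastSave   uint32 // wallclock time of the last save to the cassandra index
}

const updateInterval = 3 * 60 * 60 // 3h default, in seconds

// shouldSaveWallclock mirrors the current behaviour: the decision is based
// purely on how much wallclock time has passed since the last save, no
// matter how far LastUpdate has advanced.
func shouldSaveWallclock(a archive, now uint32) bool {
	return a.LastSave < now-updateInterval
}

func main() {
	now := uint32(time.Now().Unix())
	// During a backfill we can push years of datapoints within minutes:
	// LastUpdate jumps far ahead, but LastSave was refreshed seconds ago,
	// so no further save happens for another ~3h of wallclock time.
	a := archive{LastUpdate: time.Now().Unix(), LastSave: now - 60}
	fmt.Println(shouldSaveWallclock(a, now)) // false
}
```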

@stale

stale bot commented Apr 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 4, 2020
@stale stale bot closed this as completed Apr 11, 2020
@shanson7
Collaborator Author

We had another customer complaining about this. They publish forecast data every Sunday for the next week. Because of the compressed publishing window, lastUpdate is roughly 7 days behind. If any instances are restarted, the forecast time series is not returned.

One thing to think about is that accelerated publish rate could also cause a lot of writes to be triggered. Imagine, for example, backfilling ten years of data. If the update interval is 3 hours, we might trigger a lot of useless intermediary updates.

@shanson7 shanson7 reopened this Jun 14, 2021
@stale stale bot removed the stale label Jun 14, 2021
@Dieterbe
Contributor

The original link from the OP doesn't work anymore, but it was pointing to this code, which is still unchanged:

// check if we need to save to cassandra.

You mention LastUpdate, but that code uses LastSave.
The difference is documented here: https://github.com/grafana/metrictank/blob/master/docs/metadata.md#lastsave-vs-lastupdate
(TL;DR: LastUpdate uses the data time, LastSave uses wallclock time.)
In both of your comments/cases, did you mean LastSave?

@shanson7
Collaborator Author

You mention LastUpdate, but that code uses LastSave.

Yes, I said and meant LastUpdate. When LastUpdate is significantly out of date, some series get stripped from queries where data exists. From a user perspective, LastSave is not relevant.

@shanson7
Collaborator Author

To be very explicit:

The Issue

Data published at an accelerated rate (e.g. backfill or forecast data) has datapoints arriving faster than real time. This means that the persisted LastUpdate value can be very inaccurate, because LastSave is checked against the wallclock, not the datapoint time.

Generally, the stored LastUpdate value for real-time data can be off by at most update-interval. For faster-than-real-time data, there is no meaningful upper bound on how inaccurate LastUpdate can be.

Proposed Solution

The save decision should use the datapoint timestamp (i.e. the LastUpdate value) to determine whether an update needs to be written to the database. This keeps the window of inaccuracy to the specified update-interval, and post-#1983 that is handled well by metrictank.
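For illustration, a minimal Go sketch of that check (names like entry and lastUpdateSaved are stand-ins, not metrictank's actual code): the save is driven by how far the datapoint time has advanced rather than by wallclock. The loop in main also quantifies the write-amplification concern raised further down.

```go
package main

import "fmt"

const updateInterval = 3 * 60 * 60 // 3h default, in seconds

// entry tracks the data time recorded in the last save, instead of the
// wallclock time of the last save (field name is illustrative only).
type entry struct {
	lastUpdateSaved int64
}

// shouldSaveDataTime sketches the proposal: trigger a save whenever the
// datapoint time has advanced more than update-interval past what was last
// persisted, independent of wallclock time.
func shouldSaveDataTime(e *entry, pointTs int64) bool {
	if pointTs > e.lastUpdateSaved+updateInterval {
		e.lastUpdateSaved = pointTs
		return true
	}
	return false
}

func main() {
	e := &entry{}
	saves := 0
	// Backfilling 10 years of minutely data: with a 3h update-interval this
	// triggers on the order of (10*365*24)/3 ≈ 29200 saves for one series,
	// which is the concern discussed below.
	for ts := int64(60); ts <= 10*365*24*3600; ts += 60 {
		if shouldSaveDataTime(e, ts) {
			saves++
		}
	}
	fmt.Println("saves:", saves)
}
```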

@Dieterbe
Contributor

Dieterbe commented Sep 1, 2021

OK I get it now.

One thing to think about is that accelerated publish rate could also cause a lot of writes to be triggered. Imagine, for example, backfilling ten years of data. If the update interval is 3 hours, we might trigger a lot of useless intermediary updates.

Are you referring to 1) the current status, or 2) with your proposal applied?
I think currently this is not an issue: it would trigger an update every 3 hours, regardless of how fast you send data, precisely because we use wallclock time.
But with the proposal applied (looking at the data time), it would trigger (10*365*24)/3 ≈ 29200, i.e. roughly 30k updates per series.

I'm having trouble finding any issues with your suggestion other than the above, which is the main reason I want to explore alternative options.

To be clear, the issue is with young metrictank instances that haven't seen any of the data for the given series from kafka.
(instances that have seen data, always have an accurate lastUpdate in memory, regardless of what the persistent index says)
Thus (note: I keep lastSave wallclock-based in this line of thinking), the issue appears when a series saw significant lastUpdate jumps in a short time (such that the lastSave update didn't kick in) and the series then expired out of kafka. Conceivably, we could track when an index entry was last seen from kafka and do a periodic scan for entries that are about to expire from kafka, persisting them. I'm not too excited about such a scan, though. We already have index pruning, which scans the index, so we could conceivably add this in there, but I would rather not overload the pruning system, and some deployments may not currently have it enabled.

How about this:
First of all, note that the only fields that can really change for a given index entry are lastUpdate and partition, but dynamic partition changing never worked, I think (this is described elsewhere), and all other fields contribute to the Id/MKey. So only lastUpdate can get "stale".

We replace the cassandra writeQueue with a smart queue

  • remove idx.Archive.LastSave and add a new field to track the last lastUpdate that we saved to the smart queue (e.g. lastUpdateSaved)
  • whenever lastUpdate > lastUpdateSaved + updateInterval we trigger a save into the new smart queue (like your proposal, this fixes the main problem)

The smart queue is not just a channel; it has a staging area:

  • keep incoming metricdef updates staged for a duration that is less than the kafka retention, minus however long we waited before triggering saves (updateInterval), minus how long saves to the persistent index take in the worst case. (Suggestion: we add wording to update-interval instructing users to take the kafka retention, subtract the worst-case save duration from it, and divide that number by two to use as updateInterval; then we can use updateInterval here as well.)
  • if a new save operation comes in for an already-seen MKey, replace the old one with it (IOW, update the lastUpdate field to whatever the new save request instructed, because that's the only field that can change)
  • a worker goroutine periodically scans the staging area and flushes updates that have been staged for a while. These updates themselves may need to go through a simple worker queue like we have now. (note: this requires each entry to have the wallclock time of when the save was initiated)

This is obviously more complicated, and doing it right means we need to be aware of how long saves take (in the write queue, but also in the staging background routine), but in exchange we won't overload cassandra with lots of intermediary updates.
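A rough Go sketch of the smart-queue idea, purely illustrative (the real MKey, MetricDefinition and cassandra write-queue types differ); it only demonstrates the staging and dedup mechanics described above:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type MKey string

type saveReq struct {
	key        MKey
	lastUpdate int64     // data time to persist for this series
	stagedAt   time.Time // wallclock time the entry entered the staging area
}

type smartQueue struct {
	mu     sync.Mutex
	staged map[MKey]*saveReq
	// maxStage must stay below kafka retention minus updateInterval minus the
	// worst-case save duration (per the suggestion above).
	maxStage time.Duration
	out      chan saveReq // feeds the existing cassandra write workers
}

func newSmartQueue(maxStage time.Duration) *smartQueue {
	q := &smartQueue{
		staged:   make(map[MKey]*saveReq),
		maxStage: maxStage,
		out:      make(chan saveReq, 1000),
	}
	go q.flushLoop()
	return q
}

// Push stages a save request. A newer request for an already-seen MKey just
// overwrites the staged lastUpdate (the only field that can change), so a
// fast backfill collapses into one eventual write per series.
func (q *smartQueue) Push(key MKey, lastUpdate int64) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if r, ok := q.staged[key]; ok {
		if lastUpdate > r.lastUpdate {
			r.lastUpdate = lastUpdate
		}
		return
	}
	q.staged[key] = &saveReq{key: key, lastUpdate: lastUpdate, stagedAt: time.Now()}
}

// flushLoop periodically hands entries that have been staged long enough over
// to the write workers. (A real implementation would avoid sending on the
// channel while holding the lock; the 1s scan interval is short for the demo.)
func (q *smartQueue) flushLoop() {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for now := range ticker.C {
		q.mu.Lock()
		for k, r := range q.staged {
			if now.Sub(r.stagedAt) >= q.maxStage {
				q.out <- *r
				delete(q.staged, k)
			}
		}
		q.mu.Unlock()
	}
}

func main() {
	q := newSmartQueue(time.Second)
	q.Push("some.series", 1000)
	q.Push("some.series", 2000) // supersedes the first request in place
	req := <-q.out              // after a couple of seconds the single merged update is flushed
	fmt.Println(req.key, req.lastUpdate)
}
```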

@shanson7
Collaborator Author

shanson7 commented Sep 2, 2021

are you referring to 1) current status, or 2) with your proposal applied?

That was with the proposal applied. I was pointing out a potential issue.

we trigger a save into the new smart queue

I sort of had a similar idea, but not quite as involved. My thought was that instead of writing exactly what was pulled off the queue, the cassandra writer could do a lookup in the index to get the latest values. If the latest value in the index has a LastSave greater than the LastUpdate in the request from the queue, then skip it (the idea being that a newer request has been enqueued). This would reduce the number of writes to Cassandra if the queue starts to grow too large.

I think this version is less work than your proposal but not as "complete" in preventing duplicate writes.
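For illustration, a minimal Go sketch of this simpler approach, with stand-in types for the in-memory index and write queue (not the actual CasIdx code): the writer re-checks the in-memory index before each write and drops superseded requests.

```go
package main

import "fmt"

// Illustrative stand-ins: the real metrictank in-memory index and cassandra
// write queue have different shapes; this only shows the
// "look up before writing, skip superseded requests" idea.
type archive struct {
	LastUpdate int64  // data time of the newest point seen
	LastSave   uint32 // what the most recently enqueued save recorded
}

type memoryIndex map[string]archive

type writeReq struct {
	key        string
	lastUpdate int64 // the value this request would persist
}

// processWriteQueue re-reads the in-memory index before each cassandra write
// and drops requests that a newer enqueued save for the same series has
// already superseded.
func processWriteQueue(idx memoryIndex, queue <-chan writeReq, write func(writeReq)) {
	for req := range queue {
		if cur, ok := idx[req.key]; ok && int64(cur.LastSave) > req.lastUpdate {
			continue // a newer request is in (or has passed through) the queue
		}
		write(req)
	}
}

func main() {
	idx := memoryIndex{"some.series": {LastUpdate: 2000, LastSave: 2000}}
	q := make(chan writeReq, 2)
	q <- writeReq{key: "some.series", lastUpdate: 1000} // stale: superseded
	q <- writeReq{key: "some.series", lastUpdate: 2000} // current: gets written
	close(q)
	processWriteQueue(idx, q, func(r writeReq) {
		fmt.Println("write", r.key, r.lastUpdate)
	})
}
```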

@Dieterbe
Contributor

Dieterbe commented Sep 2, 2021

So, a .Get() on the in-memory index for every write operation. That seems like a favorable tradeoff. Way simpler than my idea and seems like it will solve the problem just fine.

@stale

stale bot commented Jan 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 8, 2022
@shanson7 shanson7 removed the stale label Jan 10, 2022