tl;dr: We need two knobs so you can independently keep rate limiting / backpressure working, while still reducing recovery overhead.
Background
The epoch interval currently does two things:
Timely has queues between each operator, and they are all unbounded, so we need some way of signaling to the input sources to slow down so these queues don't grow without bound. We currently do this at the end of each epoch: no input starts emitting items into the next epoch until the previous epoch has fully closed. This is not optimal for performance, since the entire dataflow stops and drains all queues before starting again, but it is easy to implement.
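The stop-and-drain behavior above can be sketched as follows. This is purely illustrative; the function and parameter names are hypothetical, not the actual Bytewax/Timely API.

```python
# Hypothetical sketch of the current stop-and-drain backpressure policy:
# an input holds all new items until the previous epoch has fully closed.
def poll_input(current_epoch: int, last_closed_epoch: int, pending: list) -> list:
    """Emit pending items stamped with the current epoch, but only once
    the previous epoch has closed (i.e. all downstream queues drained)."""
    if last_closed_epoch < current_epoch - 1:
        return []  # previous epoch still draining: hold everything back
    items, pending[:] = list(pending), []
    return [(current_epoch, item) for item in items]
```

The key property is that the whole dataflow alternates between a burst of work and a full drain, which is what makes this simple but suboptimal for throughput.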
We also use the end of the epoch as the coordinated snapshot point. All stateful operators snapshot their state at the end of the epoch and send the snapshots downstream into the recovery system to be routed, serialized, and written to SQLite DBs. This creates the recovery overhead tradeoff: the more often you snapshot, the larger the overhead, but the smaller the window of time you'll have to replay on resume.
Because we currently have `epoch_interval = snapshot_interval`, reducing the overhead due to recovery inherently gives up control over the backpressure system. E.g. if you only want to snapshot once a day (since replaying a day's worth of data is fine), all inputs will race and read as fast as they can for a whole day until the epoch completes, snapshots happen, and only then can backpressure kick in.
Suggestion
I think we should re-introduce `epoch_interval` as a param you set directly; I also think we can find a default that is fine for almost all workloads (probably 10 sec). Then `snapshot_interval` is used to derive an `n` so that every `n`-th epoch is the only time we tell the operators to snapshot. E.g. if the epoch interval is 10 sec and the snapshot interval is 1 day (1 day is 86400 sec; 86400 / 10 gives `n` = 8640), only run the snapshot code every 8640 epochs.
It will be important to only write out progress info to the recovery DB on those every-`n`-th epochs, because we need to record the fact that only those epochs are the resume-able ones.
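A minimal sketch of the derivation above, with both snapshotting and progress writes gated on the same stride. Function names here are illustrative, not a proposed API.

```python
# Hypothetical sketch: derive the snapshot stride n from the two knobs,
# then gate both state snapshots and progress writes on it.
from datetime import timedelta

def snapshot_stride(epoch_interval: timedelta, snapshot_interval: timedelta) -> int:
    """Return n such that only every n-th epoch snapshots."""
    n = int(snapshot_interval.total_seconds() // epoch_interval.total_seconds())
    return max(n, 1)  # never snapshot less than once per epoch boundary

def should_snapshot(epoch: int, n: int) -> bool:
    # Snapshots *and* recovery-DB progress writes happen only on these
    # epochs, since only they are valid resume points.
    return epoch % n == 0

n = snapshot_stride(timedelta(seconds=10), timedelta(days=1))
print(n)  # 86400 sec / 10 sec = 8640
```

The floor division means a `snapshot_interval` that isn't an exact multiple of `epoch_interval` rounds down, so snapshots happen at least as often as requested.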
Further Work
I think this also sets us up to do some sort of "eager pacing". Instead of waiting for the entire old epoch to clear before emitting new data, the input could use previous estimates of how long the epoch took to clear to eagerly start emitting some items into the new epoch before the previous one is clear. Set up a feedback loop so this converges on the correct amount, and then the epoch interval will be, in effect, self-adjusting. A lot more details to work out here, but it could be a good performance boost.
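One possible shape for that feedback loop, assuming we track drain times with an exponentially weighted moving average. The class and its parameters are hypothetical; this is a sketch of the idea, not a design.

```python
# Hypothetical sketch of "eager pacing": estimate how long past epochs
# took to drain, and give the input that much of a head start emitting
# into the next epoch before the previous one has fully closed.
class EagerPacer:
    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha            # EWMA smoothing factor
        self.est_drain_secs = None    # running estimate of drain time

    def observe_drain(self, drain_secs: float) -> None:
        """Feed back how long the last epoch actually took to clear."""
        if self.est_drain_secs is None:
            self.est_drain_secs = drain_secs
        else:
            self.est_drain_secs = (
                self.alpha * drain_secs + (1 - self.alpha) * self.est_drain_secs
            )

    def head_start_secs(self, epoch_interval_secs: float) -> float:
        """How early the input may start emitting into the new epoch.
        With no estimate yet, fall back to the full stop-and-drain."""
        if self.est_drain_secs is None:
            return 0.0
        return min(self.est_drain_secs, epoch_interval_secs)
```

Because the estimate is fed by observed drain times, overshooting (queues growing) lengthens the observed drains and pulls the head start back down, which is the convergence property the feedback loop needs.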