tl;dr: We need two knobs so you can independently keep rate limiting / backpressure working, while still reducing recovery overhead.
Background
The epoch interval currently does two things:
Timely has queues between each operator, and they are all unbounded, so we need some way of signaling to the input sources to slow down so these queues don't grow without bound. We currently do this at the end of each epoch: no input starts emitting items into the next epoch until the previous epoch has fully closed. This is not optimal for performance, since the entire dataflow stops and drains all queues before starting again, but it is easy to implement.
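The stop-and-drain behavior above can be sketched as follows. This is purely illustrative; the function and parameter names are hypothetical, not the actual Bytewax/Timely API.

```python
# Hypothetical sketch of the current stop-and-drain backpressure policy:
# an input holds all new items until the previous epoch has fully closed.
def poll_input(current_epoch: int, last_closed_epoch: int, pending: list) -> list:
    """Emit pending items stamped with the current epoch, but only once
    the previous epoch has closed (i.e. all downstream queues drained)."""
    if last_closed_epoch < current_epoch - 1:
        return []  # previous epoch still draining: hold everything back
    items, pending[:] = list(pending), []
    return [(current_epoch, item) for item in items]
```

The key property is that the whole dataflow alternates between a burst of work and a full drain, which is what makes this simple but suboptimal for throughput.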
We also use the end of the epoch as the coordinated snapshot point. All stateful operators snapshot their state at the end of the epoch and send the snapshots downstream into the recovery system to be routed, serialized, and written to SQLite DBs. This creates the recovery overhead tradeoff: the more often you snapshot, the larger the overhead, but the smaller the window of time you'll have to replay on resume.
Because we currently have `epoch_interval = snapshot_interval`, reducing the overhead due to recovery inherently gives up control over the backpressure system. E.g. if you only want to snapshot once a day (since replaying a day's worth of data is fine), all inputs will race and read as fast as they can for a whole day until the epoch completes, snapshots happen, and only then can backpressure kick in.
Suggestion
I think we should re-introduce `epoch_interval` as a param you set directly; I also think we can find a default that is fine for almost all workloads (probably 10 sec). Then `snapshot_interval` is used to derive an `n` so that every `n`-th epoch is the only time we tell the operators to snapshot. E.g. if the epoch interval is 10 sec and the snapshot interval is 1 day (1 day is 86400 sec; 86400 / 10 gives `n` = 8640), only run the snapshot code every 8640 epochs.
It will be important to only write out progress info to the recovery DB on those every-`n`-th epochs, because we need to record the fact that only those epochs are the resume-able ones.
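A minimal sketch of the derivation above, with both snapshotting and progress writes gated on the same stride. Function names here are illustrative, not a proposed API.

```python
# Hypothetical sketch: derive the snapshot stride n from the two knobs,
# then gate both state snapshots and progress writes on it.
from datetime import timedelta

def snapshot_stride(epoch_interval: timedelta, snapshot_interval: timedelta) -> int:
    """Return n such that only every n-th epoch snapshots."""
    n = int(snapshot_interval.total_seconds() // epoch_interval.total_seconds())
    return max(n, 1)  # never snapshot less than once per epoch boundary

def should_snapshot(epoch: int, n: int) -> bool:
    # Snapshots *and* recovery-DB progress writes happen only on these
    # epochs, since only they are valid resume points.
    return epoch % n == 0

n = snapshot_stride(timedelta(seconds=10), timedelta(days=1))
print(n)  # 86400 sec / 10 sec = 8640
```

The floor division means a `snapshot_interval` that isn't an exact multiple of `epoch_interval` rounds down, so snapshots happen at least as often as requested.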
Further Work
I think this also sets us up to do some sort of "eager pacing". Instead of waiting for the entire old epoch to clear before emitting new data, the input could use previous estimates of how long the epoch took to clear to eagerly start emitting some items into the new epoch before the previous one is clear. Set up a feedback loop so this converges on the correct amount, and then the epoch interval will be, in effect, self-adjusting. A lot more details to work out here, but it could be a good performance boost.
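One possible shape for that feedback loop, assuming we track drain times with an exponentially weighted moving average. The class and its parameters are hypothetical; this is a sketch of the idea, not a design.

```python
# Hypothetical sketch of "eager pacing": estimate how long past epochs
# took to drain, and give the input that much of a head start emitting
# into the next epoch before the previous one has fully closed.
class EagerPacer:
    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha            # EWMA smoothing factor
        self.est_drain_secs = None    # running estimate of drain time

    def observe_drain(self, drain_secs: float) -> None:
        """Feed back how long the last epoch actually took to clear."""
        if self.est_drain_secs is None:
            self.est_drain_secs = drain_secs
        else:
            self.est_drain_secs = (
                self.alpha * drain_secs + (1 - self.alpha) * self.est_drain_secs
            )

    def head_start_secs(self, epoch_interval_secs: float) -> float:
        """How early the input may start emitting into the new epoch.
        With no estimate yet, fall back to the full stop-and-drain."""
        if self.est_drain_secs is None:
            return 0.0
        return min(self.est_drain_secs, epoch_interval_secs)
```

Because the estimate is fed by observed drain times, overshooting (queues growing) lengthens the observed drains and pulls the head start back down, which is the convergence property the feedback loop needs.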