Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not able to measure the time it take for log from being dispatched to being committed and applied to the FSM #510

Open
dhiaayachi opened this issue Jun 14, 2022 · 3 comments

Comments

@dhiaayachi
Copy link
Contributor

Currently raft have multiple metrics to measure the time it take for a log to go through different stages:

  • raft.commitTime: measure the time it take for log to be written to disk and replicated
  • raft.fsm.enqueue: measure the time it take for a log to be enqueued to the queue of logs that are to be applied to the FSM
  • raft.fsm.apply: measure the time it take for a log to be applied to the FSM

Those metrics measure separately different stages of the log in raft but don't account for the time that a log spend in the FSM queue waiting to be applied.

Also, the replication time is aggregated with the disk write time, which make it harder to deduce the reason of a high raft.commitTime. That said, a high raft.commitTime in conjunction with a high raft.LastContact can be safely attributed to the replication taking too long.

I suggest to add 3 metrics:

  • raft.replicationTime: measure the time it take to replicate the log to enough nodes
  • raft.applyTime: measure the time from a log being dispatched to after it's applied to the FSM
  • raft.fsm.queueWait: measure the time from the log get enqueued to the fsm queue to being picked up to be applied to the fsm
@banks
Copy link
Member

banks commented Jun 14, 2022

Also, the replication time is aggregated with the disk write time
We also have raft.rpc.appendEntries.storeLogs which more directly measures disk IO timing and since Consul 1.11 we have raft.boltdb.storeLogs which I think comes from the BoltDB store directly too.

Just adding this for completeness if others come across this.

Overall I like the suggestions here though I have a few questions:

raft.replicationTime: measure the time it take to replicate the log to enough nodes

I'm not quite clear how this is different to the existing commitTime? Is it not including the write to disk on the leader? but that does still include the time taken to write to disk on the followers so it's maybe a little hard to explain the value of that 🤔 . We already have timing for the appendEntries RPC on the followers which is useful for debugging follower slowness and general timing around remote writes. I guess this is a bit different in that it includes time spent queuing on the leader waiting for replication and it would show actual timing it took to get a quorum of responses etc. Not against it as such just wondering how useful and how we could document it so that the semantics are useful to operators?

raft.applyTime: measure the time from a log being dispatched to after it's applied to the FSM

Names are hard... but it seems confusing to have two timers raft.applyTime and raft.fsm.apply 😄 I get that this is a timer for the total time spent blocked in the raft.Apply call by the caller and sits alongside the raft.apply counter... do you think raft.applyTotalTime is any better?

@dhiaayachi
Copy link
Contributor Author

I'm not quite clear how this is different to the existing commitTime? Is it not including the write to disk on the leader? but that does still include the time taken to write to disk on the followers so it's maybe a little hard to explain the value of that 🤔 . We already have timing for the appendEntries RPC on the followers which is useful for debugging follower slowness and general timing around remote writes. I guess this is a bit different in that it includes time spent queuing on the leader waiting for replication and it would show actual timing it took to get a quorum of responses etc. Not against it as such just wondering how useful and how we could document it so that the semantics are useful to operators?

The perspective I had when writing this is to be able to pin point the source of slowness in raft. By adding raft.replicationTime it will help steer the debugging to one of the followers, then we can review the specific metrics for each followers to have a better understanding. Otherwise if this metric is fine that mean that we need to focus on the leader. I would explain it from the leader perspective. Basically, this is the time it take to get a log replicated in enough followers before getting committed.

Names are hard... but it seems confusing to have two timers raft.applyTime and raft.fsm.apply 😄 I get that this is a timer for the total time spent blocked in the raft.Apply call by the caller and sits alongside the raft.apply counter... do you think raft.applyTotalTime is any better?

yes I agree, raft.applyTotalTime is definitely a better name for this. Naming things is hard ...and I'm not good in it 😆

@banks
Copy link
Member

banks commented Aug 15, 2022

Thanks Dhia. Happy to approve with that changed name.

Happy to take a look at the follow up PR for Consul's telemetry page where we explain the new metrics too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants