Produce batch time metrics #704

Open
owenhaynes opened this issue Apr 11, 2024 · 3 comments

Comments

@owenhaynes
Contributor

It would be useful to see how long it took to produce a batch, so that the duration can be passed to monitoring systems to work out whether the produce timeout is too low or Kafka is being overloaded.

I have run into issues where batches get stuck in a retry loop because the batch keeps timing out, and there is no way to diagnose this.

@twmb twmb added the TODO label May 23, 2024
@twmb
Owner

twmb commented May 26, 2024

What span of time are you attempting to capture, and with what information?
I was thinking to add this to ProduceBatchMetrics, but the ProduceBatchWritten hook is only called on batches that are successfully produced.

Do you want to capture from the moment a record enters Produce through ... when? If you want to capture failures, do you mean for the batch duration to be reported on every failure and retry, or only on the final failure, at which point the promise is called?

@owenhaynes
Contributor Author

Yeah I had a look at produce batch metrics and thought it was the wrong place.

I am more interested in the Kafka request time than in when a message gets put on the produce queue, since the request is where the ProduceRequestTimeout value is used and what causes a retry to happen. So it would be good for this time to be recorded for each retry.

I am not interested in capturing failures at the moment, but maybe it's worth tracking these somehow in general, as RecordRetries can be left unbounded and the promise may never be called, so produce errors can't be tracked through it. You could end up with producing being stuck and no way to investigate.

I like the hook system, as it's useful for switching different tools in and out, but it may make error-case handling harder to add.

@twmb
Owner

twmb commented May 28, 2024

So, do you want essentially the time that a record was in the client? If so, couldn't you set r.Context before producing with a key indicating "producing now", and then check time.Since that key once the promise is called?
