Skip to content
This repository has been archived by the owner on Apr 18, 2023. It is now read-only.

Add Summary metrics #57

Open
markeijsermans opened this issue Jul 26, 2018 · 5 comments
Open

Add Summary metrics #57

markeijsermans opened this issue Jul 26, 2018 · 5 comments

Comments

@markeijsermans
Copy link

markeijsermans commented Jul 26, 2018

I've POC'd adding summary metrics in a similar way as histograms. It's enabled via a similar method EnableHandlingTimeSummary(). Is there interest in me submitting a PR?

Motivation:
Our monitoring situation is in transition as we move to prometheus/grafana. We currently export metrics to a hosted solution via a prometheus remote adapter. In that hosted solution calculating a p99 from histogram buckets is not possible (normally you would use histogram_quantile())

@brancz
Copy link
Collaborator

brancz commented Jul 26, 2018

Thanks for opening this issue to start the dicussion! 🙂

My feeling is that we would probably do more harm if we add it, as most people don't understand that summaries are meant for cases where you can't control latencies at all, and would just enable it without understanding what they're doing.

I understand your case and I think what you're doing makes sense in your situation, and you understand that the histogram is actually the appropriate use, but I think I'd prefer people do it as explicitly as you and really understand the implications that has.

I'd still like to hear @Bplotka's thoughts as well.

@bwplotka
Copy link
Collaborator

bwplotka commented Jul 27, 2018

I have mixed feelings, I agree with both @brancz arguments and @markeijsermans motivation. What about doing.. both?

Instead, of adding explicit Method for EnableHandlingTimeSummary because users may shoot themselves in the foot if they are not careful, maybe we should just accept custom serverReporters and clientReporters so advanced users like @markeijsermans can write down summary metric if needed? That would allow for more flexibility maybe.

On the other hand, Prometheus has summaries for some reason, so maybe we should actually trust people to understand the differences between this and histograms? Or do you @brancz think that users will use it extensively and blame Prometheus for not being efficient?

What do you think about adding just extension like custom reporters? That would unblock you @markeijsermans?

@brancz
Copy link
Collaborator

brancz commented Jul 27, 2018

I think extensibility would be a good compromise.

@markeijsermans
Copy link
Author

markeijsermans commented Jul 27, 2018

My understanding of the hesitance to directly support summary metrics is that it is less flexible and performant. The quantiles are fixed and need to be computed for each observation. I agree with your reasoning. An extension point works for us.

@Bplotka if I understand you correctly, it would look something like this

  • create ServerReporter interface
type ServerReporter interface {
	ReceivedMessage()
	SentMessage()
	Handled(code codes.Code)
}
  • add an extension function WithCustomServerReporter(s ServerReporter)
  • update interceptors to handle the customServerReporter in the same fashion as the default
  • same for the client side

@bwplotka
Copy link
Collaborator

bwplotka commented Jul 27, 2018 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants