Version v2.0.0 - draft #63

piotrkowalczuk · 2018-10-21T18:54:05Z

This PR is related to #60, it introduces:

No global state (client metrics).
Shared monitor struct.
Changed metric names to be more consistent and intuitive. breaking
- counters
  - started_total -> requests_total
  - handled_total -> responses_total
  - msg_received_total -> received_messages_total
  - msg_sent_total -> sent_messages_total
- gauges (new)
  - in_flight_requests // partially
  - connections // partially
- histograms
  - handling_seconds->request_duration_histogram_seconds
Request duration metric (histogram) is not optional anymore. breaking
clientReporter API become private
CounterOption and HistogramOption replaced by CollectorOption breaking
- WithConstLabels
- WithNamespace
- WithSubsystem
- WithBuckets

TODO:
[ ] if accepted, apply similar changes to ServerMetrics

codecov-io · 2018-10-24T15:32:02Z

Codecov Report

Merging #63 into draft-v2.0.0 will decrease coverage by 23.25%.
The diff coverage is 68.58%.

@@                Coverage Diff                @@
##           draft-v2.0.0      #63       +/-   ##
=================================================
- Coverage         78.59%   55.34%   -23.26%     
=================================================
  Files                 8        9        +1     
  Lines               271      374      +103     
=================================================
- Hits                213      207        -6     
- Misses               49      157      +108     
- Partials              9       10        +1

Impacted Files	Coverage Δ
client.go	`100% <ø> (ø)`	⬆️
server.go	`71.42% <ø> (-28.58%)`	⬇️
server_metrics.go	`34.78% <100%> (-45.42%)`	⬇️
client_reporter.go	`100% <100%> (ø)`	⬆️
metric_options.go	`13.04% <13.63%> (-3.63%)`	⬇️
monitor.go	`65% <65%> (ø)`
client_metrics.go	`83.05% <90.62%> (+8.85%)`	⬆️
server_reporter.go	`0% <0%> (-100%)`	⬇️
util.go	`30.76% <0%> (-38.47%)`	⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93bf462...d784bec. Read the comment docs.

bwplotka · 2018-10-24T23:07:52Z

Thanks for this, but why are you changing all in same PR? ((: Basically would be nice to discuss each of change separately. Some of them sounds good from first glance, some are not that obvious:

histogram not being optional? Why?
just curious why creating new abstraction for collectorOpts?
metric names changes. This is actually the trickiest change, because this is part of API has huge consequences for users. Some of us are having huge number of dashboards and alerts and rules that depend on metric name. Even with explicit new version and verbose changelog this will be challenge for users and major reason NOT to update. So if we are changing those we need to make sure there is a value that justifies the problems that it will cause. (: Not sure if consistency is a good reason here.
The inflight could be also problematic to use. This could be potentially changing a lot (faster than scrape interval) so we could not see the spikes. That's why sum(rate(grpc_server_started_total[5m])) by (job) - sum(rate(grpc_server_handled_total[5m])) by (job)) might be a better overview of in flight request

piotrkowalczuk · 2018-10-27T14:29:06Z

@bwplotka Yes PR is intentionally way bigger than we agreed on. I would like to explore what are necessary public API changes.

The inflight could be also problematic to use. This could be potentially changing a lot (faster than scrape interval) so we could not see the spikes. That's why sum(rate(grpc_server_started_total[5m])) by (job) - sum(rate(grpc_server_handled_total[5m])) by (job)) might be a better overview of in-flight request

AFAIK it is recommended to use gauges instead of direct arithmetic on counters: prometheus/client_golang#101 (comment). Having a reliable source for example for predict_linear is necessary. Does anything change regarding counters recently, are they reliable for those usecasese?

histogram not being optional? Why?

just curious why creating new abstraction for collectorOpts?

You are right, backward compatibility and configurability should be kept. On another hand, current public API seems to be a little bit inconsistent. Prometheus authors got into the same conclusion when it comes to HTTP monitoring. By following a similar approach like in promhttp new public API could be kept backwards compatible and allow an even higher level of configurability. It could be implemented like:

func Dialer(*prometheus.CounterVec) func(string, time.Duration) (net.Conn, error) {}

type InFlightStatsHandler 	struct {
    stats.Handler
}

type Subsystem int

var (
    Server Subsystem = 1
    Client Subsystem = 2
)

// NewInFlightGaugeVecV1 exists for backward compatibility.
func NewInFlightGaugeVecV1(Subsystem) *prometheus.GaugeVec {}

func NewInFlightGaugeVec(Subsystem) *prometheus.GaugeVec {}

// NewInFlightStatsHandler ...
// The GaugeVec must have zero, one, two, three or four non-const non-curried labels. 
// For those, the only allowed label names are "fail_fast", "handler", "service" and "user_agent". 
func NewInFlightStatsHandler(Subsystem, *prometheus.GaugeVec) *InFlightStatsHandler {}

type StatsHandler struct {
    stats.Handler
    prometheus.Collector
}

// NewStatsHandler allows to pass various number of handlers.
func NewStatsHandler(...stats.Handler) *StatsHandler {}

This draft presents a low-level public API. Plenty of handful function could be implemented on the top of it to make the library easier to use if custom configuration or backward compatibility is not required.

Not every metric can be implemented using stats.Handler interface. A similar approach would have to be followed for interceptors.

Example:

ssh := NewStatsHandler(
    NewConnectionsStatsHandler(Server, NewConnectionsGaugeVec(Server)),
    NewInFlightStatsHandler(Server, NewInFlightGaugeVec(Server)),
)
csh := NewStatsHandler(
    NewConnectionsStatsHandler(Client, NewConnectionsGaugeVec(Client)),
    NewInFlightStatsHandler(Client, NewInFlightGaugeVec(Client)),
)

prometheus.DefaultRegisterer.MustRegister(ssh)
prometheus.DefaultRegisterer.MustRegister(csh)

brancz · 2018-10-29T10:46:39Z

I think cutting a major release is exactly for the ability to break the API for positive changes. I think the changes to make the structs actually comply with prometheus.Collector sounds like a good idea, but histograms should be able to be turned off, simply because of possibilities of cardinality explosions.

I think splitting the in-flight gauge into two counters for started and finished counters would might more sense.

googlebot · 2018-11-15T11:00:34Z

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

googlebot · 2018-11-15T16:17:34Z

CLAs look good, thanks!

pacoguzman · 2019-10-11T06:35:40Z

We're starting using grpc_prometheus and would be great to have this already. Any blockers?

brancz

Sorry for taking so long until having a look at this again.

I'm not against a cleanup per se, but some things like

msg_sent_total -> sent_messages_total

seem like they're mostly style, and I wouldn't break that if not for a better reason. I think request/response and duration metrics should follow the best practices and we shouldn't have global state at all in my opinion. I don't think I agree with the "shared struct" approach, as the amount of switch/case statements show I think we're better off with treating client and server separately.

brancz · 2019-10-14T09:08:37Z

draft.go

+
+var (
+	Server Subsystem = 1
+	Client Subsystem = 2


this seems unnecessary, why not do distinct functions/structs for server/client here?

bwplotka · 2019-10-14T09:14:48Z

+1 for @brancz

What exactly would be nice @pacoguzman? Just different metric names? Or anything in particular?

For metrics name, I think it would be nice to have, but it feels like a lot of effort for users to switch to new metrics for not really high value (IMO). Even with v2, it is really hard for users to switch all the monitoring
to different metric names - at once. So I would be careful.

I can agree with other improvements like code API etc - we might be able to improve things here & there. That we can definitely focus on the straightaway. But knowing what you are looking for @pacoguzman would be nice (:

AFAIK it is recommended to use gauges instead of direct arithmetic on counters: prometheus/client_golang#101 (comment). Having a reliable source for example for predict_linear is necessary. Does anything change regarding counters recently, are they reliable for those usecasese?

Well @piotrkowalczuk , what I meant that you cannot replace really started and handled metrics with this inflight. Now if we add just inflight on top this will mean many ways of achieving the same goal, which is not ideal, but if predict_linear is important to users, we can definitely discuss adding it (:

I would say let's split the work and discuss those separately. Given the limited time we have, I would focus on what's heavily needed by users of this library first. (:

piotrkowalczuk · 2019-10-14T11:00:22Z

Thanks, @brancz @bwplotka for having a look! This PR is slightly outdated. My current state of work/design choices can be found here: https://godoc.org/github.com/piotrkowalczuk/promgrpc/v4

I'll leave a few notes before this PR is closed:

Well @piotrkowalczuk , what I meant that you cannot replace really started and handled metrics with this inflight. Now if we add just inflight on top this will mean many ways of achieving the same goal, which is not ideal, but if predict_linear is important to users, we can definitely discuss adding it (:

inflight is addition, started and handled are replaced by received_total and sent_total.

For metrics name, I think it would be nice to have, but it feels like a lot of effort for users to switch to new metrics for not really high value (IMO). Even with v2, it is really hard for users to switch all the monitoring
to different metric names - at once. So I would be careful.

IMHO the goal should be to introduce consistent naming with the ability to use legacy one. e.g. by having additional constructors.

piotrkowalczuk force-pushed the draft-v2.0.0-change-1 branch from d784bec to ce2a716 Compare November 15, 2018 11:00

piotrkowalczuk force-pushed the draft-v2.0.0-change-1 branch from 381aceb to 16eccd4 Compare November 15, 2018 16:13

draft v2.0.0 (separate stats handlers) - initial commit

28015bb

piotrkowalczuk force-pushed the draft-v2.0.0-change-1 branch from 16eccd4 to 28015bb Compare November 15, 2018 16:17

draft v2.0.0 (separate stats handlers) - go vet, megacheck fixes

c4453c2

piotrkowalczuk changed the title ~~promgrpc merge - change 1~~ Version v2.0.0 - draft Nov 16, 2018

brancz reviewed Oct 14, 2019

View reviewed changes

piotrkowalczuk closed this Oct 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Version v2.0.0 - draft #63

Version v2.0.0 - draft #63

piotrkowalczuk commented Oct 21, 2018

codecov-io commented Oct 24, 2018

bwplotka commented Oct 24, 2018 •

edited

piotrkowalczuk commented Oct 27, 2018 •

edited

brancz commented Oct 29, 2018 •

edited

googlebot commented Nov 15, 2018

googlebot commented Nov 15, 2018

pacoguzman commented Oct 11, 2019

brancz left a comment •

edited

brancz Oct 14, 2019

bwplotka commented Oct 14, 2019

piotrkowalczuk commented Oct 14, 2019

Version v2.0.0 - draft #63

Version v2.0.0 - draft #63

Conversation

piotrkowalczuk commented Oct 21, 2018

codecov-io commented Oct 24, 2018

Codecov Report

bwplotka commented Oct 24, 2018 • edited

piotrkowalczuk commented Oct 27, 2018 • edited

brancz commented Oct 29, 2018 • edited

googlebot commented Nov 15, 2018

googlebot commented Nov 15, 2018

pacoguzman commented Oct 11, 2019

brancz left a comment • edited

Choose a reason for hiding this comment

brancz Oct 14, 2019

Choose a reason for hiding this comment

bwplotka commented Oct 14, 2019

piotrkowalczuk commented Oct 14, 2019

bwplotka commented Oct 24, 2018 •

edited

piotrkowalczuk commented Oct 27, 2018 •

edited

brancz commented Oct 29, 2018 •

edited

brancz left a comment •

edited