Skip to content
This repository has been archived by the owner on Apr 18, 2023. It is now read-only.

Version v2.0.0 - draft #63

Conversation

piotrkowalczuk
Copy link

This PR is related to #60, it introduces:

  • No global state (client metrics).
  • Shared monitor struct.
  • Changed metric names to be more consistent and intuitive. breaking
    • counters
      • started_total -> requests_total
      • handled_total -> responses_total
      • msg_received_total -> received_messages_total
      • msg_sent_total -> sent_messages_total
    • gauges (new)
      • in_flight_requests // partially
      • connections // partially
    • histograms
      • handling_seconds->request_duration_histogram_seconds
  • Request duration metric (histogram) is not optional anymore. breaking
  • clientReporter API become private
  • CounterOption and HistogramOption replaced by CollectorOption breaking
    • WithConstLabels
    • WithNamespace
    • WithSubsystem
    • WithBuckets

TODO:
[ ] if accepted, apply similar changes to ServerMetrics

@codecov-io
Copy link

Codecov Report

Merging #63 into draft-v2.0.0 will decrease coverage by 23.25%.
The diff coverage is 68.58%.

Impacted file tree graph

@@                Coverage Diff                @@
##           draft-v2.0.0      #63       +/-   ##
=================================================
- Coverage         78.59%   55.34%   -23.26%     
=================================================
  Files                 8        9        +1     
  Lines               271      374      +103     
=================================================
- Hits                213      207        -6     
- Misses               49      157      +108     
- Partials              9       10        +1
Impacted Files Coverage Δ
client.go 100% <ø> (ø) ⬆️
server.go 71.42% <ø> (-28.58%) ⬇️
server_metrics.go 34.78% <100%> (-45.42%) ⬇️
client_reporter.go 100% <100%> (ø) ⬆️
metric_options.go 13.04% <13.63%> (-3.63%) ⬇️
monitor.go 65% <65%> (ø)
client_metrics.go 83.05% <90.62%> (+8.85%) ⬆️
server_reporter.go 0% <0%> (-100%) ⬇️
util.go 30.76% <0%> (-38.47%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93bf462...d784bec. Read the comment docs.

@bwplotka
Copy link
Collaborator

bwplotka commented Oct 24, 2018

Thanks for this, but why are you changing all in same PR? ((: Basically would be nice to discuss each of change separately. Some of them sounds good from first glance, some are not that obvious:

  • histogram not being optional? Why?
  • just curious why creating new abstraction for collectorOpts?
  • metric names changes. This is actually the trickiest change, because this is part of API has huge consequences for users. Some of us are having huge number of dashboards and alerts and rules that depend on metric name. Even with explicit new version and verbose changelog this will be challenge for users and major reason NOT to update. So if we are changing those we need to make sure there is a value that justifies the problems that it will cause. (: Not sure if consistency is a good reason here.
    The inflight could be also problematic to use. This could be potentially changing a lot (faster than scrape interval) so we could not see the spikes. That's why sum(rate(grpc_server_started_total[5m])) by (job) - sum(rate(grpc_server_handled_total[5m])) by (job)) might be a better overview of in flight request

@piotrkowalczuk
Copy link
Author

piotrkowalczuk commented Oct 27, 2018

@bwplotka Yes PR is intentionally way bigger than we agreed on. I would like to explore what are necessary public API changes.

The inflight could be also problematic to use. This could be potentially changing a lot (faster than scrape interval) so we could not see the spikes. That's why sum(rate(grpc_server_started_total[5m])) by (job) - sum(rate(grpc_server_handled_total[5m])) by (job)) might be a better overview of in-flight request

AFAIK it is recommended to use gauges instead of direct arithmetic on counters: prometheus/client_golang#101 (comment). Having a reliable source for example for predict_linear is necessary. Does anything change regarding counters recently, are they reliable for those usecasese?

  • histogram not being optional? Why?
  • just curious why creating new abstraction for collectorOpts?

You are right, backward compatibility and configurability should be kept. On another hand, current public API seems to be a little bit inconsistent. Prometheus authors got into the same conclusion when it comes to HTTP monitoring. By following a similar approach like in promhttp new public API could be kept backwards compatible and allow an even higher level of configurability. It could be implemented like:

func Dialer(*prometheus.CounterVec) func(string, time.Duration) (net.Conn, error) {}

type InFlightStatsHandler 	struct {
    stats.Handler
}

type Subsystem int

var (
    Server Subsystem = 1
    Client Subsystem = 2
)

// NewInFlightGaugeVecV1 exists for backward compatibility.
func NewInFlightGaugeVecV1(Subsystem) *prometheus.GaugeVec {}

func NewInFlightGaugeVec(Subsystem) *prometheus.GaugeVec {}

// NewInFlightStatsHandler ...
// The GaugeVec must have zero, one, two, three or four non-const non-curried labels. 
// For those, the only allowed label names are "fail_fast", "handler", "service" and "user_agent". 
func NewInFlightStatsHandler(Subsystem, *prometheus.GaugeVec) *InFlightStatsHandler {}

type StatsHandler struct {
    stats.Handler
    prometheus.Collector
}

// NewStatsHandler allows to pass various number of handlers.
func NewStatsHandler(...stats.Handler) *StatsHandler {}

This draft presents a low-level public API. Plenty of handful function could be implemented on the top of it to make the library easier to use if custom configuration or backward compatibility is not required.

Not every metric can be implemented using stats.Handler interface. A similar approach would have to be followed for interceptors.

Example:

ssh := NewStatsHandler(
    NewConnectionsStatsHandler(Server, NewConnectionsGaugeVec(Server)),
    NewInFlightStatsHandler(Server, NewInFlightGaugeVec(Server)),
)
csh := NewStatsHandler(
    NewConnectionsStatsHandler(Client, NewConnectionsGaugeVec(Client)),
    NewInFlightStatsHandler(Client, NewInFlightGaugeVec(Client)),
)

prometheus.DefaultRegisterer.MustRegister(ssh)
prometheus.DefaultRegisterer.MustRegister(csh)

@brancz
Copy link
Collaborator

brancz commented Oct 29, 2018

I think cutting a major release is exactly for the ability to break the API for positive changes. I think the changes to make the structs actually comply with prometheus.Collector sounds like a good idea, but histograms should be able to be turned off, simply because of possibilities of cardinality explosions.

I think splitting the in-flight gauge into two counters for started and finished counters would might more sense.

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@googlebot
Copy link

CLAs look good, thanks!

@piotrkowalczuk piotrkowalczuk changed the title promgrpc merge - change 1 Version v2.0.0 - draft Nov 16, 2018
@pacoguzman
Copy link

We're starting using grpc_prometheus and would be great to have this already. Any blockers?

Copy link
Collaborator

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long until having a look at this again.

I'm not against a cleanup per se, but some things like

msg_sent_total -> sent_messages_total

seem like they're mostly style, and I wouldn't break that if not for a better reason. I think request/response and duration metrics should follow the best practices and we shouldn't have global state at all in my opinion. I don't think I agree with the "shared struct" approach, as the amount of switch/case statements show I think we're better off with treating client and server separately.


var (
Server Subsystem = 1
Client Subsystem = 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems unnecessary, why not do distinct functions/structs for server/client here?

@bwplotka
Copy link
Collaborator

+1 for @brancz

What exactly would be nice @pacoguzman? Just different metric names? Or anything in particular?

For metrics name, I think it would be nice to have, but it feels like a lot of effort for users to switch to new metrics for not really high value (IMO). Even with v2, it is really hard for users to switch all the monitoring
to different metric names - at once. So I would be careful.

I can agree with other improvements like code API etc - we might be able to improve things here & there. That we can definitely focus on the straightaway. But knowing what you are looking for @pacoguzman would be nice (:

AFAIK it is recommended to use gauges instead of direct arithmetic on counters: prometheus/client_golang#101 (comment). Having a reliable source for example for predict_linear is necessary. Does anything change regarding counters recently, are they reliable for those usecasese?

Well @piotrkowalczuk , what I meant that you cannot replace really started and handled metrics with this inflight. Now if we add just inflight on top this will mean many ways of achieving the same goal, which is not ideal, but if predict_linear is important to users, we can definitely discuss adding it (:

I would say let's split the work and discuss those separately. Given the limited time we have, I would focus on what's heavily needed by users of this library first. (:

@piotrkowalczuk
Copy link
Author

Thanks, @brancz @bwplotka for having a look! This PR is slightly outdated. My current state of work/design choices can be found here: https://godoc.org/github.com/piotrkowalczuk/promgrpc/v4

I'll leave a few notes before this PR is closed:

Well @piotrkowalczuk , what I meant that you cannot replace really started and handled metrics with this inflight. Now if we add just inflight on top this will mean many ways of achieving the same goal, which is not ideal, but if predict_linear is important to users, we can definitely discuss adding it (:

inflight is addition, started and handled are replaced by received_total and sent_total.

For metrics name, I think it would be nice to have, but it feels like a lot of effort for users to switch to new metrics for not really high value (IMO). Even with v2, it is really hard for users to switch all the monitoring
to different metric names - at once. So I would be careful.

IMHO the goal should be to introduce consistent naming with the ability to use legacy one. e.g. by having additional constructors.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants