
metrics: use contexts #9949

Open · wants to merge 2 commits into master

Conversation

jrockway (Member)

This avoids a dpanic in the grpc client logger while collecting internal metrics. It also puts a limit on how long metrics will be collected for.

I don't know if we even get these metrics anymore, but at least pachd doesn't die when PACHYDERM_DEVELOPMENT_LOGGER=1.
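
For orientation, a minimal, self-contained sketch of the pattern this change applies: the reporting loop derives a per-pass context with a deadline from its parent context, so collection is bounded and callees get a real context instead of context.TODO(). Only reportingInterval and the shape of the ticker loop come from the diff below; the collect stub, the durations, and main are illustrative.

package main

// Illustrative sketch, not pachd code: a ticker-driven reporting loop in
// which each pass gets its own child context with a deadline derived from
// the loop's parent context.

import (
	"context"
	"fmt"
	"time"
)

const reportingInterval = 2 * time.Second // placeholder value

// collect stands in for internalMetrics/externalMetrics; it honors ctx.
func collect(ctx context.Context) error {
	select {
	case <-time.After(3 * time.Second): // pretend the work is slow
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func reportLoop(rctx context.Context) {
	ticker := time.NewTicker(reportingInterval)
	defer ticker.Stop()
	for {
		select {
		case <-rctx.Done():
			return
		case <-ticker.C:
		}
		// Bound each pass so a slow collector cannot stall the loop forever.
		ctx, cancel := context.WithTimeout(rctx, reportingInterval/2)
		if err := collect(ctx); err != nil {
			fmt.Println("collection cut short:", err)
		}
		cancel()
	}
}

func main() {
	rctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	reportLoop(rctx)
}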

@@ -146,27 +146,31 @@ func FinishReportAndFlushUserAction(action string, err error, start time.Time) f
 	return wait
 }
 
-func (r *Reporter) reportClusterMetrics(ctx context.Context) {
+func (r *Reporter) reportClusterMetrics(rctx context.Context) {
Contributor:

There’s no need for *r*ctx here.

			return
		case <-ticker.C:
		}
		metrics := &Metrics{}
		r.internalMetrics(metrics)
		externalMetrics(r.env.GetKubeClient(), metrics) //nolint:errcheck
		ctx, c := context.WithTimeout(rctx, reportingInterval/2)
Contributor:
If all this is put into the ticker case of the select (which I think makes more sense anyway), then the scoping here is more obvious.
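
A rough sketch of the restructuring this comment seems to be suggesting, assuming the loop looks like the excerpt above (names come from the diff; this is not the PR's actual code):

for {
	select {
	case <-rctx.Done():
		return
	case <-ticker.C:
		// The per-pass context and everything that uses it live inside the
		// ticker case, so ctx's scope is visibly one iteration.
		ctx, cancel := context.WithTimeout(rctx, reportingInterval/2)
		metrics := &Metrics{}
		r.internalMetrics(ctx, metrics)
		externalMetrics(r.env.GetKubeClient(), metrics) //nolint:errcheck
		cancel()
	}
}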

		r.internalMetrics(metrics)
		externalMetrics(r.env.GetKubeClient(), metrics) //nolint:errcheck
		ctx, c := context.WithTimeout(rctx, reportingInterval/2)
		r.internalMetrics(ctx, metrics)
Contributor:
Why dedicate half the time to collecting and half to reporting, instead of trying to do both in total? time.Ticker “will adjust the time interval or drop ticks to make up for slow receivers.”

var (
	metrics     = new(Metrics)
	ctx, cancel = context.WithTimeout(ctx, reportingInterval)
)
defer cancel()
r.internalMetrics(ctx, metrics)
externalMetrics(ctx, r.env.GetKubeClient(), metrics) //nolint:errcheck
metrics.ClusterId = r.clusterID
metrics.PodId = uuid.NewWithoutDashes()
metrics.Version = version.PrettyPrintVersion(version.Version)
r.router.reportClusterMetricsToSegment(metrics)


@@ -248,12 +252,9 @@ func inputMetrics(input *pps.Input, metrics *Metrics) {
 	}
 }
 
-func (r *Reporter) internalMetrics(metrics *Metrics) {
+func (r *Reporter) internalMetrics(ctx context.Context, metrics *Metrics) {
Contributor:

There is a call to pctx.TODO in here as well (and a bonus typo!).
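
A hedged sketch of what addressing that could look like once the signature carries a context; countObjects here is a made-up placeholder for whatever internalMetrics actually queries:

func (r *Reporter) internalMetrics(ctx context.Context, metrics *Metrics) {
	// Use the caller's ctx rather than a fresh pctx.TODO(), so the queries
	// below inherit the per-pass deadline set in reportClusterMetrics.
	r.countObjects(ctx, metrics) // placeholder for the real collection calls
}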

Contributor:

Also, the TTL and context lifetime differential below is a bit surprising.
