
metrics: use contexts #9949

Open · wants to merge 2 commits into master

Conversation

jrockway (Member)

This avoids a dpanic in the grpc client logger while collecting internal metrics. It also puts a limit on how long metrics will be collected for.

I don't know if we even get these metrics anymore, but at least pachd doesn't die when PACHYDERM_DEVELOPMENT_LOGGER=1.
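
For orientation, a minimal, self-contained sketch of the pattern this change applies: the reporting loop derives a per-pass context with a deadline from its parent context, so collection is bounded and callees get a real context instead of context.TODO(). Only reportingInterval and the shape of the ticker loop come from the diff below; the collect stub, the durations, and main are illustrative.

package main

// Illustrative sketch, not pachd code: a ticker-driven reporting loop in
// which each pass gets its own child context with a deadline derived from
// the loop's parent context.

import (
	"context"
	"fmt"
	"time"
)

const reportingInterval = 2 * time.Second // placeholder value

// collect stands in for internalMetrics/externalMetrics; it honors ctx.
func collect(ctx context.Context) error {
	select {
	case <-time.After(3 * time.Second): // pretend the work is slow
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func reportLoop(rctx context.Context) {
	ticker := time.NewTicker(reportingInterval)
	defer ticker.Stop()
	for {
		select {
		case <-rctx.Done():
			return
		case <-ticker.C:
		}
		// Bound each pass so a slow collector cannot stall the loop forever.
		ctx, cancel := context.WithTimeout(rctx, reportingInterval/2)
		if err := collect(ctx); err != nil {
			fmt.Println("collection cut short:", err)
		}
		cancel()
	}
}

func main() {
	rctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	reportLoop(rctx)
}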

@@ -146,27 +146,31 @@ func FinishReportAndFlushUserAction(action string, err error, start time.Time) f
 	return wait
 }
 
-func (r *Reporter) reportClusterMetrics(ctx context.Context) {
+func (r *Reporter) reportClusterMetrics(rctx context.Context) {
Contributor:

There’s no need for *r*ctx here.

			return
		case <-ticker.C:
		}
		metrics := &Metrics{}
		r.internalMetrics(metrics)
		externalMetrics(r.env.GetKubeClient(), metrics) //nolint:errcheck
		ctx, c := context.WithTimeout(rctx, reportingInterval/2)
Contributor:
If all this is put into the ticker case of the select (which I think makes more sense anyway), then the scoping here is more obvious.
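
A rough sketch of the restructuring this comment seems to be suggesting, assuming the loop looks like the excerpt above (names come from the diff; this is not the PR's actual code):

for {
	select {
	case <-rctx.Done():
		return
	case <-ticker.C:
		// The per-pass context and everything that uses it live inside the
		// ticker case, so ctx's scope is visibly one iteration.
		ctx, cancel := context.WithTimeout(rctx, reportingInterval/2)
		metrics := &Metrics{}
		r.internalMetrics(ctx, metrics)
		externalMetrics(r.env.GetKubeClient(), metrics) //nolint:errcheck
		cancel()
	}
}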

		r.internalMetrics(metrics)
		externalMetrics(r.env.GetKubeClient(), metrics) //nolint:errcheck
		ctx, c := context.WithTimeout(rctx, reportingInterval/2)
		r.internalMetrics(ctx, metrics)
Contributor:
Why dedicate half the time to collecting and half to reporting, instead of trying to do both in total? time.Ticker “will adjust the time interval or drop ticks to make up for slow receivers.”

var (
	metrics     = new(Metrics)
	ctx, cancel = context.WithTimeout(ctx, reportingInterval)
)
defer cancel()
r.internalMetrics(ctx, metrics)
externalMetrics(ctx, r.env.GetKubeClient(), metrics) //nolint:errcheck
metrics.ClusterId = r.clusterID
metrics.PodId = uuid.NewWithoutDashes()
metrics.Version = version.PrettyPrintVersion(version.Version)
r.router.reportClusterMetricsToSegment(metrics)


@@ -248,12 +252,9 @@ func inputMetrics(input *pps.Input, metrics *Metrics) {
 	}
 }
 
-func (r *Reporter) internalMetrics(metrics *Metrics) {
+func (r *Reporter) internalMetrics(ctx context.Context, metrics *Metrics) {
Contributor:

There is a call to pctx.TODO in here as well (and a bonus typo!).
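
A hedged sketch of what addressing that could look like once the signature carries a context; countObjects here is a made-up placeholder for whatever internalMetrics actually queries:

func (r *Reporter) internalMetrics(ctx context.Context, metrics *Metrics) {
	// Use the caller's ctx rather than a fresh pctx.TODO(), so the queries
	// below inherit the per-pass deadline set in reportClusterMetrics.
	r.countObjects(ctx, metrics) // placeholder for the real collection calls
}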

Contributor:

Also, the TTL and context lifetime differential below is a bit surprising.
