
OpenTelemetry traces misused by transport/http #1573

Open
adg opened this issue Jun 7, 2022 · 5 comments
Labels
needs more info (This issue needs more information from the customer to proceed.)
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

adg (Contributor) commented Jun 7, 2022

The transport/http package by default wraps the HTTP transport in an OpenCensus ochttp.Transport. That transport wrapper will record HTTP traces for a subset of requests if OpenCensus tracing is enabled.

However, it is misconfigured. The Transport type has this field, which google-api-go-client leaves unset:

	// FormatSpanName holds the function to use for generating the span name
	// from the information found in the outgoing HTTP Request. By default the
	// name equals the URL Path.
	FormatSpanName func(*http.Request) string

The default is for the transport to name each span after the request URL path. In practice, this means a proliferation of differently named spans. For example, if you use the Google Cloud Storage library, you will end up with a span for every file you access. This is not how spans are supposed to be named - they should be named after a function or API endpoint, perhaps including some small, finite set of user-specified options.

This becomes pathological if you actually record the traces somewhere. For instance, if you have the OpenCensus zpages debug endpoints enabled (common in production systems), a sample of those traces will be stored in a local store. While old traces for a given span are purged, the spans themselves are never purged, and so a running process will accumulate traces indefinitely for every HTTP request path made by google-api-go-client. Loading the OpenCensus tracez page in such cases is pretty funny: a production service of mine had accumulated several gigabytes of traces, and its tracez endpoint served so much HTML that it crashed my browser.

I think the fix here is to specify a FormatSpanName function when setting up the ochttp.Transport. I'm not sure how exactly the spans should be named - probably in some service-specific way - but an immediate remedy would be to give all requests the same span name (google-api-go-client?).
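
As an illustration, here is a minimal sketch of that remedy, assuming a constant span name as suggested above; the wrapWithTracing function is hypothetical, not the actual transport/http code:

package example

import (
	"net/http"

	"go.opencensus.io/plugin/ochttp"
)

// wrapWithTracing wraps a base transport in ochttp.Transport, but gives every
// request the same span name instead of falling back to the per-request URL path.
func wrapWithTracing(base http.RoundTripper) http.RoundTripper {
	return &ochttp.Transport{
		Base: base,
		FormatSpanName: func(*http.Request) string {
			// One fixed name; the request path is still recorded on the span
			// as the http.path attribute, so it is not lost entirely.
			return "google-api-go-client"
		},
	}
}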

adg added the priority: p2 and type: bug labels on Jun 7, 2022
adg (Contributor, Author) commented Jun 7, 2022

A related issue: googleapis/google-cloud-go#1573

adg (Contributor, Author) commented Jun 7, 2022

Also related: census-instrumentation/opencensus-go#1180

codyoss added the status: investigating label on Jun 7, 2022
codyoss (Member) commented Jun 9, 2022

This is a bit tricky... I think we could easily do as you suggest, but then there could be no aggregation on method-level calls. In general, most of our libraries do not nicely produce their own spans, so we rely on the naming to capture the context of the call. Making this change would force nearly all API calls to have the same span name. Do you think that is much better than many span names? (Maybe it is.)

For Storage, we do create custom spans on the reader side. It seems to me that we should do the same on the writer side; that would at least alleviate the example you provided.

If we did aggregate these, I suppose people could always examine the http.path attribute for reporting, but I am not sure that is much better than what is there today in this case. Thoughts?
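
For reference, a caller-side version of the custom-span idea might look roughly like this; the span name, function, and variables are assumptions for illustration, not existing google-cloud-go code:

package example

import (
	"context"

	"cloud.google.com/go/storage"
	"go.opencensus.io/trace"
)

// writeObject wraps a Storage write in an explicitly named span, so the trace
// is grouped under one stable name regardless of the object path.
func writeObject(ctx context.Context, bkt *storage.BucketHandle, name string, data []byte) error {
	ctx, span := trace.StartSpan(ctx, "example.storage.Object.Write")
	defer span.End()

	w := bkt.Object(name).NewWriter(ctx)
	if _, err := w.Write(data); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}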

codyoss removed the status: investigating label on Jun 9, 2022
apesternikov commented

A workaround that works for me:

import (
	"context"

	"cloud.google.com/go/storage"
	"contrib.go.opencensus.io/exporter/stackdriver/propagation" // or another ochttp-compatible trace propagation format
	"go.opencensus.io/plugin/ochttp"
	"google.golang.org/api/option"
	htransport "google.golang.org/api/transport/http"
)

func newStorageClient(ctx context.Context) (*storage.Client, error) {
	// Disable the default telemetry wrapper, then re-wrap the transport with
	// an explicit FormatSpanName so spans are not named per request path.
	httpClient, _, err := htransport.NewClient(ctx, option.WithTelemetryDisabled())
	if err != nil {
		return nil, err
	}
	httpClient.Transport = &ochttp.Transport{
		Base:           httpClient.Transport,
		Propagation:    &propagation.HTTPFormat{},
		FormatSpanName: tracingutils.FormatHttpSpanName, // helper shown below
	}
	return storage.NewClient(ctx, option.WithHTTPClient(httpClient))
}

In the tracingutils package:

package tracingutils

import (
	"net/http"
	"strings"
)

// FormatHttpSpanName names spans as "host/path". For Cloud Storage it keeps
// only the first path segment (the bucket), dropping per-object names.
func FormatHttpSpanName(req *http.Request) string {
	path := req.URL.Path
	if len(path) < 1 {
		return path
	}
	if path[0] == '/' {
		path = path[1:]
	}
	switch req.Host {
	case "storage.googleapis.com":
		// Keep only the bucket segment of the object path.
		path, _, _ = strings.Cut(path, "/")
	}
	return req.Host + "/" + path
}
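
As a quick sanity check, a hypothetical test for this formatter (names and expectations assumed) could look like:

package tracingutils

import (
	"net/http/httptest"
	"testing"
)

// TestFormatHttpSpanName checks that a Cloud Storage object path collapses to
// host + bucket, so every object in a bucket shares one span name.
func TestFormatHttpSpanName(t *testing.T) {
	req := httptest.NewRequest("GET", "https://storage.googleapis.com/my-bucket/path/to/object.txt", nil)
	if got, want := FormatHttpSpanName(req), "storage.googleapis.com/my-bucket"; got != want {
		t.Errorf("FormatHttpSpanName = %q, want %q", got, want)
	}
}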

codyoss added the needs more info label on Jun 27, 2022
adg (Contributor, Author) commented Jun 27, 2022

@codyoss

> Making this change would force nearly all API calls to have the same span name. Do you think that is much better than many span names? (Maybe it is.)

I think it would be better for the Cloud APIs that have user-generated content in the paths. In our case we saw a memory leak of several GB in one of our production services because of this default. So perhaps this suggestion of yours is the way to go:

> For Storage, we do create custom spans on the reader side. It seems to me that we should do the same on the writer side; that would at least alleviate the example you provided.

And it is perhaps also worth inspecting the other Cloud APIs beyond Storage to see whether they have user-generated content in their paths. Hopefully it's just Storage writes that are a problem.
