Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus exporter sometimes produces malformed data #5515

Closed
hez2010 opened this issue Apr 8, 2024 · 4 comments · Fixed by #5623
Closed

Prometheus exporter sometimes produces malformed data #5515

hez2010 opened this issue Apr 8, 2024 · 4 comments · Fixed by #5623
Labels
bug Something isn't working pkg:OpenTelemetry.Exporter.Prometheus.AspNetCore Issues related to OpenTelemetry.Exporter.Prometheus.AspNetCore NuGet package

Comments

@hez2010
Copy link

hez2010 commented Apr 8, 2024

Bug Report

List of all OpenTelemetry NuGet packages and version that you are using (e.g. OpenTelemetry 1.0.2):

    <PackageVersion Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.8.0" />
    <PackageVersion Include="OpenTelemetry.Exporter.Prometheus.AspNetCore" Version="1.8.0-rc.1" />
    <PackageVersion Include="OpenTelemetry.Extensions.Hosting" Version="1.8.0" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.8.0" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.EntityFrameworkCore" Version="1.0.0-beta.11" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.Http" Version="1.8.0" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.Process" Version="0.5.0-beta.5" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.Runtime" Version="1.8.0" />
    <PackageVersion Include="OpenTelemetry.Instrumentation.StackExchangeRedis" Version="1.0.0-rc9.14" />

Runtime version: net8.0

Symptom

Sometimes the exporter produces a malformed line:

dns_lookup_duration_seconds_bucket{otel_scope_name="System.Net.NameRe# TYPE otel_scope_info info

and causes Prometheus to fail:

expected label value, got "\"System.Net.NameRe# TYPE otel_scope_info info\n" ("INVALID") while parsing: "dns_lookup_duration_seconds_bucket{otel_scope_name=\"System.Net.NameRe# TYPE otel_scope_info info\n"

What is the expected behavior?

There should not be malformed data in the metrics.

# TYPE target info
# HELP target Target metadata
target_info{service_name="foo",service_instance_id="7272cdc6-1b69-4ee3-9bed-4005847686d5",telemetry_sdk_name="opentelemetry",telemetry_sdk_language="dotnet",telemetry_sdk_version="1.8.0"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="System.Net.NameResolution"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="OpenTelemetry.Instrumentation.Runtime"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="OpenTelemetry.Instrumentation.Process"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="Microsoft.AspNetCore.Server.Kestrel"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="Microsoft.AspNetCore.Hosting"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="System.Net.Http"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="Microsoft.AspNetCore.RateLimiting"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="Microsoft.AspNetCore.Routing"} 1
# TYPE dns_lookup_duration_seconds histogram
# UNIT dns_lookup_duration_seconds seconds
# HELP dns_lookup_duration_seconds Measures the time taken to perform a DNS lookup.
dns_lookup_duration_seconds_bucket{otel_scope_name="System.Net.NameResolution",dns_question_name="localhost",le="0.005"} 7 1712551059.936
...

What is the actual behavior?

# TYPE dns_lookup_duration_seconds histogram
# UNIT dns_lookup_duration_seconds seconds
# HELP dns_lookup_duration_seconds Measures the time taken to perform a DNS lookup.
dns_lookup_duration_seconds_bucket{otel_scope_name="System.Net.NameRe# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="System.Net.NameResolution"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="OpenTelemetry.Instrumentation.Runtime"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="OpenTelemetry.Instrumentation.Process"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="Microsoft.AspNetCore.Server.Kestrel"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="Microsoft.AspNetCore.Hosting"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="System.Net.Http"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="Microsoft.AspNetCore.RateLimiting"} 1
# TYPE otel_scope_info info
# HELP otel_scope_info Scope metadata
otel_scope_info{otel_scope_name="Microsoft.AspNetCore.Routing"} 1
# TYPE dns_lookup_duration_seconds histogram
# UNIT dns_lookup_duration_seconds seconds
# HELP dns_lookup_duration_seconds Measures the time taken to perform a DNS lookup.
dns_lookup_duration_seconds_bucket{otel_scope_name="System.Net.NameResolution",dns_question_name="localhost",le="0.005"} 10 1712549381.922
...

Reproduce

Simply adding some instrumentations.

            var otel = services.AddOpenTelemetry();

            otel.ConfigureResource(resource => resource.AddService("foo"));

            otel.WithMetrics(metrics =>
            {
                metrics.AddAspNetCoreInstrumentation();
                metrics.AddHttpClientInstrumentation();
                metrics.AddRuntimeInstrumentation();
                metrics.AddProcessInstrumentation();

                metrics.AddPrometheusExporter(options =>
                {
                    options.DisableTotalNameSuffixForCounters = true;
                });
            });

            otel.WithTracing(tracing =>
            {
                tracing.AddAspNetCoreInstrumentation();
                tracing.AddHttpClientInstrumentation();
                tracing.AddEntityFrameworkCoreInstrumentation();
                tracing.AddRedisInstrumentation();
            });

            otel.UseOtlpExporter(...);

            // ...
            // var app = builder.Build();

            app.UseOpenTelemetryPrometheusScrapingEndpoint();

Additional Context

It looks like the buffer is not cleared correctly and results in some previous data remaining in the buffer:

# TYPE dns_lookup_duration_seconds histogram
# UNIT dns_lookup_duration_seconds seconds
# HELP dns_lookup_duration_seconds Measures the time taken to perform a DNS lookup.
dns_lookup_duration_seconds_bucket{otel_scope_name="System.Net.NameRe
@hez2010 hez2010 added the bug Something isn't working label Apr 8, 2024
@hez2010 hez2010 changed the title Prometheus exported sometimes produces malformed data Prometheus exporter sometimes produces malformed data Apr 8, 2024
@hez2010
Copy link
Author

hez2010 commented Apr 8, 2024

It seems the target info is being overwritten unexpectedly.

@Ruud-cb
Copy link

Ruud-cb commented May 6, 2024

@hez2010 I've seen your MR's, looking forward to the fix as well. Out of curiosity: Besides the OP's issue the format also seems to be changed to add things like process_runtime_dotnet_gc_collections_count_total**{otel_scope_name="OpenTelemetry.Instrumentation.Runtime",otel_scope_version="1.8.0",generation="gen2"}** 3 1714993562978

Is that text between brackets expected by Prometheus ?

Perhaps you can decide if this has something to do with it as well:
ts=2024-05-06T09:57:43.000Z caller=scrape.go:1415 level=debug component="scrape manager" scrape_pool=hrgt-t-api-metrics target=http://myserveraddress.svc.cluster.local:9090/metrics msg="Append failed" err="unit \"seconds\" not a suffix of metric \"process_cpu_time_seconds_total\""

@samsp-msft
Copy link

I think this issue is maybe what is causing open-telemetry/opentelemetry-dotnet-contrib#1617?

@reyang
Copy link
Member

reyang commented May 13, 2024

#5517 is trying to fix the bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pkg:OpenTelemetry.Exporter.Prometheus.AspNetCore Issues related to OpenTelemetry.Exporter.Prometheus.AspNetCore NuGet package
Projects
None yet
5 participants