Metrics reporting failing after update to v8.x #460

Open
daverant opened this issue Dec 6, 2023 · 7 comments

@daverant

daverant commented Dec 6, 2023

We have a combination of metrics, some using Prometheus.Metrics collectors, and some using System.Diagnostics.Metrics instruments.

We're running .NET 7 and want to update to .NET 8, but we need to take this bugfix because we see the following error crop up a lot when trying to run on .NET 8:

System.FormatException: Input string was not in a correct format. Failure to parse near offset 2. Expected an ASCII digit.
at System.Text.StringBuilder.AppendFormatHelper(IFormatProvider provider, String format, ReadOnlySpan`1 args)
at System.Text.StringBuilder.AppendFormat(String format, Object[] args)
at Prometheus.MeterAdapter.TranslateInstrumentDescriptionToPrometheusHelp(Instrument instrument)
at Prometheus.MeterAdapter.OnInstrumentPublished(Instrument instrument, MeterListener listener)
at System.Diagnostics.Metrics.Instrument.Publish()
at System.Diagnostics.Metrics.Meter.GetOrCreateInstrument[T](Type instrumentType, String name, String unit, String description, IEnumerable`1 tags, Func`1 instrumentCreator)
at System.Net.Http.Metrics.MetricsHandler..ctor(HttpMessageHandler innerHandler, IMeterFactory meterFactory, Meter& meter)
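
For reference, this is the exception AppendFormat throws when the string it is given contains a literal brace that does not start a valid format item, so presumably the instrument description being translated contains braces; that part is an assumption on my side. A minimal standalone repro of just that exception (the description string is made up):

using System;
using System.Text;

// Not the prometheus-net code path itself, only the underlying behaviour:
// AppendFormat treats its first argument as a composite format string, so a
// literal '{' not followed by an index digit fails to parse and throws
// System.FormatException ("Expected an ASCII digit").
var sb = new StringBuilder();
sb.AppendFormat("Request duration in {seconds}", Array.Empty<object>());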

We initialise everything using KestrelMetricServer:

DotNetRuntimeStatsBuilder.Default().StartCollecting();
using var server = new KestrelMetricServer(hostname, port);
server.Start();
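
For context, the mix of instrument types mentioned above looks roughly like this (the metric and meter names here are illustrative, not our real ones):

using System.Diagnostics.Metrics;
using Prometheus;

// prometheus-net native collector (Prometheus.Metrics).
var jobsProcessed = Metrics.CreateCounter(
    "jobs_processed_total", "Number of jobs processed.");
jobsProcessed.Inc();

// System.Diagnostics.Metrics instrument, picked up by prometheus-net's
// MeterAdapter (which I understand is enabled by default in v8.x).
var meter = new Meter("MyCompany.MyApp");
var requestCounter = meter.CreateCounter<long>(
    "app_requests", unit: null, description: "Requests handled by the app.");
requestCounter.Add(1);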

After upgrading from v7.0.0 to v8.x of prometheus-net, we started to see strange behaviour with various collectors and instruments:

MassTransit metrics, using Prometheus.Metrics collectors: running v7 until 17:05, then v8.1.1 after 17:05, at which point they fail to increment and are then dropped:
[screenshot: MassTransit metric graph before/after the upgrade]

Our own System.Diagnostics.Metrics instruments: running v7 until 17:05, then v8.1.1 after 17:05, at which point they fail to increment and are then dropped:
[screenshot: custom instrument graph before/after the upgrade]

This is unfortunately blocking us from moving to .NET 8, because we need to take the above bugfix. Any help or suggestions in rooting out a potential cause would be appreciated.

@sandersaares
Member

Can you provide a minimal sample app to reproduce the problem? Are there any exceptions visible? If you attach a debugger, do you see any exceptions listed (e.g. in the Visual Studio "Output" panel)?

@daverant
Author

daverant commented Dec 6, 2023

Can you provide a minimal sample app to reproduce the problem? Are there any exceptions visible? If you attach a debugger, do you see any exceptions listed (e.g. in the Visual Studio "Output" panel)?

Thanks @sandersaares, yes I'm currently figuring out a minimal repro for this, will share when I have it 👍

@daverant
Author

daverant commented Dec 13, 2023

Thought I'd just add some info from trying to repro this. After upgrading to 8.x it looks like we're seeing a decrease in the overall number of metrics being scraped from our Prometheus endpoints, and metrics appear to be dropped from those endpoints over time. In the graph below you can see when we deploy with 8.x in version 2.0.26693.1 and when we roll back to 7.0.0 in 2.0.26698.1. It doesn't appear selective: both prometheus-net collectors and System.Diagnostics instruments are affected.

[screenshot: scraped metric count across the two deployments]

I stood up a basic app to try and repro this (most recent versions are in the v7 and v8 branches), and there are some potentially interesting behavioural differences between 7.0.0 and 8.x in terms of total metrics shipped over time. Both pods are running identical code, apart from the different prometheus-net dependency. I'd expect those two graphs to be more closely aligned. I've added an event source filter to try to maintain parity on that front between the two prometheus-net versions.

[screenshot: total metrics shipped over time, v7 pod vs v8 pod]

Next I'm going to see whether the metrics server itself is throwing any errors at all, but I first need to get my ducks in a row to attach a remote debugger.

@sandersaares
Member

sandersaares commented Dec 18, 2023

The app you shared includes some metrics from the .NET Meters API, based on one of the prometheus-net samples. This sample code emits different timeseries over time, so after a while the old ones will expire and be dropped. This might explain the nature of the fluctuation in the last graph you shared.

This expiration of metrics is not super precise - the lifetime is just a minimum lifetime guarantee (5 minutes by default), with cleanup happening at an unspecified point after that. The specifics of this logic have changed in recent versions, so some difference in when exactly garbage is cleaned up is not surprising.
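
For reference, the same minimum-lifetime idea is also exposed directly to user code through the managed-lifetime factory; a rough sketch along those lines (metric and label names are illustrative):

using System;
using Prometheus;

// Metric instances created through a managed-lifetime factory are kept for at
// least the configured lifetime after their last use and removed at some
// unspecified point after that. The 5-minute value mirrors the default
// mentioned above.
var expiringFactory = Metrics.WithManagedLifetime(TimeSpan.FromMinutes(5));

// WithExtendLifetimeOnUse() extends the lifetime every time a value is
// recorded; after ~5 minutes of silence the timeseries becomes eligible
// for removal.
var documentsInProgress = expiringFactory
    .CreateGauge("documents_in_progress", "Documents currently being processed.",
        new[] { "operation" })
    .WithExtendLifetimeOnUse();

documentsInProgress.WithLabels("import").Inc();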

[screenshot: metric graph from the sample app]

I was not able to detect any other metrics going away on a random sampling over 30 minutes. Looking forward with interest to more details!

@sandersaares
Member

Could it be that your metrics are not being updated at a fast enough interval to keep them alive? Although, even in this case they should come back as soon as the next value is recorded - your original screenshot shows the timeseries disappearing for good. Still, perhaps an angle to explore?

@daverant
Author

Hi @sandersaares, thanks for the input and apologies for the slow reply; I took a chunk of time off over the holidays! I'll read through this and take a fresh look at it.

@daverant
Author

daverant commented Jan 11, 2024

Could it be that your metrics are not being updated at a fast enough interval to keep them alive? Although, even in this case they should come back as soon as the next value is recorded - your original screenshot shows the timeseries disappearing for good. Still, perhaps an angle to explore?

The particularly interesting behaviour, as you say, is that some metrics get dropped entirely, while others appear to freeze and no longer increment or decrement despite those code paths being hit. I think that's a thread worth tugging at too: why does a metric stop changing in value despite the code path being hit repeatedly?
