Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Histogram Collection Creates Errors on Each Scrape With Older Redis Versions #784

Closed
Calebjh opened this issue Apr 5, 2023 · 11 comments · Fixed by #913
Closed

New Histogram Collection Creates Errors on Each Scrape With Older Redis Versions #784

Calebjh opened this issue Apr 5, 2023 · 11 comments · Fixed by #913
Assignees

Comments

@Calebjh
Copy link

Calebjh commented Apr 5, 2023

Describe the problem
When running redis 6.x, the new histogram metrics cause Redis error counter to increase on every scrape, even though the error message logged is limited to one.

time="2023-04-05T18:32:08Z" level=info msg="Redis Metrics Exporter v1.49.0    build date:     sha1: b8555085b299a28fd67642302072c2cf77afbce8    Go: go1.20.2    GOOS: linux    GOARCH: amd64"
time="2023-04-05T18:32:08Z" level=info msg="Providing metrics at :9121/metrics"
time="2023-04-05T18:32:32Z" level=error msg="WARNING, LOGGED ONCE ONLY: cmd LATENCY HISTOGRAM, err: ERR Unknown subcommand or wrong number of arguments for 'HISTOGRAM'. Try LATENCY HELP."

It would be convenient to have a way to disable the new histogram via ENV or automatically if the command is failing.

What version of redis_exporter are you running?
Please run redis_exporter --version if you're not sure what version you're running.
[ ] 0.3x.x
[x] 1.48+

Running the exporter
What's the full command you're using to run the exporter? (please remove passwords and other sensitive data)

Expected behavior
Testing an upgrade from an older version to the newest versions. Expected things to continue working, with new metrics available.

Screenshots
Periods where error counter is increasing are running v1.49, and when it is stable are v1.47
Screen Shot 2023-04-05 at 11 53 20 AM

Additional context

@oliver006
Copy link
Owner

Thank for opening this issue.
This indeed looks wrong. I think there are multiple ways to handle this

  1. disable checking for the latency histogram after thr first error. tricky cause now we have to keep state in the exporter across scrapes and the way the /scrape endpoint works won't work any longer
  2. start making the exporter version-aware of the redis instance it's scraping and only take certain actions if we know they're supported. rather clean but adds complexity and possibly a lot of "if version > x.y" code
  3. add a flag to disable accessing the latency histogram - easiest to implement but user needs to take action to configure the exporter right

I'm leaning towards 3. - what do you think?

@Calebjh
Copy link
Author

Calebjh commented Apr 10, 2023

I would be inclined toward the 3rd option, as well. I'm sure there would be advantages toward the 2nd, but it sounds like a lot of unnecessary work, and the 1st sounds finicky, since you might hit some transient error and disable it accidentally.
From a backwards compatibility standpoint, it might make more sense in my opinion for the default state to be disabled with an option to enable. Otherwise I might not be the only one to try upgrading to the latest version and start seeing unexpected errors.

@oliver006
Copy link
Owner

Makes sense.
I'm a little short on time the next few weeks but if you're interested in submitting a PR I can review and merge it.

@gzivdo
Copy link

gzivdo commented Apr 19, 2023

2 option is right way. redis_version present in "info all"
The fast fix

func (e *Exporter) extractLatencyHistogramMetrics(outChan chan<- prometheus.Metric, infoAll string, redisConn redis.Conn) {
+       if !(strings.Contains(infoAll, "redis_version:7.")) {
+               return
+       }

@tdewolff
Copy link

@gzivdo that looks like the best fix, is there a possibility to get this merged please?

@oliver006
Copy link
Owner

@tdewolff - if you submit a PR I can have a look and review it to get it merged.

@grafanalf
Copy link

ping?

grafanalf added a commit to grafanalf/redis_exporter that referenced this issue Nov 3, 2023
@grafanalf
Copy link

#847

@azhurbilo
Copy link

@oliver006 I see your comment #847 (comment)

The log-line should show up only once - is it that big of an issue?

for us for example it's an issue as it generate Errors (redis_errors_total metric) for each exporter every time (not only in the beginning) and it prevents us to make normal alerts

@oliver006
Copy link
Owner

Hmm, interesting, I agree that's not optimal (it shouldn't generate an error the first time it happens).
Let me see how ot improve it, maybe bump the redis_errors_total metric only after the first error?

@azhurbilo
Copy link

I think option 3 is much better than generate 1 error at start

add a flag to disable accessing the latency histogram - easiest to implement but user needs to take action to configure the exporter right

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment