
Skip default value for check-single-keys #891

Open
d0mitoridesu opened this issue Mar 13, 2024 · 1 comment

Comments

@d0mitoridesu

Hello!
On large setups there's a problem: when Redis holds a lot of data, SCAN performs poorly and only adds more load to Redis. check-single-keys exists to address this, but it has its own issue: it sets the metric to 0 if the key is not found. For example, I have 10 clusters managed by Sentinel, each with three nodes, and a total of 2000 queues across all of them. It's not known in advance which nodes will hold which queues and which nodes will be the masters today, so we monitor everything. This results in 10 * 5 * 2000 = 100,000 metrics instead of the 2000 we'd get with SCAN. This leads to a significant increase in cardinality and a plethora of useless metrics, and it overloads Grafana too.

Is it possible to add an option for check-single-keys that skips a key (LIST or STREAM) if it's not found, behaving like SCAN does?
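
To illustrate what I mean, here's a rough Go sketch of the behavior I'm asking for: only emit the metric when the key actually exists. The function name, the redigo connection, and the Prometheus collector wiring are just illustrative, not the exporter's actual code:

```go
package main

import (
	"log"

	"github.com/gomodule/redigo/redis"
	"github.com/prometheus/client_golang/prometheus"
)

// collectSingleKey sketches the requested behavior for check-single-keys:
// if the key does not exist, emit nothing at all instead of a zero-valued
// series, the same way the SCAN-based path simply never sees the key.
func collectSingleKey(c redis.Conn, key string, desc *prometheus.Desc, ch chan<- prometheus.Metric) {
	exists, err := redis.Int(c.Do("EXISTS", key))
	if err != nil {
		log.Printf("EXISTS %s failed: %v", key, err)
		return
	}
	if exists == 0 {
		// Key not found on this node: skip it entirely.
		return
	}

	// For LIST keys the size can be read via LLEN (XLEN for streams);
	// LLEN is used here as an example.
	size, err := redis.Int64(c.Do("LLEN", key))
	if err != nil {
		return
	}
	ch <- prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, float64(size), key)
}
```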

It would also be good to consider timeouts for scan, i.e. terminating the scan once a certain timeout is reached. Otherwise every Prometheus request ends up leaving a bunch of hanging scans behind.
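
For the timeout idea, something roughly like this Go sketch is what I have in mind: bound the SCAN iteration with a deadline and abort once it passes. The helper name and signature are illustrative only, not existing exporter code:

```go
package main

import (
	"fmt"
	"time"

	"github.com/gomodule/redigo/redis"
)

// scanKeysWithTimeout iterates the keyspace with SCAN/MATCH and gives up
// once the deadline is exceeded, so a slow scan can't keep running long
// after the Prometheus scrape that triggered it.
func scanKeysWithTimeout(c redis.Conn, pattern string, timeout time.Duration) ([]string, error) {
	deadline := time.Now().Add(timeout)
	var keys []string
	cursor := 0
	for {
		if time.Now().After(deadline) {
			return keys, fmt.Errorf("SCAN for %q aborted after %s", pattern, timeout)
		}
		reply, err := redis.Values(c.Do("SCAN", cursor, "MATCH", pattern, "COUNT", 100))
		if err != nil {
			return keys, err
		}
		cursor, err = redis.Int(reply[0], nil)
		if err != nil {
			return keys, err
		}
		batch, err := redis.Strings(reply[1], nil)
		if err != nil {
			return keys, err
		}
		keys = append(keys, batch...)
		if cursor == 0 { // SCAN returns a zero cursor when the iteration is complete
			return keys, nil
		}
	}
}
```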

@oliver006
Owner

This leads to a significant increase in cardinality and a plethora of useless metrics

That's true, and it isn't really desirable. I vaguely remember a past conversation about this, and I think the current behavior violates Prometheus best practices: it's preferred not to submit a metric at all rather than a zero when no data was found.
This is a breaking change though, so we'd have to be a bit careful about communicating it.
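
One way to handle that could be to gate the new behavior behind an opt-in flag, something along these lines; the flag name below is purely hypothetical and not an existing option:

```go
package main

import "flag"

// Purely hypothetical opt-in flag, shown only to sketch a non-breaking rollout;
// redis_exporter does not currently have this option.
var skipMissingSingleKeys = flag.Bool(
	"check-single-keys-skip-missing",
	false,
	"Skip keys from check-single-keys that do not exist instead of exporting them with a value of 0",
)
```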

It would also be good to consider timeouts for scan

That totally makes sense, and it could be a good addition regardless of the issue above to make the exporter more robust. Let me know if you want to look into submitting a PR for that; otherwise we can keep this open in case someone else is interested in contributing.
