Prometheus cardinality explosion #344

Open
iszulcdeepsense opened this issue Oct 16, 2023 · 4 comments
Labels
design It needs rethinking and designing

Comments

@iszulcdeepsense
Collaborator

iszulcdeepsense commented Oct 16, 2023

Let's make sure we're protected against metric cardinality explosion.
Prometheus label cardinality is the number of unique label-value combinations for a given metric.
Each unique combination is stored as a separate time series, so a metric with too many label dimensions (or labels with many distinct values) can make the total number of series grow drastically, which leads to performance issues, exceeded storage limits, and so on.

Prometheus TSDB storage is optimized for a relatively low number of time series, not for high cardinality.
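
To make the multiplication concrete, here is a minimal sketch in Python. The label names and value counts are made up for illustration and are not taken from this project; the function only computes the worst-case upper bound, since not every combination necessarily occurs in practice.

```python
from math import prod


def max_series(label_value_counts):
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_value_counts.values())


# Hypothetical labels for a single request-counter metric.
labels = {"method": 5, "status": 10, "endpoint": 50}
print(max_series(labels))  # 2500

# Adding one high-cardinality label (e.g. a user ID) multiplies the total.
labels["user_id"] = 10_000
print(max_series(labels))  # 25000000
```

A single label with many distinct values is usually what turns a manageable metric into an explosion, which is why IDs and similar values are typically kept out of labels.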

@iszulcdeepsense
Collaborator Author

Previously, I ran into the Prometheus volume running out of storage even though a much lower retention size was configured. That may have been a result of cardinality explosion.

iszulcdeepsense added the design label Dec 4, 2023
@JosefAssadERST
Member

This has a section titled "Find High Cardinality Metrics" which looks to me like a good place to start, i.e. figure out where we actually are.

In fact, you might even want a meta-dashboard and an alert to keep an eye on cardinality, using some of the PromQL queries on that page.
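
As a rough sketch of what such a check could look like, the Python below runs one commonly used cardinality query through the Prometheus HTTP API (`/api/v1/query`) and lists the metrics with the most series. The query is a widely used pattern and not necessarily one from the linked page; the helper names and the base URL in the usage note are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Counts series per metric name and keeps the ten largest.
TOP_METRICS_QUERY = 'topk(10, count by (__name__) ({__name__=~".+"}))'


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_series_counts(payload):
    """Turn an instant-query vector response into (metric name, series count)
    pairs, sorted by series count in descending order."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error"))
    rows = [
        (item["metric"].get("__name__", ""), int(float(item["value"][1])))
        for item in payload["data"]["result"]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def top_metrics(base_url):
    """Fetch the top metrics by series count from a running Prometheus."""
    url = build_query_url(base_url, TOP_METRICS_QUERY)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_series_counts(json.load(resp))
```

Calling `top_metrics("http://localhost:9090")` against a port-forwarded Prometheus (the address is an assumption) would return the heaviest metrics. For the alert side, the total number of in-memory series is exposed by Prometheus itself as `prometheus_tsdb_head_series`, which could be a reasonable starting signal.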

anders314159 self-assigned this Mar 4, 2024
@anders314159
Contributor

Without specific examples of where it goes wrong, it is difficult to design specific policies.
However, in my local kind cluster, most of the time series are related to Postgres, so that might be a place to start working on some nebulous "improvement":
image

@anders314159
Contributor

anders314159 commented Mar 9, 2024

We might consider aggregating or outright removing some of the metrics/labels that are scraped from Postgres. On the other hand, compression might take care of most of the Postgres metrics, which is why examples would be nice.
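
To get a feel for how much aggregating away a label could save, here is a small Python sketch that counts distinct series before and after a label is removed. The metric name, label names, and values are hypothetical sample data, not what our Postgres exporter actually emits.

```python
def count_series(series, drop_labels=()):
    """Count distinct label sets after removing the given labels."""
    dropped = set(drop_labels)
    distinct = {
        frozenset((k, v) for k, v in labels.items() if k not in dropped)
        for labels in series
    }
    return len(distinct)


# Hypothetical series for one Postgres metric: 3 databases x 4 tables.
series = [
    {"__name__": "pg_example_rows", "datname": db, "relname": table}
    for db in ("app", "auth", "billing")
    for table in ("users", "orders", "events", "sessions")
]

print(count_series(series))                            # 12
print(count_series(series, drop_labels=["relname"]))   # 3
```

In Prometheus itself, the equivalent would more likely be a recording rule that aggregates the label away, or `metric_relabel_configs` that drop whole metrics at scrape time. Simply dropping a label that distinguishes series can make them collide, so the numbers above are only an estimate of the potential saving.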

anders314159 removed their assignment Mar 14, 2024
3 participants