High label cardinality issue #53

Open

sergioasantiago opened this issue Feb 7, 2023 · 6 comments

Comments

@sergioasantiago
Contributor

sergioasantiago commented Feb 7, 2023

After a successful deployment of dependency-track-exporter, I started receiving alerts in our infrastructure because the exporter is generating labels with high cardinality.

High label cardinality is a known cause of Prometheus performance problems, as explained in the Robust Perception article Cardinality is key.

After a deeper investigation, I found that the offending metric is dependency_track_project_policy_violations, which has a uuid label that can explode the number of label combinations.

I would suggest dropping the uuid label, since it doesn't bring much benefit in this case: we already have the project name.

Unfortunately, I'm not a good Go developer, but I would be happy to help in any other way.

@ribbybibby
Member

The uuid label has high cardinality (it's different for each project), but I wouldn't have expected removing uuid to reduce the overall cardinality.

Each uuid is associated with a project+version combination, so if you removed uuid, I would have thought you'd still have exactly the same number of series, just without the uuid label.
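
One way to sanity-check that (just a rough sketch, and assuming the metric also carries name and version labels) is to compare these two counts, which should match if uuid really is 1:1 with project+version:

count(count by (uuid) (dependency_track_project_policy_violations))
count(count by (name, version) (dependency_track_project_policy_violations))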

Unfortunately, the nature of this exporter is such that there will inevitably be a lot of series when you have a lot of projects.

That being said, dependency_track_project_policy_violations in particular has an explosively high cardinality because it has several dimensions, each with multiple possible values, that multiply together to produce a lot of series:

3 (type) * 3 (state) * 4 (analysis) * 2 (suppressed) = 72 series for each project
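
To see what that looks like in practice, a quick count query along these lines (a sketch, assuming the project name is exposed in the name label as in the queries elsewhere in this thread) shows how many policy-violation series each project actually exposes:

count by (name) (dependency_track_project_policy_violations)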

Thinking off the top of my head, there are a few things we might consider doing in the exporter:

  1. We currently initialise all possible series with a 0 value to address prometheus/prometheus#1673 (increase() should consider creation of new timeseries as reset). We could stop doing this and we'd likely end up with a lot fewer series for your typical DT instance.
    • The downside to that would be that delta() wouldn't work as described in that issue, but perhaps we can find a query that works around it.
  2. Remove some or all of the dimensions. I'm not sure any would be good candidates; they all seem quite useful to me.
  3. Just accept that the metric is not suitable for Prometheus and remove it.

You may also want to consider dropping the metric or a subset of the series with metric_relabel_configs in your Prometheus configuration. This would be a good solution if you aren't actually using the metric at all.

@sergioasantiago
Contributor Author

I like the first option. It is unlikely that all projects will have a value other than 0 at the same time, so it would solve the high number of combinations.

Just to give you more details about my setup: I have 720 projects and 0 violations, which would not create a single series and therefore no high-cardinality issue at all.

Regarding the caveats of choosing this approach, we had a similar issue in the past and solved it with an or clause like the one below:

delta(dependency_track_project_vulnerabilities{name="MyProject"}[5m]) > 0
or (
  dependency_track_project_vulnerabilities{name="MyProject"} != 0
  unless dependency_track_project_vulnerabilities{name="MyProject"} offset 1m
)

This is only used when we want to alert whenever a new vulnerability is found (the same works for policy violations).

PS: Unfortunately I cannot find the article where I got this solution, so I can't link it.

@ribbybibby
Member

Presumably you have 0 violations because you don't have any policies? If that's the case, then dropping the metric with metric_relabel_configs sounds like a good option for you.

It pains me a bit to implement option 1, simply because it goes against the typical best-practice advice and often results in missed alerts when people aren't aware of the issues with increase()/delta(). The cardinality problem probably does outweigh that, though.

@sergioasantiago
Contributor Author

sergioasantiago commented Feb 7, 2023

I see. Unfortunately, we don't "have access" to the Prometheus configuration, since it was designed not to allow exceptions like this one.
Sadly, I will need to skip using dependency-track-exporter due to this limitation. We have implemented our own exporter (in Java), but this one can do the same with fewer resources, which would have been nice.

That said, I understand your point and appreciate the quick support.

Feel free to close the issue if you like.

@mieliespoor

In light of this thread, what would the impact be if the project tags were added as a label to the other metrics? At the moment, they are only available on the project info metric.

We use the tags to understand which teams own which projects in Dependency-Track. We need to report what the security footprint of each team is, so we need to be able to report on only a subset of projects, which is where tags come in.

@ribbybibby
Member

You should be able to join tags from dependency_track_project_info into your queries, like in this example:

(dependency_track_project_policy_violations{state="WARN",analysis!="APPROVED",analysis!="REJECTED",suppressed="false"} > 0)
* on (uuid) group_left(tags,active) dependency_track_project_info
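
If you only need a particular team's projects, you could also filter the info metric in that join by tag, something along these lines (a sketch; team-a is a placeholder and this assumes the tag value appears somewhere in the tags label):

(dependency_track_project_policy_violations{state="WARN",analysis!="APPROVED",analysis!="REJECTED",suppressed="false"} > 0)
* on (uuid) group_left(tags,active) dependency_track_project_info{tags=~".*team-a.*"}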
