-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement predicate_query to validate which are the metrics to be collected #4503
Conversation
❗ By default, the pull request is configured to backport to all release branches.
|
@@ -378,6 +387,66 @@ func (c QueryCollector) collect(conn *sql.DB, ch chan<- prometheus.Metric) error | |||
return nil | |||
} | |||
|
|||
// isCollectable checks if a query to collect metrics can be executed or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason why we are accepting multiple types?
In my opinion, we should only evaluate booleans, and it should be the responsibility of the query to return the intended type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Armando. It's better to rely on booleans to do that.
True means it's collectible, false means it's not, and NULL means we don't know.
And if we don't know, perhaps it's better not to collect it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to summarize we want to support ONLY bools so that even numbers are not tolerated. Also, do we want to force the developers to deliver a query that returns ONLY one column? If so we can simplify the whole logic with a single QueryRow
and a Scan
that accepts a bool variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think evaluating a bool should be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the comment and the method accordingly. Now assume that the type is a bool
if not we return false
and log a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I disagree and would prefer to see it accept both a 0-row result and a null as a false result. I can see the argument both ways though, so what you say goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated code handles:
bool
checking the first row first column.- no rows, returning
false
. - null value, returning
false
.
Thank you, @gabrosys, for this! Welcome to the CNPG community! |
No, it definitely should not log, or not at above In general there is IMO a need to improve how CNP's metrics scraper diagnostics work and make it easier to understand its behaviour + detect errors. That's why I wrote a little tool to run a one-shot scrape against a pre-configured postgres recently. I'll see if I can tidy that up for a community contribution. But logging isn't IMO the answer to that. I already have enough problems with it quietly logging a complaint and happily continuing with some confusing behaviour. It might make sense to emit metrics as scrape output to report on skips, but they should probably be optional, and I don't personally think they're necessary as part of the basic feature. They could easily bloat scrape results and add unwanted scrape cardinality, requiring more filtering by scrape result consumers to discard the expected ones.
The problem with that is that the user has to reliably assume the query has the expected results under all circumstances. Part of the point of this is to make the CNP scraper robust in the face of differing server versions, extension installs, vendor flavours, etc etc. OTOH if the query does something unexpected, a clear error log might be better than just silently returning false and not running the query. IDK. Right now it's hard to properly test CNP scraper configs across a wide matrix of possible server configs, so I prefer to be error-tolerant, but I can see the argument for strictness too. I'd personally prefer it to accept a 0-row result as false, but it's not hard to wrap that in a |
/test tl=4 l=local |
@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9131921366 |
@mnencia What can we do to progress this PR? |
@gabrosys I see a test failure
https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9131921366/job/25112487661#step:9:5716 It failed in some of the other runs too, but passed in others. Details https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9131921366/job/25112490237#step:9:2344
so it looks like an issue with the spec for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Identified query bug causing test failure
tests/e2e/fixtures/metrics/custom-queries-with-predicate-query.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couldn't isCollectable
be substantially simplified?
Update the PR as requested. Now we accept only
@jsilvela ready to be reviewed, thanks 🙇 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work, Gabriele
d527164
to
8bb9d0b
Compare
/test limit=local |
@armru, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9255900078 |
/test limit=local |
@armru, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/9271964382 |
Hello @gabrosys could you please integrate the |
Update the doc with a new paragraph with a simple example and add the new prop to the prop list. Thank you all for your support 🙇 |
…cs to be collected Signed-off-by: Gabriele <gabriele.rossini@enterprisedb.com>
…te_query Signed-off-by: Gabriele <gabriele.rossini@enterprisedb.com>
Signed-off-by: Gabriele <gabriele.rossini@enterprisedb.com>
Signed-off-by: Gabriele <gabriele.rossini@enterprisedb.com>
Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Gabriele <gabriele.rossini@enterprisedb.com>
Signed-off-by: Gabriele <gabriele.rossini@enterprisedb.com>
Signed-off-by: Gabriele <gabriele.rossini@enterprisedb.com>
…be collected (cloudnative-pg#4503) This patch introduces a new field called "predicate_query" in the user-defined metrics. The `predicate_query` is an SQL query that returns at most one row and one `boolean` column to run on the target database. The system evaluates the predicate and, if `true,` executes the `query.` This allows the users to execute the user-defined query only when certain criteria are met. Closes cloudnative-pg#4499 Signed-off-by: Gabriele <gabriele.rossini@enterprisedb.com> Signed-off-by: Jaime Silvela <jaime.silvela@enterprisedb.com> Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com> Co-authored-by: Jaime Silvela <jaime.silvela@enterprisedb.com> Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com> Signed-off-by: Douglass Kirkley <dkirkley@eitccorp.com>
What
Implement a validation logic to verify if a query to collect metrics should be executed or not.
How
predicate_query
.isCollectable
that is invoked bycollect
passing the sametx
(to reduce the performance consumption) that returns abool
.predicate_query
.Test
e2e-test-local
withFEATURE_TYPE= observability
andTEST_DEPTH=3
.Open points - Questions
false
response of thepredicate_query
? I want to avoid bloating PG with this kind of log so this is why currently it is not present.isCollectable
is intended to be invoked only within thecollect
method, maybe I could define it as anonymous method inside thecollect
?isCollectable
method if we assume strong and tight rules regarding the output of thepredicate_query
, what do you think about that?predicate_query
?Thanks at disposal for any question 🙇 🚀