New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Worker metrics. #5234
base: main
Are you sure you want to change the base?
Add Worker metrics. #5234
Conversation
0f45777
to
d51fdb5
Compare
1d7267e
to
35caff4
Compare
addade5
to
1c84817
Compare
"description": "Number of unblocked tasks waiting in the queue.", | ||
"unit": "tasks", | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the value of the metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we talked before, this involves some changes to the test machinery, and yet we probably won't be able to catch the right value for the metric during the tests.
Talking with @lubosmj and @dkliban we understood that changing to use the OpenTelemetry Collector and test the metrics as it's exported to be consumed by Prometheus could generate better results. Yet, it's a considerable effort that is out of the scope of this task.
unblocked_tasks_stats["longest_unblocked_waiting_time"].seconds | ||
) | ||
|
||
self.cursor.execute(f"NOTIFY pulp_worker_metrics_heartbeat, '{str(now)}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think we do not need to re-notify the worker, do we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We notify all workers, that the work was done. So they hold off of doing it for another cooldown time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But since you asked, a comment to that end may help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks man. In the end I removed since it doesn't make sense to be used anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @lubosmj. @mdellweg reminded me why we're using this. #5234 (review)
@@ -392,6 +419,42 @@ def handle_available_tasks(self): | |||
keep_looping = True | |||
self.supervise_task(task) | |||
|
|||
def record_unblocked_waiting_tasks_metric(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any safeguard that prohibits the execution of this method if the telemetry is disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. It simply will not be emitted if the agent is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is valid. In the current state, we still run the query even if we send nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we have an otel_enabled setting we can look at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this stage of the implementation, I think we can squash all the commits into a single one. What do you think?
82d79e4
to
5023476
Compare
12b5b71
to
5e989df
Compare
89acc63
to
d8ec695
Compare
@@ -392,6 +419,42 @@ def handle_available_tasks(self): | |||
keep_looping = True | |||
self.supervise_task(task) | |||
|
|||
def record_unblocked_waiting_tasks_metric(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is valid. In the current state, we still run the query even if we send nothing.
Closes pulp#3821 Co-authored-by: Matthias Dellweg <2500@gmx.de> Co-authored-by: Ľuboš Mjachky <lmjachky@redhat.com> Co-authored-by: Grant Gainey <ggainey@users.noreply.github.com> Co-authored-by: Ina Panova <ipanova@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's the last question remaining, how to avoid the costly db queries on systems that do not run with otel.
pytest.skip("Need PULP_OTEL_ENABLED to run this test.") | ||
|
||
# Checking online workers ready to get a task | ||
workers_online = pulpcore_bindings.WorkersApi.list(online="true").count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Close #3821