Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Worker metrics. #5234

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Add Worker metrics. #5234

wants to merge 1 commit into from

Conversation

decko
Copy link
Member

@decko decko commented Apr 8, 2024

Close #3821

pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
@decko decko marked this pull request as ready for review April 25, 2024 15:44
@decko decko requested review from lubosmj and mdellweg April 25, 2024 15:44
pulpcore/constants.py Show resolved Hide resolved
pulpcore/tasking/worker.py Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
@decko decko force-pushed the worker_metrics branch 2 times, most recently from 1d7267e to 35caff4 Compare April 29, 2024 11:30
@decko decko closed this Apr 29, 2024
@decko decko force-pushed the worker_metrics branch 7 times, most recently from addade5 to 1c84817 Compare May 3, 2024 21:18
pulpcore/tests/functional/api/test_tasking.py Outdated Show resolved Hide resolved
pulpcore/tests/functional/api/test_tasking.py Show resolved Hide resolved
"description": "Number of unblocked tasks waiting in the queue.",
"unit": "tasks",
}
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the value of the metric?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we talked before, this involves some changes to the test machinery, and yet we probably won't be able to catch the right value for the metric during the tests.
Talking with @lubosmj and @dkliban we understood that changing to use the OpenTelemetry Collector and test the metrics as it's exported to be consumed by Prometheus could generate better results. Yet, it's a considerable effort that is out of the scope of this task.

unblocked_tasks_stats["longest_unblocked_waiting_time"].seconds
)

self.cursor.execute(f"NOTIFY pulp_worker_metrics_heartbeat, '{str(now)}'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think we do not need to re-notify the worker, do we?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We notify all workers, that the work was done. So they hold off of doing it for another cooldown time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since you asked, a comment to that end may help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks man. In the end I removed since it doesn't make sense to be used anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @lubosmj. @mdellweg reminded me why we're using this. #5234 (review)

@@ -392,6 +419,42 @@ def handle_available_tasks(self):
keep_looping = True
self.supervise_task(task)

def record_unblocked_waiting_tasks_metric(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any safeguard that prohibits the execution of this method if the telemetry is disabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. It simply will not be emitted if the agent is not used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is valid. In the current state, we still run the query even if we send nothing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we have an otel_enabled setting we can look at?

CHANGES/3821.feature Outdated Show resolved Hide resolved
Copy link
Member

@lubosmj lubosmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this stage of the implementation, I think we can squash all the commits into a single one. What do you think?

@decko decko force-pushed the worker_metrics branch 2 times, most recently from 82d79e4 to 5023476 Compare May 6, 2024 16:05
pulpcore/tasking/worker.py Outdated Show resolved Hide resolved
@decko decko requested review from mdellweg and lubosmj May 6, 2024 17:19
@decko decko force-pushed the worker_metrics branch 2 times, most recently from 12b5b71 to 5e989df Compare May 6, 2024 21:57
pulpcore/app/models/task.py Outdated Show resolved Hide resolved
pulpcore/tests/functional/assets/otel_server.py Outdated Show resolved Hide resolved
@decko decko force-pushed the worker_metrics branch 4 times, most recently from 89acc63 to d8ec695 Compare May 8, 2024 15:14
pulpcore/tests/functional/api/test_tasking.py Outdated Show resolved Hide resolved
@@ -392,6 +419,42 @@ def handle_available_tasks(self):
keep_looping = True
self.supervise_task(task)

def record_unblocked_waiting_tasks_metric(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is valid. In the current state, we still run the query even if we send nothing.

Closes pulp#3821

Co-authored-by: Matthias Dellweg <2500@gmx.de>
Co-authored-by: Ľuboš Mjachky <lmjachky@redhat.com>
Co-authored-by: Grant Gainey <ggainey@users.noreply.github.com>
Co-authored-by: Ina Panova <ipanova@redhat.com>
Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's the last question remaining, how to avoid the costly db queries on systems that do not run with otel.

pytest.skip("Need PULP_OTEL_ENABLED to run this test.")

# Checking online workers ready to get a task
workers_online = pulpcore_bindings.WorkersApi.list(online="true").count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adds open telemetry metrics for tasks
5 participants