Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add blocking progress mode to Python async #116

Open
wants to merge 14 commits into
base: branch-0.36
Choose a base branch
from

Conversation

pentschev
Copy link
Member

Implements the blocking progress mode (UCX-Py default), which was still not implemented in UCXX.

@pentschev pentschev requested review from a team as code owners November 2, 2023 20:43
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the core of this looks fine, but I am reminded that this epoll_wait issue with asyncio doesn't actually quite work correctly with the code we are using (see discussion here rapidsai/ucx-py#888)

cpp/include/ucxx/worker.h Show resolved Hide resolved
Comment on lines +108 to +110
# - All asyncio tasks that isn't waiting on UCX must be executed
# so that the asyncio's next state is epoll wait.
# See <https://github.com/rapidsai/ucx-py/issues/413>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I think I understand the constraint, I am not sure what it means for the "next state" to be epoll_wait. Surely there can be arbitrary non-ucx tasks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna be honest and say my understanding here is also a bit fuzzy, and as you noted yourself this "doesn't work" (except when it does), with the original being adapted from https://stackoverflow.com/a/48491563. In my understanding, what epoll_wait refers to here is the socket state, which is there solely to provide a mechanism to prevent asyncio from running out of "ready" tasks, so epoll_wait will ensure that if nothing useful happens in the event loop, asyncio will still be awaken at some point to allow the loop to reevaluate.


if self.worker.arm():
# At this point we know that asyncio's next state is
# epoll wait.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who does the epoll_wait call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my #116 (comment), I believe this is related to the sockets.

@pentschev
Copy link
Member Author

I think the core of this looks fine, but I am reminded that this epoll_wait issue with asyncio doesn't actually quite work correctly with the code we are using (see discussion here rapidsai/ucx-py#888)

Yes, I remember that. The purpose of this is not necessarily to be used long-term, but rather to have a fallback to original UCX-Py behavior. This will allow us testing as close as possible with what UCX-Py did in the past, and we may later deprecate/remove this if we are confident we have a better option (e.g., thread progress mode).

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanations!

@pentschev pentschev requested a review from a team as a code owner January 15, 2024 18:02
@pentschev pentschev changed the base branch from branch-0.35 to branch-0.36 January 15, 2024 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants