[Standalone] BUG: several tasks fail and get stuck in pending #1346

Closed
kikemolina3 opened this issue May 12, 2024 · 2 comments

@kikemolina3
Contributor

Hello,

While experimenting with launching a large map (e.g. 1200 tasks) in EC2 standalone mode, I noticed that some tasks never get past the Pending status. I had been experimenting for some time before running into this problem, so the failure ratio is low: in a map stage of 1200 tasks, only about 5 to 10 fail.

Looking at the localhost-runner.log file on the worker VM, I found the error EOFError: Ran out of input, raised inside the get_function_and_modules function, only for the failed processes.

This suggests to me that some workers try to read the pickled function file while its size is still 0 (maybe the writer process still has it open?).
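
To illustrate the suspected race, here is a minimal, standalone sketch. It is not the Lithops code and not the fix from the commit mentioned later in this thread. The names write_pickle_atomically, load_pickle, demo and the file name func.pickle are hypothetical. It shows that unpickling a zero-byte file raises the same EOFError, and one common way to avoid it: write to a temporary file and rename it into place, so a reader sees either no file or a complete one.

```python
import os
import pickle
import tempfile


def write_pickle_atomically(obj, path):
    """Write obj to path so that readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(obj, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the complete file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def demo():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "func.pickle")  # hypothetical file name

    # Simulate the suspected state: the writer has created (truncated) the
    # file but has not written any bytes yet.
    open(path, "wb").close()
    try:
        load_pickle(path)
        error = None
    except EOFError as exc:
        error = str(exc)

    # With an atomic write, the reader gets the full payload.
    write_pickle_atomically({"answer": 42}, path)
    return error, load_pickle(path)


if __name__ == "__main__":
    print(demo())
```

If the writer cannot be changed, an alternative on the reader side would be to retry the load for a short time when EOFError is raised, although that only hides the race instead of removing it.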

I will do a PR trying to solve this problem.

Have a nice weekend!

@kikemolina3
Contributor Author

I just realized that this case was already handled very recently, a few commits ago (267601a). Please feel free to close this issue and the related PR if you have already resolved the bug.

@kikemolina3
Contributor Author

Closed after checking that the issue was already resolved in the master branch.
