Cannot evaluate communication tasks with at least 10 user processes #1207

Open
Daniel-Aga opened this issue May 20, 2022 · 2 comments

@Daniel-Aga

Description:
The system cannot evaluate submissions for communication tasks that are configured with 10 or more user processes.
Such submissions get stuck in the evaluation phase and are re-evaluated repeatedly (possibly indefinitely).
In the sandbox logs, the status is `XX` and the message is `execve("./task"):Resource Temporarily Unavailable`.
I believe the cause is the way box_id is defined in cms/grading/Sandbox.py, lines 861-873.
The code there allocates 10 ids per worker shard, so when a worker evaluates a submission with 10 or more user processes (plus one additional manager process), it needs at least 11 sandboxes and ends up using duplicate ids.

As a workaround, one could increase the number of ids allocated for each worker shard in Sandbox.py, but perhaps we can find a more generic fix.
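
To make the collision concrete, here is a minimal sketch of the allocation scheme as described above. It is not the code from Sandbox.py: the function names, the `(shard + 1)` offset, and the sequential wrap-around are assumptions made for illustration only.

```python
from itertools import count

IDS_PER_SHARD = 10  # block size reported in this issue


def make_allocator(shard, ids_per_shard=IDS_PER_SHARD):
    """Return a function that hands out box ids for one worker shard.

    Assumption: ids are assigned sequentially inside the shard's block
    and wrap around once the block is exhausted.
    """
    counter = count()

    def next_box_id():
        return (shard + 1) * ids_per_shard + next(counter) % ids_per_shard

    return next_box_id


def ids_for_evaluation(shard, num_user_processes, ids_per_shard=IDS_PER_SHARD):
    """Box ids needed at the same time: one per user process plus the manager."""
    allocate = make_allocator(shard, ids_per_shard)
    return [allocate() for _ in range(num_user_processes + 1)]


def has_collision(ids):
    """True if two sandboxes that are alive together share a box id."""
    return len(set(ids)) != len(ids)


if __name__ == "__main__":
    for n in (9, 10):
        ids = ids_for_evaluation(shard=0, num_user_processes=n)
        print(n, "user processes ->", ids, "collision:", has_collision(ids))
```

Under these assumptions, 9 user processes plus the manager fit exactly into a block of 10, while 10 user processes wrap around and reuse the first id. Passing a larger `ids_per_shard` models the workaround, but it only moves the threshold rather than removing it.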

Expected: The submissions should be evaluated correctly.

Actual: The submissions are re-evaluated due to sandbox errors.

System Information

CMS version: 1.4.rc1
Was CMS installed: yes
Using a virtualenv: no

@wil93
Member

wil93 commented Nov 27, 2022

Thanks for this report. I think we should look into having a more reliable solution.

Out of curiosity: does your use case require a fixed number of box ids (>= 10), or do you need a variable number of box ids?

wil93 added this to the P1 milestone Nov 27, 2022
@Daniel-Aga
Author

Initially I wanted to run a large (though fixed) number of boxes, around 500, but in the end I figured out a way to rephrase the task and use only 3, so it was fine 😃

By the way, while working on this I tried increasing the number of sandbox ids and ran into another problem: 29 user processes worked, but 30 (or more) didn't. The reason turned out to be that isolate allowed only 64 open files per process, so the manager could not open the fifos to the last user process. The latest version of isolate accepts the number of open files as a command-line argument ("-n"), so once CMS migrates to that version, we should make sure to set the correct open-files limit when we initialize the sandbox for communication tasks.
I wrote about this on Gitter back then and wanted to add the info to this issue, but forgot...
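
As a rough illustration of how such a limit could be derived, the sketch below estimates the number of descriptors the manager needs. It is not CMS or isolate code: the function name, the two fifos per user process, and the reserved headroom are all assumptions, and the resulting number would be the value handed to isolate's "-n" argument mentioned above.

```python
DEFAULT_OPEN_FILES = 64      # per-process limit reported in this thread
FIFOS_PER_USER_PROCESS = 2   # assumption: one fifo per direction
RESERVED_DESCRIPTORS = 8     # assumption: arbitrary headroom (stdio, data files)


def open_files_limit(num_user_processes,
                     fifos_per_process=FIFOS_PER_USER_PROCESS,
                     reserved=RESERVED_DESCRIPTORS,
                     default_limit=DEFAULT_OPEN_FILES):
    """Estimate the open-files limit a manager sandbox would need.

    Never goes below the default, so small tasks keep today's behaviour.
    """
    if num_user_processes < 1:
        raise ValueError("a communication task needs at least one user process")
    needed = num_user_processes * fifos_per_process + reserved
    return max(needed, default_limit)


if __name__ == "__main__":
    for n in (3, 29, 30, 500):
        print(n, "user processes -> open files limit", open_files_limit(n))
```

The headroom value is a guess; the actual number of extra descriptors the manager holds would have to be measured before picking it.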
