Judgehost crash did not recover well after restart #2476

Open
eldering opened this issue Apr 18, 2024 · 2 comments
Comments

@eldering
Member

Description of the problem

On the WF Luxor online judge, judging 11755 of submission 3989 for wf46 crashed or hung with the error below. The judgedaemon was restarted, but the judging stayed pending and had to be rejudged manually.

[Apr 18 14:11:21.031] judgedaemon[686632]: API request POST judgehosts/fetch-work
[Apr 18 14:11:29.921] judgedaemon[686632]: ⇝ Received 5 'judging_run' judge tasks (endpoint default)
[Apr 18 14:11:29.921] judgedaemon[686632]:   Working directory: /opt/domjudge/output/judgings/judgehost0003-2/endpoint-default/3989/11755
[Apr 18 14:11:29.921] judgedaemon[686632]:   🔓 Executing chroot script: 'chroot-startstop.sh stop'
[Apr 18 14:11:29.945] judgedaemon[686632]:   🔒 Executing chroot script: 'chroot-startstop.sh start'
[Apr 18 14:11:29.989] judgedaemon[686632]: API request GET config
[Apr 18 14:11:30.030] judgedaemon[686632]: warning: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading This request will be retried after about 1.1937795567297sec... (1/3)
[Apr 18 14:11:31.265] judgedaemon[686632]: warning: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading This request will be retried after about 2.117632600813sec... (2/3)
[Apr 18 14:11:33.423] judgedaemon[686632]: error: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading Retry limit reached.
[Apr 18 14:11:36.724] judgedaemon[688607]: Judge started on judgehost0003-2 [DOMjudge/8.3.0DEV/0121a2f98]

and

[Apr 18 14:11:01.708] judgedaemon[777720]: API request POST judgehosts/add-judging-run/judgehost0006-2/824666
[Apr 18 14:11:01.735] judgedaemon[777720]:   ✗  ...done in 0.015s (CPU: 0.001s), result: run-error
[Apr 18 14:11:01.735] judgedaemon[777720]: API request POST judgehosts
[Apr 18 14:11:01.777] judgedaemon[777720]:   🔙 Returned unfinished judging with jobid 11753 in my name; given back unfinished runs from me.
[Apr 18 14:11:01.777] judgedaemon[777720]: API request POST judgehosts/fetch-work
[Apr 18 14:11:01.795] judgedaemon[777720]:   🔓 Executing chroot script: 'chroot-startstop.sh stop'
[Apr 18 14:11:01.816] judgedaemon[777720]: No submissions in queue (for endpoint default), waiting...
[Apr 18 14:11:09.676] judgedaemon[777720]: ⇝ Received 5 'judging_run' judge tasks (endpoint default)
[Apr 18 14:11:09.676] judgedaemon[777720]:   Working directory: /opt/domjudge/output/judgings/judgehost0006-2/endpoint-default/3988/11754
[Apr 18 14:11:09.678] judgedaemon[777720]:   🔒 Executing chroot script: 'chroot-startstop.sh start'
[Apr 18 14:11:09.722] judgedaemon[777720]: API request GET config
[Apr 18 14:11:09.763] judgedaemon[777720]: warning: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading This request will be retried after about 1.0024431142967sec... (1/3)
[Apr 18 14:11:10.807] judgedaemon[777720]: warning: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading This request will be retried after about 2.0028794896802sec... (2/3)
[Apr 18 14:11:12.851] judgedaemon[777720]: error: Error while executing curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: error:0A000126:SSL routines::unexpected eof while reading Retry limit reached.
[Apr 18 14:11:16.152] judgedaemon[778589]: Judge started on judgehost0006-2 [DOMjudge/8.3.0DEV/0121a2f98]
[Apr 18 14:11:16.153] judgedaemon[778589]: Installing signal handlers
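
For triage, the judgings that were in progress when a daemon gave up can be pulled out of logs like the two above with a small script. This is only a minimal sketch and not part of DOMjudge: the function name `find_interrupted_judgings` and both regular expressions are assumptions derived solely from the line format shown in this issue, and the path layout `.../<submission>/<judging>` is inferred from the "Working directory" lines.

```python
import re

# Assumed line shape, taken from the excerpts above:
#   [Apr 18 14:11:29.921] judgedaemon[686632]: <message>
LINE_RE = re.compile(
    r"^\[(?P<ts>\w{3} +\d+ [\d:.]+)\] judgedaemon\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)
# Assumed working directory layout: .../<submission id>/<judging id>
WORKDIR_RE = re.compile(
    r"Working directory: \S*/(?P<submission>\d+)/(?P<judging>\d+)$"
)


def find_interrupted_judgings(lines):
    """Return one dict per 'Retry limit reached' error that follows a
    'Working directory' line from the same daemon PID."""
    last_workdir = {}  # pid -> (submission id, judging id) most recently started
    interrupted = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a judgedaemon log line
        pid, msg = m["pid"], m["msg"]
        wd = WORKDIR_RE.search(msg)
        if wd:
            last_workdir[pid] = (int(wd["submission"]), int(wd["judging"]))
        elif (
            msg.startswith("error:")
            and "Retry limit reached" in msg
            and pid in last_workdir
        ):
            submission, judging = last_workdir[pid]
            interrupted.append(
                {
                    "pid": int(pid),
                    "time": m["ts"],
                    "submission": submission,
                    "judging": judging,
                }
            )
    return interrupted


if __name__ == "__main__":
    # Two lines copied from the first excerpt in this issue.
    sample = [
        "[Apr 18 14:11:29.921] judgedaemon[686632]:   Working directory: "
        "/opt/domjudge/output/judgings/judgehost0003-2/endpoint-default/3989/11755",
        "[Apr 18 14:11:33.423] judgedaemon[686632]: error: Error while executing "
        "curl GET to url https://domjudge-online.icpc-vcss.org/api/config?: "
        "error:0A000126:SSL routines::unexpected eof while reading Retry limit reached.",
    ]
    for item in find_interrupted_judgings(sample):
        print(item)
```

Keying on the PID keeps the two daemons apart when their logs are concatenated; the judgings it reports are candidates to check for a stuck "pending" state, not proof of one.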

Your environment

DOMjudge at https://onlinejudge.icpc.global/ on the wfluxor-online branch.

@eldering
Member Author

@vmcj
Member

vmcj commented Apr 18, 2024

debug-s3989-judgehost0003.zip debug-s3989-judgehost0006.zip

I suspect we should also keep the access/error logs for this one, since we probably ran out of PHP-FPM workers at that point.
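
One way to test that suspicion once the logs are kept: PHP-FPM writes a "server reached pm.max_children setting" warning to its error log when a pool has no free workers. The sketch below counts those warnings per pool. It is a rough illustration under assumptions: the function name `pools_at_worker_limit` is hypothetical, and the sample lines (timestamps, pool name `domjudge`, limit `40`) are made up for the example, not taken from the Luxor servers.

```python
import re

# Warning PHP-FPM logs when every worker of a pool is busy, e.g.
#   WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
MAX_CHILDREN_RE = re.compile(
    r"\[pool (?P<pool>[^\]]+)\] server reached pm\.max_children "
    r"setting \((?P<limit>\d+)\)"
)


def pools_at_worker_limit(log_lines):
    """Map pool name -> number of 'reached pm.max_children' warnings seen."""
    counts = {}
    for line in log_lines:
        m = MAX_CHILDREN_RE.search(line)
        if m:
            counts[m["pool"]] = counts.get(m["pool"], 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; real values would come from the FPM error log.
    sample = [
        "[18-Apr-2024 14:11:09] WARNING: [pool domjudge] server reached "
        "pm.max_children setting (40), consider raising it",
        "[18-Apr-2024 14:11:10] NOTICE: [pool domjudge] child 1234 started",
        "[18-Apr-2024 14:11:29] WARNING: [pool domjudge] server reached "
        "pm.max_children setting (40), consider raising it",
    ]
    print(pools_at_worker_limit(sample))
```

If warnings cluster around 14:11:09 and 14:11:29, that would line up with the failed `GET config` requests in the judgedaemon logs; if there are none, the SSL EOF errors likely have another cause.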
