0s in queue, ran for 0s #847

Open · art-w opened this issue on Aug 1, 2023 · 9 comments
Labels: context/causing-pain (Limitations of ocaml-ci which hurt development workflows)
art-w (Contributor) commented on Aug 1, 2023:

Context

I don't understand the 0s timings displayed by the CI header on some jobs. The example link comes from https://github.com/ocurrent/current-bench/pull/438/checks?check_run_id=13718052712, which has other jobs with understandable timings.

Steps to reproduce

```
2023-07-31 15:44.43: New job: test ocurrent/current-bench https://github.com/ocurrent/current-bench.git#refs/heads/schema-infix-op (43c5f8cf5f7c5dfa7e1be4a50b904561c9af462c) (linux-arm64:debian-12-4.14_arm64_opam-2.1)
...
2023-07-31 15:44.43: Waiting for resource in pool OCluster
2023-07-31 17:06.50: Waiting for worker…
2023-07-31 17:22.45: Got resource from pool OCluster
...
2023-07-31 17:38.24: Job succeeded
```

Expected behaviour

16min in queue (or 1h32min?)
Ran for 16min
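
For reference, a minimal sketch (plain OCaml, not ocaml-ci code) of the arithmetic behind these expected figures, using the timestamps from the log excerpt above:

```ocaml
(* Sketch only: derive the expected durations from the log timestamps.
   The queue is counted from "Waiting for worker" to "Got resource";
   counting from the 15:44.43 submission instead gives a longer figure. *)
let to_secs h m s = float_of_int ((h * 3600) + (m * 60) + s)

let () =
  let waiting_for_worker = to_secs 17 06 50 in (* Waiting for worker... *)
  let got_resource       = to_secs 17 22 45 in (* Got resource from pool *)
  let job_succeeded      = to_secs 17 38 24 in (* Job succeeded *)
  Printf.printf "in queue: ~%.0f min, ran for: ~%.0f min\n"
    ((got_resource -. waiting_for_worker) /. 60.)  (* ~16 min *)
    ((job_succeeded -. got_resource) /. 60.)       (* ~16 min *)
```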

art-w added the type/bug (Something isn't working) label on Aug 1, 2023
benmandrew (Contributor) commented:

Thanks for the report. IIRC these are cached results (and thus cached logs) so the queue- and run-time in the header are correct, but it would be ideal to mark them as cached to prevent confusion. I seem to remember this was a surprisingly hard problem to solve @novemberkilo?
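
A toy illustration (hypothetical types, not ocurrent's actual schema) of why a cache hit renders this way: when a result is replayed from the cache, its queued/started/finished timestamps effectively coincide, so both deltas come out as 0s.

```ocaml
(* Toy model only: for a replayed cache hit the three timestamps
   coincide, so the header computes 0s for both spans. *)
type job_times = { queued_at : float; started_at : float; finished_at : float }

let of_cache_hit ~now = { queued_at = now; started_at = now; finished_at = now }

let header t =
  Printf.sprintf "%.0fs in queue, ran for %.0fs"
    (t.started_at -. t.queued_at)
    (t.finished_at -. t.started_at)

let () = print_endline (header (of_cache_hit ~now:0.))
(* prints: 0s in queue, ran for 0s *)
```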

benmandrew added the context/causing-pain (Limitations of ocaml-ci which hurt development workflows) label and removed the type/bug (Something isn't working) label on Aug 1, 2023
novemberkilo (Contributor) commented:

Thanks @art-w -- as @benmandrew points out, unfortunately this is a known issue. Will add it to our list of fixes/enhancements. // @tmcgilchrist

moyodiallo self-assigned this on Oct 4, 2023
moyodiallo (Contributor) commented:

This issue is related to the work I did when connecting ocaml-ci to the solver-service. Because we send two different requests to OCluster, we start the job immediately; otherwise, going through the cluster connection, the job would be started twice and end up failing.

Merging those two different requests into a single request type sent to the solver-service could solve this issue.
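
For illustration, a hypothetical sketch of what such a single request type could look like: one variant whose cases cover the separate requests, so only one kind of submission reaches OCluster (the constructor names are made up, not the solver-service's actual API):

```ocaml
(* Hypothetical sketch, not the real solver-service API: merge the
   distinct requests into one variant type with a single entry point. *)
type solver_request =
  | Opam_selections of { opam_files : (string * string) list }
  | Tool_version of { tool : string }  (* e.g. ocamlformat, opam-dune-lint *)

let handle = function
  | Opam_selections _ -> () (* run the opam solver *)
  | Tool_version _ -> ()    (* look up the pinned tool version *)
```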

moyodiallo (Contributor) commented on Nov 17, 2023:

> Merging those two different requests into a single request type sent to the solver-service could solve this issue.

Instead of having to upgrade the solver-service API each time we add a new kind of request, it is preferable to have an analysis pool through which all the different requests are sent to the solver-service, at different times, over the same API. Some selections, such as ocamlformat and opam-dune-lint, were obtained via separate requests to the solver-service during the analysis job.

PR #888 fixes the issue at https://github.com/ocurrent/ocaml-ci/blame/b3c3facfe0e1e1e18dfd0389827f555908c1ee0b/lib/analyse.ml#L253, where the pool had been removed at some point.
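
A self-contained sketch of the pool idea behind that fix, using only the OCaml stdlib (this mirrors what an ocurrent job pool provides; it is not the actual code from PR #888, and the capacity is illustrative):

```ocaml
(* Sketch only (OCaml >= 4.12): cap concurrent analysis jobs so extra
   submissions wait for a slot instead of all starting at once. *)
let analysis_pool = Semaphore.Counting.make 6

let with_analysis_slot f =
  (* Time spent blocked here is genuine queue time. *)
  Semaphore.Counting.acquire analysis_pool;
  Fun.protect ~finally:(fun () -> Semaphore.Counting.release analysis_pool) f
```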

moyodiallo (Contributor) commented:

@art-w would you like to confirm the fix?

moyodiallo (Contributor) commented:

@benmandrew this could be closed, I think.

art-w (Contributor, Author) commented on Dec 6, 2023:

Oh sorry @moyodiallo, I didn't see your message! I'm not sure I understand the technical details, besides the 0s timings being related to the cache (and so it's obviously hard to fix)... So without digging into the code, I had a look at the latest commit on ocurrent, which shows a bunch of tasks with 0s duration: https://ocaml.ci.dev/github/ocurrent/ocurrent/commit/8e0b9d4bb348b13df8696fe63feba303b9a476fd (I don't know if the CI is running your fix though!)

(Also, I understand that there were other, higher-priority issues related to cluster jobs. I don't think the run duration is critical for end users; it's a bit confusing, but otherwise a minor issue.)

benmandrew (Contributor) commented:

@art-w you are correct: the issue is related to the ocurrent cache, not the cluster connection. As you saw, the issue still exists.

moyodiallo (Contributor) commented:

Sorry (@benmandrew, @art-w), I mixed things up: I solved another issue, thinking it was related. The issue I solved is the one where all the analysis jobs start with 0s in queue and a lot of them keep waiting at some point.
