Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

learn-ocaml build -j 2 is broken #414

Closed
erikmd opened this issue Aug 27, 2021 · 3 comments · Fixed by #481 · May be fixed by #416
Closed

learn-ocaml build -j 2 is broken #414

erikmd opened this issue Aug 27, 2021 · 3 comments · Fixed by #481 · May be fixed by #416

Comments

@erikmd
Copy link
Member

erikmd commented Aug 27, 2021

Bug description

It seems there's a race condition in learn-ocaml build that prevents us from safely using the --jobs option.

To Reproduce

Steps to reproduce the behavior:

  • Clone learn-ocaml (git clone https://github.com/ocaml-sf/learn-ocaml.git)
  • Run cd learn-ocaml
  • Docker-pull learn-ocaml master (docker pull ocamlsf/learn-ocaml:master)
  • Run docker run --rm -it -v "$PWD/demo-repository:/repository" ocamlsf/learn-ocaml:master -j 2

Then, there is high probability that we get:

Learnocaml v.0.12 running.
Updating app at ./www
Cannot process exercises: Unix.Unix_error(Unix.EEXIST, "mkdir", "./www/static")

if the repository contains 2+ exercises (which is now the case of demo-repository).

Expected behavior

No error should happen.

@erikmd
Copy link
Member Author

erikmd commented Aug 27, 2021

As an aside, when this bug will be fixed:

we may want to add a -j 2 (or -j $(nproc)) flag in the biggest, learn-ocaml-corpus based, test of our CI to get a speed-up…

@AltGr
Copy link
Collaborator

AltGr commented Sep 17, 2021

I have examined this a bit, and it attempts to use Lwt_unix.fork() which just can't be made to work reliably (see e.g. ocsigen/lwt#812 and there are others like it) ; duplicating the Lwt state in the child process is just going to result in a mess, I am even puzzled that there appears to have been an Lwt version where it worked at some point.

So the option should just be disabled at this point. The more reasonable implementation could be:

  • define a dedicated CLI command to compile a specific exercise
  • run a whole process with that command inside the Lwt_list.map_p on each exercise, using e.g. Lwt_process.exec. The difficulty is to make sure that all relevant options in the current context are properly passed to the child process.

Or, of course, we could wait for multicore which should make it fairly trivial 🚀

@erikmd
Copy link
Member Author

erikmd commented Sep 17, 2021

Hi @AltGr, thanks a lot for investigating, and for these pointers.

Or, of course, we could wait for multicore which should make it fairly trivial

Yes, that sounds a good trade-off → and replacing the learn-ocaml build -j $n option with an error, or just with (a -j 1 fallback + a warning)… I don't know what'd be more sensible 🤔

BTW, I've just skimmed the open issues, and it seems this had been spotted some time ago;
(just putting the URL here to link both issues: #158)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants