Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: new option à la --replace to leverage this existing feature in a docker restart context #594

Closed
erikmd opened this issue Feb 29, 2024 · 8 comments · Fixed by #597 · May be fixed by #600
Closed

Feature: new option à la --replace to leverage this existing feature in a docker restart context #594

erikmd opened this issue Feb 29, 2024 · 8 comments · Fixed by #597 · May be fixed by #600
Assignees
Labels
kind: feature New user-facing feature.

Comments

@erikmd
Copy link
Member

erikmd commented Feb 29, 2024

Related user(s):

@erikmd, @AltGr

Related issue(s) or PR(s):

Related project scope(s):

No response

The problem:

  • learn-ocaml 1.0.0 's learn-ocaml build serve --replace is very nice 💯
  • but in a docker-compose context, on an exodir update, we don't successively start 2 learn-ocaml procs
    (anyway, there's the 1-proc-per-container rule!)
  • we rather use docker compose restart, and we would really like to benefit from the --replace semantics in this case

Wanted solution:

  • add an alternative option to --replace, e.g. --fork-replace, or --smooth-replace

Considered alternatives:

  • (add an additional option, passed along with --replace, e.g. --fork or --quick-restart)
  • (but issue if --fork is passed without --replace)

Additional context:

We should think about ENV VARS as well! (e.g., a single environment variable that can be set to {}="0", "1" or "2")

@erikmd erikmd added the kind: feature New user-facing feature. label Feb 29, 2024
@erikmd erikmd changed the title Feature: new option à la --replace to be leverage this latter option in docker restart context Feature: new option à la --replace to leverage this existing feature in a docker restart context Feb 29, 2024
@AltGr
Copy link
Collaborator

AltGr commented Mar 1, 2024

I am not sure the executable itself can do anything in the case when it is containerised in this way (I don't understand what --fork-replace would do ?): AFAIK replacing a containerised server with minimal downtime is one of the core functions of any load-balancer / orchestrator: start new container, wait for it to start accepting connections, start routing incoming connections to it, kill older container -- was'nt that available before I added the option for the stand-alone case ?

Maybe the new option would help with the management of stored state during such transitions ?

@erikmd
Copy link
Member Author

erikmd commented Mar 1, 2024

Hi @AltGr, thanks a lot for your comment! Indeed, the overall aim is to have a Dockerised server with minimal downtime.

I agree that this may be possible to implement this by using a Docker Swarm or Kubernetes balancer, several instances sharing the www folder, but this looks quite difficult to me to implement though. And in usual teaching context, we only have ~100 students, where a single instance using Docker Compose already scales for that, while the --replace option can't be used in this context IINM. Indeed:

  • in general we only have 1 main process per container,
  • and if we spin two different instances, the www folder is not shared between containers, nor the TCP interface (each container has a localhost:8080 address which is local).

So my idea was to support a similar CLI option, to offer this high-availability feature easily within the app's logic, namely:

  • at container startup, the default command learn-ocaml build serve --repo=/repository --fork-replace is run;
  • if the www folder already exist (and should be rebuilt), it forks its own process; the first one serves the existing www, and the second one behaves as learn-ocaml build serve --repo=/repository --replace;
  • if the www folder doesn't exist, the option --fork-replace is ignored and no fork occurs.

→ What do you think?

Have a nice WE!

@erikmd
Copy link
Member Author

erikmd commented Mar 1, 2024

Maybe the new option would help with the management of stored state during such transitions ?

I didn't think about this (it's another topic!): maybe there will also be sth to do to get a safer --replace; I don't know.

@AltGr
Copy link
Collaborator

AltGr commented Mar 4, 2024

Ah! I understand better your intent.
Wouldn't [ -d www ] && learn-ocaml serve & learn-ocaml build serve --replace work then ?

@erikmd
Copy link
Member Author

erikmd commented Mar 4, 2024

Hi @AltGr! Mostly yes, this could work. (Except that [ -d www ] && … would yield a bad exit code if www doesn't exist.)

Actually when Docker entrypoints come into play, I believe it's always slightly better to avoid spinning a shell, to ensure that the entrypoint binary always receives proper signals (see e.g. https://hynek.me/articles/docker-signals/)

So, what do we decide?

Option 1. Just keep learn-ocaml CLI as is, and change the Docker entrypoint. (I downvote this one because of signals.)
Option 2. Implement --fork-replace with Unix.fork along with the current implementation of --replace.
Option 3. Implement --fork-replace with Unix.fork and with Unix.kill (given we know the PID of the child process)

And if ever Option 2. or 3. looks good to you, would you prefer to implement it, or to review the PR ? :)

@AltGr
Copy link
Collaborator

AltGr commented Mar 22, 2024

After reading https://hynek.me/articles/docker-signals/ I still think signals should work, since we already use dumb-init... it may be worth testing anyway. Otherwise no strong objection to option 3 (maybe with a more high-level name than fork-init, that doesn't convey the intent: something like --serve-right-away would be more explicit).
(also, be very careful with forking when any lwt stuff is going on, it's very easy to have extremely weird stuff happening even if you tried to clean up all lwt threads in the child)

@erikmd
Copy link
Member Author

erikmd commented Mar 22, 2024

Thanks @AltGr !

After reading https://hynek.me/articles/docker-signals/ I still think signals should work, since we already use dumb-init...

AFAIU, it would be very OK if the shell is ephemeral (using the exec builtin), which is not the case.

it may be worth testing anyway.

Sure, I can volunteer to do more tests.

Otherwise no strong objection to option 3 (maybe with a more high-level name than fork-init, that doesn't convey the intent: something like --serve-right-away would be more explicit).

OK, good point.

(also, be very careful with forking when any lwt stuff is going on, it's very easy to have extremely weird stuff happening even if you tried to clean up all lwt threads in the child)

OK. Would you have a few suggestions in mind, regarding the things to monitor ? maybe, with a dedicated tool ?

And do you think that the potentiel weird things that can occur would be specific to the forking ? or would they already occur with the current implementation of --replace ?

Have a very nice WE!

@AltGr
Copy link
Collaborator

AltGr commented Mar 25, 2024

Ah indeed, the Dockerfile ENTRYPOINT is ["dumb-init","/usr/bin/learn-ocaml","--sync-dir=/sync","--repo=/repository"], which doesn't go through a shell, but we'd need to interspede the snippet between the dumb-init and the learn-ocaml process. Of course we could exec the second run of learn-ocaml but we'll still run into problems if one tries to interrupt while the "temporary server" is still active (and that one will have to be in a subshell anyway, even if we exec it).

I'm still pretty sure it can be done... but the balance of which approach is simplest might be changed :)

As long as you fork before running Lwt_main.run you'll be fine. After that, you can use Lwt_unix.fork but if there is any lwt calls within the child, extremely bad and confusing things will happen. It's just fork, replace just kills an external process, which is safe.

erikmd added a commit to pfitaxel/learn-ocaml that referenced this issue Apr 9, 2024
Motivation:

- Makes it possible to provide a feature similar to the `--replace` option
  (namely, `learn-ocaml build serve --replace`) within a Docker context.

  As `--replace` needed to successively start 2 learn-ocaml processes
  listening to the same port, but if we spin two different containers,
  ./www is not shared between containers, nor the local TCP interface.

- If tweaking the default entrypoint could be an alternative solution,
  the shell script would be involved to cope with the need to handle
  signals properly.

- The solution implemented in this patch is simpler and can be enabled
  in a docker-compose context by passing:
  ```
  environment:
    LEARNOCAML_SERVE_DURING_BUILD: 'true'
  ```
  then run a command such as `docker restart learn-ocaml`.

Remarks:

- Using docker-compose, to restart a server and benefit from this feature, use
  ( docker compose stop ; docker compose restart )
  rather than
  ( docker compose down ; docker compose up -d )

- This commit has been double-checked with both native server
  ( make ; make opaminstall ;
    learn-ocaml build serve --serve-during-build --repo=$REPO )
  and bytecode server
  ( make ; make opaminstall ; mv $(which learn-ocaml-server){,~} ;
    learn-ocaml build serve --serve-during-build --repo=$REPO )

Close ocaml-sf#594
@erikmd erikmd self-assigned this Apr 9, 2024
erikmd added a commit to pfitaxel/learn-ocaml that referenced this issue Apr 10, 2024
Motivation:

- Makes it possible to provide a feature similar to the `--replace` option
  (namely, `learn-ocaml build serve --replace`) within a Docker context.

  As `--replace` needed to successively start 2 learn-ocaml processes
  listening to the same port, but if we spin two different containers,
  ./www is not shared between containers, nor the local TCP interface.

- If tweaking the default entrypoint could be an alternative solution,
  the shell script would be involved to cope with the need to handle
  signals properly.

- The solution implemented in this commit is simpler and can be enabled
  in a docker-compose context by passing:
  ```
  environment:
    LEARNOCAML_SERVE_DURING_BUILD: 'true'
  ```
  or:
  ```
  environment:
    - 'LEARNOCAML_SERVE_DURING_BUILD=true'
  ```
  then run a command such as `docker restart learn-ocaml`.

Remarks:

- Using docker-compose, to restart a server and benefit from this feature, use
  ( docker compose stop ; docker compose restart )
  rather than
  ( docker compose down ; docker compose up -d )

- This commit has been double-checked with both native server
  ( make ; make opaminstall ;
    learn-ocaml build serve --serve-during-build --repo=$REPO )
  and bytecode server
  ( make ; make opaminstall ; mv $(which learn-ocaml-server){,~} ;
    learn-ocaml build serve --serve-during-build --repo=$REPO )

- For uniformity, this commit also introduces an environment variable
  'LEARNOCAML_REPLACE=true' as a fallback for `--replace`.

Close ocaml-sf#594
erikmd added a commit to pfitaxel/learn-ocaml that referenced this issue Apr 11, 2024
Motivation:

- Makes it possible to provide a feature similar to the `--replace` option
  (namely, `learn-ocaml build serve --replace`) within a Docker context.

  As `--replace` needed to successively start 2 learn-ocaml processes
  listening to the same port, but if we spin two different containers,
  ./www is not shared between containers, nor the local TCP interface.

- If tweaking the default entrypoint could be an alternative solution,
  the shell script would be involved to cope with the need to handle
  signals properly.

- The solution implemented in this commit is simpler and can be enabled
  in a docker-compose context by passing:
  ```
  environment:
    LEARNOCAML_SERVE_DURING_BUILD: 'true'
  ```
  or:
  ```
  environment:
    - 'LEARNOCAML_SERVE_DURING_BUILD=true'
  ```
  then run a command such as `docker restart learn-ocaml`.

Remarks:

- Using docker-compose, to restart a server and benefit from this feature, use
  ( docker compose stop ; docker compose restart )
  rather than
  ( docker compose down ; docker compose up -d )

- This commit has been double-checked with both native server
  ( make ; make opaminstall ;
    learn-ocaml build serve --serve-during-build --repo=$REPO )
  and bytecode server
  ( make ; make opaminstall ; mv $(which learn-ocaml-server){,~} ;
    learn-ocaml build serve --serve-during-build --repo=$REPO )

- For uniformity, this commit also introduces an environment variable
  'LEARNOCAML_REPLACE=true' as a fallback for `--replace`.

Close ocaml-sf#594
erikmd added a commit to pfitaxel/learn-ocaml that referenced this issue Apr 11, 2024
Motivation:

- Makes it possible to provide a feature similar to the `--replace` option
  (namely, `learn-ocaml build serve --replace`) within a Docker context.

  As `--replace` needed to successively start 2 learn-ocaml processes
  listening to the same port, but if we spin two different containers,
  ./www is not shared between containers, nor the local TCP interface.

- If tweaking the default entrypoint could be an alternative solution,
  the shell script would be involved to cope with the need to handle
  signals properly.

- The solution implemented in this commit is simpler and can be enabled
  in a docker-compose context by passing:
  ```
  environment:
    LEARNOCAML_SERVE_DURING_BUILD: 'true'
  ```
  or:
  ```
  environment:
    - 'LEARNOCAML_SERVE_DURING_BUILD=true'
  ```
  then run a command such as `docker restart learn-ocaml`.

Remarks:

- Using docker-compose, to restart a server and benefit from this feature, use
  ( docker compose stop ; docker compose restart )
  rather than
  ( docker compose down ; docker compose up -d )

- This commit has been double-checked with both native server
  ( make ; make opaminstall ;
    learn-ocaml build serve --serve-during-build --repo=$REPO )
  and bytecode server
  ( make ; make opaminstall ; mv $(which learn-ocaml-server){,~} ;
    learn-ocaml build serve --serve-during-build --repo=$REPO )

- For uniformity, this commit also introduces an environment variable
  'LEARNOCAML_REPLACE=true' as a fallback for `--replace`.

Close ocaml-sf#594
erikmd added a commit to pfitaxel/learn-ocaml that referenced this issue Apr 19, 2024
Motivation:

- Makes it possible to provide a feature similar to the `--replace` option
  (namely, `learn-ocaml build serve --replace`) within a Docker context.

  As `--replace` needed to successively start 2 learn-ocaml processes
  listening to the same port, but if we spin two different containers,
  ./www is not shared between containers, nor the local TCP interface.

- If tweaking the default entrypoint could be an alternative solution,
  the shell script would be involved to cope with the need to handle
  signals properly.

- The solution implemented in this commit is simpler and can be enabled
  in a docker-compose context by passing:
  ```
  environment:
    LEARNOCAML_SERVE_DURING_BUILD: 'true'
  ```
  or:
  ```
  environment:
    - 'LEARNOCAML_SERVE_DURING_BUILD=true'
  ```
  then run a command such as `docker restart learn-ocaml`.

Remarks:

- Using docker-compose, to restart a server and benefit from this feature, use
  ( docker compose stop ; docker compose restart )
  rather than
  ( docker compose down ; docker compose up -d )

- This commit has been double-checked with both native server
  ( make ; make opaminstall ;
    learn-ocaml build serve --serve-during-build --repo=$REPO )
  and bytecode server
  ( make ; make opaminstall ; mv $(which learn-ocaml-server){,~} ;
    learn-ocaml build serve --serve-during-build --repo=$REPO )

- For uniformity, this commit also introduces an environment variable
  'LEARNOCAML_REPLACE=true' as a fallback for `--replace`.

Close ocaml-sf#594
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: feature New user-facing feature.
Projects
None yet
2 participants