
[Carry 38704] Implement exec kill API #41548

Open
wants to merge 1 commit into master

Conversation

@awesomebytes awesomebytes commented Oct 14, 2020

Closes #35703 #9098

Hello, this is just a rebase of #38704.

  • What I did

I rebased #38704 (which has all the rest of the questions answered).

Note that I do not know Go... I just really would like to have this feature implemented. I hope the CI can pick up any problems I introduced while rebasing.

@AkihiroSuda changed the title from "Implement exec kill API" to "[Carry 38704] Implement exec kill API" on Oct 14, 2020
@awesomebytes force-pushed the 35703-exec-kill branch 3 times, most recently from b5e70eb to abeca6b on October 16, 2020 12:31
@awesomebytes (Author)

Hey @AkihiroSuda I finally managed to fix all the CI errors. Could you take a look at this PR again please?

Thank you very much, and I'm sorry it took me so many tries. Feel free to squash the commits...

@AkihiroSuda (Member)

@dtaniwaki PTAL?

@dtaniwaki

Cool! Thank you for updating my PR, @awesomebytes. It looks great to me 👍

@awesomebytes (Author)

@dtaniwaki @AkihiroSuda I just pushed the changes you asked for, to the best of my understanding/abilities. It's in the hands of the CI now! (I did run the unit tests locally first; the integration tests take forever on my machine, and they're way faster in your CI.)

@awesomebytes (Author)

Hey @AkihiroSuda, all tests have passed (apparently a flaky test unrelated to my changes failed). Are we good to merge?

Additionally, how does the timeline look for API 1.42 to be available? Is it a matter of a few months?

And adding to that question, if I'd like to use it in my environment already, is there an 'easy' way? Something like generating .deb files from the build on my machine?

Thanks again!

@AkihiroSuda (Member)

v20.10 (API v1.41) is feature-frozen, so this PR will probably be merged after the release of v20.10, before v21.XX (v21.03?).

daemon/exec.go (outdated; resolved)
api/server/router/container/exec.go (outdated; resolved)
daemon/exec.go (outdated; resolved)
docs/api/version-history.md (outdated; resolved)
@awesomebytes (Author)

The PR is ready for review or hopefully merge again @AkihiroSuda @cpuguy83

The CI failed because of some DNS issue that I believe has nothing to do with this PR.

func (s *containerRouter) postContainerExecKill(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
version := httputils.VersionFromContext(ctx)
if versions.LessThan(version, "1.42") {
return errdefs.InvalidParameter(fmt.Errorf("postContainerExecKill requires API version 1.42, but the Docker daemon API version is %s", version))

nit: s/postContainerExecKill/exec kill/

// If we have a signal, look at it. Otherwise, do nothing
if sigStr := r.Form.Get("signal"); sigStr != "" {
var err error
if sig, err = signal.ParseSignal(sigStr); err != nil {

Is this what we do in the container API? I think we should have this in ContainerExecKill rather than here.


This block appears to be a copy-paste from postContainersKill:

// If we have a signal, look at it. Otherwise, do nothing
if sigStr := r.Form.Get("signal"); sigStr != "" {
	var err error
	if sig, err = signal.ParseSignal(sigStr); err != nil {
		return errdefs.InvalidParameter(err)
	}
}

Let me know how to proceed with this PR - leave both as-is, update only the exec kill one, or update both?
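For illustration, here is a rough sketch of the suggestion to parse the signal inside the backend instead of the router, reusing the helpers already quoted in this thread (pkg/signal, errdefs, getExecConfig, containerd.SignalProcess). This is only an assumption about the shape such a change could take, not the code in this PR:

```go
// Hypothetical daemon-side variant (sketch only): the router forwards the raw
// "signal" form value, and ContainerExecKill parses and defaults it.
// Assumes github.com/docker/docker/pkg/signal and github.com/docker/docker/errdefs.
func (daemon *Daemon) ContainerExecKill(ctx context.Context, name string, signame string) error {
	sig := signal.SignalMap["KILL"] // default, mirroring the container kill behaviour
	if signame != "" {
		s, err := signal.ParseSignal(signame)
		if err != nil {
			return errdefs.InvalidParameter(err)
		}
		sig = s
	}
	ec, err := daemon.getExecConfig(name)
	if err != nil {
		return err
	}
	return daemon.containerd.SignalProcess(ctx, ec.ContainerID, name, int(sig))
}
```

With that shape, the router would only need to pass the raw query value through, and postContainersKill could stay untouched.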

daemon/exec.go Outdated
-	logrus.Infof("Container %v, process %v failed to exit within %v of signal TERM - using the force", c.ID, name, termProcessTimeout)
-	daemon.containerd.SignalProcess(ctx, c.ID, name, int(signal.SignalMap["KILL"]))
+	logrus.Infof("Container %v, process %v failed to exit within %d seconds of signal TERM - using the force", c.ID, name, termProcessTimeout)
+	daemon.ContainerExecKill(ctx, name, uint64(signal.SignalMap["KILL"]))

Same as above

daemon/exec.go Outdated
@@ -276,15 +277,15 @@ func (daemon *Daemon) ContainerExecStart(ctx context.Context, name string, stdin
 	select {
 	case <-ctx.Done():
 		logrus.Debugf("Sending TERM signal to process %v in container %v", name, c.ID)
-		daemon.containerd.SignalProcess(ctx, c.ID, name, int(signal.SignalMap["TERM"]))
+		daemon.ContainerExecKill(ctx, name, uint64(signal.SignalMap["TERM"]))

Given that we already have the exec config, it seems better to call the containerd API directly than to grab this info again.

@adamjseitz

@awesomebytes are you planning to make these updates?

If not I'd be happy to do so and create a new PR. I'd like to see this get merged.

@awesomebytes (Author)

@adamjseitz I recently started a new job and got a new computer (so I lost my workspace for this), so unfortunately I don't foresee having much time to push this forward.

I've invited you to my fork's repository so you can make changes directly and we can keep this PR going (and the conversation in just one place; maintainers will probably appreciate it). I hope that works for you!

Thanks for taking the baton!

@adamjseitz

Looks like this needs a rebase (and commits squashed where it makes sense)

@thaJeztah Gave it a rebase and did a bit of squashing in the history. We could do more if you like, but I was trying to avoid squashing across commits with differing authors.

@adamjseitz

Ah, well, looks like my rebasing managed to break something; I'll have to take a look at that tomorrow.

@adamjseitz force-pushed the 35703-exec-kill branch 2 times, most recently from e8db0eb to 157d306 on October 16, 2021 03:37
@adamjseitz commented Oct 16, 2021

@thaJeztah @awesomebytes

Fixed what I could figure out, but the swagger API validation is failing, which has me quite confused. It appears that in the CI run, swagger generated some types in a different order than what is committed:

[2021-10-16T03:42:03.979Z] diff --git a/api/types/container/container_wait.go b/api/types/container/container_wait.go
[2021-10-16T03:42:03.979Z] index 49e05ae669..c2e69b5b25 100644
[2021-10-16T03:42:03.979Z] --- a/api/types/container/container_wait.go
[2021-10-16T03:42:03.979Z] +++ b/api/types/container/container_wait.go
[2021-10-16T03:42:03.979Z] @@ -6,14 +6,6 @@ package container // import "github.com/docker/docker/api/types/container"
[2021-10-16T03:42:03.979Z]  // See hack/generate-swagger-api.sh
[2021-10-16T03:42:03.979Z]  // ----------------------------------------------------------------------------
[2021-10-16T03:42:03.979Z]  
[2021-10-16T03:42:03.979Z] -// ContainerWaitOKBodyError container waiting error, if any
[2021-10-16T03:42:03.979Z] -// swagger:model ContainerWaitOKBodyError
[2021-10-16T03:42:03.979Z] -type ContainerWaitOKBodyError struct {
[2021-10-16T03:42:03.979Z] -
[2021-10-16T03:42:03.979Z] -	// Details of an error
[2021-10-16T03:42:03.979Z] -	Message string `json:"Message,omitempty"`
[2021-10-16T03:42:03.979Z] -}
[2021-10-16T03:42:03.979Z] -
[2021-10-16T03:42:03.979Z]  // ContainerWaitOKBody OK response to ContainerWait operation
[2021-10-16T03:42:03.979Z]  // swagger:model ContainerWaitOKBody
[2021-10-16T03:42:03.979Z]  type ContainerWaitOKBody struct {
[2021-10-16T03:42:03.979Z] @@ -26,3 +18,11 @@ type ContainerWaitOKBody struct {
[2021-10-16T03:42:03.979Z]  	// Required: true
[2021-10-16T03:42:03.979Z]  	StatusCode int64 `json:"StatusCode"`
[2021-10-16T03:42:03.979Z]  }
[2021-10-16T03:42:03.979Z] +
[2021-10-16T03:42:03.979Z] +// ContainerWaitOKBodyError container waiting error, if any
[2021-10-16T03:42:03.979Z] +// swagger:model ContainerWaitOKBodyError
[2021-10-16T03:42:03.979Z] +type ContainerWaitOKBodyError struct {
[2021-10-16T03:42:03.979Z] +
[2021-10-16T03:42:03.979Z] +	// Details of an error
[2021-10-16T03:42:03.979Z] +	Message string `json:"Message,omitempty"`
[2021-10-16T03:42:03.979Z] +}
[2021-10-16T03:42:03.979Z] 
[2021-10-16T03:42:03.979Z] Please update api/swagger.yaml with any API changes, then 
[2021-10-16T03:42:03.979Z] run hack/generate-swagger-api.sh. script returned exit code 1

I wasn't able to replicate this by running the validate scripts locally, and as far as I can tell this PR shouldn't affect those types (though I don't really know anything about swagger).

The remaining CI failures are the following known flaky/failing tests:

@awesomebytes (Author)

Hello @thaJeztah, @adamjseitz has done some great work to get the PR improved and it seems that the main problem is flaky tests. Can the tests be re-run by someone without adding additional commits?

@awesomebytes (Author)

Hello @thaJeztah, @adamjseitz, just a friendly ping about this PR again.

@AkihiroSuda (Member)

Could you rebase?

@adamjseitz

@AkihiroSuda thanks for taking a look. I rebased. The CI failures this time look like these known flaky tests:

@cpuguy83 (Member) left a comment

Overall looks great.

Can you squash commits?

@@ -290,6 +290,11 @@ type ExecStartCheck struct {
Tty bool
}

// ExecKillOptions is a temp struct used by execKill
type ExecKillOptions struct {

This looks like an unused type?

@adamjseitz

@cpuguy83

Overall looks great.

Can you squash commits?

I removed that unused type and squashed it down to three commits. I was not sure whether it is important to retain the DCO sign-offs from all three contributors; I can squash it into just one if you prefer.

Looks like all tests passed this time.

@thaJeztah (Member)

I was not sure if it is important to retain the DCO sign-offs from all three contributors. I can squash it into just one if you prefer.

It looks like the second and third commit are "touch ups" for the first one, so I think it's fine to squash the commits; w.r.t. the DCO sign-offs, it's ok to have multiple sign-offs in the commit, e.g.

Signed-Off-By: Person 1 <foo1@example.com>
Signed-Off-By: Person 2 <foo2@example.com>
Signed-Off-By: Person 3 <foo3@example.com>

@thaJeztah (Member) left a comment

Sorry for the slow response; this one kept dropping off my list. Overall, this looks good (thanks!) I left some comments/thoughts inline (feedback/input welcome!)

I did have some other thoughts though (perhaps worth a discussion); while looking at this, I started wondering if there would be situations where a graceful termination of exec processes would be needed.

I realize this endpoint is largely modelled after /container/{id}/kill, but for containers, we also have /container/{id}/stop, which allows for that (defaults to SIGTERM -> timeout -> SIGKILL). ContainerKill also looks to have extra handling;

func (daemon *Daemon) ContainerKill(name string, sig uint64) error {

e.g. for possibly dead processes:

moby/daemon/kill.go

Lines 141 to 146 in 12f1b3c

if err := daemon.killPossiblyDeadProcess(container, int(syscall.SIGKILL)); err != nil {
	// kill failed, check if process is no longer running.
	if isErrNoSuchProcess(err) {
		return nil
	}
}

Not sure if all of those are needed for exec though! But I see that ContainerExecStart does have some logic for graceful termination;

moby/daemon/exec.go

Lines 280 to 282 in 12f1b3c

case <-ctx.Done():
logrus.Debugf("Sending TERM signal to process %v in container %v", name, c.ID)
daemon.containerd.SignalProcess(ctx, c.ID, name, int(signal.SignalMap["TERM"]))

In case the /exec/{id}/kill endpoint is used with a custom signal, I think the only guarantee it would give is that the signal was sent; if the intent in that case is to check whether the process terminated, that would require manual checking (and re-trying with a different signal). Of course, other cases could be to only signal (non-terminal signals, e.g., to reload a process).

api/swagger.yaml Outdated
Comment on lines 8900 to 8902
  /exec/{id}/kill:
    post:
      summary: "Kill an exec instance"

Mostly a "random" thought; I know that we use kill for the equivalent container endpoint (and command, description), and the kill naming probably originates from it being the equivalent of kill on command line (on Linux).

That said; the naming has led to some confusion among users (as, depending on the signal, kill doesn't always kill)

So .. mostly a random thought (could use feedback on this); would it make sense to name this signal (as in: the endpoint would signal (send a signal to) an exec instance)? Wondering if that would be more descriptive.


The naming and graceful shutdown issues you mentioned above seem to be related, so I'll respond to both here.

As you noted above, /container/{id}/kill has some extra handling to wait for the container to stop if a SIGKILL signal is used. It is somewhat surprising to me that the choice of signal changes other behavior (and this doesn't seem to be noted in the endpoint's documentation). The endpoint does feel a bit muddled to me - is it intended to behave like the Linux kill command, or is it intended to kill the process? It can do either one, but some behaviors, such as "send SIGKILL and don't wait", don't seem to be possible through the API. All this seems to stem from the confusion regarding the name.

Given that, this implementation of exec kill might indeed be better described as /exec/{id}/signal in order to help differentiate it. A second endpoint, /exec/{id}/stop might also be useful, mirroring the behavior of the equivalent container API for graceful shutdown.

@adamjseitz commented Feb 4, 2022

Taking a further look, some more of the container APIs would probably make sense to have exec equivalents:

  • /container/{id}/execs (or some other way to list execs in a container? not sure if some way to do this exists that I missed).
  • /exec/{id}/attach (looks like this has been previously proposed, see Attach to running exec #9527)
  • /exec/{id}/wait

I don't think that we would want to hold up this PR to create all these, though. I expect the signal endpoint, combined with polling the existing inspect endpoint if the exit status is needed, will fulfill a lot of users' needs.
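As a rough client-side sketch of that "signal, then poll inspect" pattern: ContainerExecInspect below is the existing Go client method, while execSignal is a hypothetical stand-in for whatever client call would wrap the new endpoint once it exists (it is not part of the current client API).

```go
// Package example sketches "signal the exec, then poll inspect until it exits".
package example

import (
	"context"
	"errors"
	"time"

	"github.com/docker/docker/client"
)

// execSignal is hypothetical: today's Go client has no such method, so this would
// need a new client call (or a raw POST to /exec/{id}/signal?signal=<name>).
func execSignal(ctx context.Context, cli *client.Client, execID, sig string) error {
	return errors.New("not implemented: depends on the endpoint added by this PR")
}

// stopExecGracefully sends SIGTERM, polls the exec via the existing inspect
// endpoint, and escalates to SIGKILL if the process outlives the timeout.
func stopExecGracefully(ctx context.Context, cli *client.Client, execID string, timeout time.Duration) error {
	if err := execSignal(ctx, cli, execID, "SIGTERM"); err != nil {
		return err
	}
	deadline := time.After(timeout)
	ticker := time.NewTicker(200 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-deadline:
			// The process ignored SIGTERM; force-kill it.
			return execSignal(ctx, cli, execID, "SIGKILL")
		case <-ticker.C:
			inspect, err := cli.ContainerExecInspect(ctx, execID)
			if err != nil {
				return err
			}
			if !inspect.Running {
				return nil // exited; inspect.ExitCode holds the exit status
			}
		}
	}
}
```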

api/swagger.yaml Outdated
  /exec/{id}/kill:
    post:
      summary: "Kill an exec instance"
      description: "Kill an exec instance."

Perhaps this should use a similar descriptor as is used for container kill;

moby/api/swagger.yaml

Lines 6569 to 6572 in 01ae952

description: |
Send a POSIX signal to a container, defaulting to killing to the
container.
operationId: "ContainerKill"

responses:
204:
description: "No error"
404:

Looks like we need to add 400 as well (invalid parameter), which can happen if an invalid signal is passed.

@@ -18,6 +18,7 @@ type execBackend interface {
ContainerExecInspect(id string) (*backend.ExecInspect, error)
ContainerExecResize(name string, height, width int) error
ContainerExecStart(ctx context.Context, name string, stdin io.Reader, stdout io.Writer, stderr io.Writer) error
ContainerExecKill(ctx context.Context, name string, signame string) error

Wondering if we should make this take a (to be created) types.ExecKillConfig, so that more options can be added without making a long list of arguments (I'll leave another comment with some thoughts why this could be needed; also see #43206).
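For illustration, such an options struct could look something like this; the names below are only placeholders (the PR later settled on an ExecSignalConfig type), not the actual definition:

```go
// Illustrative only: an options struct keeps the backend signature stable while
// leaving room for more fields later, instead of growing the argument list.
package types

// ExecKillConfig holds the parameters for signalling an exec'd process.
type ExecKillConfig struct {
	// Signal is the signal name or number to send (e.g. "SIGTERM" or "15").
	Signal string
}

// The backend interface would then become something like:
//   ContainerExecKill(ctx context.Context, name string, config types.ExecKillConfig) error
```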

return err
}

return daemon.containerd.SignalProcess(ctx, e.ContainerID, name, int(sig))

Do we know what (kind of) errors this can return?


I tried to trace the possibilities here, and I think we either get an errdefs.NotFound or the result of errdefs.FromGRPC. I'm not sure which errors we might end up with via the gRPC calls though.
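For reference, a defensive mapping at that call site might look roughly like the sketch below, assuming the errdefs helpers behave the same way they do elsewhere in the daemon; this is not part of the PR, just an illustration of where such handling could sit:

```go
// Sketch: translate whatever SignalProcess returns into API error classes
// (ec, name, and sig as in the quoted code above).
if err := daemon.containerd.SignalProcess(ctx, ec.ContainerID, name, int(sig)); err != nil {
	if errdefs.IsNotFound(err) {
		// daemon-side lookup failures already carry the right error class
		return err
	}
	// gRPC errors from containerd get mapped to the closest errdefs class
	return errdefs.FromGRPC(err)
}
return nil
```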

@thaJeztah thaJeztah mentioned this pull request Feb 3, 2022
Signed-off-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com>
Signed-off-by: Sammy Pfeiffer <sammypfeiffer@gmail.com>
Signed-off-by: Adam Seitz <adamjseitz@gmail.com>
@adamjseitz

@thaJeztah

I pushed an update addressing your comments.

  • Renamed the endpoint from kill -> signal
  • Added an ExecSignalConfig type
  • Made updates to swagger.yaml.
  • Squashed everything to one commit.

CI failure looks like a known flaky test #39352. Let me know if we need to do anything differently with errors from daemon.containerd.SignalProcess. I am not too familiar with the error types.

agnostic-apollo added a commit to agnostic-apollo/termux-packages that referenced this pull request Jul 26, 2022
… to commands in some cases leaving processes still running

If `--tty` is not passed to `docker exec` because stdout is not available (`[ ! -t 1 ]`), such as due to redirection to a file (`&> build.log`), or if stdin is not available (`< /dev/null`), then docker does not forward kill signals to the started processes and they remain running.

To fix the issue, the `DOCKER_EXEC_PID_FILE_PATH` env variable with the value `/tmp/docker-exec-pid-<timestamp>` is passed to the process called with `docker exec`, and the started process stores its pid in the file path passed. Traps are set in `run-docker.sh`, which runs the `docker exec` command, to receive any kill signals; if one is received, it runs another `docker exec` command to read the pid of the previously started process from `DOCKER_EXEC_PID_FILE_PATH` and then kills it and all its children.

See Also:

docker/cli#2607
moby/moby#9098
moby/moby#41548
https://stackoverflow.com/questions/41097652/how-to-fix-ctrlc-inside-a-docker-container

Also passing `--init` to `docker run` to "Run an init inside the container that forwards signals and reaps processes"; although it does not work for the above cases, it may be helpful in others. The `--init` flag changes will only take effect on new container creation.

https://docs.docker.com/engine/reference/run/#specify-an-init-process

https://docs.docker.com/engine/reference/commandline/run/

```
./scripts/run-docker.sh ./build-package.sh -f libjpeg-turbo  &> build.log
^C
$ ./scripts/run-docker.sh ps -efww
Running container 'termux-package-builder' from image 'termux/package-builder'...
UID          PID    PPID  C STIME TTY          TIME CMD
builder        1       0  0 05:48 pts/0    00:00:00 bash
builder     9243       0  0 06:01 pts/1    00:00:00 bash
builder    28127       0  0 06:12 ?        00:00:00 /bin/bash ./build-package.sh -f libjpeg-turbo
builder    28141   28127  0 06:12 ?        00:00:00 /bin/bash ./build-package.sh -f libjpeg-turbo
builder    28449   28141  1 06:12 ?        00:00:00 ninja -w dupbuild=warn -j 8
builder    28656   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28657   28656 79 06:12 ?        00:00:01 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28694   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28695   28694 89 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28728   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28729   28728  0 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28731   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28734   28731  0 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28740   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28741   28740  0 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28744       0  0 06:12 pts/2    00:00:00 ps -efww
builder    28748   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28752   28748  0 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28753   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28754   28753  0 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28755   28449  0 06:12 ?        00:00:00 ninja -w dupbuild=warn -j 8
$ ./scripts/run-docker.sh ./build-package.sh -f libjpeg-turbo  &> build.log
$ ./scripts/run-docker.sh ./build-package.sh -f libjpeg-turbo
Running container 'termux-package-builder' from image 'termux/package-builder'...
ERROR: Another build is already running within same environment.
```
agnostic-apollo added a commit to agnostic-apollo/termux-packages that referenced this pull request Jul 26, 2022
… to commands in some cases leaving processes still running

agnostic-apollo added a commit to agnostic-apollo/termux-packages that referenced this pull request Jul 28, 2022
… to commands in some cases leaving processes still running

@awesomebytes (Author)

@thaJeztah I guess this PR will need to be updated again and reviewed again?

@adamjseitz

@thaJeztah I guess this PR will need to be updated again and reviewed again?

I'd be happy to update it again if there's a chance it's actually going to get merged this time.

@uellenberg

I would love to see this feature available in Docker! I'm not terribly familiar with Moby's internals, but I can give updating it a shot.


Successfully merging this pull request may close these issues.

introduce docker exec kill
8 participants