
Kill docker exec command will not terminate the spawned process #9098

Open
dqminh opened this issue Nov 11, 2014 · 67 comments
Labels
area/runtime exp/expert kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. version/master

Comments

@dqminh
Contributor

dqminh commented Nov 11, 2014

Whenever a process is launched via docker exec, it seems that killing docker exec will not terminate the process. For example:

> docker run -d --name test-exec busybox top
> docker exec -it test-exec sh
/ # # we have an exec shell now. assume pid of docker exec is 1234
> kill 1234
# docker exec process is terminated atm, but `nsenter-exec` process is still running with sh as its child

I would expect that killing the docker exec -it process would also kill the spawned process, or that there should be a way to stop the spawned process, similar to how docker stop works.

My version of docker:

❯ docker version
Client version: 1.3.1-dev
Client API version: 1.16
Go version (client): go1.3.3
Git commit (client): c049949
OS/Arch (client): linux/amd64
Server version: 1.3.1-dev
Server API version: 1.16
Go version (server): go1.3.3
Git commit (server): c049949

❯ docker info
Containers: 1
Images: 681
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 693
Execution Driver: native-0.2
Kernel Version: 3.13.0-33-generic
Operating System: Ubuntu 14.04.1 LTS
CPUs: 2
Total Memory: 1.955 GiB
Debug mode (server): true
Debug mode (client): false
Fds: 17
Goroutines: 16
EventsListeners: 0
Init Path: /home/action/bin/docker
Username: dqminh
Registry: [https://index.docker.io/v1/]
WARNING: No swap limit support
@SvenDowideit
Contributor

mmm, I've just followed your example, and have a mildly different result?

[sven@t440s docker]$ docker run -d -name test-exec busybox top
Warning: '-name' is deprecated, it will be replaced by '--name' soon. See usage.
0daecd23a78f05990466c9f7d1094c737771a0cc15142588bb57ebd6b7f99c5f
[sven@t440s docker]$ docker exec -it test-exec sh
/ # ps
PID   USER     COMMAND
    1 root     top
    7 root     sh
   13 root     ps
/ # kill 7
/ # ps
PID   USER     COMMAND
    1 root     top
    7 root     sh
   14 root     ps
/ # kill -9 7
[sven@t440s docker]$ docker exec -it test-exec ps aux
PID   USER     COMMAND
    1 root     top
   15 root     ps aux
[sven@t440s docker]$ docker version
Client version: 1.3.1
Client API version: 1.15
Go version (client): go1.3.3
Git commit (client): 4e9bbfa
OS/Arch (client): linux/amd64
Server version: 1.3.1
Server API version: 1.15
Go version (server): go1.3.3
Git commit (server): 4e9bbfa
[sven@t440s docker]$ 

and the non-containerised version works the same:

[sven@t440s docker]$ sh
sh-4.2$ ps
  PID TTY          TIME CMD
11920 pts/3    00:00:00 bash
12090 pts/3    00:00:00 sh
12091 pts/3    00:00:00 ps
sh-4.2$ kill 12090
sh-4.2$ 
sh-4.2$ ps
  PID TTY          TIME CMD
11920 pts/3    00:00:00 bash
12090 pts/3    00:00:00 sh
12092 pts/3    00:00:00 ps
sh-4.2$ kill -HUP 12090
Hangup

so for me, it works as intended.

@dqminh
Contributor Author

dqminh commented Nov 12, 2014

@SvenDowideit ah, my use case is that the docker exec process is killed from outside of the container, not the process started by docker exec inside the container. For example, after running docker exec, the tree will look like this (pseudo PIDs here to illustrate the point):

1024 --- docker run -d -it --name test-exec busybox top
1025 --- docker exec -it --name test-exec sh
10 --- docker -d
  \ 10000 --- top
  \ 10001 --- nsenter-exec --nspid 23119 --console /dev/pts/19 -- sh
          \---- sh

Now if I do kill 1025, which kills the docker exec process, the process tree becomes:

1024 --- docker run -d -it --name test-exec busybox top
10 --- docker -d
  \ 10000 --- top
  \ 10001 --- nsenter-exec --nspid 23119 --console /dev/pts/19 -- sh
          \---- sh

I would expect nsenter-exec to be killed as well, and/or maybe docker should expose a way to programmatically stop the exec process from outside.

@SvenDowideit
Contributor

ah, good to know more info :)

@dqminh
Contributor Author

dqminh commented Nov 12, 2014

ah, good to know more info :)

Yes, I should have included the process tree from the start as it makes it much easier to see what's going on. I shouldn't submit issues at 5am, I guess :(

@SvenDowideit
Contributor

mmm, ok, so I agree - I too would expect that docker exec would trap the kill signal and pass it on to the Docker daemon, which should then pass the signal on to the exec'd child

I don't see much in the way of support for this in the API, http://docs.docker.com/reference/api/docker_remote_api_v1.15/#exec-create, so bug?

@SvenDowideit
Contributor

@vieux @proppy what do you think? (I'm going off the MAINTAINERs file :))

@proppy
Contributor

proppy commented Nov 12, 2014

Yes, and I don't see where the PID of the child is stored (if at all) in the ExecConfig.

@proppy
Contributor

proppy commented Nov 12, 2014

/cc @vishh

@vishh
Contributor

vishh commented Nov 12, 2014

Terminating a running 'exec' session via an API has not been implemented yet.
@proppy: Yes, the child pid is not stored as part of ExecConfig.


@dqminh
Contributor Author

dqminh commented Nov 12, 2014

@vishh do you think adding support for POST /exec/:name/stop (and maybe POST /exec/:name/kill) makes sense here (similar to POST /containers/:name/stop and POST /containers/:name/kill)? That would actually solve the majority of my use case, as I mainly consume the remote API (which makes the exec process's unique id available via POST /exec/:name/create).

It's probably much harder to do it from the docker cli though as we don't really expose the exec's id anywhere.
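As a rough illustration of the proposal, here is a minimal Python sketch of how a client might call such an endpoint over the local Docker unix socket. The /exec/:id/stop route is hypothetical (it was proposed in this thread but never added to the Remote API), so the request would fail against a real daemon; only the path construction is concrete.

```python
import http.client
import socket

DOCKER_SOCK = "/var/run/docker.sock"

class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection routed over the local Docker unix socket."""
    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(DOCKER_SOCK)

def stop_exec_path(exec_id: str) -> str:
    # Hypothetical endpoint from this proposal; it was never added to
    # the Remote API, so this request would 404 on a real daemon.
    return f"/exec/{exec_id}/stop"

def stop_exec(exec_id: str) -> int:
    # exec_id is the id returned by POST /exec/create.
    conn = UnixHTTPConnection("localhost")
    conn.request("POST", stop_exec_path(exec_id))
    return conn.getresponse().status

print(stop_exec_path("abc123"))
```

The id needed here is exactly the one the remote API already hands back from exec-create, which is why this works for API consumers but not (yet) for the CLI.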

@vishh
Contributor

vishh commented Nov 12, 2014

Yes. A stop/kill daemon API makes sense to me. For the CLI case, I need to see if the daemon can automatically terminate an abandoned interactive 'exec' command.


@LK4D4
Contributor

LK4D4 commented Nov 12, 2014

@vishh I'm not sure how we can implement this auto-terminate. Maybe we can have some list API for exec? And make exec jobs dependent on the container, so that on container deletion all abandoned jobs are deleted too.

@vishh
Contributor

vishh commented Nov 12, 2014

AFAIK exec jobs should get terminated on container deletion. Is that not the case?


@thaJeztah
Member

Maybe we can have some list api for exec?

Perhaps add a way to see all processes related to a container? Eg

docker containers ps <containerid>

Which will include the exec process.

@vishh
Contributor

vishh commented Nov 12, 2014

Good point. We should expose exec jobs belonging to a container.


@LK4D4
Contributor

LK4D4 commented Nov 12, 2014

@vishh eh, I meant the internal execStore. Yeah, it is a little different, because I wanted to add a method for getting the exitCode of an exec job and to be sure that the job will be deleted from the execStore. (All I can imagine is pretty ugly.)

@anandkumarpatel
Contributor

+1
This also causes a goroutine leak: three goroutines are leaked whenever this happens.

@dqminh
Contributor Author

dqminh commented Nov 14, 2014

I proposed additional extensions to the remote API to stop/kill exec commands here: #9167. That should fix my particular use case (programmatically managing exec commands).

The proposal doesn't include CLI changes, as I'm not sure yet what the appropriate interface for exposing exec sessions is.

@thomasthiriez

An alternative to killing the spawned process would be to close stdin, stdout, and stderr when docker exec is killed. In most cases, such as when a shell is being exec'ed, the spawned process will quit when its stdin is closed.

Currently, it seems that when docker exec is killed, the spawned process still has a stdin with nobody attached to it.

I don't know if closing stdin would be a better alternative to killing the spawned process.
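The stdin-close behaviour described above can be demonstrated locally, without Docker; a minimal Python sketch, using a plain sh process as a stand-in for the exec'ed shell:

```python
import subprocess

# Local stand-in for the process behind `docker exec -i ... sh`:
# a shell reading commands from a pipe (no Docker involved).
proc = subprocess.Popen(
    ["sh"], stdin=subprocess.PIPE, stdout=subprocess.DEVNULL)

proc.stdin.close()       # simulate the exec client going away
proc.wait(timeout=5)     # sh sees EOF on stdin and exits cleanly

print("exit code:", proc.returncode)
```

The shell exits on EOF, which is exactly the behaviour this comment suggests Docker could rely on instead of forwarding signals.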

@jessfraz
Contributor

I get this on latest:

/ # kill 3899
sh: can't kill pid 3899: No such process
/ # kill 3900
sh: can't kill pid 3900: No such process
/ # 

super weird

@ghost

ghost commented Mar 2, 2015

Container becomes unresponsive after creating some random number of exec instances :( Could be related to this. +1 for the ability to destroy them via the remote API.

@spf13 spf13 added kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. and removed exp/expert bug labels Mar 21, 2015
josegonzalez added a commit to dokku/dokku-nats that referenced this issue Sep 13, 2021
…for :connect and :enter commands

Apparently terminating the ssh connection that runs 'docker exec' may result in a process leak as the signal isn't propagated properly (moby/moby#9098). Since we cannot fix this, we should document it so that users do not stumble upon the issue unawares.

Closes dokku/dokku-postgres#212
josegonzalez added commits with the same message to dokku/dokku-omnisci, dokku/dokku-postgres, dokku/dokku-pushpin, dokku/dokku-rabbitmq, dokku/dokku-redis, dokku/dokku-rethinkdb, and dokku/dokku-solr referencing this issue on Sep 13, 2021.
@nardavin

Greetings from the year 2022, where I lost my Friday night to this issue. I was also in a situation where I was trying to run docker exec commands from within a Python subprocess.Popen call. My most elegant workaround was to find the PID of the persistent process inside the Docker container by using subprocess.run to execute:

docker exec my_container bash -c 'pgrep -xf "my_exact_persistent_command"'

With that PID, I'm able to easily kill it with another subprocess.run of:

docker exec my_container bash -c 'kill my_pid'
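A sketch of the two-step workaround above in Python. The container name and command pattern are the placeholders from the comment; the helpers only build the docker exec command lines, and kill_persistent would need a running daemon and container to actually do anything:

```python
import subprocess

def pgrep_cmd(container: str, pattern: str) -> list[str]:
    # First exec: look up the PID of the persistent process by its
    # exact command line, as described in the comment above.
    return ["docker", "exec", container, "bash", "-c",
            f'pgrep -xf "{pattern}"']

def kill_cmd(container: str, pid: int) -> list[str]:
    # Second exec: kill that PID from inside the container.
    return ["docker", "exec", container, "bash", "-c", f"kill {pid}"]

def kill_persistent(container: str, pattern: str) -> None:
    out = subprocess.run(pgrep_cmd(container, pattern),
                         capture_output=True, text=True, check=True)
    pid = int(out.stdout.strip())
    subprocess.run(kill_cmd(container, pid), check=True)

print(" ".join(pgrep_cmd("my_container", "my_exact_persistent_command")))
```

Note that this relies on the command line being unique inside the container; pgrep -xf matches the full, exact command line.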

agnostic-apollo added a commit to agnostic-apollo/termux-packages that referenced this issue Jul 26, 2022
… to commands in some cases leaving processes still running

If `--tty` is not passed to `docker exec` because stdout is not available (`[ ! -t 1 ]`), e.g. due to redirection to a file (`&> build.log`), or if stdin is not available (`< /dev/null`), then docker does not forward kill signals to the started process and it remains running.

To fix the issue, the `DOCKER_EXEC_PID_FILE_PATH` env variable with the value `/tmp/docker-exec-pid-<timestamp>` is passed to the process called with `docker exec`, and the started process stores its pid in the file path passed. Traps are set in `run-docker.sh`, which runs the `docker exec` command, to receive any kill signals; if one arrives, it runs another `docker exec` command to read the pid of the previously started process from `DOCKER_EXEC_PID_FILE_PATH` and then kills it and all its children.

See Also:

docker/cli#2607
moby/moby#9098
moby/moby#41548
https://stackoverflow.com/questions/41097652/how-to-fix-ctrlc-inside-a-docker-container

Also passing `--init` to `docker run` to "Run an init inside the container that forwards signals and reaps processes"; although it does not work for the above cases, it may be helpful in others. The `--init` flag changes will only take effect on new container creation.

https://docs.docker.com/engine/reference/run/#specify-an-init-process

https://docs.docker.com/engine/reference/commandline/run/

```
./scripts/run-docker.sh ./build-package.sh -f libjpeg-turbo  &> build.log
^C
$ ./scripts/run-docker.sh ps -efww
Running container 'termux-package-builder' from image 'termux/package-builder'...
UID          PID    PPID  C STIME TTY          TIME CMD
builder        1       0  0 05:48 pts/0    00:00:00 bash
builder     9243       0  0 06:01 pts/1    00:00:00 bash
builder    28127       0  0 06:12 ?        00:00:00 /bin/bash ./build-package.sh -f libjpeg-turbo
builder    28141   28127  0 06:12 ?        00:00:00 /bin/bash ./build-package.sh -f libjpeg-turbo
builder    28449   28141  1 06:12 ?        00:00:00 ninja -w dupbuild=warn -j 8
builder    28656   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28657   28656 79 06:12 ?        00:00:01 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28694   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28695   28694 89 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28728   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28729   28728  0 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28731   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28734   28731  0 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28740   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28741   28740  0 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28744       0  0 06:12 pts/2    00:00:00 ps -efww
builder    28748   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28752   28748  0 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28753   28449  0 06:12 ?        00:00:00 /bin/sh -c /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28754   28753  0 06:12 ?        00:00:00 /home/builder/.termux-build/_cache/android-r23c-api-24-v0/bin/clang
builder    28755   28449  0 06:12 ?        00:00:00 ninja -w dupbuild=warn -j 8
$ ./scripts/run-docker.sh ./build-package.sh -f libjpeg-turbo  &> build.log
$ ./scripts/run-docker.sh ./build-package.sh -f libjpeg-turbo
Running container 'termux-package-builder' from image 'termux/package-builder'...
ERROR: Another build is already running within same environment.
```
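The trap-and-forward mechanism described in the commit message above can be sketched locally in Python, with a sleep process standing in for the exec'd build and child.terminate() standing in for the second docker exec kill step (no Docker involved, names illustrative):

```python
import os
import signal
import subprocess

# Local stand-in for the run-docker.sh wrapper: start a worker,
# trap termination signals, and forward the kill to the worker
# (a real setup would run another `docker exec ... kill` against
# the PID recorded in DOCKER_EXEC_PID_FILE_PATH).
child = subprocess.Popen(["sleep", "300"])

def forward(signum, frame):
    child.terminate()    # stand-in for the second docker exec

signal.signal(signal.SIGTERM, forward)
signal.signal(signal.SIGINT, forward)

os.kill(os.getpid(), signal.SIGTERM)   # simulate Ctrl-C / kill on the wrapper
child.wait(timeout=5)
print("child exit:", child.returncode)
```

The key point is that the wrapper, not Docker, takes responsibility for propagating the signal, which is exactly why the pid file is needed: the wrapper has no other way to find the process on the far side of the exec.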
agnostic-apollo added further commits with the same message to agnostic-apollo/termux-packages referencing this issue on Jul 26 and Jul 28, 2022.
@SkyperTHC

I ran into the same problem (with hundreds of stale shells not getting SIGHUP when the docker exec client received the SIGHUP). Also, it looks like half of the Internet thinks that --init would solve it; obviously it does not, and this sends people in the wrong direction.

I wrote it down here:
https://gist.github.com/SkyperTHC/cb4ebb633890ac36ad86e80c6c7a9bb2

The workaround at the moment is a clean-up cron job - it's a mess.

@awesomebytes

awesomebytes commented Sep 12, 2022

@SkyperTHC

If you want another workaround, a few comments up I made this one:
#9098 (comment)

Also, the PR to add the correct kill behaviour is still pending here: #41548

@SkyperTHC

@awesomebytes

We use -it, and thus your solution did not work for us (but thanks for the great work).

I wrote my own solution:
https://github.com/hackerschoice/segfault/blob/main/host/docker-exec-sigproxy.c

The tool intercepts traffic on the /var/run/docker.sock and detects when a 'docker exec' happens. It registers all signals and then forwards (proxies) the signal to the process running inside the instance.

Old command:

docker exec -it alpine

new command:

docker-exec-sigproxy exec -it alpine

I wonder why docker won't add --sig-proxy=true to 'docker exec'... Half the Internet is crying about stale processes and being told to use --init, which sends them down the wrong path...

@peternann

I would have thought that the 'proper' solution here is to NOT cascade signals (since that can never be 100% reliable) and instead the container-side pty should detect that the connection is lost, and carry out the normal SIGHUP behaviours that were designed and reliable in the 1970's based on RS-232 terminals.

Is this approach not possible?

@SkyperTHC

I would have thought that the 'proper' solution here is to NOT cascade signals (since that can never be 100% reliable) and instead the container-side pty should detect that the connection is lost, and carry out the normal SIGHUP behaviours that were designed and reliable in the 1970's based on RS-232 terminals.

Is this approach not possible?

What you are describing is how every user would expect docker to behave, including me.

The tool I provided adds exactly that behaviour: the docker container (you call it "app" above) will receive a SIGHUP when the 'docker exec' disconnects (e.g. terminates). (And yes, cascading signals is reliable in this instance. The kernel won't drop signals or forget about them; they will get delivered.)

Docker does not do this. In docker-exec-land the "app" is executed within its own PTY harness and will not receive a SIGHUP if the docker-exec client 'disconnects' (hangs up).

I've explained the details of this 'misbehaviour' above in my earliest post. The details are far more complicated and have to do with how docker-exec instructs the Linux kernel to start the 'app' from PID=1 etc., and it thus behaves very differently from a 1970s RS-232 terminal. Anyway, my tool above makes docker-exec behave as it did in the 70s.

@jmlord

jmlord commented Mar 7, 2023

We have been using @SkyperTHC 's docker-exec-sigproxy fine for some time, until we hit a problem: for a long-running process, the socket dropped after exactly 5 minutes. After several attempts at changing this delay, we ended up dropping the signal proxy and launching a second exec to kill the process by pid.

  • The initial exec command prints its pid to a unique file (we already had a unique output folder for every run). Example when launching Julia script:
command = listOf("/usr/local/bin/docker", "exec", "-i", container, "julia", "-e",
    """
    open("${pidFile.absolutePath}", "w") do file write(file, string(getpid())) end;
    ARGS=["${outputFolder.absolutePath}"];
    include("${scriptFile.absolutePath}")
    """
)
  • If the process is cancelled, we read the file and issue a SIGTERM
/usr/local/bin/docker exec -i <container> kill -s TERM <pid>
  • The .pid file is removed upon completion or termination
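The pid-file pattern above can be demonstrated locally; a minimal Python sketch with sh/sleep standing in for the Julia process and os.kill standing in for the second docker exec (no Docker involved):

```python
import os
import signal
import subprocess
import tempfile
import time

pid_file = os.path.join(tempfile.mkdtemp(), "run.pid")

# Stand-in for the exec'd long-running process: record its own PID
# to a known file, then block (sleep instead of the Julia script).
child = subprocess.Popen(
    ["sh", "-c", f"echo $$ > {pid_file}; exec sleep 300"])

# Wait for the pid file to appear and be non-empty.
pid = None
for _ in range(100):
    try:
        with open(pid_file) as f:
            text = f.read().strip()
        if text:
            pid = int(text)
            break
    except FileNotFoundError:
        pass
    time.sleep(0.05)

os.kill(pid, signal.SIGTERM)   # the "second exec" kill step
child.wait(timeout=5)
os.unlink(pid_file)
print("terminated:", child.returncode == -signal.SIGTERM)
```

Using a unique pid file per run, as the comment describes, avoids races between concurrent execs against the same container.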

@Tofandel

Tofandel commented Jul 5, 2023

The new docker exec docs are here: https://docs.docker.com/engine/api/v1.43/#tag/Exec/operation/ExecStart

There is still no way to kill a started exec

@rvveber

rvveber commented Feb 6, 2024

Just ran into this problem now
