Containers remain running after exiting #114

Open
alice-mkh opened this issue Apr 13, 2019 · 21 comments · May be fixed by #541
Labels
1. Bug Something isn't working
Milestone

Comments

@alice-mkh

In particular, this means it's impossible to remove a toolbox-created container without first stopping/killing it with podman:

~/toolbox $ ./toolbox create -c test
Created container: test
Enter with: toolbox enter --container test
~/toolbox $ ./toolbox enter -c test
🔹[exalm@toolbox toolbox]$ logout
~/toolbox $ ./toolbox rm test
toolbox: failed to remove container test
~/toolbox $ podman ps
CONTAINER ID  IMAGE                              COMMAND     CREATED         STATUS             PORTS  NAMES
1a08f09ea797  localhost/fedora-toolbox-exalm:30  sleep +Inf  16 seconds ago  Up 10 seconds ago         test
~/toolbox $ podman stop test
1a08f09ea79710801859bea8dc6a5a85d2031ce1a73dd7d284c3e1fa51a67be0
~/toolbox $ ./toolbox rm test
~/toolbox $
@debarshiray
Member

toolbox rm --force should also work.

Yes, I'd like to make this properly reference-counted, but sadly I don't know of a way to implement that using the existing Podman command line interface.

@imciner2

On my machine, toolbox rm --force is not able to delete the container while it is running.

Perhaps what could be done is to detect whether the container is running when deleting it, and then give the user a prompt such as "This toolbox is currently running, are you sure you wish to delete it? [y/N]:". If they choose to continue, call podman stop before calling the delete command.

@paul8046

I am unable to delete a container created by toolbox even after using toolbox rm --force and stopping it with podman. I am having to reboot and then toolbox rm works.

@debarshiray
Member

I am unable to delete a container created by toolbox even after using
toolbox rm --force and stopping it with podman. I am having to reboot
and then toolbox rm works.

It will fail if you have currently active toolbox enter sessions. We need to improve the error handling there.

Otherwise, if that's not the case and you can reproduce at will, then I suggest trying podman rm --force <container> to delete the container. If that also fails, then we might have a Podman bug. In any case, let's use a different issue to discuss this.

Thanks for stopping by!

@bam80

bam80 commented Aug 26, 2020

Considering it was reported almost 1.5 years ago, I'm wondering if there was any progress since then.

@bam80

bam80 commented Aug 26, 2020

Seems I can't stop the containers left behind by Toolbox:

[bam@host ~]$ toolbox list
IMAGE ID      IMAGE NAME                                            CREATED
a198bc8c3cda  registry.fedoraproject.org/f31/fedora-toolbox:31      9 months ago
fe7b8c2393f9  registry.fedoraproject.org/f32/fedora-toolbox:32      4 months ago
3864bc58ab7b  registry.fedoraproject.org/f33/fedora-toolbox:33      4 months ago
b390f0663e2a  registry.fedoraproject.org/f33/fedora-toolbox:latest  2 weeks ago

CONTAINER ID  CONTAINER NAME     CREATED       STATUS      IMAGE NAME
c27048bea726  fedora-toolbox-31  6 months ago  configured  registry.fedoraproject.org/f31/fedora-toolbox:31
f48d171dc79e  fedora-toolbox-32  3 months ago  running     registry.fedoraproject.org/f32/fedora-toolbox:32
...
f429c215fa02  toolbox            3 hours ago   running     registry.fedoraproject.org/f32/fedora-toolbox:32


[bam@host ~]$ podman stop fedora-toolbox-32 
2020-08-26T14:33:24.000938739Z: kill process 3459: Operation not permitted
Error: operation not permitted
[bam@host ~]$ podman stop toolbox 
2020-08-26T14:33:32.000453021Z: kill process 3318: Operation not permitted
Error: operation not permitted

[bam@host ~]$ sudo podman stop toolbox 
[sudo] password for bam: 
Error: no container with name or ID toolbox found: no such container

@bam80

bam80 commented Aug 26, 2020

[bam@host ~]$ podman stop fedora-toolbox-32 
2020-08-26T14:33:24.000938739Z: kill process 3459: Operation not permitted
Error: operation not permitted

The reason for the error seems to be that the conmon subprocesses run with a weird PID of 100000:

bam         3315    1332  0 16:32 ?        Ssl    0:00  \_ /usr/bin/conmon --api-version 1 -c f429c215fa02f617362a0b17c4045eb32d4d8c461c38248fb9e1e0a3d9f1220b -u f429c215fa02f617362a0b17c4045eb32d4d8c461c38248fb
100000      3318    3315  0 16:32 ?        Ss     0:00  |   \_ sleep +Inf
...
bam         3456    1332  0 16:33 ?        Ssl    0:00  \_ /usr/bin/conmon --api-version 1 -c f48d171dc79e0db510fa334827fa5a4693b1f952221bc916666f408d845d5b92 -u f48d171dc79e0db510fa334827fa5a4693b1f952221bc9166
100000      3459    3456  0 16:33 ?        Ss     0:00  |   \_ sleep +Inf
[bam@host ~]$ ll /var/home/bam/.local/share/containers/storage/overlay-containers/f429c215fa02f617362a0b17c4045eb32d4d8c461c38248fb9e1e0a3d9f1220b/
total 4
drwx------. 3 100000 100000 4096 Aug 26 16:38 userdata

What is it? Is it an error, or is it by design?
If it's an error, how can I fix my containers?

@debarshiray
Member

podman stop <container> should definitely work, unless you have active toolbox enter or podman run sessions.

@debarshiray
Member

So far, I can think of two different ways to make Toolbox containers reference-counted so that they automatically stop once the last toolbox enter or toolbox run session has terminated.

(Note that stopping the container is the same thing as terminating the container's entry point process.)

  • The enter and run commands use POSIX signals to tell the container's entry point that a new session is about to start, or has just ended. eg., it could send SIGUSR1 for one and SIGUSR2 for the other. The entry point handles these signals and keeps a reference count of the number of active sessions. Once the counter hits zero, it terminates.

This can be implemented with Go channels, os/signal and such. Here is an example.

The downside of this is that it's not resilient against crashes in the enter and run commands. If they crash, then the second signal indicating the end of the session might not get sent.
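As a toy illustration of this signal-based scheme (this is not Toolbox's actual entry point; the script and timings are made up for demonstration), a counting entry point could look like this:

```shell
#!/bin/sh
# Hypothetical sketch: an entry point that reference-counts sessions via
# POSIX signals.  SIGUSR1 means "a session started", SIGUSR2 means "a
# session ended"; when the count drops to zero, the entry point exits.
entry="$(mktemp)"
cat > "$entry" <<'EOF'
sessions=0
trap 'sessions=$((sessions + 1))' USR1
trap 'sessions=$((sessions - 1)); [ "$sessions" -le 0 ] && exit 0' USR2
# Idle loop; 'wait' is interruptible, so the traps run promptly.
while :; do sleep 1 & wait $!; done
EOF

sh "$entry" &
entry_pid=$!
sleep 0.3
kill -USR1 "$entry_pid"   # a session starts
sleep 0.3
kill -USR2 "$entry_pid"   # the last session ends; the entry point exits
wait "$entry_pid"
entry_status=$?
echo "entry point exited with status $entry_status"
rm -f "$entry"
```

As the comment thread notes, the weakness is visible even in this sketch: if the process that was supposed to send the USR2 crashes, the counter never reaches zero and the entry point lives forever.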

  • The enter and run sessions acquire shared file locks (ie., flock --shared ...) and the entry point blocks trying to acquire an exclusive lock (ie., flock --exclusive ...) on a common file. The entry point will be unblocked once all shared locks have been released by the active sessions, and then it can terminate.

The nice thing about this is that locks are automatically released by the kernel when a process terminates. So, even if the enter and run commands crash, the locks would get released.
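For what it's worth, the locking scheme can be demonstrated with plain flock(1) from util-linux, no containers involved (the file name and sleep durations are purely illustrative):

```shell
#!/bin/sh
# Demonstration of shared/exclusive lock reference counting with flock(1).
# Two "sessions" hold shared locks; the "entry point" blocks on an
# exclusive lock and is only unblocked once every session has exited.
lockfile="$(mktemp)"

flock --shared "$lockfile" sleep 1 &
flock --shared "$lockfile" sleep 2 &
sleep 0.3   # give the sessions a moment to take their shared locks

start=$(date +%s)
flock --exclusive "$lockfile" true
elapsed=$(( $(date +%s) - start ))
echo "exclusive lock acquired after ${elapsed}s; entry point may terminate"

rm -f "$lockfile"
```

If one of the `sleep` processes is killed instead of exiting normally, the kernel still drops its shared lock, which is exactly the crash-resilience property described above.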

@bam80

bam80 commented Aug 26, 2020

podman stop <container> should definitely work, unless you
have active toolbox enter or podman run sessions.

Still it doesn't, and I have no running sessions.

This sounds like a Podman bug.

If you can repeatedly reproduce this, then I'd suggest filing a Podman bug. It would be even better if you can reproduce it with just Podman commands, eg., podman create ... sleep +Inf to create a container, then podman start ... and so on.

However, I can kill those sleep processes with the 100000 PIDs as a usual
user, and then the session stops.
Do you have an idea where that 100000 PIDs come from?

I think those are UIDs, not PIDs.

Those UIDs look big because they are inside a user namespace.

debarshiray added a commit to debarshiray/toolbox that referenced this issue Aug 26, 2020
Currently, once a toolbox container gets started with 'podman start',
as part of the 'toolbox enter' command, it doesn't stop unless the
host is shut down or someone explicitly calls 'podman stop'. This
becomes annoying if someone tries to remove the container because
commands like 'podman rm' and such don't work without the '--force'
flag, even if all active 'toolbox enter' and 'toolbox run' sessions
have terminated.

A system of reference counting based on advisory file locks has been
used to automatically terminate the container's entry point once all
the active sessions have died.

The 'toolbox enter' and 'toolbox run' sessions acquire shared file
locks, and the container's entry point blocks trying to acquire an
exclusive lock on a common file. The entry point will be unblocked once
all shared locks have been released by the active sessions, and then
it terminates.

Once the container has been started, and the entry point has finished
setting it up, the entry point waits for a while before trying to
acquire its exclusive lock. This is meant to give some time to the
first session to go ahead and acquire its shared lock. A duration of 25
seconds, the same interval as the default for D-Bus method calls, was
chosen for this.

containers#114
@bam80

bam80 commented Aug 26, 2020

If you can repeatedly reproduce this

Seems I can't. Not sure if that's good or bad :)
Anyway, I have already filed the podman issue and closed it as non-reproducible. I'll reopen if I face it again:
containers/podman#7463

I think those are UIDs, not PIDs.

Those UIDs look big because they are inside a user namespace.

Of course they are UIDs, sorry.

Thanks!

debarshiray added a commit to debarshiray/toolbox that referenced this issue Aug 26, 2020
@HarryMichal
Member

podman stop <container> should definitely work, unless you have active toolbox enter or podman run sessions.

I believe podman stop <container> will force those sessions to exit.

@HarryMichal HarryMichal added the 1. Bug Something isn't working label Sep 10, 2020
@HarryMichal HarryMichal moved this from Needs triage to Low priority in Priority Board Sep 10, 2020
@HarryMichal HarryMichal added this to the Release 0.1.0 milestone Sep 10, 2020
@debarshiray
Member

So far, I can think of two different ways to make Toolbox
containers reference-counted so that they automatically
stop once the last toolbox enter or toolbox run session
has terminated.

Another option, used by coreos/toolbox, is to call podman stop after every invocation of podman exec in toolbox enter. podman stop will keep failing as long as there's any active podman exec session, but once the last one finishes, the container will get stopped.

It's less sophisticated than the other alternatives, but simpler to implement.
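A sketch of that approach, assuming a small wrapper around the session (the function name toolbox_session and the exact flags are made up for illustration; this is not coreos/toolbox's actual code):

```shell
#!/bin/sh
# toolbox_session CONTAINER [COMMAND...] -- run one interactive session,
# then always attempt 'podman stop', leaving the refusal logic to Podman.
toolbox_session() {
    container="$1"
    shift
    [ $# -eq 0 ] && set -- /bin/sh

    # The interactive session inside the container.
    podman exec --interactive --tty "$container" "$@"

    # Always try to stop afterwards.  While other exec sessions are still
    # active, podman refuses and this fails harmlessly; once the last
    # session ends, the stop succeeds and the entry point is terminated.
    # Errors are suppressed so the expected refusals stay quiet.
    podman stop "$container" >/dev/null 2>&1 || :
}

# Example (requires podman and an existing container):
#   toolbox_session fedora-toolbox-36 bash
```

The design rests entirely on Podman refusing to stop a container with live exec sessions, which is why it needs no bookkeeping of its own.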

@nanonyme
Contributor

nanonyme commented Feb 25, 2021

@debarshiray I would vote for the simple approach (just always stop after exec), as long as you suppress the spurious "Error: container ... has active exec sessions, refusing to clean up: container state improper" output when the container cannot be stopped.

@castedo

castedo commented Dec 31, 2021

Here's a simple fix that I'm trying out in https://github.com/castedo/cnest. So far it seems to be working well.
https://github.com/castedo/cnest/blob/c76c5b0dfb08f9f5db6ddcc2b7ec66c8b84a5335/bin/cnest#L43
If there are zero of those exec IDs, then call podman stop; otherwise don't.

@bellegarde-c

bellegarde-c commented Aug 30, 2022

This is working; it finally kills all toolboxes running in the background:

[gnumdk@xps13 ~]$ cat .config/systemd/user/logout.service 
[Unit]
Description=Logout script
DefaultDependencies=no
Conflicts=shutdown.target
Before=basic.target shutdown.target

[Service]
Type=oneshot
ExecStop=%h/.config/systemd/user/logout.sh
RemainAfterExit=yes
TimeoutStopSec=5m

[Install]
WantedBy=basic.target
[gnumdk@xps13 ~]$ cat .config/systemd/user/logout.sh
#!/bin/bash

# Force container to the exit state
podman container stop fedora-toolbox-36

# Failed, container in stopped state
if (( $? != 0 ))
then
	# Force it to run again
	toolbox run true
	# And stop it
	podman container stop fedora-toolbox-36
fi

debarshiray added a commit to debarshiray/toolbox that referenced this issue Jan 18, 2023
Currently, once a toolbox container gets started with 'podman start',
as part of the 'toolbox enter' command, it doesn't stop unless the
host is shut down or someone explicitly calls 'podman stop'. This
becomes annoying if someone tries to remove the container because
commands like 'podman rm' and such don't work without the '--force'
flag, even if all active 'toolbox enter' and 'toolbox run' sessions
have terminated.

A crude form of reference counting has been set up that depends on
'podman stop' failing as long as there's any active 'podman exec' session
left.  Every invocation of 'podman exec' in 'enter' and 'run' is
followed by 'podman stop', so that the container gets stopped once the
last session finishes.

While this approach looks very crude at first glance, it does have the
advantage of being ridiculously simple to implement.  Thus, it's a lot
more robust and easier to verify than setting up some custom reference
counting or synchronization using other means like POSIX signals or file
locks.

Based on the implementation in github.com/coreos/toolbox.

containers#114
@debarshiray
Member

This is working; it finally kills all toolboxes running in the background

That's about cleaning up any active podman exec sessions when logging out, right?

If so, then that's different from this issue. This issue is about stopping the container (ie., killing the entry point) when the last podman exec session goes away during normal use, so that --force is not necessary with podman rm and the output of toolbox list is more intuitive.

I think the problem you were trying to address might have been fixed in Podman through containers/podman#17025

@debarshiray
Member

So far, I can think of two different ways to make Toolbox
containers reference-counted so that they automatically
stop once the last toolbox enter or toolbox run session
has terminated.

Another option, used by coreos/toolbox, is to call podman stop after every invocation of podman exec in toolbox enter. podman stop will keep failing as long as there's any active podman exec session, but once the last one finishes, the container will get stopped.

It turns out that current implementations of podman stop do stop the container (ie., the entry point gets killed) even when there are active podman exec sessions around. This negates the coreos/toolbox approach of always calling podman stop and leaving the reference counting to Podman.

@debarshiray
Member

Here's a simple fix that I'm trying out in https://github.com/castedo/cnest. So far it seems to be working well. https://github.com/castedo/cnest/blob/c76c5b0dfb08f9f5db6ddcc2b7ec66c8b84a5335/bin/cnest#L43 If there are zero of those exec IDs, then call podman stop; otherwise don't.

Interesting. So you are doing:

podman exec -it \
  -e LANG \
  -e TERM \
  -e DISPLAY \
  --detach-keys="" \
  -e OSVIRTALIAS=$CONTAINER \
  -e debian_chroot=$CONTAINER \
  $CONTAINER \
  $COMMAND

NUM_EXEC=$(podman container inspect --format "{{len .ExecIDs}}" $CONTAINER)
if [[ $NUM_EXEC -eq 0 ]]; then
  podman stop $CONTAINER
fi

I am worried that there's a race. A podman start against the same container from a different terminal can slip in between the podman inspect and the podman stop.

@castedo

castedo commented Jan 18, 2023

I am worried that there's a race. A podman start against the same container from a different terminal can slip in between the podman inspect and the podman stop.

Good eye! You are correct, there is that possibility. It's fair to say what I coded in cnest for this is a hack.

I've been using it for more than a year now. Still working well. But I'm only using it for "nest" containers for which I enter and exit them manually at the command line. I'm not fast enough to be able to create the race condition.

My hack might not be OK for more general uses of a container. Maybe there are single-user cases where someone has programs in the background entering/starting the container and not only entering from the command line manually.

@sandorex

sandorex commented Jun 12, 2023

I've made a wrapper script to fix this issue when using distrobox (89luca89/distrobox#786) and adapted it for toolbox. It's pretty simple: it gets the conmon PID when the container starts, then kills it afterwards using a background script.

It could easily be adapted to also kill the container when there are no more shells open.

#!/usr/bin/env bash
#
# toolbox-enter-wrapper - wrapper to call the shell properly in toolbox

if [ -z "$1" ]; then
    echo "Please provide container name"
    exit 1
fi

PIDFILE_DIR="$HOME/.local/state/toolbox"
PIDFILE="$PIDFILE_DIR/$$"

mkdir -p "$PIDFILE_DIR"
touch "$PIDFILE"

nohup sh <<EOF >/dev/null 2>&1 &
# wait for the main script to end
while ps -p $$ >/dev/null; do
    sleep 1s
done

# get pid from the file
PID="\$(cat "$PIDFILE")"
rm -f "$PIDFILE"

# conmon already dead, quit
if ! ps -p "\$PID" >/dev/null; then
    exit 0
fi

# kill conmon
kill -1 "\$PID"

# quit
exit 0
EOF

toolbox run -c "$1" sh -c "echo \$PPID > $PIDFILE; exec ${2:-$SHELL}"

10 participants