Run systemctl instead of podman when generating systemd units #585

Open
wants to merge 2 commits into master

Conversation

nogweii commented Apr 28, 2023

Start and stop containers with systemctl instead of podman when a user is generating the systemd unit for the container.

This prevents an annoying behavior that occurs with the following example task:

---
- name: Create an Authentik container
  containers.podman.podman_container:
    name: authentik
    image: "{{ authentik_container_image }}:{{ authentik_container_tag }}"
    image_strict: yes
    state: started
    command: server
    generate_systemd:
      path: /etc/systemd/system/
      restart_policy: always
      time: 120
      new: true
      requires:
        - postgresql.service
        - redis.service
      after:
        - postgresql.service
        - redis.service
    env_file: /etc/default/container-authentik
    volume:
      - /srv/containers/authentik/blueprints:/blueprints/aethernet
    restart_policy: "no" # letting systemd handle this

This example task works perfectly fine when creating the container, but if you want to update the tag, it misbehaves: at the end of the run, the container is still not running the updated image. This is because of the following sequence of events:

  1. The containers.podman.podman_container module (from here on, "ansible") queries podman for the running container's information and determines that the image needs to be changed.
  2. It then attempts to recreate the container. Since podman doesn't have a native way to do this, it stops and then deletes the container, so ansible runs podman stop.
  3. The container stops; systemd notices, considers the unit failed, and restarts it. NB: at this point, the unit is still configured with the old image.
  4. The container is deleted by ansible (because ansible ran podman rm as well) and then recreated with the updated configuration. Ansible thinks everything is good, so it continues on.
  5. systemd again notices that the unit (container-authentik in this example) has "failed", and recreates it.
  6. At the end of the module's execution, ansible calls podman generate systemd (roughly the command sketched below), but that inspects the running container, which systemd has kept recreating from the old image, so this step effectively becomes a no-op.
  7. All of the commands and file manipulation have succeeded, so ansible thinks the module has succeeded. All good?

Sometimes, steps 3 & 4 above are swapped, in which case the module does fail because it attempts to create a container with a name that already exists, and podman (correctly) complains.
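
For reference, step 6 boils down to roughly the command below, run against the still-running (old) container. The flags mirror the generate_systemd options in the task above; the output redirect merely stands in for the module writing the unit file under path.

podman generate systemd --new --name --restart-policy=always -t 120 authentik \
  > /etc/systemd/system/container-authentik.service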

This all stems from the module stopping and starting containers directly even though I've configured them to be managed by a different service manager (systemd). Since the module is already systemd-aware, it makes sense to me to teach it how to behave properly in that scenario.
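
As a rough sketch of the idea (not the module's literal command sequence; the container- unit prefix is podman's default naming, and $IMAGE is a placeholder for the image reference):

# today: the module recreates the container by driving podman directly
podman stop -t 120 authentik                    # systemd notices and restarts the unit behind the module's back
podman rm authentik
podman run -d --name authentik "$IMAGE" server

# with this change: when generate_systemd is configured, stop/start go through the unit
systemctl stop container-authentik.service      # stopped via systemd, so nothing restarts it
podman rm authentik
podman create --name authentik "$IMAGE" server  # recreate with the new image
systemctl daemon-reload                         # pick up the regenerated unit file
systemctl start container-authentik.service     # let systemd bring the new container up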

This solves the problem of attempting to recreate a container/pod but failing to do so because systemd would do that for you: basically, a race condition between this module and systemd, one that this module pretty much always lost.

Signed-off-by: Nogweii <colin@evaryont.me>
nogweii marked this pull request as ready for review May 15, 2023 19:12

nogweii commented May 15, 2023

@sshnaidm, could you please allow the CI to run on this PR? I've been running this branch for a couple of weeks and it works very well for me.

sshnaidm (Member) commented:

Thanks @nogweii, sorry for the late reply.
I'm not sure I understand the case here. Why should we run systemd commands in the podman module? We can do it in the ansible systemd module; that's exactly its purpose.

nogweii commented May 16, 2023

Oh, I forgot to update the PR description, my bad. I've just edited it, but here is the summary: when the containers are managed by systemd (via the unit files podman generates), stopping the container in order to recreate it with a different configuration doesn't stick, specifically when you enable the new flag.

PlqnK commented Jun 1, 2023

Thanks @nogweii, sorry for the late reply.
I'm not sure I understand the case here. Why should we run systemd commands in the podman module? We can do it in the ansible systemd module; that's exactly its purpose.

Because the execution of the podman_container module will clash with systemd on systems that use systemd units to manage podman containers, as explained in @nogweii's original post.

The module executes podman stop commands instead of systemctl stop when it needs to recreate a container. When you run podman stop on a container managed by systemd, systemd restarts it immediately, which causes the module either to fail or not to do what's expected of it.

The problem is more apparent when using "named" containers (generate_systemd.name: true). Here is what happens when the module determines that it needs to recreate a container:

  1. The module executes podman stop namedcontainer.
    1b. systemd sees that the container is no longer running and considers that it has failed, so it restarts it.
  2. The module executes podman (create|run) namedcontainer, and the command fails because a container with the same name is already running (restarted by systemd).
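
In shell terms, the clash looks roughly like this (the container- unit prefix is podman's default naming; $IMAGE is a placeholder):

systemctl start container-namedcontainer.service   # systemd owns the container lifecycle
podman stop namedcontainer                          # the module stops the container directly...
                                                     # ...systemd sees the unit as failed and restarts it right away
podman run -d --name namedcontainer "$IMAGE"        # fails: a container with that name is already running again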

What @nogweii is proposing, if I understood correctly, is that when the generate_systemd key is present, the module should use systemctl stop instead of podman stop to manage the container that needs to be recreated.

sshnaidm commented Aug 3, 2023

I still object to using systemctl in a containers module; it breaks too many standards and good practices and goes against the Ansible vision of modules doing one thing, although I understand the problem occurs there.
What if we use the --replace option instead of the stop/create approach?
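
For reference, --replace folds remove-and-recreate into a single command, something like this ($IMAGE is a placeholder):

# --replace removes any existing container with the same name before creating the new one
podman run -d --replace --name authentik "$IMAGE" server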

Luminger commented Aug 3, 2023

Well, this is the way podman operates, or rather it's one way that podman expects users to manage the lifecycle of their containers. IMO it doesn't really matter whether managing containers through systemd was intended here, because not handling it renders the container part of the collection somewhat useless.

I get your point, don't get me wrong. I do, however, think that the configured software should win an argument about how it is used, rather than the orchestration tool that's configuring it.

Managing the container lifecycle with systemd is not really possible with the module right now; at least I didn't find a way that isn't flaky or that doesn't outright fail due to a race between ansible and systemd. For me, the interim solution was to move away from configuring my containers with the collection and instead use podman quadlet in combination with podman kube play. I didn't want to write k8s specs for my personal stuff, but I kind of had to, as it's a reliable way to have systemd manage my containers predictably, and that outweighs my desire for simplicity.

I don't think I've added much to the discussion here, but maybe we can agree that the configured subject (podman) should have a stronger say in how it is used than the tool used to configure it (ansible).

nogweii commented Aug 4, 2023

Taking a look at the source code for how podman handles the --replace parameter, I don't think it's a viable alternative.

That parameter will cause podman to stop and delete the container. It's the same as running the commands individually, so the race condition between systemd and the module still exists. (It's probably a smaller window as we don't have to wait for a whole new process to be spun up a couple of times, but we're talking about milliseconds of difference. Not really that much.)

sshnaidm commented Aug 4, 2023

@nogweii I think I might have found a workaround. To manage the race condition we can add RestartSec, which I've also added in #613, and we have to run a systemd daemon-reload task after it. Can you please try your playbook with such a task and #613? (Put restart_sec: 5 there.)

- name: Create an Authentik container
  containers.podman.podman_container:
    name: authentik
    image: "{{ authentik_container_image }}:{{ authentik_container_tag }}"
    image_strict: yes
    state: started
    command: server
    generate_systemd:
      path: /etc/systemd/system/
      restart_policy: always
      time: 120
      restart_sec: 5
      new: true
      requires:
        - postgresql.service
        - redis.service
      after:
        - postgresql.service
        - redis.service
    env_file: /etc/default/container-authentik
    volume:
      - /srv/containers/authentik/blueprints:/blueprints/aethernet
    restart_policy: "no" # letting systemd handle this

- name: Systemd reload
  systemd:
    daemon_reload: yes
    scope: user 

sshnaidm commented Aug 6, 2023

In addition, to prevent the race condition between setting restart_sec and the systemd daemon-reload, I think we can add daemon-reload as an option; it would probably default to true if --new is used in the systemd generation.
Something like:

    state: started
    command: server
    systemd_reload: true
    generate_systemd:
      path: /etc/systemd/system/
      restart_policy: always


Apollo3zehn commented Mar 18, 2024

It would be great if this PR were considered for merging, as right now I always have to manually stop the container via systemctl stop before I can upgrade to new image versions via Ansible, which is really annoying :-(

Edit: I cannot apply the restart_sec workaround, as Ubuntu 22.04 LTS ships too old a Podman version in its repos, and it would still be a workaround rather than the right solution.
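
For the record, the manual dance looks roughly like this (the unit name assumes podman's default container- prefix, and upgrade.yml is just a placeholder for whatever play bumps the image tag):

systemctl stop container-authentik.service   # keep systemd from restarting the container mid-upgrade
ansible-playbook upgrade.yml                 # recreate the container and regenerate the unit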
