Bug: Ansible stuck #1317

Open
Despire opened this issue Apr 2, 2024 · 2 comments
Labels
bug (Something isn't working), groomed (Task that everybody agrees to pass the gatekeeper)

Comments

Contributor

Despire commented Apr 2, 2024

Current Behaviour

During a run of the e2e pipeline, the process spawned to install WireGuard via the Ansible playbook got stuck for unknown reasons. This halted the workflow of the picked-up config, resulting in long build times and eventually a failure, and the stuck process is left there indefinitely.

Expected Behaviour

There should be a mechanism to detect and recover from this, although I'm not sure what it should be or how it should work. A fixed timeout will not help here, because the larger the cluster is, the longer the playbook needs to execute.
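One possible mechanism, offered here as an assumption rather than a decided design, is an inactivity (stall) timeout instead of a total-runtime timeout. ansible-playbook normally prints a line per task and per host as results come in, so a run that prints nothing for a long time is more likely stuck than merely slow, regardless of cluster size. The helper `run_with_stall_timeout` and its `idle_timeout` parameter are hypothetical and not part of ansibler. The sketch also starts the child in its own session so the kill reaches any forked workers, and it always calls `wait()` so the killed process is reaped.

```python
import os
import signal
import subprocess
import threading
import time


def run_with_stall_timeout(cmd, idle_timeout, poll_interval=0.1):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Hypothetical helper (not part of ansibler). Returns a tuple
    (returncode, stalled, lines), where stalled is True if we killed it.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # new process group, so forked workers can be killed too
    )
    lines = []
    last_output = time.monotonic()

    def read_output():
        nonlocal last_output
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            last_output = time.monotonic()  # any output counts as progress

    reader = threading.Thread(target=read_output, daemon=True)
    reader.start()

    stalled = False
    while proc.poll() is None:
        if time.monotonic() - last_output > idle_timeout:
            stalled = True
            try:
                # Kill the whole process group, not just the parent.
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group already exited on its own
            break
        time.sleep(poll_interval)

    returncode = proc.wait()  # reap the child so it does not linger as a zombie
    reader.join(timeout=5)
    return returncode, stalled, lines
```

Called as, for example, `run_with_stall_timeout(["ansible-playbook", "-i", "inventory.ini", "playbook.yml"], idle_timeout=900)`, a healthy run of any length survives as long as it keeps printing, while a run that stays silent for 15 minutes is killed and could then be retried. The file names and the 900-second threshold are placeholders; the threshold would still need to be generous, since a single slow task on many hosts can legitimately stay quiet for a while.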

Steps To Reproduce

I have encountered this at random.

Despire added the bug (Something isn't working) label on Apr 2, 2024
Contributor

JKBGIT1 commented Apr 5, 2024

We are waiting for more occurrences so that we can debug this further.

JKBGIT1 added the groomed (Task that everybody agrees to pass the gatekeeper) label on Apr 5, 2024
Contributor

JKBGIT1 commented Apr 24, 2024

I experienced something similar, or maybe the same issue.

In my case, the playbook that installs the VPN got stuck on the task "Check if unattended-upgrades.service is present". I tried to kill all the ansible-playbook processes to trigger a retry of that playbook run. However, the processes probably weren't actually killed (I used both SIGTERM and SIGKILL); only their command changed, and I could still see them listed when running ps aux (see the image below).

[Image: ps aux output listing the ansible-playbook processes]

The processes with the command [ansible-playbook] are the ones I "killed"; the rest were newly spawned by the main container process after I "killed" the old ones.

I kept killing the ansible-playbook processes until the retries ran out and the workflow failed. After that, I manually ran the Install VPN playbook from the ansibler with higher verbosity, and this time it finished successfully. The next time the ansibler ran this playbook, it also went well, so I don't know what was going on before.
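One plausible explanation for the [ansible-playbook] entries, not confirmed here, is that they were zombie processes. SIGKILL does terminate a process, but its process-table entry remains until the parent collects the exit status with wait(); until then ps aux typically shows it with state Z and the command in brackets (often followed by <defunct>), and further SIGTERM or SIGKILL signals have no visible effect. If the parent never waits, or orphaned workers are reparented to a container PID 1 that does not reap children, those entries stay around. The Linux-only sketch below demonstrates this; `proc_state`, `wait_for_state` and `kill_and_reap` are hypothetical helpers for illustration, not part of ansibler.

```python
import os
import signal
import subprocess
import time


def proc_state(pid):
    """Return the state letter from /proc/<pid>/stat (Linux only), or None if the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # The state is the first field after the ")" that closes the command name.
            return f.read().rsplit(")", 1)[1].split()[0]
    except FileNotFoundError:
        return None


def wait_for_state(pid, wanted, timeout=2.0):
    """Poll until the process reaches the wanted state (None means the entry is gone)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc_state(pid) == wanted:
            return True
        time.sleep(0.01)
    return False


def kill_and_reap(proc):
    """Kill proc's whole process group, then reap proc so no zombie is left behind."""
    try:
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the group is already gone
    return proc.wait()  # wait() is what removes the process-table entry
```

For example, after `child = subprocess.Popen(["sleep", "30"], start_new_session=True)` and `os.kill(child.pid, signal.SIGKILL)`, `proc_state(child.pid)` reports "Z" and stays "Z" no matter how many more signals are sent; only `child.wait()` makes the entry disappear.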
