Ephemeral runners start but never connect to github because of dnf failures #3853

Open
bmulhollem opened this issue Apr 18, 2024 · 1 comment

Comments

@bmulhollem

After upgrading to 5.10.0 (from 4.2.3), I have been facing significant issues with runners failing to start up properly. After investigation, the problem appears to be dnf failing. My setup uses the multi-runner module with ephemeral runners and a backup pool of runners, and it does NOT use a custom AMI. I am relying on the stock Amazon Linux 2023 setup, and using the default Amazon Linux 2023 AMI seems to be the key factor that breaks things.

I grabbed the system log from a few of the failed runners, and the problem appears in the log lines below.

<13>Apr 18 13:00:58 user-data: Attempting 0/5: upgrade-minimal


Amazon Linux 2023.4.20240416
Kernel 6.1.84-99.169.amzn2023.x86_64 on an x86_64 (-)

ip-10-100-1-224 login: <13>Apr 18 13:01:08 user-data: Amazon Linux 2023 repository                    2.2 MB/s |  23 MB     00:10    
<13>Apr 18 13:01:12 user-data: Amazon Linux 2023 Kernel Livepatch repository   492 kB/s | 165 kB     00:00    
<13>Apr 18 13:01:13 user-data: Dependencies resolved.
<13>Apr 18 13:01:13 user-data: Nothing to do.
<13>Apr 18 13:01:13 user-data: Complete!
<13>Apr 18 13:01:13 user-data: Attempting 0/5: Installing docker
<13>Apr 18 13:01:13 user-data: Amazon Linux 2023 repository                     49 MB/s |  23 MB     00:00    
<13>Apr 18 13:01:17 user-data: Amazon Linux 2023 Kernel Livepatch repository   534 kB/s | 165 kB     00:00    
<13>Apr 18 13:01:18 user-data: Dependencies resolved.
<13>Apr 18 13:01:18 user-data: ================================================================================
<13>Apr 18 13:01:18 user-data:  Package                  Arch     Version                  Repository     Size
<13>Apr 18 13:01:18 user-data: ================================================================================
<13>Apr 18 13:01:18 user-data: Installing:
<13>Apr 18 13:01:18 user-data:  docker                   x86_64   25.0.3-1.amzn2023.0.1    amazonlinux    44 M
<13>Apr 18 13:01:18 user-data: Installing dependencies:
<13>Apr 18 13:01:18 user-data:  containerd               x86_64   1.7.11-1.amzn2023.0.1    amazonlinux    35 M
<13>Apr 18 13:01:18 user-data:  iptables-libs            x86_64   1.8.8-3.amzn2023.0.2     amazonlinux   401 k
<13>Apr 18 13:01:18 user-data:  iptables-nft             x86_64   1.8.8-3.amzn2023.0.2     amazonlinux   183 k
<13>Apr 18 13:01:18 user-data:  libcgroup                x86_64   3.0-1.amzn2023.0.1       amazonlinux    75 k
<13>Apr 18 13:01:18 user-data:  libnetfilter_conntrack   x86_64   1.0.8-2.amzn2023.0.2     amazonlinux    58 k
<13>Apr 18 13:01:18 user-data:  libnfnetlink             x86_64   1.0.1-19.amzn2023.0.2    amazonlinux    30 k
<13>Apr 18 13:01:18 user-data:  libnftnl                 x86_64   1.2.2-2.amzn2023.0.2     amazonlinux    84 k
<13>Apr 18 13:01:18 user-data:  pigz                     x86_64   2.5-1.amzn2023.0.3       amazonlinux    83 k
<13>Apr 18 13:01:18 user-data:  runc                     x86_64   1.1.11-1.amzn2023.0.1    amazonlinux   3.0 M
<13>Apr 18 13:01:18 user-data: 
<13>Apr 18 13:01:18 user-data: Transaction Summary
<13>Apr 18 13:01:18 user-data: ================================================================================
<13>Apr 18 13:01:18 user-data: Install  10 Packages
<13>Apr 18 13:01:18 user-data: 
<13>Apr 18 13:01:18 user-data: Total download size: 83 M
<13>Apr 18 13:01:18 user-data: Installed size: 313 M
<13>Apr 18 13:01:18 user-data: Downloading Packages:
<13>Apr 18 13:01:18 user-data: (1/10): iptables-libs-1.8.8-3.amzn2023.0.2.x86_ 4.3 MB/s | 401 kB     00:00    
<13>Apr 18 13:01:18 user-data: (2/10): iptables-nft-1.8.8-3.amzn2023.0.2.x86_6 5.1 MB/s | 183 kB     00:00    
<13>Apr 18 13:01:18 user-data: (3/10): libcgroup-3.0-1.amzn2023.0.1.x86_64.rpm 4.4 MB/s |  75 kB     00:00    
<13>Apr 18 13:01:18 user-data: (4/10): libnetfilter_conntrack-1.0.8-2.amzn2023 4.2 MB/s |  58 kB     00:00    
<13>Apr 18 13:01:18 user-data: (5/10): libnfnetlink-1.0.1-19.amzn2023.0.2.x86_ 1.7 MB/s |  30 kB     00:00    
<13>Apr 18 13:01:18 user-data: (6/10): libnftnl-1.2.2-2.amzn2023.0.2.x86_64.rp 4.7 MB/s |  84 kB     00:00    
<13>Apr 18 13:01:18 user-data: (7/10): pigz-2.5-1.amzn2023.0.3.x86_64.rpm      4.5 MB/s |  83 kB     00:00    
<13>Apr 18 13:01:18 user-data: (8/10): runc-1.1.11-1.amzn2023.0.1.x86_64.rpm    65 MB/s | 3.0 MB     00:00    
<13>Apr 18 13:01:18 user-data: (9/10): containerd-1.7.11-1.amzn2023.0.1.x86_64  65 MB/s |  35 MB     00:00    
<13>Apr 18 13:01:19 user-data: (10/10): docker-25.0.3-1.amzn2023.0.1.x86_64.rp  63 MB/s |  44 MB     00:00    
<13>Apr 18 13:01:19 user-data: --------------------------------------------------------------------------------
<13>Apr 18 13:01:19 user-data: Total                                           108 MB/s |  83 MB     00:00     
<13>Apr 18 13:01:19 user-data: Running transaction check
<13>Apr 18 13:01:19 user-data: Transaction check succeeded.
<13>Apr 18 13:01:19 user-data: Running transaction test
<13>Apr 18 13:01:20 user-data: Transaction test succeeded.
<13>Apr 18 13:01:20 user-data: Running transaction
<13>Apr 18 13:01:20 user-data: RPM: error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)
<13>Apr 18 13:01:20 user-data: The downloaded packages were saved in cache until the next successful transaction.
<13>Apr 18 13:01:20 user-data: You can remove cached packages by executing 'dnf clean packages'.
<13>Apr 18 13:01:20 user-data: Error: Could not run transaction.
[   33.877357] cloud-init[3762]: 2024-04-18 13:01:20,197 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
[   33.877465] cloud-init[3762]: 2024-04-18 13:01:20,197 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.9/site-packages/cloudinit/config/cc_scripts_user.py'>) failed

I see that the default user-data template contains code to make multiple attempts at installing things via dnf. However, that does not work, because the default user-data.sh launches bash with "-e", so any error kills the entire script. I believe that is the main problem. I put in an exact copy of user-data.sh without the "-e" option, and things now appear to be working smoothly. However, I'm not entirely sure why dnf fails so often that it needs retry logic. It feels like there is a deeper issue here.
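
For illustration, one way to keep retry logic working even when the script runs under "-e" is to run the command as the condition of an `if`, because bash does not apply errexit to a command whose status is being tested. The following is a minimal sketch under that assumption. The helper name `retry`, its argument order, and the attempt and delay values are hypothetical and are not taken from the module's user-data.sh.

```shell
#!/bin/bash
set -e

# Hypothetical helper (not the module's code): run a command up to
# $1 times, sleeping $2 seconds between failed attempts.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    # The command is the condition of an `if`, so a non-zero exit
    # status does not trigger errexit and the loop can continue.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt/$max_attempts failed, retrying in ${delay}s: $*" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example call (commented out so the sketch has no side effects):
# retry 5 10 dnf install -y docker
```

With this shape, a transient failure such as the rpm transaction lock error in the log above would lead to another attempt rather than ending the script, and the script would still stop once the final attempt fails.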

@jkruse14
Contributor

As a workaround until this is resolved, you can provide a pre-setup script that sleeps for two minutes. This has been working for me. It's ugly, but it unblocks the process.
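
A rough sketch of what such a pre-setup script could contain is below. The function name `pre_install_delay` is hypothetical, and the 120-second default simply mirrors the two minutes mentioned above. How the script is handed to the module is not shown here.

```shell
#!/bin/bash

# Hypothetical pre-setup snippet: pause before the package steps run,
# giving whatever holds the rpm lock at boot time a chance to finish.
pre_install_delay() {
  local seconds="${1:-120}"  # default of 120s is an assumption (two minutes)
  echo "pre-setup: sleeping ${seconds}s before package installation"
  sleep "$seconds"
}

# In an actual pre-setup script the function would be called directly:
# pre_install_delay
```

A fixed delay only reduces the chance of hitting the lock. It does not confirm that the lock has been released.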
