You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
After upgrading to 5.10.0(from 4.2.3), I have been facing significant issues with runners failing to start up properly. After investigation, there seems to be a problem with dnf failing. My setup uses the multi-runner module, with ephemeral runners, a backup pool of runners, and does NOT use a custom AMI. I am relying on the stock Amazon Linux 2023 setup. Using the default Amazon Linux 2023 AMI seems to be the key point that breaks things.
I grabbed the system log on a few of the failed runners and the problem seems to be shown in the log lines below.
I see that there is code in the default user-data template to do multiple attempts at installing things via dnf. However, that does not work because the default user-data.sh has bash launched with "-e", so any error kills the entire script. I believe that is the main problem. I put in an exact copy of user-data.sh without the "-e" option and things appear to now be working smoothly. However, I'm not entirely sure why dnf fails so often that it needs retry logic. It feels like there is a deeper issue here.
The text was updated successfully, but these errors were encountered:
as a workaround, you can provide a pre-setup script to sleep for two minutes until this is resolved. This has been working for me, it's ugly, but it unblocks the process.
After upgrading to 5.10.0(from 4.2.3), I have been facing significant issues with runners failing to start up properly. After investigation, there seems to be a problem with dnf failing. My setup uses the multi-runner module, with ephemeral runners, a backup pool of runners, and does NOT use a custom AMI. I am relying on the stock Amazon Linux 2023 setup. Using the default Amazon Linux 2023 AMI seems to be the key point that breaks things.
I grabbed the system log on a few of the failed runners and the problem seems to be shown in the log lines below.
I see that there is code in the default user-data template to do multiple attempts at installing things via dnf. However, that does not work because the default user-data.sh has bash launched with "-e", so any error kills the entire script. I believe that is the main problem. I put in an exact copy of user-data.sh without the "-e" option and things appear to now be working smoothly. However, I'm not entirely sure why dnf fails so often that it needs retry logic. It feels like there is a deeper issue here.
The text was updated successfully, but these errors were encountered: