AWS ECR: Could not set registry endpoint credentials ... failed timeout after 10s #657

Open
bcbrockway opened this issue Dec 22, 2023 · 2 comments
Labels
bug Something isn't working


@bcbrockway

Describe the bug

We run the Image Updater on EKS clusters, using IRSA to bind it to an IAM role that grants it access to our ECR registry. We also have an auth script configured that runs an awscli command to fetch a new token every 11 hours:

# configmap/argocd-image-updater-config
# ...
data:
  registries.conf: |
    registries:
    - api_url: https://000000000000.dkr.ecr.us-east-2.amazonaws.com
      credentials: ext:/scripts/ecr-login-us-east-2.sh
      credsexpire: 11h
      name: ECR
      prefix: 000000000000.dkr.ecr.us-east-2.amazonaws.com

# configmap/argocd-image-updater-authscripts
# ...
data:
  ecr-login-us-east-2.sh: |
    #!/bin/sh
    aws ecr --region 'us-east-2' get-authorization-token --cli-read-timeout 5 --cli-connect-timeout 5 --output text --query 'authorizationData[].authorizationToken' | base64 -d

This usually works on startup, and sometimes when the token is refreshed after credsexpire, but it often fails with:

Could not set registry endpoint credentials: error executing /scripts/ecr-login-us-east-2.sh: /scripts/ecr-login-us-east-2.sh failed timeout after 10s

Sometimes it takes hours of retries before a run succeeds, and sometimes nothing short of killing the pod and starting a new one fixes it.

It's also odd that it seems to run this script once for each app in its update cycle (see logs below) rather than running it just once, given that we've configured the credentials at the registry level.
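
A possible workaround on the script side, sketched here under several assumptions and not tested against the Image Updater itself: wrap the original script so that concurrent invocations share one cached token instead of each calling AWS. The variable names (CACHE_FILE, LOCK_DIR, MAX_AGE_MIN, MAX_WAIT_SEC, FETCH_CMD) and the /tmp paths are hypothetical, and the sketch assumes /tmp is writable in the container and that find supports -mmin.

```shell
#!/bin/sh
# Sketch only: serialize and cache ECR token fetches across concurrent callers.
# All paths and variable names below are hypothetical, not Image Updater settings.
umask 077                                   # cached token readable by owner only
CACHE_FILE="${CACHE_FILE:-/tmp/ecr-token-us-east-2}"
LOCK_DIR="${LOCK_DIR:-$CACHE_FILE.lock}"
MAX_AGE_MIN="${MAX_AGE_MIN:-60}"            # reuse a cached token for up to an hour
MAX_WAIT_SEC="${MAX_WAIT_SEC:-8}"           # give up before the 10s limit is hit
FETCH_CMD="${FETCH_CMD:-/scripts/ecr-login-us-east-2.sh}"   # the original script

cache_is_fresh() {
  # Fresh: file is non-empty and find does not report it as older than MAX_AGE_MIN.
  [ -s "$CACHE_FILE" ] &&
    [ -z "$(find "$CACHE_FILE" -mmin +"$MAX_AGE_MIN" 2>/dev/null)" ]
}

get_token() {
  # Fast path: a recent token is already cached.
  if cache_is_fresh; then cat "$CACHE_FILE"; return 0; fi

  # mkdir is atomic, so exactly one caller acquires the lock; the others poll.
  waited=0
  until mkdir "$LOCK_DIR" 2>/dev/null; do
    if cache_is_fresh; then cat "$CACHE_FILE"; return 0; fi
    if [ "$waited" -ge "$MAX_WAIT_SEC" ]; then
      # Assume the holder was killed; clear the lock so a later retry can proceed.
      rmdir "$LOCK_DIR" 2>/dev/null
      echo "timed out waiting for token lock" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done

  # Lock held: fetch unless another caller refreshed the cache in the meantime.
  rc=0
  if ! cache_is_fresh; then
    tmp="$CACHE_FILE.$$"
    if $FETCH_CMD > "$tmp" && [ -s "$tmp" ]; then
      mv "$tmp" "$CACHE_FILE"               # rename, so readers never see a partial token
    else
      rm -f "$tmp"
      rc=1
    fi
  fi
  rmdir "$LOCK_DIR" 2>/dev/null
  [ "$rc" -eq 0 ] && cat "$CACHE_FILE"
  return "$rc"
}

get_token
```

If this were mounted as a second script and referenced from credentials: ext:..., ten simultaneous invocations should result in a single awscli call, with the other nine reading the cached output. Whether that avoids the timeout depends on what actually causes it, which this issue has not established.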

To Reproduce
Set up as above. Unfortunately, this is intermittent.
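
One way to probe whether concurrency is a factor, as a rough sketch to run by hand inside the pod: start the auth script N times at once, each under a time limit, and count how many runs fail or exceed the limit. The logs below show ten executions starting in the same second, hence N=10. The names SCRIPT, N, LIMIT and run_concurrently are made up for this example, and it assumes a timeout utility that accepts `timeout SECONDS COMMAND` is available in the image.

```shell
#!/bin/sh
# Sketch: run N copies of a command concurrently and count failures/timeouts.
# SCRIPT, N and LIMIT are hypothetical knobs for this experiment only.
SCRIPT="${SCRIPT:-/scripts/ecr-login-us-east-2.sh}"
N="${N:-10}"          # number of simultaneous runs
LIMIT="${LIMIT:-10}"  # seconds allowed per run, mirroring the 10s in the error

run_concurrently() {
  results="$(mktemp -d)"
  i=1
  while [ "$i" -le "$N" ]; do
    # Each run records "ok" or "fail" in its own file to avoid interleaved writes.
    (
      if timeout "$LIMIT" $SCRIPT >/dev/null 2>&1; then
        echo ok
      else
        echo fail
      fi > "$results/$i"
    ) &
    i=$((i + 1))
  done
  wait                                  # block until every background run finishes
  failed="$(cat "$results"/* | grep -c fail)"
  rm -rf "$results"
  echo "$failed"                        # print the number of failed or timed-out runs
}

echo "failed: $(run_concurrently) of $N"
```

A non-zero count with N=10 alongside a zero count with N=1 would at least be consistent with the concurrency theory, though it would not prove it.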

Expected behavior
The script runs correctly (once) and stores the new token for all apps to use.

Additional context
N/A

Version
0.12.0

Logs

2023-12-20T14:11:38+00:00	time="2023-12-20T14:11:38Z" level=info msg="Processing results: applications=4 images_considered=3 images_skipped=1 images_updated=0 errors=1"
2023-12-20T14:11:38+00:00	time="2023-12-20T14:11:38Z" level=info msg="Starting image update cycle, considering 2 annotated application(s) for update"
2023-12-20T14:11:39+00:00	time="2023-12-20T14:11:39Z" level=info msg="Processing results: applications=2 images_considered=2 images_skipped=0 images_updated=0 errors=0"
2023-12-20T14:12:09+00:00	time="2023-12-20T14:12:09Z" level=info msg="Starting image update cycle, considering 2 annotated application(s) for update"
2023-12-20T14:12:10+00:00	time="2023-12-20T14:12:10Z" level=info msg="Processing results: applications=2 images_considered=2 images_skipped=0 images_updated=0 errors=0"
2023-12-20T14:12:30+00:00	{"log":"time=\"2023-12-20T14:12:30Z\" level=info msg=\"Starting image update cycle, considering 3 annotated application(s) for update\"\n","stream":"stdout","time":"2023-12-20T14:12:30.44525427Z"}
2023-12-20T14:12:31+00:00	{"log":"time=\"2023-12-20T14:12:31Z\" level=info msg=\"Processing results: applications=3 images_considered=2 images_skipped=2 images_updated=0 errors=0\"\n","stream":"stdout","time":"2023-12-20T14:12:31.748061081Z"}
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg="Starting image update cycle, considering 19 annotated application(s) for update"
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=7dddb
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=33a72
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=2a859
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=e6515
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=dce93
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=d9146
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=508d5
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=68554
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=3c106
2023-12-20T14:12:41+00:00	time="2023-12-20T14:12:41Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=d7263
2023-12-20T14:12:51+00:00	time="2023-12-20T14:12:51Z" level=error msg="`/scripts/ecr-login-us-east-2.sh` failed timeout after 10s" execID=7dddb
2023-12-20T14:12:51+00:00	time="2023-12-20T14:12:51Z" level=error msg="Could not set registry endpoint credentials: error executing /scripts/ecr-login-us-east-2.sh: `/scripts/ecr-login-us-east-2.sh` failed timeout after 10s" alias=report-subscription-event-producer application=report-subscription-event-producer image_name=gitlab/mintel/core-services/report-subscription-event-producer image_tag=ebdfe4eccab090c0d5a60a3bd4aae4aa7b8c3ae2-test registry=000000000000.dkr.ecr.us-east-2.amazonaws.com
2023-12-20T14:12:51+00:00	time="2023-12-20T14:12:51Z" level=info msg=/scripts/ecr-login-us-east-2.sh dir= execID=b2204
2023-12-20T14:12:51+00:00	time="2023-12-20T14:12:51Z" level=error msg="`/scripts/ecr-login-us-east-2.sh` failed timeout after 10s" execID=2a859
2023-12-20T14:12:51+00:00	time="2023-12-20T14:12:51Z" level=error msg="Could not set registry endpoint credentials: error executing /scripts/ecr-login-us-east-2.sh: `/scripts/ecr-login-us-east-2.sh` failed timeout after 10s" alias=ataccama-event-bridge application=ataccama-event-bridge image_name=gitlab/mintel/data-warehouse/agents/reference-data/ataccama-event-bridge image_tag="sha256:80c37d6719f3f2fd3e24a5264e2e1fbf1e37cf06a308f379db88ca55639ae498" registry=000000000000.dkr.ecr.us-east-2.amazonaws.com
@bcbrockway bcbrockway added the bug Something isn't working label Dec 22, 2023
@PuChenTW

PuChenTW commented Jan 2, 2024

Setting --max-concurrency to 1 works for me, although I don't know exactly how this fixes the problem 😅
https://argocd-image-updater.readthedocs.io/en/stable/install/reference/#flags

extraArgs:
  - --max-concurrency
  - "1"

@bcbrockway
Author


Some of our ArgoCD instances have a lot of apps, so this would slow us down quite a bit :(
