
argo-repo-server issue: gpg ... --gen-key failed exit status 2 #9809

Open · nice-pink opened this issue Jun 28, 2022 · 52 comments
Labels
bug Something isn't working

Comments

@nice-pink

nice-pink commented Jun 28, 2022

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

After upgrading argo-cd from version v2.3.5 to v2.4.3, the argo-repo-server stopped working with the following logs:

argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="Generating self-signed gRPC TLS certificate for this session"
argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="Initializing GnuPG keyring at /app/config/gpg/keys"
argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403" dir= execID=f1898
argocd-repo-server time="2022-06-28T16:18:48Z" level=error msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403` failed exit status 2"
argocd-repo-server time="2022-06-28T16:18:48Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403]" dir= opera
argocd-repo-server time="2022-06-28T16:18:48Z" level=fatal msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403` failed exit status 2"

This leads to the Argo CD UI showing the error:

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.3.43.220:8081: connect: connection refused"

To Reproduce

For me it was just the upgrade.

Expected behavior

argo-repo-server starts up without errors.

Screenshots

Version

argocd: v2.1.3+d855831.dirty
  BuildDate: 2021-09-30T22:11:24Z
  GitCommit: d855831540e51d8a90b1006d2eb9f49ab1b088af
  GitTreeState: dirty
  GoVersion: go1.17.1
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v2.4.3+471685f
@nice-pink nice-pink added the bug Something isn't working label Jun 28, 2022
@Hanyu96

Hanyu96 commented Jul 1, 2022

Is it the same machine before and after the upgrade and does it have git access?

@nice-pink
Author

It is the same machine, same setup; just upgraded from 2.3.X to 2.4.X. I'm facing the same issue with all 2.4.X versions. After downgrading to 2.3.5, the repo server works as expected.

@florianzimm

florianzimm commented Jul 7, 2022

Had the same issue when trying to upgrade from 2.2.10.
Switched back to the latest 2.3.5 as well.

@sass1997

sass1997 commented Jul 8, 2022

I have the same issue here. Do you also have Istio enabled?

@nice-pink
Author

No, I'm not using Istio.

@sass1997

sass1997 commented Jul 8, 2022

As far as I've read the documentation:

As a security enhancement, the argocd-repo-server Deployment uses its own Service Account instead of default.

If you have a custom environment that might depend on repo-server using the default Service Account (such as a plugin that uses the Service Account for auth), be sure to test before deploying the 2.4 upgrade to production.

Has anybody noticed that this service account needs an additional Role and RoleBinding? To me it seems that the new service account has too few rights.

Because there is no separate Role and RoleBinding, the pod gets whatever your cluster grants to system:serviceaccounts or system:authenticated by default.

@nice-pink
Author

This might not be an issue in our setup, but thanks for the hint. I'll check that.

@sass1997

sass1997 commented Jul 8, 2022

I will try to debug and let you know the correct RoleBinding needed to start this pod.

@sass1997

sass1997 commented Jul 8, 2022

I can confirm the service account is definitely missing some rights. I have now granted everything to this service account, and the argocd-repo-server is starting. I'm now working on reducing it to a least-privilege role.

@sass1997

sass1997 commented Jul 11, 2022

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: argocd
  name: argocd-repo-server
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argocd-repo-server
  namespace: argocd
subjects:
- kind: ServiceAccount
  name: argocd-repo-server
  apiGroup: ""
roleRef:
  kind: Role
  name: argocd-repo-server
  apiGroup: ""

I wasn't able to reduce the apiGroups or resources yet, but this config is working.
If someone from the maintainers can help here, that would be very useful.

@crenshaw-dev
Collaborator

@sass1997 the permissionless service account is working for me in both my local docker-desktop setup and in Intuit Argo CD instances. I'm not able to reproduce the bug.

@PierreRAFFA

I have just rebuilt a k8s cluster and installed argocd using kustomize,
and I am having the exact same issue with versions 2.4.3 and 2.4.4.

Defaulted container "argocd-repo-server" out of: argocd-repo-server, copyutil (init)
time="2022-07-11T16:23:47Z" level=info msg="Generating self-signed gRPC TLS certificate for this session"
time="2022-07-11T16:23:47Z" level=info msg="Initializing GnuPG keyring at /app/config/gpg/keys"
time="2022-07-11T16:23:47Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe1962454667" dir= execID=861a3
time="2022-07-11T16:23:53Z" level=error msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe1962454667` failed exit status 2" execID=861a3
time="2022-07-11T16:23:53Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe1962454667]" dir= operation_name="exec gpg" time_ms=6009.55569
time="2022-07-11T16:23:53Z" level=fatal msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe1962454667` failed exit status 2"

@PierreRAFFA

I have just tested installing version 2.3.5, and argocd-repo-server works.

@nice-pink
Author

@sass1997 Not even adding the Role and RoleBinding solves the issue for me. I still get exactly the same error. Hm.

@ghost

ghost commented Aug 2, 2022

I am seeing the same issue too, and likewise only in the production cluster, not on my local minikube. Adjusting the roles also does not change anything.

Setting the log level to debug reveals the error:

"gpg: can't connect to the agent: End of file\ngpg: agent_genkey failed: No agent running\ngpg: key generation failed: No agent running\n"

@nice-pink is that the same for you?

@nice-pink
Author

I'm seeing the original error still.

argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="Generating self-signed gRPC TLS certificate for this session"
argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="Initializing GnuPG keyring at /app/config/gpg/keys"
argocd-repo-server time="2022-06-28T16:18:42Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403" dir= execID=f1898
argocd-repo-server time="2022-06-28T16:18:48Z" level=error msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403` failed exit status 2"
argocd-repo-server time="2022-06-28T16:18:48Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403]" dir= opera
argocd-repo-server time="2022-06-28T16:18:48Z" level=fatal msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe301546403` failed exit status 2"

The whole argo setup is running in a managed OVH cluster. Haven't tried in minikube or any similar cluster so far.

@ghost

ghost commented Aug 2, 2022

Hmm, might be two unrelated issues then; I will keep you posted if I find out more.

@cesarmesones

For us it was pod security policies that needed updating. Try this for a quick test, giving the namespace privileged access:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argocd-psp
rules:
- apiGroups:
  - policy
  resourceNames:
  - privileged
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argocd-psp
subjects:
- kind: Group
  name: system:serviceaccounts:argocd
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: argocd-psp
  apiGroup: rbac.authorization.k8s.io

@nice-pink
Author

nice-pink commented Aug 10, 2022

Hm, also no change for us. So far only disabling GPG as described in https://argo-cd.readthedocs.io/en/stable/user-guide/gpg-verification/ removes the error.
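
(For anyone looking for the concrete change, here is a minimal sketch of that workaround, assuming the ARGOCD_GPG_ENABLED environment variable mentioned in the linked docs, applied as a strategic merge patch; the container may be named repo-server instead of argocd-repo-server depending on how you installed, and the docs suggest setting the same variable on argocd-server as well.)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
spec:
  template:
    spec:
      containers:
      - name: argocd-repo-server
        env:
        # Assumption: setting this to "false" skips GnuPG keyring
        # initialization on repo-server startup.
        - name: ARGOCD_GPG_ENABLED
          value: "false"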

@ghost

ghost commented Aug 10, 2022

For me neither; however, changing the underlying VM image of the worker nodes did resolve the issue for us. On a standard Ubuntu-based image everything is fine now, without changing anything else.

@nice-pink
Author

@pleic which Ubuntu version do you use for the base image?

@ghost

ghost commented Aug 22, 2022

@pleic which Ubuntu version do you use for the base image?

I am using Ubuntu Bionic, the current cloud image.

@bauerjs1

Experiencing the same issue here

Hm, also no change for us. So far only disabling gpg as described in https://argo-cd.readthedocs.io/en/stable/user-guide/gpg-verification/ removes the error.

Thanks @nice-pink for this workaround 👍🏻

@jabbors

jabbors commented Sep 12, 2022

We experienced the same error after upgrading from 2.2.x to 2.4.11.

In our case we had patched the deployment with the patch below. After removing it, the error disappeared and the repo server could start up.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
spec:
  template:
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault

@shizhz

shizhz commented Sep 20, 2022

We experienced the same error after upgrading from 2.2.x to 2.4.11.

In our case we had patched the deployment with the below patch. After removing it, the error disappeared and repo server could start up.


The same issue with version v2.4.12, this workaround works for me! Thanks

@mkilchhofer
Member

mkilchhofer commented Sep 23, 2022

We @swisspost are also facing this issue on VMware TKGI with Kubernetes 1.22 (= TKGI 1.13.4-build.15). Argo CD v2.4.7 (via Helm chart version 4.9.16) is working fine on AWS EKS and Azure AKS but not on TKGI.

TKGI runs a really old worker OS:

$ kubectl get nodes -o wide
NAME     STATUS   ROLES    AGE     VERSION            INTERNAL-IP    EXTERNAL-IP    OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
08f...   Ready    <none>   4h20m   v1.22.6+vmware.1   172.23.129.8   172.23.129.8   Ubuntu 16.04.7 LTS   4.15.0-176-generic   docker://20.10.9

I then suspected that an Ubuntu 16 worker combined with an Ubuntu 22 container base image has some compatibility issues (the Argo CD 2.4 container image is based on Ubuntu 22).

Unfortunately this theory is wrong - I booted a single-node k3s cluster with Ubuntu 16 LTS, docker and k3s:

$ kubectl get nodes -o wide
NAME     STATUS   ROLES                  AGE   VERSION        INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
ubuntu   Ready    control-plane,master   16m   v1.22.6+k3s1   192.168.91.205   <none>        Ubuntu 16.04.7 LTS   4.4.0-186-generic   docker://20.10.7

$ kubectl -n argocd get po
NAME                                  READY   STATUS    RESTARTS   AGE
argocd-redis-6bb9c5d89f-kh4jj         1/1     Running   0          5m57s
argocd-application-controller-0       1/1     Running   0          5m57s
argocd-repo-server-7d97f5cbdb-5tqjc   1/1     Running   0          5m56s
argocd-server-9f646bf78-qfsf4         1/1     Running   0          5m56s

$ helm ls -n argocd
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
argocd  argocd          1               2022-09-23 17:02:28.2194322 +0200 CEST  deployed        argo-cd-4.9.16  v2.4.7

Argo CD 2.4.7 is working fine here. I have no clue what else to try and filed a VMware issue in our support portal. 🙄

@denysvitali

denysvitali commented Oct 6, 2022

We @swisscom have the same issue with a Kubernetes cluster based on VMware Tanzu v1.21.9+vmware.1.

  Kernel Version:             4.15.0-167-generic
  OS Image:                   Ubuntu 16.04.7 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.9
  Kubelet Version:            v1.21.9+vmware.1
  Kube-Proxy Version:         v1.21.9+vmware.1

Whoa.

Log Line

time="2022-10-06T09:10:58Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe3419147315" dir= execID=0ca0f
time="2022-10-06T09:11:04Z" level=debug msg="gpg: can't connect to the agent: End of file\ngpg: agent_genkey failed: No agent running\ngpg: key generation failed: No agent running\n" duration=6.01130389s execID=0ca0f
time="2022-10-06T09:11:04Z" level=error msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe3419147315` failed exit status 2" execID=0ca0f
time="2022-10-06T09:11:04Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe3419147315]" dir= operation_name="exec gpg" time_ms=6011.9106090000005
time="2022-10-06T09:11:04Z" level=fatal msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe3419147315` failed exit status 2"

argocd-repo-server

v2.4.12

gpg

argocd@argocd-repo-server-64d5df97c5-2p6xx:~$ gpg --version
gpg (GnuPG) 2.2.27
libgcrypt 1.9.4
Copyright (C) 2021 Free Software Foundation, Inc.
License GNU GPL-3.0-or-later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Home: /home/argocd/.gnupg
Supported algorithms:
Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
        CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2

argocd@argocd-repo-server-64d5df97c5-2p6xx:~$ ldd $(which gpg)
	linux-vdso.so.1 (0x00007ffe69139000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f1a538c1000)
	libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f1a538ae000)
	libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f1a53761000)
	libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f1a53623000)
	libreadline.so.8 => /lib/x86_64-linux-gnu/libreadline.so.8 (0x00007f1a535cf000)
	libassuan.so.0 => /lib/x86_64-linux-gnu/libassuan.so.0 (0x00007f1a535b9000)
	libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f1a53591000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1a53369000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1a53282000)
	libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f1a53250000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1a539e4000)

@mkilchhofer
Member

We @swisscom have the same issue with a Kubernetes cluster based on VMware Tanzu v1.21.9+vmware.1.

Nice! Thanks for confirming that ❤️. We have postponed the planned Argo upgrade on Tanzu and will wait for TKGI 1.16 in Q1 2023 🙄

@nice-pink
Author

I managed to get everything running, but I did a complete fresh setup using the install.yaml from v2.4.13. There were quite a few changes, so it's not easy to say which diff was the important one.

@denysvitali

denysvitali commented Oct 6, 2022

Seems to be related to https://dev.gnupg.org/T2203 somehow:

$ export GNUPGHOME=/app/config/gpg/keys
$ gpgconf --launch gpg-agent
gpgconf: error running '/usr/bin/gpg-connect-agent': exit status 1
gpgconf: error running '/usr/bin/gpg-connect-agent NOP': General error

$ gpg-connect-agent -v
gpg-connect-agent: no running gpg-agent - starting '/usr/bin/gpg-agent'
gpg-connect-agent: waiting for the agent to come up ... (5s)
gpg-connect-agent: waiting for the agent to come up ... (4s)
gpg-connect-agent: waiting for the agent to come up ... (3s)
gpg-connect-agent: waiting for the agent to come up ... (2s)
gpg-connect-agent: waiting for the agent to come up ... (1s)
gpg-connect-agent: can't connect to the agent: IPC connect call failed
gpg-connect-agent: error sending standard options: No agent running

$ # time_ms=6011.9106090000005 =~ 5s waiting for the gpg agent
$  gpg-agent -v --daemon
gpg-agent[46]: listening on socket '/app/config/gpg/keys/S.gpg-agent'
gpg-agent[46]: listening on socket '/app/config/gpg/keys/S.gpg-agent.extra'
gpg-agent[46]: listening on socket '/app/config/gpg/keys/S.gpg-agent.browser'
gpg-agent[46]: listening on socket '/app/config/gpg/keys/S.gpg-agent.ssh'
$ gpg-agent[47]: gpg-agent (GnuPG) 2.2.27 started

$ ldd $(which gpg-agent)
	linux-vdso.so.1 (0x00007fff349fe000)
	libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f980ea78000)
	libassuan.so.0 => /lib/x86_64-linux-gnu/libassuan.so.0 (0x00007f980ea62000)
	libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f980ea3c000)
	libnpth.so.0 => /lib/x86_64-linux-gnu/libnpth.so.0 (0x00007f980ea35000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f980e80d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f980ec0a000)

$ dpkg-query -l gpg libgcrypt20 gpg-agent
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name              Version           Architecture Description
+++-=================-=================-============-=====================================================
ii  gpg               2.2.27-3ubuntu2.1 amd64        GNU Privacy Guard -- minimalist public key operations
ii  gpg-agent         2.2.27-3ubuntu2.1 amd64        GNU privacy guard - cryptographic agent
ii  libgcrypt20:amd64 1.9.4-3ubuntu3    amd64        LGPL Crypto library - runtime library

@denysvitali

denysvitali commented Oct 6, 2022

I'm getting closer to the issue I think:
https://lists.gnupg.org/pipermail/gnupg-users/2017-April/058158.html

VMware Tanzu (v1.21.9+vmware.1)

TKGi version: 1.12.4-build.14
$ k get nodes -o wide
NAME   STATUS   ROLES    AGE   VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
n1     Ready    <none>   54d   v1.21.9+vmware.1   10.0.0.XX     10.0.0.XX     Ubuntu 16.04.7 LTS   4.15.0-167-generic   docker://20.10.9
n2     Ready    <none>   54d   v1.21.9+vmware.1   10.0.0.XX     10.0.0.XX     Ubuntu 16.04.7 LTS   4.15.0-167-generic   docker://20.10.9
n3     Ready    <none>   19d   v1.21.9+vmware.1   10.0.0.XX     10.0.0.XX     Ubuntu 16.04.7 LTS   4.15.0-167-generic   docker://20.10.9
n4     Ready    <none>   19d   v1.21.9+vmware.1   10.0.0.XX     10.0.0.XX     Ubuntu 16.04.7 LTS   4.15.0-167-generic   docker://20.10.9
n5     Ready    <none>   54d   v1.21.9+vmware.1   10.0.0.XX     10.0.0.XX     Ubuntu 16.04.7 LTS   4.15.0-167-generic   docker://20.10.9
n6     Ready    <none>   54d   v1.21.9+vmware.1   10.0.0.XX     10.0.0.XX     Ubuntu 16.04.7 LTS   4.15.0-167-generic   docker://20.10.9
n7     Ready    <none>   19d   v1.21.9+vmware.1   10.0.0.XX     10.0.0.XX     Ubuntu 16.04.7 LTS   4.15.0-167-generic   docker://20.10.9
n8     Ready    <none>   54d   v1.21.9+vmware.1   10.0.0.XX     10.0.0.XX     Ubuntu 16.04.7 LTS   4.15.0-167-generic   docker://20.10.9
$ dd if=/dev/urandom of=/tmp/test.bin bs=1024 count=10
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.000239062 s, 42.8 MB/s

$ time dd if=/dev/random of=/tmp/test.bin bs=1024 count=10
^C0+1 records in
0+1 records out
7 bytes copied, 8.42642 s, 0.0 kB/s


real	0m8.428s
user	0m0.001s
sys	0m0.000s

AWS (v1.23.7-eks-7709a84)

$ k get nodes -o wide
NAME                                      STATUS   ROLES    AGE     VERSION               INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                               KERNEL-VERSION   CONTAINER-RUNTIME
ip-xyz-1.eu-central-1.compute.internal    Ready    <none>   8d      v1.23.7-eks-7709a84   10.0.0.1   <none>        Bottlerocket OS 1.9.2 (aws-k8s-1.23)   5.10.130         containerd://1.6.6+bottlerocket
ip-xyz-2.eu-central-1.compute.internal    Ready    <none>   8d      v1.23.7-eks-7709a84   10.0.0.2    <none>        Bottlerocket OS 1.9.2 (aws-k8s-1.23)   5.10.130         containerd://1.6.6+bottlerocket
ip-xyz-3.eu-central-1.compute.internal    Ready    <none>   7d17h   v1.23.7-eks-7709a84   10.0.0.3    <none>        Bottlerocket OS 1.9.2 (aws-k8s-1.23)   5.10.130         containerd://1.6.6+bottlerocket

$  time dd if=/dev/urandom of=/tmp/test.bin bs=1024 count=10
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.000139985 s, 73.2 MB/s

real	0m0.001s
user	0m0.001s
sys	0m0.000s

$ time dd if=/dev/random of=/tmp/test.bin bs=1024 count=10
10+0 records in
10+0 records out
10240 bytes (10 kB, 10 KiB) copied, 0.00011261 s, 90.9 MB/s

real	0m0.001s
user	0m0.001s
sys	0m0.000s

TL;DR: VMware Tanzu can't get more than a few bytes from /dev/random. Strange!
In any case, /dev/random shouldn't be used by GPG... but at this point I'm not sure.

@denysvitali

denysvitali commented Oct 6, 2022

And using the workaround proposed elsewhere of setting ARGOCD_GPG_ENABLED=false on most of the containers only shifts the problem: it seems that once you do that, #9888 happens.

There is probably something both GPG and git do that, for some reason, causes issues on Ubuntu 16.04 / VMware Tanzu:

cannot create async thread: Operation not permitted\nfatal: fetch-pack: unable to fork off sideband demultiplexer
argocd@argocd-repo-server-5b654fb47f-nhxqb:~$ git --version
git version 2.34.1
argocd@argocd-repo-server-5b654fb47f-nhxqb:~$ ldd $(which git)
	linux-vdso.so.1 (0x00007ffc6e3fd000)
	libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f4b917d9000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f4b917bd000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4b91595000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4b91c16000)

https://github.com/git/git/blob/v2.34.1/run-command.c#L1277-L1280

https://github.com/git/git/blob/v2.34.1/fetch-pack.c#L859-L860

Running GIT with GIT_TRACE=1, GIT_TRACE_PACKET=1 and GIT_TRACE_PACK_ACCESS=1 returns:

`git fetch origin --tags --force` failed exit status 128:
11:11:31.006050 git.c:455               trace: built-in: git fetch origin --tags --force
11:11:31.006558 run-command.c:668       trace: run_command: unset GIT_PREFIX; GIT_PROTOCOL=version=2 'ssh -i /dev/shm/1872321455 -o StrictHostKeyChecking=yes -o UserKnownHostsFile=/app/config/ssh/ssh_known_hosts' -o SendEnv=GIT_PROTOCOL -p 7999 git@git.corp.example.com 'git-upload-pack '\\''/prj/gitops.git'\\'''
11:11:31.152667 pkt-line.c:80           packet:        fetch< version 2
11:11:31.152760 pkt-line.c:80           packet:        fetch< agent=git/2.35.3
11:11:31.152766 pkt-line.c:80           packet:        fetch< ls-refs=unborn
11:11:31.152771 pkt-line.c:80           packet:        fetch< fetch=shallow wait-for-done filter
11:11:31.152776 pkt-line.c:80           packet:        fetch< server-option
11:11:31.152780 pkt-line.c:80           packet:        fetch< object-format=sha1
11:11:31.152784 pkt-line.c:80           packet:        fetch< object-info
11:11:31.152788 pkt-line.c:80           packet:        fetch< 0000
11:11:31.152793 pkt-line.c:80           packet:        fetch> command=ls-refs
11:11:31.152804 pkt-line.c:80           packet:        fetch> agent=git/2.34.1
11:11:31.152808 pkt-line.c:80           packet:        fetch> object-format=sha1
11:11:31.152811 pkt-line.c:80           packet:        fetch> 0001
11:11:31.152815 pkt-line.c:80           packet:        fetch> peel
11:11:31.152818 pkt-line.c:80           packet:        fetch> symrefs
11:11:31.152837 pkt-line.c:80           packet:        fetch> unborn
11:11:31.152842 pkt-line.c:80           packet:        fetch> ref-prefix refs/heads/
11:11:31.152845 pkt-line.c:80           packet:        fetch> ref-prefix refs/tags/
11:11:31.152848 pkt-line.c:80           packet:        fetch> 0000
11:11:31.162630 pkt-line.c:80           packet:        fetch< 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c refs/heads/feature/hello-world
11:11:31.162675 pkt-line.c:80           packet:        fetch< 486ea46224d1bb4fb680f34f7c9ad96a8f24ec88 refs/heads/master
11:11:31.162685 pkt-line.c:80           packet:        fetch< 0000
11:11:31.163023 pkt-line.c:80           packet:        fetch> command=fetch
11:11:31.163037 pkt-line.c:80           packet:        fetch> agent=git/2.34.1
11:11:31.163040 pkt-line.c:80           packet:        fetch> object-format=sha1
11:11:31.163042 pkt-line.c:80           packet:        fetch> 0001
11:11:31.163044 pkt-line.c:80           packet:        fetch> thin-pack
11:11:31.163047 pkt-line.c:80           packet:        fetch> no-progress
11:11:31.163049 pkt-line.c:80           packet:        fetch> ofs-delta
11:11:31.163062 pkt-line.c:80           packet:        fetch> want 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c
11:11:31.163068 pkt-line.c:80           packet:        fetch> want 486ea46224d1bb4fb680f34f7c9ad96a8f24ec88
11:11:31.163075 pkt-line.c:80           packet:        fetch> done
11:11:31.163077 pkt-line.c:80           packet:        fetch> 0000
11:11:31.175024 pkt-line.c:80           packet:        fetch< packfile
error: cannot create async thread: Operation not permitted
fatal: fetch-pack: unable to fork off sideband demultiplexer

(not helpful)

@denysvitali

denysvitali commented Oct 6, 2022

Downgrading argocd-repo-server (only this component) to v2.2.5 fixes #9888 and #9809 (with ARGOCD_GPG_ENABLED=false):

$ git --version
12:50:23.016524 git.c:444               trace: built-in: git version
git version 2.30.2

$ ldd $(which git)
	linux-vdso.so.1 (0x00007ffe69648000)
	libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f2cf3222000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f2cf3206000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2cf31e4000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2cf2ff8000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f2cf3638000)

$ gpg --version
gpg (GnuPG) 2.2.20
libgcrypt 1.8.7
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Home: /home/argocd/.gnupg
Supported algorithms:
Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
        CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2

$ ldd $(which gpg)
	linux-vdso.so.1 (0x00007fffa7a98000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f91b3ac9000)
	libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f91b3ab6000)
	libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f91b3975000)
	libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f91b3852000)
	libreadline.so.8 => /lib/x86_64-linux-gnu/libreadline.so.8 (0x00007f91b37fe000)
	libassuan.so.0 => /lib/x86_64-linux-gnu/libassuan.so.0 (0x00007f91b37e8000)
	libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f91b37be000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f91b35d2000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f91b3484000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f91b3462000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f91b345b000)
	libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f91b342c000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f91b3bf5000)


$ dpkg-query -l gpg gpg-agent libgcrypt20
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name              Version          Architecture Description
+++-=================-================-============-=====================================================
ii  gpg               2.2.20-1ubuntu3  amd64        GNU Privacy Guard -- minimalist public key operations
ii  gpg-agent         2.2.20-1ubuntu3  amd64        GNU privacy guard - cryptographic agent
ii  libgcrypt20:amd64 1.8.7-2ubuntu2.1 amd64        LGPL Crypto library - runtime library

@crenshaw-dev
Collaborator

@denysvitali I'd be curious how far forward you can move in the 2.2.x series before encountering the issue again.

Downgrading is a pretty bad workaround, since there are several serious CVEs related to the repo-server that were patched in later 2.2 releases. But it might help us isolate a problematic commit, if there is one.

@denysvitali

@crenshaw-dev: See #9888 and specifically:
#9888 (comment)

It seems like this issue is caused by the combination of the new base image (Ubuntu 22.04) and VMware Tanzu (?).
So we don't really need to check further; anything after 44d8cb8 fails.

@nouseforaname
Contributor

nouseforaname commented Nov 1, 2022

I'm getting closer to the issue I think: https://lists.gnupg.org/pipermail/gnupg-users/2017-April/058158.html

TL;DR: VMware Tanzu can't get more than a few bytes from /dev/random. Strange! In any case, /dev/random shouldn't be used by GPG... but at this point I'm not sure.

What IaaS is TKGi running on? AWS/GCP/Azure/vSphere/Openstack or something else?

@denysvitali

I think vSphere.
For sure not AWS, GCP, nor Azure.

Anyway, see the analysis here and the Ubuntu tracker issue I've opened: it seems like the issue is with pthread + Ubuntu 22.04 on an Ubuntu 16.04.7 LTS host running in TKGi.

The affected TKGi versions so far seem to be:

In a few minutes I'll try with 1.13.8-build.5, as our platform team just made that available - then I'll report back.

@krishgu

krishgu commented Nov 1, 2022

I managed to get everything running. But I did a complete fresh setup using the install.yaml from v2.4.13. There were quite some changes so it's not easy to say which diff was the important one.

We had the same issue (argocd-repo-server erroring out with GPG errors) when installing with kustomize from the cluster-install manifest. The setup works off the shelf on minikube, but a "real" cluster had this issue. Just removing the seccompProfile section from the securityContext section below solved it (a kustomize patch sketch follows the snippet).

          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
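
(A minimal sketch of how that removal could be expressed with kustomize, assuming kustomize v4+ inline JSON6902 patches and that the repo-server container is the first container in the argocd-repo-server Deployment; adjust the path to match your manifests.)

patches:
- target:
    group: apps
    version: v1
    kind: Deployment
    name: argocd-repo-server
  patch: |-
    # Assumption: container index 0 is the repo-server container carrying
    # the seccompProfile shown above; this removes only that sub-section.
    - op: remove
      path: /spec/template/spec/containers/0/securityContext/seccompProfile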

@krishgu

krishgu commented Nov 1, 2022

We experienced the same error after upgrading from 2.2.x to 2.4.11.

In our case we had patched the deployment with the below patch. After removing it, the error disappeared and repo server could start up.


Just removing the seccompProfile sub-section worked for us (details). Thank you, @jabbors! How did you narrow down to this part?

@jabbors

jabbors commented Nov 2, 2022

We had 2.4.11 running in a temporary playground environment where it ran fine. Then we started comparing the setups and found the seccompProfile was missing in the playground environment. We removed it in our production environments and Bingo!

@florianzimm

florianzimm commented Nov 2, 2022

To sum that up, because I had the same issue (for me it was PodSecurityPolicies): working from @sass1997's solution of allowing the repo-server everything, I created a role with all permissions using the alcideio rbac-tool and assigned it to argocd-repo-server, then reduced permissions in bulk, applying and restarting argocd-repo-server after every change, which led to this role:

---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: argocd
  name: argocd-repo-server
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  verbs: ["use"]

Checking the PSPs: all service accounts in the argocd namespace get a restricted PSP (this is one of several .yaml files applied after argocd's install.yaml).

# permit all service accounts to use psp within argocd namespace
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argocd-psp-restricted-rolebinding
  namespace: argocd
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:serviceaccounts:argocd
roleRef:
  kind: ClusterRole
  name: psp:restricted
  apiGroup: rbac.authorization.k8s.io

I then created a new (privileged) PSP 'argocd-repo-server' (from another PSP already in place) and modified the role from above to only let argocd-repo-server use the new PSP.

---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: argocd
  name: argocd-repo-server
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  verbs: ["use"]
  resourceNames: ["argocd-repo-server"] # this one forces argocd-repo-server to use the new psp

I then modified this PSP step by step (ending up with the same permissions as the restricted, non-working PSP) to find out which setting causes the issue; this led to 'seccomp: runtime/default'.

So this is what I ended up with (I think an OK trade-off from a security perspective, and way better than running an older version of Argo CD with older base images that have far more vulnerabilities). Maybe this can be a viable (temporary) workaround for you as well (@denysvitali or @mkilchhofer), albeit not addressing the root issue others are still working to find.

# permit all service accounts to use psp within argocd namespace
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argocd-psp-restricted-rolebinding
  namespace: argocd
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: system:serviceaccounts:argocd
roleRef:
  kind: ClusterRole
  name: psp:restricted
  apiGroup: rbac.authorization.k8s.io
---
# this is a workaround for argocd-repo-server only (> 2.3.6) with tkgi 1.11 onwards. assumedly up to 1.13 (or newer versions not upgrading ubuntu)
### issue:
# as a default all serviceaccounts in namespace argocd get assigned the default restricted psp from tkgi (with 'argocd-rolebinding-psp.yaml' in argocd-infra)
# this no longer works with argocd-repo-server from 2.3.7 onwards
#
#   details for psp 'pks-restricted' : https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid-Integrated-Edition/1.15/tkgi/GUID-pod-security-policy.html
#
### solution:
# create a new psp for argocd-repo-server.
#   (!) the only difference to 'pks-restricted' is that in this new psp the annotation seccomp.security.alpha.kubernetes.io/allowedProfileNames is NOT
#   set to 'runtime/default' but to '*' (meaning no seccomp profile is enforced).
# create a new role/-binding for argocd-repo-server and make it use the new psp
#
### note:
# psp && role && rolebinding can get removed some time in the future (with tkgi 1.15?) ....
#
# (1) psp
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
  name: argocd-repo-server
spec:
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  privileged: false
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim
---
# (2) role/-binding
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: argocd
  name: argocd-repo-server
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  verbs: ["use"]
  resourceNames: ["argocd-repo-server"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argocd-repo-server
  namespace: argocd
subjects:
- kind: ServiceAccount
  name: argocd-repo-server
  apiGroup: ""
roleRef:
  kind: Role
  name: argocd-repo-server
  apiGroup: ""

@YanzhaoLi

@denysvitali Could you please try it out in your env by excluding the seccompProfile?

@denysvitali

denysvitali commented Nov 2, 2022

Manifest

apiVersion: v1
kind: Pod
metadata:
  name: git-test-ubuntu-seccomp
spec:
  containers:
    - name: ubuntu-21-04-runtimedefault
      image: private-registry.example.com/git-debug:21.04
      command:
        - tail
        - -f
        - /dev/null
      securityContext:
        seccompProfile:
          type: RuntimeDefault
    - name: ubuntu-22-04-runtimedefault
      image: private-registry.example.com/git-debug:22.04
      command:
        - tail
        - -f
        - /dev/null
      securityContext:
        seccompProfile:
          type: RuntimeDefault

    - name: ubuntu-21-04
      image: private-registry.example.com/git-debug:21.04
      command:
        - tail
        - -f
        - /dev/null
    - name: ubuntu-22-04
      image: private-registry.example.com/git-debug:22.04
      command:
        - tail
        - -f
        - /dev/null
    - name: ubuntu-21-04-unconfined
      image: private-registry.example.com/git-debug:21.04
      command:
        - tail
        - -f
        - /dev/null
      securityContext:
        seccompProfile:
          type: Unconfined
    - name: ubuntu-22-04-unconfined
      image: private-registry.example.com/git-debug:22.04
      command:
        - tail
        - -f
        - /dev/null
      securityContext:
        seccompProfile:
          type: Unconfined

Results

ubuntu-21-04:/tmp$ git version
git version 2.30.2
ubuntu-21-04:/tmp$ git clone https://github.com/argoproj/argo-cd/ --single-branch
Cloning into 'argo-cd'...
remote: Enumerating objects: 55878, done.
remote: Counting objects: 100% (202/202), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 55878 (delta 124), reused 170 (delta 105), pack-reused 55676
Receiving objects: 100% (55878/55878), 48.36 MiB | 7.24 MiB/s, done.
Resolving deltas: 100% (37812/37812), done.
ubuntu-21-04-runtimedefault:/tmp$ git version
git version 2.30.2
ubuntu-21-04-runtimedefault:/tmp$ git clone https://github.com/argoproj/argo-cd/ --single-branch
Cloning into 'argo-cd'...
remote: Enumerating objects: 55878, done.
remote: Counting objects: 100% (202/202), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 55878 (delta 124), reused 168 (delta 105), pack-reused 55676
Receiving objects: 100% (55878/55878), 48.37 MiB | 5.79 MiB/s, done.
Resolving deltas: 100% (37802/37802), done.
ubuntu-22-04:/tmp$ git version
git version 2.34.1
ubuntu-22-04:/tmp$ git clone https://github.com/argoproj/argo-cd/
Cloning into 'argo-cd'...
remote: Enumerating objects: 96089, done.
remote: Counting objects: 100% (419/419), done.
remote: Compressing objects: 100% (162/162), done.
remote: Total 96089 (delta 274), reused 372 (delta 248), pack-reused 95670
Receiving objects: 100% (96089/96089), 71.02 MiB | 11.02 MiB/s, done.
Resolving deltas: 100% (60870/60870), done.
ubuntu-22-04-runtimedefault:/tmp$ git version
git version 2.34.1
$ git clone https://github.com/argoproj/argo-cd/ --single-branch
Cloning into 'argo-cd'...
fatal: unable to access 'https://github.com/argoproj/argo-cd/': getaddrinfo() thread failed to start
ubuntu-21-04-unconfined:/tmp$ git version
git version 2.30.2
ubuntu-21-04-unconfined:/tmp$ git clone https://github.com/argoproj/argo-cd/
Cloning into 'argo-cd'...
remote: Enumerating objects: 96089, done.
remote: Counting objects: 100% (419/419), done.
remote: Compressing objects: 100% (162/162), done.
remote: Total 96089 (delta 274), reused 372 (delta 248), pack-reused 95670
Receiving objects: 100% (96089/96089), 71.02 MiB | 5.49 MiB/s, done.
Resolving deltas: 100% (60870/60870), done.
ubuntu-22-04-unconfined/tmp$ git version
git version 2.34.1
ubuntu-22-04-unconfined:/tmp$ git clone https://github.com/argoproj/argo-cd/ --single-branch
Cloning into 'argo-cd'...
remote: Enumerating objects: 55878, done.
remote: Counting objects: 100% (202/202), done.
remote: Compressing objects: 100% (95/95), done.
remote: Total 55878 (delta 123), reused 173 (delta 107), pack-reused 55676
Receiving objects: 100% (55878/55878), 48.37 MiB | 3.48 MiB/s, done.
Resolving deltas: 100% (37796/37796), done.

So, my best assumption(s):

  • This new TKGi version sets the new --seccomp-default flag or something similar in some way (beta in Kubernetes 1.25, alpha in Kubernetes 1.22)
  • Ubuntu 22.04's git (?) can't run with the default seccomp profile (RuntimeDefault) and needs to be run Unconfined instead: my assumption is that libpthread isn't linked anymore, and thus git is trying to fork the process in a way that is restricted by seccomp
  • Argo CD will work again if we use the seccomp profile Unconfined, although that is probably insecure.

It seems to work in every case except when RuntimeDefault + 22.10 is used now, but this is probably because we've recently upgraded TKGi from 1.12.4-build.14 to 1.13.8-build.5 (and thus bumped Kubernetes from v1.21.9+vmware.1 to v1.22.12+vmware.1).

Could this have been solved by this change then?

I'm very, very confused.
To top it all, the error from git is no longer:

error: cannot create async thread: Operation not permitted
fatal: fetch-pack: unable to fork off sideband demultiplexer

but:

fatal: unable to access 'https://github.com/argoproj/argo-cd/': getaddrinfo() thread failed to start

when running in an Ubuntu 22.10 container with the RuntimeDefault mode.

@quoc9x

quoc9x commented Nov 25, 2022

We experienced the same error after upgrading from 2.2.x to 2.4.11.

In our case we had patched the deployment with the below patch. After removing it, the error disappeared and repo server could start up.


I had the same problem after upgrading Argo CD to version 2.5.2.
This workaround works for me.
Thanks bro!

@markmcgookin

markmcgookin commented Dec 6, 2022

Not sure if this helps anyone, but I am seeing this error on a few installs using any of the installation scripts (HA or regular, namespaced or plain install) in an Azure Kubernetes Service cluster. We are running Kubernetes 1.21.2 on a few Ubuntu 18.04 nodes.

EDIT: OK... just tried this again on a new cluster. Still running Ubuntu 18.04 images, but on 1.23.12 this time, and it seems to be working OK. The only difference I can really see is that the k8s version is bumped and this time I have 3 nodes instead of two.

@frizop

frizop commented Dec 6, 2022

I have the same problem; I'm running the latest Helm release, argo-cd-5.16.1.

I've tried removing the security parts of the podSpec in the hope it was some security setting like that. It's not. I've exec'd into the container to try writing to the /tmp directory and was successful, as shown below:

k exec -it po/myargocd-mylocal-argocd-repo-server-9fffc7b9f-rjt6x -c repo-server -n argocd -- sh
$ pwd
/home/argocd
$ ls /tmp
$ touch /tmp/foo
$ stat /tmp/foo
  File: /tmp/foo
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: 872h/2162d	Inode: 10224170    Links: 1
Access: (0644/-rw-r--r--)  Uid: (  999/  argocd)   Gid: (  999/  argocd)
Access: 2022-12-06 16:54:50.912589090 +0000
Modify: 2022-12-06 16:54:50.912589090 +0000
Change: 2022-12-06 16:54:50.912589090 +0000
 Birth: 2022-12-06 16:54:50.912589090 +0000

The following is a describe of the failing repo-server pod spec:

Containers:
  repo-server:
    Container ID:  containerd://f7b8eaa09a9a43a61698dc98637d014e303ec4fd51feca4fd118097f0edd5035
    Image:         quay.io/argoproj/argocd:v2.5.3
    Image ID:      quay.io/argoproj/argocd@sha256:8283a9f06033c2377dc61b03daf4994a3ab961c53d79ed32b9aebadf79bb4858
    Ports:         8081/TCP, 8084/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      entrypoint.sh
    Args:
      argocd-repo-server
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    20
      Started:      Tue, 06 Dec 2022 09:28:45 -0600
      Finished:     Tue, 06 Dec 2022 09:28:51 -0600
    Ready:          False
    Restart Count:  3
    Liveness:       http-get http://:metrics/healthz%3Ffull=true delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:metrics/healthz delay=10s timeout=1s period=10s #success=1 #failure=3

version=v2.5.3+0c7de21

@abstract-entity

I'm hosted by OVH, and upgrading my Kubernetes cluster from 1.22 to 1.24 solved this issue.

@cm0s

cm0s commented Dec 14, 2022

I'm using the "official" argo-cd Helm chart to deploy ArgoCD on a K8s cluster. Unfortunately I cannot "unset" the seccompProfile value because of a bug in Helm when trying to overring subchart values (helm/helm#5184, helm/helm#9136, current pull request fixing this problem:helm/helm#11440). Normally, you just have to set the seccompProfile to "null".
This happen because I created a custom Helm chart which has a dependency the official ArgoCD helm chart.

So, for me the only solution was to set a value (at least this works when setting values for subcharts).
If someone encounters the same problem, just set the following value in your values.yaml file of your umbrella/parent chart:

argo-cd:   # <- corresponds to the name used in my Helm "umbrella" chart; might be different for you
  repoServer:
    containerSecurityContext:
      seccompProfile:
        type: Unconfined

Hoping a better solution can be found in the near future, because having to lower security to make the service work is not really a great workaround!

@masadaco

We are hitting the same issue trying to get a new Argo CD with Vault installation up, using the kustomize install from argocd-vault-plugin/manifests/cmp-configmap. GPG error: no permission.

Node OS: CentOS Stream 8 Linux 4.18.0-394.el8.x86_64
VMware: 7.0.3

@matt-pei

matt-pei commented Jan 5, 2023

I managed to get everything running. But I did a complete fresh setup using the install.yaml from v2.4.13. There were quite some changes so it's not easy to say which diff was the important one.

Hi, I also ran into the same problem. The final solution is to check whether your Argo CD version and Kubernetes version are compatible. Refer to https://argo-cd.readthedocs.io/en/stable/operator-manual/installation/

@mohahmed13

Just in case it helps anyone in the future: we were running into this issue only on a couple of nodes where the repo server was being deployed. I saw #9888 (comment), which hinted at a Docker version discrepancy. After upgrading Docker to the latest version, things started to work without any workarounds.

docker://20.10.17 was failing; docker://20.10.18 and higher works.

@julian-waibel

julian-waibel commented Mar 22, 2024

@mohahmed13: Thanks for the info! I will also document my case for the community, note the Docker version details:

My case

Deploying "official" Argo CD Helm chart v6.7.2, which uses Argo CD v2.10.3, to on-prem RKE1 (Rancher Kubernetes Engine) based Kubernetes clusters.

❌ Cluster/node setup not working out-of-the-box

Version: v1.24.17
OS: Ubuntu 20.04.2 LTS
Container Runtime:  Docker 20.10.7

It needs the seccompProfile: type: Unconfined fix described above to work, so that the Argo CD repo-server pods stop crashing.

✅ Cluster/node setup working out-of-the-box

Version: v1.24.17
OS: Ubuntu 20.04.4 LTS
Container Runtime:  Docker 20.10.14
