argo-repo-server issue: gpg ... --gen-key failed exit status 2 #9809
Comments
Is it the same machine before and after the upgrade and does it have git access? |
It is the same machine, same setup. I just upgraded from 2.3.X to 2.4.X. I'm facing the same issue with all 2.4.X versions. After downgrading to 2.3.5, the repo server works as expected. |
had the same issue when trying to upgrade from 2.2.10. |
I have the same issue here. Do you also have Istio enabled? |
No, I'm not using istio. |
As far as I've read the documentation:
Has anybody noticed that this service account needs an additional Role and RoleBinding? To me it seems that this new service account has too few rights, because there is no separate Role and RoleBinding. It takes whatever setting you gave your cluster, or the default, which is system:serviceaccounts or system:authenticated. |
This might not be an issue in our setup, but thanks for the hint. I'll check that. |
I will try to debug and let you know the correct RoleBinding that is needed to start this pod |
I can confirm this service account is definitely missing specific rights. I have now granted it everything, and argocd-repo-server is starting. I'm now working on reducing this to a least-privilege role. |
I wasn't able to reduce the API groups or resources yet, but this config is working. |
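The working config itself isn't quoted above; purely as a hypothetical illustration of the allow-everything starting point being described (all names assumed), useful only as a temporary debugging step before narrowing permissions:

# Hypothetical debugging Role: grants everything in the argocd namespace.
# Not for production -- narrow it down once the pod starts.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: argocd
  name: argocd-repo-server-debug
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: argocd
  name: argocd-repo-server-debug
subjects:
- kind: ServiceAccount
  name: argocd-repo-server
roleRef:
  kind: Role
  name: argocd-repo-server-debug
  apiGroup: rbac.authorization.k8s.io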
@sass1997 the permissionless service account is working for me in both my local docker-desktop setup and in Intuit Argo CD instances. I'm not able to reproduce the bug. |
I have just rebuilt a k8s cluster and installed argocd using kustomize
|
I have just tested installing version 2.3.5, and argocd-repo-server works |
@sass1997 Not even adding the role and role binding solves the issue for me. Still get exactly the same error. Hm. |
I am seeing the same issue too, but only in the production cluster, not on my local minikube. Adjusting the roles also does not change anything. Setting the log level to debug reveals the error:
@nice-pink is that the same for you ? |
I'm seeing the original error still.
The whole argo setup is running in a managed OVH cluster. Haven't tried in minikube or any similar cluster so far. |
Mhh, might be two unrelated issues then, I will keep you posted if I find out more |
For us it was pod security policies that needed updating - try this for a quick test, giving the namespace privileged access
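The quick-test snippet didn't survive above; as a sketch, assuming your cluster exposes a privileged PSP through a ClusterRole named psp:privileged (the name is an assumption and varies by distribution):

# Quick test only: bind every service account in the argocd namespace
# to a privileged PodSecurityPolicy. Revert after testing.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argocd-psp-privileged-test
  namespace: argocd
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts:argocd
roleRef:
  kind: ClusterRole
  name: psp:privileged
  apiGroup: rbac.authorization.k8s.io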
|
Hm, also no change for us. So far only disabling gpg as described in https://argo-cd.readthedocs.io/en/stable/user-guide/gpg-verification/ removes the error. |
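For reference, the linked docs expose this via the ARGOCD_GPG_ENABLED environment variable on the repo server; a minimal patch sketch of applying it:

# Disable GPG support on the repo server via the documented
# ARGOCD_GPG_ENABLED switch (workaround, not a fix).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
spec:
  template:
    spec:
      containers:
      - name: argocd-repo-server
        env:
        - name: ARGOCD_GPG_ENABLED
          value: "false"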
For me neither; however, changing the underlying VM image of the worker nodes did resolve the issue for us. On a standard Ubuntu-based image everything is fine now, without changing anything else |
@pleic which Ubuntu version do you use for the base image? |
I am using ubuntu bionic, the current cloud image |
Experiencing the same issue here
Thanks @nice-pink for this workaround 👍🏻 |
We experienced the same error after upgrading from 2.2.x to 2.4.11. In our case we had patched the deployment with the below patch. After removing it, the error disappeared and repo server could start up.
|
The same issue with version v2.4.12, this workaround works for me! Thanks |
We @swisspost are also facing this issue on VMware TKGI (Kubernetes 1.22). TKGI runs a really old worker OS:

$ kubectl get nodes -o wide
NAME    STATUS  ROLES   AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE            KERNEL-VERSION      CONTAINER-RUNTIME
08f...  Ready   <none>  4h20m  v1.22.6+vmware.1  172.23.129.8  172.23.129.8  Ubuntu 16.04.7 LTS  4.15.0-176-generic  docker://20.10.9

I then suspected that an Ubuntu 16 worker with an Ubuntu 22 base image has some compatibility issues (the Argo CD 2.4 container image is based on Ubuntu 22). Unfortunately this theory is wrong - I booted a single-node k3s cluster with Ubuntu 16 LTS, Docker and k3s:

$ kubectl get nodes -o wide
NAME    STATUS  ROLES                 AGE  VERSION       INTERNAL-IP     EXTERNAL-IP  OS-IMAGE            KERNEL-VERSION     CONTAINER-RUNTIME
ubuntu  Ready   control-plane,master  16m  v1.22.6+k3s1  192.168.91.205  <none>       Ubuntu 16.04.7 LTS  4.4.0-186-generic  docker://20.10.7

$ kubectl -n argocd get po
NAME                                 READY  STATUS   RESTARTS  AGE
argocd-redis-6bb9c5d89f-kh4jj        1/1    Running  0         5m57s
argocd-application-controller-0      1/1    Running  0         5m57s
argocd-repo-server-7d97f5cbdb-5tqjc  1/1    Running  0         5m56s
argocd-server-9f646bf78-qfsf4        1/1    Running  0         5m56s

$ helm ls -n argocd
NAME    NAMESPACE  REVISION  UPDATED                                 STATUS    CHART           APP VERSION
argocd  argocd     1         2022-09-23 17:02:28.2194322 +0200 CEST  deployed  argo-cd-4.9.16  v2.4.7

Argo CD 2.4.7 is working fine here. I have no clue what else to try and have filed a VMware issue in our support portal. 🙄 |
We @swisscom have the same issue with a Kubernetes cluster based on VMware Tanzu
Whoa. Log Line
|
Nice! Thanks for confirming that ❤️. We postponed the upgrade intent for Argo on Tanzu and will wait for TKGI 1.16 in Q1 2023 🙄 |
I managed to get everything running. But I did a complete fresh setup using the |
Seems to be related to https://dev.gnupg.org/T2203 somehow:
|
I'm getting closer to the issue I think: VMware Tanzu (
|
And using the workaround proposed somewhere else of setting
There is probably something both GPG and git do that, for some reason, causes issues on Ubuntu 16.04 / VMware Tanzu:
https://github.com/git/git/blob/v2.34.1/run-command.c#L1277-L1280
https://github.com/git/git/blob/v2.34.1/fetch-pack.c#L859-L860

Running GIT with
(not helpful) |
Downgrading
|
@denysvitali I'd be curious how far forward you can move in the 2.2.x series before encountering the issue again. Downgrading is a pretty bad workaround, since there are several serious CVEs related to the repo-server which were patched in later 2.2 releases. But might help us isolate a problematic commit, if there is one. |
@crenshaw-dev: See #9888 and specifically: It seems like this issue is caused by the combination of the new base image (Ubuntu 22.04) and VMware Tanzu (?). |
What IaaS is TKGi running on? AWS/GCP/Azure/vSphere/Openstack or something else? |
I think vSphere. Anyways, see the analysis here and the Ubuntu tracker issue I've opened: it seems like the issue is with pthread + Ubuntu 22.04 on an Ubuntu 16.04.7 LTS host running in TKGi. The affected TKGi versions so far seem to be:
In a few minutes I'll try with |
We had the same issue (argocd-repo-server erroring out with GPG errors) when installing via kustomize from the cluster-install manifest. The setup works off the shelf for minikube, but a "real" cluster had this issue. Just removing the
|
just removing the |
We had 2.4.11 running in a temporary playground environment where it ran fine. Then we started comparing the setups and found the seccompProfile was missing in the playground environment. We removed it in our production environments and Bingo! |
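To make that concrete, a sketch of dropping the profile when installing via Kustomize; the JSON-patch path is an assumption (the profile may sit at the pod or the container level depending on the manifest version), so verify against your install.yaml:

# kustomization.yaml (sketch) -- remove the seccompProfile from the repo server.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
resources:
- https://raw.githubusercontent.com/argoproj/argo-cd/v2.4.11/manifests/install.yaml
patches:
- target:
    kind: Deployment
    name: argocd-repo-server
  patch: |-
    - op: remove
      path: /spec/template/spec/containers/0/securityContext/seccompProfile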
To sum that up, because I had the same issue (for me it was PodSecurityPolicies), working from @sass1997's solution of allowing the repo-server anything: I created a role with all permissions using alcideio's rbac-tool and assigned it to argocd-repo-server. Reducing permissions in bulk, then applying and restarting argocd-repo-server after every change, led to this role:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: argocd
  name: argocd-repo-server
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  verbs: ["use"]

Checking PSPs: all service accounts in namespace argocd get a restricted PSP (this is one of several .yaml files applied after argocd's install.yaml):

# permit all service accounts to use psp within argocd namespace
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argocd-psp-restricted-rolebinding
  namespace: argocd
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts:argocd
roleRef:
  kind: ClusterRole
  name: psp:restricted
  apiGroup: rbac.authorization.k8s.io

I then created a new (privileged) PSP 'argocd-repo-server' (from another PSP already in place) and modified the role from above to only let argocd-repo-server use the new PSP:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: argocd
  name: argocd-repo-server
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  verbs: ["use"]
  resourceNames: ["argocd-repo-server"]  # this one forces argocd-repo-server to use the new psp

Then I modified this PSP (starting from the same permissions as the restricted PSP, which does not work) to find out which setting causes the issue; this led to 'seccomp: runtime/default'. So this is what I ended up with (I think an OK trade-off from a security perspective, and way better than running an older version of argocd with older base images that have far more vulnerabilities). Maybe this can be a viable (temporary) workaround (@denysvitali or @mkilchhofer) as well, albeit not addressing the root issue others are still working to find:

# permit all service accounts to use psp within argocd namespace
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argocd-psp-restricted-rolebinding
  namespace: argocd
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts:argocd
roleRef:
  kind: ClusterRole
  name: psp:restricted
  apiGroup: rbac.authorization.k8s.io
---
# this is a workaround for argocd-repo-server only (> 2.3.6) with tkgi 1.11 onwards, assumedly up to 1.13 (or newer versions not upgrading ubuntu)
### issue:
# as a default, all service accounts in namespace argocd get assigned the default restricted psp from tkgi (with 'argocd-rolebinding-psp.yaml' in argocd-infra)
# this no longer works with argocd-repo-server from 2.3.7 onwards
#
# details for psp 'pks-restricted': https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid-Integrated-Edition/1.15/tkgi/GUID-pod-security-policy.html
#
### solution:
# create a new psp for argocd-repo-server.
# (!) the only difference to 'pks-restricted' is that in this new psp the annotation seccomp.security.alpha.kubernetes.io/allowedProfileNames is NOT
# set to 'runtime/default' but to '*' (meaning no seccomp profile is enforced).
# create a new role/-binding for argocd-repo-server and make it use the new psp
#
### note:
# psp && role && rolebinding can be removed some time in the future (with tkgi 1.15?) ...
#
# (1) psp
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
  name: argocd-repo-server
spec:
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  privileged: false
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim
---
# (2) role/-binding
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: argocd
  name: argocd-repo-server
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  verbs: ["use"]
  resourceNames: ["argocd-repo-server"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: argocd-repo-server
  namespace: argocd
subjects:
- kind: ServiceAccount
  name: argocd-repo-server
  apiGroup: ""
roleRef:
  kind: Role
  name: argocd-repo-server
  apiGroup: rbac.authorization.k8s.io |
@denysvitali Could you please try it out in your env by excluding the |
Manifest:

apiVersion: v1
kind: Pod
metadata:
  name: git-test-ubuntu-seccomp
spec:
  containers:
  - name: ubuntu-21-04-runtimedefault
    image: private-registry.example.com/git-debug:21.04
    command:
    - tail
    - -f
    - /dev/null
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  - name: ubuntu-22-04-runtimedefault
    image: private-registry.example.com/git-debug:22.04
    command:
    - tail
    - -f
    - /dev/null
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  - name: ubuntu-21-04
    image: private-registry.example.com/git-debug:21.04
    command:
    - tail
    - -f
    - /dev/null
  - name: ubuntu-22-04
    image: private-registry.example.com/git-debug:22.04
    command:
    - tail
    - -f
    - /dev/null
  - name: ubuntu-21-04-unconfined
    image: private-registry.example.com/git-debug:21.04
    command:
    - tail
    - -f
    - /dev/null
    securityContext:
      seccompProfile:
        type: Unconfined
  - name: ubuntu-22-04-unconfined
    image: private-registry.example.com/git-debug:22.04
    command:
    - tail
    - -f
    - /dev/null
    securityContext:
      seccompProfile:
        type: Unconfined

Results
So, my best assumption(s):
It seems to work in every case except when
Could this have been solved by this change, then? I'm very, very confused.
but:
when running in an Ubuntu 22.10 container with the |
I have the same problem after upgrading ArgoCD to version 2.5.2 |
Not sure if this helps anyone, but I am seeing this error on a few installs using any of the installation scripts (HA or regular, namespace or plain install) in an Azure Kubernetes Service cluster. We are running Kubernetes 1.21.2 on a few Ubuntu 18.04 nodes.

EDIT: OK... just tried this again on a new cluster. Still running Ubuntu 18.04 images, but on 1.23.12 this time, and it seems to be working fine. The only difference I can really see is that the k8s version is bumped and this time I have 3 nodes instead of two. |
I have this same problem. I'm running the latest Helm release, argo-cd-5.16.1. I've tried removing the security parts of the podSpec in the hope it was some security setting like that. It's not. I've exec'd into the container to try writing to the /tmp directory and been successful; the below shows that:
The following is a describe on the repo-server podspec that's failing:
version=v2.5.3+0c7de21 |
I'm hosted by OVH, and upgrading my Kubernetes cluster from 1.22 to 1.24 solved this issue |
I'm using the "official" argo-cd Helm chart to deploy ArgoCD on a K8s cluster. Unfortunately I cannot "unset" the seccompProfile value because of a bug in Helm when trying to override subchart values (helm/helm#5184, helm/helm#9136; current pull request fixing this problem: helm/helm#11440). Normally, you just have to set the seccompProfile to "null". So for me the only solution was to set a value (at least this works when setting values for subcharts).

Hoping a better solution can be found in the near future, because having to lower security to make the service work is not really a great workaround! |
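For reference, a sketch of the values override being described; the key path repoServer.containerSecurityContext is an assumption based on the chart's conventions, so verify it against your chart version:

# values.yaml (sketch) -- relax the repo server's seccomp profile.
# Workaround only: this lowers security, as noted above.
repoServer:
  containerSecurityContext:
    seccompProfile:
      type: Unconfined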
We are hitting the same issue trying to get a new Argo CD with Vault installation up. Using the kustomize install from argocd-vault-plugin/manifests/cmp-configmap. GPG error no permission. Node OS: CentOS Stream 8 Linux 4.18.0-394.el8.x86_64 |
Hi, I also ran into the same problem. The final solution is to check whether your argocd version and Kubernetes version correspond. Refer to https://argo-cd.readthedocs.io/en/stable/operator-manual/installation/ |
Just in case it helps anyone in the future: we were running into this issue on only a couple of nodes where the repo server was being deployed. I saw this #9888 (comment), which hinted at a Docker version discrepancy. After upgrading Docker to the latest version, things started to work without any workarounds.
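As a sketch of that fix (package names assume Docker's upstream apt packages on Ubuntu; drain each node before restarting the runtime):

# Drain the node, upgrade Docker, then bring it back.
$ kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
$ sudo apt-get update && sudo apt-get install --only-upgrade docker-ce docker-ce-cli containerd.io
$ sudo systemctl restart docker
$ kubectl uncordon <node-name>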
|
@mohahmed13: Thanks for the info! I will also document my case for the community; note the Docker version details:

My case

Deploying the "official" Argo CD Helm chart

❌ Cluster/node setup not working out-of-the-box

Needs the seccompProfile: type: Unconfined fix described above to work, so that the Argo CD repo server pods stop crashing.

✅ Cluster/node setup working out-of-the-box
|
Checklist:
argocd version.

Describe the bug
After upgrading argo-cd from version v2.3.5 to v2.4.3, the argo-repo-server stopped working with the logs:
This leads to Argo CD UI showing error:
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.3.43.220:8081: connect: connection refused"
To Reproduce
For me it was just the upgrade.
Expected behavior
argo-repo-server starts up without errors.
Screenshots
Version