
Fail to mount the PV when using Anthos Service Mesh #40

Open
ybelleguic opened this issue Jun 27, 2023 · 5 comments
Labels: bug (Something isn't working), question (Further information is requested)

Comments

@ybelleguic

Hello,

I'm encountering an issue when mounting a bucket as a PV with Anthos Service Mesh; the full YAML is at the end of this issue. It works perfectly fine when Istio injection is disabled. The pod events are:

  Type     Reason       Age              From               Message
  ----     ------       ----             ----               -------
  Normal   Scheduled    14s              default-scheduler  Successfully assigned nginx/nginx-d576dc799-6dmvs to xxxxxxxxxxx
  Normal   Pulled       11s              kubelet            Container image "gcr.io/gke-release/asm/proxyv2:1.15.7-asm.8" already present on machine
  Normal   Created      11s              kubelet            Created container istio-init
  Normal   Started      11s              kubelet            Started container istio-init
  Normal   Pulled       10s              kubelet            Container image "gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v0.1.3-gke.0@sha256:854e1aa1178dc3f7e3ec5fa03cea5e32f0385ff6230efd836a22e86beb876740" already present on machine
  Normal   Created      10s              kubelet            Created container gke-gcsfuse-sidecar
  Normal   Started      9s               kubelet            Started container gke-gcsfuse-sidecar
  Warning  Failed       2s               kubelet            Error: failed to generate container "77ccfad98f48aa01e248fed7e7a444e14a348b06bc55531a158a14462c4b406e" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount": stat /var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount: transport endpoint is not connected
  Normal   Pulled       2s               kubelet            Container image "gcr.io/gke-release/asm/proxyv2:1.15.7-asm.8" already present on machine
  Normal   Created      2s               kubelet            Created container istio-proxy
  Normal   Started      2s               kubelet            Started container istio-proxy
  Warning  Unhealthy    1s               kubelet            Readiness probe failed: Get "http://100.64.128.58:15021/healthz/ready": dial tcp 100.64.128.58:15021: connect: connection refused
  Warning  Failed       1s               kubelet            Error: failed to generate container "8c309e092fd45b084460c54349deff6d01e55bfd8b4db97e5041032dc3a10bca" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount": stat /var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount: transport endpoint is not connected
  Normal   Pulled       0s (x3 over 9s)  kubelet            Container image "nginx:1.14.2" already present on machine
  Warning  FailedMount  0s (x2 over 1s)  kubelet            MountVolume.SetUp failed for volume "gcs-fuse-csi-pv" : rpc error: code = Internal desc = the sidecar container failed with error: mountWithArgs: failed to open connection - getConnWithRetry: get token source: DefaultTokenSource: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
gcsfuse exited with error: exit status 1
  Warning  Failed  0s  kubelet  Error: failed to generate container "5b5229a1b7ccaf54885b2dbbe34b1ec0d41e42d783934c30e49c2b7e816019eb" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount": stat /var/lib/kubelet/pods/0b6ace6a-5d43-445b-8381-fb2e6da75f15/volumes/kubernetes.io~csi/gcs-fuse-csi-pv/mount: transport endpoint is not connected
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv
spec:
  accessModes:
  - ReadOnlyMany
  capacity:
    storage: 5Gi
  storageClassName: static-files-bucket
  claimRef:
    namespace: nginx
    name: gcs-fuse-csi-static-pvc
  mountOptions:
    - implicit-dirs
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: my-bucket
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc
  namespace: nginx
spec:
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 5Gi
  volumeName: gcs-fuse-csi-pv
  storageClassName: static-files-bucket
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx
  namespace: nginx
  annotations:
    iam.gke.io/gcp-service-account: nginx-gcs@{PROJECT_ID}.iam.gserviceaccount.com
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
        - name: gcs-fuse-csi-static
          mountPath: /data
          readOnly: true
      serviceAccountName: nginx
      volumes:
      - name: gcs-fuse-csi-static
        persistentVolumeClaim:
          claimName: gcs-fuse-csi-static-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: nginx
spec:
  ports:
  - name: http
    port: 80
  selector:
    app: nginx
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: googleapi
  namespace: nginx
spec:
  hosts:
  - googleapis.com
  location: MESH_EXTERNAL
  ports:
  - name: https
    number: 443
    protocol: HTTPS
  resolution: DNS
@songjiaxun added the bug label on Jul 13, 2023
@songjiaxun
Collaborator

Hi @ybelleguic , I could not reproduce the error on my end. The error mountWithArgs: failed to open connection - getConnWithRetry: get token source: DefaultTokenSource: google: could not find default credentials. indicates that the service account was not set up correctly. Could you double-check the doc https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/blob/main/docs/authentication.md and make sure Workload Identity is set up correctly?

@songjiaxun added the question label on Aug 11, 2023
@zhangluva

I have exactly the same errors from the sidecar. Is this related to the federated Workload Identity mentioned here? My Workload Identity pool has federation set up, and I think Anthos probably also uses federation; that seems to be common across three different issues.
I started a container using gcr.io/google.com/cloudsdktool/cloud-sdk:latest with the same service account and verified I am able to list/upload/download from the GCS bucket, so the service account, IAM, and permissions are all set up correctly.

@ybelleguic
Author

Hello,

Workload Identity was set up correctly on my side.

My problem was related to the outboundTrafficPolicy mode set in the cluster: when the mode is set to REGISTRY_ONLY, we have to declare an Istio ServiceEntry for storage.googleapis.com and add the annotation traffic.sidecar.istio.io/excludeOutboundIPRanges: "169.254.169.254/32" on the pods.
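
For reference, a minimal sketch of both pieces, using the nginx namespace from the manifest above (the ServiceEntry name is illustrative):

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: google-storage-api # illustrative name
  namespace: nginx
spec:
  hosts:
  - storage.googleapis.com
  location: MESH_EXTERNAL
  ports:
  - name: https
    number: 443
    protocol: HTTPS
  resolution: DNS

and, on the Deployment's pod template, alongside the existing gke-gcsfuse/volumes annotation:

  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"
        # Bypass the Istio sidecar for the GKE metadata server so the
        # gcsfuse sidecar can fetch Workload Identity credentials.
        traffic.sidecar.istio.io/excludeOutboundIPRanges: "169.254.169.254/32"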

So I guess this issue can be closed?

@songjiaxun
Collaborator

Ah I see, thanks @ybelleguic for the troubleshooting step!

@zhangluva , could you follow this step and retry on your side? If it helps, please let me know, and I will update the documentation. Thank you!

@zhangluva

Thanks @songjiaxun for your quick reply. I did go through the IAM and permission settings and everything looked good. The following are my steps to verify IAM/permissions.

  • Followed the instructions here to enable FUSE on my cluster and prepared the service accounts (GSA and KSA) and the bucket.
  • When creating the pod, got the same errors as the OP posted.
  • In the same namespace, created another pod (sketched after this list)
    • Using the service account (KSA) created earlier
    • Using the image gcr.io/google.com/cloudsdktool/cloud-sdk:latest
    • No reference to FUSE
  • Once the pod was running, got a shell on the pod and ran gcloud storage commands against the prepared bucket
    • Verified the KSA can list the bucket
    • Verified the KSA can download objects from the bucket
    • Verified the KSA can upload objects to the bucket
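
A minimal sketch of that verification pod, assuming the KSA and namespace from the manifests above (the pod name is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: gcloud-wi-test # illustrative name
  namespace: nginx
spec:
  serviceAccountName: nginx
  containers:
  - name: cloud-sdk
    image: gcr.io/google.com/cloudsdktool/cloud-sdk:latest
    # Keep the container alive so we can kubectl exec into it.
    command: ["sleep", "infinity"]

From a shell in that pod, gcloud storage ls gs://my-bucket (the bucket name from the PV above) confirms the KSA-to-GSA impersonation and bucket permissions independently of the CSI sidecar.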

So I don't think it's an IAM permission issue. Having the K8s service account impersonate the GCP service account and then access the GCS bucket worked as expected when not using the sidecar.

Thanks,
