Self-hosted runner (EKS) Docker cache not working (w/ EFS) #3303

Open
95jinhong opened this issue Feb 26, 2024 · 2 comments
Labels
bug Something isn't working community Community contribution needs triage Requires review from the maintainers

Comments

@95jinhong

Controller Version

0.23.3

Deployment Method

ArgoCD

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Configure and use a self-hosted runner on top of EKS.
2. We want to share the cache across Docker builds, so we either share the /var/lib/docker subfolder via EFS, or configure a separate EFS-backed cache folder and declare it in the docker build with the cache-from and cache-to options (see the sketch after this list).
3. When I build, it fails because the cache folder has no write permission.
4. There is no field in the RunnerDeployment object to declare a docker init container, so I am not sure how to solve it.
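
For context, one way to provision the EFS-backed cache folder from step 2 so that it is writable by a non-root runner is dynamic provisioning with the aws-efs-csi-driver, which creates an EFS access point owning the directory with a chosen uid/gid. A minimal sketch, assuming the runner user is uid/gid 1000 and using placeholder names and IDs (the StorageClass name and filesystem ID are not taken from this report; verify the parameter names against the installed driver version):

```yaml
# StorageClass for the docker build cache; the uid/gid/directoryPerms
# parameters make the provisioned directory writable by the runner user.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-docker-cache                 # hypothetical name
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0     # placeholder EFS filesystem ID
  directoryPerms: "775"
  uid: "1000"                            # assumed runner uid
  gid: "1000"                            # assumed runner gid
  basePath: "/docker-cache"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: k8s-action-runner-backend-test-ci-dockercache
  namespace: actions-runner-system
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-docker-cache
  resources:
    requests:
      storage: 20Gi                      # placeholder; EFS does not enforce this size
```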

Describe the bug

1. I'm using a self-hosted runner with a golden image that has Docker and buildx preinstalled.
2. To implement the Docker cache, I created /var/lib/docker-cache on EFS and declared the workflow step as below.

   ```yaml
   - name: Build and Push Image
     uses: docker/build-push-action@v5
     with:
       context: .
       push: true
       tags: ${{ env.ECR_URI }}:${{ env.image_tag }}
       build-args: |
         NPM_TOKEN=${{ secrets.READONLY_GPR }}
         BUILD_SERVICE_NAME=${{ env.BUILD_SERVICE_NAME }}
       cache-from: type=local,src=/var/lib/docker-cache
       cache-to: type=local,dest=/var/lib/docker-cache,mode=max
   ```

3. However, the build says I don't have permission on the /var/lib/docker-cache folder, as shown in the workflow log below, and I'm not sure how to grant that permission.
```shell
## workflow log
Run docker/build-push-action@v5
GitHub Actions runtime token ACs
Docker info
Proxy configuration
Buildx version
/usr/local/bin/docker buildx build --build-arg NPM_TOKEN=*** --build-arg BUILD_SERVICE_NAME=api --cache-from type=local,src=/var/lib/docker-cache --cache-to type=local,dest=/var/lib/docker-cache,mode=max --iidfile /runner/_work/_temp/docker-actions-toolkit-uLcqtI/iidfile --provenance mode=min,inline-only=true,builder-id=https://github.com/boostbrothers/backend-ci-test/actions/runs/8044060472 --tag ***.dkr.ecr.ap-northeast-2.amazonaws.com/backend-ci-test/test:54d23249 --metadata-file /runner/_work/_temp/docker-actions-toolkit-uLcqtI/metadata-file --push .
ERROR: mkdir /var/lib/docker-cache: permission denied
Error: buildx failed with: ERROR: mkdir /var/lib/docker-cache: permission denied
```
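
A note on the likely mechanism (an observation, not confirmed from the report): docker/build-push-action runs `docker buildx build` inside the runner container, and with a docker-container builder the `type=local` cache export is written back by the buildx client, i.e. on the runner side. Since /var/lib/docker-cache is only listed under `dockerVolumeMounts` in the manifest under Additional Context, the path may not exist in the runner container at all, and the non-root runner user cannot `mkdir` under /var/lib. A minimal sketch of also mounting the cache claim into the runner container (volume and PVC names reused from that manifest):

```yaml
# Fragment of the RunnerDeployment pod template; only the runner-container
# mounts are shown here.
volumeMounts:
  - mountPath: /opt/hostedtoolcache
    name: runner-npm-cache-storage
  # Mount the docker-cache PVC into the runner container as well, so the
  # buildx client can create and write /var/lib/docker-cache.
  - mountPath: /var/lib/docker-cache
    name: runner-npm-dockercache-storage
```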

Describe the expected behavior

I want each Pod (job) to use /var/lib/docker as usual, and I want the Pods to share the Docker local build cache with each other.

Additional Context

My runnerdeployment.yaml looks like this:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: k8s-action-runner-backend-test
  namespace: actions-runner-system
spec:
  template:
    spec:
      ...
      resources:
        requests:
          cpu: 1500m
          memory: 5Gi
      dockerVolumeMounts:
        - mountPath: /var/lib/docker-cache
          name: runner-npm-dockercache-storage
        - mountPath: /var/lib/docker
          name: docker-extra
      volumeMounts:
        - mountPath: /opt/hostedtoolcache
          name: runner-npm-cache-storage
      volumes:
        - name: runner-npm-cache-storage
          persistentVolumeClaim:
            claimName: k8s-action-runner-backend-test-ci
        - name: runner-npm-dockercache-storage
          persistentVolumeClaim:
            claimName: k8s-action-runner-backend-test-ci-dockercache
        - name: docker-extra
          hostPath:
            path: /mnt/docker-extra
            type: DirectoryOrCreate
```
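
On the point in To Reproduce that there is no obvious place to declare a docker init container: if this controller version exposes `initContainers` on the runner template (an assumption; verify against the Runner/RunnerDeployment CRD for 0.23.x), a one-shot chown of the EFS mount to the runner's uid/gid is a common workaround. A minimal sketch, assuming the runner user is uid/gid 1000:

```yaml
# Fragment of the RunnerDeployment pod template.
# Assumption: this controller version passes initContainers through to the
# runner pod; verify against the CRD before relying on it.
initContainers:
  - name: fix-docker-cache-perms
    image: busybox:1.36
    command:
      - sh
      - -c
      # The summerwind actions-runner images run as uid/gid 1000 ("runner");
      # adjust if the golden image uses a different user.
      - chown -R 1000:1000 /var/lib/docker-cache && chmod -R 775 /var/lib/docker-cache
    volumeMounts:
      - mountPath: /var/lib/docker-cache
        name: runner-npm-dockercache-storage
```

An EFS access point that already owns the directory with the runner's uid/gid (as in the StorageClass sketch earlier) achieves the same result without an init container; fsGroup-based ownership changes are generally not applied to EFS volumes, so relying on `securityContext.fsGroup` alone is unlikely to help.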


### Controller Logs

```shell
There doesn't appear to be any special logs.
```

### Runner Pod Logs

```shell
## docker container log
time="2024-02-26T05:53:38.304870260Z" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
time="2024-02-26T05:53:38.304926735Z" level=info msg="containerd successfully booted in 0.038174s"
time="2024-02-26T05:53:54.703885619Z" level=info msg="Loading containers: start."
time="2024-02-26T05:53:54.909627125Z" level=info msg="Loading containers: done."
time="2024-02-26T05:53:54.920842001Z" level=info msg="Docker daemon" commit=311b9ff graphdriver=overlay2 version=24.0.7
time="2024-02-26T05:53:54.921021834Z" level=info msg="Daemon has completed initialization"
time="2024-02-26T05:53:54.947228154Z" level=info msg="API listen on /run/docker.sock"

time="2024-02-26T06:09:58.167664345Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2024-02-26T06:09:58.167730410Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-02-26T06:09:58.168723970Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2024-02-26T06:09:58.168780188Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-02-26T06:10:00.636146598Z" level=info msg="ignoring event" container=8e7f9e894c12ff796b51ec341e39091b490aeeb3409f261f0e1ab1afa970056b module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2024-02-26T06:10:00.636124578Z" level=info msg="shim disconnected" id=8e7f9e894c12ff796b51ec341e39091b490aeeb3409f261f0e1ab1afa970056b namespace=moby
time="2024-02-26T06:10:00.636237278Z" level=warning msg="cleaning up after shim disconnected" id=8e7f9e894c12ff796b51ec341e39091b490aeeb3409f261f0e1ab1afa970056b namespace=moby
time="2024-02-26T06:10:00.636253531Z" level=info msg="cleaning up dead shim" namespace=moby
Prestop hook started
Waiting for dockerd to start
15
Prestop hook stopped
time="2024-02-26T06:10:14.209947178Z" level=info msg="Processing signal 'terminated'"
time="2024-02-26T06:10:14.211190041Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
time="2024-02-26T06:10:14.211758404Z" level=info msg="Daemon shutdown complete"
time="2024-02-26T06:10:14.211835623Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
time="2024-02-26T06:10:14.211855654Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
rpc error: code = NotFound desc = an error occurred when try to find container "2a7e0e4509fa8609f62baf7f753337dc169841eb7afe41b0bd5de00d545decdd": not found

## runner container log

√ Connected to GitHub

Current runner version: '2.313.0'
2024-02-26 05:54:20Z: Listening for Jobs
2024-02-26 06:09:28Z: Running job: api / build
2024-02-26 06:10:12Z: Job api / build completed with result: Failed
√ Removed .credentials
√ Removed .runner
Runner listener exit with 0 return code, stop the service, no retry needed.
Exiting runner...
2024-02-26 06:10:12.912  NOTICE --- Runner init exited. Exiting this process with code 0 so that the container and the pod is GC'ed Kubernetes soon.
```
@95jinhong added the bug, gha-runner-scale-set, and needs triage labels on Feb 26, 2024
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@nikola-jokic added the community label and removed the gha-runner-scale-set label on Feb 26, 2024
@AlonAvrahami

#3244 (comment)
