Rep unable to remove cached items after management API restart #852

Open
brunograz opened this issue Nov 8, 2023 · 1 comment
@brunograz

Summary

We have recently observed that Diego cells emit error logs (see below) once the management API is updated during a Cloud Foundry upgrade.
These errors repeat in a loop until the Diego cell is updated or recreated during the lifecycle.
Not every Diego cell emits these errors initially, but once the drain process starts, the same errors appear on the VM where the drain is running.

In both cases the rep drain is prolonged during the lifecycle: it waits until the configured timeout before killing the process and proceeding with the update.

Restarting the rep process seems to fix the issue: the error logs are no longer emitted. Restarting rep before the drain starts also avoids the prolonged update, because rep is then able to exit properly.
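For anyone who wants to script the workaround, here is a minimal sketch that builds the restart command for one cell. It is an assumption, not something stated in this report, that the restart is done with monit over `bosh ssh`; the deployment name `cf` and the instance `diego-cell/0` are placeholders, and the helper names are hypothetical.

```python
import shlex
import subprocess

# Assumed monit location on a BOSH-managed VM; adjust if your stemcell differs.
MONIT = "/var/vcap/bosh/bin/monit"


def rep_restart_command(deployment, instance):
    """Build the argv for restarting rep on one cell via `bosh ssh -c`."""
    remote = f"sudo {MONIT} restart rep"
    return ["bosh", "-d", deployment, "ssh", instance, "-c", remote]


def restart_rep(deployment, instance, dry_run=True):
    """Return the shell-quoted command; run it only when dry_run is False."""
    cmd = rep_restart_command(deployment, instance)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Placeholder deployment and instance names; dry run only prints the command.
    print(restart_rep("cf", "diego-cell/0"))
```

The dry-run default is deliberate, so the command can be reviewed before it touches a cell.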

{"timestamp":"2023-10-25T09:46:46.989872950Z","level":"error","source":"rep","message":"rep.evacuation-cleanup.delete-container.failed-to-delete-garden-container","data":{"error":"failed to cleanup bindmount artifacts","guid":"99008969-8540-4dd7-7249-0c72","session":"13.4"}}
{"timestamp":"2023-10-25T09:46:46.989892741Z","level":"error","source":"rep","message":"rep.evacuation-cleanup.failed-to-delete-container","data":{"container-guid":"99008969-8540-4dd7-7249-0c72","error":"failed to cleanup bindmount artifacts","session":"13"}}
{"timestamp":"2023-10-25T09:46:46.989758325Z","level":"error","source":"rep","message":"rep.evacuation-cleanup.delete-container.containerstore.destroy.node-destroy.failed-releasing-cache-key","data":{"Guid":"99008969-8540-4dd7-7249-0c72","cache-key":"buildpack-cflinuxfs3-lifecycle","dir":"/var/vcap/data/rep/shared/garden/download_cache/38b2a7ccd052cc6ca87458d02a7c6c7a-1695808881784314774-12.d","error":"Entry Not Found","guid":"99008969-8540-4dd7-7249-0c72","session":"13.4.1.1"}}
{"timestamp":"2023-10-25T09:46:46.989831937Z","level":"error","source":"rep","message":"rep.evacuation-cleanup.delete-container.containerstore.destroy.failed-to-destroy-container","data":{"Guid":"99008969-8540-4dd7-7249-0c72","error":"failed to cleanup bindmount artifacts","guid":"99008969-8540-4dd7-7249-0c72","session":"13.4.1"}}
{"timestamp":"2023-10-25T09:46:46.989770441Z","level":"error","source":"rep","message":"rep.evacuation-cleanup.delete-container.containerstore.destroy.node-destroy.failed-to-release-cached-deps","data":{"Guid":"99008969-8540-4dd7-7249-0c72","error":"Entry Not Found","guid":"99008969-8540-4dd7-7249-0c72","session":"13.4.1.1"}}
{"timestamp":"2023-10-25T09:46:38.264401894Z","level":"info","source":"guardian","message":"guardian.destroy.start","data":{"handle":"99008969-8540-4dd7-7249-0c72","session":"28655456"}}
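To check how many containers are affected on a cell, the log entries above can be grouped by container GUID. The following is only a rough sketch: the function names are hypothetical, the field names (`level`, `message`, `data.guid`, `data.container-guid`, `data.Guid`, `data.error`) are taken from the entries in this report, and the sample reuses two of those entries verbatim. It tolerates several JSON objects on the same line.

```python
import json
from collections import defaultdict


def parse_log_entries(text):
    """Yield each JSON object in text, even when objects share a line."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip separators between objects
            continue
        entry, pos = decoder.raw_decode(text, pos)
        yield entry


def errors_by_container(entries):
    """Map container GUID -> list of (message, error) for error-level entries."""
    summary = defaultdict(list)
    for entry in entries:
        if entry.get("level") != "error":
            continue
        data = entry.get("data", {})
        guid = (data.get("guid") or data.get("container-guid")
                or data.get("Guid") or "unknown")
        summary[guid].append((entry.get("message"), data.get("error")))
    return dict(summary)


# Two entries copied from the report, joined on one line as in the raw paste.
SAMPLE = (
    '{"timestamp":"2023-10-25T09:46:46.989892741Z","level":"error","source":"rep",'
    '"message":"rep.evacuation-cleanup.failed-to-delete-container",'
    '"data":{"container-guid":"99008969-8540-4dd7-7249-0c72",'
    '"error":"failed to cleanup bindmount artifacts","session":"13"}} '
    '{"timestamp":"2023-10-25T09:46:38.264401894Z","level":"info","source":"guardian",'
    '"message":"guardian.destroy.start",'
    '"data":{"handle":"99008969-8540-4dd7-7249-0c72","session":"28655456"}}'
)

if __name__ == "__main__":
    for guid, errors in errors_by_container(parse_log_entries(SAMPLE)).items():
        print(guid, len(errors), errors[0][1])
```

A GUID whose error count keeps growing between runs would match the looping behaviour described in the summary.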

Steps to Reproduce

A stemcell update on the management plane is sufficient to observe this behavior. We are currently looking at the specific process that triggers this.

Environment Details

The issue has been observed since upgrading from cf-deployment 29.0.0 to 30.5.0. The affected releases changed as follows (- old version, + new version):

name: capi
- version: 1.152.0
+ version: 1.153.0
name: diego
- version: 2.76.0
+ version: 2.78.0
name: garden-runc
- version: 1.29.0
+ version: 1.33.0

Additional information

Further information: https://cloudfoundry.slack.com/archives/C2U7KA7M4/p1693997791135449

@brunograz brunograz added the bug label Nov 8, 2023
@MarcPaquette
Contributor

Hi @brunograz, are you still experiencing this issue with the latest versions of Diego, CAPI and Garden-runc?

@MarcPaquette MarcPaquette self-assigned this May 2, 2024
Status: Pending Review | Discussion