[BUG] Pot sometimes leaves mount-ins behind #272

Open · grembo opened this issue Sep 15, 2023 · 1 comment

grembo commented Sep 15, 2023

Describe the bug
When using pot with nomad, the mount-ins for nomad's special directories (local, secrets) are sometimes left behind after the pot is stopped.

To Reproduce
Run a basic nomad pot example (like nginx) and migrate it a couple of times (start/stop etc.).

After a while you will see something like this, even though only one container is running:

/var/tmp/nomad/alloc/3682d419-1017-3be9-5bae-11e4307ffee9/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_1b46f3a2_3682d419-1017-3be9-5bae-11e4307ffee9/m/local
/var/tmp/nomad/alloc/3682d419-1017-3be9-5bae-11e4307ffee9/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_1b46f3a2_3682d419-1017-3be9-5bae-11e4307ffee9/m/secrets
/var/tmp/nomad/alloc/b1f2fdba-21e7-f932-9f4b-b669fd81acde/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_cd56d63b_b1f2fdba-21e7-f932-9f4b-b669fd81acde/m/local
/var/tmp/nomad/alloc/b1f2fdba-21e7-f932-9f4b-b669fd81acde/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_cd56d63b_b1f2fdba-21e7-f932-9f4b-b669fd81acde/m/secrets
/var/tmp/nomad/alloc/f364feb0-c8af-b232-52b2-2bb3df073c20/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_9eb3303e_f364feb0-c8af-b232-52b2-2bb3df073c20/m/local
/var/tmp/nomad/alloc/f364feb0-c8af-b232-52b2-2bb3df073c20/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_9eb3303e_f364feb0-c8af-b232-52b2-2bb3df073c20/m/secrets
/var/tmp/nomad/alloc/8dd36275-c514-be0a-4925-efdd1447e5ec/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_3a708d45_8dd36275-c514-be0a-4925-efdd1447e5ec/m/local
/var/tmp/nomad/alloc/8dd36275-c514-be0a-4925-efdd1447e5ec/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_3a708d45_8dd36275-c514-be0a-4925-efdd1447e5ec/m/secrets
/var/tmp/nomad/alloc/04c43aae-b5b9-1fe1-b2d4-9bcbd39b6997/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_2db4780c_04c43aae-b5b9-1fe1-b2d4-9bcbd39b6997/m/local
/var/tmp/nomad/alloc/04c43aae-b5b9-1fe1-b2d4-9bcbd39b6997/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_2db4780c_04c43aae-b5b9-1fe1-b2d4-9bcbd39b6997/m/secrets
/var/tmp/nomad/alloc/32f9430c-5392-ebf2-609a-60b57ba3eae5/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_4e4600ed_32f9430c-5392-ebf2-609a-60b57ba3eae5/m/local
/var/tmp/nomad/alloc/32f9430c-5392-ebf2-609a-60b57ba3eae5/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_4e4600ed_32f9430c-5392-ebf2-609a-60b57ba3eae5/m/secrets
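
(The listing above looks like plain df output; as a hedged aside, something along these lines should show the same leftovers — the paths are the ones from this example and may differ on other setups:)

# list leftover mount-ins whose source lives under nomad's alloc dir
df | grep /var/tmp/nomad/alloc
# or, restricted to nullfs mounts only
mount -p -t nullfs | grep /var/tmp/nomad/alloc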

Expected behavior
No leftover mounts

Additional context
My suspicion is that the umounts fail when the jail stops (maybe because some processes are still using the mountpoints), while the jail's ZFS filesystem is purged later regardless. A normal manual umount of these leftover mounts works fine.
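
As an aside, a minimal manual cleanup sketch along those lines (assuming the leftovers are all nullfs mount-ins below the jails' m/ directories, as in the df output above — verify the list before running):

# unmount every nullfs mount whose mountpoint is inside a pot jail's m/ directory
mount -p -t nullfs | awk '$2 ~ "^/opt/pot/jails/.*/m/" {print $2}' | while read -r mnt; do
  umount "$mnt"
done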

grembo added the bug label on Sep 15, 2023
grembo commented Dec 21, 2023

It seems like this depends on the order in which nomad-pot-driver issues certain commands:

Example of a command sequence that left mounts behind:

2023-12-21T11:38:25+00:00 10.20.20.231 pot[42497]: pot-destroy -p myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e myservice -F 
2023-12-21T11:38:25+00:00 10.20.20.231 pot[42476]: pot-set-status -p myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e -s stopped
2023-12-21T11:38:24+00:00 10.20.20.231 pot[40184]: pot-destroy -p myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e -F
2023-12-21T11:38:24+00:00 10.20.20.231 pot[40017]: pot-set-status -p myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e -s stopping
2023-12-21T11:38:18+00:00 10.20.20.231 pot[39032]: pot-set-status -p myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e -s stopping
2023-12-21T11:38:18+00:00 10.20.20.231 pot[38992]: pot-stop myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e myservice

Two things are of interest here: the pot is stopped and destroyed twice, and the second stopping call comes roughly 5 s after the first, which looks like a nomad timeout. So the fix for the duplicate calls might belong in nomad, but it also looks like there is a lack of locking involved, since stop and destroy can be called multiple times in parallel for the same pot.
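
If these operations were serialized per pot, one option on FreeBSD would be lockf(1); a minimal sketch of what the caller (or pot's entry points) could do — the lock path and timeout are made up for illustration, only the pot name and flags come from the log above:

# hypothetical: take a per-pot lock so stop and destroy cannot run in parallel
pname=myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e
lockf -k -t 30 "/tmp/pot-${pname}.lock" pot stop "$pname"
lockf -k -t 30 "/tmp/pot-${pname}.lock" pot destroy -p "$pname" -F

With something like this, the delayed second stop/destroy pair would block (or time out) instead of racing the first one while its umounts are still in flight.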
