[BUG] Pot sometimes leaves mount-ins behind #272

Open · grembo opened this issue Sep 15, 2023 · 1 comment

grembo commented Sep 15, 2023

Describe the bug
When using pot with nomad, the mount-ins for nomad's special directories (local, secrets) are sometimes left behind after the pot is stopped.

To Reproduce
Run a basic nomad pot example (like nginx) and migrate it a couple of times (start/stop etc.).

After a while you will see something like this, even though only one container is running:

/var/tmp/nomad/alloc/3682d419-1017-3be9-5bae-11e4307ffee9/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_1b46f3a2_3682d419-1017-3be9-5bae-11e4307ffee9/m/local
/var/tmp/nomad/alloc/3682d419-1017-3be9-5bae-11e4307ffee9/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_1b46f3a2_3682d419-1017-3be9-5bae-11e4307ffee9/m/secrets
/var/tmp/nomad/alloc/b1f2fdba-21e7-f932-9f4b-b669fd81acde/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_cd56d63b_b1f2fdba-21e7-f932-9f4b-b669fd81acde/m/local
/var/tmp/nomad/alloc/b1f2fdba-21e7-f932-9f4b-b669fd81acde/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_cd56d63b_b1f2fdba-21e7-f932-9f4b-b669fd81acde/m/secrets
/var/tmp/nomad/alloc/f364feb0-c8af-b232-52b2-2bb3df073c20/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_9eb3303e_f364feb0-c8af-b232-52b2-2bb3df073c20/m/local
/var/tmp/nomad/alloc/f364feb0-c8af-b232-52b2-2bb3df073c20/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_9eb3303e_f364feb0-c8af-b232-52b2-2bb3df073c20/m/secrets
/var/tmp/nomad/alloc/8dd36275-c514-be0a-4925-efdd1447e5ec/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_3a708d45_8dd36275-c514-be0a-4925-efdd1447e5ec/m/local
/var/tmp/nomad/alloc/8dd36275-c514-be0a-4925-efdd1447e5ec/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_3a708d45_8dd36275-c514-be0a-4925-efdd1447e5ec/m/secrets
/var/tmp/nomad/alloc/04c43aae-b5b9-1fe1-b2d4-9bcbd39b6997/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_2db4780c_04c43aae-b5b9-1fe1-b2d4-9bcbd39b6997/m/local
/var/tmp/nomad/alloc/04c43aae-b5b9-1fe1-b2d4-9bcbd39b6997/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_2db4780c_04c43aae-b5b9-1fe1-b2d4-9bcbd39b6997/m/secrets
/var/tmp/nomad/alloc/32f9430c-5392-ebf2-609a-60b57ba3eae5/www1/local      26740536       816   26739720     0%    /opt/pot/jails/www1_4e4600ed_32f9430c-5392-ebf2-609a-60b57ba3eae5/m/local
/var/tmp/nomad/alloc/32f9430c-5392-ebf2-609a-60b57ba3eae5/www1/secrets    26740536       816   26739720     0%    /opt/pot/jails/www1_4e4600ed_32f9430c-5392-ebf2-609a-60b57ba3eae5/m/secrets
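
(The listing above looks like plain df output; as a hedged aside, something along these lines should show the same leftovers — the paths are the ones from this example and may differ on other setups:)

# list leftover mount-ins whose source lives under nomad's alloc dir
df | grep /var/tmp/nomad/alloc
# or, restricted to nullfs mounts only
mount -p -t nullfs | grep /var/tmp/nomad/alloc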

Expected behavior
No leftover mounts

Additional context
My suspicion is that the umounts fail when the jail stops (maybe because some processes are still using the mountpoints), while the jail's ZFS filesystem is purged later regardless. A normal manual umount of these leftover mounts works fine.
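
As an aside, a minimal manual cleanup sketch along those lines (assuming the leftovers are all nullfs mount-ins below the jails' m/ directories, as in the df output above — verify the list before running):

# unmount every nullfs mount whose mountpoint is inside a pot jail's m/ directory
mount -p -t nullfs | awk '$2 ~ "^/opt/pot/jails/.*/m/" {print $2}' | while read -r mnt; do
  umount "$mnt"
done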

grembo added the bug label on Sep 15, 2023
grembo commented Dec 21, 2023

It seems like this depends on the order in which nomad-pot-driver issues certain commands:

Example of a command sequence that left mounts behind:

2023-12-21T11:38:25+00:00 10.20.20.231 pot[42497]: pot-destroy -p myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e myservice -F 
2023-12-21T11:38:25+00:00 10.20.20.231 pot[42476]: pot-set-status -p myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e -s stopped
2023-12-21T11:38:24+00:00 10.20.20.231 pot[40184]: pot-destroy -p myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e -F
2023-12-21T11:38:24+00:00 10.20.20.231 pot[40017]: pot-set-status -p myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e -s stopping
2023-12-21T11:38:18+00:00 10.20.20.231 pot[39032]: pot-set-status -p myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e -s stopping
2023-12-21T11:38:18+00:00 10.20.20.231 pot[38992]: pot-stop myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e myservice

Two things are of interest here: the pot is stopped and destroyed twice, and the second stopping call comes roughly 5 s after the first, which looks like a nomad timeout. So the fix for the duplicate calls might belong in nomad, but it also looks like there is a lack of locking involved, since stop and destroy can be called multiple times in parallel for the same pot.
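
If these operations were serialized per pot, one option on FreeBSD would be lockf(1); a minimal sketch of what the caller (or pot's entry points) could do — the lock path and timeout are made up for illustration, only the pot name and flags come from the log above:

# hypothetical: take a per-pot lock so stop and destroy cannot run in parallel
pname=myservice_fdf1f644_ad0150ca-d40b-f752-b564-8fe4d86c657e
lockf -k -t 30 "/tmp/pot-${pname}.lock" pot stop "$pname"
lockf -k -t 30 "/tmp/pot-${pname}.lock" pot destroy -p "$pname" -F

With something like this, the delayed second stop/destroy pair would block (or time out) instead of racing the first one while its umounts are still in flight.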
