Share docker volume name in server state #384

1kastner · 2020-07-02T19:27:45Z

Proposed change

Add the user's docker volume name to the server state. This would allow removing them in an automated fashion through the API by the jupyterhub-idle-culler service as described in this GitHub issue.

Alternative options

I fork this repository and maintain my personal copy that does exactly that. Or I do some monkey-patching similar to this JupyterHub config.

Who would use this feature?

Everyone who wants to automatically tidy up docker volumes of inactive users measured by the user's activity at the JupyterHub.

The text was updated successfully, but these errors were encountered:

welcome · 2020-07-02T19:27:46Z

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.

You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

meeseeksmachine · 2020-07-07T07:14:05Z

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/14

1kastner · 2020-07-07T13:46:10Z

From my first inspection it seems rather straight-forward: The volume name is stored in its dict - either as a string or as a dictionairy. It only needs to be stored into the state dictionary in the get_state method. Well, that I believed until I gave it a second glance.

Now the details start - first the intention of sharing this is to identify the docker volumes that only contain the user's data and therefore can be deleted after a certain amount of time (see top). Hence, I am only interested in the docker volumes that contain the username to avoid deleting docker volumes shared between several users. Furthermore, to avoid malicious attacks, the check should be quite strict to avoid loopholes, (e.g. a user is called foo and a shared docker volume is called foobar, a check of username in docker_volume_name would return true). Therefore, the check needs to be done by looking whether the docker volume name equals the docker volume name template including the (now replaced) variable username for a given format_volume_name callable or any other template variable. Somehow this sounds messy to me.

In case things are implemented as described above, are there any naming conventions to consider regarding the state dictionairy / JSON object key? What should it be?

Because the previous description of reverse-engineering sounds slightly dangerous to me, is there a chance of keeping track of docker volumes in another way? Maybe even just some kind of flat file representation such as JSON?

1kastner · 2020-07-13T09:34:19Z

One suggestion I have is that when the docker volume template is filled, the answer could be more than just the prepared string. Instead, the answer could be a dictionairy that includes a flag for each template variable that has been used. If user-specific template variables have been used, that means the docker volume is user-specific as well. That would require to make adjustments to the volumenamingstrategy. Would that be acceptable for everybody?

meeseeksmachine · 2020-10-20T08:54:43Z

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/16

minrk · 2021-02-26T12:44:35Z

I wonder if docker volume labels would be a better fit for this information? Rather than passing docker-specific information (e.g. volume names) through jupyterhub to a culler that also needs custom docker logic to understand the information, maybe it makes sense to query docker volumes by label directly and not rely on jupyterhub for passing the information along? To implement the deletion, you need an understanding of finding and deleting docker volumes anyway and how to identify which ones should be deleted. It seems like it would be simpler to do that directly via docker rather than pass that info all the way through jupyterhub.

For example (cli, not Python, but should be doable either way):

# create the volume with labels
docker volume create --label jupyterhub_user=foo --label prune=true
# find the volume(s) by label
docker volume ls --label jupyterhub_user=foo --label prune=true
# delete them, if any

What do you think?

1kastner · 2021-02-26T20:42:04Z

Well, for the general approach the only issue here is that I need a way how to get the information whether the user was inactive. The logic which is currently kept in the idle-culler service is required here. I have no beautiful solution how that and your proposed commands can go hand in hand.

For my specific usecase, however, your approach is very useful. But instead of CLI, I would need the label as a custom setting in https://github.com/jupyterhub/dockerspawner/blob/master/dockerspawner/dockerspawner.py . But I guess that docker volume labels should be a suitable extension of the dockerspawner functionality?

1kastner · 2021-03-02T21:03:58Z

Another lead to follow might be not to use the idle-culler service but instead the culling logic that is incorporated into the JupyterHub itself, which you also highlighted in this forum post. However, neither NotebookApp nor MappingKernelManager seems like an appropriate place here.

minrk · 2021-04-30T07:34:36Z

The volume template changes further suggests we should have perhaps been storing the volume(s) somehow in state, so that loading jupyterhub with an updated configuration could use the new scheme only for servers that didn't already have a volume mounted. It seems pretty hard, though, to reconcile:

volumes have changed because of a template change, and names from state should be used, and
volumes have changed because of a config change and names from state should not be used

It's an unrelated use case, but another indicator that we should have been persisting it in state.

1kastner added the enhancement label Jul 2, 2020

manics mentioned this issue Jul 3, 2020

persist pvc name and/or uid in state for template changes jupyterhub/kubespawner#415

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Share docker volume name in server state #384

Share docker volume name in server state #384

1kastner commented Jul 2, 2020 •

edited

welcome bot commented Jul 2, 2020

meeseeksmachine commented Jul 7, 2020

1kastner commented Jul 7, 2020

1kastner commented Jul 13, 2020 •

edited

meeseeksmachine commented Oct 20, 2020

minrk commented Feb 26, 2021

1kastner commented Feb 26, 2021

1kastner commented Mar 2, 2021

minrk commented Apr 30, 2021

Share docker volume name in server state #384

Share docker volume name in server state #384

Comments

1kastner commented Jul 2, 2020 • edited

Proposed change

Alternative options

Who would use this feature?

welcome bot commented Jul 2, 2020

meeseeksmachine commented Jul 7, 2020

1kastner commented Jul 7, 2020

1kastner commented Jul 13, 2020 • edited

meeseeksmachine commented Oct 20, 2020

minrk commented Feb 26, 2021

1kastner commented Feb 26, 2021

1kastner commented Mar 2, 2021

minrk commented Apr 30, 2021

1kastner commented Jul 2, 2020 •

edited

1kastner commented Jul 13, 2020 •

edited