Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share docker volume name in server state #384

Open
1kastner opened this issue Jul 2, 2020 · 9 comments
Open

Share docker volume name in server state #384

1kastner opened this issue Jul 2, 2020 · 9 comments

Comments

@1kastner
Copy link

1kastner commented Jul 2, 2020

Proposed change

Add the user's docker volume name to the server state. This would allow removing them in an automated fashion through the API by the jupyterhub-idle-culler service as described in this GitHub issue.

Alternative options

I fork this repository and maintain my personal copy that does exactly that. Or I do some monkey-patching similar to this JupyterHub config.

Who would use this feature?

Everyone who wants to automatically tidy up docker volumes of inactive users measured by the user's activity at the JupyterHub.

@welcome
Copy link

welcome bot commented Jul 2, 2020

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@meeseeksmachine
Copy link

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/14

@1kastner
Copy link
Author

1kastner commented Jul 7, 2020

From my first inspection it seems rather straight-forward: The volume name is stored in its dict - either as a string or as a dictionairy. It only needs to be stored into the state dictionary in the get_state method. Well, that I believed until I gave it a second glance.

Now the details start - first the intention of sharing this is to identify the docker volumes that only contain the user's data and therefore can be deleted after a certain amount of time (see top). Hence, I am only interested in the docker volumes that contain the username to avoid deleting docker volumes shared between several users. Furthermore, to avoid malicious attacks, the check should be quite strict to avoid loopholes, (e.g. a user is called foo and a shared docker volume is called foobar, a check of username in docker_volume_name would return true). Therefore, the check needs to be done by looking whether the docker volume name equals the docker volume name template including the (now replaced) variable username for a given format_volume_name callable or any other template variable. Somehow this sounds messy to me.

In case things are implemented as described above, are there any naming conventions to consider regarding the state dictionairy / JSON object key? What should it be?

Because the previous description of reverse-engineering sounds slightly dangerous to me, is there a chance of keeping track of docker volumes in another way? Maybe even just some kind of flat file representation such as JSON?

@1kastner
Copy link
Author

1kastner commented Jul 13, 2020

One suggestion I have is that when the docker volume template is filled, the answer could be more than just the prepared string. Instead, the answer could be a dictionairy that includes a flag for each template variable that has been used. If user-specific template variables have been used, that means the docker volume is user-specific as well. That would require to make adjustments to the volumenamingstrategy. Would that be acceptable for everybody?

@meeseeksmachine
Copy link

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/16

@minrk
Copy link
Member

minrk commented Feb 26, 2021

I wonder if docker volume labels would be a better fit for this information? Rather than passing docker-specific information (e.g. volume names) through jupyterhub to a culler that also needs custom docker logic to understand the information, maybe it makes sense to query docker volumes by label directly and not rely on jupyterhub for passing the information along? To implement the deletion, you need an understanding of finding and deleting docker volumes anyway and how to identify which ones should be deleted. It seems like it would be simpler to do that directly via docker rather than pass that info all the way through jupyterhub.

For example (cli, not Python, but should be doable either way):

# create the volume with labels
docker volume create --label jupyterhub_user=foo --label prune=true
# find the volume(s) by label
docker volume ls --label jupyterhub_user=foo --label prune=true
# delete them, if any

What do you think?

@1kastner
Copy link
Author

Well, for the general approach the only issue here is that I need a way how to get the information whether the user was inactive. The logic which is currently kept in the idle-culler service is required here. I have no beautiful solution how that and your proposed commands can go hand in hand.

For my specific usecase, however, your approach is very useful. But instead of CLI, I would need the label as a custom setting in https://github.com/jupyterhub/dockerspawner/blob/master/dockerspawner/dockerspawner.py . But I guess that docker volume labels should be a suitable extension of the dockerspawner functionality?

@1kastner
Copy link
Author

1kastner commented Mar 2, 2021

Another lead to follow might be not to use the idle-culler service but instead the culling logic that is incorporated into the JupyterHub itself, which you also highlighted in this forum post. However, neither NotebookApp nor MappingKernelManager seems like an appropriate place here.

@minrk
Copy link
Member

minrk commented Apr 30, 2021

The volume template changes further suggests we should have perhaps been storing the volume(s) somehow in state, so that loading jupyterhub with an updated configuration could use the new scheme only for servers that didn't already have a volume mounted. It seems pretty hard, though, to reconcile:

  • volumes have changed because of a template change, and names from state should be used, and
  • volumes have changed because of a config change and names from state should not be used

It's an unrelated use case, but another indicator that we should have been persisting it in state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants