Support for named volumes #353

Open
PauloMigAlmeida opened this issue Oct 6, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@PauloMigAlmeida

Is your feature request related to a problem? Please describe.

I cannot count how many times I've (accidentally) deleted my 'persistent' paths before as I'm always tempted to have those mapped folders within my project directory structure for tidiness reasons 😬

That's clearly my mistake above anything else but this also made evident some of the hidden benefits of docker volumes. I was wondering if we could get something similar.

Describe the solution you'd like

The solution I propose is to use ~/.singularity/volumes as the default location for storing named volumes.

This location could be overridden using an environment variable (let's say SINGULARITY_VOLUMES_PATH).

The creation process would be something like singularity volume create|remove <volume_name>

And last but not least, when the user specifies a bind path whose source name matches an existing named volume, singularity would map it to the location where that named volume resides, as in singularity run --bind <volume_name>:/my/path
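
A minimal sketch of the lookup described above, written in Python purely for illustration. It assumes the proposed default directory and the proposed SINGULARITY_VOLUMES_PATH variable; the helper names volumes_root and resolve_bind are hypothetical and are not part of Singularity.

```python
import os
from pathlib import Path

# Proposed default location from this issue (not an existing Singularity path).
DEFAULT_VOLUMES_DIR = Path.home() / ".singularity" / "volumes"


def volumes_root(env=None):
    """Return the directory holding named volumes, honouring the proposed override."""
    env = os.environ if env is None else env
    override = env.get("SINGULARITY_VOLUMES_PATH")  # hypothetical variable name
    return Path(override) if override else DEFAULT_VOLUMES_DIR


def resolve_bind(spec, env=None):
    """Rewrite '<src>:<dest>' when <src> names an existing volume directory.

    Anything that contains a slash, or that does not match an existing
    volume, is returned unchanged so ordinary binds keep working.
    """
    src, sep, dest = spec.partition(":")
    candidate = volumes_root(env) / src
    if sep and src and "/" not in src and candidate.is_dir():
        return f"{candidate}:{dest}"
    return spec
```

With a volume directory named pgdata present under the volumes root, resolve_bind("pgdata:/my/path") would return the volume's absolute path followed by :/my/path, while a spec such as /scratch/data:/my/path passes through untouched.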

Describe alternatives you've considered

I initially considered adding this feature to singularity-compose but it was suggested to me that maybe having this feature in singularity could be a better fit for the problem.

singularityhub/singularity-compose#50

Additional context

I was wondering if there is any interest in having such a feature or if this is a no-go type of feature for whatever reason. I'm keen to assist with the development of that if necessary.

@PauloMigAlmeida PauloMigAlmeida added the enhancement New feature or request label Oct 6, 2021
@dtrudg
Member

dtrudg commented Oct 7, 2021

In my view, this one is a bit difficult. It's definitely something that would be nice to have, but the devil is in the details.

For example, in Docker, a named volume is auto-created if it does not exist when a container run attempts to use it. This is probably something people would expect here. We strive to ensure that SingularityCE works well by default in common HPC environments. In these environments a container is quite frequently run in parallel... either as completely separate (independent but concurrent) invocations, or a coordinated invocation like an MPI job or similar. These runs happen across various machines and there is no singularity daemon or central service that manages them. Filesystems like $HOME are generally network filesystems. There can be cache consistency and locking issues (or lack of those features).

If we add named volumes, and there are parallel runs, how should we handle creation of the volume if multiple singularity instances are all trying to use a named volume at once? How can we detect, and handle, any issues with cache consistency or locking on the underlying filesystem that may make this process dangerous or error prone?

These issues are certainly still present when you manually --bind a directory in, but they aren't 'hidden'. Supporting named volumes implies some special magic around volume creation and management that should always work, while with a --bind it's pretty clear it is only mounting a host directory into the container.

What would be good here, if you'd like to pursue it, is to think a bit about this type of stuff and try to define very clearly and explicitly what should happen in normal workflows, and in some pathological cases. E.g. what if I put volume create or volume remove in a batch script and submit it for parallel execution? How should singularity react?
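
As one concrete case to reason about, here is a small Python sketch of several workers racing to create the same volume directory. It assumes a volume is nothing more than a directory, and it only exercises threads on one host with a local temporary directory; it says nothing about how a network filesystem shared across nodes would behave, which is the harder part of the question above. The function names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def create_volume(root, name):
    """Return True if this caller created the volume, False if it already existed."""
    os.makedirs(root, exist_ok=True)
    try:
        os.mkdir(os.path.join(root, name))  # raises if the directory already exists
        return True
    except FileExistsError:
        return False


def race(workers=8):
    """Let several workers try to create the same volume at once."""
    with tempfile.TemporaryDirectory() as root:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(lambda _: create_volume(root, "shared"), range(workers)))


if __name__ == "__main__":
    print(race())
```

On a local filesystem a single worker reports that it created the directory and the others see that it already exists, which gives each caller a well-defined outcome to report. Whether that guarantee carries over to a given cluster filesystem is exactly what would need checking.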

With enough examples we can then start to see clearly what makes technical sense, and how this might be implemented.

A minor thought - we've recently added the --mount option that mirrors (partially) Docker's --mount syntax... so we'd probably want to use that for named volumes. --bind can't really be overloaded for named volumes as you can't easily distinguish between a volume name and an identical relative path.
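
To illustrate why an explicit type field removes the ambiguity, here is a hedged Python sketch of parsing a Docker-style comma-separated mount spec. Treating type=volume as valid for Singularity's --mount is an assumption made for this example only; the parser is a simplification (a plain split on commas, no quoting support) and parse_mount is a hypothetical name.

```python
def parse_mount(spec):
    """Parse 'key=value,key=value,flag' into a dict with normalised key names."""
    aliases = {"src": "source", "dst": "destination", "target": "destination"}
    fields = {}
    for item in spec.split(","):
        if not item.strip():
            continue
        key, sep, value = item.partition("=")
        key = aliases.get(key.strip(), key.strip())
        fields[key] = value if sep else True  # bare flags such as 'ro'
    # Requiring an explicit type is what separates a volume name from a
    # relative path with the same spelling.
    if fields.get("type") not in ("bind", "volume"):
        raise ValueError("mount spec needs type=bind or type=volume")
    if "destination" not in fields:
        raise ValueError("mount spec needs a destination")
    return fields
```

Here type=volume,source=pgdata,destination=/my/path would be read as a named volume, while type=bind,source=pgdata,destination=/my/path would be read as the relative directory ./pgdata, even though both use the same source string.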

@vsoch
Contributor

vsoch commented Oct 7, 2021

+1 to what @PauloMigAlmeida said - it would be very nice to implement for Singularity compose! I think it would be rather simple? Just have some place to keep volumes in the Singularity cache and then bind to them. The use case is that it's easy to accidentally delete volumes that are in the present working directory. TLDR: a named volume is just a managed filesystem bind (to still get the same features you would with a normal bind).

@dtrudg
Member

dtrudg commented Oct 7, 2021

If an explicit volume create being required in order to use a volume is an acceptable trade-off, and anonymous volumes are not implemented, then this is relatively straightforward. So long as the failure paths for volume create / volume remove being called concurrently (accidentally) are well defined and make sense to the user.

If we want to support operating in the same way people are familiar with from docker, it is more difficult, as there is then no persistent management process to co-ordinate implicit creation, cleanup of anonymous volumes, etc. I would have anticipated that, from a singularity-compose standpoint, staying as close to docker as possible would be beneficial?

Edit - I guess what I'm getting at here is that this is the type of feature in which we really need a complete set of specific use cases in order to define the level of complexity that will be necessary in the technical implementation.

@vsoch
Contributor

vsoch commented Oct 7, 2021

@PauloMigAlmeida you have more experience with wanting this feature - do we need the complexity of what docker does?

@PauloMigAlmeida
Author

PauloMigAlmeida commented Oct 7, 2021

@vsoch: @PauloMigAlmeida you have more experience with wanting this feature - do we need the complexity of what docker does?

No, I don't think we need to have the same level of complexity that docker has for their implementation of this feature.

@dtrudg: If an explicit volume create being required in order to use a volume is an acceptable trade-off, and anonymous volumes are not implemented, then this is relatively straightforward.

I agree with that. Explicit volume creation seems the way to go here, as it keeps to a minimum the 'hidden magic' in which the devil can hide.

@dtrudg: So long as the failure paths for volume create / volume remove being called concurrently (accidentally) are well defined and make sense to the user.

I think that if we approach this purely from a race condition point of view, we will go down a rabbit hole and make little progress: given the absence of a daemon to transactionally control when and how a volume can be deleted, things get 'too fun' 🥲

I suggest that we approach this feature as @vsoch succinctly described: a named volume is just a managed filesystem bind (to still get the same features you would with a normal bind).

You can expect precisely the same behaviour you would get if, instead of running volume create <name> and volume remove <name>, the user ran mkdir <name> and rm -rf <name>. So the benefits of using the volume mechanism would be:

  1. Volumes/folders will reside in a different place than the project, which helps a lot when you have lousy people like me working on the HPC server 😓 (remember that the default location could be changed by either singularity.conf or an environment variable to ensure the right filesystem with the correct capabilities is in place).
  2. Most importantly, this would significantly reduce the command execution differences between environments. Today we have to remap binds, whether we are using the CLI or singularity-compose, every time we run on a different machine... which is not only a pain in the bum but very error prone.
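
The "managed mkdir / rm -rf" idea and the portability benefit can be sketched as follows in Python. This is an illustration under the assumptions of this thread (a volume is just a directory under a configurable root); volume_create, volume_remove and bind_arg are hypothetical helpers and not Singularity commands or APIs.

```python
import shutil
from pathlib import Path


def volume_create(root, name):
    """Behave like `mkdir <name>` inside the volumes root; fail if it exists."""
    Path(root).mkdir(parents=True, exist_ok=True)
    path = Path(root) / name
    path.mkdir()  # raises FileExistsError, just as plain mkdir would complain
    return path


def volume_remove(root, name):
    """Behave like `rm -rf <name>` inside the volumes root; a missing volume is not an error."""
    shutil.rmtree(Path(root) / name, ignore_errors=True)


def bind_arg(root, name, dest):
    """Build the host-side bind string for a named volume on this machine."""
    return f"{Path(root) / name}:{dest}"
```

The point of the second benefit is that the user-facing name (for example pgdata) stays the same on every machine, and only the root passed to bind_arg differs, for instance a home directory on a laptop and a scratch filesystem on a cluster.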

EDIT:
One thing that still isn't clear to me is whether the best place to implement this is singularity or singularity-compose. I will defer this decision to you both as I can see both of them implementing it.

@PauloMigAlmeida
Author

Hi @vsoch @dtrudg, just following up on this thread.

Have you guys thought about whether this functionality fits singularity or singularity-compose (or neither 😅)?

@vsoch
Contributor

vsoch commented Oct 28, 2021

My opinion is the same - that it should be supported in Singularity natively, and then extended to singularity-compose. Cheers!
