Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Second Idea: Addition of Singularity Template #16

Closed
wants to merge 5 commits into from
Closed

Second Idea: Addition of Singularity Template #16

wants to merge 5 commits into from

Conversation

vsoch
Copy link
Contributor

@vsoch vsoch commented Jul 24, 2018

This is a second effort to add Singularity support, to come after #9. Instead of having the permission change be default, we instead provide a second template, and give the user complete control over the username of the build (in the case that they want to generate a container for their userspace only) OR the value of the permissions to be different from the default. Specifically, this PR does the following:

  • adds a .gitignore for ipynb_checkpoints. I would assume that most users will do a git add * and then also include various files that wouldn't be desired, and this will help with that :)
  • addition of the Singularity template in the .circleci folder for Singularity containers. Instead of this being default, a user can just have the option to switch to a slightly modified configuration, if Singularity container use on a shared resource is desired. As we discussed in Update repo2docker to support Singularity #9, the write happens by way of the overlayfs, so the user would be able to interact (write) using the notebooks provided by the container, but would need to save locally or to a different spot on the resource (I've tested this and it works ok!) My concern about security is mitigated - the save is happening via the overlayfs.
  • README.md to describe variables for each configuration, the user should be given a clear table that shows what variables are available to customize the build (we thus far were doing this in comments in the configuration file.) You can read the rendered version here.
  • Addition of pushing latest I probably should have done this in a separate PR (bad dinosaur!) but I think it's a small enough (and useful) addition to both recipes that the "latest" image is also pushed. That said, if there is huge opposition I can remove it from here for another PR / more discussion. This will close [suggestion] tag of "latest" pushed with commit tag ? #14

|`DOCKER_TAG`|A custom tag for the deployed image.|no|the first 10 characters of the commit.|


## config.singularity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be config.singularity.yml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't putting the extension, but I can add it.


## Where do I define the variables?
For each set of variables, some you can define in the config.yml, in the `environment`
section for a workflow step, for example, here is setting the variable `TIMEZONE`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education: do we have to set a timezone or is it an example of how to set a variable? (and why LA?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to set a timezone (although I've seen it frequently so it must be useful?) and no reason in particular for LA, if you want to find a different one you like I'm happy to change it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we add a comment saying "this isn't strictly necessary"? Just to help people who are trying to take these examples and applying them to their setup. Otherwise you end up with cargo cult kind of things like "Yeah, you gotta run this in the LA timezone because otherwise it gets out of sync with the fizzbuzz-shizzle" :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, will add!


This configuration is equivalent to the default config.yml, with the additional change in permissions of the
repo2docker folder to allow for the user to write given export into a Singularity container. If you want your
container notebooks to be usable on a shared resource, you should choose this configuration. Note that the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a shared resource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a supercomputer cluster with multiple users (HPC)

|---|---|---|---|
|`CONTAINER_NAME`|The name of the Docker Hub container to deploy|no|`<ORG>/<REPO>`|
|`DOCKER_TAG`|A custom tag for the deployed image.|no|the first 10 characters of the commit.|
|`NOTEBOOK_PERMISSION`|The permission to (recursively) give the notebook, to allow write.|no|777|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove NOTEBOOK_ from the permission and username variable? Neither really have anything to do with the fact that we use notebooks. As in we'd still have to set them if we were using RStudio or OpenRefine.

Maybe | PERMISSION | The permissions to (recursively) set on the repository contents copied into the container |?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However the NOTEBOOK is relevant - these variables specifically change just the folder for the notebook housing and permission for it. The NOTEBOOK also serves as a namespace, so there is less liklihood of conflict with some other user specific variable. If you want to think of a different prefix to replace this, I'm happy to change it. I think notebook is more appropriate than something like container, because it is truly just a username for the notebook, and permission for the notebook folder. That may be different outside of using repo2docker, but right now that's all this recipe does, and so the variable names are clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe REPO2DOCKER_ as prefix? There doesn't have to be a notebook in the repository and this would set the permissions on all things that are copied over right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I like that too.

Yes that's correct. We are changing the permission of the user "joyvan's" folder in $HOME:

docker exec repo2docker chmod -R ${REPO2DOCKER_PERMISSION} "/home/${REPO2DOCKER_USERNAME}"

and we have to do this for Singularity, otherwise the notebooks aren't usable for the general user.

|`CONTAINER_NAME`|The name of the Docker Hub container to deploy|no|`<ORG>/<REPO>`|
|`DOCKER_TAG`|A custom tag for the deployed image.|no|the first 10 characters of the commit.|
|`NOTEBOOK_PERMISSION`|The permission to (recursively) give the notebook, to allow write.|no|777|
|`NOTEBOOK_USERNAME`|The username (to derive the home directory) to build the notebook and virtual environment. (e.g., `/home/joyvan`|no|`joyvan`|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> "The username used inside the container"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Singularity world, this is just the notebook username (the user remains him/herself when in use).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with the notebook username? I was assuming that inside the container there is a user that owns the files copied over from the repository and that this variable sets that username. (and that for singularity the username that owns the files has to match the username of the person executing the container)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The container is going to be sitting on a shared HPC resource, and more than one user could use it. The username is the name of the folder, and the name of the --username arg given for the container. If it's not general (or assumes use by any user) the container doesn't have much purpose for reproducibility (e.g., sharing with others).

@@ -168,6 +168,9 @@ jobs:
if [[ -n "$DOCKER_PASS" ]]; then
docker login -u $DOCKER_USER -p $DOCKER_PASS
docker push ${CONTAINER_NAME}:${DOCKER_TAG}
echo "Tagging latest image..."
docker tag ${CONTAINER_NAME}:${DOCKER_TAG} ${CONTAINER_NAME}:latest
docker push ${CONTAINER_NAME}:latest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purist in me is a bit hesitant to make a "latest" tag standard. It gives people a way to not explicitly define which tag/version they want. Which means it becomes a nightmare to try and work out what they actually used later on.

It is super convenient to have latest, but for your long term health it isn't so smart. A bit like eating only cake for breakfast is super cool :-/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not standard, but it's provided because people want and expect it. if I had a dollar for every time someone posted an issue because they claimed the endpoint didn't work (and had just misspelled the tag). I'm all for encouraging reproducibility, but making it really hard to use a tool I don't think is the right way to do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

¯_(ツ)_/¯ you decide

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

docker tag ${CONTAINER_NAME}:${DOCKER_TAG} ${CONTAINER_NAME}:latest
docker push ${CONTAINER_NAME}:latest
fi
workflows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a lot of shared steps between config.yaml and config.singularity.yml. Is there a way to avoid that or automatically check that they are the same (except for where they are meant to be different)? Wondering what we can do to stop the two (or later more) versions from drifting apart.

Would it be better to make one big config.yaml with several different workflows and tell users to pick one workflow?

Only questions, no solutions :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I've already developed this, but it will be done in a separate PR.


|Variable|Description|Required|Default|
|---|---|---|---|
|`CONTAINER_NAME`|The name of the Docker Hub container to deploy|no|`<ORG>/<REPO>`|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call it "image_name" since it's the name of the image, not the container?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

@vsoch vsoch closed this by deleting the head repository Apr 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[suggestion] tag of "latest" pushed with commit tag ?
3 participants