Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: add a staleness check for cached container Images #2846

Open
tabedzki opened this issue May 14, 2024 · 5 comments
Open

Feature Request: add a staleness check for cached container Images #2846

tabedzki opened this issue May 14, 2024 · 5 comments
Labels
container Issues related to container (docker/singularity) versions of sorters enhancement New feature or request

Comments

@tabedzki
Copy link

Is your feature request related to a problem? Please describe.
When using run_sorter, users can combine docker/singularity_image=True with delete_container_files=False in their scripts. Assuming a users implements a "set it and forget it` approach, it is conceivable that the eventually the user will not be running on the latest version of the container for whichever package they are using.

Describe the solution you'd like
An ability to update (or auto-update) the container used, checking for if a container is old and updating if necessary. Maybe a flag like auto_update_container_image

Describe alternatives you've considered
I've considered working with IT to discuss how to manage the images and whether or not the users should be manually specifying the container images.

@alejoe91 alejoe91 added enhancement New feature or request container Issues related to container (docker/singularity) versions of sorters labels May 14, 2024
@JoeZiminski
Copy link
Contributor

JoeZiminski commented May 14, 2024

This is a nice idea, I get a little confused about how the docker images are generated. As an aside on this, it would be great to have a page that fully documents the general build-and-deploy pipeline for the docker images, I sometimes get a bit stuck understanding from the repo or documentation (please let me know if I've missed any existing docs).

My understanding is that when the images are built they have a version tag e.g. here. When SI pulls a docker image it will pull the latest version and install on your machine. If you don't do anything else, it will continue to use this pulled image forever and you will miss updates. It doesn't look like the image .sif is deleted by delete_container_files, so I think(?) this argument will have no effect on this process.

As far as I can tell on run_sorter there is no option to specify the tag for the image. It would be nice to expose this with some convenience options e.g. None will just use the current spikeinterface default behaviour. "always_latest" could make sure your downloaded image is the latest version. You could also specify specific versions to use.

In general I think the docker versioning is a hard problem to get right because it can be very hairy for users. For example you run all your analysis with SI v100.0. Then 6 months later you come back to do some additional analysis for paper review. You again install SI v100.0 but now the image downloaded is for a newer version of the KS image that gives you different results. This is very hard to track down. It would be nice to pin specific versions of SI to specific version of images, but this is difficult and also confusing in a different way. Alternatively (if not already) this could be made extremely clear in the logs e.g. a big banner that says the docker image version that's been used, it's release date, and how to request a different version.

@tabedzki
Copy link
Author

Thanks for highlighting the fact that the final container image is not deleted when using delete_container_files.

As far as I can tell on run_sorter there is no option to specify the tag for the image. It would be nice to expose this with some convenience options e.g. None will just use the current spikeinterface default behaviour. "always_latest" could make sure your downloaded image is the latest version. You could also specify specific versions to use.

It appears that one is able to specify the tag explicitly. See here. As for having an "always_latest" I think that would be useful.

I agree with the idea of having the docker image version outputted somewhere in the logs for future reference. I didn't see that implemented when looking through my logs but I might've missed it.

@JoeZiminski
Copy link
Contributor

JoeZiminski commented May 15, 2024

Cool thanks @tabedzki I did not know about that ability to request the exact version of the image, that is nice.

As a quick aside, I wonder if run_container_local() and run_sorter_container are intended for public use, if not these could be prefixed with a _.

There are a couple of ways this could be implemented. One is to keep everything as-is but add some new keywords for docker_image and singularity_image such as "always_latest". However these arguments are already quite overloaded (e.g. can be bool for download, str to an existing local sorter, str of the docker image with tag to download).

Alternatively, there could be a new argument image_version and the ability to select a specific container with docker_image or singularity_image removed. These would now only accept bool or str to an existing container. The image_version would download the requested image (if downloading the image), and accept keyword arguments like always_latest. If the path pointed to previously downloaded image but it did not match image_version (e.g. it was not the latest image) it would crash. This could use the 'digest' rather than the tag (because 'latest' tag would not work for this purpose).

Finally adding some clearler logs to indicate exactly what image version is used. I don't know if there is any appetite for pinning SI versions to docker image versions. I guess each release would by default have to point to a specific docker image release rather than 'latest'. Each time a new image is released the corresponding version of SI would need to be updated. However, this would have to be handled separately for very image, and would be error prone. I think logs instead are the way to go.

@alejoe91
Copy link
Member

We should definitely at least add the image hash to the spike sorting log/params ASAP! Great point!

For usage, I would tend to aboid additional complexities to the run_sorter function, which is already quite complicated.

IMO, if one uses docker_image=True, it means that the "latest" should be used. Alternatively, we can simplify this argument so that it only accepts str | None, so docker_image=True will become `docker_image="latest``.

The code should check that if a latest tag is already cached locally, it corresponds to the remote tag. If not, download the latest.

@tabedzki
Copy link
Author

@alejoe91, if we do update the options/documentation, I think we should update the singularity_image documentation to highlight that one can provide a path to the image via str since I believe that is something that Apptainer/Singularity provides. Based off reading the documentation, I didn't realize that this was possible; it was something that @JoeZiminski pointed out to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
container Issues related to container (docker/singularity) versions of sorters enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants