Add a docker container (and docker-compose file) to run the model/notebooks in a containerized environment #166

Open · wants to merge 22 commits into main from feature/dockerized-environment

Conversation

leothomas (Member) commented:

To make it easier to run the model/notebooks without having to manage dependency installation across various machines/environments, I've added a micromamba-based Dockerfile that creates the conda environment with the specified libraries.
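For a rough idea of the shape of it, a minimal micromamba-based Dockerfile looks something like the sketch below (the image tag, paths, and CMD here are illustrative placeholders, not necessarily the exact contents of the file in this PR):

# Base image providing micromamba and a non-root $MAMBA_USER (tag is a placeholder)
FROM mambaorg/micromamba:1.5.6
WORKDIR /model
# Copy the lock file and build the conda environment from it
COPY --chown=$MAMBA_USER:$MAMBA_USER conda-lock.yml /model/conda-lock.yml
RUN micromamba create -y -n claymodel --file conda-lock.yml && \
    micromamba clean --all --yes
EXPOSE 8888
# The micromamba entrypoint activates the env named by ENV_NAME before running CMD
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser"]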

I've also added a Docker Compose file to specify the build-time and run-time arguments for exposing the JupyterLab port and mounting the current directory as a volume in the container. This allows users to modify any of the model/notebook code locally without having to rebuild the image.
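For example, the compose service might look roughly like this (the port, mount path, and environment values are illustrative, not necessarily the exact file in this PR):

services:
  claymodel:
    build: .
    platform: linux/amd64
    ports:
      - "8888:8888"        # expose JupyterLab
    volumes:
      - .:/model           # mount the repo so local edits don't require a rebuild
    environment:
      - ENV_NAME=claymodel # env for the micromamba entrypoint to activate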

By default the Docker image starts JupyterLab, but this can be overridden in the docker-compose file or on the command line with any other Python or bash command.

The platform=linux/amd64 build-time and run-time args allow the image to be built and run on Apple Silicon (M1) Macs while maintaining compatibility with Linux machines.

The container can be run with:

docker-compose up

or

docker-compose run claymodel <command>

where <command> overrides the default JupyterLab startup command.

The container can also be built directly (bypassing the need for docker-compose) with:

docker build . -t clay --platform linux/amd64

and then run with:

docker run --rm -it -v $(pwd):/model -p 8888:8888 -e ENV_NAME=claymodel --platform linux/amd64 clay:latest

weiji14 (Contributor) left a comment:

Thanks @leothomas! Just some small comments for now. Do you think we should also add a .dockerignore file to keep the Docker image a pure virtual environment? Also, what are your thoughts on setting up some CI to push pre-built containers to a docker registry (can be done in a separate PR)?
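For the registry idea, something along these lines could work (the registry, tags, and action versions below are just placeholders):

name: Build and push Docker image
on:
  push:
    branches: [main]
jobs:
  docker:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64
          push: true
          tags: ghcr.io/${{ github.repository }}:latest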

environment.yml (outdated), comment on lines 17 to 18:
- pytorch~=2.1.0
- pyarrow~=15.0.0
Contributor:

Could you resolve the merge conflict here with the main branch? Also see if removing the pyarrow pin works. I managed to get conda to solve for osx-arm64 recently on #164 after https://github.com/conda-forge/torchvision-feedstock/pull/89/files was merged.

Collaborator:

Agreed. You should be able to simply ditch your changes to environment.yml. If not, then keep in mind that whenever you modify environment.yml, you must regenerate conda-lock.yml, so you should never be committing changes to only environment.yml.
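For reference, regenerating it is roughly the following (the platform list here is an assumption; use whatever platforms this repo actually targets):

conda-lock lock --file environment.yml -p linux-64 -p osx-arm64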

leothomas (Member, author):

Did you mean conflicts with ci/osx-arm64, rather than conflicts with main? I didn't notice any conflicts with main.

I merged the changes from ci/osx-arm64 and, while I'm able to install from conda-lock.yml (as @chuckwondo suggested) I'm not able to install from environment.yml due to cuda being unavailable in the docker container:

> [claymodel 4/4] RUN micromamba create -y -n claymodel --file environment.yml &&     micromamba clean --all --yes:
135.1 error    libmamba Could not solve for environment specs
135.1     The following package could not be installed
135.1     └─ pytorch ~=2.1.0 *cuda12* is not installable because it requires
135.1        └─ __cuda, which is missing on the system.
135.3 critical libmamba Could not solve for environment specs
------
failed to solve: process "/usr/local/bin/_dockerfile_shell.sh micromamba create -y -n claymodel --file environment.yml &&     micromamba clean --all --yes" did not complete successfully: exit code: 1

I'm thinking that the conda-lock.yml and environment.yml files aren't synchronized?

Let me know what the best course of action is. I can look into making a composite Docker image, based on both micromamba and nvidia/cuda, so that we can install pytorch with CUDA support in the container.
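Roughly what I have in mind (a completely untested sketch; the CUDA base tag and install paths are placeholders):

FROM nvidia/cuda:12.0.1-runtime-ubuntu22.04
# Add the micromamba binary to the CUDA base image (download snippet from the mamba docs)
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl bzip2 ca-certificates && \
    curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | \
    tar -xj -C /usr/local bin/micromamba && \
    rm -rf /var/lib/apt/lists/*
ENV MAMBA_ROOT_PREFIX=/opt/conda
COPY conda-lock.yml /tmp/conda-lock.yml
RUN micromamba create -y -n claymodel --file /tmp/conda-lock.yml && \
    micromamba clean --all --yes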

Contributor:

> Did you mean conflicts with ci/osx-arm64, rather than conflicts with main? I didn't notice any conflicts with main.

I meant with main actually. The changes in the ci/osx-arm64 branch aren't much.

> I merged the changes from ci/osx-arm64 and, while I'm able to install from conda-lock.yml (as @chuckwondo suggested) I'm not able to install from environment.yml due to cuda being unavailable in the docker container:

Still, best to go with conda-lock.yml as Chuck suggested. If you want to install from environment.yml on a device without CUDA GPUs, set CONDA_OVERRIDE_CUDA=12.0 following https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html#overriding-detected-packages.
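i.e. something like this (an illustrative command, not necessarily the exact one for this repo):

CONDA_OVERRIDE_CUDA="12.0" micromamba create -y -n claymodel --file environment.yml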

leothomas (Member, author):

Hm, that's odd - there were no merge conflicts with main in that case. I might have been missing something. Do the environment.yml and conda-lock.yml files look correct in their current state?

Dockerfile (outdated review comment, resolved)
docker-compose.yml (outdated review comment, resolved)
weiji14 added the maintenance label (Boring but important stuff for the core devs) on Feb 27, 2024
chuckwondo (Collaborator) left a comment:

In addition to the individual comments/suggestions, please also add a section to README.md about how to run the Docker container (docker compose up) as an alternative to installing things locally, and how to access JupyterLab once the container is started. In particular, mention the URLs that appear in the logging messages:

claymodel-1  |     Or copy and paste one of these URLs:
claymodel-1  |         http://ffd23ea64b9b:8888/lab?token=abebb7b9476b8fff1fb8b543cc552e8c9641b5b38547c3cb
claymodel-1  |         http://127.0.0.1:8888/lab?token=abebb7b9476b8fff1fb8b543cc552e8c9641b5b38547c3cb
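A possible shape for that README section (the wording here is only a suggestion):

## Running with Docker

As an alternative to installing the dependencies locally, build and start the container with:

    docker compose up

Once JupyterLab has started, open the http://127.0.0.1:8888/lab?token=... URL printed in the container logs.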

Dockerfile (outdated review comment, resolved)
environment.yml (outdated review comment, resolved)
Dockerfile (outdated review comment, resolved)
Dockerfile (outdated review comment, resolved)

chuckwondo (Collaborator) commented:

> Thanks @leothomas! Just some small comments for now. Do you think we should also add a .dockerignore file to keep the Docker image a pure virtual environment? Also, what are your thoughts on setting up some CI to push pre-built containers to a docker registry (can be done in a separate PR)?

Absolutely add a .dockerignore file to this PR. You can probably start with much (or all) of what's in .gitignore.

However, keep in mind that there are cases where you must be more explicit in .dockerignore.

For example, you cannot use only __pycache__/ in .dockerignore because that will ignore only a top level __pycache__/ directory. To ignore such a directory at all levels, you must use **/__pycache__/ in .dockerignore. Another good candidate for this is to add **/.ipynb_checkpoints/ since there are notebooks in the docs directory.
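A deny-list version along those lines might start out like this (entries are illustrative, plus whatever else .gitignore suggests):

.git/
**/__pycache__/
**/.ipynb_checkpoints/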

Alternatively, as I mentioned to you ages ago, I tend to "invert" my use of .dockerignore to make it act more like an "allow" list rather than a "deny" list, which I find to be safer and clearer. For example, consider making a .dockerignore that ignores everything and then does not ignore the things you want to "allow":

*
!environment.yml
!conda-lock.yml
!**/*.py
!**/*.ipynb
!**/*.sh

leothomas (Member, author) commented:

Awesome! Thank you both! I've addressed some of the requested changes:

  • Regenerating the conda-lock.yml
  • Installing the micromamba environment from conda-lock.yml rather than environment.yml
  • Removing the pyarrow dependency
  • Adding a .dockerignore which ignores everything by default and only allows the files specifically needed
  • Updating the README with relevant info on the docker image/container and how to run it
  • Updating the JupyterLab flags to avoid the unnecessary use of --allow-root and to address some warning/debug logs

Do y'all think it would be valuable for me to try getting this to run with a CUDA Docker image, to enable the pytorch-cuda installation?

README.md (outdated review comment, resolved)
docs/partial-inputs-flood-tutorial.ipynb (outdated review comment, resolved)
conda-lock.yml (outdated review comment, resolved)
weiji14 mentioned this pull request on Mar 5, 2024
weiji14 (Contributor) left a comment:

Some suggested changes to reduce the diff from the merge conflict handling.

docs/partial-inputs-flood-tutorial.ipynb (outdated review comment, resolved)
docs/partial-inputs-flood-tutorial.ipynb (outdated review comment, resolved)
docs/partial-inputs.ipynb (outdated review comment, resolved)
docs/partial-inputs.ipynb (outdated review comment, resolved)
environment.yml (outdated review comment, resolved)
src/callbacks_wandb.py (outdated review comment, resolved)
leothomas and others added 8 commits on March 14, 2024 at 17:31 (several co-authored by Wei Ji <23487320+weiji14@users.noreply.github.com>), including the merge of …dation/model into feature/dockerized-environment

Dockerfile, comment on these lines:

COPY --chown=$MAMBA_USER:$MAMBA_USER . .

RUN micromamba create -y -n claymodel --file conda-lock.yml && \
Contributor:

Hmm, I'm getting this error when running docker build . -t clay --platform linux/amd64 locally:

0.384 Transaction starting
79.51 critical libmamba Failed to create dir 'info'
79.51 error    libmamba Error opening for reading "/opt/conda/pkgs/cudnn-8.9.7.29-h092f7fd_3/info/index.json": No such file or directory
79.51 error    libmamba Error when extracting package: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
79.51 cudnn-8.9.7.29-h092f7fd_3.conda extraction failed
79.73 critical libmamba Found incorrect download: cudnn. Aborting

Not sure if it's because something changed when I re-locked the conda-lock.yml file manually when doing the merge from main at 5be9fa2#diff-63113c19c5d310b8e350b302f279e2297ba6faa9ac9c99c1c82e83e508447865

leothomas (Member, author):

Oh jeez - I'm getting a segmentation fault:

2.612 Transaction starting
1441.7 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
1441.7 bash: line 1:    10 Segmentation fault      micromamba create -y -n claymodel --file conda-lock.yml
------
Dockerfile:11
--------------------
  10 |
  11 | >>> RUN micromamba create -y -n claymodel --file conda-lock.yml && \
  12 | >>>     micromamba clean --all --yes
  13 |
--------------------
ERROR: failed to solve: process "/usr/local/bin/_dockerfile_shell.sh micromamba create -y -n claymodel --file conda-lock.yml &&     micromamba clean --all --yes" did not complete successfully: exit code: 139

Wondering if it's related either to a lack of available memory or to installing CUDA in Docker.

I recall that when we installed the model libraries locally we had to install pytorch without CUDA - could the CUDA installation have made its way into the conda-lock file and be causing issues?
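A quick way to check (plain grep, nothing repo-specific):

grep -i -E 'cuda|cudnn' conda-lock.yml | head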

Labels: maintenance (Boring but important stuff for the core devs)
Projects: none yet
Linked issues: none yet
3 participants