Improvements to docker #81

Open
wants to merge 2 commits into
base: main

StevePotter

Hello friends, thank you for building such a wonderful project. I noticed a few non-standard uses of Docker and this PR provides some suggested improvements. I don't recommend merging it until there has been discussion and testing by the team.

The current dockerfile does the work of preparing the container for developing the code. None of the commands in the dockerfile touch any of the FoundationPose code. Then, run_container.sh mounts a bunch of volumes, which in turn gives the container access to the code. Since some other things still need to happen, build_all.sh needs to be run.

Normally, a docker image is rather complete and all someone needs to run the code is pull the image. That's not the case here. These extra steps require more work to set up, and make it very difficult to simply deploy the FP image to a server.

Luckily, docker has some other approaches that are widely accepted. For development time, when you want to share files with the container, there is Docker Compose. Docker Compose makes it easy to set up networks, mount volumes, and do just about everything else that run_container.sh was doing. Plus, since the volumes are available to every RUN command, everything in build_all.sh can be moved to the dockerfile.

So now instead of run_container.sh, you can simply use the native docker compose up command!

For use on servers, docker provides a COPY command that can place files from the project into the container. So, I added a second dockerfile for that, dockerfile.prod. If you build that image, it'll include everything from the git repo in the container and will run all the commands from build_all.sh. Then this image can quickly run anywhere, without requiring the git repo code. You could even create a GitHub Action that will automatically publish this image whenever a PR is merged.
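
As a rough illustration of the COPY approach described above, here is a minimal sketch, not the actual dockerfile.prod from this PR. The base image tag is the one quoted later in this thread; the placeholder RUN step is an assumption standing in for the real commands from build_all.sh. One detail worth verifying in any such setup: volumes declared in a Compose file are attached when a container starts, so RUN steps during `docker build` see only what the base image provides and what was copied from the build context.

```dockerfile
# Sketch only: the real environment setup (conda, PyTorch, etc.) is omitted.
FROM nvidia/cuda:11.8.0-devel-ubuntu20.04

WORKDIR /foundationpose

# COPY takes files from the build context (the directory passed to
# `docker build`) and bakes them into the image, so no volume mount
# is needed when the container is later started.
COPY . /foundationpose

# Placeholder: the build steps from build_all.sh would become RUN
# instructions here. This line only confirms the files were copied.
RUN ls /foundationpose
```

Built from the repository root with something like `docker build -f dockerfile.prod .`, the resulting image carries the code with it and can be pulled and run on a server without a checkout of the repo.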

I'd love to hear people's thoughts on this. I'm not a FoundationPose developer and I realize you have a workflow, but I've been using docker for 10 years, and I thought I could share some of my experience. If you switch to Docker Compose, things will become easier and experiences will be more consistent across developers. Having a 2nd dockerfile may seem weird, but actually a lot of people do it.

If you think this is too much to do all at once, I can keep the old dockerfile, run_container.sh, and build_all.sh, and present this new way as an alternative. If and when it's determined that Compose is easier, you can remove those files later.

Also, it could be possible to make a much more lightweight container, which is safer and easier to deploy. There are quite a few build tools (g++, gcc, build-essential, cmake, etc.) that could be removed from the final container if we took advantage of Docker's Multi-Stage Builds. This allows you to have multiple FROM statements in your dockerfile. You can use some of them to build code, then you simply copy the built code over to the final container. I could certainly help with this if you're interested.
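
The multi-stage idea above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not a tested build for FoundationPose: the stage name `builder` and the placeholder compile step are hypothetical, and the runtime base image tag is assumed to be the `runtime` counterpart of the `devel` tag used elsewhere in this thread.

```dockerfile
# Stage 1: full CUDA toolchain, used only for compiling.
FROM nvidia/cuda:11.8.0-devel-ubuntu20.04 AS builder
WORKDIR /foundationpose
COPY . .
# Placeholder: the real compile steps (for example those in build_all.sh)
# would run here and leave their outputs under /foundationpose.
RUN echo "compile native extensions here"

# Stage 2: smaller runtime image without compilers or headers.
FROM nvidia/cuda:11.8.0-runtime-ubuntu20.04
# Only the files copied explicitly from the builder stage end up
# in the final image; everything else in stage 1 is discarded.
COPY --from=builder /foundationpose /foundationpose
WORKDIR /foundationpose
```

In a real setup the Python environment (and anything installed outside /foundationpose) would also have to be copied or reinstalled in the second stage, which is where most of the effort in such a conversion tends to go.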

Thanks, I hope you like this. Have a great day!

@wenbowen123
Collaborator

Hi @StevePotter, thanks for your great suggestion, this seems very useful! I still need to find time to test this myself, as I'm swamped with other projects right now. To stay backward compatible, would you mind creating a separate docker/ folder (e.g. docker_compose) and putting the new stuff there? Right now I'd prefer to keep the old one as it is, but later, if many folks have verified this, I'd be happy to replace it.

@StevePotter
Author

Okay great, I will do that

…ing Docker that could replace the existing one
@StevePotter
Author

I got a little sidetracked, but will devote some time to this next week. I also plan the following improvements:

  • Use requirements.txt or conda environment.yml to declare packages
  • Include weights in the docker image
  • Use a multi-stage build so the runtime image uses a base image like nvidia/cudagl:11.3.0-runtime-ubuntu20.04. The current image is about 20 GB, and when I tried this out, it cut the size down to about 10 GB
  • Supply an argument to toggle the CUDA version. I tested on 11.8 and 12.1, and those work. It would be nice for users to have a choice
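
The last item in the list above, toggling the CUDA version, could be sketched with a build argument. This is a minimal sketch under assumptions: the argument name `CUDA_VERSION` is hypothetical, and the two image tags are the ones mentioned elsewhere in this thread.

```dockerfile
# An ARG declared before FROM can be used in the FROM line itself.
# Hypothetical argument name; the default matches the 11.8 base image.
ARG CUDA_VERSION=11.8.0
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04

# An ARG declared before FROM is not visible inside the build stage
# unless it is declared again after FROM.
ARG CUDA_VERSION
RUN echo "Building against CUDA ${CUDA_VERSION}"
```

With this in place, `docker build --build-arg CUDA_VERSION=12.1.0 .` would select the 12.1 base image, and omitting the flag would fall back to 11.8.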

@EquilibriaW

Hi! I was trying this method, and I ran into an issue where the line cd /foundationpose/mycpp/ failed because it couldn't find the directory. I was wondering if this could be because I placed the docker-compose file in a different location than intended; I currently have it in a subdirectory inside of main.

@StevePotter
Author

StevePotter commented Jun 6, 2024

@EquilibriaW you are right. Somehow I messed it up. I'll fix it.

@mrtnbm

mrtnbm commented Jun 11, 2024

Hey @StevePotter,
thank you for the amazing work of condensing everything into one docker-compose setup and thus saving us all lots of time!

I've tried to run your Dockerfile.prod inside WSL2 with Ubuntu 20.04, using the 4090 fix, e.g. using FROM nvidia/cuda:12.1.0-devel-ubuntu20.04 instead of FROM nvidia/cuda:11.8.0-devel-ubuntu20.04 and also changing to C++17 inside /bundlesdf/mycuda/setup.py according to issue #27. Everything works fine until this line. The same error also appeared when using the default approach, so it is very likely a configuration error on my side (CUDA does not get detected correctly) rather than an error in your Docker Compose file:

=> ERROR [foundationpose 15/15] RUN cd /foundationpose/bundlesdf/mycuda &&     rm -rf build *egg* &&      5.1s
------
 > [foundationpose 15/15] RUN cd /foundationpose/bundlesdf/mycuda &&     rm -rf build *egg* &&     pip install -e .:
0.866 Obtaining file:///foundationpose/bundlesdf/mycuda
0.867   Preparing metadata (setup.py): started
2.359   Preparing metadata (setup.py): finished with status 'done'
3.434 Installing collected packages: common
3.435   Running setup.py develop for common
4.969     error: subprocess-exited-with-error
4.969
4.969     × python setup.py develop did not run successfully.
4.969     │ exit code: 1
4.969     ╰─> [94 lines of output]
4.969         running develop
4.969         running egg_info
4.969         creating common.egg-info
4.969         writing common.egg-info/PKG-INFO
4.969         writing dependency_links to common.egg-info/dependency_links.txt
4.969         writing top-level names to common.egg-info/top_level.txt
4.969         writing manifest file 'common.egg-info/SOURCES.txt'
4.969         reading manifest file 'common.egg-info/SOURCES.txt'
4.969         writing manifest file 'common.egg-info/SOURCES.txt'
4.969         running build_ext
4.969         building 'common' extension
4.969         creating /foundationpose/bundlesdf/mycuda/build
4.969         creating /foundationpose/bundlesdf/mycuda/build/temp.linux-x86_64-cpython-38
4.969         No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
4.969         /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py:266: UserWarning: Unknown distribution option: 'extra_cflags'
4.969           warnings.warn(msg)
4.969         /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py:266: UserWarning: Unknown distribution option: 'extra_cuda_cflags'
4.969           warnings.warn(msg)
4.969         /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
4.969         !!
4.969
4.969                 ********************************************************************************
4.969                 Please avoid running ``setup.py`` and ``easy_install``.
4.969                 Instead, use pypa/build, pypa/installer or other
4.969                 standards-based tools.
4.969
4.969                 See https://github.com/pypa/setuptools/issues/917 for details.
4.969                 ********************************************************************************
4.969
4.969         !!
4.969           easy_install.initialize_options(self)
4.969         /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
4.969         !!
4.969
4.969                 ********************************************************************************
4.969                 Please avoid running ``setup.py`` directly.
4.969                 Instead, use pypa/build, pypa/installer or other
4.969                 standards-based tools.
4.969
4.969                 See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
4.969                 ********************************************************************************
4.969
4.969         !!
4.969           self.initialize_options()
4.969         /opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no g++ version bounds defined for CUDA version 12.1
4.969           warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
4.969         Traceback (most recent call last):
4.969           File "<string>", line 2, in <module>
4.969           File "<pip-setuptools-caller>", line 34, in <module>
4.969           File "/foundationpose/bundlesdf/mycuda/setup.py", line 21, in <module>
4.969             setup(
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/__init__.py", line 104, in setup
4.969             return distutils.core.setup(**attrs)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 184, in setup
4.969             return run_commands(dist)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
4.969             dist.run_commands()
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
4.969             self.run_command(cmd)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/dist.py", line 967, in run_command
4.969             super().run_command(command)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
4.969             cmd_obj.run()
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
4.969             self.install_for_development()
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py", line 111, in install_for_development
4.969             self.run_command('build_ext')
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
4.969             self.distribution.run_command(command)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/dist.py", line 967, in run_command
4.969             super().run_command(command)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
4.969             cmd_obj.run()
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 91, in run
4.969             _build_ext.run(self)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
4.969             self.build_extensions()
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
4.969             build_ext.build_extensions(self)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
4.969             self._build_extensions_serial()
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
4.969             self.build_extension(ext)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 252, in build_extension
4.969             _build_ext.build_extension(self, ext)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 560, in build_extension
4.969             objects = self.compiler.compile(
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 677, in unix_wrap_ninja_compile
4.969             cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 576, in unix_cuda_flags
4.969             cflags + _get_cuda_arch_flags(cflags))
4.969           File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1980, in _get_cuda_arch_flags
4.969             arch_list[-1] += '+PTX'
4.969         IndexError: list index out of range
4.969         [end of output]
4.969
4.969     note: This error originates from a subprocess, and is likely not a problem with pip.
4.974 error: subprocess-exited-with-error
4.974
4.974 × python setup.py develop did not run successfully.
4.974 │ exit code: 1
4.974 ╰─> [94 lines of output]
4.974     running develop
4.974     running egg_info
4.974     creating common.egg-info
4.974     writing common.egg-info/PKG-INFO
4.974     writing dependency_links to common.egg-info/dependency_links.txt
4.974     writing top-level names to common.egg-info/top_level.txt
4.974     writing manifest file 'common.egg-info/SOURCES.txt'
4.974     reading manifest file 'common.egg-info/SOURCES.txt'
4.974     writing manifest file 'common.egg-info/SOURCES.txt'
4.974     running build_ext
4.974     building 'common' extension
4.974     creating /foundationpose/bundlesdf/mycuda/build
4.974     creating /foundationpose/bundlesdf/mycuda/build/temp.linux-x86_64-cpython-38
4.974     No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
4.974     /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py:266: UserWarning: Unknown distribution option: 'extra_cflags'
4.974       warnings.warn(msg)
4.974     /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py:266: UserWarning: Unknown distribution option: 'extra_cuda_cflags'
4.974       warnings.warn(msg)
4.974     /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
4.974     !!
4.974
4.974             ********************************************************************************
4.974             Please avoid running ``setup.py`` and ``easy_install``.
4.974             Instead, use pypa/build, pypa/installer or other
4.974             standards-based tools.
4.974
4.974             See https://github.com/pypa/setuptools/issues/917 for details.
4.974             ********************************************************************************
4.974
4.974     !!
4.974       easy_install.initialize_options(self)
4.974     /opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
4.974     !!
4.974
4.974             ********************************************************************************
4.974             Please avoid running ``setup.py`` directly.
4.974             Instead, use pypa/build, pypa/installer or other
4.974             standards-based tools.
4.974
4.974             See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
4.974             ********************************************************************************
4.974
4.974     !!
4.974       self.initialize_options()
4.974     /opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no g++ version bounds defined for CUDA version 12.1
4.974       warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
4.974     Traceback (most recent call last):
4.974       File "<string>", line 2, in <module>
4.974       File "<pip-setuptools-caller>", line 34, in <module>
4.974       File "/foundationpose/bundlesdf/mycuda/setup.py", line 21, in <module>
4.974         setup(
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/__init__.py", line 104, in setup
4.974         return distutils.core.setup(**attrs)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 184, in setup
4.974         return run_commands(dist)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
4.974         dist.run_commands()
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
4.974         self.run_command(cmd)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/dist.py", line 967, in run_command
4.974         super().run_command(command)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
4.974         cmd_obj.run()
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
4.974         self.install_for_development()
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/develop.py", line 111, in install_for_development
4.974         self.run_command('build_ext')
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
4.974         self.distribution.run_command(command)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/dist.py", line 967, in run_command
4.974         super().run_command(command)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
4.974         cmd_obj.run()
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 91, in run
4.974         _build_ext.run(self)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
4.974         self.build_extensions()
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
4.974         build_ext.build_extensions(self)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
4.974         self._build_extensions_serial()
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
4.974         self.build_extension(ext)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 252, in build_extension
4.974         _build_ext.build_extension(self, ext)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 560, in build_extension
4.974         objects = self.compiler.compile(
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 677, in unix_wrap_ninja_compile
4.974         cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 576, in unix_cuda_flags
4.974         cflags + _get_cuda_arch_flags(cflags))
4.974       File "/opt/conda/envs/my/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1980, in _get_cuda_arch_flags
4.974         arch_list[-1] += '+PTX'
4.974     IndexError: list index out of range
4.974     [end of output]
4.974
4.974 note: This error originates from a subprocess, and is likely not a problem with pip.
------
failed to solve: process "/bin/bash --login -c cd /foundationpose/bundlesdf/mycuda &&     rm -rf build *egg* &&     pip install -e ." did not complete successfully: exit code: 1
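
A possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: the message "No CUDA runtime is found" suggests that no GPU was visible while the image was being built, which is the usual situation during `docker build`. When PyTorch's extension builder finds neither a visible GPU nor a `TORCH_CUDA_ARCH_LIST` environment variable, its architecture list stays empty, which would match the `IndexError` raised at `arch_list[-1]`. A minimal sketch of a workaround is to set that variable explicitly before the failing step. The fragment below is meant to sit inside the existing Dockerfile (it has no FROM line of its own), and the value 8.9 is an assumption based on the RTX 4090 mentioned above.

```dockerfile
# Fragment: place before the step that builds bundlesdf/mycuda.
# Assumption: 8.9 is the compute capability of an RTX 4090; adjust it
# to the GPU the image will actually run on. Older PyTorch releases may
# not recognise this value, in which case a lower one may be required.
ARG TORCH_CUDA_ARCH_LIST="8.9"
ENV TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}

# Same command as in the failing step from the log above.
RUN cd /foundationpose/bundlesdf/mycuda && \
    rm -rf build *egg* && \
    pip install -e .
```

Because the value is exposed as a build argument, it could be overridden per machine with `--build-arg TORCH_CUDA_ARCH_LIST=...` instead of editing the file.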


4 participants