
Refactor Dockerfiles to use common base layer #92

Open

wants to merge 2 commits into master
Conversation

benbarsdell
Collaborator

  • Dockerfile.base contains all the Bifrost dependencies.
  • Dockerfile inherits from Dockerfile.base and contains the Bifrost build.
  • The old Dockerfile.cpu and Dockerfile.gpu are removed.
  • The FROM line and build args in the Dockerfiles are parameterised
    using ARGs, which are inserted by the Makefile (a workaround is
    currently needed for the FROM line because it doesn't support ARG,
    but this was recently fixed in upstream Docker, so we can eventually
    do it properly).
  • The Makefile defines four targets: docker-base (GPU container with
    dependencies only), docker (GPU container with Bifrost installed),
    docker-base-cpu (CPU container with dependencies only), and docker-cpu
    (CPU container with Bifrost installed).
  • We can now also build containers with different OS images by
    specifying CUDA_IMAGE_NAME or CPU_IMAGE_NAME, e.g.,
    "make docker
    CUDA_IMAGE_NAME=nvidia/cuda:7.5
    IMAGE_NAME=ledatelescope/bifrost-cuda-7.5", which lets us easily
    test compatibility with older versions of CUDA.
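For reference, the "proper" approach mentioned above could look roughly like this once the minimum supported Docker version allows ARG before FROM (the image tag here is only an example):

```dockerfile
# Sketch, assuming Docker >= 17.05: the base image becomes a build arg
# instead of a FROM line prepended by the Makefile.
ARG BASE_IMAGE=nvidia/cuda:8.0-devel-ubuntu16.04
FROM ${BASE_IMAGE}
# ... dependency installation as in Dockerfile.base ...
```

It would then be built with, e.g., docker build --build-arg BASE_IMAGE=nvidia/cuda:7.5 -t ledatelescope/bifrost-cuda-7.5 .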
Collaborator

@MilesCranmer left a comment


Sorry that my review has a big red "X" on the top of it; I wasn't sure what the difference between the options for suggesting changes was. This PR is great.

Some general comments:

(I can push some of these changes in the morning (AEST morning))

  • The README needs an update covering the new make-based build instructions.
  • vim, nano, and doxygen could be removed as dependencies (if we are aiming for a minimal container with a working Bifrost, the user can always add more).
  • Having never seen a project whose Dockerfile requires pre-processing by Make (though I do really like the idea), I think we could add an additional "standard" Dockerfile for people who don't read every word of a README, with an explicit FROM at the top so it can be built with a vanilla docker build ....
    • Since Dockerfile is a filename required by Docker Hub (weird, I know, but even the official library images are all just Dockerfile in different directories), the Dockerfiles actually can't require pre-processing if we want to host images on Docker Hub.
    • We can still keep them as is, but with different names (and maybe in a different directory, since there are so many front-page files now... I'll move out the travis docs builder too), with explicit Dockerfiles up front.
    • The Dockerfile should pull from a ledatelescope/bifrost:deps image on Docker Hub once we get it set up (to avoid rebuilding unchanged dependencies).
      • I can set one up off my fork until I get approval from @ledatelescope to set up a Docker Hub org under ledatelescope.
  • We may want to rename the default local images to the form bifrost... instead of ledatelescope/bifrost, since the latter is where the Docker Hub images will live.
  • I get errors with the version.py file during make docker:
Traceback (most recent call last):
  File "setup.py", line 34, in <module>
    with open(bifrost_version_file, 'r') as version_file:
IOError: [Errno 2] No such file or directory: 'bifrost/version.py'
make[1]: *** [clean] Error 1

(it still builds and runs though!)

  • Everything builds and works (aside from a couple of broken unit tests in the GPU container; I'll post them at the bottom).
  • The git clone of pyclibrary should be replaced by the instructions I put in the README (pip install git+https...), as it's faster and can be combined with the above pip for one fewer layer. Thanks @caseyjlaw for showing me this magic.
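A sketch of the combined-layer idea (the pyclibrary URL is elided here just as above; the placeholder should be replaced with the address from the README):

```dockerfile
# Sketch: one RUN, hence one image layer, for all pip installs.
# <other-python-deps> and the git+https URL are placeholders.
RUN pip install --no-cache-dir \
        <other-python-deps> \
        git+https://<pyclibrary-repo-url>
```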

Other

(Unrelated) Here's a nice hack I found (arguments to docker run)

-v /var/run/docker.sock:/var/run/docker.sock -v /usr/bin/docker:/usr/bin/docker

This lets you control the docker daemon from within a container, so you never have to leave.
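Spelled out as a full command (the image name and trailing command are just examples):

```shell
# Mount the host's Docker socket and client binary so `docker` inside
# the container talks to the host daemon.
docker run -it --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /usr/bin/docker:/usr/bin/docker \
    ubuntu:16.04 \
    docker ps
```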

Here are the errors I saw from the GPU unit tests. The last one is weird. I think it should be np.testing.assert_almost_equal to kill tiny rounding errors?

linalg.cu:148 Condition failed: "Supported dtype for array a"
linalg.cu:148 error 22: BF_STATUS_UNSUPPORTED_DTYPE
linalg.cu:243 error 22: BF_STATUS_UNSUPPORTED_DTYPE
E..................F................s.........................
======================================================================
ERROR: test_matmul_aa_ci8 (test_linalg.TestLinAlg)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspace/bifrost/test/test_linalg.py", line 75, in test_matmul_aa_ci8
    self.run_test_matmul_aa_ci8_shape((11,23))
  File "/workspace/bifrost/test/test_linalg.py", line 54, in run_test_matmul_aa_ci8_shape
    self.linalg.matmul(1, a, None, 0, c)
  File "build/bdist.linux-x86_64/egg/bifrost/linalg.py", line 59, in matmul
    beta, c_array))
  File "build/bdist.linux-x86_64/egg/bifrost/libbifrost.py", line 154, in _check
    raise RuntimeError(status_str)
RuntimeError: BF_STATUS_UNSUPPORTED_DTYPE

======================================================================
FAIL: test_repr (test_ndarray.NDArrayTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspace/bifrost/test/test_ndarray.py", line 62, in test_repr
    self.assertEqual(repr_f, repr_k)
AssertionError: '([[ 0.,  1.],\n         [ 2.,  3.],\n         [ 4.,  5.]], dtype=float32)' != '([[ 0.,  1.],\n       [ 2.,  3.],\n       [ 4.,  5.]], dtype=float32)'

----------------------------------------------------------------------
Ran 160 tests in 189.538s

FAILED (failures=1, errors=1, skipped=3)

docker:
docker build --pull -t $(IMAGE_NAME):$(LIBBIFROST_MAJOR).$(LIBBIFROST_MINOR) -f Dockerfile.gpu -t $(IMAGE_NAME) .
docker-base:
echo "FROM $(CUDA_IMAGE_NAME)" > _Dockerfile.tmp
Collaborator

These could all be .dockerfile.tmp instead of _Dockerfile.tmp; the latter looks like a Python-specific convention for private/hidden objects.

Collaborator Author

SGTM

COPY . .
RUN make clean && \
make -j16 $make_args && \
make doc && \
Collaborator

I've heard that best Docker practice is to make containers as minimal as possible (?), so I don't think the full doxygen reference should go in. It would be hard to view it efficiently without setting up X11 as well, and we have a website that holds it all anyway.

Collaborator

Oh, this make clean is probably causing the version.py error. Does that need to be in there? Is it there just in case the user already built bifrost outside the container?

Collaborator Author

Removing the make doc sounds like the right idea.

Indeed the make clean is important to avoid strange errors when the user already built the library outside. I'll look into what's going on with version.py.

We could probably also do make clean again after make install.
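Combining both suggestions, the RUN step quoted above could become (a sketch; $make_args as in the existing Dockerfile, make doc dropped):

```dockerfile
# Sketch: clean before building so host-side build artifacts never
# leak into the image, and clean again after install to keep the
# final layer small.
RUN make clean && \
    make -j16 $make_args && \
    make install && \
    make clean
```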

@@ -0,0 +1,19 @@
# Note: FROM command is inserted by Makefile
# TODO: As of Docker-CE 17.05, we could use this better approach instead:
Collaborator

We can also build from streams (moby/moby#31236) which is nice, so we won't have to create a temporary file within the build context.
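A sketch of the stream-build idea as a Makefile recipe (requires a Docker version with stdin-Dockerfile support; the :base tag is illustrative, variable names match the Makefile snippet above):

```makefile
docker-base:
	{ echo "FROM $(CUDA_IMAGE_NAME)"; cat Dockerfile.base; } \
	    | docker build -t $(IMAGE_NAME):base -f - .
```

This avoids creating a temporary _Dockerfile.tmp inside the build context.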


# Build the library
WORKDIR /bifrost
COPY . .
Collaborator

We should create a .dockerignore file with at least .git included.

Collaborator Author

I find it can be useful to have the git history inside the container, but perhaps I should be doing that with your approach of using the base container and mapping the source directory instead. In that case, this sounds like a good idea.
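A minimal .dockerignore along these lines (entries beyond .git are only suggestions):

```
.git
*.o
*.so
*.pyc
```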

@MilesCranmer
Collaborator

The specification of base images from the command line is very nice. I will have to read up on how travis tests different languages, and whether there is a similar setting that would let us run the build over different environment variables.
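Something along these lines might work (a hypothetical .travis.yml fragment; the env-matrix mechanism is standard Travis, the image names are examples):

```yaml
env:
  - CUDA_IMAGE_NAME=nvidia/cuda:7.5 IMAGE_NAME=ledatelescope/bifrost-cuda-7.5
  - CUDA_IMAGE_NAME=nvidia/cuda:8.0 IMAGE_NAME=ledatelescope/bifrost-cuda-8.0
script:
  - make docker CUDA_IMAGE_NAME=$CUDA_IMAGE_NAME IMAGE_NAME=$IMAGE_NAME
```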

@benbarsdell
Collaborator Author

Thanks Miles, fantastic review! Lots of excellent points.

Re the test failures, what CUDA version and GPU model is this running with?

@MilesCranmer
Collaborator

CUDA 8.0 on a Tesla K80 (it's an AWS p2.xlarge), default build settings (git checkout <this-pull-request> then make docker)

@MilesCranmer
Collaborator

I set up docker hub on my fork. You can pull with, e.g.,

docker pull mcranmer/bifrost:gpu-base

The default (:latest) is the cpu docker.

Here are all the images: https://hub.docker.com/r/mcranmer/bifrost/tags/
Each one is off a different branch, e.g., https://github.com/MilesCranmer/bifrost/tree/docker-gpu
We could get travis to send a POST request to the hub every time a new commit passes all tests to build every image again.

@MilesCranmer
Collaborator

We could also pull from Docker library images to eliminate redundancy in our builds, e.g., python:2.7-slim (which comes with pip/setuptools and basic build dependencies):

https://github.com/docker-library/python/blob/1ca4a57b20a2f66328e5ef72df866f701c0cd306/2.7/slim/Dockerfile

@MilesCranmer
Collaborator

Actually, I forgot that that won't work, as they need the nvidia/cuda base instead of debian:jessie.

Do you know of any NVIDIA Docker library that attempts to re-create some of the official docker-library images using an nvidia/cuda base? Quite a few of the official images are built off Debian variants, so switching the base image to nvidia/cuda while otherwise leaving the Dockerfile identical wouldn't break anything.

I made one for buildpack-deps:xenial - https://github.com/MilesCranmer/docker-cuda-buildpack, but it is obviously not official.

@MilesCranmer
Collaborator

ledatelescope now has a Docker hub: https://hub.docker.com/r/ledatelescope/bifrost/

The only image is :gpu-base, based off of the single-commit (orphaned) gpu-base branch.

@MilesCranmer
Collaborator

A new image built on top of pypy is :gpu-base-pypy, off the gpu-base-pypy branch.

@MilesCranmer
Collaborator

:gpu-pypy exists now as well. I have it delete the entire Bifrost source tree to save space, while the module remains importable:

nvidia-docker run -it --rm ledatelescope/bifrost:gpu-pypy

python is symlinked to pypy, so the default make install works without modifying the Makefiles.
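The symlink step is roughly the following (paths assumed; pypy's install location varies by base image):

```dockerfile
# Sketch: expose pypy as `python` so the default `make install`
# works without Makefile changes.
RUN ln -sf /usr/local/bin/pypy /usr/local/bin/python
```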

league added a commit that referenced this pull request Jul 20, 2023
The nvidia/cuda:10.2 image disappeared from the hub, so the GPU
versions in master were not buildable anymore.  Also, the two that
build bifrost were not yet using the configure script.

However, the full-build `.gpu` version doesn't quite work, because
nvidia-docker seems to provide access to the GPU during the run phase
but not during the build phase.  Perhaps relevant:
https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime

The `_prereq.gpu` version builds fully and is still helpful.

Also pinging PR #92 which is outdated but relevant.