
Refactor Dockerfiles to use common base layer #92

Open

wants to merge 2 commits into master
Conversation

benbarsdell
Collaborator

  • Dockerfile.base contains all the Bifrost dependencies.
  • Dockerfile inherits from Dockerfile.base and contains the Bifrost build.
  • The old Dockerfile.cpu and Dockerfile.gpu are removed.
  • The FROM line and build args in the Dockerfiles are parameterised
    using ARGs, which are inserted by the Makefile (a workaround is
    currently needed for the FROM line because it doesn't support ARG,
    but this was recently fixed in upstream Docker, so we can eventually
    do it properly).
  • The Makefile defines four targets: docker-base (GPU container with
    dependencies only), docker (GPU container with Bifrost installed),
    docker-base-cpu (CPU container with dependencies only), and docker-cpu
    (CPU container with Bifrost installed).
  • We can now also build containers with different OS images by
    specifying CUDA_IMAGE_NAME or CPU_IMAGE_NAME, e.g.,
    "make docker
    CUDA_IMAGE_NAME=nvidia/cuda:7.5
    IMAGE_NAME=ledatelescope/bifrost-cuda-7.5", which lets us easily
    test compatibility with older versions of CUDA.
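For reference, the "proper" approach mentioned above could look roughly like this once the minimum supported Docker version allows ARG before FROM (the image tag here is only an example):

```dockerfile
# Sketch, assuming Docker >= 17.05: the base image becomes a build arg
# instead of a FROM line prepended by the Makefile.
ARG BASE_IMAGE=nvidia/cuda:8.0-devel-ubuntu16.04
FROM ${BASE_IMAGE}
# ... dependency installation as in Dockerfile.base ...
```

It would then be built with, e.g., docker build --build-arg BASE_IMAGE=nvidia/cuda:7.5 -t ledatelescope/bifrost-cuda-7.5 .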
Collaborator

@MilesCranmer left a comment


Sorry that my review has a big red "X" on the top of it; I wasn't sure what the difference between the options for suggesting changes was. This PR is great.

Some general comments:

(I can push some of these changes in the morning (AEST morning))

  • The README needs an update covering the new make-based build instructions.
  • vim, nano, and doxygen could be removed as dependencies (if we are aiming for a minimal container with a working Bifrost, the user can always add more).
  • Having never seen a project whose Dockerfile requires pre-processing by Make (though I do really like the idea), I think we could add an additional "standard" Dockerfile for people who don't read every word of a README, with an explicit FROM at the top so it can be built with a vanilla docker build ....
    • Since Dockerfile is a filename required by Docker Hub (weird, I know, but even the official library images are all just Dockerfile in different directories), the Dockerfiles actually can't require pre-processing if we want to host images on Docker Hub.
    • We can still keep them as is, but with different names (and maybe in a different directory, since there are so many front-page files now... I'll move out the travis docs builder too), with explicit Dockerfiles up front.
    • The Dockerfile should pull from a ledatelescope/bifrost:deps image on Docker Hub once we get it set up (to avoid rebuilding unchanged dependencies).
      • I can set one up off my fork until I get approval from @ledatelescope to set up a Docker Hub org under ledatelescope.
  • We may want to rename the default local images to the form bifrost... instead of ledatelescope/bifrost, since the latter is where the Docker Hub images will live.
  • I get errors with the version.py file during make docker:
Traceback (most recent call last):
  File "setup.py", line 34, in <module>
    with open(bifrost_version_file, 'r') as version_file:
IOError: [Errno 2] No such file or directory: 'bifrost/version.py'
make[1]: *** [clean] Error 1

(it still builds and runs though!)

  • Everything builds and works (aside from a couple of broken unit tests in the GPU container; I'll post them at the bottom).
  • The git clone of pyclibrary should be replaced by the instructions I put in the README (pip install git+https...), as it's faster and can be combined with the above pip for one fewer layer. Thanks @caseyjlaw for showing me this magic.
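A sketch of the combined-layer idea (the pyclibrary URL is elided here just as above; the placeholder should be replaced with the address from the README):

```dockerfile
# Sketch: one RUN, hence one image layer, for all pip installs.
# <other-python-deps> and the git+https URL are placeholders.
RUN pip install --no-cache-dir \
        <other-python-deps> \
        git+https://<pyclibrary-repo-url>
```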

Other

(Unrelated) Here's a nice hack I found (arguments to docker run)

-v /var/run/docker.sock:/var/run/docker.sock -v /usr/bin/docker:/usr/bin/docker

This lets you control the docker daemon from within a container, so you never have to leave.
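Spelled out as a full command (the image name and trailing command are just examples):

```shell
# Mount the host's Docker socket and client binary so `docker` inside
# the container talks to the host daemon.
docker run -it --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /usr/bin/docker:/usr/bin/docker \
    ubuntu:16.04 \
    docker ps
```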

Here are the errors I saw from the GPU unit tests. The last one is weird. I think it should be np.testing.assert_almost_equal to kill tiny rounding errors?

linalg.cu:148 Condition failed: "Supported dtype for array a"
linalg.cu:148 error 22: BF_STATUS_UNSUPPORTED_DTYPE
linalg.cu:243 error 22: BF_STATUS_UNSUPPORTED_DTYPE
E..................F................s.........................
======================================================================
ERROR: test_matmul_aa_ci8 (test_linalg.TestLinAlg)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspace/bifrost/test/test_linalg.py", line 75, in test_matmul_aa_ci8
    self.run_test_matmul_aa_ci8_shape((11,23))
  File "/workspace/bifrost/test/test_linalg.py", line 54, in run_test_matmul_aa_ci8_shape
    self.linalg.matmul(1, a, None, 0, c)
  File "build/bdist.linux-x86_64/egg/bifrost/linalg.py", line 59, in matmul
    beta, c_array))
  File "build/bdist.linux-x86_64/egg/bifrost/libbifrost.py", line 154, in _check
    raise RuntimeError(status_str)
RuntimeError: BF_STATUS_UNSUPPORTED_DTYPE

======================================================================
FAIL: test_repr (test_ndarray.NDArrayTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspace/bifrost/test/test_ndarray.py", line 62, in test_repr
    self.assertEqual(repr_f, repr_k)
AssertionError: '([[ 0.,  1.],\n         [ 2.,  3.],\n         [ 4.,  5.]], dtype=float32)' != '([[ 0.,  1.],\n       [ 2.,  3.],\n       [ 4.,  5.]], dtype=float32)'

----------------------------------------------------------------------
Ran 160 tests in 189.538s

FAILED (failures=1, errors=1, skipped=3)

docker:
docker build --pull -t $(IMAGE_NAME):$(LIBBIFROST_MAJOR).$(LIBBIFROST_MINOR) -f Dockerfile.gpu -t $(IMAGE_NAME) .
docker-base:
echo "FROM $(CUDA_IMAGE_NAME)" > _Dockerfile.tmp
Collaborator

These could all be .dockerfile.tmp instead of _Dockerfile.tmp; the latter looks like a Python-specific convention for private/hidden objects.

Collaborator Author

SGTM

COPY . .
RUN make clean && \
make -j16 $make_args && \
make doc && \
Collaborator

I've heard that best Docker practice is to make containers as minimal as possible (?), so I don't think the full doxygen reference should go in. It would be hard to view it efficiently without setting up X11 as well, and we have a website that holds it all anyway.

Collaborator

Oh, this make clean is probably causing the version.py error. Does that need to be in there? Is it there just in case the user already built bifrost outside the container?

Collaborator Author

Removing the make doc sounds like the right idea.

Indeed the make clean is important to avoid strange errors when the user already built the library outside. I'll look into what's going on with version.py.

We could probably also do make clean again after make install.
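Combining both suggestions, the RUN step quoted above could become (a sketch; $make_args as in the existing Dockerfile, make doc dropped):

```dockerfile
# Sketch: clean before building so host-side build artifacts never
# leak into the image, and clean again after install to keep the
# final layer small.
RUN make clean && \
    make -j16 $make_args && \
    make install && \
    make clean
```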

@@ -0,0 +1,19 @@
# Note: FROM command is inserted by Makefile
# TODO: As of Docker-CE 17.05, we could use this better approach instead:
Collaborator

We can also build from streams (moby/moby#31236) which is nice, so we won't have to create a temporary file within the build context.
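A sketch of the stream-build idea as a Makefile recipe (requires a Docker version with stdin-Dockerfile support; the :base tag is illustrative, variable names match the Makefile snippet above):

```makefile
docker-base:
	{ echo "FROM $(CUDA_IMAGE_NAME)"; cat Dockerfile.base; } \
	    | docker build -t $(IMAGE_NAME):base -f - .
```

This avoids creating a temporary _Dockerfile.tmp inside the build context.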


# Build the library
WORKDIR /bifrost
COPY . .
Collaborator

We should create a .dockerignore file with at least .git included.

Collaborator Author

I find it can be useful to have the git history inside the container, but perhaps I should be doing that with your approach of using the base container and mapping the source directory instead. In that case, this sounds like a good idea.
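A minimal .dockerignore along these lines (entries beyond .git are only suggestions):

```
.git
*.o
*.so
*.pyc
```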

@MilesCranmer
Collaborator

The specification of base images from the command line is very nice. I will have to read up on how travis tests different languages, and whether there is a similar setting that would let us run the build over different environment variables.
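Something along these lines might work (a hypothetical .travis.yml fragment; the env-matrix mechanism is standard Travis, the image names are examples):

```yaml
env:
  - CUDA_IMAGE_NAME=nvidia/cuda:7.5 IMAGE_NAME=ledatelescope/bifrost-cuda-7.5
  - CUDA_IMAGE_NAME=nvidia/cuda:8.0 IMAGE_NAME=ledatelescope/bifrost-cuda-8.0
script:
  - make docker CUDA_IMAGE_NAME=$CUDA_IMAGE_NAME IMAGE_NAME=$IMAGE_NAME
```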

@benbarsdell
Collaborator Author

Thanks Miles, fantastic review! Lots of excellent points.

Re the test failures, what CUDA version and GPU model is this running with?

@MilesCranmer
Collaborator

CUDA 8.0 on a Tesla K80 (it's an AWS p2.xlarge), default build settings (git checkout <this-pull-request> then make docker)

@MilesCranmer
Collaborator

I set up docker hub on my fork. You can pull with, e.g.,

docker pull mcranmer/bifrost:gpu-base

The default (:latest) is the cpu docker.

Here are all the images: https://hub.docker.com/r/mcranmer/bifrost/tags/
Each one is off a different branch, e.g., https://github.com/MilesCranmer/bifrost/tree/docker-gpu
We could get travis to send a POST request to the hub every time a new commit passes all tests to build every image again.

@MilesCranmer
Collaborator

We could also pull from Docker library images to eliminate redundancy in our builds, e.g., python:2.7-slim (which comes with pip/setuptools and basic build dependencies):

https://github.com/docker-library/python/blob/1ca4a57b20a2f66328e5ef72df866f701c0cd306/2.7/slim/Dockerfile

@MilesCranmer
Collaborator

Actually, I forgot that that won't work, as they need the nvidia/cuda base instead of debian:jessie.

Do you know of any NVIDIA Docker library that attempts to re-create some of the official docker-library images using an nvidia/cuda base? Quite a few of the official images are built off Debian variants, so switching the base image to nvidia/cuda while otherwise leaving the Dockerfile identical wouldn't break anything.

I made one for buildpack-deps:xenial - https://github.com/MilesCranmer/docker-cuda-buildpack, but it is obviously not official.

@MilesCranmer
Collaborator

ledatelescope now has a Docker hub: https://hub.docker.com/r/ledatelescope/bifrost/

The only image is :gpu-base, based off of the single-commit (orphaned) gpu-base branch.

@MilesCranmer
Collaborator

A new image built on top of pypy is :gpu-base-pypy, off the gpu-base-pypy branch.

@MilesCranmer
Collaborator

:gpu-pypy exists now as well. I have it delete the entire Bifrost source tree to save space, while the module remains importable:

nvidia-docker run -it --rm ledatelescope/bifrost:gpu-pypy

python is symlinked to pypy, so the default make install works without modifying the Makefiles.
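The symlink step is roughly the following (paths assumed; pypy's install location varies by base image):

```dockerfile
# Sketch: expose pypy as `python` so the default `make install`
# works without Makefile changes.
RUN ln -sf /usr/local/bin/pypy /usr/local/bin/python
```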

league added a commit that referenced this pull request Jul 20, 2023
The nvidia/cuda:10.2 image disappeared from the hub, so the GPU
versions in master were not buildable anymore.  Also, the two that
build bifrost were not yet using the configure script.

However, the full-build `.gpu` version doesn't quite work, because
nvidia-docker seems to provide access to the GPU during the run phase
but not during the build phase.  Perhaps relevant:
https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime

The `_prereq.gpu` version builds fully and is still helpful.

Also pinging PR #92 which is outdated but relevant.