New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update notebook server images + support ARM64 #7357
feat: update notebook server images + support ARM64 #7357
Conversation
/assign @kimwnasptd I am very happy to finally have this PR ready for review! If you want to test the images, I have been building them on my |
@thesuperzapper
|
@alekseyolg let's discuss your changes in a follow-up PR (you can make one once we merge this), as they don't impact the functionality of the images, and we need to get these merged so we can test them with Kubeflow 1.8. Also, my general principle is that layers aren't really that much of a concern, but clarity (and the ability for people to modify them easily, e.g. to remove |
@kimwnasptd I think this is ready to merge, I just made one small final change in the code-server image with 99c7f6a |
See the [custom images guide](#custom-images) to learn how to extend them with your own packages. | ||
```mermaid | ||
graph TD | ||
Base[<a href='https://github.com/thesuperzapper/kubeflow/tree/master/components/example-notebook-servers/base'>Base</a>] --> Jupyter[<a href='https://github.com/thesuperzapper/kubeflow/tree/master/components/example-notebook-servers/jupyter'>Jupyter</a>] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be better if we point to the kubeflow/kubeflow repository here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch, will quickly fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in e38abb7
Just want to remind any reviewers about the pre-built images from this PR that people can test with. They are linked in #7357 (comment) |
@thesuperzapper I tried running the build in my M2 and saw the following: ARCH=linux/arm64/v8 make docker-build-multi-arch
...
------------------------------------------------------------------------------
Building 'jupyter-pytorch-cuda' image for 'linux/arm64/v8'...
------------------------------------------------------------------------------
...
#7 [2/4] RUN python3 -m pip install --quiet --no-cache-dir --index-url https://download.pytorch.org/whl/cu121 torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0
#7 1.344 ERROR: Could not find a version that satisfies the requirement torch==2.1.0 (from versions: 2.0.0, 2.0.1)
#7 1.344 ERROR: No matching distribution found for torch==2.1.0
#7 ERROR: executor failed running [/bin/bash -c python3 -m pip install --quiet --no-cache-dir --index-url https://download.pytorch.org/whl/cu121 torch==${PYTORCH_VERSION} torchvision==${TORCHVISION_VERSION} torchaudio==${TORCHAUDIO_VERSION}]: exit code: 1
------
> importing cache manifest from ghcr.io/kubeflow/kubeflow/notebook-servers/build-cache:jupyter-pytorch-cuda:
------
------
> [2/4] RUN python3 -m pip install --quiet --no-cache-dir --index-url https://download.pytorch.org/whl/cu121 torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0:
#7 1.344 ERROR: Could not find a version that satisfies the requirement torch==2.1.0 (from versions: 2.0.0, 2.0.1)
#7 1.344 ERROR: No matching distribution found for torch==2.1.0
------
ERROR: failed to solve: executor failed running [/bin/bash -c python3 -m pip install --quiet --no-cache-dir --index-url https://download.pytorch.org/whl/cu121 torch==${PYTORCH_VERSION} torchvision==${TORCHVISION_VERSION} torchaudio==${TORCHAUDIO_VERSION}]: exit code: 1
make[1]: *** [../common.mk:88: docker-build-multi-arch] Error 1
make[1]: Leaving directory '/home/ubuntu/Code/git/kubeflow/components/example-notebook-servers/jupyter-pytorch-cuda'
make: *** [Makefile:41: docker-build-multi-arch--jupyter-pytorch-cuda] Error 2 |
needs: [ base_images ] | ||
secrets: inherit | ||
with: | ||
build_arch: linux/amd64,linux/arm64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we instead use linux/arm64/v8
here for M2 architecture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the implied default, and I don't know why we are explicitly specifying it as v8 in the other ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm interesting, I remember when I had checked this wasn't the default. But am also not 100% sure if I had seen this in the docker official docs or someplace else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kimwnasptd Either way, I have never had an issue with images that are built for linux/arm64
, and I actually think that v7
is 32bit.
needs: [ base_images ] | ||
secrets: inherit | ||
with: | ||
build_arch: linux/amd64,linux/arm64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you also tried this build in GH runners? Am afraid that with trying to build both architectures at the same time we'll exhaust the resources, for which we ended up building serially in the other workflows
https://github.com/kubeflow/kubeflow/blob/master/.github/workflows/poddefaults_docker_publish.yaml#L46-L48
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kimwnasptd yep, the workflows are designed very carefully to not exceed resource limits, and even when the build cache is empty, they still run successfully.
See the most recent few runs on my thesuperzapper/kubeflow
repo: https://github.com/thesuperzapper/kubeflow/actions
@kimwnasptd The CUDA images don't support ARM (and in the CI/CD workflow they are only built for X86) Also, in an unrelated note, most of the time you will want to use |
The changes look good and also tried to run a couple of notebooks. @thesuperzapper this is solid work! Exciting to see those images being simplified and also have support for ARM. As mentioned in the CM I didn't take a full blown view due to time constraints. Let's keep an eye on any user feedback in case any issues occur and I'll try to help with any review necessary. /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kimwnasptd The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@kimwnasptd There is still a small chance that the first build fails, because I was not testing with pushing to DockerHub, but I will quickly follow up if anything goes wrong. (It might take a while for the first build, so let it finish before cherry picking) |
* feat: update example notebook servers * docs: update example notebook servers readme * feat: update code-server notebook image start args * docs: update links to use kubeflow/kubeflow repo
* feat: update example notebook servers * docs: update example notebook servers readme * feat: update code-server notebook image start args * docs: update links to use kubeflow/kubeflow repo Co-authored-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
This PR significantly updates and improves all our example notebook server images.
Key changes
make docker-build-dep
to build with all base images that that image depends on.ghcr.io/kubeflow/kubeflow/notebook-servers/build-cache
image to store caches, which should significantly speed up builds.2.13.0
by default.11.8
in the TensorFlow CUDA images.2.1.0
by default.12.1
in the PyTorch CUDA images.4.0.7
by default.jupyterlab-git
plugin, as it does not yet support JupyterLab 4.0, but hopefully they will release an update in the coming days.3.11.6
by default.I have tested the images in real-world use cases for Tensorflow and PyTorch (including on GPUs), but we will need to get more feedback after we release Kubeflow 1.8 with these images.
Other Notes
components/example-notebook-servers/
folder, will trigger a build of all notebook servers (which should be fast, because of the caching, unless the PR changes the base images).base
,jupyter
, etc), this PR changed that, and now all images are always pushed.example-notebook-servers/jupyter-pytorch-cuda
example-notebook-servers/jupyter-pytorch-cuda-full
example-notebook-servers/jupyter-tensorflow-cuda
example-notebook-servers/jupyter-tensorflow-cuda-full
Next steps
kubeflow.org
to reflect the same changes made in theREADME
of theexample-notebook-servers/
folder