cuda-nvcc missing again #438

Open
dhruvbalwada opened this issue Feb 1, 2023 · 12 comments · May be fixed by #549
Comments

@dhruvbalwada
Member

It seems that the problem detected and solved in issue #387 has resurfaced. I think this happened after #435 was merged.

The problem:

A ptxas-based error shows up, which can easily be reproduced with:

from jax import random
random.PRNGKey(0)

which gives the error:

2023-02-01 19:08:39.849007: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:85] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2023-02-01 19:08:39.849939: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:454] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas'  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Aborted

During the last discussion, @ngam had asked me to check which version of cuda-nvcc existed. When I check this with:

conda list | grep cuda-nvcc

This returns nothing, showing that there is no cuda-nvcc in the TensorFlow/JAX-based ml-notebook.

Installing cuda-nvcc with mamba install cuda-nvcc==11.6.* -c nvidia solves the problem.

However, it would be good if the user did not have to do this installation manually and the Docker image were set up properly.
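
For anyone hitting this, a quick way to confirm from inside Python whether ptxas is visible at all is the minimal sketch below (it only checks the PATH, which is one of the places XLA searches):

# Sketch: report where (or whether) the ptxas binary is found on the PATH.
import shutil
print("ptxas found at:", shutil.which("ptxas"))  # None means it is missing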

@scottyhq
Member

scottyhq commented Feb 3, 2023

@dhruvbalwada I thought it was removed intentionally b/c it was no longer needed? See the conversation in #398 ...

@dhruvbalwada
Member Author

dhruvbalwada commented Feb 3, 2023

Maybe @yuvipanda or @ngam or @weiji14 can chip in about why the problem has resurfaced?

@ngam
Contributor

ngam commented Feb 4, 2023

It’s a complicated issue with all sorts of moving parts. I think for now the best thing is to keep it out and let the user find a resolution. This is generally a tricky problem, and mismatches are bound to happen.

The good news is that cuda-nvcc is coming to conda-forge soon; the bad news is that it’ll be a while before the lengthy migration effort concludes.

Xref:

@ngam
Contributor

ngam commented Feb 4, 2023

Btw, thanks @dhruvbalwada for keeping an eye on this, and for the detailed report :)

@ngam
Contributor

ngam commented May 15, 2023

Small update: This is finally getting resolved... hopefully very soon! xref #450

@weiji14
Member

weiji14 commented Jun 27, 2023

Looks like cuda-nvcc is now on conda-forge - https://github.com/conda-forge/cuda-nvcc-feedstock. Is it better to install it directly in the ml-notebook image, or wait for ML libraries like TensorFlow/JAX to depend on cuda-nvcc directly first? I see some mention of it e.g. at conda-forge/tensorflow-feedstock#296 (comment).

@ngam
Contributor

ngam commented Jul 5, 2023

We should likely wait. I am still trying to assess how best to migrate JAX and TensorFlow to the new packaging format. We're in a bit of a bind here... with volunteer maintainers occupied with other tasks... but tensorflow 2.12 is very close and I am making small progress on jaxlib.

@weiji14
Member

weiji14 commented Sep 14, 2023

Someone reported on the forum at https://discourse.pangeo.io/t/how-to-run-code-using-gpu-on-pangeo-saying-libdevice-not-found-at-libdevice-10-bc/3672 that missing cuda-nvcc and XLA_FLAGS are causing issues. Can we revisit adding cuda-nvcc to the docker image, if the matter is resolved on conda-forge @ngam? @yuvipanda mentioned that 2i2c doesn't use the old K80 GPUs anymore, so we don't need to worry about backward compatibility if that helps.
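
For context, the usual user-side workaround for that libdevice error is to point XLA at a directory that contains the CUDA toolkit via XLA_FLAGS. A hedged sketch, assuming the notebook environment lives at /srv/conda/envs/notebook (substitute whatever prefix actually holds cuda-nvcc):

# Sketch: tell XLA where to find libdevice/ptxas before JAX initializes.
import os
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/srv/conda/envs/notebook"
import jax  # import after setting the flag so XLA picks it up
print(jax.devices())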

@weiji14
Member

weiji14 commented May 21, 2024

Quick note to say that jaxlib-0.4.23-cuda120py* actually has an explicit runtime dependency on cuda-nvcc now (see conda-forge/jaxlib-feedstock#241), but we'll need some more updates on tensorflow to resolve an incompatibility with libabseil versions. See #549 (comment), and keep an eye on conda-forge/tensorflow-feedstock#385.

Once those PRs are merged, users shouldn't have to install cuda-nvcc manually anymore, as it should be pulled in directly with jaxlib.

@benz0li

benz0li commented Jun 4, 2024

@dhruvbalwada Try my/b-data's CUDA-enabled JupyterLab Python docker stack

On the host

ℹ️ NVIDIA Driver v555.42.02 required

docker run --gpus all --rm -ti glcr.b-data.ch/jupyterlab/cuda/python/base bash

==========
== CUDA ==
==========

CUDA Version 12.5.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

=============
== JUPYTER ==
=============

Entered start.sh with args: bash
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/start-notebook.d/10-populate.sh
Done running hooks in: /usr/local/bin/start-notebook.d
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10-env.sh
Sourcing shell script: /usr/local/bin/before-notebook.d/11-home.sh
Sourcing shell script: /usr/local/bin/before-notebook.d/30-code-server.sh
Sourcing shell script: /usr/local/bin/before-notebook.d/90-limits.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: bash

In the container

pip install "jax[cuda12]" jaxlib
Defaulting to user installation because normal site-packages is not writeable
Collecting jaxlib
  Downloading jaxlib-0.4.28-cp312-cp312-manylinux2014_x86_64.whl.metadata (1.8 kB)
Collecting jax[cuda12]
  Downloading jax-0.4.28-py3-none-any.whl.metadata (23 kB)
Collecting ml-dtypes>=0.2.0 (from jax[cuda12])
  Downloading ml_dtypes-0.4.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Collecting numpy>=1.22 (from jax[cuda12])
  Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 934.8 kB/s eta 0:00:00
Collecting opt-einsum (from jax[cuda12])
  Downloading opt_einsum-3.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting scipy>=1.9 (from jax[cuda12])
  Downloading scipy-1.13.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.6/60.6 kB 6.6 MB/s eta 0:00:00
Collecting jax-cuda12-plugin==0.4.28 (from jax[cuda12])
  Downloading jax_cuda12_plugin-0.4.28-cp312-cp312-manylinux2014_x86_64.whl.metadata (560 bytes)
Collecting nvidia-cublas-cu12>=12.1.3.1 (from jax[cuda12])
  Downloading nvidia_cublas_cu12-12.5.2.13-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12>=12.1.105 (from jax[cuda12])
  Downloading nvidia_cuda_cupti_cu12-12.5.39-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cuda-nvcc-cu12>=12.1.105 (from jax[cuda12])
  Downloading nvidia_cuda_nvcc_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12>=12.1.105 (from jax[cuda12])
  Downloading nvidia_cuda_runtime_cu12-12.5.39-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cudnn-cu12<9.0,>=8.9.2.26 (from jax[cuda12])
  Downloading nvidia_cudnn_cu12-8.9.7.29-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cufft-cu12>=11.0.2.54 (from jax[cuda12])
  Downloading nvidia_cufft_cu12-11.2.3.18-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12>=11.4.5.107 (from jax[cuda12])
  Downloading nvidia_cusolver_cu12-11.6.2.40-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparse-cu12>=12.1.0.106 (from jax[cuda12])
  Downloading nvidia_cusparse_cu12-12.4.1.24-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-nccl-cu12>=2.18.1 (from jax[cuda12])
  Downloading nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl.metadata (1.8 kB)
Collecting nvidia-nvjitlink-cu12>=12.1.105 (from jax[cuda12])
  Downloading nvidia_nvjitlink_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting jax-cuda12-pjrt==0.4.28 (from jax-cuda12-plugin==0.4.28->jax[cuda12])
  Downloading jax_cuda12_pjrt-0.4.28-py3-none-manylinux2014_x86_64.whl.metadata (349 bytes)
Collecting nvidia-cuda-nvrtc-cu12 (from nvidia-cudnn-cu12<9.0,>=8.9.2.26->jax[cuda12])
  Downloading nvidia_cuda_nvrtc_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Downloading jaxlib-0.4.28-cp312-cp312-manylinux2014_x86_64.whl (77.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.6/77.6 MB 14.7 MB/s eta 0:00:00
Downloading jax_cuda12_plugin-0.4.28-cp312-cp312-manylinux2014_x86_64.whl (12.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.4/12.4 MB 47.2 MB/s eta 0:00:00
Downloading jax_cuda12_pjrt-0.4.28-py3-none-manylinux2014_x86_64.whl (86.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.8/86.8 MB 10.4 MB/s eta 0:00:00
Downloading ml_dtypes-0.4.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 39.8 MB/s eta 0:00:00
Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.0/18.0 MB 34.2 MB/s eta 0:00:00
Downloading nvidia_cublas_cu12-12.5.2.13-py3-none-manylinux2014_x86_64.whl (363.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 363.3/363.3 MB 3.8 MB/s eta 0:00:00
Downloading nvidia_cuda_cupti_cu12-12.5.39-py3-none-manylinux2014_x86_64.whl (13.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.8/13.8 MB 3.7 MB/s eta 0:00:00
Downloading nvidia_cuda_nvcc_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl (22.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 22.5/22.5 MB 2.4 MB/s eta 0:00:00
Downloading nvidia_cuda_runtime_cu12-12.5.39-py3-none-manylinux2014_x86_64.whl (895 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 895.1/895.1 kB 6.6 MB/s eta 0:00:00
Downloading nvidia_cudnn_cu12-8.9.7.29-py3-none-manylinux1_x86_64.whl (704.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 704.7/704.7 MB 2.4 MB/s eta 0:00:00
Downloading nvidia_cufft_cu12-11.2.3.18-py3-none-manylinux2014_x86_64.whl (192.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 192.5/192.5 MB 1.8 MB/s eta 0:00:00
Downloading nvidia_cusolver_cu12-11.6.2.40-py3-none-manylinux2014_x86_64.whl (130.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 130.3/130.3 MB 934.7 kB/s eta 0:00:00
Downloading nvidia_cusparse_cu12-12.4.1.24-py3-none-manylinux2014_x86_64.whl (209.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.2/209.2 MB 5.8 MB/s eta 0:00:00
Downloading nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl (188.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 188.7/188.7 MB 9.4 MB/s eta 0:00:00
Downloading nvidia_nvjitlink_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl (21.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.3/21.3 MB 30.6 MB/s eta 0:00:00
Downloading scipy-1.13.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.2/38.2 MB 22.0 MB/s eta 0:00:00
Downloading jax-0.4.28-py3-none-any.whl (1.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 24.7 MB/s eta 0:00:00
Downloading opt_einsum-3.3.0-py3-none-any.whl (65 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.5/65.5 kB 7.4 MB/s eta 0:00:00
Downloading nvidia_cuda_nvrtc_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl (24.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.9/24.9 MB 23.6 MB/s eta 0:00:00
Installing collected packages: jax-cuda12-pjrt, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-nvcc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, jax-cuda12-plugin, scipy, opt-einsum, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, ml-dtypes, nvidia-cusolver-cu12, jaxlib, jax
Successfully installed jax-0.4.28 jax-cuda12-pjrt-0.4.28 jax-cuda12-plugin-0.4.28 jaxlib-0.4.28 ml-dtypes-0.4.0 numpy-1.26.4 nvidia-cublas-cu12-12.5.2.13 nvidia-cuda-cupti-cu12-12.5.39 nvidia-cuda-nvcc-cu12-12.5.40 nvidia-cuda-nvrtc-cu12-12.5.40 nvidia-cuda-runtime-cu12-12.5.39 nvidia-cudnn-cu12-8.9.7.29 nvidia-cufft-cu12-11.2.3.18 nvidia-cusolver-cu12-11.6.2.40 nvidia-cusparse-cu12-12.4.1.24 nvidia-nccl-cu12-2.21.5 nvidia-nvjitlink-cu12-12.5.40 opt-einsum-3.3.0 scipy-1.13.1
python
Python 3.12.3 (main, Apr  9 2024, 18:09:17) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jax
>>> jax.random.PRNGKey(0)
Array([0, 0], dtype=uint32)
>>> jax.devices()
[cuda(id=0)]
>>> 

What makes my/b-data's images different:

  1. Multi-arch: linux/amd64, linux/arm64/v8
  2. Derived from nvidia/cuda:12.5.0-devel-ubuntu22.04
    • including development libraries and headers
  3. TensorRT and TensorRT plugin libraries
    • including development libraries and headers
  4. IDE: code-server next to JupyterLab
  5. Just Python – no Conda / Mamba

@benz0li

benz0li commented Jun 4, 2024

@dhruvbalwada Because jax[cuda12] brings its own CUDA libraries, you could also use

docker run --gpus all --rm -ti glcr.b-data.ch/jupyterlab/python/base bash

which does not have a CUDA Toolkit pre-installed and is therefore much smaller.
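
As a side note, this works because the pip wheels ship the CUDA binaries (including ptxas from nvidia-cuda-nvcc-cu12) under site-packages rather than relying on a system CUDA Toolkit. A small sketch to locate them, assuming the nvidia/*/bin layout used by the nvidia-*-cu12 wheels:

# Sketch: find CUDA binaries installed by the pip wheels under site-packages.
import pathlib
import site
for base in site.getsitepackages() + [site.getusersitepackages()]:
    for p in pathlib.Path(base).glob("nvidia/*/bin/ptxas"):
        print(p)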

@benz0li

benz0li commented Jun 4, 2024

Final note: Using pip, the above also works with the official python:3.12 image.
