cuda toolkit not installed for user #198

Open
81reap opened this issue Feb 23, 2024 · 0 comments

Steps To Recreate

  1. Perform a clean install of bazzite-nvidia.
  2. Log in as the user.
  3. Check for cuda by running nvcc --version. It will fail to find the command.
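A minimal sketch of the check in step 3, assuming bash. The helper names `check_nvcc` and `list_cuda_dirs` are hypothetical and not part of bazzite or the CUDA toolkit. The second helper looks for toolkit directories under `/usr/local`, which is where the runfile installer places them by default.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report whether nvcc is reachable through PATH.
check_nvcc() {
    if command -v nvcc >/dev/null 2>&1; then
        echo "nvcc found: $(command -v nvcc)"
        return 0
    fi
    echo "nvcc not found on PATH"
    return 1
}

# Hypothetical helper: print any cuda* directories under a prefix
# (defaults to /usr/local). Prints nothing if none exist.
list_cuda_dirs() {
    local prefix="${1:-/usr/local}"
    local dir
    for dir in "$prefix"/cuda*; do
        [ -d "$dir" ] && echo "$dir"
    done
    return 0
}

check_nvcc || true
list_cuda_dirs
```

On an affected system this should print the "not found" message and no directories. If a directory is listed but nvcc is still not found, the toolkit is present and only the PATH entry is missing.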

Expected Behavior

The installed package list and nvidia-smi suggest that cuda and the cuda toolkit are installed, so nvcc --version should work. Instead, the command is not found.

reap@fedora:~$ nvidia-smi
Thu Feb 22 18:58:12 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 4000 SFF Ada ...    Off | 00000000:01:00.0 Off |                  Off |
| 30%   33C    P8               5W /  70W |      2MiB / 20475MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

reap@fedora:~$ rpm -qa | grep nvidia
nvidia-gpu-firmware-20240115-2.fc39.noarch
ublue-os-nvidia-addons-0.10-1.fc39.noarch
xorg-x11-drv-nvidia-cuda-libs-545.29.06-2.fc39.x86_64
nvidia-modprobe-545.29.06-1.fc39.x86_64
nvidia-persistenced-545.29.06-1.fc39.x86_64
nvidia-container-toolkit-base-1.14.5-1.x86_64
libnvidia-container1-1.14.5-1.x86_64
libnvidia-container-tools-1.14.5-1.x86_64
nvidia-container-toolkit-1.14.5-1.x86_64
xorg-x11-drv-nvidia-kmodsrc-545.29.06-2.fc39.x86_64
libva-nvidia-driver-0.0.11-1.fc39.x86_64
xorg-x11-drv-nvidia-libs-545.29.06-2.fc39.i686
xorg-x11-drv-nvidia-libs-545.29.06-2.fc39.x86_64
nvidia-settings-545.29.06-1.fc39.x86_64
xorg-x11-drv-nvidia-power-545.29.06-2.fc39.x86_64
kmod-nvidia-6.7.5-201.fsync.fc39.x86_64-545.29.06-3.fc39.x86_64
xorg-x11-drv-nvidia-545.29.06-2.fc39.x86_64
xorg-x11-drv-nvidia-cuda-libs-545.29.06-2.fc39.i686
xorg-x11-drv-nvidia-cuda-545.29.06-2.fc39.x86_64
xorg-x11-drv-nvidia-devel-545.29.06-2.fc39.x86_64

reap@fedora:~$ nvcc --version
# only works after the workaround

Hardware

B550I Aorus Pro AX
AMD Ryzen 7 5700G
Nvidia RTX 4000 SFF Ada Gen
2x32GB @ 3200 MHz
2TB NVMe Drive

Setup Notes

  • Secure Boot is disabled in the BIOS.
  • The OS and KDE run on the AMD GPU. Steam games launch successfully on the Nvidia GPU.
  • After applying the workaround, PyTorch also runs successfully on the Nvidia GPU.

The Workaround

Note: The workaround does not fix the issue for podman containers running with CDI. Any workloads that require cuda will have to be run in userspace on the host.

$ nvidia-smi
# this shows the correct output and reports CUDA Version: 12.3, which is the highest cuda version the driver supports
$ nvcc --version
# this should fail to find nvcc
$ ls /usr/local
# this output does not contain cuda, which confirms that the cuda toolkit is not installed

$ wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_545.23.08_linux.run
$ sudo sh cuda_12.3.2_545.23.08_linux.run
# this will require you to accept the licence first. You should only be installing the cuda toolkit, not the bundled driver, as the system already has nvidia drivers.
$ ls /usr/local
# now we have the cuda toolkit, but nvcc will still fail as it is not on your path

# add these two export lines to your ~/.bashrc so that they are set in every new shell
$ export PATH=/usr/local/cuda-12.3/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
$ nvcc --version
# nvcc now works
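The two export lines can also be wrapped so that they are safe to source repeatedly. This is a sketch for `~/.bashrc`, assuming bash and the default runfile install location `/usr/local/cuda-12.3`. The function name `add_cuda_to_env` is hypothetical.

```shell
# Hypothetical helper for ~/.bashrc: put a CUDA toolkit on PATH and
# LD_LIBRARY_PATH only if it is installed, without adding duplicates.
add_cuda_to_env() {
    local cuda_dir="${1:-/usr/local/cuda-12.3}"

    # Toolkit absent: leave the environment untouched.
    [ -d "$cuda_dir/bin" ] || return 0

    case ":$PATH:" in
        *":$cuda_dir/bin:"*) ;;  # already present
        *) export PATH="$cuda_dir/bin${PATH:+:${PATH}}" ;;
    esac

    case ":${LD_LIBRARY_PATH:-}:" in
        *":$cuda_dir/lib64:"*) ;;  # already present
        *) export LD_LIBRARY_PATH="$cuda_dir/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
    esac
}

add_cuda_to_env /usr/local/cuda-12.3
```

The directory check means the same `~/.bashrc` stays harmless on a machine where the toolkit was never installed, and the `case` guards keep PATH from growing each time a nested shell starts.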

Related Issues
