libGL error: MESA-LOADER: failed to retrieve device information #257

Open
woensug-choi opened this issue Nov 29, 2023 · 5 comments

@woensug-choi
Contributor

woensug-choi commented Nov 29, 2023

On a freshly installed Ubuntu 22.04 Jammy LTS system, without doing anything else, I installed rocker with

pip3 install rocker
pip3 install --force-reinstall git+https://github.com/osrf/rocker.git@main
rocker --version
# rocker 0.2.12

and ran the example from the README

rocker --nvidia --x11 osrf/ros:noetic-desktop-full gazebo

and got an error saying

libGL error: MESA-LOADER: failed to retrieve device information

I was able to fix the problem by adding --volume /dev:/dev to the rocker arguments, which adds -v /dev:/dev to the docker arguments.

rocker --volume /dev:/dev --nvidia --x11 osrf/ros:noetic-desktop-full gazebo

Related articles
#206
kinu-garage/hut_10sqft#819

@tfoote
Collaborator

tfoote commented Nov 29, 2023

On your fresh install, do you have the NVIDIA drivers installed? And you should make sure that you've installed and set up nvidia-docker, or now the NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
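If you want to double-check both, something like the following should work (a rough sketch; the ubuntu:22.04 image is just an example, since with a working toolkit setup the NVIDIA runtime injects the driver libraries and nvidia-smi into the container):

# driver check on the host
nvidia-smi
# Container Toolkit check inside a container
docker run --rm --gpus all ubuntu:22.04 nvidia-smi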

@woensug-choi
Contributor Author

Yes, I had the toolkit installed.

@tfoote
Collaborator

tfoote commented Mar 1, 2024

As far as I can tell your device isn't being mounted correctly. Your solution of mounting all of /dev tells me that the device is available; it's just a matter of understanding what your device is and making sure to mount it. Mounting all devices is too broad a brush. There's more specific feedback in #258, which I closed as too broad a solution, but with a more targeted fix we could add a solution.
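For instance, a more targeted invocation might mount only the NVIDIA device nodes instead of all of /dev (a hypothetical sketch; the device names vary per machine, and this assumes your rocker version provides the --devices option):

rocker --devices /dev/nvidia0 /dev/nvidiactl --nvidia --x11 osrf/ros:noetic-desktop-full gazebo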

@noah-curran

I'll preface this by saying that I'm a Docker noob, so it's entirely possible I'm doing something wrong... but I've been having this issue with my fresh install as well. I've spent several hours over the past 5 days on this, so I've ruled out many of the common points of advice and came here to discover this thread.

Similar to @woensug-choi, I'm using Ubuntu 22.04 and NVIDIA driver version 535. I have an RTX 4070 and understand that 535 is not a tested driver version, but my GPU does not support the maximum driver version tested, 470.

Before coming to rocker, I had been experimenting with mounting individual devices in /dev instead of mounting the whole directory as @woensug-choi suggested in his solution. Doing this while booting up some example containers to inspect the issue, I noticed two things: (1) I need to add --device /dev/nvidiactl. I'm uncertain why, because before doing this a simple ls in /dev shows that the device is already present... but without this step I get the notorious Failed to initialize NVML: Unknown Error if I try running nvidia-smi. After adding nvidiactl, I get No devices were found, but at least nvidia-smi runs. This leads to (2): I need to add --device /dev/nvidia0:/dev/nvidia0 to the docker line. After this, it works as expected.
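For reference, the plain docker invocation I ended up with looked roughly like this (a sketch of my experiment, not a general fix; /dev/nvidia0 will differ on machines with more than one GPU):

docker run --rm -it --gpus all \
  --device /dev/nvidiactl \
  --device /dev/nvidia0:/dev/nvidia0 \
  osrf/ros:noetic-desktop-full \
  nvidia-smi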

My solution is a bit less general than @woensug-choi's, since it directly resolves the pain points I discovered, but I don't think it's quite where it needs to be to merge into rocker, since I imagine it will fail for users who have more than one GPU. Maybe this info will help guide this issue.

FWIW I do not believe this is a rocker-specific issue. It appears to be either a docker issue or an nvidia-docker issue. I think --gpus all is what should make all of this work, but for whatever reason it has just led to broken mounts. I have yet to dive deeper into the code of docker to understand what that flag is actually doing, so I can't comment on it further beyond having a hunch that it is the root of the issue.

# nvidia_extension.py
# ...
class X11(RockerExtension):
    @staticmethod
    def get_name():
        return 'x11'

    def __init__(self):
        self.name = X11.get_name()
        self._env_subs = None
        self._xauth = None

    def get_docker_args(self, cliargs):
        assert self._xauth, 'xauth not initialized, get_docker_args must be called after precondition_environment'
        xauth = self._xauth.name
        # The two --device lines below are where I have my changes.
        return "  -e DISPLAY -e TERM \
  -e QT_X11_NO_MITSHM=1 \
  -e XAUTHORITY=%(xauth)s -v %(xauth)s:%(xauth)s \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  --device /dev/nvidiactl \
  --device /dev/nvidia0:/dev/nvidia0 \
  -v /etc/localtime:/etc/localtime:ro " % locals()
# ...

@tfoote
Collaborator

tfoote commented May 13, 2024

Thanks for the extra info and debugging. That sounds parallel to the need for /dev/dri/card0 with Intel integrated graphics. It seems like different cards/drivers for NVIDIA may need different devices mounted.
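A quick way to see which device nodes a given machine actually exposes (and therefore which ones a more targeted fix would need to mount) is plain shell, nothing rocker-specific:

ls -l /dev/dri/ /dev/nvidia*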
