
rviz2 RenderSystem errors inside the autoware-container on ubuntu 20.04 on nvidia GPU #206

Open
AndriiChumak145 opened this issue Nov 9, 2022 · 7 comments


@AndriiChumak145

Environment:

  • Ubuntu 20.04, x86-64
  • NVIDIA GPU with driver version 520.61.05, CUDA version 11.8
  • rocker 0.2.10

I ran an Autoware container as described in their docs:
rocker --nvidia --x11 --user --volume $HOME/autoware --volume $HOME/autoware_map -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda
When I then tried to run rviz2, I got the following output:

[ERROR] [1667991076.248236751] [rviz2]: Unable to create the rendering window after 100 tries
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unable to create the rendering window after 100 tries
Aborted (core dumped)

After that I tried to run your example:
rocker --nvidia --x11 osrf/ros:crystal-desktop rviz2
and the rviz window was displayed. However, I got the following libGL errors, and as the output suggests, it was trying to load Intel drivers instead of NVIDIA:

QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: Version 4 or later of flush extension not found
libGL error: failed to load driver: i915
libGL error: failed to open drm device: No such file or directory
libGL error: failed to load driver: iris
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: Version 4 or later of flush extension not found
libGL error: failed to load driver: i915
libGL error: failed to open drm device: No such file or directory
libGL error: failed to load driver: iris
[INFO] [rviz2]: Stereo is NOT SUPPORTED
[INFO] [rviz2]: OpenGl version: 3.1 (GLSL 1.4)
[INFO] [rviz2]: Stereo is NOT SUPPORTED

I also managed to run rviz2 without errors when omitting --nvidia, but the performance was unsatisfactory.
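
For reference, a quick way to see which GL stack the container actually picks up (assuming glxinfo from the mesa-utils package is installed or installable in the image) would be something like:

# Run inside the container started by rocker
sudo apt-get update && sudo apt-get install -y mesa-utils   # provides glxinfo (may already be present)
glxinfo | grep -iE "opengl (vendor|renderer|version)"
# An NVIDIA vendor/renderer string means hardware acceleration is active;
# a Mesa/llvmpipe renderer means it fell back to software or the Intel path.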

@tfoote
Collaborator

tfoote commented Nov 11, 2022

I cannot reproduce your issue. Could there be something in your environment or your mounted volumes selecting the Intel driver?

Here's my attempt; I skipped the volumes since I don't have those set up.

❯ rocker --nvidia --x11 --user -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda
Extension volume doesn't support default arguments. Please extend it.
Active extensions ['nvidia', 'x11', 'user']
Step 1/12 : FROM python:3-slim-stretch as detector
 ---> 7691d3cb6cbc
Step 2/12 : RUN mkdir -p /tmp/distrovenv
 ---> Using cache
 ---> 3d5c8b8d9105
Step 3/12 : RUN python3 -m venv /tmp/distrovenv
 ---> Using cache
 ---> cd138b5d6c5c
Step 4/12 : RUN apt-get update && apt-get install -qy patchelf binutils
 ---> Using cache
 ---> 4cbc7f2267e0
Step 5/12 : RUN . /tmp/distrovenv/bin/activate && pip install distro pyinstaller==4.0 staticx==0.12.3
 ---> Using cache
 ---> 6a00c185aa67
Step 6/12 : RUN echo 'import distro; import sys; output = (distro.name(), distro.version(), distro.codename()); print(output) if distro.name() else sys.exit(1)' > /tmp/distrovenv/detect_os.py
 ---> Using cache
 ---> c6bca879a236
Step 7/12 : RUN . /tmp/distrovenv/bin/activate && pyinstaller --onefile /tmp/distrovenv/detect_os.py
 ---> Using cache
 ---> 69c32080cef1
Step 8/12 : RUN . /tmp/distrovenv/bin/activate && staticx /dist/detect_os /dist/detect_os_static && chmod go+xr /dist/detect_os_static
 ---> Using cache
 ---> 632c0e2ea327
Step 9/12 : FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda
 ---> 6e0960405f3b
Step 10/12 : COPY --from=detector /dist/detect_os_static /tmp/detect_os
 ---> 11d239ef6fe2
Step 11/12 : ENTRYPOINT [ "/tmp/detect_os" ]
 ---> Running in 11a27539889d
Removing intermediate container 11a27539889d
 ---> 63af9129a7c9
Step 12/12 : CMD [ "" ]
 ---> Running in d9d07f4a8393
Removing intermediate container d9d07f4a8393
 ---> 0b8084b7ba90
Successfully built 0b8084b7ba90
Successfully tagged rocker:os_detect_ghcr.io_autowarefoundation_autoware-universe_latest-cuda
running,  docker run -it --rm 0b8084b7ba90
output:  ('Ubuntu', '20.04', 'focal')

Writing dockerfile to /tmp/tmpq2ypdyll/Dockerfile
vvvvvv
# Preamble from extension [nvidia]
# Ubuntu 16.04 with nvidia-docker2 beta opengl support
FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd

# Preamble from extension [x11]

# Preamble from extension [user]


FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda
USER root
# Snippet from extension [nvidia]
RUN apt-get update && apt-get install -y --no-install-recommends \
    libglvnd0 \
    libgl1 \
    libglx0 \
    libegl1 \
    libgles2 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json




ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all}

# Snippet from extension [x11]

# Snippet from extension [user]
# make sure sudo is installed to be able to give user sudo access in docker
RUN if ! command -v sudo >/dev/null; then \
      apt-get update \
      && apt-get install -y sudo \
      && apt-get clean; \
    fi

RUN existing_user_by_uid=`getent passwd "1000" | cut -f1 -d: || true` && \
    if [ -n "${existing_user_by_uid}" ]; then userdel -r "${existing_user_by_uid}"; fi && \
    existing_user_by_name=`getent passwd "tfoote" | cut -f1 -d: || true` && \
    existing_user_uid=`getent passwd "tfoote" | cut -f3 -d: || true` && \
    if [ -n "${existing_user_by_name}" ]; then find / -uid ${existing_user_uid} -exec chown -h 1000 {} + || true ; find / -gid ${existing_user_uid} -exec chgrp -h 1000 {} + || true ; fi && \
    if [ -n "${existing_user_by_name}" ]; then userdel -r "${existing_user_by_name}"; fi && \
    existing_group_by_gid=`getent group "1000" | cut -f1 -d: || true` && \
    if [ -z "${existing_group_by_gid}" ]; then \
      groupadd -g "1000" "tfoote"; \
    fi && \
    useradd --no-log-init --no-create-home --uid "1000" -s /bin/bash -c "Tully Foote,,," -g "1000" -d "/home/tfoote" "tfoote" && \
    echo "tfoote ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker

# Making sure a home directory exists if we haven't mounted the user's home directory explicitly
RUN mkdir -p "$(dirname "/home/tfoote")" && mkhomedir_helper tfoote
# Commands below run as the developer user
USER tfoote
WORKDIR /home/tfoote


^^^^^^
Building docker file with arguments:  {'path': '/tmp/tmpq2ypdyll', 'rm': True, 'nocache': False, 'pull': False}
building > Step 1/12 : FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd
building >  ---> 333290bd2e04
building > Step 2/12 : FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda
building >  ---> 6e0960405f3b
building > Step 3/12 : USER root
building >  ---> Running in e54bdce7a314
building > Removing intermediate container e54bdce7a314
building >  ---> 1bc9d42f8b04
building > Step 4/12 : RUN apt-get update && apt-get install -y --no-install-recommends     libglvnd0     libgl1     libglx0     libegl1     libgles2     && rm -rf /var/lib/apt/lists/*
building >  ---> Running in f5cb09046247
building > Get:1 http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal InRelease [17.5 kB]
building > Get:2 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
building > Get:3 http://packages.ros.org/ros2/ubuntu focal InRelease [4685 B]
building > Get:4 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
building > Ign:5 https://s3.amazonaws.com/autonomoustuff-repo focal InRelease
building > Get:6 https://s3.amazonaws.com/autonomoustuff-repo focal Release [2922 B]
building > Ign:7 https://s3.amazonaws.com/autonomoustuff-repo focal Release.gpg
building > Get:8 http://packages.ros.org/ros2/ubuntu focal/main amd64 Packages [1152 kB]
building > Get:9 https://s3.amazonaws.com/autonomoustuff-repo focal/main amd64 Packages [4286 B]
building > Get:10 http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal/main amd64 Packages [4544 B]
building > Get:11 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2269 kB]
building > Get:12 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
building > Get:13 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
building > Get:14 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
building > Get:15 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
building > Get:16 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
building > Get:17 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [931 kB]
building > Get:18 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.5 kB]
building > Get:19 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1661 kB]
building > Get:20 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
building > Get:21 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2738 kB]
building > Get:22 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.2 kB]
building > Get:23 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1229 kB]
building > Get:24 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1778 kB]
building > Get:25 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB]
building > Get:26 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [27.5 kB]
building > Fetched 25.4 MB in 19s (1368 kB/s)
Reading package lists...
building > Reading package lists...
building > Building dependency tree...
building > 
Reading state information...
building > libegl1 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libegl1 set to manually installed.
libgl1 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libgl1 set to manually installed.
libgles2 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libgles2 set to manually installed.
libglvnd0 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libglvnd0 set to manually installed.
libglx0 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libglx0 set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
building > Removing intermediate container f5cb09046247
building >  ---> 581a73c3f7c8
building > Step 5/12 : COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
building >  ---> faeb9301a9b5
building > Step 6/12 : ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
building >  ---> Running in 57348d841e1c
building > Removing intermediate container 57348d841e1c
building >  ---> 6fbe6e8a269d
building > Step 7/12 : ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all}
building >  ---> Running in 15047be356a2
building > Removing intermediate container 15047be356a2
building >  ---> d5f06d23835c
building > Step 8/12 : RUN if ! command -v sudo >/dev/null; then       apt-get update       && apt-get install -y sudo       && apt-get clean;     fi
building >  ---> Running in c31e870b4aba
building > Removing intermediate container c31e870b4aba
building >  ---> a9cff202a37a
building > Step 9/12 : RUN existing_user_by_uid=`getent passwd "1000" | cut -f1 -d: || true` &&     if [ -n "${existing_user_by_uid}" ]; then userdel -r "${existing_user_by_uid}"; fi &&     existing_user_by_name=`getent passwd "tfoote" | cut -f1 -d: || true` &&     existing_user_uid=`getent passwd "tfoote" | cut -f3 -d: || true` &&     if [ -n "${existing_user_by_name}" ]; then find / -uid ${existing_user_uid} -exec chown -h 1000 {} + || true ; find / -gid ${existing_user_uid} -exec chgrp -h 1000 {} + || true ; fi &&     if [ -n "${existing_user_by_name}" ]; then userdel -r "${existing_user_by_name}"; fi &&     existing_group_by_gid=`getent group "1000" | cut -f1 -d: || true` &&     if [ -z "${existing_group_by_gid}" ]; then       groupadd -g "1000" "tfoote";     fi &&     useradd --no-log-init --no-create-home --uid "1000" -s /bin/bash -c "Tully Foote,,," -g "1000" -d "/home/tfoote" "tfoote" &&     echo "tfoote ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker
building >  ---> Running in 2fc242551567
building > Removing intermediate container 2fc242551567
building >  ---> 586ff5675e26
building > Step 10/12 : RUN mkdir -p "$(dirname "/home/tfoote")" && mkhomedir_helper tfoote
building >  ---> Running in 7dd5dcb04439
building > Removing intermediate container 7dd5dcb04439
building >  ---> eac87d1566c7
building > Step 11/12 : USER tfoote
building >  ---> Running in ae3fbfd78854
building > Removing intermediate container ae3fbfd78854
building >  ---> c00cd9cefed5
building > Step 12/12 : WORKDIR /home/tfoote
building >  ---> Running in 6805c866cf8a
building > Removing intermediate container 6805c866cf8a
building >  ---> ae8f24ddd4c4
building > Successfully built ae8f24ddd4c4
Executing command: 
docker run --rm -it  --gpus all  -e DISPLAY -e TERM   -e QT_X11_NO_MITSHM=1   -e XAUTHORITY=/tmp/.docker4vfc7zea.xauth -v /tmp/.docker4vfc7zea.xauth:/tmp/.docker4vfc7zea.xauth   -v /tmp/.X11-unix:/tmp/.X11-unix   -v /etc/localtime:/etc/localtime:ro  ae8f24ddd4c4 
tfoote@c44a1cb476a3:~$ rviz2 
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-tfoote'
[INFO] [1668152346.771416264] [rviz2]: Stereo is NOT SUPPORTED
[INFO] [1668152346.771498605] [rviz2]: OpenGl version: 3.1 (GLSL 1.4)
[INFO] [1668152346.794573875] [rviz2]: Stereo is NOT SUPPORTED
 

[Screenshot from 2022-11-10 23-40-54 showing the rviz2 window rendering]

There's the obvious question: do you have an NVIDIA graphics card? And do you have an appropriate NVIDIA driver installed and enabled?
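
A rough way to double-check inside the container that the NVIDIA runtime actually injected the devices and vendor libraries (the exact paths below are an assumption and can vary with driver packaging):

# Run inside the rocker-launched container
nvidia-smi                                          # is the driver reachable through --gpus all?
ls /dev/nvidia*                                     # were the device nodes injected?
ls /usr/lib/x86_64-linux-gnu/ | grep -i nvidia      # were the driver's GL/GLX libraries injected?
cat /usr/share/glvnd/egl_vendor.d/10_nvidia.json    # EGL vendor config copied in by the nvidia extension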

@tfoote
Collaborator

tfoote commented Nov 11, 2022

I also verified it works without the --user option, i.e. rocker --nvidia --x11 -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda

Please try to make a minimum working example of your issue (preferably with a smaller image) to isolate the issue you're encountering.
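
For example, something along these lines would take the Autoware image and the volume mounts out of the picture (just an illustration of what a smaller reproduction could look like):

# Same extensions, smaller image, no volumes
rocker --nvidia --x11 --user osrf/ros:galactic-desktop rviz2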

@AndriiChumak145
Author

  1. Yes, I have an NVIDIA GPU with working drivers installed (nvidia-smi run inside the container):
nvidia-smi
Fri Nov 11 11:46:30 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   48C    P0    59W /  N/A |    787MiB /  6144MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

  2. I have just run the same command without the volumes, and the Docker output is almost identical (different user names and cache IDs), but the rviz2 error is the same (with and without the --user option). Here is my output:
rocker --nvidia --x11 --user -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda
Extension volume doesn't support default arguments. Please extend it.
Active extensions ['nvidia', 'x11', 'user']
Step 1/12 : FROM python:3-slim-stretch as detector
 ---> 7691d3cb6cbc
Step 2/12 : RUN mkdir -p /tmp/distrovenv
 ---> Using cache
 ---> 24e842389995
Step 3/12 : RUN python3 -m venv /tmp/distrovenv
 ---> Using cache
 ---> b21aa3d8e3eb
Step 4/12 : RUN apt-get update && apt-get install -qy patchelf binutils
 ---> Using cache
 ---> df0a59acf6f2
Step 5/12 : RUN . /tmp/distrovenv/bin/activate && pip install distro pyinstaller==4.0 staticx==0.12.3
 ---> Using cache
 ---> 3d111c42cc3c
Step 6/12 : RUN echo 'import distro; import sys; output = (distro.name(), distro.version(), distro.codename()); print(output) if distro.name() else sys.exit(1)' > /tmp/distrovenv/detect_os.py
 ---> Using cache
 ---> 3dbbfc370808
Step 7/12 : RUN . /tmp/distrovenv/bin/activate && pyinstaller --onefile /tmp/distrovenv/detect_os.py
 ---> Using cache
 ---> a23feb565b15
Step 8/12 : RUN . /tmp/distrovenv/bin/activate && staticx /dist/detect_os /dist/detect_os_static && chmod go+xr /dist/detect_os_static
 ---> Using cache
 ---> d1af46c69120
Step 9/12 : FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda
 ---> 6e0960405f3b
Step 10/12 : COPY --from=detector /dist/detect_os_static /tmp/detect_os
 ---> 8b99eb5c569e
Step 11/12 : ENTRYPOINT [ "/tmp/detect_os" ]
 ---> Running in 8559fe1ff4e4
Removing intermediate container 8559fe1ff4e4
 ---> 260a6e2a31a0
Step 12/12 : CMD [ "" ]
 ---> Running in bb9b005a9960
Removing intermediate container bb9b005a9960
 ---> ce6ba6f0cc80
Successfully built ce6ba6f0cc80
Successfully tagged rocker:os_detect_ghcr.io_autowarefoundation_autoware-universe_latest-cuda
running,  docker run -it --rm ce6ba6f0cc80
output:  ('Ubuntu', '20.04', 'focal')

Writing dockerfile to /tmp/tmp284jbw5x/Dockerfile
vvvvvv
# Preamble from extension [nvidia]
# Ubuntu 16.04 with nvidia-docker2 beta opengl support
FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd

# Preamble from extension [x11]

# Preamble from extension [user]


FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda
USER root
# Snippet from extension [nvidia]
RUN apt-get update && apt-get install -y --no-install-recommends \
    libglvnd0 \
    libgl1 \
    libglx0 \
    libegl1 \
    libgles2 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json




ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all}

# Snippet from extension [x11]

# Snippet from extension [user]
# make sure sudo is installed to be able to give user sudo access in docker
RUN if ! command -v sudo >/dev/null; then \
      apt-get update \
      && apt-get install -y sudo \
      && apt-get clean; \
    fi

RUN existing_user_by_uid=`getent passwd "1000" | cut -f1 -d: || true` && \
    if [ -n "${existing_user_by_uid}" ]; then userdel -r "${existing_user_by_uid}"; fi && \
    existing_user_by_name=`getent passwd "andrii" | cut -f1 -d: || true` && \
    existing_user_uid=`getent passwd "andrii" | cut -f3 -d: || true` && \
    if [ -n "${existing_user_by_name}" ]; then find / -uid ${existing_user_uid} -exec chown -h 1000 {} + || true ; find / -gid ${existing_user_uid} -exec chgrp -h 1000 {} + || true ; fi && \
    if [ -n "${existing_user_by_name}" ]; then userdel -r "${existing_user_by_name}"; fi && \
    existing_group_by_gid=`getent group "1000" | cut -f1 -d: || true` && \
    if [ -z "${existing_group_by_gid}" ]; then \
      groupadd -g "1000" "andrii"; \
    fi && \
    useradd --no-log-init --no-create-home --uid "1000" -s /bin/bash -c "Andrii Chumak,,," -g "1000" -d "/home/andrii" "andrii" && \
    echo "andrii ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker

# Making sure a home directory exists if we haven't mounted the user's home directory explicitly
RUN mkdir -p "$(dirname "/home/andrii")" && mkhomedir_helper andrii
# Commands below run as the developer user
USER andrii
WORKDIR /home/andrii


^^^^^^
Building docker file with arguments:  {'path': '/tmp/tmp284jbw5x', 'rm': True, 'nocache': False, 'pull': False}
building > Step 1/12 : FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd
building >  ---> 9d806b36b807
building > Step 2/12 : FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda
building >  ---> 6e0960405f3b
building > Step 3/12 : USER root
building >  ---> Running in 59c4c572c623
building > Removing intermediate container 59c4c572c623
building >  ---> 24a56b7f855a
building > Step 4/12 : RUN apt-get update && apt-get install -y --no-install-recommends     libglvnd0     libgl1     libglx0     libegl1     libgles2     && rm -rf /var/lib/apt/lists/*
building >  ---> Running in 004fa747eaef
building > Get:1 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
building > Get:2 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
building > Get:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:4 http://packages.ros.org/ros2/ubuntu focal InRelease [4685 B]
building > Get:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
building > Get:6 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
building > Get:7 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
building > Get:8 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
building > Ign:9 https://s3.amazonaws.com/autonomoustuff-repo focal InRelease
building > Get:10 http://packages.ros.org/ros2/ubuntu focal/main amd64 Packages [1152 kB]
building > Get:11 https://s3.amazonaws.com/autonomoustuff-repo focal Release [2922 B]
building > Ign:12 https://s3.amazonaws.com/autonomoustuff-repo focal Release.gpg
building > Get:13 http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal InRelease [17.5 kB]
building > Get:14 https://s3.amazonaws.com/autonomoustuff-repo focal/main amd64 Packages [4286 B]
building > Get:15 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [931 kB]
building > Get:16 http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal/main amd64 Packages [4544 B]
building > Get:17 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
building > Get:18 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1778 kB]
building > Get:19 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2738 kB]
building > Get:20 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.2 kB]
building > Get:21 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1229 kB]
building > Get:22 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB]
building > Get:23 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [27.5 kB]
building > Get:24 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2269 kB]
building > Get:25 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.5 kB]
building > Get:26 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1661 kB]
building > Fetched 25.4 MB in 5s (4725 kB/s)
Reading package lists...
building > Reading package lists...
building > Building dependency tree...
building > 
Reading state information...
building > libegl1 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libegl1 set to manually installed.
libgl1 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libgl1 set to manually installed.
libgles2 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libgles2 set to manually installed.
libglvnd0 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libglvnd0 set to manually installed.
libglx0 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libglx0 set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
building > Removing intermediate container 004fa747eaef
building >  ---> 36c03d12e09a
building > Step 5/12 : COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
building >  ---> ea3b901a36cc
building > Step 6/12 : ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
building >  ---> Running in 8eed33abe04b
building > Removing intermediate container 8eed33abe04b
building >  ---> 5c9e9114f31d
building > Step 7/12 : ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all}
building >  ---> Running in 2f7f36ac9cd5
building > Removing intermediate container 2f7f36ac9cd5
building >  ---> 025671b61b12
building > Step 8/12 : RUN if ! command -v sudo >/dev/null; then       apt-get update       && apt-get install -y sudo       && apt-get clean;     fi
building >  ---> Running in ebc2c22aab8c
building > Removing intermediate container ebc2c22aab8c
building >  ---> c792d1ff20d6
building > Step 9/12 : RUN existing_user_by_uid=`getent passwd "1000" | cut -f1 -d: || true` &&     if [ -n "${existing_user_by_uid}" ]; then userdel -r "${existing_user_by_uid}"; fi &&     existing_user_by_name=`getent passwd "andrii" | cut -f1 -d: || true` &&     existing_user_uid=`getent passwd "andrii" | cut -f3 -d: || true` &&     if [ -n "${existing_user_by_name}" ]; then find / -uid ${existing_user_uid} -exec chown -h 1000 {} + || true ; find / -gid ${existing_user_uid} -exec chgrp -h 1000 {} + || true ; fi &&     if [ -n "${existing_user_by_name}" ]; then userdel -r "${existing_user_by_name}"; fi &&     existing_group_by_gid=`getent group "1000" | cut -f1 -d: || true` &&     if [ -z "${existing_group_by_gid}" ]; then       groupadd -g "1000" "andrii";     fi &&     useradd --no-log-init --no-create-home --uid "1000" -s /bin/bash -c "Andrii Chumak,,," -g "1000" -d "/home/andrii" "andrii" &&     echo "andrii ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker
building >  ---> Running in ba737cc89f33
building > Removing intermediate container ba737cc89f33
building >  ---> de0aa5dfbf80
building > Step 10/12 : RUN mkdir -p "$(dirname "/home/andrii")" && mkhomedir_helper andrii
building >  ---> Running in 8ff7d0f7b3c0
building > Removing intermediate container 8ff7d0f7b3c0
building >  ---> ec13643e48a3
building > Step 11/12 : USER andrii
building >  ---> Running in 85d13314852a
building > Removing intermediate container 85d13314852a
building >  ---> 7e14b99dde7c
building > Step 12/12 : WORKDIR /home/andrii
building >  ---> Running in 05f492cad5af
building > Removing intermediate container 05f492cad5af
building >  ---> 3320e5cf582f
building > Successfully built 3320e5cf582f
Executing command: 
docker run --rm -it  --gpus all  -e DISPLAY -e TERM   -e QT_X11_NO_MITSHM=1   -e XAUTHORITY=/tmp/.docker9d7ongx9.xauth -v /tmp/.docker9d7ongx9.xauth:/tmp/.docker9d7ongx9.xauth   -v /tmp/.X11-unix:/tmp/.X11-unix   -v /etc/localtime:/etc/localtime:ro  3320e5cf582f 
andrii@de22c16eba70:~$ rviz2
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-andrii'
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: MESA-LOADER: failed to retrieve device information
[ERROR] [1668162674.121348354] [rviz2]: RenderingAPIException: OpenGL 1.5 is not supported in GLRenderSystem::initialiseContext at /tmp/binarydeb/ros-galactic-rviz-ogre-vendor-8.5.1/.obj-x86_64-linux-gnu/ogre-v1.12.1-prefix/src/ogre-v1.12.1/RenderSystems/GL/src/OgreGLRenderSystem.cpp (line 1201)

[ERROR] [1668162674.129364641] [rviz2]: Unable to create the rendering window after 100 tries
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unable to create the rendering window after 100 tries
Aborted (core dumped)

So, it looks like the volumes are not the issue.

  3. I have also tried a simpler image with rocker --nvidia --x11 osrf/ros:galactic-desktop and got the same rviz2 error. Interestingly, when I tried crystal-desktop instead of galactic-desktop (galactic is what the Autoware image uses), I got the different Intel driver error described in my first comment.

@tfoote
Collaborator

tfoote commented Dec 1, 2022

You're running a much newer NVIDIA driver (nvidia-520) than I've ever tested with.

Do you know of others using this same graphics driver with the nvidia/opengl images (https://hub.docker.com/r/nvidia/opengl)? In the past not all NVIDIA drivers have been cross-compatible; I don't know whether that's the case here.

The crystal base image uses a much older version of Ubuntu, which likely doesn't ship the same graphics drivers either. That one might be old enough that it detects an NVIDIA incompatibility and falls back to the Intel driver instead.
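
If you want to see which GLX vendor glvnd ends up selecting inside the container, and force the NVIDIA one for comparison, something like this should work (assuming glxinfo from mesa-utils is available in the image):

# Which OpenGL vendor does glvnd resolve by default?
glxinfo | grep -E "OpenGL (vendor|renderer|version)"
# Force the NVIDIA GLX vendor library via libglvnd and compare
__GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep -E "OpenGL (vendor|renderer|version)"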

@130s

130s commented Apr 25, 2023

I cannot reproduce the issue with ghcr.io/autowarefoundation/autoware-universe:latest-cuda; I can spawn a GUI (*1).

I do see a similar error with a privately built ROS Galactic image, though, so I'm reporting it here.
Building docker file with arguments:  {'path': '/tmp/tmp2s44km5w', 'rm': True, 'nocache': False, 'pull': False}
building > Step 1/7 : FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd
building >  ---> 9d806b36b807
building > Step 2/7 : FROM d130s:galactic-focal-fooo
building >  ---> ced0bb716153
building > Step 3/7 : USER root
building >  ---> Using cache
building >  ---> c59020888b1c
building > Step 4/7 : RUN apt-get update && apt-get install -y --no-install-recommends     libglvnd0     libgl1     libglx0     libegl1     libgles2     && rm -rf /var/lib/apt/lists/*
building >  ---> Using cache
building >  ---> 07d8e201cccd
building > Step 5/7 : COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
building >  ---> Using cache
building >  ---> b5685ade76c7
building > Step 6/7 : ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
building >  ---> Using cache
building >  ---> 06a721d4a341
building > Step 7/7 : ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all}
building >  ---> Using cache
building >  ---> c3629fc8352a
building > Successfully built c3629fc8352a
Executing command: 
docker run --rm -it -v /home/noodler:/home/noodler   --gpus all  -e DISPLAY -e TERM   -e QT_X11_NO_MITSHM=1   -e XAUTHORITY=/tmp/.dockerlw08_ofr.xauth -v /tmp/.dockerlw08_ofr.xauth:/tmp/.dockerlw08_ofr.xauth   -v /tmp/.X11-unix:/tmp/.X11-unix   -v /etc/localtime:/etc/localtime:ro  c3629fc8352a bash
root@2ce6613e000e:/# gazebo
libGL error: MESA-LOADER: failed to retrieve device information
$ apt-cache policy python3-rocker
python3-rocker:
  Installed: 0.2.10-100
  Candidate: 0.2.10-100
  Version table:
 *** 0.2.10-100 500
        500 http://packages.ros.org/ros2/ubuntu jammy/main amd64 Packages
        100 /var/lib/dpkg/status

$ nvidia-smi
Tue Apr 25 10:11:44 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA T550 Lap...  Off  | 00000000:03:00.0 Off |                  N/A |
| N/A   55C    P0     9W /  N/A |      4MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3037      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
root@2ce6613e000e:/# gazebo
libGL error: MESA-LOADER: failed to retrieve device information

A few weeks ago I was able to spawn Gazebo from the older version of the Docker image with the same setting, on the same host (apt packages have been updated since then).
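
Since the only change on the host since then has been package updates, one thing I can do to narrow it down (host-side, just a sketch) is check the apt history for graphics-related packages:

# On the host: which NVIDIA/Mesa/GL packages changed recently?
grep -iE "nvidia|mesa|libgl" /var/log/apt/history.log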

*1: Actually, with the Autoware Docker image I can even spawn a GUI such as rviz2 from a bash command line attached to the container, which is very nice. Has that been a normal use case of rocker? If so, I've been missing a great feature. The same error occurs when I pass the GUI executable as the rocker command's argument.

@130s

130s commented Apr 25, 2023

Never mind the libGL error I reported in #206 (comment); it may be an FAQ, as I found #181.

@woensug-choi
Contributor

kinu-garage/hut_10sqft#819 (comment) worked for me.
