Unity Process exiting - no Vulkan detection of local GPU [only in the Docker container] #14

Open
knightpegasus7382 opened this issue Oct 5, 2022 · 2 comments

knightpegasus7382 commented Oct 5, 2022

Hi,

I am unable to create an AI2-THOR controller in any Docker container built on the ai2thor-docker image. Vulkan appears to fail during initialization and cannot connect to my local machine's GPU. I expect the Docker container to eventually use the hardware of whatever machine I run my training jobs on.

After using ./scripts/build.sh to create the image, trying ./scripts/run.sh gives the following error:

./scripts/run.sh: line 3: cd: too many arguments
Traceback (most recent call last):
  File "/app/example_agent.py", line 7, in <module>
    controller = ai2thor.controller.Controller(platform=ai2thor.platform.CloudRendering, scene='FloorPlan28')
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 492, in __init__
    self.start(
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1296, in start
    self._start_unity_thread(env, width, height, unity_params, image_name)
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1020, in _start_unity_thread
    raise Exception(message)
Exception: Unity process has exited - check Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11
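
A note on reading returncode=-11 in the exception above: assuming the value comes from Python's subprocess module, a negative return code means the child process was terminated by the signal with that number. The helper below is a minimal sketch (describe_returncode is a hypothetical name, not part of ai2thor) that maps such a code to a signal name. On Linux, signal 11 is SIGSEGV, which would suggest the Unity player crashed rather than exited cleanly.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Explain a return code using the convention of Python's subprocess module."""
    if returncode < 0:
        # Negative values mean "terminated by signal number -returncode".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {-returncode}"
        return f"terminated by {name}"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


print(describe_returncode(-11))
```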

I also started a container with the -it options to get an interactive bash shell, and the same errors appeared. I tried two scenarios, one using docker and the other using nvidia-docker. They differed in their nvidia-smi output, but not in what happened when creating an AI2-THOR controller.

Scenario 1 (using docker):

(Run apt install vulkan-utils inside the container in order to use the vulkaninfo command as below.)

$ docker run -it ai2thor-docker:latest-3-9-7
root@<some-address>:/app# nvidia-smi
Failed to initialize NVML: Unknown error

(root@<some-address>:/app# apt install -y vulkan-utils)
root@<some-address>:/app# vulkaninfo
===========
VULKAN INFO
===========

Vulkan Instance Version: 1.1.70

ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get 'vkCreateInstance' via 'vk_icdGetInstanceProcAddr' for ICD libGLX_nvidia.so.0
Cannot create Vulkan instance.
/build/vulkan-UL09PJ/vulkan-1.1.70+dfsg1/demos/vulkaninfo.c:768: failed with VK_ERROR_INCOMPATIBLE_DRIVER

root@<some-address>:/app# python
Python 3.9.7 (default, Oct  4 2022, 04:34:06) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ai2thor.controller import Controller
>>> from ai2thor.platform import CloudRendering
>>> ctr = Controller(platform=CloudRendering)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 492, in __init__
    self.start(
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1296, in start
    self._start_unity_thread(env, width, height, unity_params, image_name)
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1020, in _start_unity_thread
    raise Exception(message)
Exception: Unity process has exited - check Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11

root@<some-address>:/app# cat ~/.config/unity3d/Allen\ Institute\ for\ Artificial\ Intelligence/AI2-THOR/Player.log 
Preloaded 'libjpeg.so'
Preloaded 'libpngslz.so'
Preloaded 'libturbojpeg.so'
Unable to load player prefs
Initialize engine version: 2021.2.0b11 (0bffbba03cb9)
[Subsystems] Discovering subsystems at path /root/.ai2thor/releases/thor-CloudRendering-54535f6b9d76896c2ccb4532727aeda5741a9061/thor-CloudRendering-54535f6b9d76896c2ccb4532727aeda5741a9061_Data/UnitySubsystems
Forcing GfxDevice: Vulkan
GfxDevice: creating device client; threaded=1; jobified=1
[Vulkan init] extensions: count=2
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_utils, enabled=0
Vulkan detection: 0
GfxDevice: creating device client; threaded=1; jobified=1
NullGfxDevice:
    Version:  NULL 1.0 [1.0]
    Renderer: Null Device
    Vendor:   Unity Technologies
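
To compare Player.log files across scenarios without reading them line by line, the relevant fields can be pulled out with a few regular expressions. This is only a sketch based on the log lines shown above; summarize_player_log is a hypothetical helper, and the reading of "Vulkan detection: 0" as "no usable device found" is my assumption from the NullGfxDevice fallback that follows it.

```python
import re


def summarize_player_log(text: str) -> dict:
    """Extract the Vulkan-related fields from the text of a Unity Player.log."""
    forced = re.search(r"^Forcing GfxDevice: (\S+)", text, re.MULTILINE)
    detection = re.search(r"^Vulkan detection: (\d+)", text, re.MULTILINE)
    count = re.search(r"^\[Vulkan init\] extensions: count=(\d+)", text, re.MULTILINE)
    enabled = re.findall(
        r"^\[Vulkan init\] extensions: name=([^,\s]+), enabled=1", text, re.MULTILINE
    )
    return {
        "forced_device": forced.group(1) if forced else None,
        "vulkan_detection": int(detection.group(1)) if detection else None,
        "extension_count": int(count.group(1)) if count else None,
        "enabled_extensions": enabled,
        # Unity prints this header when it falls back to the null renderer.
        "fell_back_to_null_device": "NullGfxDevice:" in text,
    }


sample = """Forcing GfxDevice: Vulkan
[Vulkan init] extensions: count=2
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_utils, enabled=0
Vulkan detection: 0
NullGfxDevice:
    Renderer: Null Device
"""
print(summarize_player_log(sample))
```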

Scenario 2 (using nvidia-docker):

The only difference from Scenario 1 is the output of nvidia-smi. (Again, run apt install vulkan-utils in order to use the vulkaninfo command as below.)

$ nvidia-docker run -it ai2thor-docker:latest-3-9-7
root@<some-address>:/app# nvidia-smi
Wed Oct  5 15:37:55 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   49C    P8    N/A /  N/A |      7MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

(root@<some-address>:/app# apt install -y vulkan-utils)
root@<some-address>:/app# vulkaninfo
===========
VULKAN INFO
===========

Vulkan Instance Version: 1.1.70

ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get 'vkCreateInstance' via 'vk_icdGetInstanceProcAddr' for ICD libGLX_nvidia.so.0
Cannot create Vulkan instance.
/build/vulkan-UL09PJ/vulkan-1.1.70+dfsg1/demos/vulkaninfo.c:768: failed with VK_ERROR_INCOMPATIBLE_DRIVER

root@<some-address>:/app# python
Python 3.9.7 (default, Oct  4 2022, 04:34:06) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ai2thor.controller import Controller
>>> from ai2thor.platform import CloudRendering
>>> ctr = Controller(platform=CloudRendering)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 492, in __init__
    self.start(
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1296, in start
    self._start_unity_thread(env, width, height, unity_params, image_name)
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1020, in _start_unity_thread
    raise Exception(message)
Exception: Unity process has exited - check Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11

root@<some-address>:/app# cat ~/.config/unity3d/Allen\ Institute\ for\ Artificial\ Intelligence/AI2-THOR/Player.log 
Preloaded 'libjpeg.so'
Preloaded 'libpngslz.so'
Preloaded 'libturbojpeg.so'
Unable to load player prefs
Initialize engine version: 2021.2.0b11 (0bffbba03cb9)
[Subsystems] Discovering subsystems at path /root/.ai2thor/releases/thor-CloudRendering-54535f6b9d76896c2ccb4532727aeda5741a9061/thor-CloudRendering-54535f6b9d76896c2ccb4532727aeda5741a9061_Data/UnitySubsystems
Forcing GfxDevice: Vulkan
GfxDevice: creating device client; threaded=1; jobified=1
[Vulkan init] extensions: count=2
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_utils, enabled=0
Vulkan detection: 0
GfxDevice: creating device client; threaded=1; jobified=1
NullGfxDevice:
    Version:  NULL 1.0 [1.0]
    Renderer: Null Device
    Vendor:   Unity Technologies
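
One thing that may be worth checking in Scenario 2, where nvidia-smi works but Vulkan does not: the NVIDIA container runtime uses the NVIDIA_DRIVER_CAPABILITIES environment variable to decide which driver libraries to expose inside the container, and as far as I know the default when it is unset covers only utility and compute, not graphics. The sketch below (has_graphics_capability is a hypothetical helper) checks the value from inside the container. Whether this is the cause of the failure here is an assumption, not something confirmed in this thread.

```python
import os


def has_graphics_capability(value) -> bool:
    """Return True if an NVIDIA_DRIVER_CAPABILITIES value includes graphics."""
    if not value:
        # Unset or empty: assumed to fall back to the runtime default,
        # which is not expected to include graphics.
        return False
    capabilities = {item.strip() for item in value.split(",")}
    return "all" in capabilities or "graphics" in capabilities


current = os.environ.get("NVIDIA_DRIVER_CAPABILITIES")
print(f"NVIDIA_DRIVER_CAPABILITIES={current!r}")
print("graphics exposed:", has_graphics_capability(current))
```

If this prints False, one experiment would be to pass -e NVIDIA_DRIVER_CAPABILITIES=all to docker run together with --gpus all and rerun vulkaninfo.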
  • Running docker run -it --gpus all ai2thor-docker:latest-3-9-7 appears to give the same result as Scenario 2, where nvidia-docker run -it ai2thor-docker:latest-3-9-7 is used.
  • In both Scenarios 1 and 2, I ran apt install -y mesa-vulkan-drivers in the container shell to try to resolve the VK_ERROR_INCOMPATIBLE_DRIVER error. This changes the outputs of vulkaninfo and Player.log as follows:
    • The vulkaninfo error changes to VK_ERROR_INITIALIZATION_FAILED.
    • Player.log lists more Vulkan extensions, but still no device is detected. (I checked Player.log after trying to create an AI2-THOR controller again, so that fresh logs were written after installing mesa-vulkan-drivers.)
root@<some-address>:/app# vulkaninfo
===========
VULKAN INFO
===========

Vulkan Instance Version: 1.1.70

ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get 'vkCreateInstance' via 'vk_icdGetInstanceProcAddr' for ICD libGLX_nvidia.so.0
/build/vulkan-UL09PJ/vulkan-1.1.70+dfsg1/demos/vulkaninfo.c:2700: failed with VK_ERROR_INITIALIZATION_FAILED

root@<some-address>:/app# cat ~/.config/unity3d/Allen\ Institute\ for\ Artificial\ Intelligence/AI2-THOR/Player.log 
Preloaded 'libjpeg.so'
Preloaded 'libpngslz.so'
Preloaded 'libturbojpeg.so'
Unable to load player prefs
Initialize engine version: 2021.2.0b11 (0bffbba03cb9)
[Subsystems] Discovering subsystems at path /root/.ai2thor/releases/thor-CloudRendering-54535f6b9d76896c2ccb4532727aeda5741a9061/thor-CloudRendering-54535f6b9d76896c2ccb4532727aeda5741a9061_Data/UnitySubsystems
Forcing GfxDevice: Vulkan
GfxDevice: creating device client; threaded=1; jobified=1
[Vulkan init] extensions: count=16
[Vulkan init] extensions: name=VK_KHR_device_group_creation, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_fence_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_memory_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_semaphore_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_get_physical_device_properties2, enabled=1
[Vulkan init] extensions: name=VK_KHR_get_surface_capabilities2, enabled=0
[Vulkan init] extensions: name=VK_KHR_surface, enabled=1
[Vulkan init] extensions: name=VK_KHR_wayland_surface, enabled=0
[Vulkan init] extensions: name=VK_KHR_xcb_surface, enabled=0
[Vulkan init] extensions: name=VK_KHR_xlib_surface, enabled=0
[Vulkan init] extensions: name=VK_KHR_display, enabled=1
[Vulkan init] extensions: name=VK_EXT_direct_mode_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_acquire_xlib_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_display_surface_counter, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_utils, enabled=0
Vulkan detection: 0
GfxDevice: creating device client; threaded=1; jobified=1
NullGfxDevice:
    Version:  NULL 1.0 [1.0]
    Renderer: Null Device
    Vendor:   Unity Technologies
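
Since the loader error names the ICD libGLX_nvidia.so.0, it can help to see which Vulkan ICD manifests exist in the container and whether the libraries they point to can be loaded at all. The sketch below reads the JSON manifests from two common Linux locations and tries to load each library_path with ctypes. The function names are hypothetical, the directory list is not exhaustive, and a library that loads is only a necessary condition: it does not prove that the driver exposes a working Vulkan entry point.

```python
import ctypes
import glob
import json
import os

# Two common manifest directories on Linux; the Vulkan loader searches more.
DEFAULT_ICD_DIRS = ("/etc/vulkan/icd.d", "/usr/share/vulkan/icd.d")


def can_load(library_path) -> bool:
    """Return True if the shared library can be opened by the dynamic linker."""
    if not library_path:
        return False
    try:
        ctypes.CDLL(library_path)
        return True
    except OSError:
        return False


def list_icd_manifests(dirs=DEFAULT_ICD_DIRS) -> list:
    """Collect the library_path of every ICD manifest found in the given dirs."""
    results = []
    for directory in dirs:
        for path in sorted(glob.glob(os.path.join(directory, "*.json"))):
            try:
                with open(path) as handle:
                    icd = json.load(handle).get("ICD", {})
            except (OSError, ValueError, AttributeError):
                continue  # unreadable or malformed manifest
            # Note: paths relative to the manifest directory are not resolved here.
            library = icd.get("library_path")
            results.append(
                {"manifest": path, "library_path": library, "loadable": can_load(library)}
            )
    return results


for entry in list_icd_manifests():
    print(entry)
```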

Outside the Docker container, on my local system, creating an AI2-THOR controller works without problems. There, vulkaninfo prints a long report with meaningful values, and Player.log shows that my GfxDevice is detected. I can also see the Controller and navigate from the Python shell using ctr.step(action="MoveAhead") and other such methods.
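
For running the same check on the host and in each container, a small smoke test along these lines could be used. It is a sketch only: smoke_test and default_factory are hypothetical names, the Controller call mirrors the one in the transcripts above, and the Player.log path is the one shown in this issue, which may differ on other setups.

```python
import os

# Path as it appears in the transcripts above; adjust if yours differs.
PLAYER_LOG = os.path.expanduser(
    "~/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-THOR/Player.log"
)


def default_factory():
    """Create a controller the same way as in the Python sessions above."""
    from ai2thor.controller import Controller
    from ai2thor.platform import CloudRendering

    return Controller(platform=CloudRendering)


def smoke_test(make_controller=default_factory, log_path=PLAYER_LOG, tail_lines=20) -> bool:
    """Start a controller, take one step, and print the log tail on failure."""
    try:
        controller = make_controller()
    except Exception as exc:
        print(f"Controller failed to start: {exc}")
        if os.path.exists(log_path):
            with open(log_path, errors="replace") as handle:
                print("".join(handle.readlines()[-tail_lines:]))
        return False
    try:
        controller.step(action="MoveAhead")
        return True
    finally:
        # Always shut the Unity process down again.
        controller.stop()
```

Calling smoke_test() in each environment gives a quick True/False plus the last lines of Player.log when startup fails.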

My local machine has the following relevant specifications:

  • OS: Ubuntu 20.04
  • NVIDIA Driver version: 515.65.01
  • CUDA version: 11.7
  • NVIDIA GeForce 940MX (4GB Graphics RAM)

This has been a long-term recurring issue with running AI2-THOR on encapsulated/isolated machines/cluster nodes. Vulkan struggles to connect to machine hardware from within a Docker container. I managed to fix this issue once and successfully run AI2-THOR controllers on docker containers using ai2thor-docker and some other steps/hacks, but unfortunately I have lost those changes that I made and I don't recall what I did then. I would massively appreciate help to bring me out of this rut. Thanks!

csupika commented Mar 19, 2023

Any update on this?

ypp2020 commented Sep 11, 2023

When I add the "--privileged" flag in run.sh, it seems to work fine. However, some people say this is not safe, as in this blog post: https://juejin.cn/s/docker%20x11%20authorization%20required%20but%20no%20authorization%20protocol%20specified
So I tried the second approach and added the following to run.sh:
xhost +local:docker
docker run --rm -it -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix ai2thor-docker:latest python3 example_agent.py

With that, I get the same error as you:

Traceback (most recent call last):
  File "example_agent.py", line 7, in <module>
    controller = ai2thor.controller.Controller(platform=ai2thor.platform.CloudRendering, scene='FloorPlan28')
  File "/usr/local/lib/python3.6/dist-packages/ai2thor/controller.py", line 498, in __init__
    host=host,
  File "/usr/local/lib/python3.6/dist-packages/ai2thor/controller.py", line 1296, in start
    self._start_unity_thread(env, width, height, unity_params, image_name)
  File "/usr/local/lib/python3.6/dist-packages/ai2thor/controller.py", line 1020, in _start_unity_thread
    raise Exception(message)
Exception: Unity process has exited - check Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11
