I am unable to create an AI2-THOR controller in any Docker container built from the ai2thor-docker image. Vulkan seems to fail to initialize and to connect to my local machine's hardware. Ultimately, I want the Docker container to use the hardware of whichever machine I run my training jobs on.
After building the image with ./scripts/build.sh, running ./scripts/run.sh gives the following error:
./scripts/run.sh: line 3: cd: too many arguments
Traceback (most recent call last):
  File "/app/example_agent.py", line 7, in <module>
    controller = ai2thor.controller.Controller(platform=ai2thor.platform.CloudRendering, scene='FloorPlan28')
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 492, in __init__
    self.start(
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1296, in start
    self._start_unity_thread(env, width, height, unity_params, image_name)
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1020, in _start_unity_thread
    raise Exception(message)
Exception: Unity process has exited - check Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11
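The first line of that output, cd: too many arguments, comes from the shell rather than from AI2-THOR: bash's cd received more than one argument. A common cause (an assumption here, since the contents of run.sh are not shown) is an unquoted path containing spaces, which the shell splits into several words. Python's shlex module follows similar POSIX word-splitting rules, so a minimal sketch with a hypothetical path illustrates the effect:

```python
import shlex

# Hypothetical project path containing a space (not taken from run.sh).
project_dir = "/home/user/my projects/ai2thor-docker"

# Unquoted: the path is split at the space, so cd would receive two
# arguments, which is what makes bash report "too many arguments".
unquoted = shlex.split(f"cd {project_dir}")

# Quoted: shlex.quote wraps the path so it stays a single argument.
quoted = shlex.split(f"cd {shlex.quote(project_dir)}")

print(unquoted)  # ['cd', '/home/user/my', 'projects/ai2thor-docker']
print(quoted)    # ['cd', '/home/user/my projects/ai2thor-docker']
```

In a shell script the equivalent fix would be to quote the expansion, e.g. cd "$SOME_DIR" (a hypothetical variable name).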
I also tried starting a bash shell by running a container with the -it options, and the same errors appeared. I tried two scenarios, one using docker and the other using nvidia-docker. The two differed in their nvidia-smi output, but not in what happened when creating an AI2-THOR controller.
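Because the vulkaninfo check is repeated in each scenario, a small helper can run it and pull out the VK_ERROR_* code it reports. This is a hypothetical sketch, assuming vulkaninfo (from vulkan-utils) is on PATH and that failures end with a line of the form failed with VK_ERROR_..., as in the logs in this issue:

```python
import re
import shutil
import subprocess

# Matches the error code vulkaninfo prints on failure, e.g.
# ".../vulkaninfo.c:768: failed with VK_ERROR_INCOMPATIBLE_DRIVER".
VK_ERROR_PATTERN = re.compile(r"failed with (VK_ERROR_[A-Z_]+)")


def extract_vk_error(output):
    """Return the first VK_ERROR_* code in vulkaninfo output, or None."""
    match = VK_ERROR_PATTERN.search(output)
    return match.group(1) if match else None


def check_vulkaninfo():
    """Run vulkaninfo and return its VK_ERROR_* code or a short status."""
    if shutil.which("vulkaninfo") is None:
        return "vulkaninfo not found (apt install -y vulkan-utils)"
    result = subprocess.run(
        ["vulkaninfo"], capture_output=True, text=True, check=False
    )
    # The error may be printed to stdout or stderr, so search both.
    error = extract_vk_error(result.stdout + result.stderr)
    return error or f"no VK_ERROR reported (exit code {result.returncode})"


if __name__ == "__main__":
    print(check_vulkaninfo())
```

Running it before and after a change (for example installing mesa-vulkan-drivers) makes it easy to see whether the reported error code changed.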
Scenario 1 (using docker):
(Run apt install vulkan-utils inside the container in order to use the vulkaninfo command as below.)
$ docker run -it ai2thor-docker:latest-3-9-7
root@<some-address>:/app# nvidia-smi
Failed to initialize NVML: Unknown error
(root@<some-address>:/app# apt install -y vulkan-utils)
root@<some-address>:/app# vulkaninfo
===========
VULKAN INFO
===========
Vulkan Instance Version: 1.1.70
ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get 'vkCreateInstance' via 'vk_icdGetInstanceProcAddr' for ICD libGLX_nvidia.so.0
Cannot create Vulkan instance.
/build/vulkan-UL09PJ/vulkan-1.1.70+dfsg1/demos/vulkaninfo.c:768: failed with VK_ERROR_INCOMPATIBLE_DRIVER
root@<some-address>:/app# python
Python 3.9.7 (default, Oct 4 2022, 04:34:06)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ai2thor.controller import Controller
>>> from ai2thor.platform import CloudRendering
>>> ctr = Controller(platform=CloudRendering)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 492, in __init__
    self.start(
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1296, in start
    self._start_unity_thread(env, width, height, unity_params, image_name)
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1020, in _start_unity_thread
    raise Exception(message)
Exception: Unity process has exited - check Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11
root@<some-address>:/app# cat ~/.config/unity3d/Allen\ Institute\ for\ Artificial\ Intelligence/AI2-THOR/Player.log
Preloaded 'libjpeg.so'
Preloaded 'libpngslz.so'
Preloaded 'libturbojpeg.so'
Unable to load player prefs
Initialize engine version: 2021.2.0b11 (0bffbba03cb9)
[Subsystems] Discovering subsystems at path /root/.ai2thor/releases/thor-CloudRendering-54535f6b9d76896c2ccb4532727aeda5741a9061/thor-CloudRendering-54535f6b9d76896c2ccb4532727aeda5741a9061_Data/UnitySubsystems
Forcing GfxDevice: Vulkan
GfxDevice: creating device client; threaded=1; jobified=1
[Vulkan init] extensions: count=2
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_utils, enabled=0
Vulkan detection: 0
GfxDevice: creating device client; threaded=1; jobified=1
NullGfxDevice:
Version: NULL 1.0 [1.0]
Renderer: Null Device
Vendor: Unity Technologies
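The loader error in this session names the ICD it tried to use, libGLX_nvidia.so.0. The Vulkan loader discovers ICDs through JSON manifest files whose ICD.library_path field names the driver library; on Linux these are searched in directories such as /usr/share/vulkan/icd.d and /etc/vulkan/icd.d, unless VK_ICD_FILENAMES lists manifest files explicitly. A minimal sketch to list which manifests are visible inside the container and which libraries they point to (the function names are hypothetical, and the directory list is a common default, not an exhaustive one):

```python
import glob
import json
import os

# Common Linux search directories for Vulkan ICD manifests (not exhaustive).
DEFAULT_ICD_DIRS = [
    "/usr/share/vulkan/icd.d",
    "/usr/local/share/vulkan/icd.d",
    "/etc/vulkan/icd.d",
]


def parse_icd_manifest(text):
    """Return the driver library named in a manifest's ICD.library_path."""
    return json.loads(text)["ICD"]["library_path"]


def find_icd_manifests(dirs=None):
    """Map each visible ICD manifest file to the library it points at.

    With dirs=None, honour VK_ICD_FILENAMES (explicit manifest files that
    override the directory search) before falling back to DEFAULT_ICD_DIRS.
    """
    if dirs is None and os.environ.get("VK_ICD_FILENAMES"):
        paths = [p for p in os.environ["VK_ICD_FILENAMES"].split(os.pathsep) if p]
    else:
        paths = []
        for directory in DEFAULT_ICD_DIRS if dirs is None else dirs:
            paths.extend(sorted(glob.glob(os.path.join(directory, "*.json"))))

    manifests = {}
    for path in paths:
        try:
            with open(path) as handle:
                manifests[path] = parse_icd_manifest(handle.read())
        except (OSError, ValueError, KeyError) as exc:
            # Unreadable or malformed manifests are reported, not skipped.
            manifests[path] = f"<unreadable: {exc}>"
    return manifests


if __name__ == "__main__":
    for manifest, library in find_icd_manifests().items():
        print(f"{manifest} -> {library}")
```

This only shows what the loader can see; it does not by itself explain why vkCreateInstance could not be resolved from that library.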
Scenario 2 (using nvidia-docker):
The only difference from Scenario 1 is the output of nvidia-smi. (Again, run apt install vulkan-utils in order to use the vulkaninfo command as below.)
$ nvidia-docker run -it ai2thor-docker:latest-3-9-7
root@<some-address>:/app# nvidia-smi
Wed Oct 5 15:37:55 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   49C    P8    N/A /  N/A |      7MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
(root@<some-address>:/app# apt install -y vulkan-utils)
root@<some-address>:/app# vulkaninfo
===========
VULKAN INFO
===========
Vulkan Instance Version: 1.1.70
ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get 'vkCreateInstance' via 'vk_icdGetInstanceProcAddr' for ICD libGLX_nvidia.so.0
Cannot create Vulkan instance.
/build/vulkan-UL09PJ/vulkan-1.1.70+dfsg1/demos/vulkaninfo.c:768: failed with VK_ERROR_INCOMPATIBLE_DRIVER
root@<some-address>:/app# python
Python 3.9.7 (default, Oct 4 2022, 04:34:06)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ai2thor.controller import Controller
>>> from ai2thor.platform import CloudRendering
>>> ctr = Controller(platform=CloudRendering)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 492, in __init__
    self.start(
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1296, in start
    self._start_unity_thread(env, width, height, unity_params, image_name)
  File "/usr/local/lib/python3.9/site-packages/ai2thor/controller.py", line 1020, in _start_unity_thread
    raise Exception(message)
Exception: Unity process has exited - check Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11
root@<some-address>:/app# cat ~/.config/unity3d/Allen\ Institute\ for\ Artificial\ Intelligence/AI2-THOR/Player.log
Preloaded 'libjpeg.so'
Preloaded 'libpngslz.so'
Preloaded 'libturbojpeg.so'
Unable to load player prefs
Initialize engine version: 2021.2.0b11 (0bffbba03cb9)
[Subsystems] Discovering subsystems at path /root/.ai2thor/releases/thor-CloudRendering-54535f6b9d76896c2ccb4532727aeda5741a9061/thor-CloudRendering-54535f6b9d76896c2ccb4532727aeda5741a9061_Data/UnitySubsystems
Forcing GfxDevice: Vulkan
GfxDevice: creating device client; threaded=1; jobified=1
[Vulkan init] extensions: count=2
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_utils, enabled=0
Vulkan detection: 0
GfxDevice: creating device client; threaded=1; jobified=1
NullGfxDevice:
Version: NULL 1.0 [1.0]
Renderer: Null Device
Vendor: Unity Technologies
Running docker run -it --gpus all ai2thor-docker:latest-3-9-7 gives the same results as Scenario 2, where nvidia-docker run -it ai2thor-docker:latest-3-9-7 is used.
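One setting that may be worth checking in the --gpus / nvidia-docker case: the NVIDIA Container Toolkit decides which driver libraries to mount into the container from the NVIDIA_DRIVER_CAPABILITIES environment variable. Its documentation lists graphics as the capability required for OpenGL and Vulkan applications, and utility,compute (enough for nvidia-smi and CUDA) as the default when the variable is unset. Whether this explains the failure here is an assumption. A minimal sketch that reports, from inside the container, whether graphics is requested:

```python
import os

# When NVIDIA_DRIVER_CAPABILITIES is unset or empty, the NVIDIA Container
# Toolkit documents "utility,compute" as the default, which omits "graphics".
DEFAULT_CAPABILITIES = frozenset({"utility", "compute"})


def driver_capabilities(value):
    """Return the capability set implied by an NVIDIA_DRIVER_CAPABILITIES value."""
    if not value:
        return set(DEFAULT_CAPABILITIES)
    return {cap.strip() for cap in value.split(",") if cap.strip()}


def graphics_enabled(value):
    """True if the value requests the graphics (OpenGL/Vulkan) driver libraries."""
    caps = driver_capabilities(value)
    return "all" in caps or "graphics" in caps


if __name__ == "__main__":
    raw = os.environ.get("NVIDIA_DRIVER_CAPABILITIES")
    print(f"NVIDIA_DRIVER_CAPABILITIES={raw!r}")
    print("graphics capability requested:", graphics_enabled(raw))
```

If it prints False, one thing to try (not verified against this image) is passing the variable explicitly, e.g. docker run -it --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all ai2thor-docker:latest-3-9-7.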
In both Scenarios 1 and 2, running apt install -y mesa-vulkan-drivers in the container shell to try to resolve the VK_ERROR_INCOMPATIBLE_DRIVER error changes the outputs of vulkaninfo and Player.log as follows:
The vulkaninfo error message changes to VK_ERROR_INITIALIZATION_FAILED instead.
Player.log now lists more Vulkan extensions, but still detects no device. (I checked Player.log after trying to create an AI2-THOR controller again, so that fresh logs were written after installing mesa-vulkan-drivers.)
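Comparing Player.log across attempts is easier with a small parser. The sketch below is hypothetical and relies only on lines visible in the logs in this issue ([Vulkan init] extensions: count=N, Vulkan detection: N and Renderer: ...); the log path is the one used in the sessions above:

```python
import os
import re

# Player.log location used in the shell sessions in this issue.
PLAYER_LOG = os.path.expanduser(
    "~/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-THOR/Player.log"
)


def summarize_player_log(text):
    """Pull the Vulkan-related values seen in this issue's logs out of Player.log."""
    detection = re.search(r"Vulkan detection: (\d+)", text)
    extensions = re.search(r"\[Vulkan init\] extensions: count=(\d+)", text)
    renderer = re.search(r"^\s*Renderer:\s*(.+)$", text, re.MULTILINE)
    return {
        "vulkan_detection": int(detection.group(1)) if detection else None,
        "extension_count": int(extensions.group(1)) if extensions else None,
        "renderer": renderer.group(1).strip() if renderer else None,
    }


if __name__ == "__main__":
    if not os.path.exists(PLAYER_LOG):
        print(f"No Player.log found at {PLAYER_LOG}")
    else:
        with open(PLAYER_LOG, errors="replace") as handle:
            summary = summarize_player_log(handle.read())
        print(summary)
        if summary["renderer"] == "Null Device":
            # Unity fell back to NullGfxDevice, i.e. it found no usable Vulkan device.
            print("Unity is using the null graphics device.")
```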
By contrast, creating an AI2-THOR controller works perfectly fine outside the Docker container, on my local system. There, vulkaninfo prints a long report with meaningful values, and Player.log shows that my GfxDevice is detected. I can also see the Controller and navigate in the Python shell using ctr.step(action="MoveAhead") and other such methods.
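Since the same Python calls work on the host but not in the container, a smoke test that can be run unchanged in both places helps compare them. This is a sketch assuming the ai2thor calls already used in this issue (Controller(platform=CloudRendering, scene=...) and step(action=...)), plus stop() and event.metadata["lastActionSuccess"]; the smoke_test and make_thor_controller helpers themselves are hypothetical:

```python
def smoke_test(make_controller, actions=("MoveAhead",)):
    """Start a controller, run a few actions and return (ok, detail).

    make_controller is any zero-argument callable returning an object with
    step(action=...) and stop() methods, such as an ai2thor Controller.
    """
    try:
        controller = make_controller()
    except Exception as exc:  # e.g. "Unity process has exited ..."
        return False, f"startup failed: {exc}"
    try:
        for action in actions:
            event = controller.step(action=action)
            if not event.metadata.get("lastActionSuccess", False):
                # A blocked move also lands here, so treat it as a hint only.
                return False, f"{action} failed"
        return True, "ok"
    finally:
        controller.stop()


def make_thor_controller():
    # Imported lazily so smoke_test can be exercised without ai2thor installed.
    from ai2thor.controller import Controller
    from ai2thor.platform import CloudRendering

    return Controller(platform=CloudRendering, scene="FloorPlan28")


if __name__ == "__main__":
    ok, detail = smoke_test(make_thor_controller)
    print(f"AI2-THOR smoke test passed={ok}: {detail}")
```

Taking a factory rather than a controller keeps startup failures (the case in this issue) inside the same (ok, detail) report as step failures.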
My local machine has the following relevant specifications:
OS: Ubuntu 20.04
NVIDIA Driver version: 515.65.01
CUDA version: 11.7
NVIDIA GeForce 940MX (4GB Graphics RAM)
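To compare these values between the host and a container, nvidia-smi can print them in machine-readable form with --query-gpu=... and --format=csv,noheader. A hypothetical helper around that command (the field list is a choice made for this sketch):

```python
import shutil
import subprocess

QUERY_FIELDS = ("name", "driver_version", "memory.total")


def parse_gpu_query(output):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader` output.

    Each line describes one GPU, with fields separated by commas.
    """
    gpus = []
    for line in output.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Return parsed GPU info, or None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        # e.g. "Failed to initialize NVML: Unknown error", as in Scenario 1
        return None
    return parse_gpu_query(result.stdout)


if __name__ == "__main__":
    print(query_gpus())
```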
This has been a long-standing, recurring issue when running AI2-THOR on encapsulated or isolated machines and cluster nodes: Vulkan struggles to connect to the machine's hardware from within a Docker container. I once managed to fix this and successfully ran AI2-THOR controllers in Docker containers using ai2thor-docker plus some other steps and workarounds, but unfortunately I have lost those changes and don't recall what I did. I would greatly appreciate help getting out of this rut. Thanks!
Traceback (most recent call last):
  File "example_agent.py", line 7, in <module>
    controller = ai2thor.controller.Controller(platform=ai2thor.platform.CloudRendering, scene='FloorPlan28')
  File "/usr/local/lib/python3.6/dist-packages/ai2thor/controller.py", line 498, in __init__
    host=host,
  File "/usr/local/lib/python3.6/dist-packages/ai2thor/controller.py", line 1296, in start
    self._start_unity_thread(env, width, height, unity_params, image_name)
  File "/usr/local/lib/python3.6/dist-packages/ai2thor/controller.py", line 1020, in _start_unity_thread
    raise Exception(message)
Exception: Unity process has exited - check Player.log for errors. Confirm that Vulkan is properly configured on this system using vulkaninfo from the vulkan-utils package. returncode=-11