
Gazebo Fortress - Loading Models Problem with Ogre2.2 #2365

Open
Space-Swarm opened this issue Apr 8, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@Space-Swarm

Space-Swarm commented Apr 8, 2024

Hi there,

I’m currently using Gazebo Fortress with the SubT Simulator on Ubuntu 18.04, and the models in the world take up to 10 minutes to load with a medium-complexity map. I have the models saved locally in the standard cache location (.ignition/fuel/fuel.ignitionrobotics.org/openrobotics/models), but the cache files are only accessed after Ogre 2.2 times out while attempting to load textures and deletes the buffer memory.

This issue doesn’t occur with Ignition Dome, where the map loads normally.

This is likely a rendering-engine problem, as Ignition Dome uses Ogre 2.1 rather than Ogre 2.2. While the map is being loaded, the Ogre2 log shows the following:
“17:33:42: Can’t assign material scene::Material(55155) because this Material does not exist. Have you forgotten to define it in a .material script?
17:33:42: Can’t assign material scene::Material(55154) because this Material does not exist. Have you forgotten to define it in a .material script?
17:33:42: WARNING: Deleting mapped buffer without having it unmapped. This is often sign of a resource leak or a bad pattern. Umapping the buffer for you…”

Once the buffer is unmapped, the scene loads.
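To put a number on where the stall happens, the timestamps at the start of each Ogre log line can be compared. This is a minimal Python sketch, not part of any Gazebo tooling: it assumes each relevant line starts with `HH:MM:SS:` as in the excerpts above and that the log does not cross midnight. The log location is also an assumption (in Fortress it is often `~/.ignition/rendering/ogre2.log`), so pass the path explicitly.

```python
import re

# Ogre log lines start with "HH:MM:SS: " (assumed from the excerpts in this issue).
TIMESTAMP = re.compile(r"^\s*(\d{2}):(\d{2}):(\d{2}):\s?(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, message_before, message_after) for the longest
    silence between consecutive timestamped lines.

    Lines without a timestamp (continuations) are skipped. Assumes the log
    does not cross midnight.
    """
    best = (0, None, None)
    prev_t, prev_msg = None, None
    for line in lines:
        m = TIMESTAMP.match(line)
        if not m:
            continue
        h, mnt, s, msg = m.groups()
        t = int(h) * 3600 + int(mnt) * 60 + int(s)
        if prev_t is not None and t - prev_t > best[0]:
            best = (t - prev_t, prev_msg, msg)
        prev_t, prev_msg = t, msg
    return best


def largest_gap_in_file(path):
    # errors="replace" keeps going if the log contains odd bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        return largest_gap(f)
```

Calling `largest_gap_in_file(...)` on the log returns the two messages on either side of the longest silence, which should show whether the wait sits right before the buffer-unmap warning.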

Possible Causes:

  1. Ogre2 has an issue loading textures
  2. There is an issue with namespace changes from ignition::gazebo to gz::sim
  3. There is a problem with the setup of the Ignition Fortress & Ignition Fuel

Related errors also occur:

  1. EGL-related:

    " 17:28:41: Trying to init device: EGL_EXT_device_drm /dev/dri/card0…
    17:28:41: Created GL 4.5 context for device EGL_EXT_device_drm /dev/dri/card0
    17:28:41: Destroying device: EGL_EXT_device_drm /dev/dri/card0…
    17:28:41: EGL Device: EGL_EXT_device_drm /dev/dri/card1
    17:28:41: Trying to init device: EGL_EXT_device_drm /dev/dri/card1…
    17:28:41: OGRE EXCEPTION(3:RenderingAPIException): eglInitialize failed for device EGL_EXT_device_drm [ /dev/dri/card1 in EGLSupport::getGLDisplay at /var/lib/jenkins/workspace/ogre-2.2-debbuilder/repo/RenderSystems/GL3Plus/src/windowing/EGL/PBuffer/OgreEglPBufferSupport.cpp (line 322)
    17:28:41: OGRE EXCEPTION(3:RenderingAPIException): eglInitialize failed for device EGL_EXT_device_drm /dev/dri/card1 in EGLSupport::getGLDisplay at /var/lib/jenkins/workspace/ogre-2.2-debbuilder/repo/RenderSystems/GL3Plus/src/windowing/EGL/PBuffer/OgreEglPBufferSupport.cpp (line 322)
    17:28:41: Destroying device: EGL_EXT_device_drm /dev/dri/card1…
    17:28:41: EGL Device: EGL_MESA_device_software
    17:28:41: Trying to init device: EGL_MESA_device_software…
    17:28:41: Created GL 3.3 context for device EGL_MESA_device_software [
    17:28:41: Destroying device: EGL_MESA_device_software…
    17:28:41: EGL Device: EGL_MESA_device_software
    17:28:41: Trying to init device: EGL_MESA_device_software…
    17:28:41: Created GL 3.3 context for device EGL_MESA_device_software
    17:28:41: Destroying device: EGL_MESA_device_software …"

This could be caused by a graphics card issue, or it may be a red herring.

  2. Level manager error:
    [Err] [LevelManager.cc:218] Could not find a plugin tag with name gz::sim. Levels and distributed simulation will not work.
    I found that it’s likely benign and can be ignored; see gazebosim/gz-sim#1350 ("Launch ignition through network issue").

Things I have tried:

  1. Saved the cache files in several likely locations, and edited multiple Gazebo environment variables to point to the files.
  2. Set the uri of every referenced model to point to the local version using the model:// syntax.
  3. Compiled Ignition Fuel from source in debug mode and debugged it with GDB. I tried multiple versions, and the code is pretty box standard as far as I can tell.
  4. Installed different versions of Ignition Gazebo.
  5. Used Ogre1. Ogre1 behaved the same way: no material error in the log, but the same hang while the models were being loaded into the world.
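For the model:// approach, it can help to check outside Gazebo whether each URI actually resolves against the configured search path. This is a simplified Python sketch, assuming the search path is the colon-separated `IGN_GAZEBO_RESOURCE_PATH` variable used with Fortress and that a model directory is recognised by a `model.config` file inside it. Gazebo's real resolver checks more locations (including the Fuel cache), so a `None` result is a hint, not proof.

```python
import os
from pathlib import Path


def resolve_model_uri(uri, search_path):
    """Return the first directory that plausibly satisfies a model:// URI.

    Simplified sketch: each entry of the os.pathsep-separated search_path
    (e.g. the value of IGN_GAZEBO_RESOURCE_PATH) is assumed to hold
    <name>/model.config. Returns None if nothing matches.
    """
    prefix = "model://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a model:// URI: {uri}")
    name = uri[len(prefix):].split("/", 1)[0]
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing ':'
        candidate = Path(entry).expanduser() / name
        if (candidate / "model.config").is_file():
            return candidate
    return None
```

For example, `resolve_model_uri("model://some_model", os.environ.get("IGN_GAZEBO_RESOURCE_PATH", ""))` (the model name is hypothetical) reports which directory, if any, would be picked up.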

Any suggestions on where to go from here? I’ve tried everything I can think of and it’s really slowing my development time down! Thank you in advance for your help.

@Space-Swarm Space-Swarm added the bug Something isn't working label Apr 8, 2024
@iche033
Contributor

iche033 commented Apr 8, 2024

Are there any errors printed in the console?

Dome -> Fortress: we upgraded Ogre from 2.1 to 2.2, but I'm not sure whether that's the reason.

> 17:33:42: Can’t assign material scene::Material(55154) because this Material does not exist. Have you forgotten to define it in a .material script?
> 17:33:42: WARNING: Deleting mapped buffer without having it unmapped. This is often sign of a resource leak or a bad pattern. Umapping the buffer for you…”

These warnings should be OK to ignore. They still happen in newer versions of Gazebo.

> 17:28:41: OGRE EXCEPTION(3:RenderingAPIException): eglInitialize failed for device EGL_EXT_device_drm [ /dev/dri/card1 in EGLSupport::getGLDisplay at /var/lib/jenkins/workspace/ogre-2.2-debbuilder/repo/RenderSystems/GL3Plus/src/windowing/EGL/PBuffer/OgreEglPBufferSupport.cpp (line 322)

Related issue: gazebosim/gz-rendering#587. This is logged when OGRE tries to query EGL devices; the OGRE 2 dev says it should be harmless.
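Since these messages are reported as harmless, one way to review a long ogre2.log is to drop them and look only at what remains. This is a minimal Python sketch; the patterns are written from the messages quoted in this thread and are assumptions, not an official list, and the "suspicious keyword" filter is a heuristic.

```python
import re

# Messages described as harmless in this thread (assumed patterns; extend as needed).
KNOWN_BENIGN = [
    re.compile(r"Can.t assign material .* because this Material does not exist"),
    re.compile(r"Deleting mapped buffer without having it unmapped"),
    re.compile(r"eglInitialize failed for device"),
]

# Heuristic: only look at lines that read like warnings or errors.
SUSPICIOUS = re.compile(r"warning|exception|error|failed|can.t", re.IGNORECASE)


def unexpected_lines(lines):
    """Yield warning/error-looking lines that match no known-benign pattern."""
    for line in lines:
        line = line.rstrip("\n")
        if not SUSPICIOUS.search(line):
            continue
        if any(p.search(line) for p in KNOWN_BENIGN):
            continue
        yield line
```

Anything this yields from a slow run is a better starting point for a bug report than the material and buffer warnings.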

> Using Ogre1. Ogre1 did the same thing without the material error in the log, but the same time hanging whilst the models are being loaded into the world.

Maybe it's a Fuel server issue (https://app.gazebosim.org/). Some older models on Fuel point to ignitionrobotics.org instead of gazebosim.org and may no longer work. Launching gz sim should show errors in the console mentioning that it timed out downloading these models.
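If a world file still references the old host, the deprecation warning from the Fuel client suggests replacing `fuel.ignitionrobotics.org` with `fuel.gazebosim.org`. This is a minimal Python sketch of that textual rewrite (the function name is hypothetical); it only changes the scheme and host and leaves owners, model names and versions untouched.

```python
import re

OLD_HOST = "fuel.ignitionrobotics.org"
NEW_HOST = "fuel.gazebosim.org"

# Match only the scheme + old host prefix; everything after the '/' is kept.
_OLD_PREFIX = re.compile(r"https?://" + re.escape(OLD_HOST) + r"/")


def migrate_fuel_uris(sdf_text):
    """Return sdf_text with deprecated Fuel host URLs pointed at the new host."""
    return _OLD_PREFIX.sub("https://" + NEW_HOST + "/", sdf_text)
```

Applying it to a copy of the world file and diffing against the original is an easy way to confirm that only the hosts changed. The function is idempotent, so running it twice does no harm.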

@Space-Swarm
Author

Space-Swarm commented Apr 8, 2024

Thanks so much for the swift reply! This is the error printed to the console when loading the initial models:

"[Wrn] [FuelClient.cc:1978] The fuel.ignitionrobotics.org URL is deprecrated. Pleasse change https://fuel.ignitionrobotics.org/1.0/OpenRobotics/models/Tunnel Tile 5 to https://fuel.gazebosim.org/1.0/OpenRobotics/models/Tunnel Tile 5

However, that only lasts 30 seconds and isn't a big contributor to the load time.

The real delay starts after the following is printed:

"[Dbg] [Sensors.cc:270] Initializing render context
[Msg] Loading plugin [ignition-rendering-ogre2]"

Followed soon after by these errors, which is where the major time out happens:

     "[Wrn] [Component.hh:144] Trying to serialize component with data type [St6vectorIdSaIdEE], which doesn't have `operator<<`. Component will not be serialized.
      [Wrn] [Component.hh:144] Trying to serialize component with data type [St6vectorIdSaIdEE], which doesn't have `operator<<`. Component will not be serialized.
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Component.hh:189] Trying to deserialize component with data type [St6vectorIdSaIdEE], which doesn't have `operator>>`. Component will not be deserialized."

Directly after this line, which is when the rendering memory buffer is deleted, the models load, and the world is shown with the robot in the GUI.

I've tried updating the deprecated ignitionrobotics.org URLs before, as well as replacing them with the local model syntax. All it did was stop the deprecation warning from being printed.

From what you've said, it sounds like the error is most likely caused by Ogre 2.2. Is there a way to make the memory buffer deletion happen sooner with Ogre 2.2?

An alternative reason: Ignition Fuel is trying to load models that are no longer accessible. Is there a way to set up Ignition Fortress to use local files by default before attempting to download?

@iche033
Contributor

iche033 commented Apr 11, 2024

> An alternative reason: Ignition Fuel is trying to load models that are no longer accessible. Is there a way to setup Ignition Fortress to use local files by default prior to attempting to download?

Gazebo looks in the local cache first before downloading models from Fuel, so if the models are available in ~/.gz/fuel/.. it should just load those.
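To confirm that a given model really is in the local cache, and under which host directory, a Python sketch like the following maps a Fuel URL to the directory it would be expected in. The layout `<root>/<host>/<owner>/models/<name>/<version>/`, the lower-casing of owner and model name, the default root `~/.ignition/fuel` and the `IGN_FUEL_CACHE_PATH` override are all assumptions based on the cache path quoted at the top of this issue; check them against your installation.

```python
import os
from pathlib import Path
from urllib.parse import unquote, urlparse


def cache_root():
    # Assumption: Fortress-era Fuel tools default to ~/.ignition/fuel and honour
    # IGN_FUEL_CACHE_PATH as an override. Verify against your installation.
    override = os.environ.get("IGN_FUEL_CACHE_PATH")
    return Path(override) if override else Path.home() / ".ignition" / "fuel"


def expected_cache_dir(fuel_url, root):
    """Map https://<host>/1.0/<owner>/models/<name> to an assumed cache dir.

    Lower-casing owner and name mirrors the path quoted in this issue
    (.../openrobotics/models); it is an assumption, not a spec.
    """
    parsed = urlparse(fuel_url)
    parts = [unquote(p) for p in parsed.path.strip("/").split("/")]
    # Expected shape: ["1.0", owner, "models", name, ...]
    if len(parts) < 4 or parts[2].lower() != "models":
        raise ValueError(f"not a Fuel model URL: {fuel_url}")
    owner, name = parts[1].lower(), parts[3].lower()
    return root / parsed.netloc / owner / "models" / name


def is_cached(fuel_url, root):
    d = expected_cache_dir(fuel_url, root)
    # Each cached model is expected to live in a version sub-directory.
    return d.is_dir() and any(child.is_dir() for child in d.iterdir())
```

For example, calling `is_cached(url, cache_root())` for each `<uri>` in the world lists which models would, under these assumptions, have to be fetched from the server.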

> From what you've said it sounds like the error is most likely an issue created by Ogre2.2. Is there a way to implement the memory buffer deletion sooner with Ogre2.2?

I'm not sure that's the reason for the slowdown. You mentioned that it also hangs with Ogre1, which makes me think it's caused by something else.

One thing I would try first is to comment out some models in the world and see whether certain models are taking too long to load.
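Commenting models out by hand gets tedious for a large world. This is a minimal Python sketch, assuming the world's models are added with top-level `<include>` elements inside `<world>`, that produces one copy of the world with each include removed in turn; each copy can then be loaded and timed. Note that `xml.etree` drops XML comments, so the output is only meant for testing.

```python
import xml.etree.ElementTree as ET


def world_variants(sdf_text):
    """Yield (uri, variant_sdf) pairs, one per top-level <include> in <world>.

    Each variant is the whole world with that single <include> removed.
    sdf_text may be str or bytes.
    """
    root = ET.fromstring(sdf_text)
    world = root.find("world")
    if world is None:
        raise ValueError("no <world> element under the root")
    uris = [inc.findtext("uri", default="?") for inc in world.findall("include")]
    for idx, uri in enumerate(uris):
        variant = ET.fromstring(sdf_text)  # fresh tree for every variant
        vworld = variant.find("world")
        vworld.remove(vworld.findall("include")[idx])
        yield uri, ET.tostring(variant, encoding="unicode")
```

Reading the world file in binary mode and passing the bytes lets the parser honour the file's own encoding declaration. If removing one particular include makes the load fast, that model is the one to look at.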

@Space-Swarm
Author

Space-Swarm commented Apr 15, 2024

Thanks for the suggestion @iche033! I have been using the simple_tunnel_02 map and one other map, and have commented out different model files. I removed every model file except the robot, one by one, until none were left, and tried different robots (it is very funny to watch a single robot falling through a non-existent floor!). Any of the model files causes this error to appear in the Ogre2 rendering log:

"Can't assign material scene::Material(65447) because this Material does not exist. Have you forgotten to define it in a .material script?"

It is only after the memory is cleared that the GUI loads with the robot:
WARNING: Deleting mapped buffer without having it unmapped. This is often sign of a resource leak or a bad pattern. Umapping the buffer for you...

After digging into it further and trying different approaches, I have reached the following conclusion which likely explains the situation:

There is a porting problem from Ogre 2.1 to Ogre 2.2 that was introduced when the updates were rolled out to gz-rendering6. The error still happens with Ogre1 because the gz-rendering6 version stays the same, so the code still hangs when launched with the older Ogre1 engine instead.

I have identified the particular push where related areas were discussed in gz-rendering6: gazebosim/gz-rendering#223

Tagging the relevant contributors from that push - @mjcarroll @ahcorde @chapulina @darksylinc

This is also in line with what I am experiencing with EGL support in Ignition Fortress: the headless version is actually far slower than the normal GUI version, which it shouldn't be. This suggests that the port from Ogre 2.1 to Ogre 2.2 introduced a fundamental bug of some kind in gz-rendering6 and that EGL rendering is not happening as it should. There could be inefficiencies in the way Ogre 2.2 renders, but the bugs I'm experiencing lead me to think the problem lies in gz-rendering6/Ignition Fortress rather than in Ogre 2.2. Fixing it may also solve related bugs that cropped up with Ignition Fortress (e.g. #1116 and #1370), so I think it's worth other people taking a look. Unfortunately, it's a bit beyond my skill level to solve myself.

I believe there may also be a difference between how the .material files in the Fuel models are structured and how the ones that ship with Ogre 2.2 (and already work) are structured. I have attached the two corresponding files, which highlight that difference, although it may not be a contributing factor.

skybox.material (ogre2.2).txt
tunnel_tile.material (fuel model material).txt

@peci1
Contributor

peci1 commented Apr 19, 2024

I'm now running a very similar installation and I don't observe this delay. I run it on a NUC with an Intel GPU, though, and with ctu_cras_norlab_absolem_sensor_config_1.

@peci1
Contributor

peci1 commented Apr 19, 2024

How long is the delay you observe? Mine is about 1 minute from the start of ign-rendering-ogre2 loading (with a single robot model).

@Space-Swarm
Author

I've found this to be very dependent on the map in question and possibly on the specs of the PC. It can range from 1 minute to 10-15 minutes with a map like tunnel_circuit_01. Should I open a separate issue about headless EGL processing being slower than a normal run with Fortress?

@darksylinc
Contributor

> I've found this very dependent on the map in question and possibly the specs of the PC. It can range from 1 minute to 10-15 minutes with a map like tunnel_circuit_01. Should I open up a separate issue regarding EGL processing headless being slower than the normal run through with fortress?

Is that on Linux or on Windows (or Windows via WSL)?

When you mentioned headless, I remembered that if no monitors are plugged into a Windows machine, the OS will disable the GPUs and tons of problems appear. You could be running on full software emulation.

@Space-Swarm
Author

Space-Swarm commented Apr 23, 2024

> > I've found this very dependent on the map in question and possibly the specs of the PC. It can range from 1 minute to 10-15 minutes with a map like tunnel_circuit_01. Should I open up a separate issue regarding EGL processing headless being slower than the normal run through with fortress?
>
> Is that on Linux or on Windows (or Windows via WSL)?
>
> When you mentioned headless, I remembered that if no monitors are plugged to Windows, the OS will disable the GPUs and tons of problems appear. You could be running on full SW emulation.

Thanks so much for the feedback @darksylinc, this is a bit of a head-scratcher! It is on native Ubuntu 20.04 with ROS Noetic, Ignition Fortress, and a monitor plugged in. I am running headless via the following launch script using the "headless:=true" option: https://github.com/osrf/subt/blob/26fd5da5cc0d7dbbcd269b30752ca305d2bba3d5/subt_ign/launch/competition.ign#L387C13-L387C14

I've tested headless mode on different runs using 1, 3 and 5 robots for 1 hour of elapsed time (real time, rather than sim time), and the RTF for the headless runs is lower than for the normal run with the GUI in every case. I believe this indicates one of three things:

  1. The port of gz-rendering6 from Ogre 2.1 to Ogre 2.2 was implemented with a bug that makes headless slower
  2. The setup using that particular launch file and the SubT simulator is misconfigured
  3. Ogre 2.2 renders less efficiently in headless mode
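To compare runs on the same footing, it may help to compute the real-time factor over the whole run, RTF = Δt_sim / Δt_real, rather than averaging instantaneous readings. This is a small Python sketch; the `Sample` type and how the sim/real timestamps are collected are assumptions, so plug in however the runs are currently recorded.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sim_time: float   # seconds of simulated time
    real_time: float  # seconds of wall-clock time


def overall_rtf(samples):
    """Real-time factor over the whole run: delta(sim) / delta(real)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]
    real = last.real_time - first.real_time
    if real <= 0:
        raise ValueError("real time must increase")
    return (last.sim_time - first.sim_time) / real


def relative_slowdown(headless_rtf, gui_rtf):
    """Fraction by which the headless run is slower (positive = slower)."""
    return 1.0 - headless_rtf / gui_rtf
```

With hypothetical numbers, a GUI run that covers 2880 s of sim time in one hour of wall-clock time has RTF 0.8, a headless run covering 2160 s has RTF 0.6, and `relative_slowdown(0.6, 0.8)` gives 0.25, i.e. headless is 25% slower.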

I believe the first explanation is the most likely, given the replicated error we are encountering when loading materials in the rendering log. Any help would be much appreciated, as solving this bug around headless being slower is really important for moving forward with my PhD work.

@iche033
Contributor

iche033 commented Apr 25, 2024

AFAIK the "headless:=true" option through ROS launch script just disables GUI window but does not actually enable EGL (a little confusing because both are sometimes referred to as headless).

> the RTF for the headless run throughs are slower than the normal run with the GUI in every case

This is really weird. I haven't seen this issue before.

Are you able to share your ogre2.log?

@darksylinc
Contributor

darksylinc commented Apr 25, 2024

The simplest way to address this is to debug it yourself:

  1. OPTIONAL: Install debug symbols for system libs: sudo apt install libc6-dbg libstdc++6-10-dbg (note: the libstdc++X-Y-dbg package name may differ depending on your distro version).
  2. Build Gazebo from source in Debug mode. To do so, run colcon build --cmake-args -DBUILD_TESTING=OFF -DCMAKE_BUILD_TYPE=Debug --merge-install for the colcon step (I disabled tests to speed up compilation since they shouldn't be needed).
  3. Verify that your debug Gazebo build works as expected.
  4. Install QtCreator.
  5. Start a terminal.
  6. Just like in the documentation, run the command . ~/workspace/install/setup.bash (note the period at the beginning) so that the environment variables are set.
  7. Launch QtCreator FROM WITHIN THIS TERMINAL (so that QtCreator inherits the environment variables from the previous step).
  8. Go to Debug -> Start and Debug external application.
  9. Enter Gazebo's executable and parameters as in this picture (look at Local executable, Cmd line arguments, and Working directory):
    Screenshot_2024-04-24_22-19-42
  10. Launch it

Once it has launched, start doing whatever triggers this bug (assuming it requires further interaction).
When it starts taking too long (i.e. the bug finally manifests), hit "Pause":

Screenshot_2024-04-24_22-24-21

Once paused it may look like this:
Screenshot_2024-04-24_22-26-03

What's relevant is the call stack and the threads enumeration.
This is the callstack, please Right Click -> "Copy contents to Clipboard" and paste it here:

Screenshot_2024-04-24_22-27-14

The other thing is the Threads button:

Untitled

In this example we have very few threads, but Gazebo has A LOT of threads. Go through them one by one (you can use the mouse wheel to scroll through the Threads very quickly) and see if anything looks stuck. Each time you change threads, the call stack changes. If anything looks relevant or "stuck" to you, please paste it here.

Then click the Pause button again to unpause it, and repeat this step to see whether it's stuck in the same place or somewhere else. It is important to know whether the app is stuck in the same place or keeps jumping between different locations.

If successful, the call stack for each thread you paste should look like this:

                                                                                                         
                                                                                                         
1  futex_abstimed_wait_cancelable                         futex-internal.h           320  0x7ffff7d1d7d1 
2  __pthread_cond_wait_common                             pthread_cond_wait.c        520  0x7ffff7d1d7d1 
3  __pthread_cond_timedwait                               pthread_cond_wait.c        665  0x7ffff7d1d7d1 
4  ??                                                                                     0x7fffb4798f4a 
5  ??                                                                                     0x7fffb4793ec1 
6  ??                                                                                     0x7fffb4793e73 
7  Ogre::VulkanWindowSwapChainBased::acquireNextSwapchain OgreVulkanWindow.cpp       538  0x7fffc59826f7 
8  Ogre::VulkanQueue::commitAndNextCommandBuffer          OgreVulkanQueue.cpp        1295 0x7fffc5943978 
9  Ogre::VulkanDevice::commitAndNextCommandBuffer         OgreVulkanDevice.cpp       685  0x7fffc5911f35 
10 Ogre::VulkanRenderSystem::_endFrameOnce                OgreVulkanRenderSystem.cpp 2028 0x7fffc595b109 
11 Ogre::CompositorManager2::_swapAllFinalTargets         OgreCompositorManager2.cpp 828  0x7ffff70e2e7a 
12 Ogre::Root::_updateAllRenderTargets                    OgreRoot.cpp               1575 0x7ffff6ee6201 
13 Ogre::Root::renderOneFrame                             OgreRoot.cpp               1104 0x7ffff6ee6108 
14 Demo::GraphicsSystem::update                           GraphicsSystem.cpp         427  0x22be68       
15 Demo::MainEntryPoints::mainAppSingleThreaded           MainLoopSingleThreaded.cpp 135  0x23b865       
16 mainApp                                                PbsMaterials.cpp           25   0x222ebb       
17 main                                                   MainEntryPointHelper.h     40   0x222db9       

Projects
Status: Inbox
Development

No branches or pull requests

4 participants