
[Iron] Longer loading times for dedicated GPUs #1131

Open
firesurfer opened this issue Jan 27, 2024 · 4 comments

@firesurfer

Hi,
we have a rather large robot mesh, which takes some time to load in general.
For development we have some machines with a dedicated (AMD) GPU and some machines with an integrated (AMD/Intel) GPU.
The loading times on the machines with dedicated GPUs seem to be far longer than on the ones with an integrated GPU.

On the machines with integrated GPUs we typically see loading times for our model of around 10-20 s.
The most extreme loading time is on my developer machine, where I have to wait ~1.5 min for the model to be fully loaded in rviz.
What's interesting/puzzling is that my machine has far more powerful hardware (Ryzen 9 7950X, RX 6800) than the machines with an integrated GPU.

I am running the ROS installation in a podman container on my machine, but I have also seen the same increase in loading time on native installations.
I haven't made any exact measurements so far, but as I wrote, the difference in loading times is so vast that it is easy to notice.

@tfoote
Contributor

tfoote commented Jan 29, 2024

Is this correlated with shared memory address space, such that the model doesn't need to be copied into a separate memory space? How big is your model (file size / number of vertices)?

As an aside, you may want to think about using a down-sampled mesh for visualization to avoid these load times. It's very easy to export very high poly count meshes from CAD tools, but many of those features aren't relevant for visualization; removing them can significantly reduce the computational load and enable the use of lower-performance hardware. A good threshold to think about is how much of the detail you can actually see in the visualizer window.
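
To make the idea concrete, here is a rough sketch of an offline down-sampling pass using Open3D's quadric decimation (just one possible tool, not something rviz does itself; the file names are placeholders):

```python
# Sketch: down-sample a visualization mesh offline with Open3D.
# Assumes `pip install open3d`; the file names are placeholders.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("base_link_full.stl")
print(f"before: {len(mesh.triangles)} triangles")

# Keep ~10% of the triangles for the visual mesh; the full-resolution
# mesh can still be used wherever the detail actually matters.
target = max(1, len(mesh.triangles) // 10)
simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=target)
simplified.compute_triangle_normals()  # STL export needs face normals
print(f"after: {len(simplified.triangles)} triangles")

o3d.io.write_triangle_mesh("base_link_visual.stl", simplified)
```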

@firesurfer
Author

firesurfer commented Jan 30, 2024

> Is this correlated with shared memory address space, such that the model doesn't need to be copied into a separate memory space? How big is your model (file size / number of vertices)?

The total size of all meshes is 57 MB; most of them are STL files, but there are also some DAE files. According to MeshLab, the total number of vertices is 559,286. Some of the meshes are used multiple times in the model.
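
In case it's useful, the count can be reproduced without MeshLab with a small script along these lines (a sketch assuming trimesh and pycollada are installed; the meshes/ path is a placeholder):

```python
# Sketch: total up file size and vertex count across all meshes,
# similar to the MeshLab numbers above. Assumes `pip install trimesh
# pycollada` (pycollada for the .dae files); meshes/ is a placeholder.
from pathlib import Path
import trimesh

total_vertices = 0
total_bytes = 0
for path in Path("meshes").rglob("*"):
    if path.suffix.lower() not in (".stl", ".dae"):
        continue
    mesh = trimesh.load(str(path), force="mesh")  # flatten scenes to one mesh
    total_vertices += len(mesh.vertices)
    total_bytes += path.stat().st_size
    print(f"{path.name}: {len(mesh.vertices)} vertices")

print(f"total: {total_vertices} vertices, {total_bytes / 1e6:.1f} MB")
```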

I agree with you that one wants to reduce the number of polygons, and we have already done that.

The reason I opened this issue is the vast difference in loading times between integrated GPUs (where the time between the rviz window first showing up and the model being fully loaded is ~7 s on a Ryzen 7 5700G; I just timed it) and dedicated GPUs, where the same model takes more than a minute to load on a far more powerful system (Ryzen 9 7950X, RX 6800).

I am not very familiar with 3D graphics programming, but as far as I know it can make quite a difference how exactly meshes are transferred from system RAM to GPU RAM. This bottleneck does not exist on a system with an integrated GPU.
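
To rule the upload path in or out, I imagine one could time a raw vertex upload in isolation with something like this (a sketch assuming PyOpenGL, glfw and numpy; I haven't verified that this is where rviz actually spends its time):

```python
# Sketch: time a raw vertex upload to the GPU, independent of rviz.
# Assumes `pip install PyOpenGL glfw numpy`. On an integrated GPU the
# "copy" may be little more than a mapping; on a discrete GPU it has
# to cross the PCIe bus into VRAM.
import time
import numpy as np
import glfw
from OpenGL.GL import (
    GL_ARRAY_BUFFER, GL_STATIC_DRAW,
    glGenBuffers, glBindBuffer, glBufferData, glFinish,
)

glfw.init()
glfw.window_hint(glfw.VISIBLE, glfw.FALSE)  # no visible window needed
window = glfw.create_window(64, 64, "upload-test", None, None)
glfw.make_context_current(window)

# Roughly our model size: ~559k vertices, 3 floats each.
vertices = np.random.rand(559286, 3).astype(np.float32)

vbo = glGenBuffers(1)
glBindBuffer(GL_ARRAY_BUFFER, vbo)
start = time.perf_counter()
glBufferData(GL_ARRAY_BUFFER, vertices.nbytes, vertices, GL_STATIC_DRAW)
glFinish()  # block until the driver has finished all pending GL work
print(f"upload took {(time.perf_counter() - start) * 1e3:.2f} ms")

glfw.terminate()
```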

@tfoote
Contributor

tfoote commented Mar 8, 2024

Those are very big meshes with a lot of vertices. A more common mesh file size is ~50kB. I'm using the PR2 Description as a reference: https://github.com/PR2/pr2_common/tree/melodic-devel/pr2_description/meshes/base_v0

How much did you reduce the complexity? This still seems very high, to the point that it's actually more than you can effectively render, with the number of vertices approaching the same order of magnitude as the number of pixels in a rendering window.
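
Back-of-the-envelope with the numbers you gave (assuming a full-HD window for the sake of the comparison):

```python
# Sketch: compare the vertex count to the pixel count of a typical
# rendering window (full HD assumed).
vertices = 559_286        # MeshLab total from the comment above
pixels = 1920 * 1080      # ~2.07 million pixels
print(f"vertices / pixels = {vertices / pixels:.2f}")  # -> ~0.27
```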

Discrete GPUs have their own memory, so there's an inherent requirement that resources be copied into that dedicated memory, which in turn requires an alternative code path, and that hardware lane has inherent bandwidth limitations.

There's likely room for improvements in the loading code, but I think the place with the most room for improvement is optimizing the meshes.

@firesurfer
Author

firesurfer commented Mar 14, 2024

I agree with you that shrinking the meshes will definitely help, but I do not think that this is the actual issue. If the complexity were too high, we would see it in the FPS, but we still run into the frame limit of 31 fps. Furthermore, I don't think that 57 MB of meshes is too much for a modern PC; modern PC games use far more and far larger meshes.

I did a bit of research on the bandwidth between system memory and GPU memory, and I don't think memory bandwidth is the limiting factor in this case, as the PCIe link should be able to transfer multiple GB/s.
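
Back-of-the-envelope (assuming a PCIe 4.0 x16 link; the effective bandwidth figure is a guess, not a measurement):

```python
# Sketch: how long the raw transfer of 57 MB of mesh data should take
# over PCIe. Bandwidth figures are assumptions, not measurements.
mesh_bytes = 57e6         # total mesh size from above, 57 MB
bandwidth = 20e9          # ~PCIe 4.0 x16, conservatively, in bytes/s
print(f"raw transfer: {mesh_bytes / bandwidth * 1e3:.2f} ms")  # ~2.85 ms
```

Even if the effective bandwidth were ten times lower, the raw copy would take tens of milliseconds, not minutes, so the time must be going somewhere else (per-mesh overhead, synchronization, or driver behavior).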

Further reducing the mesh size is rather difficult for us, as we need to export the meshes automatically from our CAD model (we also have a partially automated URDF export). Afterwards we use Blender's Decimate tool to reduce the number of vertices in the meshes (roughly as sketched below). Our machine also has multiple parts where we need a high level of detail in order to visually confirm certain positions and interactions between parts. We have already reduced the complexity as much as possible for most parts.
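
Our decimation step looks roughly like this (a simplified sketch of the idea; paths and the ratio are placeholders, and the STL operators assume a pre-4.0 Blender Python API):

```python
# Sketch: batch-decimate exported CAD meshes with Blender's Decimate
# modifier, run as `blender --background --python decimate.py`.
# Assumes the pre-4.0 API (bpy.ops.import_mesh.stl); paths and the
# ratio are placeholders.
import bpy
from pathlib import Path

SRC = Path("meshes_full")      # hypothetical CAD export directory
DST = Path("meshes_visual")    # hypothetical output directory
DST.mkdir(exist_ok=True)

for stl in SRC.glob("*.stl"):
    bpy.ops.wm.read_factory_settings(use_empty=True)  # start from a clean scene
    bpy.ops.import_mesh.stl(filepath=str(stl))
    obj = bpy.context.selected_objects[0]
    bpy.context.view_layer.objects.active = obj

    mod = obj.modifiers.new(name="Decimate", type='DECIMATE')
    mod.ratio = 0.2            # keep ~20% of the faces; tuned per part
    bpy.ops.object.modifier_apply(modifier=mod.name)

    bpy.ops.export_mesh.stl(filepath=str(DST / stl.name), use_selection=True)
```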
