nd rasterizer is 10x slower than rasterizer #68

Open
zubair-irshad opened this issue Nov 2, 2023 · 10 comments

Comments

zubair-irshad commented Nov 2, 2023

Hi, great work! The nd rasterizer is around 10x slower than the sh rasterizer. To be precise, my model inference time with sh rasterization is 0.008 s, which gives me >100 FPS as described in the original Gaussian splatting paper, but just adding nd rasterization reduces it to 0.075 s and 13 FPS.

Is there a way to make it faster? Any intuition would be greatly appreciated. With nd rasterization, it looks like we lose the key benefit of Gaussian splatting, i.e. its speed. Thank you again for the awesome work!

vye16 (Collaborator) commented Nov 2, 2023 via email

zubair-irshad (Author) commented

Thank you for the great intuition and detailed response. My channel size is currently 29, but I am considering increasing the feature size to 128 or even 256, and my worry is that this will be even slower than 13 FPS. I will try the batched RGB rasterizer in a for loop as you suggested and see if it gives a higher FPS, thank you!
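
For reference, a minimal sketch of that channel-chunked approach, assuming a hypothetical `rasterize_rgb` callable that maps an (N, 3) per-Gaussian color tensor to an (H, W, 3) image (in practice you would close over the actual rasterizer call and its geometry arguments):

```python
import torch

def rasterize_nd_batched(features, rasterize_rgb, chunk=3):
    """Rasterize an (N, C) per-Gaussian feature tensor by running a
    3-channel rasterizer over channel chunks and concatenating results."""
    outs = []
    n, c = features.shape
    for i in range(0, c, chunk):
        block = features[:, i:i + chunk]
        # Zero-pad the last chunk so the RGB rasterizer always sees 3 channels.
        if block.shape[1] < chunk:
            pad = torch.zeros(n, chunk - block.shape[1],
                              device=features.device, dtype=features.dtype)
            block = torch.cat([block, pad], dim=1)
        outs.append(rasterize_rgb(block))  # each call returns (H, W, 3)
    # Concatenate along channels and drop any padded channels.
    return torch.cat(outs, dim=-1)[..., :c]
```

Note that each chunk repeats the full per-pixel alpha-compositing pass over the same Gaussians, which is why the loop may not beat the nd kernel.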

zubair-irshad commented Nov 3, 2023

@vye16 Reporting back what I found. Implementing a for loop that rasterizes the channels in batches (i.e. 0-3, 3-6, etc.) instead of ND rasterization is actually slightly worse: I didn't find it to improve performance. My guess is that this is due to the for loop, which has to run 10 times for the channel size I am using (30). Any other intuition on improving performance is greatly appreciated, thank you!

Just to provide more specifics, the per-iteration times for a 640x480 image are:

- nd rasterization with N=30: ~74-76 ms
- batched rasterization (shared above): ~82-85 ms
- sh-only (3-channel) rendering: ~16 ms

The same results translate to FPS numbers during inference, i.e. 13 FPS for nd rasterization with N=30 vs. >100 FPS for sh-only rasterization.
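
As an aside, here is a minimal sketch of how per-iteration GPU time can be measured for numbers like these, assuming a hypothetical zero-argument `render_fn` closure wrapping the rasterizer call. CUDA kernel launches are asynchronous, so event-based timing with explicit synchronization avoids under-reporting:

```python
import torch

def time_per_iter_ms(render_fn, warmup=10, iters=100):
    """Average per-call GPU time in milliseconds for `render_fn`."""
    for _ in range(warmup):   # warm up to exclude one-time setup costs
        render_fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()  # drain previously queued work
    start.record()
    for _ in range(iters):
        render_fn()
    end.record()
    torch.cuda.synchronize()  # wait until the timed work has finished
    return start.elapsed_time(end) / iters
```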

zubair-irshad commented Nov 3, 2023

Update: with the batched implementation, FPS increased to 25, though that is still well below the >100 FPS of the original rasterizer implementation.

zubair-irshad (Author) commented

@vye16 @maturk Any plans to support more register channels, i.e. MAX_REGISTER_CHANNELS > 3, perhaps 16 or 32, to achieve the same level of optimization that the native sh rasterizer gives? I am happy to create a PR, though just increasing this number causes errors elsewhere, for instance AT_ERROR("v_colors must have dimensions (N, 3)"); should I change anything else in the CUDA code to make this work?

I am also wondering whether there are any downsides to specifying a MAX_REGISTER_CHANNELS of 128, 256, or 512; would it affect memory? I would think GPUs with more memory could support this? Any intuition is greatly appreciated.
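
For a rough sense of the trade-off, a back-of-envelope sketch with assumed numbers (it assumes the forward kernel keeps one float32 accumulator per channel in registers per thread): recent NVIDIA GPUs cap register usage at 255 per thread, so very large channel counts force spills to slow local memory or lower occupancy rather than giving a speedup; this is a register-pressure question more than a total-memory one.

```python
# Back-of-envelope: how much of the per-thread register budget the channel
# accumulator alone would consume, assuming one 32-bit register per channel.
MAX_REGS_PER_THREAD = 255  # per-thread register cap on recent NVIDIA GPUs

for channels in (3, 16, 32, 128, 256):
    frac = channels / MAX_REGS_PER_THREAD
    note = (", accumulator alone exceeds the cap (guaranteed spill)"
            if channels > MAX_REGS_PER_THREAD else "")
    print(f"{channels:>3} channels -> {frac:5.1%} of register budget{note}")
# High register pressure reduces how many warps the scheduler can keep
# resident (occupancy), so throughput can fall well before any spilling.
```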

vye16 (Collaborator) commented Dec 6, 2023 via email

zubair-irshad (Author) commented

Thanks @vye16! I am happy to work on it and make a PR. Any pointers on where to start, or which parts to look at changing first, would be appreciated, thanks a lot!

SeanGuo063 commented Dec 21, 2023

Any update on this issue? I am also working on rendering high-dimensional features and would like to know how to speed up the nd rasterizer.

kerrj (Collaborator) commented Feb 14, 2024

#130 works towards this issue, let me know if you try it out! @zubair-irshad

zubair-irshad (Author) commented

This is great, I will check it ASAP. Thanks @kerrj.
