nd rasterizer is 10x slower than rasterizer #68

Open
zubair-irshad opened this issue Nov 2, 2023 · 10 comments

Comments

zubair-irshad commented Nov 2, 2023

Hi, great work! The nd rasterizer is around 10x slower than the sh rasterizer. To be precise, my model inference time with sh rasterization is 0.008 s, which gives me >100 FPS as described in the original Gaussian splatting paper, but just adding nd rasterization reduces it to 0.075 s and 13 FPS.

Is there a way to make it faster? Any intuition would be greatly appreciated. With nd rasterization, it looks like we lose the key benefit of Gaussian splatting, i.e. its speed. Thank you again for the awesome work!

vye16 (Collaborator) commented Nov 2, 2023 via email

zubair-irshad (Author) commented

Thank you for the great intuition and detailed response. My channel size is currently 29, but I am considering increasing the feature size to 128 or even 256, and my worry is that this will be even slower than 13 FPS. I will try the batched RGB rasterizer in a for loop as you suggested and see if it gives a higher FPS, thank you!
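
For reference, a minimal sketch of that channel-chunked approach, assuming a hypothetical `rasterize_rgb` callable that maps an (N, 3) per-Gaussian color tensor to an (H, W, 3) image (in practice you would close over the actual rasterizer call and its geometry arguments):

```python
import torch

def rasterize_nd_batched(features, rasterize_rgb, chunk=3):
    """Rasterize an (N, C) per-Gaussian feature tensor by running a
    3-channel rasterizer over channel chunks and concatenating results."""
    outs = []
    n, c = features.shape
    for i in range(0, c, chunk):
        block = features[:, i:i + chunk]
        # Zero-pad the last chunk so the RGB rasterizer always sees 3 channels.
        if block.shape[1] < chunk:
            pad = torch.zeros(n, chunk - block.shape[1],
                              device=features.device, dtype=features.dtype)
            block = torch.cat([block, pad], dim=1)
        outs.append(rasterize_rgb(block))  # each call returns (H, W, 3)
    # Concatenate along channels and drop any padded channels.
    return torch.cat(outs, dim=-1)[..., :c]
```

Note that each chunk repeats the full per-pixel alpha-compositing pass over the same Gaussians, which is why the loop may not beat the nd kernel.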

zubair-irshad commented Nov 3, 2023

@vye16 Reporting back what I found. Implementing a for loop that rasterizes the channels in batches (i.e. 0-3, 3-6, etc.) instead of ND rasterization is actually slightly worse: I didn't find it to improve performance. My guess is that this is due to the for loop, which has to run 10 times for the channel size I am using (30). Any other intuition on improving performance is greatly appreciated, thank you!

Just to provide more specifics, the per-iteration times for a 640x480 image are:

- nd rasterization with N=30: ~74-76 ms
- batched rasterization (shared above): ~82-85 ms
- sh-only (3-channel) rendering: ~16 ms

The same results translate to FPS numbers during inference, i.e. 13 FPS for nd rasterization with N=30 vs. >100 FPS for sh-only rasterization.
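
As an aside, here is a minimal sketch of how per-iteration GPU time can be measured for numbers like these, assuming a hypothetical zero-argument `render_fn` closure wrapping the rasterizer call. CUDA kernel launches are asynchronous, so event-based timing with explicit synchronization avoids under-reporting:

```python
import torch

def time_per_iter_ms(render_fn, warmup=10, iters=100):
    """Average per-call GPU time in milliseconds for `render_fn`."""
    for _ in range(warmup):   # warm up to exclude one-time setup costs
        render_fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()  # drain previously queued work
    start.record()
    for _ in range(iters):
        render_fn()
    end.record()
    torch.cuda.synchronize()  # wait until the timed work has finished
    return start.elapsed_time(end) / iters
```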

zubair-irshad commented Nov 3, 2023

Update: with the batched implementation, FPS increased to 25, though that is still well below the >100 FPS of the original rasterizer implementation.

zubair-irshad (Author) commented

@vye16 @maturk Any plans to support more register channels, i.e. MAX_REGISTER_CHANNELS > 3, perhaps 16 or 32, to achieve the same level of optimization that the native sh rasterizer gives? I am happy to create a PR, though just increasing this number causes errors elsewhere, for instance AT_ERROR("v_colors must have dimensions (N, 3)"); should I change anything else in the CUDA code to make this work?

I am also wondering whether there are any downsides to specifying a MAX_REGISTER_CHANNELS of 128, 256, or 512; would it affect memory? I would think GPUs with more memory could support this? Any intuition is greatly appreciated.
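
For a rough sense of the trade-off, a back-of-envelope sketch with assumed numbers (it assumes the forward kernel keeps one float32 accumulator per channel in registers per thread): recent NVIDIA GPUs cap register usage at 255 per thread, so very large channel counts force spills to slow local memory or lower occupancy rather than giving a speedup; this is a register-pressure question more than a total-memory one.

```python
# Back-of-envelope: how much of the per-thread register budget the channel
# accumulator alone would consume, assuming one 32-bit register per channel.
MAX_REGS_PER_THREAD = 255  # per-thread register cap on recent NVIDIA GPUs

for channels in (3, 16, 32, 128, 256):
    frac = channels / MAX_REGS_PER_THREAD
    note = (", accumulator alone exceeds the cap (guaranteed spill)"
            if channels > MAX_REGS_PER_THREAD else "")
    print(f"{channels:>3} channels -> {frac:5.1%} of register budget{note}")
# High register pressure reduces how many warps the scheduler can keep
# resident (occupancy), so throughput can fall well before any spilling.
```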

vye16 (Collaborator) commented Dec 6, 2023 via email

zubair-irshad (Author) commented

Thanks @vye16! I am happy to work on it and make a PR. Any pointers on where to start, or which parts to look at changing first, would be appreciated, thanks a lot!

SeanGuo063 commented Dec 21, 2023

Any update on this issue? I am also working on rendering high-dimensional features and would like to know how to speed up the nd rasterizer.

kerrj (Collaborator) commented Feb 14, 2024

#130 works towards this issue, let me know if you try it out! @zubair-irshad

zubair-irshad (Author) commented

This is great, I will check it ASAP. Thanks @kerrj.
