About GPU memory usage #46

Monkey-D-Luffy-star · 2021-10-29T03:20:26Z

If the non-local block is applied to a low-level feature map, I get a CUDA out-of-memory error. Is this due to the amount of memory required to compute the attention matrix?
Looking forward to your reply.

buncybunny · 2021-11-04T09:02:37Z

I'm also hitting a CUDA out-of-memory error with the non-local block. I'm trying to use it at the top of my network, in the conv head for bbox regression in Faster R-CNN. Do you have any ideas for addressing this?

AlexHex7 · 2021-11-10T05:18:38Z

@Monkey-D-Luffy-star @vombategeht Hi~

The larger the feature maps are (height, width, depth), the more memory the matrix multiplication occupies.
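As a rough illustration of why size matters so much, here is a back-of-the-envelope sketch. It assumes the block builds a full pairwise attention matrix over all positions, stored in float32, and it counts only that one matrix (not the backward pass or other activations). The helper name `attention_matrix_bytes` is hypothetical and not part of this repo.

```python
def attention_matrix_bytes(batch, height, width, depth=1, bytes_per_element=4):
    """Estimate the bytes taken by the pairwise attention matrices of one batch.

    Assumption: every position attends to every other position, so each
    sample needs an N x N matrix with N = height * width * depth.
    """
    num_tokens = height * width * depth
    return batch * num_tokens * num_tokens * bytes_per_element


if __name__ == "__main__":
    # Doubling height and width multiplies the estimate by 16.
    for side in (28, 56, 112):
        size_mib = attention_matrix_bytes(8, side, side) / 2**20
        print(f"{side}x{side} feature map, batch 8: {size_mib:.1f} MiB")
```

Under these assumptions, a batch of 8 at 56x56 needs about 300 MiB for the attention matrix alone, and the same batch at 112x112 needs about 4.7 GiB, which is why low-level (high-resolution) feature maps run out of memory first.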

When I encounter this problem, I:

1. reduce the batch size
2. downsample the feature maps
3. move the non-local block to a higher-level position
4. apply some optimizations, for example following the ideas of these papers:
   4.1. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
   4.2. Compact Generalized Non-local Network
5. follow the idea of the transformer block: split the tokens (height x width x depth) into several groups, then do self-attention within each group
6. or directly try using a transformer block
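To compare how much the downsampling and token-grouping suggestions shrink the attention matrix, here is a small counting sketch. It is only an estimate under stated assumptions: `attention_entries`, `kv_pool`, and `num_groups` are hypothetical names, groups are assumed to be equal-sized and to attend only within themselves, and `kv_pool` models pooling applied to the key/value side only.

```python
def attention_entries(num_tokens, kv_pool=1, num_groups=1):
    """Count attention-matrix entries for one sample.

    num_tokens: number of positions (height * width * depth).
    kv_pool:    assumed reduction factor on the key/value tokens
                (e.g. 4 for 2x2 pooling on a 2D map).
    num_groups: assumed number of equal token groups; attention is
                computed only inside each group.
    """
    if num_tokens % num_groups != 0:
        raise ValueError("num_tokens must be divisible by num_groups")
    queries_per_group = num_tokens // num_groups
    keys_per_group = max(1, queries_per_group // kv_pool)
    return num_groups * queries_per_group * keys_per_group


if __name__ == "__main__":
    tokens = 56 * 56
    print("full attention:       ", attention_entries(tokens))
    print("map downsampled to 28:", attention_entries(28 * 28))
    print("keys/values pooled x4:", attention_entries(tokens, kv_pool=4))
    print("16 token groups:      ", attention_entries(tokens, num_groups=16))
```

For a 56x56 map, halving the spatial size and splitting into 16 groups both cut the entry count by 16x, while pooling only the keys and values by 4 cuts it by 4x. Which trade-off is acceptable for accuracy depends on the task, so treat the numbers as a memory guide only.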

Monkey-D-Luffy-star · 2021-11-10T07:17:38Z

@AlexHex7 Thx, that helps a lot.
