
Difference in memory usage of mlp and kan #23

Open

c-pupil opened this issue May 16, 2024 · 1 comment

@c-pupil commented May 16, 2024

Hello,
If the input tensor size is [64, 28x28] and the hidden layers are [256, 256, 256, 256], the memory usage of the MLP and the KAN is similar: 382 MB and 500 MB respectively, which is consistent with the experimental results.
However, if the input tensor size is [36864, 28x28], the memory usage of the two differs enormously: 844 MB and 14468 MB respectively. What is the reason for this? The KAN is initialized exactly as in the example, and both models run on a GPU.
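Not part of the original report, but a minimal sketch of how the peak-memory comparison could be reproduced, assuming the `KAN` class exported by this repo (`efficient_kan`), an MNIST-sized output of 10, and measurement via `torch.cuda.max_memory_allocated`:

```python
# Minimal sketch (assumptions: efficient_kan import path, output size 10, CUDA available).
import torch
import torch.nn as nn
from efficient_kan import KAN  # assumed import path for this repo

def peak_memory_mb(model, x):
    model = model.cuda()
    x = x.cuda()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    out = model(x)
    out.sum().backward()  # include backward so the taped intermediates are counted
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024**2

batch_size = 36864  # the large batch from the report; try 64 for the small case
x = torch.randn(batch_size, 28 * 28)

mlp = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
kan = KAN([28 * 28, 256, 256, 256, 256, 10])  # hidden layers [256, 256, 256, 256]

print(f"MLP peak: {peak_memory_mb(mlp, x):.0f} MB")
print(f"KAN peak: {peak_memory_mb(kan, x):.0f} MB")
```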

@Blealtan (Owner)

The parameters are relatively few in this case of [batch_size=36864, input_size], but the intermediate storage (taped variables/graph in terms of autograd) is huge. I think it's due to the B-spline computation creating too many intermediate variables. A big fused kernel might help, but I don't have time to work on this; it needs some math work, manually differentiating the B-spline basis functions.
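For context (not from the thread), a rough back-of-envelope estimate of just the spline-basis intermediates, assuming the defaults grid_size=5 and spline_order=3 and the layer sizes from the report, gives a sense of the scale:

```python
# Back-of-envelope sketch (assumptions: default grid_size=5, spline_order=3,
# float32 activations). Each KAN layer expands its input into roughly
# (grid_size + spline_order) basis values per feature, and those tensors of
# shape [batch, in_features, grid_size + spline_order] are kept for backward.
batch = 36864
layer_in_features = [28 * 28, 256, 256, 256, 256]  # inputs of each KAN layer
grid_size, spline_order = 5, 3
bytes_per_float = 4

total = 0
for in_features in layer_in_features:
    # one spline-basis tensor per layer; the recursive basis construction keeps
    # several such intermediates alive, so this is only a lower bound
    total += batch * in_features * (grid_size + spline_order) * bytes_per_float

print(f"~{total / 1024**2:.0f} MB for a single set of spline-basis tensors")
# ≈ 2 GB as a lower bound; with multiple intermediates per layer held by
# autograd, totals in the 10+ GB range at batch 36864 are plausible.
```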
