
Difference in memory usage of mlp and kan #23

Open

c-pupil opened this issue May 16, 2024 · 1 comment

@c-pupil commented May 16, 2024

Hello,
If the input tensor size is [64, 28x28] and the hidden layers are [256, 256, 256, 256], the memory usage of the MLP and the KAN is similar: 382 MB and 500 MB respectively, which is consistent with the experimental results.
However, if the input tensor size is [36864, 28x28], the memory usage of the two differs enormously: 844 MB and 14468 MB respectively. What is the reason for this? The KAN is initialized exactly as in the example, and both models run on a GPU.
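Not part of the original report, but a minimal sketch of how the peak-memory comparison could be reproduced, assuming the `KAN` class exported by this repo (`efficient_kan`), an MNIST-sized output of 10, and measurement via `torch.cuda.max_memory_allocated`:

```python
# Minimal sketch (assumptions: efficient_kan import path, output size 10, CUDA available).
import torch
import torch.nn as nn
from efficient_kan import KAN  # assumed import path for this repo

def peak_memory_mb(model, x):
    model = model.cuda()
    x = x.cuda()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    out = model(x)
    out.sum().backward()  # include backward so the taped intermediates are counted
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024**2

batch_size = 36864  # the large batch from the report; try 64 for the small case
x = torch.randn(batch_size, 28 * 28)

mlp = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
kan = KAN([28 * 28, 256, 256, 256, 256, 10])  # hidden layers [256, 256, 256, 256]

print(f"MLP peak: {peak_memory_mb(mlp, x):.0f} MB")
print(f"KAN peak: {peak_memory_mb(kan, x):.0f} MB")
```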

@Blealtan (Owner)

The parameters are relatively few in this case of [batch_size=36864, input_size], but the intermediate storage (taped variables/graph in terms of autograd) is huge. I think it's due to the B-spline computation creating too many intermediate variables. A big fused kernel might help, but I don't have time to work on this; it needs some math work, manually differentiating the B-spline basis functions.
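For context (not from the thread), a rough back-of-envelope estimate of just the spline-basis intermediates, assuming the defaults grid_size=5 and spline_order=3 and the layer sizes from the report, gives a sense of the scale:

```python
# Back-of-envelope sketch (assumptions: default grid_size=5, spline_order=3,
# float32 activations). Each KAN layer expands its input into roughly
# (grid_size + spline_order) basis values per feature, and those tensors of
# shape [batch, in_features, grid_size + spline_order] are kept for backward.
batch = 36864
layer_in_features = [28 * 28, 256, 256, 256, 256]  # inputs of each KAN layer
grid_size, spline_order = 5, 3
bytes_per_float = 4

total = 0
for in_features in layer_in_features:
    # one spline-basis tensor per layer; the recursive basis construction keeps
    # several such intermediates alive, so this is only a lower bound
    total += batch * in_features * (grid_size + spline_order) * bytes_per_float

print(f"~{total / 1024**2:.0f} MB for a single set of spline-basis tensors")
# ≈ 2 GB as a lower bound; with multiple intermediates per layer held by
# autograd, totals in the 10+ GB range at batch 36864 are plausible.
```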
