
Can the efficient-kan model be configured for continual learning? #28

Open
lukmanulhakeem97 opened this issue May 19, 2024 · 9 comments
Labels: enhancement (New feature or request)


lukmanulhakeem97 commented May 19, 2024

Similar to what the authors showed in the official git repo, can this efficient-kan model be used in continual learning settings? For using efficient-kan in CL settings, I couldn't find some of the attributes that need to be set, as given in the official pykan:

######### cl code from pykan

setting bias_trainable=False, sp_trainable=False, sb_trainable=False is important.

otherwise KAN will have random scaling and shift for samples in previous stages

model = KAN(width=[1,1], grid=200, k=3, noise_scale=0.1, bias_trainable=False, sp_trainable=False, sb_trainable=False)

How can I set bias_trainable=False, sp_trainable=False, sb_trainable=False here? Is there a way?

@Blealtan (Owner)

Select the corresponding parameters and call .requires_grad_(False) on them. It's the same as freezing some parameters when you do parameter-efficient finetuning. If I have time later, I may add corresponding helper methods for that.

The parameters are:

sp_trainable -> spline_scaler
sb_trainable -> base_weight
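
A minimal sketch of that manual freezing, assuming the package is imported as `efficient_kan`, that the KAN module keeps its KANLinear sub-layers in `model.layers`, and that `enable_standalone_scale_spline` is left at its default so `spline_scaler` exists:

```python
import torch
from efficient_kan import KAN

model = KAN([28 * 28, 64, 10])  # layers_hidden; grid_size/spline_order left at defaults

for layer in model.layers:  # assumed: KANLinear sub-layers live in model.layers
    # sb_trainable=False  ->  freeze base_weight
    layer.base_weight.requires_grad_(False)
    # sp_trainable=False  ->  freeze spline_scaler (only present when
    # enable_standalone_scale_spline=True, the default)
    if hasattr(layer, "spline_scaler"):
        layer.spline_scaler.requires_grad_(False)

# Only hand the still-trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```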

Sadly I forgot to implement the bias term! (and thank you for letting me notice that XD)

@Blealtan self-assigned this on May 20, 2024
@Blealtan added the enhancement (New feature or request) label on May 20, 2024

ASCIIJK commented May 21, 2024

Is this the reason for the limited effectiveness in CIL (class-incremental learning)? I have tried replacing the fc layer directly with a KAN layer on an image classification task (CIFAR-100 B50inc5). It gives only a small improvement (maybe 0.5%?).

@rafaelcp

Has anyone else found that KAN's effectiveness against catastrophic forgetting holds only on 1D tasks? Since the spline locality applies to each input dimension independently, it can't isolate more complex patterns that depend on multiple input dimensions (e.g., an MNIST digit). I think a simple 2D experiment with some Gaussian bumps would be enough to demonstrate this shortcoming.

@lukmanulhakeem97 (Author)

> Is this the reason for the limited effectiveness in CIL (class-incremental learning)? I have tried replacing the fc layer directly with a KAN layer on an image classification task (CIFAR-100 B50inc5). It gives only a small improvement (maybe 0.5%?).

I'm working on a similar case. I only trained two tasks initially, but it still has the catastrophic forgetting issue.


ASCIIJK commented May 23, 2024

> Has anyone else found that KAN's effectiveness against catastrophic forgetting holds only on 1D tasks? Since the spline locality applies to each input dimension independently, it can't isolate more complex patterns that depend on multiple input dimensions (e.g., an MNIST digit). I think a simple 2D experiment with some Gaussian bumps would be enough to demonstrate this shortcoming.

I have run this experiment with the official KAN, and it shows the catastrophic forgetting issue. Specifically, we use a mixture of 2D Gaussians with 5 peaks to construct a CL task sequence, shown below:
[image: Ground_task5]

The model learns each peak from 50,000 data points. For example, the data points of the first task are shown below:
[image: Ground_task0]

Then we get the results after 5 tasks:
[image: Pred_task4]

The forgetting issue occurs on every task; for example, task 1:
[image: Pred_task1]

PS: We use the model "model = KAN(width=[2, 16, 1], grid=5, k=6, noise_scale=0.1, bias_trainable=False, sp_trainable=False, sb_trainable=False)", and we made sure the loss goes down to zero on each task, so you can see a perfect peak matching the training data of the current task. We think KAN may be hard to train on high-dimensional data without forgetting.
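
For anyone who wants to try the same kind of sequential-peaks test with this repo's implementation instead of pykan, a rough sketch is below (the `efficient_kan` import, the `KAN([2, 16, 1], grid_size=5, spline_order=3)` constructor, and the peak positions/widths are illustrative assumptions, not the exact setup above):

```python
import torch
from efficient_kan import KAN

def make_peak(center, n=50_000, std=0.1):
    """One task: inputs sampled near a 2D peak, labelled by a Gaussian bump."""
    x = center + 4 * std * (torch.rand(n, 2) - 0.5)
    y = torch.exp(-((x - center) ** 2).sum(dim=1, keepdim=True) / (2 * std ** 2))
    return x, y

# Five peak centres inside the default grid_range of [-1, 1].
centers = torch.tensor([[-0.8, -0.8], [-0.4, 0.8], [0.0, -0.4], [0.4, 0.6], [0.8, -0.2]])
tasks = [make_peak(c) for c in centers]

model = KAN([2, 16, 1], grid_size=5, spline_order=3)
# (Optionally freeze base_weight / spline_scaler here, as discussed earlier.)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for t, (x, y) in enumerate(tasks):
    for _ in range(200):                     # train on the current peak only
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    with torch.no_grad():                    # measure forgetting on earlier peaks
        errs = [round(loss_fn(model(xi), yi).item(), 4) for xi, yi in tasks[: t + 1]]
    print(f"after task {t}: per-peak MSE = {errs}")
```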

@lukmanulhakeem97 (Author)

> Is this the reason for the limited effectiveness in CIL (class-incremental learning)? I have tried replacing the fc layer directly with a KAN layer on an image classification task (CIFAR-100 B50inc5). It gives only a small improvement (maybe 0.5%?).

Bro, here you pass the output from the CNN (a 512-dimensional vector for each image) into the KAN, is that right? Can you share your code?


ASCIIJK commented May 24, 2024

> > Is this the reason for the limited effectiveness in CIL (class-incremental learning)? I have tried replacing the fc layer directly with a KAN layer on an image classification task (CIFAR-100 B50inc5). It gives only a small improvement (maybe 0.5%?).
>
> Bro, here you pass the output from the CNN (a 512-dimensional vector for each image) into the KAN, is that right? Can you share your code?

I simply replaced the last fc with a KAN layer and compared it with a replay method that stores only 20 old samples for future training. The results show little improvement. PS: I use "KANLinear(in_features=512, out_features=100)"; I directly initialize this layer with all 100 output categories to avoid adjusting the dimension in subsequent tasks.
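
For reference, a minimal sketch of that kind of swap (a torchvision ResNet-18 backbone and the `efficient_kan` import path are assumptions; the actual experiment may use a different backbone and training setup):

```python
import torch
from torchvision.models import resnet18
from efficient_kan import KANLinear

backbone = resnet18(num_classes=100)   # standard ResNet-18; its fc input dim is 512
# Replace the final fully connected layer with a KAN layer sized for all
# 100 CIFAR-100 classes up front, so its shape never changes across tasks.
backbone.fc = KANLinear(512, 100)

x = torch.randn(4, 3, 32, 32)          # a CIFAR-sized dummy batch
logits = backbone(x)                   # -> shape (4, 100)
print(logits.shape)
```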

@lukmanulhakeem97 (Author)

> I simply replaced the last fc with a KAN layer and compared it with a replay method that stores only 20 old samples for future training. The results show little improvement. PS: I use "KANLinear(in_features=512, out_features=100)"; I directly initialize this layer with all 100 output categories to avoid adjusting the dimension in subsequent tasks.

Okay, so for the KAN layer you pass only task-specific data, without the old tasks' replay samples?


ASCIIJK commented May 27, 2024

> Okay, so for the KAN layer you pass only task-specific data, without the old tasks' replay samples?

No, I use the same data as the replay method, which contains 20 samples per category. That's why I say KAN may be hard to deal with high-dimensional data. But I do find that KAN can handle simple data (such as classifying a 2D scatter) and avoid forgetting.
