
Can the efficient-kan model be configured for continual learning? #28

Open
lukmanulhakeem97 opened this issue May 19, 2024 · 9 comments
Labels: enhancement (New feature or request)


lukmanulhakeem97 commented May 19, 2024

Similar to what the authors showed in the official git repo, can this efficient-kan model be used in continual learning settings? For using efficient-kan in CL settings, I couldn't find some of the attributes that need to be set, as given in the official pykan:

######### cl code from pykan

setting bias_trainable=False, sp_trainable=False, sb_trainable=False is important.

otherwise KAN will have random scaling and shift for samples in previous stages

model = KAN(width=[1,1], grid=200, k=3, noise_scale=0.1, bias_trainable=False, sp_trainable=False, sb_trainable=False)

How can I set bias_trainable=False, sp_trainable=False, sb_trainable=False here? Is there a way?

@Blealtan (Owner)

Select the corresponding parameters and call .requires_grad_(False) on them. It's the same as freezing some parameters when you do parameter-efficient finetuning. If I have time later, I may add corresponding helper methods for that.

The parameters are:

sp_trainable -> spline_scaler
sb_trainable -> base_weight
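
A minimal sketch of that manual freezing, assuming the package is imported as `efficient_kan`, that the KAN module keeps its KANLinear sub-layers in `model.layers`, and that `enable_standalone_scale_spline` is left at its default so `spline_scaler` exists:

```python
import torch
from efficient_kan import KAN

model = KAN([28 * 28, 64, 10])  # layers_hidden; grid_size/spline_order left at defaults

for layer in model.layers:  # assumed: KANLinear sub-layers live in model.layers
    # sb_trainable=False  ->  freeze base_weight
    layer.base_weight.requires_grad_(False)
    # sp_trainable=False  ->  freeze spline_scaler (only present when
    # enable_standalone_scale_spline=True, the default)
    if hasattr(layer, "spline_scaler"):
        layer.spline_scaler.requires_grad_(False)

# Only hand the still-trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```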

Sadly I forgot to implement the bias term! (and thank you for letting me notice that XD)

@Blealtan self-assigned this on May 20, 2024
@Blealtan added the enhancement (New feature or request) label on May 20, 2024

ASCIIJK commented May 21, 2024

Is this the reason for the limited effectiveness in CIL (class-incremental learning)? I have tried replacing the fc layer directly with a KAN layer on an image classification task (CIFAR-100 B50inc5). It gives only a small improvement (maybe 0.5%?).

@rafaelcp

Has anyone else found that KAN's effectiveness against catastrophic forgetting holds only on 1D tasks? Since the spline locality applies to each input dimension independently, it can't isolate more complex patterns that depend on multiple input dimensions (e.g., an MNIST digit). I think a simple 2D experiment with some Gaussian bumps would be enough to demonstrate this shortcoming.

@lukmanulhakeem97 (Author)

> Is this the reason for the limited effectiveness in CIL (class-incremental learning)? I have tried replacing the fc layer directly with a KAN layer on an image classification task (CIFAR-100 B50inc5). It gives only a small improvement (maybe 0.5%?).

I'm working on a similar case. I only trained two tasks initially, but it still has the catastrophic forgetting issue.


ASCIIJK commented May 23, 2024

> Has anyone else found that KAN's effectiveness against catastrophic forgetting holds only on 1D tasks? Since the spline locality applies to each input dimension independently, it can't isolate more complex patterns that depend on multiple input dimensions (e.g., an MNIST digit). I think a simple 2D experiment with some Gaussian bumps would be enough to demonstrate this shortcoming.

I have run this experiment with the official KAN, and it shows the catastrophic forgetting issue. Specifically, we use a mixture of 2D Gaussians with 5 peaks to construct a CL task sequence, shown below:
[image: Ground_task5]

The model learns each peak from 50,000 data points. For example, the data points of the first task are shown below:
[image: Ground_task0]

Then we get the results after 5 tasks:
[image: Pred_task4]

The forgetting issue occurs on every task; for example, task 1:
[image: Pred_task1]

PS: We use the model "model = KAN(width=[2, 16, 1], grid=5, k=6, noise_scale=0.1, bias_trainable=False, sp_trainable=False, sb_trainable=False)", and we made sure the loss goes down to zero on each task, so you can see a perfect peak matching the training data of the current task. We think KAN may be hard to train on high-dimensional data without forgetting.
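
For anyone who wants to try the same kind of sequential-peaks test with this repo's implementation instead of pykan, a rough sketch is below (the `efficient_kan` import, the `KAN([2, 16, 1], grid_size=5, spline_order=3)` constructor, and the peak positions/widths are illustrative assumptions, not the exact setup above):

```python
import torch
from efficient_kan import KAN

def make_peak(center, n=50_000, std=0.1):
    """One task: inputs sampled near a 2D peak, labelled by a Gaussian bump."""
    x = center + 4 * std * (torch.rand(n, 2) - 0.5)
    y = torch.exp(-((x - center) ** 2).sum(dim=1, keepdim=True) / (2 * std ** 2))
    return x, y

# Five peak centres inside the default grid_range of [-1, 1].
centers = torch.tensor([[-0.8, -0.8], [-0.4, 0.8], [0.0, -0.4], [0.4, 0.6], [0.8, -0.2]])
tasks = [make_peak(c) for c in centers]

model = KAN([2, 16, 1], grid_size=5, spline_order=3)
# (Optionally freeze base_weight / spline_scaler here, as discussed earlier.)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for t, (x, y) in enumerate(tasks):
    for _ in range(200):                     # train on the current peak only
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    with torch.no_grad():                    # measure forgetting on earlier peaks
        errs = [round(loss_fn(model(xi), yi).item(), 4) for xi, yi in tasks[: t + 1]]
    print(f"after task {t}: per-peak MSE = {errs}")
```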

@lukmanulhakeem97 (Author)

> Is this the reason for the limited effectiveness in CIL (class-incremental learning)? I have tried replacing the fc layer directly with a KAN layer on an image classification task (CIFAR-100 B50inc5). It gives only a small improvement (maybe 0.5%?).

Bro, here you pass the output from the CNN (a 512-dimensional vector for each image) into the KAN, is that right? Can you share your code?


ASCIIJK commented May 24, 2024

> > Is this the reason for the limited effectiveness in CIL (class-incremental learning)? I have tried replacing the fc layer directly with a KAN layer on an image classification task (CIFAR-100 B50inc5). It gives only a small improvement (maybe 0.5%?).
>
> Bro, here you pass the output from the CNN (a 512-dimensional vector for each image) into the KAN, is that right? Can you share your code?

I simply replaced the last fc with a KAN layer and compared it with a replay method that stores only 20 old samples for future training. The results show little improvement. PS: I use "KANLinear(in_features=512, out_features=100)"; I directly initialize this layer with all 100 output categories to avoid adjusting the dimension in subsequent tasks.
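
For reference, a minimal sketch of that kind of swap (a torchvision ResNet-18 backbone and the `efficient_kan` import path are assumptions; the actual experiment may use a different backbone and training setup):

```python
import torch
from torchvision.models import resnet18
from efficient_kan import KANLinear

backbone = resnet18(num_classes=100)   # standard ResNet-18; its fc input dim is 512
# Replace the final fully connected layer with a KAN layer sized for all
# 100 CIFAR-100 classes up front, so its shape never changes across tasks.
backbone.fc = KANLinear(512, 100)

x = torch.randn(4, 3, 32, 32)          # a CIFAR-sized dummy batch
logits = backbone(x)                   # -> shape (4, 100)
print(logits.shape)
```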

@lukmanulhakeem97 (Author)

> I simply replaced the last fc with a KAN layer and compared it with a replay method that stores only 20 old samples for future training. The results show little improvement. PS: I use "KANLinear(in_features=512, out_features=100)"; I directly initialize this layer with all 100 output categories to avoid adjusting the dimension in subsequent tasks.

Okay, so for the KAN layer you pass only task-specific data, without the old tasks' replay samples?


ASCIIJK commented May 27, 2024

> Okay, so for the KAN layer you pass only task-specific data, without the old tasks' replay samples?

No, I use the same data as the replay method, which contains 20 samples per category. That's why I say KAN may be hard to deal with high-dimensional data. But I do find that KAN can handle simple data (such as classifying a 2D scatter) and avoid forgetting.
