
lora's "catastrophic forgetting" problem #311

Open
shatealaboxiaowang opened this issue Feb 20, 2024 · 2 comments

Comments

@shatealaboxiaowang

Hi,

Thanks for open-sourcing this project!
How did you overcome the catastrophic forgetting problem in LoRA finetuning?
Performance dropped a lot on the HumanEval dataset after LoRA finetuning on my own dataset.

@JegernOUTT
Member

Hi! We've run into this problem too. There are a couple of things you can do:

  • keep the LoRA finetune parameters as "small" as possible; use the finetune settings section for that (LoRA R, LoRA Alpha) — see the sketch below this list
  • try checkpoints from earlier training steps

Combining these two, you can find a balance between performance on HumanEval and performance on your own codebase.
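For illustration only (this is not this project's own finetune code): a minimal sketch of a deliberately small LoRA setup using the Hugging Face peft library. The rank, alpha, dropout, and target module names are assumptions and should be matched to your model and settings UI.

```python
# Sketch only. Assumptions: Hugging Face transformers + peft, a causal LM such
# as CodeLlama. A small r / lora_alpha keeps the adapter's capacity low, which
# limits how far the finetuned model can drift from the base model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

lora_cfg = LoraConfig(
    r=8,                                   # low rank -> fewer trainable parameters
    lora_alpha=16,                         # modest scaling; effective scale = alpha / r
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections only (assumption)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # sanity check: trainable share should be tiny
```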

However, you can never completely eliminate the problem as long as you use finetuning. There are a couple of ways to prepare the data that make the problem less visible, though: https://arxiv.org/abs/2312.05934. We'll probably revisit this at some point, but you're welcome to contribute if you have ideas.
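As a rough illustration of the data-preparation idea (a generic replay-style mix, not the exact recipe from the linked paper): blending a slice of general-purpose code back into the domain-specific finetune set is one common way to soften forgetting. The 10% ratio below is an arbitrary assumption.

```python
# Sketch only: blend a small "replay" fraction of general / base-distribution
# samples into the domain-specific finetune dataset so the model keeps seeing
# data close to what it was originally trained on.
import random

def mix_datasets(domain_samples, general_samples, replay_fraction=0.1, seed=0):
    """Return domain data plus a replay_fraction-sized slice of general data."""
    rng = random.Random(seed)
    n_replay = int(len(domain_samples) * replay_fraction)
    replay = rng.sample(general_samples, min(n_replay, len(general_samples)))
    mixed = list(domain_samples) + replay
    rng.shuffle(mixed)
    return mixed
```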

@shatealaboxiaowang
Author

Thanks, I will try that. I have two questions:
(1) My code dataset is relatively large (≈1 GB). Would full-parameter fine-tuning work better? But full-parameter fine-tuning is more prone to catastrophic forgetting than LoRA, is that right?
(2) I find that very few projects use FIM to fine-tune CodeLlama; most use instruction tuning, but you use FIM here.
My current task is to fine-tune for better performance (generation and FIM) on our internal code. Do you have any better suggestions?
Thanks again!
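For reference, here is a minimal sketch of how an FIM (fill-in-the-middle) training sample can be built in the prefix-suffix-middle layout described in the Code Llama paper. The literal sentinel strings are placeholders, not this project's or the tokenizer's exact tokens; use whatever special tokens your model's tokenizer actually defines.

```python
# Sketch only: build one PSM-style FIM training string from a source snippet.
# The sentinels below are placeholders; substitute the exact special tokens
# defined by your model's tokenizer.
import random

PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"

def make_fim_sample(code: str, rng: random.Random) -> str:
    """Split `code` at two random points and rearrange it as prefix/suffix/middle."""
    a, b = sorted(rng.randrange(len(code) + 1) for _ in range(2))
    prefix, middle, suffix = code[:a], code[a:b], code[b:]
    # The model is trained to generate `middle` after seeing the prefix and suffix.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}{EOT}"

sample = make_fim_sample("def add(a, b):\n    return a + b\n", random.Random(0))
print(sample)
```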
