
Composable SFT #28

Open · wants to merge 6 commits into master

Conversation

@haileyschoelkopf (Collaborator) commented Jun 28, 2022

https://arxiv.org/pdf/2110.07560.pdf <-- Paper
https://github.com/cambridgeltl/composable-sft <-- code

TODOs:

  • Determine the hyperparameters we should use for comparable testing. Likely this means x train steps + 50k rewound steps with one iteration of their method, or maybe 5 iterations + 10k train steps twice; not decided yet.
  • Add loading an SFT from a path (NOT MAIN PRIORITY).

If we want to train both adapters and composable SFT at once, that will require some extra code. Probably not TOO bad, but it would need extra testing to make sure all the correct parameters are frozen.
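As a very rough sketch (placeholder logic only, not the actual implementation), the freezing for a combined run could look like the function below: keep the adapter's own weights trainable and freeze everything else, leaving the frozen base weights to the SFT mask. The name filter is an assumption about how adapter parameters are named.

def freeze_for_joint_training(model, adapter_name):
    # Keep only the adapter's own weights trainable; the SFT machinery would
    # then decide which frozen base weights its sparse mask may update.
    for name, param in model.named_parameters():
        param.requires_grad = adapter_name in name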

@haileyschoelkopf (Collaborator, Author) commented Jun 28, 2022

Issue #23 is addressed by this.

EDITED:
[figure: eval loss curves, sparse FT vs. adapter test run]
Here for reference are the loss curves for sparse FT compared to the adapter test run. The loss curves look great! I am curious to see whether the less steep eval loss will imply lower downstream performance :)

@haileyschoelkopf (Collaborator, Author)

06/27/2022 21:55:58 - INFO - __main__ - adapter elements: 3697664
06/27/2022 21:55:58 - INFO - __main__ - K value for SFT is 3670016.0

My calculation of the number of tunable params is still slightly off; I'm missing something minor. (In the file, I estimate the parameter count for the model using pfeiffer+inv with adapter_reduction_factor.)

Yong, did you run calculations to find the number of tunable adapter params, or did you just get that number by adding an adapter to the model?
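For reference, this is roughly the estimate I mean. It is only a sketch with made-up dimensions; it counts one bottleneck adapter per layer and ignores the invertible (inv) adapters and any adapter layer norms, which is probably the kind of thing causing the ~27k gap in the numbers above.

def pfeiffer_estimate(hidden_size, num_layers, adapter_reduction_factor):
    # Down- and up-projection (with biases) of one bottleneck adapter per
    # transformer layer; inv adapters and layer norms are not counted here.
    bottleneck = hidden_size // adapter_reduction_factor
    per_layer = (hidden_size * bottleneck + bottleneck) + (bottleneck * hidden_size + hidden_size)
    return num_layers * per_layer

# Purely illustrative dimensions, not our actual model config:
print(pfeiffer_estimate(hidden_size=1024, num_layers=24, adapter_reduction_factor=16))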

@haileyschoelkopf marked this pull request as ready for review on July 1, 2022 18:33
@haileyschoelkopf (Collaborator, Author)

@yongzx This is ready to merge; the only thing that still needs changing is the calculation of the number of parameters to fine-tune.

@haileyschoelkopf changed the title from "[WIP] Composable SFT" to "Composable SFT" on Jul 1, 2022
@yongzx (Collaborator) commented Jul 1, 2022

Yong, did you run calculations to find the number of tunable adapter params, or did you just get that number by adding an adapter to the model?

the only thing that still needs changing is the calculation of the number of parameters to fine-tune.

I can help do this, no worries! It's just a running sum of trainable parameters. I will review the code over the weekend.
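Concretely, something like the snippet below, assuming the adapter-transformers API (the checkpoint name is just an example): add and activate the adapter, then sum numel() over whatever is still trainable.

from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")  # example checkpoint
model.add_adapter("lang_adapter", config="pfeiffer+inv")
model.train_adapter("lang_adapter")  # freezes the base model, leaves the adapter trainable

# The "running sum" of trainable parameters:
adapter_elements = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"adapter elements: {adapter_elements}")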

@yongzx (Collaborator) commented Jul 1, 2022

I just did a quick read of the commit. It seems like, for SFT, we don't need to modify anything in adapter-transformers?

@haileyschoelkopf (Collaborator, Author)

Yep! Just

git clone https://github.com/cambridgeltl/composable-sft.git
cd composable-sft
pip install -e .

to install their code. Thanks!

@haileyschoelkopf (Collaborator, Author)

Also

I can help do this, no worries! It's just a running sum of trainable parameters. I will review the code over the weekend.

what I meant by this was to set the number of parameters this method changes (it's fully configurable) so that the total is equivalent to using pfeiffer+inv adapters, not just to count the number of trainable params with this method.
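In other words, something along these lines; the --ft_params_num name is my recollection of the composable-sft option, so treat it as an assumption:

# Set the SFT budget K to the adapter's trainable-parameter count (taken from
# the log above), not to whatever the method would tune by default.
adapter_elements = 3_697_664  # pfeiffer+inv adapter elements
sft_args = ["--ft_params_num", str(adapter_elements)]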

@yongzx (Collaborator) commented Jul 5, 2022

I will prioritize evaluation test suites over this for now, but I hope to finish reviewing this before our meeting this Friday.

@haileyschoelkopf (Collaborator, Author)

No problem!

For reference,

git clone https://github.com/haileyschoelkopf/composable-sft.git
cd composable-sft
pip install -e .

From now on, install the dependency this way instead. I'm hoping to add Random and FISH masking strategies to this fork.

@yongzx (Collaborator) commented Aug 30, 2022

Referring to the MLM training scripts, the training steps for full-model and sparse fine-tuning seem to be equal. Since we are comparing sparse fine-tuning to other adapter methods, we need to set both to 25K steps.
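For example (the key names below are assumptions rather than the exact composable-sft arguments; the point is only that both phases get the same 25K-step budget as the adapter runs):

sft_step_budget = {
    "full_ft_max_steps": 25_000,    # dense (full-model) fine-tuning phase
    "sparse_ft_max_steps": 25_000,  # sparse (masked) fine-tuning phase
}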

@yongzx (Collaborator) commented Aug 30, 2022

558f674 now supports composable SFT. Hailey, do you want to test it out?

@haileyschoelkopf (Collaborator, Author)

Yes, let me try running this with those parameters!
