
[BUG] Fails to finetune certain subset of parameters via torch.optim.AdamW code (not .json setting) #5485

Open

JasonLeeFdu opened this issue Apr 30, 2024 · 0 comments
Labels: bug (Something isn't working), training

JasonLeeFdu commented Apr 30, 2024

I have to add one extra LoRA layer by hand (without PEFT) to a pretrained multi-modal model in order to finetune it on new data. I want DeepSpeed to optimize ONLY the parameters of that LoRA layer rather than all of the parameters, like this:

[screenshot: torch.optim.AdamW constructed over the LoRA parameters only]
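
Since the screenshot does not render here, below is a minimal sketch of the intent. `LoRALinear`, `lora_A`, and `lora_B` are illustrative names for the hand-written layer, not the actual model code:

```python
import torch
import torch.nn as nn

# Toy stand-in for the pretrained multi-modal model with one hand-written LoRA layer.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base  # pretrained projection, meant to stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

model = nn.Sequential(nn.Linear(16, 16), LoRALinear(nn.Linear(16, 16)))

# Build a plain PyTorch AdamW over ONLY the LoRA parameters.
lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
optimizer = torch.optim.AdamW(lora_params, lr=1e-4, weight_decay=0.01)
```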

The platform is Hugging Face Transformers together with DeepSpeed.

Therefore I wrap the Trainer from HF Transformers as below:

[screenshot: custom Trainer subclass that builds the AdamW optimizer over the LoRA parameters]
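
Again a sketch of what the screenshot shows, assuming the usual `create_optimizer` override on `transformers.Trainer`; the class name `LoRAOnlyTrainer` and the `"lora_"` name filter are illustrative:

```python
import torch
from transformers import Trainer

class LoRAOnlyTrainer(Trainer):
    """Illustrative wrapper: build the optimizer over the LoRA parameters only."""

    def create_optimizer(self):
        if self.optimizer is None:
            lora_params = [
                p for n, p in self.model.named_parameters() if "lora_" in n
            ]
            self.optimizer = torch.optim.AdamW(
                lora_params,
                lr=self.args.learning_rate,
                weight_decay=self.args.weight_decay,
            )
        return self.optimizer
```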

Unfortunately, it doesn't work: neither the LoRA weights nor the non-LoRA weights change during training. It seems that the optimizer DeepSpeed actually uses is not the same one I built with PyTorch.

My question is: is there any way to finetune ONLY a certain subset of the parameters with DeepSpeed + Transformers' Trainer?
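
For reference, the plain behaviour I am after looks like the sketch below (reusing `model` from the first snippet); I have not verified whether freezing parameters this way is respected once DeepSpeed builds its own engine and optimizer:

```python
# Freeze everything except the hand-written LoRA layer before constructing the
# Trainer, so that only those parameters are trainable at all.
for name, param in model.named_parameters():
    param.requires_grad = "lora_" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```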

JasonLeeFdu added the bug and training labels on Apr 30, 2024