
[BUG] Fails to finetune certain subset of parameters via torch.optim.AdamW code (not .json setting) #5485

Open

JasonLeeFdu opened this issue Apr 30, 2024 · 0 comments
Labels: bug (Something isn't working), training

JasonLeeFdu commented Apr 30, 2024

I have to add one extra LoRA layer by hand (without PEFT) to a pretrained multi-modal model in order to finetune it on new data. I want DeepSpeed to optimize ONLY the parameters of that LoRA layer rather than all of the parameters, like this:

[screenshot: torch.optim.AdamW constructed over the LoRA parameters only]
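
Since the screenshot does not render here, below is a minimal sketch of the intent. `LoRALinear`, `lora_A`, and `lora_B` are illustrative names for the hand-written layer, not the actual model code:

```python
import torch
import torch.nn as nn

# Toy stand-in for the pretrained multi-modal model with one hand-written LoRA layer.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base  # pretrained projection, meant to stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

model = nn.Sequential(nn.Linear(16, 16), LoRALinear(nn.Linear(16, 16)))

# Build a plain PyTorch AdamW over ONLY the LoRA parameters.
lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
optimizer = torch.optim.AdamW(lora_params, lr=1e-4, weight_decay=0.01)
```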

The platform is Hugging Face Transformers together with DeepSpeed.

Therefore I wrap the Trainer from HF Transformers as below:

[screenshot: custom Trainer subclass that builds the AdamW optimizer over the LoRA parameters]
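
Again a sketch of what the screenshot shows, assuming the usual `create_optimizer` override on `transformers.Trainer`; the class name `LoRAOnlyTrainer` and the `"lora_"` name filter are illustrative:

```python
import torch
from transformers import Trainer

class LoRAOnlyTrainer(Trainer):
    """Illustrative wrapper: build the optimizer over the LoRA parameters only."""

    def create_optimizer(self):
        if self.optimizer is None:
            lora_params = [
                p for n, p in self.model.named_parameters() if "lora_" in n
            ]
            self.optimizer = torch.optim.AdamW(
                lora_params,
                lr=self.args.learning_rate,
                weight_decay=self.args.weight_decay,
            )
        return self.optimizer
```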

Unfortunately, it doesn't work: neither the LoRA weights nor the non-LoRA weights change during training. It seems that the optimizer DeepSpeed actually uses is not the same one I built with PyTorch.

My question is: is there any way to finetune ONLY a certain subset of the parameters with DeepSpeed + Transformers' Trainer?
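
For reference, the plain behaviour I am after looks like the sketch below (reusing `model` from the first snippet); I have not verified whether freezing parameters this way is respected once DeepSpeed builds its own engine and optimizer:

```python
# Freeze everything except the hand-written LoRA layer before constructing the
# Trainer, so that only those parameters are trainable at all.
for name, param in model.named_parameters():
    param.requires_grad = "lora_" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```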

JasonLeeFdu added the bug and training labels on Apr 30, 2024