`ref_model` not needed in Fine_tune_a_Mistral_7b_model_with_DPO.ipynb #44
Comments
Hey @alvarobartt, thanks a lot for the hints. I am using the above notebook, and your suggestion solved my memory issue on Google Colab.

Yep, if you try to run `DPOTrainer` while passing the ref model, you get the runtime error below. To fix it, you can just comment out `ref_model` in the `DPOTrainer` call (and clean up the declaration of `ref_model`). Thanks @mlabonne for this super notebook, which got me started with going beyond SFT with a first DPO tune.
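A minimal sketch of what the fixed cell might look like. It assumes the `model`, `training_args`, `dataset`, `tokenizer`, and `peft_config` variables defined in the notebook's earlier cells, and a TRL version (as used in the notebook) where `DPOTrainer` takes these arguments directly:

```python
# Sketch of the fixed DPOTrainer cell. Assumes model, training_args,
# dataset, tokenizer, and peft_config come from earlier notebook cells.
from trl import DPOTrainer

dpo_trainer = DPOTrainer(
    model,                    # LoRA-wrapped model being trained
    # ref_model=ref_model,    # commented out: not needed with LoRA
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,  # with a peft_config, TRL uses the base
                              # model (adapters disabled) as the reference
    beta=0.1,
)
```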
Thanks @alvarobartt for opening this issue! I faced the same problem, and following your suggestion solved it. I removed the declaration of `ref_model` as @corticalstack suggested, and I also removed the `ref_model` argument from the `DPOTrainer` call.
I updated the notebook and removed the `ref_model`.
Hi here @mlabonne! Congratulations on your awesome work with this course!
After going through Fine_tune_a_Mistral_7b_model_with_DPO.ipynb, I realised that there's no need to define the `ref_model` required by DPO: when fine-tuning with LoRA, the reference model is not required, as the base model without the adapters will be used to compute the log-probs. So you can remove the `ref_model`, the result will still be the same, and you'll use even fewer resources.

Finally, as a tip: when using the `DPOTrainer` for full fine-tunes, you can also specify `precompute_ref_log_probs` to compute those in advance, before the actual fine-tune starts, so that the `ref_model` is not needed either.