Is your feature request related to a problem? Please describe.
Currently, in DPO, the parameters of the current and reference policies are swapped after each batch. This may be inefficient for large models, so allowing less frequent swapping may be beneficial.
Describe the solution you'd like
Profile code on Llama2-70B to estimate potential benefits of less frequent swapping
If these benefits are not negligible, implement less frequent swapping
Additional context
The current implementation is already partially designed to support less frequent swapping. We just need to make sure this plays well with the optimization that pads each batch to a different length (either by processing each batch independently or by re-padding them to the same length).
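To make the interaction with per-batch padding concrete, here is a minimal sketch of the re-padding option, assuming batches are plain lists of token-id sequences. Every name in it (`repad_to_common_length`, `reference_pass`, `swap_interval`, and the swap/forward callbacks) is hypothetical and is not taken from the existing code; it only illustrates computing reference outputs for a group of batches with one pair of swaps per group.

```python
from typing import Callable, List, Sequence

# One batch is a list of token-id sequences, padded to a per-batch length.
Batch = List[List[int]]


def repad_to_common_length(batches: Sequence[Batch], pad_id: int) -> List[Batch]:
    """Right-pad every sequence so all batches in the group share one length."""
    target = max(len(seq) for batch in batches for seq in batch)
    return [
        [seq + [pad_id] * (target - len(seq)) for seq in batch]
        for batch in batches
    ]


def reference_pass(
    batches: Sequence[Batch],
    swap_interval: int,                    # hypothetical: batches per swap
    swap_in_reference: Callable[[], None],  # hypothetical: load reference weights
    swap_in_policy: Callable[[], None],     # hypothetical: restore policy weights
    ref_forward: Callable[[Batch], object],  # hypothetical: reference forward pass
    pad_id: int = 0,
) -> list:
    """Compute reference outputs, swapping once per group of batches."""
    outputs = []
    for start in range(0, len(batches), swap_interval):
        group = repad_to_common_length(batches[start:start + swap_interval], pad_id)
        swap_in_reference()
        for batch in group:
            outputs.append(ref_forward(batch))
        swap_in_policy()
    return outputs
```

With `swap_interval=1` this reduces to swapping after each batch. The alternative mentioned above, processing each batch independently, would skip `repad_to_common_length` and pass each batch through at its own length, which avoids the extra padding but requires the forward pass to accept varying lengths within a group.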