Consider swapping parameters less often in DPO #61

odelalleau · 2023-12-18T18:28:29Z

Is your feature request related to a problem? Please describe.

Currently, in DPO, the parameters of the current and reference policies are swapped after each batch. This may be inefficient for large models, so allowing less frequent swapping may be beneficial.
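
To give a rough sense of what is at stake, here is a minimal sketch that counts swaps. It assumes each reference-policy pass costs two swaps (reference parameters in, current parameters back), which may not match the actual implementation. `count_swaps` and `batches_per_swap` are hypothetical names used only for illustration.

```python
import math


def count_swaps(num_batches: int, batches_per_swap: int = 1) -> int:
    """Count parameter swaps for a run of `num_batches` batches.

    Assumption (not taken from the codebase): every group of
    `batches_per_swap` batches needs two swaps, one to load the
    reference parameters and one to restore the current ones.
    """
    if num_batches < 0 or batches_per_swap < 1:
        raise ValueError("num_batches must be >= 0 and batches_per_swap >= 1")
    num_groups = math.ceil(num_batches / batches_per_swap)
    return 2 * num_groups


if __name__ == "__main__":
    # Compare swapping after every batch with swapping every 8 batches.
    print(count_swaps(100, batches_per_swap=1))
    print(count_swaps(100, batches_per_swap=8))
```

Whether fewer swaps translates into a meaningful speedup depends on how expensive one swap is relative to a forward/backward pass, which is what the profiling below is meant to measure.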

Describe the solution you'd like

Profile the code on Llama2-70B to estimate the potential benefits of less frequent swapping.
If these benefits are not negligible, implement it.

Additional context

The current implementation is already partially designed to support less frequent swapping. We just need to make sure this plays well with the optimization that pads each batch to a different length (either by processing each batch independently or by re-padding them to the same length).
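
The re-padding option could look roughly like the sketch below. It uses plain Python lists of token ids rather than the project's actual tensors, and it assumes right-padding with a pad id of 0. `repad_batches` and `pad_id` are hypothetical names, not existing functions or parameters.

```python
from typing import List

# One batch is a list of token-id sequences, already padded to a per-batch length.
Batch = List[List[int]]


def repad_batches(batches: List[Batch], pad_id: int = 0) -> List[Batch]:
    """Right-pad every sequence to the longest length found across all batches.

    Assumptions (for illustration only): sequences are right-padded and
    `pad_id` is the padding token id.
    """
    max_len = max((len(seq) for batch in batches for seq in batch), default=0)
    return [
        [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
        for batch in batches
    ]


if __name__ == "__main__":
    # Two batches padded to different lengths (2 and 4).
    batches = [[[1, 2], [3, 0]], [[4, 5, 6, 7]]]
    print(repad_batches(batches))
```

Re-padding lets several batches share one pass between swaps at the cost of extra padding tokens, whereas processing each batch independently keeps the per-batch lengths but still requires looping over batches between swaps.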
