Reproducing results of the paper "Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints"

The paper compares different divergence functions for direct preference optimization (DPO).

Results notebook on nbviewer - results.ipynb

Setup

  1. Install poetry
  2. Then run:
git clone https://github.com/somvy/slic-hf && cd slic-hf
poetry install && poetry shell
wandb login
huggingface-cli login
  3. Specify your HuggingFace username and the desired SFT model in config.py (a hedged sketch of the relevant values follows this list).
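
As assumptions only, the config.py entries to fill in might look roughly like this (the variable names and the example model are illustrative, not taken from the repo):

# Hypothetical sketch of the values to set in config.py; the real variable
# names in the repo may differ.
HF_USERNAME = "your-hf-username"     # HuggingFace account used to push datasets / models
SFT_MODEL = "lvwerra/gpt2-imdb"      # desired SFT base model, e.g. GPT-2 finetuned on IMDB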

Dataset

Prompts are the first sentences of movie reviews. A few hacks were used to generate answers with a positive bias (see dataset/generation_config.py). Diverse beam search decoding with a diversity penalty of 50 produced 6 answers per prompt, which were then scored with a reward model. Pairs of (top1, top4/5/6) and (top1/2/3, top6) were used as the chosen and rejected answers, giving 6 pairs per generation. The final dataset contains 3600 pairs with a test split of 0.2.

hf link

Also, 50 prompts were randomly selected for eval generation - hf link

Use this dataset, or generate your own by running

set -a && source .env && poetry run python dataset/main.py

After generation, update the dataset paths in config.py.
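
The command above runs the full pipeline in dataset/main.py. As a minimal sketch of the diverse beam search step described in this section (the model name, prompt, and generation length are placeholders; the real parameters live in dataset/generation_config.py):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lvwerra/gpt2-imdb"  # placeholder SFT model; the repo takes it from config.py
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "One of the most underrated films of the decade"  # first sentence of a review
inputs = tokenizer(prompt, return_tensors="pt")

# diverse beam search: 6 answers per prompt with diversity penalty 50
outputs = model.generate(
    **inputs,
    num_beams=6,
    num_beam_groups=6,
    diversity_penalty=50.0,
    num_return_sequences=6,
    max_new_tokens=64,
    do_sample=False,
)
answers = tokenizer.batch_decode(outputs, skip_special_tokens=True)
# each answer is then scored with a reward model to build (chosen, rejected) pairs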

Train

  1. Specify the training arguments, DPOTrainer params, and run_name in train_dpo/train.py (a hedged sketch follows this list).
  2. Run
set -a && source .env && poetry run python train_dpo/train.py
  3. (Optional) Generate answers for the eval dataset. Specify the generation params and desired run_name in train_dpo/generate.py, then run
set -a && source .env && poetry run python train_dpo/generate.py
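
As a hedged sketch of what train_dpo/train.py configures, not the actual script: the snippet below uses trl's DPOTrainer with the standard "sigmoid"/"hinge" loss types; the other divergence constraints from the paper require the repo's own loss implementations, and with newer trl versions beta and loss_type move into DPOConfig.

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "lvwerra/gpt2-imdb"  # placeholder SFT model from config.py
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# tiny stand-in for the preference dataset (prompt / chosen / rejected columns)
train_dataset = Dataset.from_dict({
    "prompt": ["The movie opens with"],
    "chosen": [" a beautifully shot sequence that sets a hopeful tone."],
    "rejected": [" a scene."],
})

training_args = TrainingArguments(
    output_dir="dpo-sigmoid-beta0.1",   # run_name / output dir is a placeholder
    num_train_epochs=3,                 # matches the experiments setup below
    per_device_train_batch_size=4,
    learning_rate=1e-4,                 # 1e-4 for sigmoid/hinge, 1e-5 for the other divergences
    report_to="wandb",
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,                 # swept over {0.1, 0.5, 1, 10} in the report
    loss_type="sigmoid",      # or "hinge"; other divergences use the repo's custom losses
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()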

Experiments setup

The base model is GPT-2 finetuned on IMDB reviews.
Training ran for 3 epochs with batch size 4; the learning rate was 1e-4 for the sigmoid and hinge losses and 1e-5 for the others.

Weights and logs

| Loss | Weights | Wandb Report |
|------|---------|--------------|
| Hinge | | link |
| $\beta = 10$ | link | |
| $\beta = 1$ | link | |
| $\beta = 0.5$ | link | |
| $\beta = 0.1$ | link | |
| Sigmoid | | link |
| $\beta = 10$ | link | |
| $\beta = 1$ | link | |
| $\beta = 0.5$ | link | |
| $\beta = 0.1$ | link | |
| JS divergence | | link |
| $\beta = 1$ | link | |
| $\beta = 0.1$ | link | |
| Forward KL | | link |
| $\beta = 0.1$ | link | |
| $\beta = 1$ | link | |
| $\alpha$-divergence | | link |
| $\alpha = 0.3, \beta = 1$ | link | |
| $\alpha = 0.3, \beta = 0.1$ | link | |
| $\alpha = 0.5, \beta = 1$ | link | |
| $\alpha = 0.5, \beta = 0.1$ | link | |
| $\alpha = 0.7, \beta = 1$ | link | |
| $\alpha = 0.7, \beta = 0.1$ | link | |
