# Reproducing results of the paper "Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints"
The paper compares different divergence functions for direct preference optimization (DPO).
Results notebook on nbviewer: results.ipynb
- Install poetry
- Then run:

```bash
git clone https://github.com/somvy/slic-hf && cd slic-hf
poetry install && poetry shell
wandb login
huggingface-cli login
```
- Specify your HuggingFace username and the desired SFT model in config.py (a sketch of what this might look like follows).
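A minimal sketch of the relevant config.py entries; the variable names here are hypothetical, so check config.py itself for the real ones:

```python
# config.py (illustrative sketch; actual variable names may differ)
HF_USERNAME = "your-hf-username"       # used when pushing datasets/models to the Hub
SFT_MODEL_NAME = "lvwerra/gpt2-imdb"   # a GPT-2 fine-tuned on IMDB is one natural choice
```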
Prompts are the first sentences of movie reviews. A few tricks were used to bias generations toward positive sentiment (see dataset/generation_config.py). Diverse beam search decoding with a diversity penalty of 50 generated 6 answers per prompt, which were then scored with a reward model. Pairs of (top1, top4/5/6) and (top1/2/3, top6) were used as chosen and rejected answers, giving 6 pairs per prompt. The final dataset contains 3600 pairs with a test split of 0.2.

50 prompts were also randomly selected for eval generation - hf link
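A minimal sketch of the generation-and-pairing step, assuming `lvwerra/gpt2-imdb` as the SFT model and `lvwerra/distilbert-imdb` as the reward model (both are assumptions); the actual logic lives in dataset/main.py:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("lvwerra/gpt2-imdb")
model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
reward = pipeline("text-classification", model="lvwerra/distilbert-imdb")

prompt = "This movie was"  # first sentence of a review
inputs = tokenizer(prompt, return_tensors="pt")

# Diverse beam search: 6 beams split into 6 groups with diversity penalty 50,
# producing 6 distinct answers per prompt
outputs = model.generate(
    **inputs,
    num_beams=6,
    num_beam_groups=6,
    num_return_sequences=6,
    diversity_penalty=50.0,
    max_new_tokens=60,
    do_sample=False,
)
answers = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Score with the reward model (signed by sentiment label) and rank best-to-worst
scores = [r["score"] if r["label"] == "POSITIVE" else -r["score"] for r in reward(answers)]
ranked = [a for _, a in sorted(zip(scores, answers), reverse=True)]

# 6 (chosen, rejected) pairs: (top1, top4/5/6) and (top1/2/3, top6);
# note (top1, top6) occurs in both sets, matching the 6-pair count above
pairs = [(ranked[0], ranked[i]) for i in (3, 4, 5)] + [(ranked[i], ranked[5]) for i in (0, 1, 2)]
```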
Use this dataset, or generate your own with:

```bash
set -a && source .env && poetry run python dataset/main.py
```

After generation, update the dataset paths in config.py.
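The contents of .env are not shown here; a plausible sketch, with variable names that are assumptions (both are standard names recognized by huggingface_hub and wandb):

```bash
# .env (illustrative; the repo may expect different variables)
HF_TOKEN=hf_xxx        # HuggingFace access token
WANDB_API_KEY=xxx      # Weights & Biases API key
```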
- Specify the training arguments, DPOTrainer params, and run_name in train_dpo/train.py (a sketch of such a setup follows the command below)
- Run:

```bash
set -a && source .env && poetry run python train_dpo/train.py
```
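A minimal sketch of a train.py setup, assuming a TRL DPOTrainer of this repo's era (model, ref_model, beta, loss_type); the dataset path, beta, and output_dir are placeholders, and the JS / forward-KL losses come from the repo's own trainer code rather than stock TRL:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "lvwerra/gpt2-imdb"  # assumed SFT checkpoint from config.py
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Expects "prompt", "chosen", "rejected" columns; the path is a placeholder
train_dataset = load_dataset("your-hf-username/imdb-dpo-pairs", split="train")

args = TrainingArguments(
    output_dir="dpo-sigmoid",       # also serves as the wandb run_name here
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=1e-4,             # 1e-4 for sigmoid/hinge, 1e-5 for the other losses
    report_to="wandb",
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,                       # assumed value; check train.py
    loss_type="sigmoid",            # "hinge" is also built into TRL
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```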
- (Optional) Generate answers for the eval dataset. Specify the generation params and the desired run_name in train_dpo/generate.py, then run:

```bash
set -a && source .env && poetry run python train_dpo/generate.py
```
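A rough sketch of eval generation, assuming the 50 eval prompts and the trained weights were pushed to the Hub; every name below is a placeholder for values set in config.py and generate.py:

```python
from datasets import load_dataset
from transformers import pipeline

eval_prompts = load_dataset("your-hf-username/imdb-eval-prompts", split="train")
generator = pipeline("text-generation", model="your-hf-username/gpt2-imdb-dpo-sigmoid")

for row in eval_prompts:
    out = generator(row["prompt"], max_new_tokens=60, do_sample=True, top_p=0.9)
    print(out[0]["generated_text"])
```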
The base model is GPT-2 fine-tuned on IMDB reviews. Each run trained for 3 epochs with batch size 4, using lr 1e-4 for the sigmoid and hinge losses and 1e-5 for the others.
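For reference, with the implicit reward $\hat r(x,y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$, the two stock pairwise losses over chosen $y_w$ and rejected $y_l$ are

$$
\mathcal{L}_{\text{sigmoid}} = -\log \sigma\big(\hat r(x, y_w) - \hat r(x, y_l)\big),
\qquad
\mathcal{L}_{\text{hinge}} = \max\big(0,\ 1 - \hat r(x, y_w) + \hat r(x, y_l)\big).
$$

As described in the paper, the generalization swaps the log-ratio (which corresponds to a reverse-KL constraint) for $f'\big(\pi_\theta(y \mid x) / \pi_{\mathrm{ref}}(y \mid x)\big)$ of the chosen $f$-divergence, which yields the JS and forward-KL variants reported below.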
| Loss | Weights | Wandb Report |
|---|---|---|
| Hinge | link | |
| | link | |
| | link | |
| | link | |
| | link | |
| Sigmoid | link | |
| | link | |
| | link | |
| | link | |
| | link | |
| JS divergence | link | |
| | link | |
| | link | |
| Forward KL | link | |
| | link | |
| | link | |
| | link | |
| | link | |
| | link | |
| | link | |
| | link | |
| | link | |
| | link | |