Measuring Memorization in Language Model Fine-tuning

Creating the Environment

conda create --name env --file requirments.txt

File Structure

In this repo we focus on the task of next-word prediction. The code lives in the gen folder. The run_clm.py file is the main script that runs fine-tuning and the memorization evaluations. Under gen you will find the ptb, enron, and wikipedia directories, which contain the bash scripts for running experiments on each dataset with the different fine-tuning methods (full, head, and adapters). The log of each experiment contains all the necessary metrics.
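For orientation, here is an illustrative sketch of that layout. Only the wikipedia scripts are named in this README; the ptb and enron directories contain analogous scripts, and exact file placement may differ slightly:

gen/
├── run_clm.py                            # fine-tuning and memorization evaluation
├── plots.ipynb                           # metric extraction and plotting
├── ptb/                                  # PTB experiments (scripts analogous to wikipedia/)
├── enron/                                # Enron experiments (scripts analogous to wikipedia/)
└── wikipedia/
    ├── run_clm_full_ft_1gpu.sh           # full fine-tuning + membership inference
    ├── run_clm_adapter_1gpu.sh           # adapter fine-tuning
    ├── run_clm_head_ft_1gpu.sh           # head fine-tuning
    └── run_clm_full_ft_canaries_1gpu.sh  # full fine-tuning + exposure (canaries)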

Run Fine-tuning and Evaluations

To run full fine-tuning on the Wikipedia data and evaluate memorization with the membership inference attack, run:

cd gen/wikipedia
bash run_clm_full_ft_1gpu.sh
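For intuition, here is a minimal sketch of a loss-based (likelihood-ratio style) membership inference score. This is not the implementation in run_clm.py; the model names, sample text, and threshold below are placeholders:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def avg_loss(model, tokenizer, text):
    # Average per-token cross-entropy of `text` under `model`.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(**enc, labels=enc["input_ids"]).loss.item()

tok = AutoTokenizer.from_pretrained("gpt2")
ft_model = AutoModelForCausalLM.from_pretrained("gpt2").eval()   # stand-in for the fine-tuned checkpoint
ref_model = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # stand-in for the pre-trained reference

sample = "a candidate sentence whose training membership we want to test"
# The sample looks like a training member when the fine-tuned model assigns it a much
# lower loss than the reference model does.
score = avg_loss(ref_model, tok, sample) - avg_loss(ft_model, tok, sample)
print("member" if score > 0.5 else "non-member")  # 0.5 is an arbitrary illustrative threshold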

If you want to fine-tune only the adapters, run:

cd gen/wikipedia
bash run_clm_adapter_1gpu.sh

And to fine-tune only the head, run:

cd gen/wikipedia
bash run_clm_head_ft_1gpu.sh 

To run fine-tuning and evaluate memorization with the exposure metric (using inserted canaries), run:

cd gen/wikipedia
bash run_clm_full_ft_canaries_1gpu.sh
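The exposure metric (Carlini et al.) ranks an inserted canary against a set of candidate sequences by model perplexity; a highly exposed canary has effectively been memorized. A rough sketch of the computation follows, with made-up perplexity values; this is not the repo's exact code:

import math

def exposure(canary_ppl, candidate_ppls):
    # Exposure = log2(#candidates) - log2(rank of the canary), where the rank is the
    # canary's position when all candidate sequences are sorted by perplexity.
    rank = 1 + sum(p < canary_ppl for p in candidate_ppls)
    return math.log2(len(candidate_ppls)) - math.log2(rank)

# Illustrative numbers: the canary ranks 3rd among 10,000 candidates, so exposure is high.
print(exposure(12.0, [10.5, 11.2] + [20.0] * 9998))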

You can change the dataset/fine-tuning method to run the other experiments.

Extracting the Metrics from the Logs and Drawing Plots

To extract the evaluation metrics and redraw our plots, use the following Jupyter notebook:

cd gen
code plots.ipynb
