eminorhan/llm-memory


Recognition, recall, and retention of few-shot memories in LLMs

This repository contains the code for reproducing the results reported in the following paper:

Orhan AE (2023) Recognition, recall, and retention of few-shot memories in large language models. arXiv:2303.17557.

The repository contains three Python scripts, train.py, test.py, and generate.py (all modified from the Huggingface causal language modeling example here), which train (or finetune) a model, run a recognition test, and run a recall test, respectively.

Usage examples

Some usage examples for these files are given below.

  • Finetune a gpt-j-6B model with the study sentences in seen_data_0.json for 1 epoch (1 exposure) on 4 GPUs (with a total batch size of 4x4=16 sentences) using the Huggingface Accelerate framework (see the example config file here):
accelerate launch --config_file accelerate_config.yaml --num_cpu_threads_per_process 4 train.py \
    --model_name_or_path "EleutherAI/gpt-j-6B" \
    --train_file "data/llm-experiment-data/expt1/seen_data_0.json" \
    --per_device_train_batch_size 4 \
    --learning_rate 0.00001 \
    --output_dir OUTPUT_DIR \
    --save_prefix INFORMATIVE_SAVE_PREFIX \
    --block_size 128 \
    --num_train_epochs 1 \
    --overwrite_cache
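The --train_file argument points at a JSON data file. As a rough illustration only: the Huggingface causal language modeling examples that these scripts are based on typically accept a JSON Lines file with one {"text": ...} record per line. The exact schema of seen_data_0.json may differ, so treat the snippet below as a hedged sketch of that assumed format, not a specification of this repository's data files.

```python
import json
import os
import tempfile

# Two made-up study sentences, standing in for the real experimental data.
sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "A stitch in time saves nine.",
]

# Write one {"text": ...} record per line (the format the HF causal LM
# examples commonly expect for --train_file; an assumption here).
path = os.path.join(tempfile.mkdtemp(), "seen_data_example.json")
with open(path, "w") as f:
    for s in sentences:
        f.write(json.dumps({"text": s}) + "\n")

# Read it back line by line, the way a JSON-lines loader would.
with open(path) as f:
    records = [json.loads(line) for line in f]

print(len(records))        # number of study sentences
print(records[0]["text"])  # first study sentence
```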
  • Run a recognition test on a model with the study sentences in seen_data_0.json and foils in unseen_data_0.json:
python -u test.py \
    --model_name_or_path MODEL_PATH \
    --seen_file "data/llm-experiment-data/expt1/seen_data_0.json" \
    --unseen_file "data/llm-experiment-data/expt1/unseen_data_0.json" \
    --per_device_eval_batch_size 1 \
    --output_dir OUTPUT_DIR \
    --save_prefix INFORMATIVE_SAVE_PREFIX \
    --block_size 128 \
    --overwrite_cache
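One common way to score recognition, sketched below with made-up numbers: treat a study sentence as "recognized" if the model assigns it a lower loss (higher likelihood) than a matched unseen foil. This is a hypothetical illustration of the general idea, not the actual computation performed by test.py, and the losses are invented for the example.

```python
# Made-up per-sentence losses for four seen/foil pairs (not outputs of test.py).
seen_losses = [2.1, 1.8, 2.5, 1.2]
foil_losses = [3.0, 2.9, 2.4, 2.8]

# A pair counts as correct when the studied sentence is the more likely one,
# i.e. its loss is lower than the foil's.
correct = sum(s < f for s, f in zip(seen_losses, foil_losses))
accuracy = correct / len(seen_losses)
print(accuracy)  # fraction of seen-foil pairs classified correctly -> 0.75
```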
  • Run a recall test on a model with the study sentences in seen_data_0.json:
python -u generate.py \
    --model_name_or_path MODEL_PATH \
    --seen_file "data/llm-experiment-data/expt1/seen_data_0.json" \
    --per_device_eval_batch_size 1 \
    --output_dir OUTPUT_DIR \
    --save_prefix INFORMATIVE_SAVE_PREFIX \
    --block_size 128 \
    --overwrite_cache
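A recall test can be scored, in the simplest case, by prompting the model with the opening words of a study sentence and checking whether the generated completion reproduces the original continuation. The sketch below illustrates that exact-match idea with an invented "generated" string; it is an assumption about the general procedure, not the logic of generate.py.

```python
# A made-up study sentence and a pretend model completion (not generate.py output).
studied = "The quick brown fox jumps over the lazy dog."
prompt_len = 4  # number of words given as the prompt

words = studied.split()
prompt = " ".join(words[:prompt_len])   # "The quick brown fox"
target = " ".join(words[prompt_len:])   # the continuation to be recalled

generated = "jumps over the lazy dog."  # pretend model output
exact_match = generated.strip() == target
print(exact_match)  # True when the completion reproduces the studied continuation
```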

Reproduction

The scripts folder contains SLURM scripts for reproducing all experiments reported in the paper, using these three files. The data folder contains all the experimental data used in the experiments. The utils folder contains utility functions that were used to generate the experimental data. The results of all recognition, recall, and retention experiments reported in the paper are available from this Huggingface dataset repository.
