
Editing Common Sense in Transformers

EMNLP 2023 Paper: Editing Common Sense in Transformers

Through causal mediation analyses, we find that commonsense judgments in GPT-2 are associated with localized parameters in the models' early MLP layers. To correct erroneous commonsense judgments, we propose $MEMIT_{CSK}$, an adaptation of Mass-Editing Memory in a Transformer (MEMIT) that can edit subject, verb, or object token positions and features a robust editing layer selection strategy.


Installation

Installation is similar to the MEMIT installation instructions. We recommend conda for managing Python, CUDA, and PyTorch, and pip for everything else. To get started, install conda and run:

CONDA_HOME=$CONDA_HOME ./scripts/setup_conda.sh

$CONDA_HOME should be the path to your conda installation, e.g., ~/miniconda3.
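
After setup completes, activate the environment before running experiments. Note that the environment name below is an assumption; check the output of setup_conda.sh for the name it actually creates:

conda activate memit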

Datasets

The 20 Questions (20q) and PEP 3K datasets are under the data folder. Each dataset consists of a Training Set, an Edit Validation Set, an Edit Set, and a Probe Set.

  • Training Set and Edit Validation Set: Formed by randomly dividing the validation set from Porada et al. (2021) into an 80%-20% split.
  • Edit Set: The test set from Porada et al. (2021).
  • Probe Set: For the subset of the Edit Set that was incorrectly predicted by both the GPT-2 Large and GPT-2 XL base models, we augment each instance with semantically related instances generated by GPT-3 text-davinci-003. The relation types we consider include unaffected neighborhood, affected neighborhood, affected paraphrase, and affected reasoning, detailed in Sec. 4.2.1 of our paper.
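
To inspect the splits shipped with the repository (a quick sketch; see the data folder itself for the actual sub-folder and file layout):

ls data/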

Evaluation Metrics

We report three metrics on the Edit Validation Set and Edit Set:

  • F1 Score (↑), a measure of overall performance
  • Efficacy (↑), the percentage of previously incorrect predictions that are corrected by an update method
  • Relapse (↓), the percentage of instances that were previously predicted correctly but are now predicted incorrectly
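
Concretely, writing $I$ for the set of instances predicted incorrectly before an update and $C$ for the set predicted correctly before an update, the two editing metrics can be written as:

$$\text{Efficacy} = \frac{|\{x \in I : x \text{ correct after update}\}|}{|I|}, \qquad \text{Relapse} = \frac{|\{x \in C : x \text{ incorrect after update}\}|}{|C|}$$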

We report accuracy on different types of augmented instances in the Probe Set.

The metrics are implemented in eval_utils_csk.py.

Base Finetuning

script_base_finetuning.sh can be used to run base finetuning on the GPT2-XL model for the 20q dataset. A similar command can be used to run experiments for the GPT2-Large model and the PEP 3K dataset.
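
For example (a minimal sketch; the model and dataset settings live inside the script, so switching to GPT2-Large or PEP 3K means editing the corresponding variables there):

bash script_base_finetuning.sh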

Causal Tracing

script_causal_trace_zero_shot.sh can be used to perform the causal tracing experiment for the zero-shot model.

script_causal_trace.sh can be used to perform the causal tracing experiment for the base finetuned model, by passing its checkpoint location and an output inference file as parameters.

script_causal_trace_severed.sh can be used to perform the severed causal tracing experiment for the base finetuned model, again passing its checkpoint location and an output inference file as parameters.
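
For example (the argument order is an assumption; check each script for the exact parameters it expects, and replace the angle-bracket placeholders with real paths):

bash script_causal_trace_zero_shot.sh
bash script_causal_trace.sh <base_finetuned_checkpoint> <output_inference_file>
bash script_causal_trace_severed.sh <base_finetuned_checkpoint> <output_inference_file>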

Repair Finetuning

script_repair_finetuning.sh can be used to run repair finetuning on the base finetuned GPT2-XL model for the 20q dataset. A similar command can be used to run experiments for the GPT2-Large model and the PEP 3K dataset.

It includes commands to evaluate the affected and unaffected metrics for the repair finetuned model.
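
For example (a minimal sketch; the base finetuned checkpoint location is configured inside the script):

bash script_repair_finetuning.sh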

MEMIT_CSK Experiment

script_memit_csk.sh can be used to run $MEMIT_{CSK}$ on the base finetuned GPT2-XL model for the 20q dataset. Similar commands can be used to run experiments for the GPT2-Large model and the PEP 3K dataset.

It includes commands to (1) find the best hyperparameters on the Edit Validation Set and perform configuration generalization evaluation on the Edit Set, and (2) run semantic generalization evaluation on the Probe Set. Please refer to Sections 3.3-3.4 in our paper for descriptions of the two types of generalization.
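
For example (a minimal sketch; hyperparameter ranges, model, and dataset settings are configured inside the script):

bash script_memit_csk.sh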

How to Cite

@inproceedings{gupta2023editing,
    title = "Editing Common Sense in Transformers",
    author = "Gupta, Anshita and Mondal, Debanjan and Sheshadri, Akshay Krishna and Zhao, Wenlong and Li, Xiang Lorraine and Wiegreffe, Sarah and Tandon, Niket",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2305.14956",
}