This repository tracks the latest research on machine unlearning in large language models (LLMs). The goal is to offer a comprehensive list of papers, datasets, and resources relevant to the topic.
Note
If your paper on LLM unlearning is missing, or if you spot a mistake, typo, or outdated information, please open an issue and I will address it as soon as possible.
If you want to add a new paper, feel free to either open an issue or create a pull request.
- To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
- Author(s): Barbulescu and Triantafillou
- Date: 2024-05
- Venue: ICML 2024
- Code: -
- SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
- Author(s): Jia et al.
- Date: 2024-04
- Venue: -
- Code: GitHub
- Machine Unlearning in Large Language Models
- Author(s): Chen et al.
- Date: 2024-04
- Venue: -
- Code: -
- Offset Unlearning for Large Language Models
- Author(s): Huang et al.
- Date: 2024-04
- Venue: -
- Code: GitHub
- Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge
- Author(s): Lu et al.
- Date: 2024-04
- Venue: -
- Code: -
- Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
- Author(s): Zhang et al.
- Date: 2024-04
- Venue: -
- Code: GitHub
- Localizing Paragraph Memorization in Language Models
- Author(s): Stoehr et al.
- Date: 2024-03
- Venue: -
- Code: -
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
- Author(s): Li et al.
- Date: 2024-03
- Venue: -
- Code: GitHub
- Dissecting Language Models: Machine Unlearning via Selective Pruning
- Author(s): Pochinkov and Schoots
- Date: 2024-03
- Venue: -
- Code: -
- Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models
- Author(s): Gu et al.
- Date: 2024-03
- Venue: -
- Code: -
- Ethos: Rectifying Language Models in Orthogonal Parameter Space
- Author(s): Gao et al.
- Date: 2024-03
- Venue: -
- Code: -
- Towards Efficient and Effective Unlearning of Large Language Models for Recommendation
- Author(s): Wang et al.
- Date: 2024-03
- Venue: -
- Code: GitHub
- Guardrail Baselines for Unlearning in LLMs
- Author(s): Thaker et al.
- Date: 2024-03
- Venue: ICLR 2024 SeT-LLM Workshop
- Code: -
- Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning
- Author(s): Zhao et al.
- Date: 2024-02
- Venue: -
- Code: -
- Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination
- Author(s): Dong et al.
- Date: 2024-02
- Venue: -
- Code: GitHub
- Towards Safer Large Language Models through Machine Unlearning
- Author(s): Liu et al.
- Date: 2024-02
- Venue: -
- Code: GitHub
- Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
- Author(s): Wang et al.
- Date: 2024-02
- Venue: -
- Code: -
- Unlearnable Algorithms for In-context Learning
- Author(s): Muresanu et al.
- Date: 2024-02
- Venue: -
- Code: -
- Machine Unlearning of Pre-trained Large Language Models
- Author(s): Yao et al.
- Date: 2024-02
- Venue: -
- Code: GitHub
- Visual In-Context Learning for Large Vision-Language Models
- Author(s): Zhou et al.
- Date: 2024-02
- Venue: -
- Code: -
- EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
- Author(s): Xing et al.
- Date: 2024-02
- Venue: -
- Code: -
- Unlearning Reveals the Influential Training Data of Language Models
- Author(s): Isonuma and Titov
- Date: 2024-01
- Venue: -
- Code: -
- TOFU: A Task of Fictitious Unlearning for LLMs
- Author(s): Maini et al.
- Date: 2024-01
- Venue: -
- Code: GitHub
- FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs
- Author(s): Kadhe et al.
- Date: 2023-12
- Venue: NeurIPS 2023 SoLaR Workshop
- Code: -
- Making Harmful Behaviors Unlearnable for Large Language Models
- Author(s): Zhou et al.
- Date: 2023-11
- Venue: -
- Code: -
- Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models
- Author(s): Ni et al.
- Date: 2023-11
- Venue: -
- Code: -
- Who's Harry Potter? Approximate Unlearning in LLMs
- Author(s): Eldan and Russinovich
- Date: 2023-10
- Venue: -
- Code: -
- DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models
- Author(s): Wu et al.
- Date: 2023-10
- Venue: EMNLP 2023
- Code: GitHub
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs
- Author(s): Chen and Yang
- Date: 2023-10
- Venue: EMNLP 2023
- Code: GitHub
- In-Context Unlearning: Language Models as Few Shot Unlearners
- Author(s): Pawelczyk et al.
- Date: 2023-10
- Venue: -
- Code: -
- Large Language Model Unlearning
- Author(s): Yao et al.
- Date: 2023-10
- Venue: NeurIPS 2023 SoLaR Workshop
- Code: GitHub
- Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble
- Author(s): Liu and Kalinli
- Date: 2023-09
- Venue: -
- Code: -
- Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
- Author(s): Patil et al.
- Date: 2023-09
- Venue: -
- Code: GitHub
- Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
- Author(s): Hu et al.
- Date: 2023-08
- Venue: AAAI 2024
- Code: GitHub
- Unlearning Bias in Language Models by Partitioning Gradients
- Author(s): Yu et al.
- Date: 2023-07
- Venue: ACL (Findings) 2023
- Code: GitHub
- Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
- Author(s): Li et al.
- Date: 2023-07
- Venue: -
- Code: -
- What can we learn from Data Leakage and Unlearning for Law?
- Author(s): Borkar
- Date: 2023-07
- Venue: -
- Code: -
- LEACE: Perfect linear concept erasure in closed form
- Author(s): Belrose et al.
- Date: 2023-06
- Venue: NeurIPS 2023
- Code: GitHub
- Composing Parameter-Efficient Modules with Arithmetic Operations
- Author(s): Zhang et al.
- Date: 2023-06
- Venue: NeurIPS 2023
- Code: GitHub
- KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment
- Author(s): Wang et al.
- Date: 2023-05
- Venue: -
- Code: GitHub
- Editing Models with Task Arithmetic
- Author(s): Ilharco et al.
- Date: 2022-12
- Venue: ICLR 2023
- Code: GitHub
- Privacy Adhering Machine Un-learning in NLP
- Author(s): Kumar et al.
- Date: 2022-12
- Venue: -
- Code: -
- The CRINGE Loss: Learning what language not to model
- Author(s): Adolphs et al.
- Date: 2022-11
- Venue: -
- Code: -
- Knowledge Unlearning for Mitigating Privacy Risks in Language Models
- Author(s): Jang et al.
- Date: 2022-10
- Venue: -
- Code: GitHub
- Quark: Controllable Text Generation with Reinforced Unlearning
- Author(s): Lu et al.
- Date: 2022-05
- Venue: NeurIPS 2022
- Code: GitHub
- DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
- Author(s): Liu et al.
- Date: 2021-05
- Venue: ACL 2021
- Code: GitHub
- Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
- Author(s): Blanco-Justicia et al.
- Date: 2024-04
- Venue: -
- Machine Unlearning for Traditional Models and Large Language Models: A Short Survey
- Author(s): Xu
- Date: 2024-04
- Venue: -
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models
- Author(s): Qu et al.
- Date: 2024-03
- Venue: -
- Rethinking Machine Unlearning for Large Language Models
- Author(s): Liu et al.
- Date: 2024-02
- Venue: -
- Eight Methods to Evaluate Robust Unlearning in LLMs
- Author(s): Lynch et al.
- Date: 2024-02
- Venue: -
- Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
- Author(s): Si et al.
- Date: 2023-11
- Venue: -
- Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions
- Author(s): Zhang et al.
- Date: 2023-07
- Venue: -
- Machine Unlearning in 2024
- Author(s): Ken Liu
- Date: 2024-05
- Deep Forgetting & Unlearning for Safely-Scoped LLMs
- Author(s): Stephen Casper
- Date: 2023-12
- TOFU
- Description: A synthetic QA dataset about fictitious authors, generated by GPT-4. The dataset comes with three retain/forget splits: 99/1, 95/5, and 90/10 (in percent). It also includes questions about real authors and world facts to evaluate the loss of general knowledge after unlearning.
- Links: arXiv, Hugging Face
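The retain/forget splits described above can be sketched as a simple percentage-based partition. This is a minimal illustration with hypothetical QA items; the actual TOFU splits are published with the dataset, so in practice you would load them rather than re-derive them.

```python
import random

def make_splits(examples, forget_pct, seed=0):
    """Partition a list of examples into (retain, forget) lists,
    where forget_pct percent of examples go into the forget set."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_forget = round(len(shuffled) * forget_pct / 100)
    return shuffled[n_forget:], shuffled[:n_forget]

# Hypothetical QA items standing in for the real dataset.
qa = [{"question": f"Q{i}", "answer": f"A{i}"} for i in range(100)]
retain, forget = make_splits(qa, forget_pct=10)  # the 90/10 split
print(len(retain), len(forget))  # 90 10
```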
- WMDP
- Description: A benchmark for assessing hazardous knowledge in biology, chemistry, and cybersecurity, containing roughly 4,000 multiple-choice questions in a style similar to MMLU. It also includes corpora for the three domains.
- Links: arXiv, Hugging Face
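Since WMDP follows the MMLU multiple-choice format, evaluation reduces to comparing predicted choice indices against the answer key. A minimal scoring sketch over hypothetical items (the field names here are illustrative, not the benchmark's exact schema):

```python
# Hypothetical WMDP-style items: each has four choices and an answer index.
items = [
    {"question": "…", "choices": ["w", "x", "y", "z"], "answer": 2},
    {"question": "…", "choices": ["w", "x", "y", "z"], "answer": 0},
]

def accuracy(predictions, items):
    """Fraction of items where the predicted choice index matches the key."""
    correct = sum(p == it["answer"] for p, it in zip(predictions, items))
    return correct / len(items)

print(accuracy([2, 1], items))  # 0.5
```

After unlearning, accuracy on the hazardous-knowledge questions should drop toward the random-guessing baseline (25% for four choices), while accuracy on general benchmarks such as MMLU should be preserved.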
- MMLU Subsets
- Description: A task proposed along with the WMDP dataset. The goal is to unlearn three MMLU categories — economics, physics, and law — while retaining closely related categories (econometrics and others for economics, math and others for physics, and jurisprudence and others for law). The task requires high-precision unlearning, because the retain sets are closely related to the unlearning targets.
- Links: arXiv, Hugging Face
- arXiv, GitHub, and copyrighted books corpus
- Description: A dataset for evaluating approximate unlearning algorithms on pre-trained LLMs. It contains forget and retain splits for each category, with both in-distribution and general retain sets. The dataset is designed for unlearning directly on pre-trained models, since its samples are drawn at random from the pre-training data of Yi.
- Links: arXiv, Hugging Face