This repository tracks the latest research on machine unlearning in large language models (LLMs). The goal is to offer a comprehensive list of papers, datasets, and resources relevant to the topic.
Note
If your paper on LLM unlearning is missing, or if you spot a mistake, typo, or outdated information, please open an issue and I will address it as soon as possible.
If you want to add a new paper, feel free to either open an issue or create a pull request.
- To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models
- Author(s): Barbulescu and Triantafillou
- Date: 2024-05
- Venue: ICML 2024
- Code: -
- SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
- Author(s): Jia et al.
- Date: 2024-04
- Venue: -
- Code: GitHub
- Machine Unlearning in Large Language Models
- Author(s): Chen et al.
- Date: 2024-04
- Venue: -
- Code: -
- Offset Unlearning for Large Language Models
- Author(s): Huang et al.
- Date: 2024-04
- Venue: -
- Code: GitHub
- Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge
- Author(s): Lu et al.
- Date: 2024-04
- Venue: -
- Code: -
- Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
- Author(s): Zhang et al.
- Date: 2024-04
- Venue: -
- Code: GitHub
- Localizing Paragraph Memorization in Language Models
- Author(s): Stoehr et al.
- Date: 2024-03
- Venue: -
- Code: -
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
- Author(s): Li et al.
- Date: 2024-03
- Venue: -
- Code: GitHub
- Dissecting Language Models: Machine Unlearning via Selective Pruning
- Author(s): Pochinkov and Schoots
- Date: 2024-03
- Venue: -
- Code: -
- Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models
- Author(s): Gu et al.
- Date: 2024-03
- Venue: -
- Code: -
- Ethos: Rectifying Language Models in Orthogonal Parameter Space
- Author(s): Gao et al.
- Date: 2024-03
- Venue: -
- Code: -
- Towards Efficient and Effective Unlearning of Large Language Models for Recommendation
- Author(s): Wang et al.
- Date: 2024-03
- Venue: -
- Code: GitHub
- Guardrail Baselines for Unlearning in LLMs
- Author(s): Thaker et al.
- Date: 2024-03
- Venue: ICLR 2024 SeT-LLM Workshop
- Code: -
- Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning
- Author(s): Zhao et al.
- Date: 2024-02
- Venue: -
- Code: -
- Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination
- Author(s): Dong et al.
- Date: 2024-02
- Venue: -
- Code: GitHub
- Towards Safer Large Language Models through Machine Unlearning
- Author(s): Liu et al.
- Date: 2024-02
- Venue: -
- Code: GitHub
- Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
- Author(s): Wang et al.
- Date: 2024-02
- Venue: -
- Code: -
- Unlearnable Algorithms for In-context Learning
- Author(s): Muresanu et al.
- Date: 2024-02
- Venue: -
- Code: -
- Machine Unlearning of Pre-trained Large Language Models
- Author(s): Yao et al.
- Date: 2024-02
- Venue: -
- Code: GitHub
- Visual In-Context Learning for Large Vision-Language Models
- Author(s): Zhou et al.
- Date: 2024-02
- Venue: -
- Code: -
- EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
- Author(s): Xing et al.
- Date: 2024-02
- Venue: -
- Code: -
- Unlearning Reveals the Influential Training Data of Language Models
- Author(s): Isonuma and Titov
- Date: 2024-01
- Venue: -
- Code: -
- TOFU: A Task of Fictitious Unlearning for LLMs
- Author(s): Maini et al.
- Date: 2024-01
- Venue: -
- Code: GitHub
- FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs
- Author(s): Kadhe et al.
- Date: 2023-12
- Venue: NeurIPS 2023 SoLaR Workshop
- Code: -
- Making Harmful Behaviors Unlearnable for Large Language Models
- Author(s): Zhou et al.
- Date: 2023-11
- Venue: -
- Code: -
- Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models
- Author(s): Ni et al.
- Date: 2023-11
- Venue: -
- Code: -
- Who's Harry Potter? Approximate Unlearning in LLMs
- Author(s): Eldan and Russinovich
- Date: 2023-10
- Venue: -
- Code: -
- DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models
- Author(s): Wu et al.
- Date: 2023-10
- Venue: EMNLP 2023
- Code: GitHub
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs
- Author(s): Chen and Yang
- Date: 2023-10
- Venue: EMNLP 2023
- Code: GitHub
- In-Context Unlearning: Language Models as Few Shot Unlearners
- Author(s): Pawelczyk et al.
- Date: 2023-10
- Venue: -
- Code: -
- Large Language Model Unlearning
- Author(s): Yao et al.
- Date: 2023-10
- Venue: NeurIPS 2023 SoLaR Workshop
- Code: GitHub
- Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble
- Author(s): Liu and Kalinli
- Date: 2023-09
- Venue: -
- Code: -
- Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
- Author(s): Patil et al.
- Date: 2023-09
- Venue: -
- Code: GitHub
- Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
- Author(s): Hu et al.
- Date: 2023-08
- Venue: AAAI 2024
- Code: GitHub
- Unlearning Bias in Language Models by Partitioning Gradients
- Author(s): Yu et al.
- Date: 2023-07
- Venue: ACL (Findings) 2023
- Code: GitHub
- Make Text Unlearnable: Exploiting Effective Patterns to Protect Personal Data
- Author(s): Li et al.
- Date: 2023-07
- Venue: -
- Code: -
- What can we learn from Data Leakage and Unlearning for Law?
- Author(s): Borkar
- Date: 2023-07
- Venue: -
- Code: -
- LEACE: Perfect linear concept erasure in closed form
- Author(s): Belrose et al.
- Date: 2023-06
- Venue: NeurIPS 2023
- Code: GitHub
- Composing Parameter-Efficient Modules with Arithmetic Operations
- Author(s): Zhang et al.
- Date: 2023-06
- Venue: NeurIPS 2023
- Code: GitHub
- KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment
- Author(s): Wang et al.
- Date: 2023-05
- Venue: -
- Code: GitHub
- Editing Models with Task Arithmetic
- Author(s): Ilharco et al.
- Date: 2022-12
- Venue: ICLR 2023
- Code: GitHub
- Privacy Adhering Machine Un-learning in NLP
- Author(s): Kumar et al.
- Date: 2022-12
- Venue: -
- Code: -
- The CRINGE Loss: Learning what language not to model
- Author(s): Adolphs et al.
- Date: 2022-11
- Venue: -
- Code: -
- Knowledge Unlearning for Mitigating Privacy Risks in Language Models
- Author(s): Jang et al.
- Date: 2022-10
- Venue: -
- Code: GitHub
- Quark: Controllable Text Generation with Reinforced Unlearning
- Author(s): Lu et al.
- Date: 2022-05
- Venue: NeurIPS 2022
- Code: GitHub
- DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
- Author(s): Liu et al.
- Date: 2021-05
- Venue: ACL 2021
- Code: GitHub
- Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
- Author(s): Blanco-Justicia et al.
- Date: 2024-04
- Venue: -
- Machine Unlearning for Traditional Models and Large Language Models: A Short Survey
- Author(s): Xu
- Date: 2024-04
- Venue: -
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models
- Author(s): Qu et al.
- Date: 2024-03
- Venue: -
- Rethinking Machine Unlearning for Large Language Models
- Author(s): Liu et al.
- Date: 2024-02
- Venue: -
- Eight Methods to Evaluate Robust Unlearning in LLMs
- Author(s): Lynch et al.
- Date: 2024-02
- Venue: -
- Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges
- Author(s): Si et al.
- Date: 2023-11
- Venue: -
- Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions
- Author(s): Zhang et al.
- Date: 2023-07
- Venue: -
- Machine Unlearning in 2024
- Author(s): Ken Liu
- Date: 2024-05
- Deep Forgetting & Unlearning for Safely-Scoped LLMs
- Author(s): Stephen Casper
- Date: 2023-12
- TOFU
- Description: A synthetic QA dataset about fictitious authors, generated by GPT-4. The dataset comes with three retain/forget splits: 99/1, 95/5, and 90/10 (in percent). It also includes questions about real authors and world facts to evaluate the loss of general knowledge after unlearning.
- Links: arXiv, Hugging Face
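The retain/forget splits described above can be sketched as a simple percentage-based partition. This is a minimal illustration with hypothetical QA items; the actual TOFU splits are published with the dataset, so in practice you would load them rather than re-derive them.

```python
import random

def make_splits(examples, forget_pct, seed=0):
    """Partition a list of examples into (retain, forget) lists,
    where forget_pct percent of examples go into the forget set."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_forget = round(len(shuffled) * forget_pct / 100)
    return shuffled[n_forget:], shuffled[:n_forget]

# Hypothetical QA items standing in for the real dataset.
qa = [{"question": f"Q{i}", "answer": f"A{i}"} for i in range(100)]
retain, forget = make_splits(qa, forget_pct=10)  # the 90/10 split
print(len(retain), len(forget))  # 90 10
```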
- WMDP
- Description: A benchmark for assessing hazardous knowledge in biology, chemistry, and cybersecurity, containing roughly 4,000 multiple-choice questions in a style similar to MMLU. It also includes corpora for the three domains.
- Links: arXiv, Hugging Face
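Since WMDP follows the MMLU multiple-choice format, evaluation reduces to comparing predicted choice indices against the answer key. A minimal scoring sketch over hypothetical items (the field names here are illustrative, not the benchmark's exact schema):

```python
# Hypothetical WMDP-style items: each has four choices and an answer index.
items = [
    {"question": "…", "choices": ["w", "x", "y", "z"], "answer": 2},
    {"question": "…", "choices": ["w", "x", "y", "z"], "answer": 0},
]

def accuracy(predictions, items):
    """Fraction of items where the predicted choice index matches the key."""
    correct = sum(p == it["answer"] for p, it in zip(predictions, items))
    return correct / len(items)

print(accuracy([2, 1], items))  # 0.5
```

After unlearning, accuracy on the hazardous-knowledge questions should drop toward the random-guessing baseline (25% for four choices), while accuracy on general benchmarks such as MMLU should be preserved.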
- MMLU Subsets
- Description: A task proposed along with the WMDP dataset. The goal is to unlearn three MMLU categories — economics, physics, and law — while retaining closely related categories (econometrics and others for economics, math and others for physics, and jurisprudence and others for law). The task requires high-precision unlearning, because the retain sets are closely related to the unlearning targets.
- Links: arXiv, Hugging Face
- arXiv, GitHub, and copyrighted books corpus
- Description: A dataset for evaluating approximate unlearning algorithms on pre-trained LLMs. It contains forget and retain splits for each category, with both in-distribution and general retain sets. The dataset is designed for unlearning directly on pre-trained models, since its samples are drawn at random from the pre-training data of Yi.
- Links: arXiv, Hugging Face