JacksonWuxs/UsableXAI_LLM


Introduction

This is the official code for the paper Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era. In this repo, we implement several explanation methods for LLMs, including a gradient-based attribution method, an EK-FAC-approximated influence function, and an in-context demonstration strategy. Our implementations can be easily extended to various language model families, such as GPT-2, LLaMA, and Mistral. This codebase can serve as a foundational resource for advancing discussions on XAI in the era of LLMs.

Environment

  • Setup: We assume that you manage the environment with Conda.

    >>> conda create -n UsableXAI python=3.9 -y
    >>> conda activate UsableXAI
    >>> pip install -U -r requirements.txt
    
  • Dataset: We include three public datasets for case studies: MultiRC, HalluEval-V2, and SciFact. They are located in the ./datasets/ folder.

Explanation Methods

  • The implemented explanation methods are in the ./libs/core/ folder. Our implementations can be easily adapted to different language model families from the Huggingface transformers library, as sketched below.
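
As a concrete illustration, below is a minimal sketch of gradient-times-embedding attribution for a Huggingface causal LM. This is not the repo's implementation: gpt2 is used only as a small placeholder checkpoint, and summing the gradient-input product over the embedding dimension is one common aggregation choice among several.

    # Hedged sketch: gradient-x-embedding attribution of a response onto its prompt.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; swap in a LLaMA/Mistral checkpoint as needed
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def attribute(prompt: str, response: str):
        """Score each prompt token by gradient x embedding w.r.t. the response log-likelihood."""
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        response_ids = tokenizer(response, return_tensors="pt").input_ids
        input_ids = torch.cat([prompt_ids, response_ids], dim=1)

        embeds = model.get_input_embeddings()(input_ids).detach()
        embeds.requires_grad_(True)

        logits = model(inputs_embeds=embeds).logits
        log_probs = torch.log_softmax(logits, dim=-1)

        # log-likelihood of the response tokens (the token at position t is predicted at t-1)
        start = prompt_ids.shape[1]
        targets = input_ids[0, start:]
        token_logps = log_probs[0, start - 1:-1, :].gather(1, targets.unsqueeze(1))
        token_logps.sum().backward()

        # attribution of each prompt token = gradient . embedding, summed over the hidden dim
        scores = (embeds.grad[0, :start] * embeds[0, :start]).sum(-1)
        tokens = tokenizer.convert_ids_to_tokens(input_ids[0, :start].tolist())
        return list(zip(tokens, scores.tolist()))

    print(attribute("The capital of France is", " Paris."))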

Case Studies

Our case studies are listed in the ./Case_Studies/ folder.

  • Hallucination Detection

    We propose to use attribution scores between the responses and the prompts to develop a hallucination detector. Our case study shows that a smaller language model (i.e., Vicuna-7B) can detect hallucinated responses generated by a larger model (i.e., ChatGPT). See details here; a minimal attribution-scoring sketch appears after this list.

  • LLM Response Verification

    We propose to use attribution scores between the responses and the input contents to estimate whether a generated response is reliable. Our case study shows that the content highlighted by the attribution scores can be used to verify the quality of the corresponding response. See details here; a small evidence-aggregation sketch appears after this list.

  • Training Sample Influence Estimation

    We implement the influence function for LLMs (e.g., Vicuna-13B and Mistral-7B) following the EK-FAC estimation suggested by Grosse et al. (2023). Our case study shows that EK-FAC is a practical strategy for estimating the contribution of each training sample to response generation. See details here; a toy EK-FAC sketch appears after this list.

  • Is CoT Really Making LLM Inferences Explainable?

    We adopt the fidelity metric to measure the faithfulness of Chain-of-Thought (CoT) reasoning in explaining model predictions. Our case study shows that the explanation content in a CoT can generally be regarded as the explanation for the final prediction; however, these explanations may not be faithful to the final prediction in some cases. Details are coming soon; a sketch of one common fidelity check appears after this list.
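
For the hallucination-detection case study, the sketch below turns prompt-to-response attribution into a single score: the share of attribution mass that falls on the prompt. This only illustrates the idea and is not the repo's detector; gpt2 stands in for Vicuna-7B, and the 0.3 threshold is a made-up value that would need tuning on labeled data.

    # Hedged sketch: flag a response whose tokens attribute only weakly back to the prompt.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def prompt_attribution_share(prompt: str, response: str) -> float:
        """Fraction of |gradient x embedding| mass that the response places on the prompt."""
        p_ids = tokenizer(prompt, return_tensors="pt").input_ids
        r_ids = tokenizer(response, return_tensors="pt").input_ids
        ids = torch.cat([p_ids, r_ids], dim=1)
        embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)
        logps = torch.log_softmax(model(inputs_embeds=embeds).logits, dim=-1)
        start = p_ids.shape[1]
        targets = ids[0, start:]
        logps[0, start - 1:-1, :].gather(1, targets.unsqueeze(1)).sum().backward()
        saliency = (embeds.grad[0] * embeds[0]).sum(-1).abs()
        return (saliency[:start].sum() / saliency.sum()).item()

    def looks_hallucinated(prompt: str, response: str, threshold: float = 0.3) -> bool:
        # threshold is hypothetical; in practice it would be tuned on labeled examples
        return prompt_attribution_share(prompt, response) < threshold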
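
For response verification, the same token-level attributions can be aggregated over the input content to surface evidence for a human (or an automatic check) to inspect. The helper below is a hedged sketch that consumes (token, score) pairs such as those produced by the attribution sketch above; the naive sentence splitting and mean-score aggregation are illustrative assumptions, not the repo's procedure.

    # Hedged sketch: rank input sentences by their mean attribution score and return
    # the top-k as evidence against which the generated response can be verified.
    from typing import List, Tuple

    def top_evidence_sentences(token_scores: List[Tuple[str, float]],
                               k: int = 3) -> List[Tuple[str, float]]:
        """token_scores: ordered (token, attribution) pairs over the input content."""
        sentences, current, total = [], [], 0.0
        for token, score in token_scores:
            current.append(token)
            total += abs(score)
            if token.strip(" Ġ▁").endswith((".", "?", "!")):      # naive sentence boundary
                text = "".join(current).replace("Ġ", " ").replace("▁", " ").strip()
                sentences.append((text, total / len(current)))
                current, total = [], 0.0
        if current:
            text = "".join(current).replace("Ġ", " ").replace("▁", " ").strip()
            sentences.append((text, total / len(current)))
        return sorted(sentences, key=lambda s: s[1], reverse=True)[:k]

    # Usage: feed in the output of an attribution method, then manually check whether
    # the highlighted sentences actually support the generated response.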
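
For the influence-function case study, the toy sketch below works through the EK-FAC approximation on a single linear layer with synthetic data: the layer's Gauss-Newton matrix is approximated in the Kronecker eigenbasis of the input and output-gradient covariances, with re-fitted eigenvalues. The repo applies this idea to the layers of LLMs such as Vicuna-13B and Mistral-7B; shapes, damping, and data here are placeholders.

    # Hedged toy sketch of EK-FAC-style influence scoring on one linear layer.
    import torch

    torch.manual_seed(0)
    d_in, d_out, n = 8, 4, 256
    A = torch.randn(n, d_in)                 # layer inputs a_i (synthetic)
    G = torch.randn(n, d_out)                # per-example gradients w.r.t. layer outputs
    damping = 1e-3                           # placeholder damping term

    # Kronecker factors: input covariance and output-gradient covariance
    A_cov = A.T @ A / n
    G_cov = G.T @ G / n
    _, Qa = torch.linalg.eigh(A_cov)
    _, Qg = torch.linalg.eigh(G_cov)

    # EK-FAC: re-estimate per-coordinate eigenvalues in the Kronecker eigenbasis
    per_example_grads = G.unsqueeze(2) * A.unsqueeze(1)     # g_i a_i^T, shape (n, d_out, d_in)
    rotated = Qg.T @ per_example_grads @ Qa
    Lambda = (rotated ** 2).mean(0)

    def ihvp(v: torch.Tensor) -> torch.Tensor:
        """(GGN + damping I)^-1 v under the EK-FAC approximation; v has shape (d_out, d_in)."""
        return Qg @ ((Qg.T @ v @ Qa) / (Lambda + damping)) @ Qa.T

    def influence(train_grad: torch.Tensor, query_grad: torch.Tensor) -> float:
        """Approximate contribution of one training example's gradient to a query response."""
        return (query_grad * ihvp(train_grad)).sum().item()

    # Example: the weight gradient for one training example is the outer product g a^T
    print(influence(per_example_grads[0], torch.randn(d_out, d_in)))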
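
For the CoT case study, fidelity can be operationalized in several ways; the sketch below uses one common check: if the final answer does not change when the reasoning chain is withheld, the chain was not load-bearing for the prediction. The exact metric in the case study may differ, and answer_fn is a placeholder for any prompted LLM call.

    # Hedged sketch: fraction of examples whose answer changes once the CoT is withheld.
    from typing import Callable, List, Tuple

    def cot_fidelity(examples: List[Tuple[str, str, str]],
                     answer_fn: Callable[[str], str]) -> float:
        """examples: (question, chain_of_thought, answer_with_cot) triples."""
        changed = 0
        for question, _cot, answer_with_cot in examples:
            # Re-ask the question while withholding the model's own reasoning chain.
            answer_without_cot = answer_fn(f"{question}\nAnswer directly:")
            if answer_without_cot.strip() != answer_with_cot.strip():
                changed += 1
        return changed / max(len(examples), 1)

    # Usage with any LLM wrapper (hypothetical):
    #   fidelity = cot_fidelity(eval_set, answer_fn=lambda p: my_llm.generate(p))
    # Fidelity near 0 suggests the CoT text is post-hoc rather than driving the answer.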

TODO List

  • Improve documentation and the README
  • Release the explanation methods as a package
