JacksonWuxs/UsableXAI_LLM


Introduction

This is the official code for the paper Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era. In this repo, we implement several explanation methods for LLMs, including a gradient-based attribution method, an EK-FAC-approximated influence function, and an in-context demonstration strategy. Our implementations can be easily extended to various language model families, such as GPT-2, LLaMA, and Mistral. This codebase can serve as a foundational resource for advancing discussions on XAI in the era of LLMs.

Environment

  • Setup: We assume that you manage the environment with Conda.

    >>> conda create -n UsableXAI python=3.9 -y
    >>> conda activate UsableXAI
    >>> pip install -U -r requirements.txt
    
  • Dataset: We include three public datasets for case studies: MultiRC, HalluEval-V2, and SciFact. They are located in the ./datasets/ folder.

Explanation Methods

  • The implemented explanation methods are in the ./libs/core/ folder. Our implementations can be easily adapted to different language model families from the Huggingface transformers library, as sketched below.
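
As a concrete illustration, below is a minimal sketch of gradient-times-embedding attribution for a Huggingface causal LM. This is not the repo's implementation: gpt2 is used only as a small placeholder checkpoint, and summing the gradient-input product over the embedding dimension is one common aggregation choice among several.

    # Hedged sketch: gradient-x-embedding attribution of a response onto its prompt.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; swap in a LLaMA/Mistral checkpoint as needed
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def attribute(prompt: str, response: str):
        """Score each prompt token by gradient x embedding w.r.t. the response log-likelihood."""
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        response_ids = tokenizer(response, return_tensors="pt").input_ids
        input_ids = torch.cat([prompt_ids, response_ids], dim=1)

        embeds = model.get_input_embeddings()(input_ids).detach()
        embeds.requires_grad_(True)

        logits = model(inputs_embeds=embeds).logits
        log_probs = torch.log_softmax(logits, dim=-1)

        # log-likelihood of the response tokens (the token at position t is predicted at t-1)
        start = prompt_ids.shape[1]
        targets = input_ids[0, start:]
        token_logps = log_probs[0, start - 1:-1, :].gather(1, targets.unsqueeze(1))
        token_logps.sum().backward()

        # attribution of each prompt token = gradient . embedding, summed over the hidden dim
        scores = (embeds.grad[0, :start] * embeds[0, :start]).sum(-1)
        tokens = tokenizer.convert_ids_to_tokens(input_ids[0, :start].tolist())
        return list(zip(tokens, scores.tolist()))

    print(attribute("The capital of France is", " Paris."))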

Case Studies

Our case studies are listed in the ./Case_Studies/ folder.

  • Hallucination Detection

    We propose to use attribution scores between the responses and the prompts to develop a hallucination detector. Our case study shows that a smaller language model (i.e., Vicuna-7B) can detect hallucinated responses generated by a larger model (i.e., ChatGPT). See details here; a minimal attribution-scoring sketch appears after this list.

  • LLM Response Verification

    We propose to use attribution scores between the responses and the input contents to estimate whether a generated response is reliable. Our case study shows that the content highlighted by the attribution scores can be used to verify the quality of the corresponding response. See details here; a small evidence-aggregation sketch appears after this list.

  • Training Sample Influence Estimation

    We implement the influence function for LLMs (e.g., Vicuna-13B and Mistral-7B) following the EK-FAC estimation suggested by Grosse et al. (2023). Our case study shows that EK-FAC is a practical strategy for estimating the contribution of each training sample to response generation. See details here; a toy EK-FAC sketch appears after this list.

  • Is CoT Really Making LLM Inferences Explainable?

    We adopt the fidelity metric to measure the faithfulness of Chain-of-Thought (CoT) reasoning in explaining model predictions. Our case study shows that the explanation content in a CoT can generally be regarded as the explanation for the final prediction; however, these explanations may not be faithful to the final prediction in some cases. Details are coming soon; a sketch of one common fidelity check appears after this list.
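
For the hallucination-detection case study, the sketch below turns prompt-to-response attribution into a single score: the share of attribution mass that falls on the prompt. This only illustrates the idea and is not the repo's detector; gpt2 stands in for Vicuna-7B, and the 0.3 threshold is a made-up value that would need tuning on labeled data.

    # Hedged sketch: flag a response whose tokens attribute only weakly back to the prompt.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def prompt_attribution_share(prompt: str, response: str) -> float:
        """Fraction of |gradient x embedding| mass that the response places on the prompt."""
        p_ids = tokenizer(prompt, return_tensors="pt").input_ids
        r_ids = tokenizer(response, return_tensors="pt").input_ids
        ids = torch.cat([p_ids, r_ids], dim=1)
        embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)
        logps = torch.log_softmax(model(inputs_embeds=embeds).logits, dim=-1)
        start = p_ids.shape[1]
        targets = ids[0, start:]
        logps[0, start - 1:-1, :].gather(1, targets.unsqueeze(1)).sum().backward()
        saliency = (embeds.grad[0] * embeds[0]).sum(-1).abs()
        return (saliency[:start].sum() / saliency.sum()).item()

    def looks_hallucinated(prompt: str, response: str, threshold: float = 0.3) -> bool:
        # threshold is hypothetical; in practice it would be tuned on labeled examples
        return prompt_attribution_share(prompt, response) < threshold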
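
For response verification, the same token-level attributions can be aggregated over the input content to surface evidence for a human (or an automatic check) to inspect. The helper below is a hedged sketch that consumes (token, score) pairs such as those produced by the attribution sketch above; the naive sentence splitting and mean-score aggregation are illustrative assumptions, not the repo's procedure.

    # Hedged sketch: rank input sentences by their mean attribution score and return
    # the top-k as evidence against which the generated response can be verified.
    from typing import List, Tuple

    def top_evidence_sentences(token_scores: List[Tuple[str, float]],
                               k: int = 3) -> List[Tuple[str, float]]:
        """token_scores: ordered (token, attribution) pairs over the input content."""
        sentences, current, total = [], [], 0.0
        for token, score in token_scores:
            current.append(token)
            total += abs(score)
            if token.strip(" Ġ▁").endswith((".", "?", "!")):      # naive sentence boundary
                text = "".join(current).replace("Ġ", " ").replace("▁", " ").strip()
                sentences.append((text, total / len(current)))
                current, total = [], 0.0
        if current:
            text = "".join(current).replace("Ġ", " ").replace("▁", " ").strip()
            sentences.append((text, total / len(current)))
        return sorted(sentences, key=lambda s: s[1], reverse=True)[:k]

    # Usage: feed in the output of an attribution method, then manually check whether
    # the highlighted sentences actually support the generated response.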
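
For the influence-function case study, the toy sketch below works through the EK-FAC approximation on a single linear layer with synthetic data: the layer's Gauss-Newton matrix is approximated in the Kronecker eigenbasis of the input and output-gradient covariances, with re-fitted eigenvalues. The repo applies this idea to the layers of LLMs such as Vicuna-13B and Mistral-7B; shapes, damping, and data here are placeholders.

    # Hedged toy sketch of EK-FAC-style influence scoring on one linear layer.
    import torch

    torch.manual_seed(0)
    d_in, d_out, n = 8, 4, 256
    A = torch.randn(n, d_in)                 # layer inputs a_i (synthetic)
    G = torch.randn(n, d_out)                # per-example gradients w.r.t. layer outputs
    damping = 1e-3                           # placeholder damping term

    # Kronecker factors: input covariance and output-gradient covariance
    A_cov = A.T @ A / n
    G_cov = G.T @ G / n
    _, Qa = torch.linalg.eigh(A_cov)
    _, Qg = torch.linalg.eigh(G_cov)

    # EK-FAC: re-estimate per-coordinate eigenvalues in the Kronecker eigenbasis
    per_example_grads = G.unsqueeze(2) * A.unsqueeze(1)     # g_i a_i^T, shape (n, d_out, d_in)
    rotated = Qg.T @ per_example_grads @ Qa
    Lambda = (rotated ** 2).mean(0)

    def ihvp(v: torch.Tensor) -> torch.Tensor:
        """(GGN + damping I)^-1 v under the EK-FAC approximation; v has shape (d_out, d_in)."""
        return Qg @ ((Qg.T @ v @ Qa) / (Lambda + damping)) @ Qa.T

    def influence(train_grad: torch.Tensor, query_grad: torch.Tensor) -> float:
        """Approximate contribution of one training example's gradient to a query response."""
        return (query_grad * ihvp(train_grad)).sum().item()

    # Example: the weight gradient for one training example is the outer product g a^T
    print(influence(per_example_grads[0], torch.randn(d_out, d_in)))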
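
For the CoT case study, fidelity can be operationalized in several ways; the sketch below uses one common check: if the final answer does not change when the reasoning chain is withheld, the chain was not load-bearing for the prediction. The exact metric in the case study may differ, and answer_fn is a placeholder for any prompted LLM call.

    # Hedged sketch: fraction of examples whose answer changes once the CoT is withheld.
    from typing import Callable, List, Tuple

    def cot_fidelity(examples: List[Tuple[str, str, str]],
                     answer_fn: Callable[[str], str]) -> float:
        """examples: (question, chain_of_thought, answer_with_cot) triples."""
        changed = 0
        for question, _cot, answer_with_cot in examples:
            # Re-ask the question while withholding the model's own reasoning chain.
            answer_without_cot = answer_fn(f"{question}\nAnswer directly:")
            if answer_without_cot.strip() != answer_with_cot.strip():
                changed += 1
        return changed / max(len(examples), 1)

    # Usage with any LLM wrapper (hypothetical):
    #   fidelity = cot_fidelity(eval_set, answer_fn=lambda p: my_llm.generate(p))
    # Fidelity near 0 suggests the CoT text is post-hoc rather than driving the answer.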

TODO List

  • Improve documentation and the README
  • Release the explanation methods as a package
