Causal Abstractions of Neural Natural Language Inference Models

This is the implementation for the experiments in the paper Causal Abstractions of Neural Natural Language Inference Models.

Setup and dependencies

See requirements.txt.

intervention/ Basic infrastructure for defining computation graphs and performing interventions.
compgraphs/ Computation graphs for Natural Language Inference causal models and neural models.
causal_abstraction/ Interchange experiments and analysis.
datasets/ Class definitions for datasets.
modeling/ Neural models for NLI and training code.
probing/ Probing experiments.
experiment/ Utilities for launching experiments and automatically recording experiment results in databases.
feature_importance/ Utilities for integrated gradients experiments.
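The intervention/ and compgraphs/ modules revolve around computation graphs whose intermediate nodes can be overridden. The sketch below illustrates that idea in a self-contained way. It is not this repository's API: the names Node, ToyGraph and interchange and the arithmetic toy graph are all hypothetical. It shows an intervention (forcing a node's value) and an interchange intervention (forcing a node to the value it takes on a different input).

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class Node:
    """Hypothetical node: its value is fn applied to its parents' values."""

    def __init__(self, name: str, fn: Callable[..., Any], parents: Sequence[str] = ()):
        self.name = name
        self.fn = fn
        self.parents = tuple(parents)


class ToyGraph:
    """Hypothetical toy computation graph; nodes must be listed in topological order."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes

    def compute(self, inputs: Dict[str, Any],
                interventions: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        values = dict(inputs)
        interventions = interventions or {}
        for node in self.nodes:
            if node.name in interventions:
                # Intervention: override the node's value instead of computing it.
                values[node.name] = interventions[node.name]
            else:
                values[node.name] = node.fn(*(values[p] for p in node.parents))
        return values


def interchange(graph: ToyGraph, base: Dict[str, Any],
                source: Dict[str, Any], node: str) -> Dict[str, Any]:
    """Run graph on base, with `node` set to the value it takes on source."""
    source_value = graph.compute(source)[node]
    return graph.compute(base, interventions={node: source_value})


# Toy graph: s = x + y, out = s * z.
graph = ToyGraph([
    Node("s", lambda x, y: x + y, parents=("x", "y")),
    Node("out", lambda s, z: s * z, parents=("s", "z")),
])
base = {"x": 1, "y": 2, "z": 3}
source = {"x": 5, "y": 5, "z": 0}
plain = graph.compute(base)
swapped = interchange(graph, base, source, "s")
print(plain["out"], swapped["out"])  # 9 30
```

The interchanged run keeps z = 3 from the base input but uses s = 10 from the source input. In the experiments described below, a comparable swap is applied at corresponding locations of a causal model and a neural model, and their outputs are then compared. This toy only shows the mechanics.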

Training models

train_bert.py and train_lstm.py. Train a single instance of a BERT or LSTM model, respectively.
train_manager.py. Utilities for interfacing with the experiment module and managing grid search training.
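Grid search over training hyperparameters amounts to enumerating the Cartesian product of candidate values. The following is a minimal sketch under that assumption and does not reproduce train_manager.py. The function name grid and the hyperparameter names and values (lr, hidden_dim, dropout) are hypothetical placeholders.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def grid(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one configuration dict per combination of candidate values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical search space; each config would correspond to one training run.
space = {"lr": [1e-3, 1e-4], "hidden_dim": [128, 256], "dropout": [0.1]}
configs = list(grid(space))
for cfg in configs:
    print(cfg)
```

Each yielded dictionary describes one training run. A manager script can then launch these runs and record their results, which is the role the experiment/ module plays in this repository.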

Interchange experiments

interchange.py. Runs one set of interchange experiments for a given intermediate node of the causal model and all neural model locations for that node, and analyzes the success rates of the interventions.
graph_analysis.py. Builds the graph linking examples from the interchange experiment results and finds cliques in it.
interchange_manager.py. Utilities for interfacing with the experiment module and running large batches of interchange experiments on a computing cluster.
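graph_analysis.py finds cliques in a graph whose nodes are examples. The sketch below uses a hypothetical edge criterion: an undirected edge joins two examples whenever an interchange between them succeeded. It then enumerates maximal cliques with the standard Bron–Kerbosch algorithm with pivoting. The function names and the example pairs are illustrative only. The repository's actual edge definition and clique routine may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(num_examples: int, success_pairs: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Adjacency sets; an edge means the interchange between two examples succeeded (assumed)."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(num_examples)}
    for a, b in success_pairs:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Bron-Kerbosch with pivoting: returns every maximal clique."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex covering most candidates to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


adj = build_graph(5, [(0, 1), (1, 2), (0, 2), (3, 4)])
cliques = maximal_cliques(adj)
print(sorted(sorted(c) for c in cliques))  # [[0, 1, 2], [3, 4]]
print(max(cliques, key=len))               # {0, 1, 2}
```

Pivoting avoids re-exploring vertices adjacent to the pivot, which matters when many examples are densely connected.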

Probing experiments
