Equitable Valuation of Data Using Shapley Values (PyTorch Implementation)

Note: the implementation is currently lacking a retraining step. I welcome any PRs to fix this. See #1.

This is a PyTorch reimplementation of the Truncated Monte Carlo (TMC) method for computing Shapley values from What is your data worth? Equitable Valuation of Data by Amirata Ghorbani and James Zou. The original implementation (in TensorFlow) can be found here.

This implementation is currently designed for neural networks, and the only available performance metric is model classification accuracy, but contributions to expand the implementation are welcome.
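As a rough illustration of the truncated Monte Carlo idea (a minimal sketch, not the code in tmc.py), the snippet below averages each point's marginal contribution over random permutations and stops evaluating a permutation once the running score is within a tolerance of the full-data score. The names `tmc_shapley`, `utility`, `toy_utility`, and all parameters are hypothetical; in practice `utility` would train a model on the given subset and return its test accuracy.

```python
import random
from typing import Callable, List, Sequence


def tmc_shapley(
    n_points: int,
    utility: Callable[[Sequence[int]], float],
    n_iterations: int = 200,
    tolerance: float = 0.01,
    seed: int = 0,
) -> List[float]:
    """Estimate one value per training point by truncated Monte Carlo sampling."""
    rng = random.Random(seed)
    full_score = utility(list(range(n_points)))  # score with all data
    values = [0.0] * n_points
    for t in range(1, n_iterations + 1):
        perm = list(range(n_points))
        rng.shuffle(perm)
        prev_score = utility([])  # score with no data
        for k, idx in enumerate(perm):
            if abs(full_score - prev_score) < tolerance:
                # Truncation: the remaining points are assumed to add nothing.
                marginal = 0.0
            else:
                new_score = utility(perm[: k + 1])
                marginal = new_score - prev_score
                prev_score = new_score
            # Running mean of this point's marginal contribution.
            values[idx] += (marginal - values[idx]) / t
    return values


def toy_utility(subset: Sequence[int]) -> float:
    """Stand-in for test accuracy: points 0-3 each add 0.25, points 4-5 add nothing."""
    useful = sum(1 for i in subset if i < 4)
    return min(1.0, 0.25 * useful)


values = tmc_shapley(6, toy_utility, n_iterations=50)
```

With this toy utility, the four useful points receive equal values and the two uninformative points receive zero, because their marginal contribution is zero in every permutation.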

Why Compute Shapley Values?
Requirements
Usage

Why Compute Shapley Values?

Computing Shapley values helps when you need to rank the importance of your training data, for example when you need to prune harmful images from your training set, or when you need to compensate multiple sources for the data they provide.

This differs from valuing data with the leave-one-out (LOO) method, because Shapley values satisfy three main properties:

Null Data: If adding a datum to any subset of the training data does not change the model performance, then its value is zero.
Equality: For any data x and y, if x and y make equal contributions when added to any subset of the training data, then x and y have the same Shapley value.
Additive: If datum x contributes S_x(d_1) and S_x(d_2) to test points 1 and 2, respectively, then the value of x for both points together is S_x(d_1) + S_x(d_2).
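The contrast with LOO can be seen on a tiny example. The sketch below computes exact Shapley values by enumerating every permutation (only feasible for a handful of points) and compares them with LOO values. The functions and the toy utility are hypothetical and exist only for illustration: "a" and "b" are redundant copies of a useful point, and "c" is uninformative.

```python
from itertools import permutations


def exact_shapley(points, utility):
    """Average each point's marginal contribution over all orderings."""
    totals = {p: 0.0 for p in points}
    orderings = list(permutations(points))
    for ordering in orderings:
        seen = set()
        prev = utility(frozenset())
        for p in ordering:
            seen.add(p)
            score = utility(frozenset(seen))
            totals[p] += score - prev
            prev = score
    return {p: total / len(orderings) for p, total in totals.items()}


def leave_one_out(points, utility):
    """Value of a point = drop in score when only that point is removed."""
    full = frozenset(points)
    return {p: utility(full) - utility(full - {p}) for p in points}


def toy_utility(subset):
    """Stand-in for accuracy: 0.9 if 'a' or 'b' is present, else 0.5."""
    return 0.9 if subset & {"a", "b"} else 0.5


points = ["a", "b", "c"]
shapley = exact_shapley(points, toy_utility)
loo = leave_one_out(points, toy_utility)
```

Here LOO assigns zero to all three points, since removing either copy alone changes nothing. The Shapley values give "a" and "b" the same positive value of about 0.2 each (Equality) and give "c" zero (Null Data).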

Requirements

Python 3.6 or later
PyTorch 1.0 or later
NumPy 1.12 or later
pickle (part of the Python standard library)
tqdm

Usage

from tmc import DShap

# Supplied by the user:
model = get_my_model()
train_set, test_set = get_my_datasets()

dshap = DShap(model, train_set, test_set, directory='your_directory')

dshap.run(save_every=100, err=0.1, tolerance=0.01)

This outputs a pickle file that contains the sampled Shapley values. You can convert this into a NumPy array of shape (iterations, number of training points).
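The conversion might look like the sketch below. The file name and the pickled structure (one list of sampled values per iteration) are assumptions made for illustration, not the documented output format of tmc.py, so the snippet first writes a small stand-in file and then loads it.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Stand-in for the saved output: 2 iterations x 3 training points (made-up numbers).
samples = [[0.10, 0.00, -0.02], [0.14, 0.02, -0.04]]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sampled_values.pkl"  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(samples, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# Rows are iterations, columns are training points.
values = np.asarray(loaded, dtype=float)

# Averaging over iterations gives one estimate per training point.
estimates = values.mean(axis=0)

# Indices of training points, from most to least valuable.
ranking = np.argsort(-estimates)
```

Averaging over the iteration axis is one common way to turn the samples into a single value per point; points with the lowest estimates are the natural candidates for pruning.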
