Papers about training data quality management for ML models.
Updated Jun 7, 2024
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
OpenDataVal: a Unified Benchmark for Data Valuation in Python (NeurIPS 2023)
The pyDVL slides for pyData Berlin 2024
The Medium of Exchange of Ecosystem
Code for paper 'Interpretable Triplet Importance for Personalized Ranking' in submission
Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)
💱 A curated list of data valuation (DV) to design your next data marketplace
Supplementary programmes for DeRDaVa: Deletion-Robust Data Valuation for Machine Learning.
Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models"
Algorithms for data valuation and benchmarks
Simulation environment for data collection dynamics.
This is an official repository for "2D-Shapley: A Framework for Fragmented Data Valuation" (ICML 2023).
Code for the submission to the ML Reproducibility Challenge 2022, reproducing "If you like Shapley then you'll love the core"
This is an official repository for "LAVA: Data Valuation without Pre-Specified Learning Algorithms" (ICLR 2023).
PyTorch reimplementation of Shapley value computation via Truncated Monte Carlo sampling, from "What is your data worth? Equitable Valuation of Data" by Amirata Ghorbani and James Zou (ICML 2019)
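
For readers new to the topic, the Truncated Monte Carlo idea named in the last entry can be sketched in a few lines of plain Python. This is a minimal illustration, not code from any repository listed here: `tmc_shapley`, `toy_utility`, and `weights` are hypothetical names, and the additive toy utility stands in for what would normally be a model's validation score after training on the chosen subset.

```python
import random
from typing import Callable, List, Sequence


def tmc_shapley(
    n_points: int,
    utility: Callable[[Sequence[int]], float],
    n_permutations: int = 200,
    tolerance: float = 0.0,
    seed: int = 0,
) -> List[float]:
    """Estimate per-point Shapley values by sampling random permutations.

    `utility` maps a list of point indices to a score (hypothetical interface).
    """
    rng = random.Random(seed)
    full_score = utility(list(range(n_points)))
    totals = [0.0] * n_points

    for _ in range(n_permutations):
        perm = list(range(n_points))
        rng.shuffle(perm)
        prefix: List[int] = []
        prev_score = utility(prefix)
        for idx in perm:
            # Truncation: once the prefix scores within `tolerance` of the
            # full dataset, the remaining points get zero marginal credit.
            if abs(full_score - prev_score) <= tolerance:
                break
            prefix.append(idx)
            score = utility(prefix)
            totals[idx] += score - prev_score  # marginal contribution
            prev_score = score

    return [total / n_permutations for total in totals]


# Toy assumption: each point contributes a fixed, additive amount of utility.
weights = [1, 2, 3]


def toy_utility(subset: Sequence[int]) -> float:
    return sum(weights[i] for i in subset)


values = tmc_shapley(len(weights), toy_utility)
print(values)
```

Because the toy utility is additive, each point's marginal contribution is the same in every permutation, so the estimate recovers the weights themselves. With a real model the utility is noisy and expensive, which is where a nonzero `tolerance` saves retraining runs.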