Data Crimes

Code:

Data:

Official github repository for the paper "Implicit Data Crimes: Machine Learning Bias Arising from Misuse of Public Data". The paper has been accepted to PNAS (Feb 2022) and will be published soon with full open access.

Here are our arXiv preprint Subtle data crimes: naively training machine learning algorithms could lead to overly-optimistic results and ISMRM'21 abstract.

Notice: repository name change

The repository's name recently changed from "subtle_data_crimes" to "data_crimes".

If you already cloned the repository with the previous name, you can update the remote origin using this git command: git remote set-url origin git@github.com:mikgroup/data_crimes.git
The code package is still named subtle_data_crimes, so importing functions is done as follows: subtle_data_crimes.functions.utils import calc_pad_half or import subtle_data_crimes.functions.utils as utils

Video

A 20-minute talk about this research is available here (this talk was recorded during the MRI Together 2021 conference), and a 5-min talk is here (this link requires registration to the ISMRM'21 conference).

Installation

To use this package:

Clone or download it.
Install the required python packages (tested with python 3.7 on Ubuntu 20.04 LTS) by running (creating a virtual enviroment is recommened):

pip install --upgrade pip
pip install -e .

If you prefer to install using coda, you can also run this:

 ```conda install -c intel mkl-service==2.3.0```

Install the pytorch_wavelets library following the instructions here. Specifically:

git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .

In order to fully reproduce our experiments, data should be downloaded from the FastMRI database (see details in the next section).

Data

All the data used in this research was obtained from the FastMRI database.

To download the data, you need to request access to FastMRI and follow their procedures.

Comments:

After downloading the data, please copy the file

file_brain_AXT2_207_2070504.h5

from the FastMRI multi-coil brain training data should be copied to the folder:

crime_1_zero_padding/brain_data

For full reproducibility, we mention that the next two files from the FastMRI multi-coil knee training data (which include pathology examples)

file1000425.h5
file1002455.h5

were used in our pathology experiments. However, there's no need to place them in a separate folder, the code will find them in the FastMRI training data folder

Pre-trained networks

The weights of our pre-trained networks are publicly available on Zenodo.

Acknowledgements

The Compressed Sensing implementation was based on Sigpy.

The Dictionary Learning algorithm implementation is based on Jon Tamir's ISMRM 2020 educational tutorial.

The authors acknowledge funding from grants U24 EB029240-01, R01EB009690, R01HL136965.

Terms of use

This repository is published under the MIT license. It may be used only for research/educational purpose. It may not be used for clinical diagnosis or any other clinical purpose.

Citation

When using our code in your future publications, please cite our paper. Thank you.

@article{shimron2021subtle,
  title={Subtle inverse crimes: Na$\backslash$" ively training machine learning algorithms could lead to overly-optimistic results},
  author={Shimron, Efrat and Tamir, Jonathan I and Wang, Ke and Lustig, Michael},
  journal={arXiv preprint arXiv:2109.08237},
  year={2021}
}

Contact

Problems? Questions? contact efrat.s@berkeley.edu

Efrat Shimron, UC Berkeley (2022)

Name		Name	Last commit message	Last commit date
Latest commit History 123 Commits
subtle_data_crimes		subtle_data_crimes
LICENSE		LICENSE
README.md		README.md
README_fig.png		README_fig.png
logo_arXiv_paper.png		logo_arXiv_paper.png
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

subtle_data_crimes

subtle_data_crimes

LICENSE

LICENSE

README.md

README.md

README_fig.png

README_fig.png

logo_arXiv_paper.png

logo_arXiv_paper.png

requirements.txt

requirements.txt

setup.py

setup.py

Repository files navigation

Data Crimes

Notice: repository name change

Video

Installation

Data

Pre-trained networks

Acknowledgements

Terms of use

Citation

Contact

About

Releases 4

Packages

Contributors 2

Languages

License

mikgroup/data_crimes

Folders and files

Latest commit

History

Repository files navigation

Data Crimes

Notice: repository name change

Video

Installation

Data

Pre-trained networks

Acknowledgements

Terms of use

Citation

Contact

About

Resources

License

Stars

Watchers

Forks

Languages