This repository contains code for the BHI 2022 conference paper "HeartSpot: Privatized and Explainable Data Compression for Cardiomegaly Detection".
- Paper on arXiv: http://arxiv.org/abs/2210.02241
- Paper on IEEE: https://doi.org/10.1109/BHI56158.2022.9926777
- Citation:
Johnson, E., Mohan, S., Gaudio, A., Smailagic, A., Faloutsos, C., & Campilho, A. (2022, September). HeartSpot: Privatized and Explainable Data Compression for Cardiomegaly Detection. In 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) (pp. 01-04). IEEE.
- Bibtex:
@inproceedings{heartspot,
  title={{HeartSpot}: Privatized and Explainable Data Compression for Cardiomegaly Detection},
  author={Johnson, Elvin and Mohan, Shreshta and Gaudio, Alex and Smailagic, Asim and Faloutsos, Christos and Campilho, Aur{\'e}lio},
  year={2022}, month={10},
  booktitle={IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)},
  pages={01--04},
  organization={IEEE},
  url={https://doi.org/10.1109/BHI56158.2022.9926777}
}
The code for this paper was extracted from a repository that also contained code for another project; I attempted to keep only the material relevant to this paper. If you encounter a problem, please open an issue and let me know how to fix it.
$ git clone https://github.com/adgaudio/HeartSpot.git
$ cd HeartSpot
Download the CheXpert dataset (the small version, CheXpert-v1.0-small): https://stanfordmlgroup.github.io/competitions/chexpert/
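For example (a sketch; the source path below is an assumption about where you extracted the download, so adjust it to your machine):
$ mkdir -p data
$ mv ~/Downloads/CheXpert-v1.0-small data/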
# after downloading, ensure your directory structure matches this
$ find ./data/CheXpert-v1.0-small/ -maxdepth 1
data/CheXpert-v1.0-small/
data/CheXpert-v1.0-small/train.csv
data/CheXpert-v1.0-small/train
data/CheXpert-v1.0-small/valid
data/CheXpert-v1.0-small/valid.csv
# note that the dataset is about 12 GB
$ du -sm data/CheXpert-v1.0-small/
12366 data/CheXpert-v1.0-small/
I used Anaconda with Python 3.9.7 and the packages below.
This should be the complete list, but the same environment was also used for another project and had other packages installed. If a requirement is missing, please raise an issue and suggest how to update the README.
$ conda create -n heartspot python=3.9
$ conda activate heartspot
$ conda install scipy numpy matplotlib IPython seaborn pandas
$ conda install scikit-learn termcolor
$ pip install torch==1.9.1 torchvision==0.10.1 captum==0.4.0
$ pip install --no-deps opencv-python
$ pip install efficientnet-pytorch pampy==0.3.0 simple-parsing==0.0.17
$ pip install --no-deps simplepytorch==v1.0.0
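As an optional sanity check (assuming the module names match the pip package names above), verify that the key imports resolve:
# optional sanity check; module names are assumed to match the pip package names
$ python -c "import torch, torchvision, captum, simplepytorch; print(torch.__version__)"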
Results will be saved into ./results
Note 1: You may want to review the code before running it. The script expects redis-server to be installed and running, and it will try to parallelize jobs across all available GPUs on the local machine. I suggest reviewing bin/experiments_heartspot.sh, starting from the last lines of the file.
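Before launching, you can check that redis-server is reachable (this assumes redis-cli is installed and the server listens on the default local port):
$ redis-cli ping
PONG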
Note 2: I used two GPUs with 11 GB of GPU RAM, and a CPU with 24 cores and 128 GB of RAM. You may need to tweak num_workers and batch_size accordingly.
Note 3: By default, if you run just the "python heartspot/train.py ..." part of each command, without the prefix in front of it and without explicitly passing --experiment_id or setting an environment variable like run_id=myexperiment123, the results are written to ./results/debugging.
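For example, to run a single configuration directly (a sketch: copy the exact "python heartspot/train.py ..." arguments from bin/experiments_heartspot.sh; the trailing "..." below is only a placeholder):
$ run_id=myexperiment123 python heartspot/train.py ...
To run the full set of experiments: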
$ ./bin/experiments_heartspot.sh
As jobs run, you'll see output like this:
$ ls results/*.HL8*
'results/5.HL8.densenet121:untrained:1:1.':
20220507T183954.263817971_console.log checkpoints lock log
20220507T183954.263817971_git.log class_thresholds.csv lock.log perf.csv
results/5.HL8.heart.:
20220503T154105.284457043_console.log checkpoints lock perf.csv
20220503T154105.284457043_git.log class_thresholds.csv log
results/5.HL8.heart+densenet.:
20220505T181610.866709553_console.log checkpoints lock log
20220505T181610.866709553_git.log class_thresholds.csv lock.log perf.csv
...
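For example, to inspect the metrics logged for one of the runs above (a sketch; the columns in perf.csv depend on the run, and the command assumes the util-linux column tool is available):
$ column -t -s, results/5.HL8.heart./perf.csv | less -S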
To generate the plots and figures, review and run any of these scripts:
$ ls bin/plot*hline*
bin/plot_hline_acc_vs_time.py bin/plot_hline_getsaliency2.py
bin/plot_hline_archdiagramfigs.py bin/plot_hline_getsaliency.py
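For example (an assumed invocation; check each script before running, since I have not confirmed whether they take arguments, and they are expected to read from ./results):
$ python bin/plot_hline_acc_vs_time.py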