Some additional works on Edward Choi's medGAN

In this repository, I share my own work based on Edward Choi's medGAN. Congrats on Edward's excellent work!

medGAN (for medical GAN) is a generative adversarial network (GAN) for electronic health records (EHR). medGAN implements the algorithm introduced in the following paper:

Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun  
Machine Learning for Healthcare (MLHC) 2017

I opened a few pull requests on Edward Choi's medGAN repository:

Fixing an error due to version 1.16.3 of NumPy: merged and closed (following this issue I opened).
Fixing an error when running step 2-3 with count variables: merged and closed.

Quick preview

Author: Sylvain Combettes
Dates: June 24th – Sept. 13th, 2019 (3 months)
Context: As part of my penultimate year at Mines Nancy, I did a 3-month research internship at Servier, the second-largest pharmaceutical company in France. In 2018, Servier had a €4.2 billion turnover, operated in 149 countries and had more than 22,000 employees.
Topic: Generating fictitious but realistic patient data in order to boost the prediction score [synthesis, dataset augmentation].
Method: Combining GANs (generative adversarial networks) with autoencoders [implicit density estimation].
Programming: Python.
Result: The prediction score can be increased by more than 5% on binary values.
Links: [5 pages synthetic report] [full 62 pages report] [slides]

Abstract

In the first chapter, we give a general presentation of GANs, in particular of how they work. GANs are a revolutionary generative model invented by Ian Goodfellow in 2014. The key idea behind GANs is to have two neural networks competing against each other: the generator and the discriminator. GANs can synthesize samples that are impressively realistic.
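To make the generator/discriminator competition concrete, here is a minimal sketch of the standard GAN losses from Goodfellow et al. (2014), in plain Python. It assumes the discriminator outputs a probability D(x) in (0, 1), and it uses the common "non-saturating" generator loss. The function names are hypothetical and are not taken from medGAN's code.

```python
import math
from typing import Sequence


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Binary cross-entropy that the discriminator minimizes.

    d_real: discriminator outputs D(x) on real samples (probabilities).
    d_fake: discriminator outputs D(G(z)) on generated samples.
    The discriminator wants D(x) -> 1 and D(G(z)) -> 0.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: the generator wants D(G(z)) -> 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# A discriminator that cannot tell real from fake outputs 0.5 everywhere
d_real = [0.5, 0.5]
d_fake = [0.5, 0.5]
print(round(discriminator_loss(d_real, d_fake), 4))  # 1.3863 (= 2 ln 2)
print(round(generator_loss(d_fake), 4))              # 0.6931 (= ln 2)
```

At the theoretical equilibrium of the game, the discriminator outputs 1/2 on every sample, which is why the example uses 0.5: the discriminator loss is then 2 ln 2 and can no longer be lowered.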

In the second chapter, we apply GANs to patient data. The method is called medGAN (for medical GAN) and was developed by Edward Choi in 2017. medGAN can only synthesize binary or count values. There are two main applications of medGAN: privacy and dataset augmentation. We focus only on dataset augmentation from a real-life dataset: we generate fictitious yet realistic samples that are then concatenated with the real-life dataset into an augmented dataset with more samples. Training a predictive model on the augmented dataset rather than on the real-life dataset alone can boost the prediction score, provided the generated data is realistic enough.
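The augmentation step itself is just a concatenation. Below is a minimal sketch in plain Python, assuming each patient is a row of binary values whose last column is the label, so that medGAN would generate the label jointly with the features (an assumption for this sketch, not necessarily how the notebooks handle labels). The helpers `augment` and `split_xy` are hypothetical. Only the training set is augmented; the test set should stay real-only so that the prediction score measures performance on real patients.

```python
import random
from typing import List, Tuple

Row = List[int]  # one patient: binary features, label in the last column (assumption)


def split_xy(rows: List[Row]) -> Tuple[List[Row], List[int]]:
    """Separate the feature columns from the label (last column)."""
    return [r[:-1] for r in rows], [r[-1] for r in rows]


def augment(real_train: List[Row], synthetic: List[Row], seed: int = 0) -> List[Row]:
    """Concatenate real training rows with synthetic rows, then shuffle.

    Synthetic rows must have the same columns as the real ones.
    The input lists are not modified; a new list is returned.
    """
    if synthetic and len(synthetic[0]) != len(real_train[0]):
        raise ValueError("synthetic and real rows must have the same number of columns")
    augmented = real_train + synthetic  # new list with more samples
    random.Random(seed).shuffle(augmented)
    return augmented


real_train = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1]]
synthetic = [[1, 0, 0, 1], [0, 1, 1, 0]]
augmented = augment(real_train, synthetic)
X_aug, y_aug = split_xy(augmented)
print(len(X_aug), len(X_aug[0]))  # 5 3
```

A predictive model is then trained once on `real_train` and once on the augmented rows, and both are scored on the same real test set.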

How to use this repository

01_tips-for-medgan.md: Additional explanations on how to run Edward Choi's medGAN. In this Markdown file, I add explanations that complement the README.md of Edward Choi's medGAN repository.
02_how-medgan-binary-works.ipynb: Understanding how medGAN works on the MIMIC-III dataset of shape (46520, 1071) with binary values. In this notebook, I provide code cells and explanations to help better understand and run medGAN on binary values. I also measure the accuracy of the fake generated dataset compared to the real-life original one.
03_how-medgan-count-works.ipynb: Understanding how medGAN works with count features. In this notebook, I provide code cells and explanations that can help better understand and run medGAN on count features.
04_accuracy-medgan-binary-small.ipynb: Using medGAN on the MIMIC-III dataset of shape (1000, 100) with binary values. This is a shorter version of 02_how-medgan-binary-works.ipynb: we only sample from a dataset of shape (1000, 100) instead of (46520, 1071) and check the accuracy.
05_prediction-augmentation.ipynb: Using medGAN to boost the prediction score with data augmentation on the MIMIC-III dataset of shape (1000, 100) with binary values. In this notebook, I use medGAN to perform data augmentation in order to boost prediction performance. Spoiler: it works (under some conditions).
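One simple way to compare a fake binary dataset with the real one, used as a check in the medGAN paper, is dimension-wise probability: for each column, compare the proportion of 1s in the real data with the proportion in the synthetic data. The sketch below is in plain Python; it assumes the generated values are probabilities thresholded at 0.5 (an assumption, the notebooks may post-process differently), and the helper names are hypothetical. It is not necessarily the accuracy measure used in the notebooks.

```python
from typing import List, Sequence


def binarize(rows: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Round generated outputs to 0/1 (the 0.5 threshold is an assumption)."""
    return [[1 if v >= threshold else 0 for v in row] for row in rows]


def dimension_wise_probability(rows: Sequence[Sequence[int]]) -> List[float]:
    """Proportion of 1s in each column (per-feature Bernoulli probability)."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def mean_abs_gap(real: Sequence[Sequence[int]], fake: Sequence[Sequence[int]]) -> float:
    """Average absolute difference between real and fake column probabilities."""
    p_real = dimension_wise_probability(real)
    p_fake = dimension_wise_probability(fake)
    return sum(abs(a - b) for a, b in zip(p_real, p_fake)) / len(p_real)


real = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [1, 0, 1]]
fake_raw = [[0.9, 0.2, 0.7], [0.6, 0.1, 0.4], [0.3, 0.8, 0.6], [0.2, 0.3, 0.9]]
fake = binarize(fake_raw)
print(dimension_wise_probability(real))  # [0.75, 0.25, 0.75]
print(dimension_wise_probability(fake))  # [0.5, 0.25, 0.75]
print(mean_abs_gap(real, fake))          # about 0.0833
```

Plotting the real probability against the fake one (one point per column) and checking that the points lie close to the diagonal shows whether the generator reproduces the marginal frequencies. This check says nothing about correlations between columns.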

Note: For confidentiality reasons, the data is not available in this repository. If you wish to have access to the data, please refer to my report for the process.
