
Medical-Diagnosis-Learning

Abstract

Understanding clinical notes to extract diagnosis information is a long-standing and challenging task lying at the confluence of Healthcare and Natural Language Understanding. In this work, we perform experiments to learn hierarchical representations from discharge summaries to classify the final diagnoses of patients in a multi-class and multi-label setting. We also investigate the role played by different sections of these clinical notes in influencing the performance of the system. Further, we use a soft-attention mechanism in our model to allow better interpretability and faster convergence. The report can be read here.

Running the code

MIMIC-III dataset

The dataset can be downloaded from the PhysioNet website by requesting access to it.

Preprocessed Data Generation

cd src
sh data_gen_scripts/new_run.sh

The above command takes the raw data and generates the different datasets described in Table 4 of the paper. It also prints the data statistics reported in the paper.

Before running this command, please open this file and change the data path to the location of the MIMIC-III files DIAGNOSIS.csv and NOTEEVENTS.csv. Also pass the --generatesplits 1 flag when running the preprocessing script for the first time.
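
For orientation, the sketch below shows the kind of join the preprocessing has to perform: attaching each admission's ICD-9 codes to its discharge summary text. It is only an illustration, not the repository's pipeline, and the column names (HADM_ID, CATEGORY, TEXT, ICD9_CODE) are assumed from the standard MIMIC-III schema.

# Conceptual sketch only -- not the repository's preprocessing pipeline.
# Column names are assumed from the standard MIMIC-III schema.
import pandas as pd

DATA_DIR = "/path/to/mimic-iii"  # hypothetical location of the raw CSVs

notes = pd.read_csv(f"{DATA_DIR}/NOTEEVENTS.csv", usecols=["HADM_ID", "CATEGORY", "TEXT"])
diagnoses = pd.read_csv(f"{DATA_DIR}/DIAGNOSIS.csv", usecols=["HADM_ID", "ICD9_CODE"])

# Keep only discharge summaries.
summaries = notes[notes["CATEGORY"] == "Discharge summary"]

# Collect the set of ICD-9 codes per admission (the multi-label targets).
labels = diagnoses.groupby("HADM_ID")["ICD9_CODE"].apply(list).reset_index(name="codes")

# Join text and labels on the admission id.
dataset = summaries.merge(labels, on="HADM_ID")
print(dataset[["HADM_ID", "codes"]].head())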

Training the model

To train the model with content 4:

python master_train_script.py --train_path <data_location>/50codesL5_UNK_content_4_top100_train_data.pkl --val_path <data_location>/50codesL5_UNK_content_4_top100_valid_data.pkl --model_dir <location to save model in> --attention 1 --num_workers 12 --embed_path <path to saved embeddings>/stsp_model.tsv --num_epochs 15 --exp_name attention1_50_content4_top100_stsp --use_starspace 1 --multilabel 1 --batch_size 8 --lr 1e-3
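
The --attention 1 flag enables the soft-attention pooling mentioned in the abstract. The snippet below is a minimal sketch of that idea (a learned attention weight over encoder outputs), not the exact module used in master_train_script.py.

# Minimal sketch of soft-attention pooling over RNN outputs.
# Illustrative only; the repository's actual attention module may differ.
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        self.context = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, encoder_outputs):
        # encoder_outputs: (batch, seq_len, hidden_dim)
        scores = self.context(torch.tanh(self.proj(encoder_outputs)))  # (batch, seq_len, 1)
        weights = torch.softmax(scores, dim=1)                         # attention over positions
        pooled = (weights * encoder_outputs).sum(dim=1)                # (batch, hidden_dim)
        return pooled, weights.squeeze(-1)

# Usage: pool word- or sentence-level encoder outputs into a single vector.
attn = SoftAttention(hidden_dim=128)
outputs = torch.randn(8, 40, 128)   # e.g. 8 notes, 40 sentences, 128-dim encodings
doc_vec, attn_weights = attn(outputs)

The returned attention weights are what make the model more interpretable: they indicate which sentences contributed most to each predicted code.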

Commands to run the attention model with other variants of the dataset can be found in src/princerun.sh. The corresponding commands for the Word-Sentence encoder model are in src/princerun_wordsent.sh.

If running the script for the first time, also add the --build_starspace 1 flag and pass --starspace_exec <path to Starspace/run.sh>.
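
The --embed_path argument points to the StarSpace output (stsp_model.tsv), which is a plain tab-separated file with one token and its vector per line. A rough loader sketch, assuming that standard layout:

# Rough sketch of reading a StarSpace .tsv embedding file into a dict.
# Assumes each line is: token \t v1 \t v2 \t ... (the usual StarSpace text output).
import numpy as np

def load_starspace_tsv(path):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) < 2:
                continue
            token, values = parts[0], parts[1:]
            embeddings[token] = np.asarray(values, dtype=np.float32)
    return embeddings

# vectors = load_starspace_tsv("stsp_model.tsv")
# embed_dim = len(next(iter(vectors.values())))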

Viewing the learning curves

tensorboard --logdir=log
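
The curves appear under log/ because the training run writes TensorBoard summaries there. A minimal sketch of that pattern, assuming torch.utils.tensorboard and a per-experiment subdirectory named after --exp_name (the repository's logger may differ):

# Sketch of how scalars end up in the log/ directory read by TensorBoard.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="log/attention1_50_content4_top100_stsp")
for epoch in range(15):
    train_loss = 1.0 / (epoch + 1)          # placeholder value for illustration
    writer.add_scalar("loss/train", train_loss, epoch)
writer.close()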

Evaluating on the test set

python eval_test.py --train_path <data_location>/50codesL5_UNK_content_4_top100_train_data.pkl --val_path <data_location>/50codesL5_UNK_content_4_top100_test_data.pkl --model_path <path to saved model> --attention 1 --batch_size 8 
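
Because the task is multi-label over the top ICD-9 codes, evaluation typically reports micro- and macro-averaged scores over binarized predictions. A small scikit-learn sketch of that computation (illustrative; the exact metrics reported by eval_test.py may differ):

# Sketch of multi-label metrics on thresholded model outputs.
# Illustrative only; not necessarily the metrics computed by eval_test.py.
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0]])                 # gold code indicators
probs = np.array([[0.9, 0.2, 0.6, 0.1], [0.3, 0.8, 0.4, 0.2]])  # model probabilities
y_pred = (probs >= 0.5).astype(int)                             # threshold sigmoid outputs

print("micro-F1:", f1_score(y_true, y_pred, average="micro"))
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))
print("precision:", precision_score(y_true, y_pred, average="micro"))
print("recall:", recall_score(y_true, y_pred, average="micro"))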
