Stream: Biomedical Data Design Project Repository

Contributors: Zhenyu Xiao*, Haobin Zhou*, Yimeng Xu, and Emma Cardenas.

Affiliation: Department of Biomedical Engineering, Johns Hopkins University

Project Overview

This repository is part of the Biomedical Data Design course, where we focus on tracking patient recovery in real time by processing streaming data. The primary data source for this project is the eICU Collaborative Research Database, accessible on its official website after completing the required course on data security and ethics.

Included here are the source code, weekly presentation slides, and additional resources necessary to understand and engage with our project.

Getting Started

This project is written in Python 3. You can run it online in Google Colaboratory (with your data uploaded to Google Drive) or on your local machine.

When using Google Colaboratory, most of the CSV files are generated in the directory 'My Drive/Colab Notebooks'; only the model input data is stored in 'Stream/Models' automatically.

STEP 1: Clone GitHub Repo into Colaboratory

To clone the GitHub repository into Google Colaboratory, click and run this link. This creates a folder named 'Stream' in your Google Drive.

STEP 2: Upload eICU

Upload the unzipped eICU files to your Google Drive under 'My Drive/EICU/eicu-collaborative-research-database-2.0'.

STEP 3: Extract Available Patient IDs

Run the notebook 'Stream/Preprocess/func_check_patient_num.ipynb'. It filters out patients with unavailable features and generates the file 'Final_available_patients.csv'.
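Conceptually, the filtering in this step looks like the following pandas sketch. The column names (`patientunitstayid`, the per-feature availability flags) are illustrative assumptions; the notebook's actual logic lives in 'func_check_patient_num.ipynb'.

```python
import pandas as pd

# Hypothetical availability table: one row per ICU stay, one boolean
# column per required feature (column names are assumptions for illustration).
avail = pd.DataFrame({
    "patientunitstayid": [101, 102, 103, 104],
    "has_heartrate": [True, True, False, True],
    "has_bloodpressure": [True, False, True, True],
})

feature_cols = ["has_heartrate", "has_bloodpressure"]
# Keep only the patients for whom every required feature is available.
final = avail.loc[avail[feature_cols].all(axis=1), ["patientunitstayid"]]
final.to_csv("Final_available_patients.csv", index=False)
print(final["patientunitstayid"].tolist())  # → [101, 104]
```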

STEP 4: Extract Data & Align

Run these notebooks in 'Stream/Preprocess' to extract the data; all of them except 'Patient_Results.ipynb' also interpolate it with Gaussian process regression (GPR):

  • BloodPressure.ipynb
  • Glasgow.ipynb
  • HeartRate.ipynb
  • Pao2fio2-fio2.ipynb
  • Pao2fio2-pao2.ipynb
  • Temp.ipynb
  • Urine.ipynb
  • lab1_BUN.ipynb
  • lab2_WBC.ipynb
  • lab3_bicarbonate.ipynb
  • lab4_sodium.ipynb
  • lab5_potassuim.ipynb
  • lab6_bilirubin.ipynb
  • Patient_Results.ipynb

This step is time-consuming when using the whole eICU database; we recommend running the notebooks in parallel across separate Colab sessions to save time.
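As a rough illustration of the GPR interpolation these notebooks perform, here is a minimal numpy-only Gaussian process regression with an RBF kernel that resamples irregular vital-sign readings onto a regular time grid. The kernel length scale, noise level, and constant-mean prior are illustrative assumptions, not the notebooks' actual settings.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=15.0):
    """Squared-exponential kernel on time points (minutes)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gpr_interpolate(t_obs, y_obs, t_new, noise=1.0):
    """GP posterior mean at t_new, given noisy observations (t_obs, y_obs)."""
    K = rbf_kernel(t_obs, t_obs) + noise**2 * np.eye(len(t_obs))
    K_s = rbf_kernel(t_new, t_obs)
    mean = y_obs.mean()  # constant-mean prior (an illustrative choice)
    return mean + K_s @ np.linalg.solve(K, y_obs - mean)

# Irregularly sampled heart-rate readings (times in minutes).
t_obs = np.array([0.0, 7.0, 22.0, 45.0, 60.0])
y_obs = np.array([80.0, 82.0, 90.0, 85.0, 83.0])
t_grid = np.arange(0.0, 61.0, 5.0)  # regular 5-minute grid
y_grid = gpr_interpolate(t_obs, y_obs, t_grid)
print(np.round(y_grid, 1))
```

The posterior mean gives one aligned value per grid point; the same posterior also yields a variance, which can flag low-confidence interpolated samples.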

STEP 5: Concatenate Data & Run

Run 'Stream/Preprocess/Organize_all_data.ipynb' to merge all the features into '13features.csv'.
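The merge performed by 'Organize_all_data.ipynb' can be pictured with a small pandas sketch. The key columns (`patientunitstayid`, `offset`) and feature names are assumptions for illustration; the real notebook merges the CSVs produced in Step 4.

```python
import pandas as pd
from functools import reduce

# Hypothetical per-feature tables, each keyed by patient ID and time offset.
hr = pd.DataFrame({"patientunitstayid": [101, 101], "offset": [0, 5], "heartrate": [80, 82]})
bp = pd.DataFrame({"patientunitstayid": [101, 101], "offset": [0, 5], "meanbp": [70, 72]})
temp = pd.DataFrame({"patientunitstayid": [101, 101], "offset": [0, 5], "temp": [36.8, 37.0]})

# Merge every feature table on the shared (patient, time) key.
frames = [hr, bp, temp]
merged = reduce(
    lambda left, right: left.merge(right, on=["patientunitstayid", "offset"], how="inner"),
    frames,
)
merged.to_csv("13features.csv", index=False)
print(merged.columns.tolist())
```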

As described in Section 4 of the paper, we use three machine learning models and one deep learning model (an LSTM).

For the machine learning models, run the notebook 'Stream/Preprocessing/ml_models.ipynb' or use this file, then run 'Stream/Models/ml_models.ipynb' to evaluate their performance.
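The evaluation step follows the usual train/test pattern; here is a hedged sketch on synthetic data standing in for the 13-feature matrix. The specific models are not named in this README, so the random forest below is an assumption for illustration, as are the feature matrix and labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the 13-feature matrix and recovery labels;
# the real inputs are produced by the preprocessing notebooks.
X = rng.normal(size=(400, 13))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.3f}")
```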

For the LSTM, run the notebook 'Stream/Preprocessing/Balance_LSTM.ipynb' to generate the balanced data or use the files in this link, then run 'Stream/Models/LSTM.ipynb' to evaluate the LSTM's performance (a GPU can accelerate this final step).
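One common way to balance labels before training a sequence model is random undersampling of the majority class; the numpy sketch below shows the idea. This is an illustrative assumption, not necessarily the method 'Balance_LSTM.ipynb' uses.

```python
import numpy as np

def balance_by_undersampling(X, y, seed=0):
    """Randomly undersample the majority class so both classes have equal size.
    (Illustrative only; the repo's notebook may balance its data differently.)"""
    rng = np.random.default_rng(seed)
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    n = min(len(idx0), len(idx1))
    keep = np.concatenate([rng.choice(idx0, n, replace=False),
                           rng.choice(idx1, n, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Imbalanced toy set: 90 negatives, 10 positives.
X = np.arange(100, dtype=float).reshape(100, 1)
y = np.array([0] * 90 + [1] * 10)
Xb, yb = balance_by_undersampling(X, y)
print(len(yb), int(yb.sum()))  # → 20 10 (20 samples, 10 per class)
```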

To run the project locally:

  • Adjust file paths in the code to your local directories.
  • Set up your environment using Anaconda and CUDA as needed. See the installation guide below for details.

Installation Guide

Here we provide an example of how to install the environment on a local machine using Anaconda and CUDA 11.8. For CPU-only or other CUDA versions, refer to the PyTorch website when installing PyTorch. This repository does not depend on a specific CUDA version; feel free to use whichever version suits your machine.

```shell
# create conda environment
conda create -n bdd python=3.9 -y
conda activate bdd

# install dependencies (CUDA 11.8 build of PyTorch)
conda install numpy pandas matplotlib scikit-learn xgboost jupyter \
    pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```

We also provide 'environment.yaml', but it may contain redundant packages. Alternatively, install any remaining packages as you encounter import errors while running the code.
