Stream: Biomedical Data Design Project Repository

Contributors: Zhenyu Xiao*, Haobin Zhou*, Yimeng Xu, and Emma Cardenas.

Affiliation: Department of Biomedical Engineering, Johns Hopkins University

Project Overview

This repository is part of the Biomedical Data Design course, where we focus on tracking patient recovery in real time by processing streaming data. The primary data source for this project is the eICU Collaborative Research Database, accessible on its official website after completing the required course on data security and ethics.

Included here are the source code, weekly presentation slides, and additional resources necessary to understand and engage with our project.

Getting Started

This project is written in Python 3. You can run it online in Google Colaboratory (with your data uploaded to Google Drive) or on your local machine.

When using Google Colaboratory, most of the CSV files are generated in the directory 'My Drive/Colab Notebooks'; only the model input data is stored in 'Stream/Models' automatically.

STEP 1: Clone GitHub Repo into Colaboratory

To clone the GitHub repository into Google Colaboratory, click and run this link. This creates a folder named 'Stream' in your Google Drive.

STEP 2: Upload eICU

Upload the unzipped eICU files to your Google Drive under 'My Drive/EICU/eicu-collaborative-research-database-2.0'.

STEP 3: Extract Available Patient IDs

Run the notebook 'Stream/Preprocess/func_check_patient_num.ipynb'. It filters out patients with unavailable features and generates the file 'Final_available_patients.csv'.
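Conceptually, the filtering in this step looks like the following pandas sketch. The column names (`patientunitstayid`, the per-feature availability flags) are illustrative assumptions; the notebook's actual logic lives in 'func_check_patient_num.ipynb'.

```python
import pandas as pd

# Hypothetical availability table: one row per ICU stay, one boolean
# column per required feature (column names are assumptions for illustration).
avail = pd.DataFrame({
    "patientunitstayid": [101, 102, 103, 104],
    "has_heartrate": [True, True, False, True],
    "has_bloodpressure": [True, False, True, True],
})

feature_cols = ["has_heartrate", "has_bloodpressure"]
# Keep only the patients for whom every required feature is available.
final = avail.loc[avail[feature_cols].all(axis=1), ["patientunitstayid"]]
final.to_csv("Final_available_patients.csv", index=False)
print(final["patientunitstayid"].tolist())  # → [101, 104]
```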

STEP 4: Extract Data & Align

Run these notebooks in 'Stream/Preprocess' to extract the data; all of them except 'Patient_Results.ipynb' also interpolate it with Gaussian process regression (GPR):

  • BloodPressure.ipynb
  • Glasgow.ipynb
  • HeartRate.ipynb
  • Pao2fio2-fio2.ipynb
  • Pao2fio2-pao2.ipynb
  • Temp.ipynb
  • Urine.ipynb
  • lab1_BUN.ipynb
  • lab2_WBC.ipynb
  • lab3_bicarbonate.ipynb
  • lab4_sodium.ipynb
  • lab5_potassuim.ipynb
  • lab6_bilirubin.ipynb
  • Patient_Results.ipynb

This step is time-consuming when using the whole eICU database; we recommend running the notebooks in parallel across separate Colab sessions to save time.
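As a rough illustration of the GPR interpolation these notebooks perform, here is a minimal numpy-only Gaussian process regression with an RBF kernel that resamples irregular vital-sign readings onto a regular time grid. The kernel length scale, noise level, and constant-mean prior are illustrative assumptions, not the notebooks' actual settings.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=15.0):
    """Squared-exponential kernel on time points (minutes)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gpr_interpolate(t_obs, y_obs, t_new, noise=1.0):
    """GP posterior mean at t_new, given noisy observations (t_obs, y_obs)."""
    K = rbf_kernel(t_obs, t_obs) + noise**2 * np.eye(len(t_obs))
    K_s = rbf_kernel(t_new, t_obs)
    mean = y_obs.mean()  # constant-mean prior (an illustrative choice)
    return mean + K_s @ np.linalg.solve(K, y_obs - mean)

# Irregularly sampled heart-rate readings (times in minutes).
t_obs = np.array([0.0, 7.0, 22.0, 45.0, 60.0])
y_obs = np.array([80.0, 82.0, 90.0, 85.0, 83.0])
t_grid = np.arange(0.0, 61.0, 5.0)  # regular 5-minute grid
y_grid = gpr_interpolate(t_obs, y_obs, t_grid)
print(np.round(y_grid, 1))
```

The posterior mean gives one aligned value per grid point; the same posterior also yields a variance, which can flag low-confidence interpolated samples.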

STEP 5: Concatenate Data & Run

Run 'Stream/Preprocess/Organize_all_data.ipynb' to merge all the features into '13features.csv'.
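The merge performed by 'Organize_all_data.ipynb' can be pictured with a small pandas sketch. The key columns (`patientunitstayid`, `offset`) and feature names are assumptions for illustration; the real notebook merges the CSVs produced in Step 4.

```python
import pandas as pd
from functools import reduce

# Hypothetical per-feature tables, each keyed by patient ID and time offset.
hr = pd.DataFrame({"patientunitstayid": [101, 101], "offset": [0, 5], "heartrate": [80, 82]})
bp = pd.DataFrame({"patientunitstayid": [101, 101], "offset": [0, 5], "meanbp": [70, 72]})
temp = pd.DataFrame({"patientunitstayid": [101, 101], "offset": [0, 5], "temp": [36.8, 37.0]})

# Merge every feature table on the shared (patient, time) key.
frames = [hr, bp, temp]
merged = reduce(
    lambda left, right: left.merge(right, on=["patientunitstayid", "offset"], how="inner"),
    frames,
)
merged.to_csv("13features.csv", index=False)
print(merged.columns.tolist())
```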

As described in Section 4 of the paper, we use three machine learning models and one deep learning model (an LSTM).

For the machine learning models, run the notebook 'Stream/Preprocessing/ml_models.ipynb' or use this file, then run 'Stream/Models/ml_models.ipynb' to evaluate their performance.
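The evaluation step follows the usual train/test pattern; here is a hedged sketch on synthetic data standing in for the 13-feature matrix. The specific models are not named in this README, so the random forest below is an assumption for illustration, as are the feature matrix and labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the 13-feature matrix and recovery labels;
# the real inputs are produced by the preprocessing notebooks.
X = rng.normal(size=(400, 13))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.3f}")
```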

For the LSTM, run the notebook 'Stream/Preprocessing/Balance_LSTM.ipynb' to generate the balanced data or use the files in this link, then run 'Stream/Models/LSTM.ipynb' to evaluate the LSTM's performance (a GPU can accelerate this final step).
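One common way to balance labels before training a sequence model is random undersampling of the majority class; the numpy sketch below shows the idea. This is an illustrative assumption, not necessarily the method 'Balance_LSTM.ipynb' uses.

```python
import numpy as np

def balance_by_undersampling(X, y, seed=0):
    """Randomly undersample the majority class so both classes have equal size.
    (Illustrative only; the repo's notebook may balance its data differently.)"""
    rng = np.random.default_rng(seed)
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    n = min(len(idx0), len(idx1))
    keep = np.concatenate([rng.choice(idx0, n, replace=False),
                           rng.choice(idx1, n, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Imbalanced toy set: 90 negatives, 10 positives.
X = np.arange(100, dtype=float).reshape(100, 1)
y = np.array([0] * 90 + [1] * 10)
Xb, yb = balance_by_undersampling(X, y)
print(len(yb), int(yb.sum()))  # → 20 10 (20 samples, 10 per class)
```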

To run the project locally:

  • Adjust file paths in the code to your local directories.
  • Set up your environment using Anaconda and CUDA as needed. See the installation guide below for details.

Installation Guide

Here we provide an example of how to install the environment on a local machine using Anaconda and CUDA 11.8. For CPU-only or other CUDA versions, refer to the PyTorch website when installing PyTorch. This repository does not depend on a specific CUDA version; feel free to use whichever version suits your machine.

```shell
# create conda environment
conda create -n bdd python=3.9 -y
conda activate bdd

# install dependencies (CUDA 11.8 build of PyTorch)
conda install numpy pandas matplotlib scikit-learn xgboost jupyter \
    pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```

We also provide 'environment.yaml', but it may contain redundant packages. Alternatively, install any remaining packages as you encounter import errors while running the code.
