nphdang/DeepCoDA

DeepCoDA: deep learning for personalized interpretability for compositional health data

This repository implements the DeepCoDA model from the paper "DeepCoDA: personalized interpretability for compositional health data" (ICML 2020): https://arxiv.org/abs/2006.01392

Introduction

Interpretability allows a domain expert to directly evaluate a model's relevance and reliability, a practice that offers assurance and builds trust. In the healthcare setting, interpretable models should implicate relevant biological mechanisms independent of technical factors like data pre-processing.

Some health data, especially those generated by high-throughput sequencing experiments, have nuances that compromise precision health models and their interpretation. These data are compositional, meaning that each feature is conditionally dependent on all other features.
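To make the compositional constraint concrete, here is a minimal sketch. It is not part of the repository: the counts and the helper name `closure` are made up for illustration. It shows that once counts are normalized to proportions, a change in one feature shifts the proportion of every other feature, while the ratio between the untouched features is preserved.

```python
def closure(counts):
    """Rescale non-negative counts so that they sum to 1 (proportions)."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical read counts for three features in one sample.
before = closure([10, 20, 70])
# Only the third feature changes in absolute terms...
after = closure([10, 20, 170])

# ...yet the proportions of the first two features shift as well.
print(before[:2], after[:2])

# The ratio between the two unchanged features is the same in both cases.
ratio_before = before[0] / before[1]
ratio_after = after[0] / after[1]
print(ratio_before, ratio_after)
```

This is why analyses of compositional data are usually phrased in terms of ratios (or log-ratios) between features rather than the individual proportions.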

We propose the DeepCoDA framework to extend precision health modelling to high-dimensional compositional data, and to provide personalized interpretability through patient-specific weights. Our architecture maintains state-of-the-art performance across 25 real-world data sets, all while producing interpretations that are both personalized and fully coherent for compositional data.

DeepCoDA network architecture

[Figure: DeepCoDA network architecture diagram]
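As a rough illustration of one building block, the sketch below assumes that each log-bottleneck computes a log-contrast, i.e. a weighted sum of log-transformed features whose weights sum to zero, as we understand it from the paper. The sample, the weights, and the function name `log_contrast` are hypothetical, and the zero-sum constraint is imposed here by simple centering; this is not the Keras code in this repository.

```python
import math

def log_contrast(sample, weights):
    """Weighted sum of log-features, with weights centered to sum to zero."""
    mean_w = sum(weights) / len(weights)
    centered = [w - mean_w for w in weights]  # enforce sum(weights) == 0
    return sum(w * math.log(x) for w, x in zip(centered, sample))

# Hypothetical sample (strictly positive parts) and hypothetical weights.
sample = [10.0, 20.0, 70.0]
weights = [0.8, -0.3, 0.1]

z_raw = log_contrast(sample, weights)
# Rescaling the whole sample (e.g. a different sequencing depth)
# leaves the log-contrast unchanged, because the weights sum to zero.
z_scaled = log_contrast([x * 1000.0 for x in sample], weights)

print(z_raw, z_scaled)
```

The two printed values agree up to floating-point error, which illustrates why a zero-sum weighting of log-features gives outputs that do not depend on the arbitrary total of each sample.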

Installation

  1. Python 3.6
  2. scikit-learn 0.23.1
  3. keras 2.2.4
  4. tensorflow 1.10.0
  5. seaborn 0.11

How to run

  • To run the model without attention: "python DeepCoDA_without_attention.py --dataset data_id --level B --l1 lambda"
  • To run the model with attention: "python DeepCoDA_with_attention.py --dataset data_id --level B --l1 lambda"
  • data_id is the dataset ID (default is "5a"). If data_id is "all", the model runs on all datasets
  • B is the number of log-bottlenecks (default is "5")
  • lambda is the L1 penalty term (default is "0.01")
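
The options above can be summarized with a small argparse sketch. This is a hypothetical parser written only to mirror the documented flags and defaults; the actual scripts may define their arguments differently, and the `int` and `float` types are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the flags documented above."""
    parser = argparse.ArgumentParser(description="DeepCoDA options (illustrative)")
    parser.add_argument("--dataset", type=str, default="5a",
                        help='dataset ID, or "all" to run every dataset')
    parser.add_argument("--level", type=int, default=5,
                        help="number of log-bottlenecks (B)")
    parser.add_argument("--l1", type=float, default=0.01,
                        help="L1 penalty term (lambda)")
    return parser

# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--dataset", "all", "--level", "3"])
print(args.dataset, args.level, args.l1)
```

Any flag that is omitted falls back to its default, so running either script with no arguments would correspond to dataset "5a", 5 log-bottlenecks, and an L1 penalty of 0.01.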

Reference

Thomas P. Quinn, Dang Nguyen, Santu Rana, Sunil Gupta, Svetha Venkatesh (2020). DeepCoDA: personalized interpretability for compositional health data. ICML 2020