Code base for the study: "Interpretable machine learning with tree-based Shapley additive explanations: application to metabolomics datasets for binary classification."
Clone a local copy of the Git repository:
git clone https://github.com/obifarin/shap-iml-metabolomics
(or use a git GUI client of your choice)
Set up the Python environment from pls-da-shap.yml in the terminal, for example:
conda env create -f pls-da-shap.yml
(or use the Anaconda Navigator GUI)
Name | Description |
---|---|
01_Sex_MTBLS404.ipynb | Discriminating biological sex via urine metabolomics. |
02_HighFatDiet_MTBLS547.ipynb | The impact of a high-fat diet on bile acids in the cecum. |
03_Adenocarcinoma_ST000369.ipynb | Detecting Adenocarcinoma via serum metabolomics. |
04-pubmed-metabolomics.ipynb | Keyword occurrences by year for partial least squares regression, random forest, and gradient boosting in metabolomics publications on PubMed. |
- The Anaconda environment file for this work is pls-da-shap.yml.
- The important models are saved in the folder saved_models.
- PyChemometrics is the folder containing the library used for the PLS-DA computations in this work. Some PLS-DA code will not run unless this folder is in the same directory from which you run the code.
- The data folder contains the raw data used in this study, as prepared by Mendez et al. in their paper.
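
Because some of the PLS-DA code depends on where it is run from, it can help to confirm that the folders listed above are present in the working directory before opening a notebook. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and the exact spelling of the data folder (`data`) is an assumption that should be adjusted to match your checkout.

```python
from pathlib import Path

# Folder names taken from the notes above. The spelling and case of the data
# folder ("data") is an assumption; change it to match your local checkout.
REQUIRED_DIRS = ("PyChemometrics", "saved_models", "data")


def missing_dirs(root=".", required=REQUIRED_DIRS):
    """Return the names in `required` that are not directories under `root`."""
    root_path = Path(root)
    return [name for name in required if not (root_path / name).is_dir()]


if __name__ == "__main__":
    # Run from the directory in which you intend to launch the notebooks.
    missing = missing_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders found.")
```

If the check reports PyChemometrics as missing, start the notebook from the repository root (or copy the folder next to the notebook) so the PLS-DA imports can resolve.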