Grouped Permutation Importance

Understanding the fundamentals of a decision-making process is, for most purposes, an essential step in machine learning. In this context, the analysis of predefined groups of features can provide important indications for understanding and improving a prediction. This repository extends the univariate permutation importance to a grouped version that evaluates the influence of whole feature subsets on a machine learning model. It is implemented as a slight modification of scikit-learn's permutation importance.
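
The underlying idea can be illustrated in a few lines (a minimal sketch of the principle, not the package's actual implementation): instead of permuting one column at a time, all columns of a group are shuffled with the same row permutation, and the resulting drop in score is attributed to the group as a whole.

import numpy as np

def group_importance_sketch(model, X, y, group_idx, scorer, n_repeats=10, seed=0):
    # Illustrative only: `model` is already fitted and `scorer` is a callable
    # such as sklearn.metrics.get_scorer("balanced_accuracy").
    rng = np.random.default_rng(seed)
    baseline = scorer(model, X, y)
    drops = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(X))
        X_perm = X.copy()
        # Shuffle all columns of the group jointly with one shared permutation.
        X_perm[:, group_idx] = X[perm][:, group_idx]
        drops.append(baseline - scorer(model, X_perm, y))
    return np.mean(drops)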

Install via pip

pip install git+https://github.com/lucasplagwitz/grouped_permutation_importance
Usage

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

from grouped_permutation_importance import grouped_permutation_importance

data = load_breast_cancer()
feature_names = data["feature_names"].tolist()
X, y = data["data"], data["target"]

# Group the features by name: the "mean", "error", and "worst" statistics.
idxs = []
columns = ["mean", "error", "worst"]
for key in columns:
    idxs.append([i for i, name in enumerate(feature_names) if key in name])

cv = RepeatedStratifiedKFold()
pipe = Pipeline([("MinMax", MinMaxScaler()), ("SVC", SVC())])

r = grouped_permutation_importance(pipe, X, y, idxs=idxs, n_repeats=50,
                                   random_state=0, scoring="balanced_accuracy",
                                   n_jobs=5, cv=cv, perm_set="test")
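
If the returned object mirrors scikit-learn's permutation_importance result (a plausible assumption given the stated "slight modification", but not confirmed here), the per-group results can be inspected along these lines:

# Assumption: `r` behaves like sklearn's permutation_importance Bunch,
# with one entry per group in `idxs`.
for name, mean, std in zip(columns, r["importances_mean"], r["importances_std"]):
    print(f"{name}: {mean:.3f} +/- {std:.3f}")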

Simulation

The file "examples/make_class.py" contains a small simulation that verifies correctness. Based on scikit-learn's make_classification method, feature subsets with differing informativeness are analyzed.
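
A minimal sketch of such a check (the group sizes, estimator, and parameters below are illustrative assumptions, not the exact contents of "examples/make_class.py"): an informative group should receive a clearly higher importance than a pure noise group.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold

from grouped_permutation_importance import grouped_permutation_importance

# With shuffle=False, make_classification places the informative columns
# first, so columns 0-9 are informative and columns 10-19 are noise.
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_redundant=0, shuffle=False, random_state=0)
idxs = [list(range(10)), list(range(10, 20))]  # informative vs. noise group

r = grouped_permutation_importance(LogisticRegression(max_iter=1000), X, y,
                                   idxs=idxs, n_repeats=20, random_state=0,
                                   scoring="accuracy", n_jobs=1,
                                   cv=RepeatedStratifiedKFold(n_repeats=2),
                                   perm_set="test")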

Model interpretation

The file "examples/brain_atlas.py" demonstrates a neuroimaging use case in which brain regions are rated with respect to a target variable (age, CDR, or biological sex).

Citing

If you use the Grouped Permutation Importance in a scientific publication, we would appreciate citations to the following paper:

Lucas Plagwitz, Alexander Brenner, Michael Fujarski, and Julian Varghese. Supporting AI-Explainability by Analyzing Feature Subsets in a Machine Learning Model.
Studies in Health Technology and Informatics, Volume 294: Challenges of Trustable AI and Added-Value on Health. doi:10.3233/SHTI220406