EmTech V-Team - Explainable AI
Nema Sobhani
IT Analytics, Avanade
Explainability/interpretability overview and demonstration of ML tools in the Azure stack used at Avanade.
I. Background
II. In Action
III. Other Approaches
IV. Wrap-up
Due to our unique relationship with Microsoft, we have direct access to the product owners of Microsoft's cutting-edge machine learning, interpretability, and explainability tools, including interpret-ml, interpret-community, and the azureml SDK (May Hu, Mehrnoosh Sameki, Ilya Matiach).
"The goal of science is to gain knowledge, but many problems are solved with big datasets and black box machine learning models. The model itself becomes the source of knowledge instead of the data. Interpretability makes it possible to extract this additional knowledge captured by the model."
- Christoph Molnar, ‘Interpretable Machine Learning’
Applications
- Financial Services/Banking (fraud detection)
- Marketing (user engagement)
- Healthcare (individualized medicine and tracking)
- Epidemiology (disease outbreak modeling)
Benefits
- Validation of domain knowledge
- Provides actionable evidence
- Guides data practices and feedback
SHapley Additive exPlanations
Gives feature importance values that are accurate and consistent both globally and locally, derived from each feature's individual contribution to a prediction (drawn from Lloyd Shapley's work in cooperative game theory).
Ideal for use with opaque models (boosted tree, kernel-based, NN, etc.).
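As a rough illustration of the game-theoretic idea behind SHAP, the sketch below computes exact Shapley values for a tiny hypothetical model by enumerating every feature coalition. The model, the instance, and the all-zero baseline are assumptions made up for this example. The shap package itself uses much faster model-specific approximations rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function over n_features players."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i when it joins the coalition.
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi

# Hypothetical toy model with an interaction term (not from the talk).
def model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]

instance = [1.0, 3.0, 4.0]   # the row being explained
baseline = [0.0, 0.0, 0.0]   # reference values standing in for "feature absent"

def coalition_value(coalition):
    # Features in the coalition take the instance's value; the rest stay at baseline.
    mixed = [instance[j] if j in coalition else baseline[j]
             for j in range(len(instance))]
    return model(mixed)

phi = shapley_values(coalition_value, n_features=3)
print("contributions:", phi)
print("sum of contributions:", sum(phi))
print("f(instance) - f(baseline):", model(instance) - model(baseline))
```

The contributions add up to the difference between the prediction for the instance and the prediction for the baseline, which is the local accuracy property mentioned above. In this toy case the interaction term is split equally between features 0 and 2.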
Microsoft addressed the need for a unified API, built into its machine learning platform, that makes it easy to obtain model explanations and feature importances across various model types.
- From the SDK
pip install --upgrade azureml-sdk[explain,interpret,notebooks]
- Only interpretability package
pip install interpret-community
The TabularExplainer object detects the model type and selects the appropriate SHAP explainer to generate feature importances.
Original Model | Invoked Explainer |
---|---|
Tree-based models | SHAP TreeExplainer |
Deep Neural Network models | SHAP DeepExplainer |
Linear models | SHAP LinearExplainer |
None of the above | SHAP KernelExplainer |
This package also supports a Mimic Explainer (Global Surrogate) and a Permutation Feature Importance Explainer (PFI), both of which are model-agnostic and will be covered later.
Scikit-Learn Breast Cancer Binary Classification
Pip installations:
pip install numpy pandas scikit-learn lightgbm interpret-community[visualization]
import numpy as np
import pandas as pd
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
import lightgbm as lgbm
# Load and partition data
data = datasets.load_breast_cancer()
X = data.data
y = data.target # 0 = malignant, 1 = benign
feature_names = data.feature_names.tolist()
classes = data.target_names.tolist()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.25, random_state=42, stratify=y_train)
# Model training
clf = lgbm.LGBMClassifier()
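# Note: verbose=25 logs every 25 rounds. LightGBM 4.0+ removed this fit() argument;
# there, use callbacks=[lgbm.log_evaluation(25)] instead.
clf.fit(
    X=X_train,
    y=y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric='auc',
    feature_name=feature_names,
    verbose=25
)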
y_pred = clf.predict(X_test)
print('\nConfusion Matrix: \n', confusion_matrix(y_test, y_pred))
print('\nClassification Report: \n', classification_report(y_test, y_pred))
[25] valid_0's auc: 0.989519 valid_0's binary_logloss: 0.145683
[50] valid_0's auc: 0.987226 valid_0's binary_logloss: 0.136426
[75] valid_0's auc: 0.986243 valid_0's binary_logloss: 0.172291
[100] valid_0's auc: 0.989846 valid_0's binary_logloss: 0.189166
Confusion Matrix:
[[40 2]
[ 3 69]]
Classification Report:
precision recall f1-score support
0 0.93 0.95 0.94 42
1 0.97 0.96 0.97 72
accuracy 0.96 114
macro avg 0.95 0.96 0.95 114
weighted avg 0.96 0.96 0.96 114
# Feature Importance (SHAP)
from interpret_community import TabularExplainer
explainer = TabularExplainer(clf, initialization_examples=X_train, features=feature_names, classes=classes)
# Global Feature Importances
global_explanation = explainer.explain_global(X_train)
display(pd.DataFrame.from_dict(global_explanation.get_feature_importance_dict(), orient='index', columns=['SHAP Value']).head(10))
Feature | SHAP Value |
---|---|
worst area | 2.341634 |
worst concave points | 1.823587 |
worst perimeter | 1.361179 |
worst texture | 1.167765 |
area error | 0.810253 |
mean concave points | 0.637934 |
worst smoothness | 0.334095 |
mean texture | 0.311042 |
worst radius | 0.200612 |
mean smoothness | 0.200187 |
# Local Feature Importances (for predicting benign class)
shap = global_explanation.local_importance_values[1]
df_shap = pd.DataFrame(shap, columns=feature_names)
display(df_shap.head())
Row | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -0.021539 | 0.321744 | -0.021727 | -0.001728 | -0.782009 | 0.013229 | -0.071580 | -0.124397 | -0.043490 | -0.034941 | ... | 0.188589 | 1.557899 | 1.157621 | 2.275592 | -0.408454 | -0.001883 | 0.029665 | 1.730362 | 0.047198 | -0.109017 |
1 | -0.126181 | -0.371998 | -0.073260 | -0.001627 | -0.214240 | -0.027172 | -0.291059 | -0.304990 | -0.199762 | -0.004075 | ... | -0.041869 | -2.656456 | -2.995181 | -0.203860 | -0.740862 | 0.005230 | -0.307423 | -0.285139 | -0.441779 | -0.004399 |
2 | 0.003145 | 0.339770 | -0.035467 | -0.001743 | -0.745712 | 0.039464 | 0.062067 | 0.488236 | -0.047262 | -0.037220 | ... | 0.035996 | 1.199373 | 1.354136 | 1.989123 | -0.379708 | 0.062644 | 0.059645 | 1.566148 | -0.050104 | 0.031548 |
3 | 0.067903 | 0.657175 | 0.012231 | 0.003307 | 0.123375 | -0.365281 | 0.033967 | 0.533232 | 0.012120 | 0.062579 | ... | 0.079415 | 2.114178 | 0.219407 | 1.221378 | 0.153687 | 0.045893 | 0.045342 | 1.441597 | 0.435639 | -0.054577 |
4 | -0.055729 | -0.170627 | -0.046742 | -0.002422 | -0.724170 | 0.009585 | -0.229870 | -0.462293 | -0.172071 | -0.042171 | ... | -0.077372 | -2.193885 | 1.002135 | 2.284974 | -0.459759 | -0.027261 | -0.353700 | -5.373368 | -0.290265 | -0.026400 |
5 rows × 30 columns
# Visualization
from interpret_community.widget import ExplanationDashboard
display(ExplanationDashboard(global_explanation, clf, datasetX=X_train, trueY=y_train))
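LIME (Local Interpretable Model-agnostic Explanations): an interpretable surrogate model is trained on the opaque model's predictions in the neighborhood of a single instance, yielding a local, interpretable explanation. The explanation is not guaranteed to be globally relevant.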
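The local-surrogate idea above (the approach LIME takes) can be sketched in a few lines: perturb one instance, query the opaque model, and fit a proximity-weighted linear model. This is a simplified sketch under stated assumptions, not the lime package's implementation; the black-box function, noise scale, and kernel width are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def opaque_model(X):
    # Hypothetical nonlinear black box standing in for the real model.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_surrogate(predict_fn, x, n_samples=2000, scale=0.1, kernel_width=0.25):
    """Fit a proximity-weighted linear model to predict_fn around the point x."""
    # 1. Perturb the instance with Gaussian noise.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Query the opaque model on the perturbed samples.
    y = predict_fn(Z)
    # 3. Weight samples by closeness to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Weighted least squares on centered features plus an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]   # local prediction at x, local feature slopes

x0 = np.array([0.0, 1.0])
intercept, slopes = local_surrogate(opaque_model, x0)
print("local prediction:", intercept)
print("local slopes:", slopes)
```

The slopes should land close to the true local gradient of the toy function at x0. The same surrogate fitted around a different point would give different slopes, which is why the explanation is only locally valid.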
Mimic Explainer (Global Surrogate): the same idea as LIME, but applied at global scale. The surrogate must be an interpretable model (tree or linear) trained on the original data, using the opaque model's predicted labels as the training target.
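A minimal sketch of the global surrogate recipe, assuming a hypothetical black-box function and synthetic data: label the data with the opaque model's predictions, fit an interpretable model to those predictions, then check how faithfully the surrogate reproduces them. Ordinary least squares stands in for the interpretable model here; it is not how interpret-community's Mimic Explainer is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 3))   # hypothetical dataset

def opaque_predict(X):
    # Hypothetical black box: mostly linear with a mild nonlinearity.
    return 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.3 * np.tanh(5.0 * X[:, 2])

# Step 1: label the original data with the opaque model's predictions.
y_hat = opaque_predict(X)

# Step 2: fit an interpretable model (ordinary least squares) to those predictions.
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, y_hat, rcond=None)

# Step 3: measure fidelity, i.e. how well the surrogate reproduces the opaque model.
surrogate_pred = A @ coef
ss_res = np.sum((y_hat - surrogate_pred) ** 2)
ss_tot = np.sum((y_hat - y_hat.mean()) ** 2)
fidelity_r2 = 1.0 - ss_res / ss_tot

print("surrogate coefficients:", coef[1:])
print("fidelity (R^2 against opaque predictions):", fidelity_r2)
```

The fidelity score is measured against the opaque model's output, not the true labels. A surrogate with low fidelity should not be trusted as an explanation, however readable its coefficients are.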
Permutation Feature Importance (PFI): shuffles the dataset one feature at a time and measures the effect on a performance metric. Features whose shuffling causes larger changes in the metric are considered more important.
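The shuffle-and-measure procedure can be sketched as follows. The data, the stand-in predict function, and the choice of mean squared error as the metric are assumptions for illustration only; this is not interpret-community's PFI explainer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: feature 0 matters most, feature 1 a little, feature 2 not at all.
X = rng.normal(size=(1000, 3))
y = 4.0 * X[:, 0] + 1.0 * X[:, 1]

def predict(X):
    # Stand-in for a fitted opaque model (here it matches the data-generating rule).
    return 4.0 * X[:, 0] + 1.0 * X[:, 1]

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def permutation_importance(predict_fn, X, y, metric, n_repeats=5):
    """Average increase in the error metric after shuffling each feature."""
    baseline = metric(y, predict_fn(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores.append(metric(y, predict_fn(X_perm)))
        importances[j] = np.mean(scores) - baseline
    return importances

importances = permutation_importance(predict, X, y, mse)
print("increase in MSE per shuffled feature:", importances)
```

Because the stand-in model ignores feature 2, shuffling it leaves the error unchanged and its importance is zero. Scikit-learn ships a comparable utility as sklearn.inspection.permutation_importance.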
Counterfactual explanations (e.g. DiCE): use feature perturbations to give actionable outcomes, showing what would need to change for an instance to shift between classes.
E.g., if credit score were > 700, user X would likely move into the "Loan Approved" classification.
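A toy version of the loan example, as a sketch only: the scoring rule, the thresholds, and the applicant record are all hypothetical, and the search simply raises one feature in fixed steps until the predicted class flips. Libraries such as DiCE search over several features at once and aim for diverse, plausible counterfactuals.

```python
def approve_loan(applicant):
    # Hypothetical integer scoring rule standing in for an opaque classifier.
    score = (applicant["credit_score"]
             + applicant["income"] // 1000
             - 100 * applicant["defaults"])
    return score > 750

def counterfactual(applicant, feature, step, max_steps=1000):
    """Increase one feature in fixed steps until the predicted class flips."""
    original = approve_loan(applicant)
    candidate = dict(applicant)
    for _ in range(max_steps):
        candidate[feature] += step
        if approve_loan(candidate) != original:
            return candidate
    return None   # no flip found within the search budget

user_x = {"credit_score": 640, "income": 50000, "defaults": 0}
cf = counterfactual(user_x, "credit_score", step=10)

print("approved today:", approve_loan(user_x))
print("counterfactual applicant:", cf)
```

With these made-up numbers the applicant is rejected at a credit score of 640 and the first approved candidate has a credit score of 710, that is, the first tested value above 700.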
AWS SageMaker Debugger only recently started using the shap package, which Microsoft has already integrated into Azure ML; the contrast highlights the maturity of Microsoft's early investment in explainable AI.
Oracle's Skater is a Python package that supports local interpretation using LIME and global interpretation using scalable Bayesian rule lists and tree surrogates. The documentation is rather sparse, and there does not seem to be any momentum to expand to other methods.
Scikit-learn's built-in feature importances provide some value for simple models, but lack the depth and versatility of Microsoft's interpret-community.
Explain Like I'm 5 (ELI5) uses LIME and PFI on opaque models and offers specialized support for text classifiers. It does not offer any visualization utilities or SHAP support.
The web app ml-interpret is an online-only platform where a dataset can be uploaded and a model selected to explain the outcomes of opaque models, but it offers practically no customization and restricts the size of uploads.
Molnar, Christoph. "Interpretable machine learning. A Guide for Making Black Box Models Explainable", 2019
Lundberg SM et al. "A Unified Approach to Interpreting Model Predictions", NIPS 2017
Shapley, LS. "Notes on the n-Person Game -- II: The Value of an n-Person Game", 1951
https://github.com/slundberg/shap
https://github.com/interpretml/interpret-community
https://github.com/interpretml/DiCE
https://github.com/TeamHG-Memex/eli5
https://github.com/oracle/Skater
http://ml-interpret.herokuapp.com/