
Interpretable ML @ Avanade ITS

EmTech V-Team - Explainable AI

Nema Sobhani
IT Analytics, Avanade


Explainability/interpretability overview and demonstration of ML tools in the Azure stack used at Avanade.


I. Background

II. In Action

III. Other Approaches

IV. Wrap-up

I. Background

Due to our unique relationship with Microsoft, we have been given direct access to the product owners of Microsoft's cutting-edge machine learning, interpretability, and explainability tools, including interpret-ml, interpret-community, and the Azure ML SDK (May Hu, Mehrnoosh Sameki, Ilya Matiach).

Why do we need Interpretable ML?

"The goal of science is to gain knowledge, but many problems are solved with big datasets and black box machine learning models. The model itself becomes the source of knowledge instead of the data. Interpretability makes it possible to extract this additional knowledge captured by the model."

- Christoph Molnar, ‘Interpretable Machine Learning’

Application areas:
  • Financial Services/Banking (fraud detection)
  • Marketing (user engagement)
  • Healthcare (individualized medicine and tracking)
  • Epidemiology (disease outbreak modeling)

Benefits:
  • Validates domain knowledge
  • Provides actionable evidence
  • Guides data practices and feedback


SHapley Additive exPlanations (SHAP)

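SHAP assigns each feature a contribution to each individual prediction, based on Lloyd Shapley's work in cooperative game theory: a feature's value is its average marginal contribution to the model output across all possible subsets (coalitions) of the other features. These local values are additive (they sum to the difference between the prediction and a baseline) and consistent, and they can be aggregated into global feature importances.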
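To make the game-theoretic definition concrete, here is a minimal sketch that computes exact Shapley values for a hypothetical three-feature toy model by enumerating every coalition. The model `f`, the instance `x`, and the all-zeros `baseline` are illustrative assumptions, and treating "absent" features as baseline values is one common convention, not the only one. Exact enumeration needs O(2^M) model calls, which is why SHAP's explainers use model-specific shortcuts or sampling instead.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values for f at x; features outside a coalition take baseline values."""
    n = len(x)

    def payoff(coalition):
        # Coalition members keep their value from x; everyone else is reset to the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n! for coalition S not containing i.
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (payoff(s | {i}) - payoff(s))
    return phi

# Hypothetical toy model: feature 0 acts linearly, features 1 and 2 interact.
def f(z):
    return 3 * z[0] + 2 * z[1] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(f, x, baseline)
print("Shapley values:", phi)
# Additivity: the contributions sum to f(x) - f(baseline).
print("sum:", sum(phi), "vs f(x) - f(baseline):", f(x) - f(baseline))
```

Feature 0 receives exactly its linear term (3), while the interaction 2·z1·z2 = 12 is split evenly between features 1 and 2 (6 each), and the three values sum to f(x) - f(baseline) = 15.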

Ideal for use with opaque models (boosted trees, kernel-based models, neural networks, etc.).

MSFT's Interpretability Offerings

Microsoft addressed the need for a unified API, built into its machine learning platform, that makes it easy to obtain model explanations and feature importances across many model types.

  • From the SDK
    • pip install --upgrade azureml-sdk[explain,interpret,notebooks]
  • Interpretability package only
    • pip install interpret-community

Given a model, the TabularExplainer object detects the model type and selects the appropriate SHAP explainer to generate feature importances:

Original Model               Invoked Explainer
Tree-based models            SHAP TreeExplainer
Deep Neural Network models   SHAP DeepExplainer
Linear models                SHAP LinearExplainer
None of the above            SHAP KernelExplainer

The package also supports a Mimic Explainer (global surrogate) and a Permutation Feature Importance (PFI) Explainer, both of which are model-agnostic and are covered below.

II. In Action

Dummy Example

Scikit-Learn Breast Cancer Binary Classification

Pip installations:

pip install numpy pandas scikit-learn lightgbm interpret-community[visualization]

import numpy as np
import pandas as pd
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
import lightgbm as lgbm
# Load and partition data
data = datasets.load_breast_cancer()

X = data.data
y = data.target  # 0 = malignant, 1 = benign
feature_names = data.feature_names.tolist()
classes = data.target_names.tolist()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.25, random_state=42, stratify=y_train) 
# Model training
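clf = lgbm.LGBMClassifier()
clf.fit(X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        eval_metric='auc',
        callbacks=[lgbm.log_evaluation(25)])  # log validation metrics every 25 rounds (LightGBM >= 3.3)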

y_pred = clf.predict(X_test)

print('\nConfusion Matrix: \n', confusion_matrix(y_test, y_pred))
print('\nClassification Report: \n', classification_report(y_test, y_pred))
[25]	valid_0's auc: 0.989519	valid_0's binary_logloss: 0.145683
[50]	valid_0's auc: 0.987226	valid_0's binary_logloss: 0.136426
[75]	valid_0's auc: 0.986243	valid_0's binary_logloss: 0.172291
[100]	valid_0's auc: 0.989846	valid_0's binary_logloss: 0.189166

Confusion Matrix: 
 [[40  2]
 [ 3 69]]

Classification Report: 
               precision    recall  f1-score   support

           0       0.93      0.95      0.94        42
           1       0.97      0.96      0.97        72

    accuracy                           0.96       114
   macro avg       0.95      0.96      0.95       114
weighted avg       0.96      0.96      0.96       114
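# Feature Importance (SHAP)
from IPython.display import display  # built into Jupyter; import needed only when running as a script
from interpret_community import TabularExplainer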

explainer = TabularExplainer(clf, initialization_examples=X_train, features=feature_names, classes=classes)
# Global Feature Importances
global_explanation = explainer.explain_global(X_train)
display(pd.DataFrame.from_dict(global_explanation.get_feature_importance_dict(), orient='index', columns=['SHAP Value']).head(10))
                     | SHAP Value
worst area           | 2.341634
worst concave points | 1.823587
worst perimeter      | 1.361179
worst texture        | 1.167765
area error           | 0.810253
mean concave points  | 0.637934
worst smoothness     | 0.334095
mean texture         | 0.311042
worst radius         | 0.200612
mean smoothness      | 0.200187
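# Local Feature Importances (for predicting benign class)
# local_importance_values is indexed [class][sample][feature]; index 1 = 'benign'
shap_values = global_explanation.local_importance_values[1]

df_shap = pd.DataFrame(shap_values, columns=feature_names)
display(df_shap.head())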
  | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension
0 | -0.021539 | 0.321744 | -0.021727 | -0.001728 | -0.782009 | 0.013229 | -0.071580 | -0.124397 | -0.043490 | -0.034941 | ... | 0.188589 | 1.557899 | 1.157621 | 2.275592 | -0.408454 | -0.001883 | 0.029665 | 1.730362 | 0.047198 | -0.109017
1 | -0.126181 | -0.371998 | -0.073260 | -0.001627 | -0.214240 | -0.027172 | -0.291059 | -0.304990 | -0.199762 | -0.004075 | ... | -0.041869 | -2.656456 | -2.995181 | -0.203860 | -0.740862 | 0.005230 | -0.307423 | -0.285139 | -0.441779 | -0.004399
2 | 0.003145 | 0.339770 | -0.035467 | -0.001743 | -0.745712 | 0.039464 | 0.062067 | 0.488236 | -0.047262 | -0.037220 | ... | 0.035996 | 1.199373 | 1.354136 | 1.989123 | -0.379708 | 0.062644 | 0.059645 | 1.566148 | -0.050104 | 0.031548
3 | 0.067903 | 0.657175 | 0.012231 | 0.003307 | 0.123375 | -0.365281 | 0.033967 | 0.533232 | 0.012120 | 0.062579 | ... | 0.079415 | 2.114178 | 0.219407 | 1.221378 | 0.153687 | 0.045893 | 0.045342 | 1.441597 | 0.435639 | -0.054577
4 | -0.055729 | -0.170627 | -0.046742 | -0.002422 | -0.724170 | 0.009585 | -0.229870 | -0.462293 | -0.172071 | -0.042171 | ... | -0.077372 | -2.193885 | 1.002135 | 2.284974 | -0.459759 | -0.027261 | -0.353700 | -5.373368 | -0.290265 | -0.026400

5 rows × 30 columns

# Visualization
from interpret_community.widget import ExplanationDashboard

display(ExplanationDashboard(global_explanation, clf, datasetX=X_train, trueY=y_train))

Avanade Interpretability VM (Demo)

III. Other Approaches

Different Methods

Local Interpretable Model-agnostic Explanations (LIME)

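LIME perturbs the neighborhood of a single instance, queries the opaque model on those perturbed samples, and fits a simple interpretable surrogate (e.g., a sparse linear model) weighted by each sample's proximity to the instance. The resulting explanation is locally faithful, but there is no guarantee that it is globally relevant.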
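A minimal LIME-style sketch in NumPy follows, assuming a hypothetical two-feature black box `black_box` and a hypothetical helper `lime_explain`: perturb one instance with Gaussian noise, weight the samples with an exponential proximity kernel, and fit a weighted linear surrogate whose coefficients act as the local explanation. The noise `scale` and `kernel_width` values are illustrative choices; the real `lime` package adds feature selection, discretization, and its own defaults.

```python
import numpy as np

def black_box(X):
    # Hypothetical opaque model: nonlinear in both features.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def lime_explain(predict, x0, n_samples=5000, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around x0; returns (intercept, weights)."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise and query the black box.
    Z = x0 + rng.normal(0.0, scale, size=(n_samples, x0.size))
    y = predict(Z)
    # 2. Weight each sample by its proximity to x0 (exponential kernel on distance).
    d = np.linalg.norm(Z - x0, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # 3. Weighted least squares: y ~ b0 + (Z - x0) @ weights.
    A = np.column_stack([np.ones(n_samples), Z - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]

x0 = np.array([0.0, 1.0])
intercept, local_weights = lime_explain(black_box, x0)
print("intercept:", intercept)
print("local weights:", local_weights)
```

Around x0 = (0, 1) the surrogate weights land close to the true local gradient (cos 0, 2·1) = (1, 2). Choosing a different x0 yields different weights, which is exactly why a LIME explanation should not be read as a global statement about the model.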

Global Surrogate Models (Mimic Explainers)

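Similar in spirit to LIME, but at global scale: an interpretable model (e.g., a tree or linear model) is trained on the original features, using the opaque model's predictions as the target in place of the true labels. How faithfully the surrogate reproduces those predictions determines how far its explanations can be trusted.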
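The sketch below illustrates the global-surrogate idea with a hypothetical NumPy black box: fit a linear model to the black box's predictions (not the ground-truth labels) over the whole dataset, then measure fidelity as the R² between surrogate and black-box outputs. It shows the general technique only; it is not interpret-community's MimicExplainer implementation, which offers several surrogate model types.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))

def black_box(X):
    # Hypothetical opaque model: mostly linear, plus a small interaction term.
    return 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.3 * X[:, 0] * X[:, 2]

# Train the surrogate on the black box's predictions, not on true labels.
y_hat = black_box(X)
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y_hat, rcond=None)

# Fidelity: share of the black box's output variance the surrogate reproduces.
y_sur = A @ coef
r2 = 1.0 - np.sum((y_hat - y_sur) ** 2) / np.sum((y_hat - y_hat.mean()) ** 2)

print("surrogate weights:", np.round(coef[1:], 2))
print("fidelity R^2:", round(float(r2), 3))
```

A high R² means the surrogate's weights are a reasonable global summary of the black box; a low R² means the surrogate is missing behavior (here, the interaction term it cannot represent) and its weights should be read with caution.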

Permutation Feature Importance (PFI)

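Shuffles the values of one feature at a time (breaking that feature's relationship with the target) and measures the resulting change in a performance metric. Features whose shuffling degrades performance the most are deemed the most important.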
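Here is permutation feature importance in a few lines, assuming a hypothetical already-trained `predict` function and mean squared error as the metric; the helper name `permutation_feature_importance` is likewise illustrative. Because the procedure only ever calls `predict`, it is model-agnostic. For production use, scikit-learn provides `sklearn.inspection.permutation_importance`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]  # feature 2 has no effect on the target

def predict(X):
    # Hypothetical trained model (here it matches the true relationship exactly).
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def permutation_feature_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean increase in `metric` when each column is shuffled independently."""
    rng = np.random.default_rng(seed)
    base = metric(y, predict(X))
    importances = []
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle column j only, breaking its link to the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(metric(y, predict(X_perm)) - base)
        importances.append(float(np.mean(increases)))
    return importances

print(permutation_feature_importance(predict, X, y, mse))
```

Feature 0 dominates, feature 1 is small but positive, and feature 2 scores exactly zero because the model never reads it.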

Diverse Counterfactual Explanations (DiCE)

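Searches for a diverse set of counterfactual examples: small feature perturbations that would change the model's prediction, giving actionable statements about what would need to change for an instance to move into another class.

For example: if user X's credit score were above 700, they would likely move into the "Loan Approved" class.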
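DiCE itself optimizes for several diverse counterfactuals under proximity and feasibility constraints. The much simpler sketch below captures only the core question, "what is the smallest single-feature change that flips the prediction?", via a grid search. The loan model `approve_loan`, its scoring rule, the feature names, and the search grids are all hypothetical, chosen to mirror the credit-score example.

```python
def approve_loan(applicant):
    # Hypothetical scoring rule standing in for an opaque classifier.
    score = (applicant["credit_score"] - 600) + 2 * (applicant["income_k"] - 50)
    return score >= 100

def single_feature_counterfactuals(model, applicant, grids):
    """For each feature, find the closest grid value that flips the model's decision."""
    original = model(applicant)
    results = {}
    for feature, grid in grids.items():
        # Try candidate values in order of distance from the applicant's current value.
        for value in sorted(grid, key=lambda v: abs(v - applicant[feature])):
            candidate = dict(applicant, **{feature: value})
            if model(candidate) != original:
                results[feature] = value
                break
    return results

applicant = {"credit_score": 650, "income_k": 60}
grids = {
    "credit_score": range(300, 851, 10),
    "income_k": range(0, 201, 5),
}
print("approved:", approve_loan(applicant))
print("counterfactuals:", single_feature_counterfactuals(approve_loan, applicant, grids))
```

The sample applicant is rejected, and the search reports that raising `credit_score` to 680, or `income_k` to 75, would each be enough on its own to flip the decision to approved. Unlike DiCE, this sketch ignores feature ranges that are impossible to act on and never combines changes across features.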

Competing offerings

AWS SageMaker Debugger only recently started using the shap package, which Microsoft had already integrated into Azure ML; this contrasts with the maturity of Microsoft's early investment in explainable AI.

Oracle's "Skater" is a Python package that supports local interpretation using LIME and global interpretation using scalable Bayesian rule lists and tree surrogates. Its documentation is rather sparse, and there seems to be little momentum to expand it to other methods.

Scikit-learn's built-in feature importances (e.g., the impurity-based importances of tree ensembles) provide some value for simple models, but they lack the depth and versatility of Microsoft's interpret-community.

Explain Like I’m 5 (ELI5) applies LIME and PFI to opaque models and offers specialized support for text classifiers. It offers little in the way of visual utilities and does not support SHAP.

The web app ml-interpret is an online-only platform where a user can upload a dataset and select a model to explain the outcomes of opaque models, but it offers practically no customizability, and uploads are size-restricted.

IV. Wrap-up


Sources/Suggested Reading

Molnar, C. "Interpretable Machine Learning: A Guide for Making Black Box Models Explainable", 2019.
Lundberg, S. M. and Lee, S.-I. "A Unified Approach to Interpreting Model Predictions", NIPS 2017.
Shapley, L. S. "Notes on the n-Person Game -- II: The Value of an n-Person Game", 1951.
