
TreeSHAP is not exact #142

Closed

gorkemozkaya opened this issue Oct 25, 2019 · 15 comments

Comments

@gorkemozkaya commented Oct 25, 2019

In several places, the book suggests that the TreeSHAP algorithm provides exact calculation of the Shapley values:

"Lundberg et. al (2018)46 proposed TreeSHAP, a variant of SHAP for tree-based machine learning models such as decision trees, random forests and gradient boosted trees. TreeSHAP is fast, computes exact Shapley values, and correctly estimates the Shapley values when features are dependent. In comparison, KernelSHAP is expensive to compute and only approximates the actual Shapley values."

"Sampling from the marginal distribution means ignoring the dependence structure between present and absent features. KernelSHAP therefore suffers from the same problem as all permutation-based interpretation methods. The estimation puts too much weight on unlikely instances. Results can become unreliable. As we will see later, TreeSHAP for tree-ensembles is not affected by this issue."

I think this is not accurate: TreeSHAP suffers from the same issue when there are statistical dependencies between features. I tried to demonstrate this in the notebook below:

https://colab.research.google.com/drive/14wsr-dGOb_suPsj8mtxOLn5zcsi39LHk

In this notebook, I created a dataset and fitted a single-tree XGBoost model. Both the dataset and the model are completely symmetric with respect to the two features: if you simply swap the feature names, neither the joint distribution nor the model prediction changes. So the Shapley importance values for these two features must be equal. However, TreeSHAP fails to give equal values for the two features. I explain the reason at the end of the notebook.
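For anyone who wants to poke at this without the Colab, here is a rough sketch of the shape of such an experiment. The data-generating process, hyperparameters, and model below are my own assumptions and are not symmetric in the precise way the notebook's construction is, so treat it only as a starting point:

```python
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
n = 10_000

# Two strongly dependent features built from a shared latent variable z;
# swapping x1 and x2 leaves the joint data distribution unchanged.
z = rng.binomial(1, 0.5, size=n)
x1 = z + rng.normal(0, 0.1, size=n)
x2 = z + rng.normal(0, 0.1, size=n)
X = np.column_stack([x1, x2])
y = x1 + x2  # target symmetric in the two features

# A single shallow tree, as in the notebook.
model = xgb.XGBRegressor(n_estimators=1, max_depth=2).fit(X, y)

# Path-dependent TreeSHAP (the variant discussed in this issue).
explainer = shap.TreeExplainer(model, feature_perturbation="tree_path_dependent")
shap_values = explainer.shap_values(X)

# If the explanation respected the symmetry of the problem, the mean |SHAP|
# of the two features would be (nearly) equal; here they can come out unequal,
# with most of the credit going to whichever feature the root split uses.
print(np.abs(shap_values).mean(axis=0))
```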

@NicolasHug

@gorkemozkaya, I think you are quite correct.

As far as I can tell, the mistake originally comes from the paper Lundberg et al. (2018). Indeed, Algorithm 1 from that paper is exactly the algorithm to compute partial dependence, as originally described in [1].

By definition, the partial dependence of f on X_S is E_{X_C}[f(x_S, X_C)], where X_C is the complement of X_S. It is only equivalent to the conditional expectation E[f(X) | X_S = x_S] if the features are independent.

The TreeSHAP algorithm uses E_{X_C}[f(x_S, X_C)], i.e. the partial dependence, so it suffers from exactly the same kinds of issues that partial dependence interpretation has.
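To make the two quantities explicit (my notation, expanding on the comment above):

```latex
% Partial dependence (marginal / interventional expectation):
f_S(x_S) \;=\; \mathbb{E}_{X_C}\bigl[f(x_S, X_C)\bigr]
        \;=\; \int f(x_S, x_C)\, p(x_C)\, \mathrm{d}x_C

% Conditional expectation, as in the original SHAP formulation:
\mathbb{E}\bigl[f(X) \mid X_S = x_S\bigr]
        \;=\; \int f(x_S, x_C)\, p(x_C \mid x_S)\, \mathrm{d}x_C
```

The two coincide exactly when p(x_C | x_S) = p(x_C), i.e. when X_S and X_C are independent.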

CC @slundberg, if I may.

[1] Jerome H. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine." https://statweb.stanford.edu/~jhf/ftp/trebst.pdf

@slundberg

@NicolasHug @gorkemozkaya great points. There are some subtle issues here that I am working on documenting better. The original SHAP paper proposed pure conditional expectations for measuring the value of a set of input features, and then proposed using the Shapley values to reduce this exponential number of values down to a single number for each feature.

To make things more tractable we can assume feature independence. This is of course never true in practice, and so may seem like a terrible approximation. But it turns out that you can look at this assumption from a very different perspective, where you break feature dependence not because of an independence assumption, but because of arguments based on causal inference. I unfortunately did not include this discussion in the final NIPS paper; a recent (more in-depth) summary is at: https://arxiv.org/abs/1910.13413

SHAP as it stands today uses the interventional (causal) method of feature perturbation by default. I'll try and get the docs for this wrapped up sooner rather than later.

@amueller

Thanks! Just talked to Harsha about this, will definitely look at the paper!

@slundberg

@amueller I also have a longer post on this at shap/shap#882

@christophM
Owner

Very interesting discussions! Thanks a lot to everyone contributing.

The Janzing et al. paper is on my list. I will read it and update the SHAP chapter accordingly.

This issue goes very deep and concerns all permutation-based interpretation methods. When features are dependent, it seems like both options have issues: the interventional approach creates new data points that might lie far outside the data distribution (where the machine learning model might not behave nicely), while the conditional approach entangles the effects of the dependent features.

@slundberg

That's right. I view it as a fundamental trade-off between being "true to the data" and never providing inputs that are off-manifold, and being "true to the model" and never letting credit bleed between correlated features. I am working on docs/discussion about this (perhaps it should be a short arXiv note), but it is impossible to be both true to the model and true to the data when features are arbitrarily correlated and you want to allocate credit among each input feature. To see this, imagine two perfectly correlated input features where the model depends on only one of them. There is no way to know which feature the model uses (when perturbing inputs and observing outputs) without asking what the model would do "off the data manifold".
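A toy version of that thought experiment (the data and model function below are my own illustration, not code from the shap repository):

```python
import numpy as np
import shap

rng = np.random.default_rng(0)
n = 1_000

# Two perfectly correlated features; the model only ever reads the first one.
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1.copy()])

def f(X):
    return X[:, 0]  # the second feature is ignored entirely

# Interventional / marginal perturbation ("true to the model"): replacing
# absent features with background values breaks the correlation, so every
# hybrid point with x1 != x2 lies off the data manifold.
explainer = shap.KernelExplainer(f, shap.sample(X, 100))
phi = explainer.shap_values(X[:5], nsamples=200)

# Credit concentrates entirely on feature 0, because toggling feature 1
# never changes the model output.
print(np.abs(phi).mean(axis=0))
```

A "true to the data" (conditional) approach would instead split the credit between the two features, since on the data manifold they are indistinguishable.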

@amueller

Very interesting indeed. I'm not sure if this is addressed in your reference (skimming, I didn't see it), but one other point we were concerned about was the mismatch between the leaf-based and the brute-force methods. The mismatch is particularly bad when correlated features are present, but my understanding is that even for perfectly independent features there is a discrepancy.

@NicolasHug

I built an example of the discrepancy that @amueller is talking about here (at the end).

While that example deals with the computation of partial dependence functions, I think it is still relevant for SHAP vs TreeSHAP (since the same methods are used).
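For anyone who wants to reproduce that kind of gap directly, a minimal sketch along similar lines using scikit-learn's two partial dependence methods (toy data of my own, not the example from the linked comment):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(0)
n = 2_000

# Correlated features make the leaf-based vs. brute-force gap easier to see.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.3, size=n)
X = np.column_stack([x1, x2])
y = x1 * x2

model = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X, y)

# 'recursion' walks each tree using node weights (Friedman's algorithm);
# 'brute' averages model predictions over the data with feature 0 fixed.
pd_rec = partial_dependence(model, X, [0], method="recursion")
pd_brute = partial_dependence(model, X, [0], method="brute")

rec = pd_rec["average"].ravel()
brute = pd_brute["average"].ravel()

# Center both curves before comparing: the recursion method ignores the
# model's constant init term, which only shifts the curve vertically.
print(np.abs((rec - rec.mean()) - (brute - brute.mean())).max())
```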

@slundberg

@amueller @NicolasHug There is indeed a difference between the output of

`TreeExplainer(model, prior, feature_perturbation="tree_path_dependent").shap_values(X)`

vs.

`KernelExplainer(model.predict, prior).shap_values(X)`

because of reasons discussed in the link @NicolasHug shared.

However there is no disagreement between

`TreeExplainer(model, prior, feature_perturbation="interventional").shap_values(X)`

vs.

`KernelExplainer(model.predict, prior).shap_values(X, nsamples=Inf)`
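A rough way to check that agreement numerically (a sketch on toy data of my own; `nsamples=Inf` is approximated by letting KernelExplainer enumerate all coalitions of a small feature set, so the match is only up to small numerical differences):

```python
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + 2 * X[:, 1] * X[:, 2]
model = xgb.XGBRegressor(n_estimators=50, max_depth=3).fit(X, y)

background = X[:50]  # plays the role of `prior` above

tree_phi = shap.TreeExplainer(
    model, background, feature_perturbation="interventional"
).shap_values(X[:20])

# With only 3 features, 2000 samples is enough for KernelExplainer to
# enumerate every coalition exactly against the same background.
kernel_phi = shap.KernelExplainer(model.predict, background).shap_values(
    X[:20], nsamples=2000
)

# The two attributions should agree up to small numerical differences.
print(np.abs(tree_phi - kernel_phi).max())
```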

@NicolasHug

Do you confirm that tree_path_dependent will also disagree with `KernelExplainer(model.predict, prior).shap_values(X, nsamples=Inf)`?

Could you please describe what `TreeExplainer(interventional)` does, or provide a link?

@amueller

@NicolasHug that's described in the papers linked above, I think.

@NicolasHug

I don't think so?

arxiv.org/abs/1910.13413 is about saying that we want marginal expectations (i.e. partial dependence) instead of the conditional expectations claimed in the original SHAP papers. It also explains that the conditional expectations were in fact approximated with marginal expectations in the first place.

What I don't understand is how TreeExplainer(interventional) differs from tree_path_dependent, since tree_path_dependent uses partial dependence, hence marginal expectations, hence interventional probabilities.

@slundberg

@NicolasHug Yes, that is also different. TreeExplainer(interventional) is best described by our Nature MI paper, which is in final proof. If you email me, though, I can share an early PDF directly with you.

I would say tree_path_dependent does not really use true partial dependence, but it is the best you can do if you don't have access to a background dataset and only get to use the data recorded in the model via coverage. tree_path_dependent conditions its expectation on the values observed higher in the tree, while a true partial dependence plot has no such conditioning.
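A compact sketch of that distinction for a single tree (the node structure and helper functions below are hypothetical, written for illustration; they are not the shap or XGBoost internals):

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

@dataclass
class Node:
    value: Optional[float] = None       # set on leaves
    feature: Optional[int] = None       # set on internal nodes
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    cover: float = 0.0                  # training rows that reached this node

def path_dependent_expectation(node: Node, x: Sequence[float], active: set) -> float:
    """Coverage-based traversal: follow the split when the feature is in the
    coalition, otherwise average the children weighted by their covers.
    Deeper covers only count rows that satisfied the splits above, so the
    result is conditioned on the path rather than being a true marginal."""
    if node.value is not None:
        return node.value
    if node.feature in active:
        child = node.left if x[node.feature] <= node.threshold else node.right
        return path_dependent_expectation(child, x, active)
    w = node.left.cover / node.cover
    return (w * path_dependent_expectation(node.left, x, active)
            + (1.0 - w) * path_dependent_expectation(node.right, x, active))

def interventional_expectation(predict: Callable[[List[float]], float],
                               x: Sequence[float], active: set,
                               background: Sequence[Sequence[float]]) -> float:
    """True marginal / interventional expectation: average the model output
    over background rows with the coalition's features overwritten by x."""
    total = 0.0
    for b in background:
        hybrid = list(b)
        for j in active:
            hybrid[j] = x[j]
        total += predict(hybrid)
    return total / len(background)
```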

@pbarcelo commented Dec 9, 2020

I would like to point out that we have recently provided a polynomial-time algorithm that computes the SHAP scores over decision trees, and over some more general models, here: https://arxiv.org/pdf/2007.14045.pdf.

@guyvdbroeck

People interested in this issue may also want to read https://arxiv.org/abs/2009.08634, which proves that the problem TreeSHAP tries to solve is #P-hard.
