Explain Predictions: Small discrepancies between base value and average probability #32

ajdapretnar · 2021-06-18T14:58:52Z

Assuming I understand the widget and SHAP values correctly, the "Base value" should be the average probability for the given class value. In reality, this is not the case.

Example. I use heart-disease data and Logistic Regression. Then I predict the first instance in the data set. This is the result of Explain Predictions.

Base value is supposed to be 0.47. I now check this in the Box Plot. I use the same model with Predictions and pass the same "Background Data". Then I use Box Plot to observe the probabilities for a given class value, in this case, the Logistic Regression (1). This is the result.

The mean of Logistic Regression predictions in Box Plot is 0.458. Explain Predictions reports base value as 0.474. Why the difference?

Versions:
shap==0.37.0
Shapely==1.7.1

matejklemen · 2021-11-20T20:46:47Z

You are correct that the base value should be the average (or rather expected) probability. SHAP however does not compute this on all reference data (because it could be very slow) but instead computes it on centroids of clustered reference data. So TL;DR, the base value is an approximation.

In your concrete case, you provide 303 instances as original reference data, which get clustered into 10 clusters (k=10 set in _explain_other_models). The 10 centroids are passed through the model and the expected value is calculated based on these predictions, weighted by the proportion of original reference instances which fall into each cluster.
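The effect described above can be reproduced with a small sketch. Everything in it is an assumption made for illustration: `model` is a hypothetical one-feature logistic function rather than the fitted Logistic Regression from the issue, the data are random, and the clustering step is a simple sort-and-split stand-in for k-means. It only shows why a weighted average of predictions at the centroids generally differs slightly from the average prediction over all reference instances when the model is nonlinear.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x):
    # Hypothetical stand-in for a fitted one-feature logistic regression.
    return sigmoid(1.5 * x - 0.5)


def exact_base_value(data):
    # Average prediction over every reference instance.
    return sum(model(x) for x in data) / len(data)


def summarized_base_value(data, k):
    # Assumption: sorting 1-D data and splitting it into k contiguous
    # groups stands in for k-means clustering of the reference data.
    ordered = sorted(data)
    n = len(ordered)
    total = 0.0
    for i in range(k):
        group = ordered[i * n // k:(i + 1) * n // k]
        centroid = sum(group) / len(group)
        weight = len(group) / n  # share of instances in this group
        total += weight * model(centroid)
    return total


random.seed(0)
data = [random.gauss(0.0, 2.0) for _ in range(303)]  # 303 synthetic instances

exact = exact_base_value(data)
approx = summarized_base_value(data, k=10)

print(f"exact mean prediction:   {exact:.4f}")
print(f"centroid-based estimate: {approx:.4f}")
print(f"difference:              {abs(exact - approx):.4f}")
```

With one group per instance (`k=len(data)`) the two values coincide, which is the "make predictions and then average them" alternative.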

PrimozGodec · 2022-01-21T11:18:15Z

As @matejklemen said, it is an approximation that is very close to the actual base value in all cases. One solution would be to calculate our own base value (make predictions on the reference data and then average them), but I do not know if it is really necessary.

@ajdapretnar, do you have a case where it would be necessary to have an exact base value? Maybe we can just note in the documentation that it is an approximation.

ajdapretnar · 2022-01-21T12:01:42Z

@PrimozGodec My naive interpretation is the class distribution (e.g. 33% for iris setosa) and the mean (e.g. 22.533 for housing).

But I assume this is not what is meant by "base value".

Btibert3 · 2022-03-23T16:04:34Z

I was about to log this and saw that there is already an active discussion. I replicated the example code from the Python library shap and then exported the data to Orange. The base value reported in Orange differs subtly, as discussed above, but it is spot on when I use only the Python library.

PrimozGodec · 2023-08-21T10:49:49Z

Closing since it is expected behaviour.

PrimozGodec closed this as completed Aug 21, 2023
