Prediction explanations: display Index ID for best/worst prediction explanations #1119

gsheni · 2020-08-27T16:19:24Z

Goal
If users want to understand the best and worst predictions, the API should let them see the index of the row in X that each explanation refers to.

If the user provides an index column (perhaps through DataTables), that index ID could be displayed in the report output by explain_predictions_best_worst.

If the user doesn't provide an index column, no index ID should be displayed in the report.

Proposal
Add index ID to the prediction explanation, if the user provides an index column in X.

Note the Index ID:

        Best 1 of 2

                Predicted Probabilities: [benign: 0.0, malignant: 1.0]
                Predicted Value: malignant
                Target Value: malignant
                Cross Entropy: 0.0
                Index ID: 45


                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                  worst perimeter         155.30                   +                   0.10
                    worst radius           23.14                   +                   0.08
                worst concave points       0.17                    +                   0.08
               worst fractal dimension     0.09                    -                  -0.00
                 compactness error         0.04                    -                  -0.00
                   worst symmetry          0.22                    -                  -0.00


        Best 2 of 2

                Predicted Probabilities: [benign: 0.0, malignant: 1.0]
                Predicted Value: malignant
                Target Value: malignant
                Cross Entropy: 0.0
                Index ID: 2

                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                  worst perimeter         166.10                   +                   0.10
                    worst radius           25.45                   +                   0.08
                worst concave points       0.22                    +                   0.08
                 compactness error         0.03                    -                  -0.00
                 worst compactness         0.21                    -                  -0.00
                   worst symmetry          0.21                    -                  -0.00


        Worst 1 of 2

                Predicted Probabilities: [benign: 0.552, malignant: 0.448]
                Predicted Value: benign
                Target Value: malignant
                Cross Entropy: 0.802
                Index ID: 7

                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                  smoothness error         0.00                    +                   0.04
                    mean texture           21.58                   +                   0.03
                   worst texture           30.25                   +                   0.02
                worst concave points       0.11                    -                  -0.02
                    worst radius           15.93                   -                  -0.03
                mean concave points        0.02                    -                  -0.03
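To make the proposal concrete, here is a minimal sketch of how the header of one explanation could be rendered so that the Index ID line appears only when an index is available. `format_header` and its parameters are hypothetical and not evalml's API; the field names are copied from the example report above, and "no index provided" is modeled simply as `index_id=None`.

```python
from typing import Optional, Sequence


def format_header(rank_label: str,
                  class_names: Sequence[str],
                  probabilities: Sequence[float],
                  predicted: str,
                  target: str,
                  error_name: str,
                  error_value: float,
                  index_id: Optional[object] = None) -> str:
    """Build the text header for one best/worst explanation (hypothetical helper).

    The "Index ID" line is emitted only when index_id is not None, i.e. when
    the caller supplied an index for X.
    """
    probs = ", ".join(f"{name}: {round(p, 3)}" for name, p in zip(class_names, probabilities))
    lines = [
        rank_label,
        "",
        f"    Predicted Probabilities: [{probs}]",
        f"    Predicted Value: {predicted}",
        f"    Target Value: {target}",
        f"    {error_name}: {round(error_value, 3)}",
    ]
    if index_id is not None:
        lines.append(f"    Index ID: {index_id}")
    return "\n".join(lines)


# Reproduces the header of "Worst 1 of 2" from the example report
print(format_header("Worst 1 of 2", ["benign", "malignant"], [0.552, 0.448],
                    "benign", "malignant", "Cross Entropy", 0.802, index_id=7))
```

Keeping the index optional means reports for inputs without a meaningful index look exactly as they do today.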


dsherry · 2020-08-27T16:31:30Z

@gsheni what do you mean by "the Index of the explanations"? What is the "index" in this context?

Are you asking for this information to either a) appear visually (which I believe it already does), or b) appear in the new JSON response @freddyaboulton is adding? If b), why not just use the position in the list as the index?

gsheni · 2020-08-27T16:37:00Z

Each explanation refers to an element (instance) in X. That instance should have an index, and I think that index should be in the report.

I don't believe this information appears visually. I did not see it in the docs:
https://evalml.featurelabs.com/en/latest/user_guide/model_understanding.html#Explaining-Multiple-Predictions
I also spoke with @freddyaboulton and confirmed this.
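The difference between an explanation's position in the report and the index label of its row in X can be shown with a small pandas sketch. The data and the `index_ids_for_report` helper are made up for illustration; the point is only that `X.index[position]` differs from `position` whenever X carries its own index.

```python
import pandas as pd

# Made-up feature matrix whose index holds user-supplied IDs rather than 0..n-1
X = pd.DataFrame({"worst radius": [15.93, 23.14, 25.45]}, index=[7, 45, 2])

# Hypothetical order in which rows were reported, as positions into X
report_order = [1, 2, 0]


def index_ids_for_report(X: pd.DataFrame, positions: list) -> list:
    """Translate positional row numbers into the labels of X's index."""
    return X.index[positions].tolist()


ids = index_ids_for_report(X, report_order)
for rank, (position, index_id) in enumerate(zip(report_order, ids), start=1):
    print(f"Explanation {rank}: position in X = {position}, Index ID = {index_id}")
```

Here the first explanation is for position 1 in X, but its Index ID is 45, so neither the list position nor the row position recovers the user's identifier.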

kmax12 · 2020-08-27T16:39:37Z

@gsheni Is your example right? For a single explanation there should only be one index value; why do you have a different one for each feature?

dsherry · 2020-08-27T16:42:09Z

@gsheni ah, so you want to know, for each prediction explanation which was generated, what was the index in the features dataframe?

If so, doesn't the caller always know that index? Because in order to call prediction explanations, you need to pass in some rows. And in order to pass in rows, you need to select which rows to pass in :)

dsherry · 2020-08-27T16:43:45Z

You are right that we don't currently show the feature DF index value in the prediction explanations returned by evalml. I guess I hadn't considered that as adding value since the caller has to know that info in order to call.

If I'm misunderstanding, please let me know.

gsheni · 2020-08-27T16:52:17Z

@kmax12 yes, you are right. Fixed the printout example.

@dsherry Yes, I suppose the caller could get that information if they wanted to. It would require the caller to re-run the following (outside of explain_predictions_best_worst):

(regression)

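```python
# pipeline, input_features, y_true, metric and num_to_explain come from the caller;
# metric must return a per-row error Series indexed like input_features
y_pred = pipeline.predict(input_features)
errors = metric(y_true, y_pred)
sorted_scores = errors.sort_values()  # ascending: smallest error (best) first
best = sorted_scores.index[:num_to_explain]  # index labels of the best predictions
worst = sorted_scores.index[-num_to_explain:]  # index labels of the worst predictions
```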

I would then have to find the index IDs of the best/worst scores.
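For the regression case, a self-contained sketch with toy data shows that when the per-row errors keep X's index, the best and worst index IDs come straight out of the sorted error Series. Absolute error stands in for `metric`, and `best_and_worst_index_ids` is a hypothetical helper, not part of evalml.

```python
import pandas as pd


def best_and_worst_index_ids(y_true: pd.Series, y_pred: pd.Series, num_to_explain: int):
    """Return index IDs of the best and worst predictions (hypothetical helper).

    Absolute error is used as the per-row metric; y_true and y_pred are assumed
    to share X's index, so the labels survive into the error Series.
    """
    errors = (y_true - y_pred).abs()
    sorted_scores = errors.sort_values(kind="mergesort")  # stable sort, smallest error first
    best = sorted_scores.index[:num_to_explain].tolist()
    # Reverse the tail so the largest error ("Worst 1 of N") comes first
    worst = sorted_scores.index[-num_to_explain:][::-1].tolist()
    return best, worst


# Toy data with a user-supplied index (values are made up)
index = [45, 2, 7, 10]
y_true = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)
y_pred = pd.Series([1.0, 2.1, 5.0, 3.5], index=index)

best, worst = best_and_worst_index_ids(y_true, y_pred, num_to_explain=2)
print(best, worst)  # [45, 2] [7, 10]
```

A stable sort keeps ties in their original row order, which makes the reported ranking deterministic.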

freddyaboulton · 2020-08-27T16:53:51Z

@gsheni Yeah, you're right. I agree that adding the index value to the output of explain_predictions_best_worst can add value for the user.

dsherry · 2020-08-27T17:10:15Z

Thanks all for the clarification. Yep, agreed. I put this in the icebox because I don't think it's high priority. If anyone feels differently, let's talk.

gsheni added the enhancement label (An improvement to an existing feature.) Aug 27, 2020

dsherry changed the title from "Add Index ID to best and worst prediction explanations" to "Prediction explanations: display Index ID for best/worst prediction explanations" Oct 8, 2020

freddyaboulton mentioned this issue Oct 29, 2020 in merged PR #1365: Adding the Index ID to explain_prediction_best_worst_output

freddyaboulton closed this as completed in #1365 Nov 4, 2020
