Prediction explanations: display Index ID for best/worst prediction explanations #1119

Closed
gsheni opened this issue Aug 27, 2020 · 8 comments · Fixed by #1365
Labels
enhancement An improvement to an existing feature.

Comments

@gsheni
Contributor

gsheni commented Aug 27, 2020

Goal
If users intend to understand the best and worst predictions, the API should allow them to see the Index of the explanations.

If the user can provide the index column (perhaps through DataTables), that index ID could be displayed in the report output by explain_predictions_best_worst.

If the user doesn't provide an index column, no index ID should be displayed in the report.

Proposal
Add index ID to the prediction explanation, if the user provides an index column in X.

Note the Index ID:

        Best 1 of 2

                Predicted Probabilities: [benign: 0.0, malignant: 1.0]
                Predicted Value: malignant
                Target Value: malignant
                Cross Entropy: 0.0
                Index ID: 45


                     Feature Name         Feature Value        Contribution to        SHAP Value
                                                                 Prediction
                ================================================================================
                    worst perimeter          155.30                   +                  0.10
                     worst radius             23.14                   +                  0.08
                 worst concave points         0.17                    +                  0.08
                worst fractal dimension       0.09                    -                 -0.00
                   compactness error          0.04                    -                 -0.00
                    worst symmetry            0.22                    -                 -0.00


        Best 2 of 2

                Predicted Probabilities: [benign: 0.0, malignant: 1.0]
                Predicted Value: malignant
                Target Value: malignant
                Cross Entropy: 0.0
                Index ID: 2

                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                  worst perimeter         166.10                   +                   0.10
                    worst radius           25.45                   +                   0.08
                worst concave points       0.22                    +                   0.08
                 compactness error         0.03                    -                  -0.00
                 worst compactness         0.21                    -                  -0.00
                   worst symmetry          0.21                    -                  -0.00


        Worst 1 of 2

                Predicted Probabilities: [benign: 0.552, malignant: 0.448]
                Predicted Value: benign
                Target Value: malignant
                Cross Entropy: 0.802
                Index ID: 7

                    Feature Name       Feature Value   Contribution to Prediction   SHAP Value
                ==============================================================================
                  smoothness error         0.00                    +                   0.04
                    mean texture           21.58                   +                   0.03
                   worst texture           30.25                   +                   0.02
                worst concave points       0.11                    -                  -0.02
                    worst radius           15.93                   -                  -0.03
                mean concave points        0.02                    -                  -0.03
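
A minimal sketch of the conditional display proposed above, assuming a hypothetical helper named format_explanation_header (not part of evalml): the "Index ID" line is emitted only when the caller supplies an index value.

```python
def format_explanation_header(rank_label, predicted_value, target_value, index_id=None):
    """Hypothetical helper: build the header lines for one explanation.

    The "Index ID" line is appended only if index_id is provided.
    """
    lines = [
        rank_label,
        "",
        f"Predicted Value: {predicted_value}",
        f"Target Value: {target_value}",
    ]
    # Compare against None so that a legitimate index of 0 is still shown.
    if index_id is not None:
        lines.append(f"Index ID: {index_id}")
    return "\n".join(lines)


with_index = format_explanation_header("Best 1 of 2", "malignant", "malignant", index_id=45)
without_index = format_explanation_header("Best 1 of 2", "malignant", "malignant")
print(with_index)
```

Checking `index_id is not None` rather than truthiness matters here, since 0 is a valid row index.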

gsheni added the enhancement label on Aug 27, 2020
@dsherry
Contributor

dsherry commented Aug 27, 2020

@gsheni what do you mean by "the Index of the explanations"? What is the "index" in this context?

Are you asking for this information to a) appear visually (which I believe it already does), or b) appear in the new JSON response @freddyaboulton is adding, in which case why not just use the position in the list as the index?

@gsheni
Contributor Author

gsheni commented Aug 27, 2020

Each explanation is a reference to an element (instance) in X. That instance should have an index. I think that index should be in the report.

I don't believe this information appears visually. I did not see it in the docs:
https://evalml.featurelabs.com/en/latest/user_guide/model_understanding.html#Explaining-Multiple-Predictions
I also spoke with @freddyaboulton and confirmed this.

@kmax12
Contributor

kmax12 commented Aug 27, 2020

@gsheni is your example right? for a single explanation, there should only be 1 index value, why do you have it different for each feature?

@dsherry
Contributor

dsherry commented Aug 27, 2020

@gsheni ah, so you want to know, for each prediction explanation which was generated, what was the index in the features dataframe?

If so, doesn't the caller always know that index? Because in order to call prediction explanations, you need to pass in some rows. And in order to pass in rows, you need to select which rows to pass in :)

@dsherry
Contributor

dsherry commented Aug 27, 2020

You are right that we don't currently show the feature DF index value in the prediction explanations returned by evalml. I guess I hadn't considered that as adding value since the caller has to know that info in order to call.

If I'm misunderstanding please let me know

@gsheni
Contributor Author

gsheni commented Aug 27, 2020

@kmax12 yes, you are right. Fixed the printout example.

@dsherry Yes, I suppose the caller could get that information if they wanted to. It would require the caller to re-run the following (outside of explain_predictions_best_worst):

(regression)

y_pred = pipeline.predict(input_features)
errors = metric(y_true, y_pred)
sorted_scores = errors.sort_values()
best = sorted_scores.index[:num_to_explain]
worst = sorted_scores.index[-num_to_explain:]
  • I would then have to find the index IDs of the best/worst scores.
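
The steps above can be sketched as a runnable example. This is only an illustration under stated assumptions: absolute error stands in for `metric`, the helper name best_worst_index_ids is hypothetical, and the data is made up. It is not how evalml computes this internally.

```python
import pandas as pd


def best_worst_index_ids(y_true, y_pred, num_to_explain):
    """Return the index labels of the best and worst predictions.

    Assumption: per-row absolute error is the ranking metric.
    """
    errors = (y_true - y_pred).abs()
    sorted_errors = errors.sort_values()  # ascending: smallest error first
    best = list(sorted_errors.index[:num_to_explain])
    # Reverse the tail so the worst prediction comes first.
    worst = list(sorted_errors.index[-num_to_explain:][::-1])
    return best, worst


# Made-up data with a non-default index, so labels differ from positions.
y_true = pd.Series([10.0, 20.0, 30.0, 40.0], index=[45, 2, 7, 13])
y_pred = pd.Series([10.5, 20.0, 33.0, 38.0], index=[45, 2, 7, 13])

best, worst = best_worst_index_ids(y_true, y_pred, num_to_explain=2)
print("best:", best)
print("worst:", worst)
```

Because `sort_values` keeps the original index labels attached to each error, the labels can be read off directly without a separate lookup.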

@freddyaboulton
Contributor

@gsheni Yea you're right. I agree that adding the index value to the output of explain_predictions_best_worst can add value to the user.

@dsherry
Contributor

dsherry commented Aug 27, 2020

Thanks all for the clarification. Yep, agreed. I put this in the icebox because I don't think it's high priority. If anyone feels differently, let's talk.

dsherry changed the title from "Add Index ID to best and worst prediction explanations" to "Prediction explanations: display Index ID for best/worst prediction explanations" on Oct 8, 2020