Prediction explanations #646

Closed
dsherry opened this issue Apr 15, 2020 · 8 comments
Labels: epic, needs design, new feature

dsherry commented Apr 15, 2020

A new feature we could add for model understanding is prediction explanation. This would answer the question "why did my model predict x?", allowing users to see which input features were the most impactful to that prediction. This sort of feature can be useful for debugging models from a data setup perspective, because users could examine predictions they've categorized as "bad" and alter or eliminate the features which contributed to those predictions.

Some resources:

dsherry added the new feature label Apr 15, 2020

gsheni commented May 7, 2020

We should look into using SHAP (SHapley Additive exPlanations):
https://github.com/slundberg/shap
I discovered this library via this notebook:
https://github.com/d6t/d6t-python/blob/master/blogs/blog-20200426-shapley.ipynb
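
For readers new to the idea, here is a minimal sketch of the Shapley attribution that SHAP approximates. It is not the shap library's API: `shapley_values`, `toy_model`, and the feature names are hypothetical, and replacing absent features with a fixed baseline row is an assumption made to keep the example small. The exact computation is exponential in the number of features, which is why a library like SHAP is needed in practice.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) relative to predict(baseline).

    Features outside a coalition are replaced by their baseline value
    (an assumption of this sketch). Cost grows as 2^n in the feature count.
    """
    n = len(x)

    def value(coalition):
        # Build a row that uses x for coalition members and baseline elsewhere.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_model(row):
    # Hypothetical model: a linear part plus one interaction term.
    age, income, tenure = row
    return 2.0 * age + 0.5 * income + 3.0 * age * tenure


x = [1.0, 4.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(dict(zip(["age", "income", "tenure"], phi)))
```

The attributions sum to `toy_model(x) - toy_model(baseline)`, and the interaction term is split evenly between `age` and `tenure`.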

dsherry added the needs design label Jun 17, 2020
dsherry added this to the July 2020 milestone Jun 17, 2020
freddyaboulton added the epic label Jul 20, 2020
freddyaboulton added epic and removed epic labels Jul 20, 2020

dsherry commented Jul 21, 2020

@freddyaboulton and I met to discuss this yesterday. This is ready for implementation. Below is the implementation plan from the design doc:

Tasks
Phase 1

  1. Implement interpretation algorithm (1 day of engineering and testing, 1 day review).
    1. Add compute_features method to PipelineBase (private method)
    2. Implement ShapInterpreter
  2. Implement interpretation UI (1 day of engineering and testing, 1 day review).
    1. Implement explain_prediction
  3. Write/augment tutorial to display new functionality (1 day of engineering and testing, 1 day review).
    1. Add it to the User Guide
    2. Consider adding something to Tutorials
  4. Qualitative Analysis of explanation quality: (3 days)
    1. Run AutoML on difficult datasets.
    2. Grab a couple pipelines and make sure prediction explanations make sense.
    3. Mock dataset and run AutoML search, then explain predictions.
    4. Add notebooks in repo
  5. Stretch Task: Evaluate performance on many datasets.

Note: until all this is complete, we should keep the implementation private for the July release, i.e. _explain_prediction

Overall estimate: 9 days
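
To make the "interpretation UI" step concrete, here is a rough sketch of what a single-prediction report could look like. This is not the design doc's implementation: `format_explanation`, its `top_k` parameter, and the input dictionary of per-feature contributions are all hypothetical, and the contributions are assumed to have been computed already.

```python
def format_explanation(contributions, top_k=3):
    """Render the top_k features by absolute contribution as a text table.

    `contributions` maps feature name -> signed contribution to one prediction.
    """
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )[:top_k]
    # Pad the name column so the values line up.
    width = max([len("Feature")] + [len(name) for name, _ in ranked])
    lines = [f"{'Feature'.ljust(width)}  Contribution"]
    for name, value in ranked:
        lines.append(f"{name.ljust(width)}  {value:+.2f}")
    return "\n".join(lines)


# Hypothetical contributions for one row.
report = format_explanation(
    {"age": 5.0, "income": 2.0, "tenure": -3.0, "zip": 0.1}, top_k=2
)
print(report)
```

Sorting by absolute value keeps strongly negative contributions visible, which matters when debugging a "bad" prediction.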

Phase 2

  1. Implement explain_predictions, which finds and explains the top n most/least confident predictions. (5 days)

Overall estimate: 5 days
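
One possible way to select rows for Phase 2, as a sketch only: the plan above does not define "confident", so this assumes confidence is the highest predicted class probability for a row. `rank_by_confidence` and the sample probabilities are hypothetical.

```python
def rank_by_confidence(probabilities, n=2):
    """Return (most_confident, least_confident) row indices.

    `probabilities` holds one list of class probabilities per row.
    Confidence is taken to be the largest class probability
    (an assumption of this sketch).
    """
    confidence = [max(row) for row in probabilities]
    ascending = sorted(range(len(confidence)), key=lambda i: confidence[i])
    least = ascending[:n]
    most = ascending[::-1][:n]
    return most, least


# Hypothetical predicted probabilities for five rows of a binary problem.
probabilities = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.50, 0.50],
    [0.99, 0.01],
]
most, least = rank_by_confidence(probabilities, n=2)
print("most confident rows:", most)
print("least confident rows:", least)
```

Each selected row could then be passed to the single-prediction explanation from Phase 1.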

Key Dates
July release: July 28, 2020.

Goal
Merge Phase 1 by Tues August 4th
Merge Phase 2 by Tues August 11th

Stretch Goal
Merge Phase 1 by Tues July 28th (July release)
Merge Phase 2 by Tues Aug 4th

dsherry commented Jul 23, 2020

Hey @freddyaboulton, to date we've been keeping epics in the Epic pipeline and instead moving the individual issues through the pipeline. Could you please follow that pattern here as well? If that feels weird or incorrect to you, happy to discuss changing our process for how we organize epics. It's pretty simplistic at the moment.

freddyaboulton commented

@dsherry My mistake! Keeping epics in the epic pipeline makes sense to me 👍

dsherry modified the milestones: July 2020, August 2020 Jul 27, 2020

dsherry commented Jul 28, 2020

@freddyaboulton from my perspective, we should finish reviewing the SHAP qualitative analysis you did (which is super helpful!!), resolve those discussions, and perhaps make some fixes/updates. But what I see in there already feels good enough to make public for July!

To confirm: explain_predictions is now public, in the API docs and we added a user guide, correct? Meaning it will be a part of the July release? So great!!

dsherry commented Aug 28, 2020

@freddyaboulton can this epic be closed?

freddyaboulton commented

@dsherry I think once we get #1107 merged we can close this!
