Automatic selection of background data #84

Open
mayer79 opened this issue Apr 6, 2023 · 0 comments
Labels: enhancement, question

mayer79 commented Apr 6, 2023

A pain point in using {kernelshap} is the manual preparation of the background data bg_X. Most applications have a relatively large explanation dataset X and a model m, but no separate background dataset. It would be convenient to derive the background data automatically from X, so that a SHAP analysis could start with:

shp <- kernelshap(m, X)

Suggestion: Set the default to bg_X = NULL. In this case, apply the following logic:

  • If nrow(X) <= 200 -> bg_X = X
  • If additionally nrow(X) < 20 -> warning("X is too small to be used as background data. Please pass a larger background dataset via 'bg_X'.")
  • If nrow(X) > 200 -> message("X is too large to be used as background data. We randomly select 200 rows from it.") and subsample 200 rows

If bg_X = NULL and the user wants to pass a vector of case weights bg_w, the weights would need to correspond to the rows of X, and they would have to be subsampled with the same row indices as X.
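
A minimal sketch of how this logic could look, including the matching subsample of bg_w. The helper name select_background() and its arguments bg_n and min_n are hypothetical and not part of {kernelshap}; the thresholds 200 and 20 are the ones suggested above.

```r
# Hypothetical helper (not part of {kernelshap}): derives background data
# from X when bg_X is NULL, using the thresholds suggested in this issue.
select_background <- function(X, bg_X = NULL, bg_w = NULL,
                              bg_n = 200L, min_n = 20L) {
  # User-supplied background data is returned unchanged
  if (!is.null(bg_X)) {
    return(list(bg_X = bg_X, bg_w = bg_w))
  }

  n <- nrow(X)

  # Without bg_X, the case weights must refer to the rows of X
  if (!is.null(bg_w) && length(bg_w) != n) {
    stop("With bg_X = NULL, 'bg_w' must have one weight per row of X.")
  }

  if (n > bg_n) {
    message(
      "X is too large to be used as background data. ",
      "We randomly select ", bg_n, " rows from it."
    )
    ix <- sample(n, bg_n)
    # Subsample the weights with the same indices; NULL[ix] stays NULL
    return(list(bg_X = X[ix, , drop = FALSE], bg_w = bg_w[ix]))
  }

  if (n < min_n) {
    warning(
      "X is too small to be used as background data. ",
      "Please pass a larger background dataset via 'bg_X'."
    )
  }

  # Small enough: use X (and its weights) as background data
  list(bg_X = X, bg_w = bg_w)
}
```

Inside kernelshap(), the returned bg_X and bg_w would then replace the NULL defaults before the usual input checks. Calling set.seed() beforehand makes the subsample reproducible.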

Ping @pbiecek

@mayer79 mayer79 self-assigned this Apr 6, 2023
@mayer79 mayer79 added enhancement New feature or request question Further information is requested labels Apr 6, 2023