Automatic selection of background data #84

mayer79 · 2023-04-06T12:32:57Z

A pain point in using {kernelshap} is the manual preparation of the background data bg_X. Most applications have a relatively large explanation dataset X and a model m, but no separate background data. It would be convenient to derive the background data automatically from X, so that a SHAP analysis would start with:

shp <- kernelshap(m, X)

Suggestion: Set the default to bg_X = NULL. In this case, apply the following logic:

If nrow(X) <= 200 -> bg_X = X
If nrow(X) < 20 -> additionally warning("X is too small to be used as background data. Please pass a larger background data via 'bg_X'.")
If nrow(X) > 200 -> message("X is too large to be used as background data. We randomly select 200 rows from it.") and subsample 200 rows

If bg_X = NULL and the user passes a vector of case weights bg_w, the weights would need to correspond to the rows of X, and we would need to subsample them with the same row indices.
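The proposed logic could be sketched roughly as follows. This is only an illustration in base R, not code from {kernelshap}: the helper name select_background() and its arguments bg_n and min_n are hypothetical, and the thresholds 200 and 20 are taken from the suggestion above.

```r
# Hypothetical helper (not part of {kernelshap}): derive background data from X.
# bg_n and min_n are assumed names for the thresholds proposed in this issue.
select_background <- function(X, bg_w = NULL, bg_n = 200L, min_n = 20L) {
  n <- nrow(X)
  if (!is.null(bg_w)) {
    # With bg_X = NULL, the case weights must refer to the rows of X
    stopifnot(length(bg_w) == n)
  }
  if (n < min_n) {
    warning(
      "X is too small to be used as background data. ",
      "Please pass a larger background data via 'bg_X'."
    )
  }
  if (n > bg_n) {
    message(
      "X is too large to be used as background data. ",
      "We randomly select ", bg_n, " rows from it."
    )
    ix <- sample(n, bg_n)         # row indices without replacement
    X <- X[ix, , drop = FALSE]    # subsample the rows ...
    if (!is.null(bg_w)) {
      bg_w <- bg_w[ix]            # ... and the weights with the same indices
    }
  }
  list(bg_X = X, bg_w = bg_w)
}

# Usage: 500 rows are reduced to 200, weights stay aligned with the rows
set.seed(1)
X <- data.frame(x1 = rnorm(500), x2 = runif(500))
w <- runif(500)
bg <- select_background(X, bg_w = w)
dim(bg$bg_X)
```

Using one index vector for both X and bg_w keeps each weight attached to its row, which is the point raised about case weights above.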

Ping @pbiecek

mayer79 self-assigned this Apr 6, 2023

mayer79 added the enhancement and question labels Apr 6, 2023
