problem with running the break plots on sparse data #31

Open
dokato opened this issue May 1, 2020 · 0 comments
dokato commented May 1, 2020

Hi, I recently tried to run break-down plots on a random forest model (ranger, trained through the caret package) fitted to sparse data (a TF-IDF matrix). The model is not doing a really good job, but still...

Using the DALEX package, I created the explainer without any problems, but this call:

variable_attribution(rf_explainer,
                     new_observation = x_df_test[ind_to_check, ],
                     type = "break_down")

gave the following error:

Error in `[.data.frame`(out, , obsLevels, drop = FALSE) : 
  undefined columns selected
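
For context, this message is what base R raises when a data frame is indexed by column names it does not contain. The sketch below reproduces it in isolation; the names `out` and `obsLevels` are borrowed from the traceback, while the values (and the idea that class levels fail to match the prediction columns) are hypothetical illustrations, not something confirmed for this model.

```r
# Hypothetical stand-in for a one-row table of class probabilities.
out <- data.frame(low = 0.3, high = 0.7)

# Hypothetical class levels; "very high" has no matching column in `out`.
obsLevels <- c("low", "very high")

# Selecting a non-existent column by name triggers the same error message.
msg <- tryCatch(
  {
    out[, obsLevels, drop = FALSE]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A quick diagnostic: which requested names are absent from the data frame?
missing_cols <- setdiff(obsLevels, names(out))
print(missing_cols)
```

Running the same `setdiff()` comparison between the outcome's factor levels and the column names of the model's probability predictions may help narrow down where the mismatch comes from.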

Then, I switched to this breakDown package. First of all, after calling it like this:

broken(rf_mod, x_df_test[ind_to_check, ])

It tells me that:

Error in "data.frame" %in% class(data) : 
  argument "data" is missing, with no default

Thus, I changed my call to:

broken(rf_mod, x_df_test[ind_to_check, ], data = x_df_test)

and this time:

Error in yhats[[which.max(yhats_diff)]] : 
  attempt to select less than one element in get1index
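
This second message appears when `[[` receives a zero-length index, which is what `which.max()` returns when every element is `NA`. The sketch below shows one way that could happen with factor (class label) predictions. It is an assumption about the mechanism, not taken from the breakDown source: `yhats` and `yhats_diff` only mirror the names in the traceback, and `preds` is made up.

```r
# Hypothetical class-label predictions, as a classifier might return them.
preds <- factor(c("high", "low", "high"))

# Averaging a factor is not meaningful: mean() warns and returns NA.
avg <- suppressWarnings(mean(preds))

# If every candidate value is NA, so is every difference.
yhats <- list(avg, avg)
yhats_diff <- abs(unlist(yhats) - avg)

# which.max() on an all-NA vector returns integer(0) ...
idx <- which.max(yhats_diff)
print(length(idx))

# ... and indexing a list with integer(0) via [[ ]] raises the error.
msg <- tryCatch(
  {
    yhats[[idx]]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is what happens here, it would be consistent with the regression pipeline working, since numeric predictions average without producing `NA`.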

The whole code is here: https://github.com/CaRdiffR/tidy_thursdays/blob/master/april_30_2020/predict_gross_clf.R

Strangely, exactly the same pipeline worked well with a regression problem.

I use R 4.0.0 and the latest versions of the DALEX and breakDown packages.

Might be related to #29 .

dokato changed the title from "problem with running the break plots on spare data" to "problem with running the break plots on sparse data" on May 6, 2020