'Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'> #498

yzheng27 · 2022-02-01T20:35:29Z

I was following the example https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb on my own data and xgboost object, but I get the error ('Expecting data to be a DMatrix object, got: ', <class 'pandas.core.frame.DataFrame'>) at explainer.explain_global(x_test). Changing x_test to a DMatrix generates the error 'DMatrix' object has no attribute 'shape'. Please advise. Thank you.

from sklearn.model_selection import train_test_split
from interpret.ext.blackbox import TabularExplainer

# Hold out 20% of the rows for explanation
x_train, x_test, y_train, y_test = train_test_split(df[features], df[LABEL], test_size=0.2, random_state=0)

explainer = TabularExplainer(model,
                             x_train,
                             model_task='regression',
                             features=features)
global_explanation = explainer.explain_global(x_test)
# Attempted workaround, fails with: 'DMatrix' object has no attribute 'shape'
# xgtest = xgb.DMatrix(x_test.values)
# global_explanation = explainer.explain_global(xgtest)

Version:
interpret-community==0.23.0
interpret-core==0.2.7
xgboost==1.4.1


gaugup · 2022-02-02T09:26:30Z

@yzheng27 Thanks for reporting the issue. Could you try with the latest interpret-community release 0.24.2 and see if you continue to see this issue? In case you still see the issue, could you provide a sample notebook so that we can reproduce this issue locally. A stack trace of the error will also help us greatly in triaging this issue.

Regards,

imatiach-msft · 2022-02-02T13:52:53Z

@gaugup I think the issue is happening because they are using the native XGBoost API, which requires DMatrix, instead of the scikit-learn XGBoost API, which is pandas compatible, so I'm guessing that upgrading to the latest version won't fix it. @yzheng27 I will take a look to see if we can support DMatrix from XGBoost somehow, but an easy quick fix would be to use the scikit-learn API for XGBoost.

yzheng27 · 2022-02-02T21:44:50Z

Thank you. I was able to generate the global_explanation by loading the model with the scikit-learn interface. But now my notebook has been running the code below for several hours. Is this expected? The shape of x_test is around 24000*325.

ExplanationDashboard(global_explanation, model, dataset=x_test, true_y=y_test, public_ip = host, port = 7780)

imatiach-msft · 2022-02-03T14:08:24Z

"the shape of x_test is around 24000*325"
@yzheng27 yes that may be too large for the UI to handle, please limit it by downsampling to ~5k rows instead of 24K. If you are still seeing issues with downsampled data, then there might be something about the host/port configuration. However even then you should still see the dashboard, just what-if analysis and ICE plots won't work in the ExplanationDashboard.
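A minimal sketch of the downsampling step, assuming x_test is a pandas DataFrame and y_test a pandas Series with matching row order. The helper name `downsample_for_dashboard` and its parameters are hypothetical, and the 5,000-row cap simply mirrors the rough figure mentioned above.

```python
import numpy as np
import pandas as pd

def downsample_for_dashboard(x, y, max_rows=5000, random_state=0):
    """Return a random subset of at most max_rows rows, keeping x and y aligned.

    Hypothetical helper for illustration; not part of interpret-community or raiwidgets.
    """
    if len(x) <= max_rows:
        return x, y
    rng = np.random.default_rng(random_state)
    # Sample row positions without replacement so x and y stay paired
    idx = np.sort(rng.choice(len(x), size=max_rows, replace=False))
    return x.iloc[idx], y.iloc[idx]

# Synthetic data with roughly the row count from the thread (fewer columns for brevity)
rng = np.random.default_rng(1)
x_test = pd.DataFrame(rng.normal(size=(24000, 4)), columns=["a", "b", "c", "d"])
y_test = pd.Series(rng.normal(size=24000), index=x_test.index)

x_small, y_small = downsample_for_dashboard(x_test, y_test)
print(x_small.shape, y_small.shape)
```

The reduced x_small and y_small would then be passed in place of x_test and y_test, both to explainer.explain_global and to the dataset and true_y arguments of ExplanationDashboard.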

imatiach-msft · 2022-02-03T14:11:10Z

@yzheng27 one other thing, are you importing the dashboard from the raiwidgets package, in this repository:

from raiwidgets import ExplanationDashboard

https://github.com/microsoft/responsible-ai-toolbox

Make sure you don't import it from interpret-community package, as it has been moved to the other repository.

Also, can you run:

pip show raiwidgets

to check that you have the latest version of raiwidgets package with ExplanationDashboard?

yzheng27 · 2022-02-04T00:23:35Z

@imatiach-msft I'm using the library from raiwidgets and the version is 0.15.1.

I was able to get the dashboard with the data dimensions I mentioned, though it took several hours. Will try with the smaller data.

imatiach-msft · 2022-02-04T13:23:35Z

@yzheng27 if it took several hours but eventually worked then it must be that the UI just loaded too much data, and downsampling should speed it up significantly. All of the datapoints are loaded into the UI and I've noticed that usually after >5k datapoints the UI becomes very slow. Perhaps there is some way to change the UI to stream select data from python backend or to aggregate statistics across multiple points in the future for users who want to run it on a lot of data, I'm not sure. The ErrorAnalysisDashboard is actually able to work on millions of points if you pass in a sample_dataset for the Dataset Explorer, so perhaps something like that could be done for the ExplanationDashboard as well:

https://github.com/microsoft/responsible-ai-toolbox/blob/main/raiwidgets/tests/test_error_analysis_dashboard.py#L83
