
Error when X has boolean columns #224

Open
sebastian-correa opened this issue Jun 29, 2022 · 2 comments

@sebastian-correa

Summary

In my project, we have a dataset with one bool column; the rest are ordinary numeric columns.

I'm trying to build an ExplainerHub with many dashboards, but instantiating the first dashboard raises:

numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('bool') with casting rule 'same_kind'

These are the failing lines. They call np.round on a bool value:

min_range = np.round(self.explainer.X[col][lambda x: x != self.explainer.na_fill].min(), 2)
max_range = np.round(self.explainer.X[col][lambda x: x != self.explainer.na_fill].max(), 2)
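The failure can be isolated without building a dashboard. The sketch below assumes `na_fill = -999` as a stand-in for `explainer.na_fill` and applies the same filter-then-min expression to a small bool Series. The min comes back as a NumPy bool scalar, which is what `np.round` receives. Whether `np.round` then raises depends on the NumPy version; the traceback below shows it failing on numpy 1.22.3, so the example reports the outcome instead of assuming it.

```python
import numpy as np
import pandas as pd

na_fill = -999  # assumption: placeholder for explainer.na_fill
col = pd.Series([True, False, True], name="bool_col")

# Same expression as in _generate_dash_input: drop na_fill values, then take the min.
col_min = col[lambda x: x != na_fill].min()
print(type(col_min))  # a NumPy bool scalar, not a float or int

try:
    np.round(col_min, 2)
    print("np.round succeeded on this NumPy version")
except TypeError as exc:  # _UFuncOutputCastingError is a TypeError subclass
    print(f"np.round failed: {exc}")
```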

For now, I can work around the issue with:

bool_cols = X.select_dtypes(include="bool").columns
X = X.astype({col: "uint8" for col in bool_cols})
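As a quick check that the workaround removes the problem, the sketch below applies the same cast to a toy two-column frame (column names are illustrative) and then rounds the min and max of the converted column, as `_generate_dash_input` does. After the cast, `np.round` receives `uint8` values (0 and 1) rather than bools.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"num_col": [0.5, 1.25, 2.0], "bool_col": [True, False, True]})

# The workaround: cast every bool column to uint8 (False -> 0, True -> 1).
bool_cols = X.select_dtypes(include="bool").columns
X = X.astype({col: "uint8" for col in bool_cols})

# Rounding now operates on integers, so no bool casting error is raised.
min_range = np.round(X["bool_col"].min(), 2)
max_range = np.round(X["bool_col"].max(), 2)
print(X.dtypes.to_dict(), min_range, max_range)
```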

MWE

Copy this code into test.py and run python test.py.

import numpy as np
import pandas as pd
from explainerdashboard import ExplainerDashboard, ExplainerHub, RegressionExplainer
from xgboost import XGBRegressor

n_rows = 1_000
n_cols = 20

data = {f"col_{i}": np.random.random(n_rows) for i in range(n_cols - 1)}
data["bool_col"] = np.random.randint(0, 2, n_rows, dtype=bool)
X = pd.DataFrame(data)
y = np.random.random(n_rows)

reg = XGBRegressor()
reg.fit(X, y)

explainer = RegressionExplainer(reg, X, y, target="Target", units="u")

dashboards = [
    ExplainerDashboard(
        explainer,
        title="MWE",
        name="MWE",
        description="MWE Dashboard",
        shap_interaction=True,
        shap_dependence=True,
    )
]

hub = ExplainerHub(dashboards, title="MWE", description="MWE")
hub.run(port=8050)

Full traceback

pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
Changing class type to XGBRegressionExplainer...
Generating self.shap_explainer = shap.TreeExplainer(model)
Building ExplainerDashboard..
Warning: calculating shap interaction values can be slow! Pass shap_interaction=False to remove interactions tab.
Generating layout...
Calculating shap values...
ntree_limit is deprecated, use `iteration_range` or model slicing instead.
Calculating predictions...
pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
Calculating residuals...
Calculating absolute residuals...
Traceback (most recent call last):
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('bool') with casting rule 'same_kind'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/me/Dev/app/test.py", line 20, in <module>
    ExplainerDashboard(
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboards.py", line 590, in __init__
    self.explainer_layout = ExplainerTabsLayout(explainer, tabs, title, 
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboards.py", line 104, in __init__
    self.tabs  = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboards.py", line 104, in <listcomp>
    self.tabs  = [instantiate_component(tab, explainer, name=str(i+1), **kwargs) for i, tab in enumerate(tabs)]
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_methods.py", line 733, in instantiate_component
    component = component(explainer, name=name, **kwargs)
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/composites.py", line 421, in __init__
    self.input = FeatureInputComponent(explainer, name=self.name+"0",
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/overview_components.py", line 695, in __init__
    self._feature_inputs = [
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/overview_components.py", line 696, in <listcomp>
    self._generate_dash_input(
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/explainerdashboard/dashboard_components/overview_components.py", line 736, in _generate_dash_input
    min_range = np.round(self.explainer.X[col][lambda x: x != self.explainer.na_fill].min(), 2)
  File "<__array_function__ internals>", line 180, in round_
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 3773, in round_
    return around(a, decimals=decimals, out=out)
  File "<__array_function__ internals>", line 180, in around
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 3348, in around
    return _wrapfunc(a, 'round', decimals=decimals, out=out)
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 66, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/Users/me/Dev/app/.venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('bool') with casting rule 'same_kind'

Environment

  • Python 3.9.10
  • explainerdashboard 0.4.0
  • numpy 1.22.3
  • pandas 1.4.1
  • xgboost 1.5.2
  • macOS 12.4 on an Intel Mac
@oegedijk
Owner

oegedijk commented Jan 1, 2023

Couldn't you change the bool columns to a ['0', '1'] int or float column?

@sebastian-correa
Author

Couldn't you change the bool columns to a ['0', '1'] int or float column?

That's the workaround I mentioned! It's still a workaround, though, and I think explainerdashboard should support bool columns directly. Why not skip the np.round call when the column is bool?
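A minimal sketch of the suggested fix, assuming the range computation is pulled into a helper. `feature_range` is hypothetical and not part of explainerdashboard's API; it filters out `na_fill` like the original lines, returns plain ints for bool columns instead of rounding them, and keeps the existing `np.round(..., 2)` behaviour for numeric columns.

```python
import numpy as np
import pandas as pd


def feature_range(series: pd.Series, na_fill, decimals: int = 2):
    """Hypothetical helper: (min, max) of a feature, ignoring na_fill, bool-safe."""
    values = series[series != na_fill]
    if pd.api.types.is_bool_dtype(values):
        # Rounding a bool is meaningless (and fails on numpy 1.22), so return 0/1 ints.
        return int(values.min()), int(values.max())
    # Numeric columns keep the original rounding behaviour.
    return np.round(values.min(), decimals), np.round(values.max(), decimals)


print(feature_range(pd.Series([True, False, True]), na_fill=-999))
print(feature_range(pd.Series([0.123, 5.678, -999.0]), na_fill=-999))
```

Another option would be to cast bool columns once when the explainer is constructed, which would also cover other components that assume numeric features; which approach fits better depends on the library's internals.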
