Verification of dtypes of columns of X_row* is same that self.X #300

salmuz · 2024-03-25T17:16:50Z

Hello, I want to contribute by fixing two different bugs related to the use of LightGBM.

NaN values in categorical columns, which raise an exception when the column values are passed to sorted(...):

       '<' not supported between instances of 'str' and 'float'

Preserve the correct dtypes of the columns of X (a DataFrame) so that LightGBM's predict(...) function doesn't throw errors.
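The first bug can be reproduced with plain pandas. The sketch below uses illustrative data, not data from the PR: once a missing value sits among string categories, `unique().tolist()` mixes `str` and `float('nan')`, and Python 3 refuses to order them. Dropping missing values before sorting, as the diff to explainers.py does, avoids the error.

```python
import numpy as np
import pandas as pd

# A string column with one missing value (illustrative data only).
col = pd.Series(["b", np.nan, "a", "b"], dtype=object)
values = col.unique().tolist()  # a mix of str and float('nan')

try:
    sorted(values)
    raised = False
except TypeError:
    # Python 3 cannot compare str with float, which produces the
    # "'<' not supported between instances of ..." message.
    raised = True

# Filter out missing values before sorting, as in the PR's change.
categories = sorted(v for v in values if not pd.isna(v))
print(raised)      # True
print(categories)  # ['a', 'b']
```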


oegedijk · 2024-03-27T20:31:34Z

explainerdashboard/explainer_methods.py

@@ -1791,3 +1796,25 @@ def get_xgboost_preds_df(xgbmodel, X_row, pos_label=1):
            0, "pred_proba"
        ]
    return xgboost_preds_df
+
+
+def check_dtype_of(


Could you add some tests for this? How flexible is it? (e.g., will it break on float32 vs float64? int vs float? etc.)
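The flexibility question can be probed with plain pandas. This sketch does not call check_dtype_of itself (its body is only partly shown in this thread); it assumes the fix boils down to an exact dtype comparison followed by `astype` with the reference frame's dtypes, which is what the fragment in explainer_methods.py suggests. The data is illustrative.

```python
import numpy as np
import pandas as pd

# Reference frame standing in for self.X (illustrative data).
X = pd.DataFrame({
    "f32": np.array([1.0, 2.0], dtype="float32"),
    "i64": np.array([1, 2], dtype="int64"),
})

# A row built elsewhere (e.g. from user input) with different dtypes.
X_row = pd.DataFrame({
    "f32": np.array([1.5], dtype="float64"),
    "i64": np.array([3.0], dtype="float64"),
})

# Dtype comparison is exact: float32 vs float64 and int64 vs float64
# both count as mismatches, so both columns would be cast.
mismatch = {col: X_row[col].dtype != X[col].dtype for col in X.columns}
print(mismatch)  # {'f32': True, 'i64': True}

# Casting a whole-number float back to int64 succeeds.
cast = X_row.astype(X.dtypes.to_dict())

# A missing value in a column that should be int64 cannot be cast;
# pandas raises a ValueError (IntCastingNaNError in recent versions).
X_row_nan = pd.DataFrame({"f32": [1.5], "i64": [np.nan]})
try:
    X_row_nan.astype(X.dtypes.to_dict())
    nan_cast_failed = False
except ValueError:
    nan_cast_failed = True
print(nan_cast_failed)  # True
```

So float32 vs float64 and int vs float mismatches are repaired by the cast, while NaN in an integer column would still need separate handling.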

oegedijk · 2024-03-27T20:33:16Z

explainerdashboard/explainers.py

@@ -50,6 +50,7 @@


 from .explainer_methods import *
+from .explainer_methods import check_dtype_of


You can add check_dtype_of to the __all__ at the start of explainer_methods.py; then it is covered by the import *. (Generally import * is frowned upon, but it's okay as long as you define a restrictive __all__.)
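To illustrate how a restrictive __all__ controls what `import *` pulls in, the sketch below builds a throwaway module in memory. The module name `fake_explainer_methods`, the placeholder bodies and the name `helper_not_in_all` are hypothetical; the real explainer_methods.py and its __all__ list are not shown in this thread.

```python
import sys
import types

# Hypothetical stand-in for explainer_methods.py; bodies are placeholders
# and this __all__ list is illustrative, not the module's real one.
MODULE_SOURCE = """
__all__ = ["get_xgboost_preds_df", "check_dtype_of"]

def get_xgboost_preds_df(*args, **kwargs): ...
def check_dtype_of(*args, **kwargs): ...
def helper_not_in_all(*args, **kwargs): ...
"""

module = types.ModuleType("fake_explainer_methods")
exec(MODULE_SOURCE, module.__dict__)
sys.modules["fake_explainer_methods"] = module  # make it importable by name

# Equivalent of `from .explainer_methods import *` in explainers.py:
# only names listed in __all__ are bound.
namespace = {}
exec("from fake_explainer_methods import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['check_dtype_of', 'get_xgboost_preds_df']
```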

oegedijk · 2024-03-27T20:34:17Z

explainerdashboard/explainers.py

@@ -241,7 +242,9 @@ def __init__(
            col for col in self.regular_cols if not is_numeric_dtype(self.X[col])
        ]
        self.categorical_dict = {
-            col: sorted(self.X[col].unique().tolist()) for col in self.categorical_cols
+            col: sorted(
+                v for v in self.X[col].unique().tolist() if not pd.isna(v)


Not an expert on LightGBM, but wouldn't there be use cases where NA would be a category? Or is that handled differently? And how about CatBoost or other libraries?
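If missing values should remain one of the listed categories rather than disappear, one option is to sort only the non-missing values and append a single missing marker at the end. This is a sketch under that assumption; the helper name is hypothetical, and how LightGBM or CatBoost treat missing categorical values is not addressed here.

```python
import numpy as np
import pandas as pd


def sorted_categories_keep_na(series: pd.Series) -> list:
    """Hypothetical helper: sort the non-missing values, then append a
    single NaN entry if the column contains any missing values."""
    values = series.unique().tolist()
    present = sorted(v for v in values if not pd.isna(v))
    if series.isna().any():
        present.append(np.nan)
    return present


cats = sorted_categories_keep_na(pd.Series(["b", None, "a"], dtype=object))
print(cats)  # ['a', 'b', nan]
```

Putting the marker last keeps the sort free of mixed-type comparisons while still exposing the missing value.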

oegedijk · 2024-03-27T20:38:36Z

explainerdashboard/explainer_methods.py

+            df_target is not None and
+            not df_target[features].dtypes.eq(df_origin[features].dtypes).all()
+    ):
+        df_target[features] = df_target[features].astype(


In general, I'm not a fan of functions that modify their arguments in place. Could you rewrite it so that it returns the transformed df instead? Then maybe call it adjust_dtypes_to_match_df(...) or something similar?

Calling something check_dtype_of when it actually modifies one of the arguments is confusing.
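A minimal sketch of the non-mutating variant suggested above, using the reviewer's proposed name. The signature, the handling of `None`, and the default of casting every column the two frames share are assumptions; this is not the code that was eventually merged.

```python
from typing import Optional, Sequence

import numpy as np
import pandas as pd


def adjust_dtypes_to_match_df(
    df_target: Optional[pd.DataFrame],
    df_origin: pd.DataFrame,
    features: Optional[Sequence[str]] = None,
) -> Optional[pd.DataFrame]:
    """Return a new DataFrame: df_target with `features` cast to the dtypes
    they have in df_origin. df_target itself is left untouched.
    Hypothetical sketch of the reviewer's suggestion."""
    if df_target is None:
        return None
    if features is None:
        # Assumed default: every column present in both frames.
        features = [col for col in df_origin.columns if col in df_target.columns]
    dtype_map = {col: df_origin[col].dtype for col in features}
    # DataFrame.astype returns a new object; casting a column to the dtype
    # it already has is harmless, so no separate mismatch check is needed.
    return df_target.astype(dtype_map)


# Usage: the training frame has int64 and categorical columns, while the
# single row arrives as float and plain strings.
X = pd.DataFrame({
    "age": np.array([30, 40], dtype="int64"),
    "sex": pd.Categorical(["m", "f"]),
})
X_row = pd.DataFrame({"age": [35.0], "sex": ["f"]})

X_row_fixed = adjust_dtypes_to_match_df(X_row, X)
print(X_row_fixed.dtypes)
```

Casting to the reference frame's CategoricalDtype also carries over its full category list, which matters when a single row contains only one of the categories.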

oegedijk · 2024-03-27T20:39:31Z

Cool, thanks! The tests are passing, but please have a look at my comments and see if you can add a few test cases for this new function.

salmuz · 2024-03-28T14:50:40Z

Hello, I will make the requested changes as soon as possible (next week). Thanks!

Commit 0b461ae: Verification of dtypes of columns of X_sample so that LightGBM has no problems in the prediction step.

salmuz changed the title ~~Verification of dtypes of columns of X_sample is same that self.X~~ Verification of dtypes of columns of X_row* is same that self.X Mar 25, 2024

oegedijk reviewed Mar 27, 2024
