Categorical columns #273

Open
AlexanderZender opened this issue Jul 13, 2023 · 3 comments

Comments

@AlexanderZender

AlexanderZender commented Jul 13, 2023

I ran into an issue that was reported before in connection with LightGBM models.
In my case it is not related to LightGBM, because it occurs before the data reaches the model.
The error occurs here in explainerdashboard:
image

It is triggered when sorting a string column that contains NaN values.

Additionally, some categorical columns get deleted during the explainer process.
Here self.X still has all columns:
image

But two lines further down, they are gone. I can't really tell what's happening, as nothing should be happening:
image

At this point these columns do not seem to be needed. The explainerdashboard process works fine until liftcurve_dfs is computed. It then crashes because the sample df does not contain the columns the model needs:

image

Update: In the last picture I also noticed that the feature "Sex" was changed from male/female to 1 or 0, while other categorical features that are still present, like "Embarked", were not changed. This will break the model too, because the model handles the preprocessing itself.

@AlexanderZender
Author

The fix for the first issue is to ignore NaN values:

image
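
A minimal sketch of the failure and the fix, since the actual code is only visible in the screenshots. It assumes the categories are collected by sorting the unique values of a string column; the helper name `sorted_categories` and the sample data are hypothetical and not taken from explainerdashboard.

```python
import numpy as np
import pandas as pd

# Hypothetical column: strings mixed with a missing value (float NaN).
embarked = pd.Series(["S", "C", np.nan, "Q", "S"], name="Embarked")


def sorted_categories(column: pd.Series) -> list:
    """Return the sorted unique categories, ignoring missing values."""
    return sorted(column.dropna().unique().tolist())


try:
    # Python cannot order a str against a float NaN, so this raises.
    sorted(embarked.unique().tolist())
except TypeError as exc:
    print(f"sorting with NaN fails: {exc}")

# Dropping NaN first leaves only strings, which sort without error.
print(sorted_categories(embarked))
```

Whether NaN should instead be shown as its own category is a separate design choice; this sketch simply leaves it out.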

@oegedijk
Owner

Hi @AlexanderZender, nice catch! I'm on holiday until the end of the month, so away from keyboard, but if you want to write up a PR I can have a look at it and get it released once I'm back.

@AlexanderZender
Author

AlexanderZender commented Jul 14, 2023

I found the issue(?) with my changed columns and values.
The model applied its preprocessing in place, so these changes ended up in self.X in the explainer class.
I currently solve it by making a copy of the passed x in my wrapper class:
image
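
The wrapper itself is only shown in the screenshot, so the following is a rough sketch of the idea rather than the actual code. `InPlaceModel` and `CopyingWrapper` are hypothetical names, and the in-place encoding of "Sex" and dropping of "Embarked" only stand in for whatever preprocessing the real model applies.

```python
import pandas as pd


class InPlaceModel:
    """Hypothetical model whose preprocessing mutates the frame it receives."""

    def predict(self, X: pd.DataFrame) -> list:
        # Encode "Sex" and drop "Embarked" directly on the input frame.
        X["Sex"] = X["Sex"].map({"male": 1, "female": 0})
        X.drop(columns=["Embarked"], inplace=True)
        return X["Sex"].tolist()


class CopyingWrapper:
    """Hypothetical wrapper that hands the model a copy of the input."""

    def __init__(self, model):
        self.model = model

    def predict(self, X: pd.DataFrame) -> list:
        # DataFrame.copy() is a deep copy by default, so the caller's
        # frame is not affected by the model's in-place preprocessing.
        return self.model.predict(X.copy())


X = pd.DataFrame({"Sex": ["male", "female"], "Embarked": ["S", "C"]})
preds = CopyingWrapper(InPlaceModel()).predict(X)

print(preds)
print(X.columns.tolist(), X["Sex"].tolist())
```

After the call, `X` still holds the original string values and both columns, while the model worked on its own copy.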

The question is: should the user ensure this, or should explainerdashboard only pass copies down to the model in the first place?
I can see that explainerdashboard does sometimes pass copies, e.g.
image

I would suggest always passing copies, to avoid potential issues with other models or pipelines where the user has no influence or doesn't want to add a wrapper.
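
To find out whether a particular model or pipeline needs this protection, one option is to compare the frame before and after a prediction call. This is only a sketch; `mutates_input` and the two example predict functions are hypothetical and not part of explainerdashboard.

```python
import pandas as pd


def mutates_input(predict, X: pd.DataFrame) -> bool:
    """Return True if calling predict(X) changes X in place."""
    snapshot = X.copy(deep=True)
    predict(X)
    # DataFrame.equals compares shape, dtypes and values.
    return not X.equals(snapshot)


def encode_in_place(X: pd.DataFrame) -> list:
    # Stand-in for a model that preprocesses the frame it is given.
    X["Sex"] = X["Sex"].map({"male": 1, "female": 0})
    return X["Sex"].tolist()


def encode_on_copy(X: pd.DataFrame) -> list:
    # Same preprocessing, but applied to a copy of the input.
    return encode_in_place(X.copy())


frame = pd.DataFrame({"Sex": ["male", "female"], "Embarked": ["S", "C"]})

print(mutates_input(encode_in_place, frame.copy()))
print(mutates_input(encode_on_copy, frame.copy()))
```

A check like this could also serve as a regression test once copies are passed consistently.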

@oegedijk If you don't mind, I will open a PR in a bit with both changes.
