
Woe won't work with NaNs #84

Open
rcg-uab opened this issue Jan 5, 2022 · 2 comments

@rcg-uab

rcg-uab commented Jan 5, 2022

Thanks for putting this package together; it's great to have a scorecard package in Python.

However, it appears the weight-of-evidence (WoE) binning algorithm only works with complete data, even though it should be able to handle missing values.

As a simple test, I added a column to the germancredit.csv file, filled part of it with NaN values, and reran the example code. The woebin function breaks as described in other threads (e.g. #78). Is this on the radar for a fix?

Cheers,
Ryan
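The test Ryan describes can be sketched roughly as follows. `add_column_with_nans` is a hypothetical helper, not part of scorecardpy: it copies a DataFrame and adds a numeric column in which every n-th value is NaN. A small stand-in frame replaces germancredit.csv so the snippet runs on its own; the scorecardpy calls are only mentioned in a comment, assuming the package's `germancredit()` and `woebin()` as used in its example code.

```python
import numpy as np
import pandas as pd


def add_column_with_nans(df: pd.DataFrame, name: str, every: int = 10, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a new numeric column
    `name` in which every `every`-th row is NaN."""
    rng = np.random.default_rng(seed)
    values = rng.normal(size=len(df))  # float64 array, so it can hold NaN
    values[::every] = np.nan           # rows 0, every, 2*every, ... become missing
    out = df.copy()                    # leave the caller's frame untouched
    out[name] = values
    return out


# Stand-in for germancredit.csv. Assuming scorecardpy is installed, one could
# instead start from sc.germancredit() and then call
# sc.woebin(dt_nan, y="creditability") to reproduce the report.
dt = pd.DataFrame({"creditability": [0, 1] * 50, "x": range(100)})
dt_nan = add_column_with_nans(dt, "new_col", every=10)
```

Spacing the NaNs deterministically (rather than at random) makes the number of missing values known in advance, which helps when checking how many rows survive binning.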

@ShichenXie
Owner

ShichenXie commented Jan 8, 2022

I can't reproduce your issue. The package should be able to handle missing values. Please upgrade to the latest version on GitHub and try again.

@theinexorable

I traced this back to line 126 of woebin.py (line 116 in version 1.9.2 on PyPI).
The specific line is:

```python
dtm = dtm[~dtm.index.isin(dtm_sv.index)].reset_index() if len(dtm_sv.index) < len(dtm.index) else None
```

It drops from dtm (the data used to build the final table) every row whose index label also appears in dtm_sv, the table of missing values, and those labels simply run from 0 up to however many missing values you have. I can't see why this line is still included, but this is where the rows with missing values are deleted.
As a temporary workaround, commenting out or deleting this line makes everything work correctly.
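To make the index mismatch concrete, here is a minimal pandas sketch on toy data. It is not the actual woebin.py code: it only assumes, as the comment observes, that dtm_sv ends up indexed 0 up to the number of missing values, and it mimics that with a `reset_index()` call. The label-preserving split at the end (one boolean mask reused for both halves) is a suggested alternative, not the maintainer's fix.

```python
import numpy as np
import pandas as pd

# Toy stand-in for one variable's data inside woebin (hypothetical values):
# the missing values sit at row labels 3 and 5, not at 0 and 1.
dtm = pd.DataFrame({
    "y": [0, 1, 0, 1, 0, 1],
    "value": [1.0, 2.0, 3.0, np.nan, 5.0, np.nan],
})

# Faulty pattern: the missing-value table carries a fresh 0..k-1 index, so its
# labels no longer point at the original rows (reset_index is an assumption
# that mimics the behaviour reported above).
dtm_sv_bad = dtm[dtm["value"].isna()].reset_index(drop=True)
dtm_bad = dtm[~dtm.index.isin(dtm_sv_bad.index)]
# Rows 0 and 1 (valid data) are dropped, while the NaN rows 3 and 5 survive.

# Label-preserving split: compute one boolean mask and use it for both halves.
is_missing = dtm["value"].isna()
dtm_sv_ok = dtm[is_missing]                       # rows 3 and 5
dtm_ok = dtm[~is_missing].reset_index(drop=True)  # rows 0, 1, 2, 4, re-indexed 0..3
```

Because both halves come from the same mask, every row lands in exactly one of them, regardless of how either frame is re-indexed afterwards.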
