Iterative Imputer for categorical data #28545
sudughonge started this conversation in Ideas
Replies: 1 comment
-
When it comes to [...] It would also avoid having to handle the tricky case of heterogeneous types.
-
Hey folks, there's been some buzz over the years about having IterativeImputer work for categorical data (see, for example, #17087). I was able to get a working version by subclassing sklearn.impute.IterativeImputer and using sklearn.ensemble.HistGradientBoostingRegressor as the estimator. I changed the initial imputation strategy for the categorical variables to the mode, rather than the IterativeImputer default of the mean. I'm thinking of opening a PR. Any major advice or caveats I should be aware of?

Of course, this worked for that particular estimator (sklearn.ensemble.HistGradientBoostingRegressor), and it may not work with other estimators that don't handle mixed-type data (i.e., numeric and categorical). I understand that this is one of the reasons it was never implemented as a general case, but I think it should still exist as an option, since it is possible with the right estimator. Is there a way to do some sort of unordered encoding so that this works with an arbitrary estimator and not just sklearn.ensemble.HistGradientBoostingRegressor?