Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API Implements get_feature_names_out for transformers that support get_feature_names #18444
API Implements get_feature_names_out for transformers that support get_feature_names #18444
Changes from 46 commits
ab2acbd
3bc674b
1c4a78f
7881930
de63353
8835f3b
2eba5de
449ed23
a1fcf67
b929341
2b613e5
ad66b86
5eb7603
f4f832a
3a9054c
9c4420d
76f5b54
52f38e1
cdda1fb
5f4abbc
4305a28
c387b5b
2fefb67
a6832c3
0f45b22
4717a73
a658ba7
e9e45af
5acaced
0353f69
17a5016
bb07886
4e0968c
95046a0
f7aa3fd
f4a9882
fa4b318
640ad76
d9d2d95
f6075ca
922748f
1af211c
9ab0cf9
2926492
c1a1778
37101b0
82d0a60
8833b5b
adcc1c1
9a07816
0d3bc4e
b922fa4
21cbfe6
86887ae
8ecb38f
8b3c856
5260d7d
cf1ec1e
a63cd14
dddb4a8
a87866b
6f35c0c
526db41
f7c0062
9722c08
ba3aca2
c78967a
f022a1b
f10da10
d8bafb3
8751296
be3f0b1
84dc208
f41a40e
149c4e3
41d0bb1
628a2b3
d178069
faae557
02a25be
20ecd70
c6bc0ce
b76fd41
d60c4fa
4a00562
d5b72de
a0b7446
d8f84b3
f1090df
ae46466
2e2bdd8
5575857
3d1546b
6e44a52
b07a3bc
fffabf0
ecec556
5def4ce
1fda1c1
9b8834b
9034a53
9081ebd
a4ce567
12a2052
aece402
ea31c18
13d406b
d04ecec
d3cc5b6
76be321
caff15b
dcd685f
d930b1b
b379cd3
83d12ec
0841049
8d0b3cf
c35f7aa
ec8b825
560c0d0
043540b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick:
Also I think I would like "categorical" best instead of "categories". But not strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or could we have an option in
ColumnTransformer
to no prefix the output feature names when there is no colliding names?I think 99% of the time those prefixes would add unnecessary verbosity. The default could be to only use the prefix for colliding feature names with an option
The extra parameter could be
feature_name_prefix
taking values in{"only_when_colliding", "always"}
and"only_when_colliding"
would be the default.WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree having no prefixes would work for most use cases and would be okay with the extra parameter.
We would need to update SLEP007 about prefixes in
ColumnTransformer
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It says
So we could add it without changing the SLEP, but if we want
when_collinding
the default, then I guess we have to change the slep. I think we discussed this at some point. @adrinjalali @jnothman might remember?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I don't mind the way it is right now, it's such a huge improvement. Are you concerned that changing this in the future will be an incompatible change? Feature names are probably tricky to change, but we could do a deprecation cycle for
when_colliding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather do the change now if we all agree it is a usability (readability) improvement. Having to deal with backward compat to change that latter would be a mess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed a line above for a cosmit and as a result it hides an important discussion that was happening here. Sorry for that.
The discussion was about: shall we add a new constructor parameter to skip the verbose
categories__
prefix in this doctest when there is no risk of colliding feature names? It could be named:prefix_feature_names="when_colliding"
(default)prefix_feature_names="always"
If we do so, we need to amend the following paragraph of SLEP7:
I am in favor of making the change now to avoid problems with backward compat if we want to do this change later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on adding it now, I'm not entirely sure about the default +0.5 on that, I guess? In particular it means that adding transformers can change the names of existing features. The typical use-case where you wouldn't want prefixes is where you have separate feature types like categorical, continuous and bow, like here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either always with or always without those ticks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an array-like because when
input_features is not None
then this returnsinput_features
without any processing.