Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Implements get_feature_names_out for transformers that support get_feature_names #18444

Merged
merged 121 commits into from Sep 7, 2021

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Sep 23, 2020

Reference Issues/PRs

Fixes #6425
Closes #12627
Follows up on: #18010

What does this implement/fix? Explain your changes.

This PR adds:

  1. get_feature_names_out in transformers that have get_feature_names. This was motivated by Feature names with input features #13307 (comment). The signature of get_output_names is always get_output_names(input_features=None) where input_features can be ignored.

  2. get_feature_names is deprecated.

  3. The output of get_feature_names_out will be a list most of the time. The only case is when input_features is not None, and the transformer is one-to-one. I considered making this is a list, but I do not think it is worth the copy.

CC @amueller

amueller and others added 30 commits November 20, 2018 10:52
# Conflicts:
#	sklearn/base.py
#	sklearn/impute.py
#	sklearn/preprocessing/data.py
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after a fast pass over the code. I'd love to have it in 1.0 in I dare taking the risk to push the green button....

doc/glossary.rst Outdated Show resolved Hide resolved
doc/glossary.rst Outdated Show resolved Hide resolved
@lorentzenchr lorentzenchr added this to the 1.0 milestone Sep 6, 2021
@thomasjpfan
Copy link
Member Author

thomasjpfan commented Sep 6, 2021

My plan is to follow up this PR with adding get_feature_names_out to preprocessing and impute. I think that would cover a good percentage of the use cases. (Follow up PRs should be relatively simple to review)

@lorentzenchr
Copy link
Member

@thomasjpfan Are you fine with merging now to have this PR in v1.0? (given CI is green)

@ogrisel
Copy link
Member

ogrisel commented Sep 6, 2021

@thomasjpfan Are you fine with merging now to have this PR in v1.0? (given CI is green)

@adrinjalali has already branched 1.0.X. Maybe it's too late.

I don't mind having this in 1.1.0, hopefully that will be a good motivation to not wait too long to release 1.1 :)

I let @adrinjalali decide on the fate of this PR. Based on his decision, we will have to update the what's new entry before merging.

@lorentzenchr
Copy link
Member

There is a CI failure anyway.

@adrinjalali
Copy link
Member

I still haven't tagged, I'm happy to include this if it gets merged by tomorrow

@thomasjpfan
Copy link
Member Author

I recently opened #20919 that enabled TfidfTransformer to run with common tests, which made this PR fail.

My fix is in 560c0d0. Since TfidfTransformer is a "one-to-one" transformer, I added a mixin that adds get_feature_names_out for one-to-one transformers. The commit also adds get_features_names_out to transformers that also have this "one-to-one" correspondence.

@ogrisel ogrisel merged commit 4abd9f2 into scikit-learn:main Sep 7, 2021
@ogrisel
Copy link
Member

ogrisel commented Sep 7, 2021

I merged then! Thanks very much @thomasjpfan!

@adrinjalali this will need a backport to 1.0.X.

@ogrisel ogrisel added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Sep 7, 2021
adrinjalali pushed a commit that referenced this pull request Sep 7, 2021
…t_feature_names (#18444)

Co-authored-by: Andreas Mueller <andreas.mueller@columbia.edu>
Co-authored-by: Andreas Mueller <andreasmuellerml@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
ogrisel added a commit to scikit-learn-inria-fondation/follow-up that referenced this pull request Sep 7, 2021
## August 31th, 2021

### Gael

* TODO: Jeremy's renewal, Chiara's replacement, Mathis's consulting gig

### Olivier

- input feature names: main PR [#18010](scikit-learn/scikit-learn#18010) that links into sub PRs
  - remaining (need review): [#20853](scikit-learn/scikit-learn#20853) (found a bug in `OvOClassifier.n_features_in_`)
- reviewing `get_feature_names_out`: [#18444](scikit-learn/scikit-learn#18444)
- next: give feedback to Chiara on ARM wheel building [#20711](scikit-learn/scikit-learn#20711) (needed for the release)
- next: assist Adrin for the release process
- next: investigate regression in loky that blocks the cloudpickle release [#432](cloudpipe/cloudpickle#432)
- next: come back to intel to write a technical roadmap for a possible collaboration

### Julien

 - Was on holidays
 - Planned week @ Nexedi, Lille, from September 13th to 17th
 - Reviewed PRs
     - [`#20567`](scikit-learn/scikit-learn#20567) Common Private Loss module
     - [`#18310`](scikit-learn/scikit-learn#18310) ENH Add option to centered ICE plots (cICE)
     - Others PRs prior to holidays
 - [`#20254`](scikit-learn/scikit-learn#20254)
     - Adapted benchmarks on `pdist_aggregation` to test #20254 against sklearnex
     - Adapting PR for `fast_euclidean` and `fast_sqeuclidean` on user-facing APIs
     - Next: comparing against scipy's 
     - Next: Having feedback on [#20254](scikit-learn/scikit-learn#20254) would also help
- Next: I need to block time to study Cython code.

### Mathis
- `sklearn_benchmarks`
  - Adapting benchmark script to run on Margaret
  - Fix issue with profiling files too big to be deployed on Github Pages
  - Ensure deterministic benchmark results
  - Working on declarative pipeline specification
  - Next: run long HPO benchmarks on Margaret

### Arturo

- Finished MOOC!
- Finished filling [Loïc's notes](https://notes.inria.fr/rgSzYtubR6uSOQIfY9Fpvw#) to find questions with score under 60% (Issue [#432](INRIA/scikit-learn-mooc#432))
    - started addressing easy-to-fix questions, resulting in gitlab MRs [#21](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/21) and [#22](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/22)
    - currently working on expanding the notes up to 70%
- Continued cross-linking forum posts with issues in GitHub, resulting in [#444](INRIA/scikit-learn-mooc#444), [#445](INRIA/scikit-learn-mooc#445), [#446](INRIA/scikit-learn-mooc#446), [#447](INRIA/scikit-learn-mooc#447) and [#448](INRIA/scikit-learn-mooc#448)

### Jérémie
- back from holidays, catching up
- Mathis' benchmarks
- trying to find what's going on with ASV benchmarks
  (asv should display the versions of all build and runtime depndencies for each run)

### Guillaume

- back from holidays
- Next:
    - release with Adrin
    - check the PR and issue trackers

### TODO / Next

- Expand Loïc’s notes up to 70% (Arturo)
- Create presentation to discuss my experience doing the MOOC (Arturo)
- Help with the scikit-learn release (Olivier, Guillaume)
- HR: Jeremy's renewal, Chiara's replacement (Gael)
- Mathis's consulting gig (Olivier, Gael, Mathis)
thomasjpfan added a commit to thomasjpfan/scikit-learn that referenced this pull request Sep 7, 2021
…t_feature_names (scikit-learn#18444)

Co-authored-by: Andreas Mueller <andreas.mueller@columbia.edu>
Co-authored-by: Andreas Mueller <andreasmuellerml@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
…t_feature_names (scikit-learn#18444)

Co-authored-by: Andreas Mueller <andreas.mueller@columbia.edu>
Co-authored-by: Andreas Mueller <andreasmuellerml@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
gregcaporaso added a commit to qiime2/q2-sample-classifier that referenced this pull request Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transformative get_feature_names for various transformers
7 participants