ENH Adds feature_names_out to impute module #21078

thomasjpfan · 2021-09-17T19:29:24Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Adds feature_names_out to the impute module.

Any other comments?

There is an edge case I am concerned about:

from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np

marker = np.nan
X = np.array(
    [
        [marker, 2],
        [2, marker],
        [6, 3],
        [1, 2],
    ]
)
X_df = pd.DataFrame(X, columns=["a", "indicator_a"])

imputer = SimpleImputer(add_indicator=True)
imputer.fit_transform(X_df)
imputer.get_feature_names_out()
# array(['a', 'indicator_a', 'indicator_a', 'indicator_indicator_a'], dtype=object)

For this edge case, could we add sklearn to the prefix, resulting in sklearn_indicator_a?
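The collision above can be reproduced without scikit-learn. The following is only a minimal sketch of the naming logic under discussion: `build_output_names` and `duplicated` are hypothetical helpers written for illustration, not functions from the library, and the `sklearn_indicator_` prefix is just the proposal from this comment.

```python
from collections import Counter


def build_output_names(input_names, missing_idx, prefix):
    """Imputed columns keep their names; indicator columns get prefix + name."""
    indicator_names = [f"{prefix}{input_names[i]}" for i in missing_idx]
    return list(input_names) + indicator_names


def duplicated(names):
    """Return the sorted names that occur more than once."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


input_names = ["a", "indicator_a"]
missing_idx = [0, 1]  # both columns contain a missing value in the example

plain = build_output_names(input_names, missing_idx, "indicator_")
namespaced = build_output_names(input_names, missing_idx, "sklearn_indicator_")

print(plain, duplicated(plain))
print(namespaced, duplicated(namespaced))
```

With the plain prefix, `indicator_a` appears twice; with the longer prefix the four names are distinct for this input, although any fixed prefix can still collide with a column that happens to use it.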

lorentzenchr

@thomasjpfan Thanks for your dedication to feature names!

lorentzenchr · 2021-09-19T13:53:06Z

sklearn/impute/_base.py

+            Transformed feature names.
+        """
+        input_features = _check_feature_names_in(self, input_features)
+        return input_features[self.features_]


What is our rule for when to prefix and when not? The indicator output column of SimpleImputer is prefixed with "indicator_".

I do not think we have a rule. I went with SimpleImputer prefixing "indicator_" because it needed to distinguish between the imputed features and the indicator.

Looking at this again, I think we can have MissingIndicator add the prefix. This way SimpleImputer only needs to combine them.

I also think it is fine to preserve the original names for the imputed features.

sklearn/impute/tests/test_common.py

ogrisel

Thanks for the PR! Apart from the previous review comments, I would like to use a more explicit prefix:

sklearn/impute/_base.py

ogrisel

LGTM!

.github/workflows/assign.yml

lorentzenchr

LGTM
Remark: The convention introduced here is to prefix with prefix = self.__class__.__name__.lower().
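A minimal sketch of that convention, assuming an underscore separator as in the missingindicator_myfeature example from this thread. `ClassNamePrefixMixin` and the `MissingIndicator` class below are hypothetical stand-ins for illustration; they are not the scikit-learn implementations.

```python
class ClassNamePrefixMixin:
    """Hypothetical mixin: prefix each input name with the lowercased class name."""

    def get_feature_names_out(self, input_features):
        prefix = self.__class__.__name__.lower()
        return [f"{prefix}_{name}" for name in input_features]


class MissingIndicator(ClassNamePrefixMixin):
    """Stand-in for illustration only; not the scikit-learn estimator."""


indicator_names = MissingIndicator().get_feature_names_out(["a", "indicator_a"])
print(indicator_names)
```

Because the prefix is derived from the class, a subclass with a different name would get a different prefix without any extra code.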

lorentzenchr · 2021-10-21T11:26:36Z

Before merging, should we wait for #21334 and then also use _ClassNamePrefixFeaturesOutMixin or _generate_get_feature_names_out?

thomasjpfan · 2021-10-21T11:42:09Z

Before merging, should we wait for #21334 and then also use _ClassNamePrefixFeaturesOutMixin or _generate_get_feature_names_out?

I do not think we need to wait. This PR is slightly different. MissingIndicator prefixes the input feature names, e.g. missingindicator_myfeature, while #21334 and _generate_get_feature_names_out generate entirely new names, e.g. pca0, pca1, etc.
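The two naming schemes can be contrasted in a few lines. This is a sketch only: `prefix_input_names` and `generate_new_names` are hypothetical helpers that mimic the output formats quoted above, not the private scikit-learn utilities named in the comment.

```python
def prefix_input_names(estimator_name, input_features):
    """Keep the input names and prepend the lowercased estimator name."""
    prefix = estimator_name.lower()
    return [f"{prefix}_{name}" for name in input_features]


def generate_new_names(estimator_name, n_features_out):
    """Ignore the input names and number the outputs instead."""
    prefix = estimator_name.lower()
    return [f"{prefix}{i}" for i in range(n_features_out)]


prefixed = prefix_input_names("MissingIndicator", ["myfeature"])
generated = generate_new_names("PCA", 2)

print(prefixed)
print(generated)
```

The first scheme needs the input feature names; the second needs only the number of output features, which is why the two cases were handled separately.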

timotk · 2021-10-28T14:12:16Z

FYI: This PR has been merged into the main branch, but is NOT included in the 1.0.1 release. I thought it was included because it was linked in the 1.0.1 release, but the specific commit is actually dropped and not picked:

drop 8f621ad ENH Adds feature_names_out to impute module (#21078)

Took me quite a while to figure that out.

ademyanchuk · 2022-01-20T11:27:06Z

FYI: This PR has been merged into the main branch, but is NOT included in the 1.0.1 release

It is still not included in 1.0.2. Should it be this way?

timotk · 2022-01-20T14:30:15Z

@ademyanchuk Yes, it should be this way. I believe scikit-learn follows semantic versioning:

The first number increases with major new functionality/changes.
The second number increases with minor new features/changes.
The third number is reserved for fixes/patches, so no new features.

ademyanchuk · 2022-01-20T14:44:31Z

Thank you, @timotk
Makes sense, I have already switched to nightly for now 😊

lorentzenchr · 2022-01-20T15:07:46Z

For information: scikit-learn does not follow SemVer. But new features such as this one are usually rolled out in a minor version like 1.1.0 or 1.2.0. Bugfix versions like 1.0.1 are for bugfixes only.
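Since the method may or may not exist depending on the installed release, code that has to run on several versions can detect it instead of comparing version numbers. The sketch below assumes nothing about scikit-learn itself: `OldImputer` and `NewImputer` are hypothetical stand-ins, and `feature_names_or_none` is a helper invented for illustration. With the real library, a fitted imputer would be passed instead.

```python
def feature_names_or_none(estimator, input_features=None):
    """Call get_feature_names_out if the estimator has it, else return None."""
    getter = getattr(estimator, "get_feature_names_out", None)
    if getter is None:
        return None
    return getter(input_features)


class OldImputer:
    """Stand-in for a release without the method."""


class NewImputer:
    """Stand-in for a release with the method; it simply echoes the input names."""

    def get_feature_names_out(self, input_features=None):
        return list(input_features)


old_names = feature_names_or_none(OldImputer(), ["a", "b"])
new_names = feature_names_or_none(NewImputer(), ["a", "b"])

print(old_names)
print(new_names)
```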

thomasjpfan added 2 commits September 17, 2021 15:14

ENH Adds feature_names_out to impute module

b993b5e

DOC Adds whats new

3f95b5c

github-actions bot added the module:impute label Sep 17, 2021

BUG Remove use for dtype in concatenate

6ebb43f

lorentzenchr reviewed Sep 19, 2021

ogrisel reviewed Sep 23, 2021

thomasjpfan mentioned this pull request Sep 30, 2021

Implement get_feature_names_out for SimpleImputer #21200

Closed

thomasjpfan added 23 commits September 30, 2021 09:50

Merge remote-tracking branch 'upstream/main' into feature_names_out_impute

0e5b23a

MAINT Allows for multiple whitespace

91ae2bb

Merge remote-tracking branch 'upstream/main' into feature_names_out_impute

27bcfe3

Merge remote-tracking branch 'upstream/main' into feature_names_out_impute

7915067

Merge remote-tracking branch 'upstream/main' into feature_names_out_impute

f3a3ff8

TST Adds non-missing feature in the middle

544770b

ENH Use missingindicator as prefix for indicator

52c2202

TST Remove covergence warning

7170589

Merge branch 'better_white_space_for_take'

a87e4ca

MAINT Whitespace

3b9c3c9

Fix yaml

ce801d0

TST Testing

a5dfc47

TST Testing

ee5f2f7

TST Testing

6719ad6

TST Testing

d0f2474

TST Testing

5ddd36d

TST Testing

31f90ed

TST Testing

218db3f

TST Testing

57203a6

TST Testing

e8c15e4

TST Testing

3a2f25c

TST Testing

ef134d1

TST Testing

71e26ff

thomasjpfan added 6 commits October 5, 2021 13:29

TST Testing

1a75e5c

TST Testing

df4b592

TST Testing

5a422e6

TST Testing

22d871b

TST Testing

39c8e00

TST Testing

df6fd9a

ogrisel approved these changes Oct 13, 2021

thomasjpfan and others added 2 commits October 13, 2021 10:57

Merge remote-tracking branch 'origin/main' into feature_names_out_impute

f12da8c

Merge branch 'main' into feature_names_out_impute

8f34d4d

ogrisel reviewed Oct 13, 2021

REV Remove unrelated diff

6d86dc5

ogrisel mentioned this pull request Oct 13, 2021

Implement get_feature_names_out for all estimators #21308

Closed

14 tasks

lorentzenchr approved these changes Oct 21, 2021

lorentzenchr merged commit 8f621ad into scikit-learn:main Oct 21, 2021

glemaitre mentioned this pull request Oct 23, 2021

Release 1.0.1 #21404

Merged

10 tasks

samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021

ENH Adds feature_names_out to impute module (scikit-learn#21078)

c3ab973

eddiebergman mentioned this pull request Nov 15, 2022

Update scikit learn 1.2 automl/auto-sklearn#1611

Closed

54 tasks
