FunctionTransformer need feature_names_out even if func returns DataFrame #28780
There are at least two things:
@lesteve you're right. If I change my code snippet as follows:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

my_transformer = FunctionTransformer(
    # func: build squared and cubed versions of every input column
    lambda X: pd.DataFrame(
        {
            f"{str(col)}^{power}": X[col] ** power
            for col in X
            for power in range(2, 4)
        }
    ),
    # feature_names_out: must yield the same names, in the same order, as func
    feature_names_out=(
        lambda transformer, input_features: [
            f"{str(feature)}^{power}"
            for feature in input_features
            for power in range(2, 4)
        ]
    ),
)
my_transformer.set_output(transform="pandas")

sample_size = 10
X = pd.DataFrame({
    "feature 1": [1, 2, 3, 4, 5],
    "feature 2": [3, 4, 5, 6, 7],
})
my_transformer.fit(X)
my_transformer.transform(X)
my_transformer.get_feature_names_out()
```

So the output columns of … But in my opinion it would be more intuitive if … If that's the way it's intended, you can close this issue.
This seems related to #28241 and #27801. cc @glemaitre since he has this in his brain cache more than me.

My naive (and apparently wrong) expectation would have been that if your …

@fedorkobak small tip: you can use syntax highlighting in markdown to make code snippets more readable, see https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks#syntax-highlighting for more details. I have edited your comment accordingly.
The error message might be missing some information: if you use …
@glemaitre Do you mean something like this?

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

my_transformer = FunctionTransformer(
    # func already returns a DataFrame with its own column names
    lambda X: pd.DataFrame(
        {
            f"{str(col)}^{power}": X[col] ** power
            for col in X
            for power in range(2, 4)
        }
    )
    # no feature_names_out
)

X = pd.DataFrame({
    "feature 1": [1, 2, 3, 4, 5],
    "feature 2": [3, 4, 5, 6, 7],
})
my_transformer.set_output(transform="pandas")
my_transformer.fit_transform(X)

# raises: AttributeError: This 'FunctionTransformer' has no attribute 'get_feature_names_out'
my_transformer.get_feature_names_out()
```

I called …
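If a downstream step only needs the output names, a defensive pattern is to check whether `get_feature_names_out` is exposed before calling it, and otherwise read the names off the returned DataFrame. This is a sketch, not a scikit-learn API: `add_powers` and `output_feature_names` are hypothetical helpers, and `set_output` is left out here because `func` already returns a DataFrame.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def add_powers(X):
    # func returns a DataFrame that already carries its own column names
    return pd.DataFrame({f"{col}^{p}": X[col] ** p for col in X for p in range(2, 4)})


def output_feature_names(transformer, X_out):
    """Hypothetical helper: prefer get_feature_names_out() when the transformer
    exposes it, otherwise fall back to the columns of the transformed DataFrame."""
    if hasattr(transformer, "get_feature_names_out"):
        return list(transformer.get_feature_names_out())
    return list(X_out.columns)


transformer = FunctionTransformer(add_powers)  # no feature_names_out
X = pd.DataFrame({"feature 1": [1, 2, 3, 4, 5], "feature 2": [3, 4, 5, 6, 7]})
out = transformer.fit_transform(X)

# Without feature_names_out the method is not exposed, so hasattr is False
print(hasattr(transformer, "get_feature_names_out"))
print(output_feature_names(transformer, out))
```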
I was expecting something similar, but it doesn't seem to work, as I hinted above and as the snippet in #28780 (comment) shows. You need to specify …
The title was changed from "FunctionTransformer ignores set_output(transform='pandas') which raises ValueError when setting columns for output" to "FunctionTransformer need feature_names_out even if func returns DataFrame".
Describe the bug

Trying to call `transform` on a `FunctionTransformer` for which `feature_names_out` is configured raises an error that advises using `set_output(transform='pandas')`. But doing so doesn't change anything.

Steps/Code to Reproduce

Expected Results

A `pandas.DataFrame` like the following:

Actual Results

Versions