
how to know which feature is selected by FeatureUnion? #6122

Closed
genliu777 opened this issue Jan 6, 2016 · 6 comments

@genliu777

I ran the code from
http://scikit-learn.org/stable/auto_examples/feature_stacker.html#example-feature-stacker-py
which contains the following:

# Build estimator from PCA and Univariate selection:
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])

# Use combined features to transform dataset:
X_features = combined_features.fit(X, y).transform(X)

With the data put into the FeatureUnion, I want to know which features were selected. The FeatureUnion docs list a function get_feature_names() which gets all the names from all the transformers, but calling it raises an error:

AttributeError: Transformer pca does not provide get_feature_names.

Actually, I know that PCA has no such function. But then why does FeatureUnion provide it?!

@jnothman
Member

jnothman commented Jan 6, 2016

I agree that there should be a way to see which features belong to which components, and I proposed this long ago, but I don't think it's currently possible.
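In the meantime, the individual transformers inside the union can be inspected directly. A minimal sketch, reusing the pca/selection setup from the linked example: the selector's get_support() reports which input columns it kept, while PCA simply emits n_components new features.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)

pca = PCA(n_components=2)
selection = SelectKBest(k=1)
combined = FeatureUnion([("pca", pca), ("univ_select", selection)])
combined.fit(X, y)

# After fitting, transformer_list holds the fitted transformers,
# so each one can be queried with its own introspection API.
fitted_select = combined.transformer_list[1][1]
print(fitted_select.get_support(indices=True))  # indices of the input columns kept
print(combined.transform(X).shape)              # 2 PCA components + 1 selected column
```

This doesn't give a single unified name list from the union itself, but it does answer "which feature was selected" for any transformer that exposes a get_support-style API.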

@genliu777
Author

Why is it currently impossible?! FeatureUnion provides the function get_feature_names(), so it should also work!

Since all models in sklearn have fit and transform, every model that can be put into a FeatureUnion and work there should also provide the attribute that the source code of get_feature_names() checks for (if not hasattr(trans, 'get_feature_names'):). Otherwise, FeatureUnion should not provide get_feature_names() at all!!

@joshhamanngaia

joshhamanngaia commented Jul 7, 2017

This may not address your particular issue with PCA directly, but if I read your question correctly, you are wondering, in general, how to percolate attributes through a custom pipeline.

Late to the party, but you can access elements within the pipeline, regardless of how complicated it is, by walking through the pipeline structure, finding the appropriate step (even within a FeatureUnion), and then using the appropriate attribute. Here is an example I just ran:

pipeline = Pipeline([
    ('union', FeatureUnion([
        ('categoric', Pipeline([
            ('f_cat', feature_type_split(type='categoric')),  # returns categoric features in an array for vect
            ('vect', vect),
        ])),
        ('numeric', Pipeline([
            ('f_num', feature_type_split(type='numeric')),
        ])),
    ])),
    ('select', ff),
    ('tree_clf', clf),
])

Showing the pipeline object itself via print(pipeline) gives me a point of reference:

Pipeline(steps=[('union', FeatureUnion(n_jobs=1, transformer_list=[('categoric', Pipeline(steps=[('f_cat', feature_type_split(type='categoric')), ('vect', DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', sort=True, sparse=True))])), ('numeric', Pipeline(steps=[('f_num', feature_type...it=2, min_weight_fraction_leaf=0.0, presort=False, random_state=None, splitter='best'))])

So I walk through to the union step via:

pipeline.named_steps['union']

Then walk to the next level, the transformer_list entry for the categoric pipeline, via:

pipeline.named_steps['union'].transformer_list[0]

Then take the transformer from that (name, transformer) tuple, which is the categoric pipeline itself:

pipeline.named_steps['union'].transformer_list[0][1]

The above outputs a typical pipeline structure, where we can now utilize named_steps:

print(pipeline.named_steps['union'].transformer_list[0][1].named_steps['vect'])

And therefore access the attribute we need via:

print(pipeline.named_steps['union'].transformer_list[0][1].named_steps['vect'].get_feature_names())

TL;DR: Walk through the pipeline structure piece by piece, then access the attribute as you would normally for that transformer/estimator.

@jnothman
Member

jnothman commented Jul 8, 2017

Please try eli5's transform_feature_names which can work in cases where scikit-learn's get_feature_names doesn't.

@markatango

@joshhamanngaia Awesome. Thank you for not just showing what but also showing how and why.

@thomasjpfan
Member

On main, it is now possible to call get_feature_names_out with a FeatureUnion. In the context of the original issue, one can call get_feature_names_out to get the feature names:

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

pca = PCA(n_components=2)
selection = SelectKBest(k=1)
combined_features = FeatureUnion([("pca", pca), ("univ_select", selection)])

svm = SVC(kernel="linear")
pipeline = Pipeline([("features", combined_features), ("svm", svm)])
pipeline.fit(X, y)

# slice the pipeline to include all steps excluding the last one
pipeline[:-1].get_feature_names_out()
# array(['pca__pca0', 'pca__pca1', 'univ_select__petal length (cm)'], dtype=object)

In 1.1, all transformers will define get_feature_names_out, allowing this feature to work everywhere.
