
LabelBinarizer and LabelEncoder fit and transform signatures not compatible with Pipeline #3112

Closed
hxu opened this issue Apr 26, 2014 · 6 comments


hxu commented Apr 26, 2014

I get this error when I try to use LabelBinarizer and LabelEncoder in a Pipeline:

sklearn/pipeline.pyc in fit_transform(self, X, y, **fit_params)
    141         Xt, fit_params = self._pre_transform(X, y, **fit_params)
    142         if hasattr(self.steps[-1][-1], 'fit_transform'):
--> 143             return self.steps[-1][-1].fit_transform(Xt, y, **fit_params)
    144         else:
    145             return self.steps[-1][-1].fit(Xt, y, **fit_params).transform(Xt)

TypeError: fit_transform() takes exactly 2 arguments (3 given)
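
A minimal way to reproduce this (an illustrative snippet, not taken from the traceback above):

    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import LabelBinarizer

    pipe = Pipeline([('binarize', LabelBinarizer())])
    # Pipeline passes both X and y to the last step's fit_transform,
    # but LabelBinarizer.fit_transform accepts only a single argument.
    pipe.fit_transform(['a', 'b', 'c'], [0, 1, 0])  # raises TypeError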

It seems like this is because these classes' fit and transform signatures differ from most other estimators' and accept only a single argument.

I think this is a pretty easy fix (just change the signature to def fit(self, X, y=None)) that I'd be happy to send a pull request for, but I wanted to check whether there's a reason the signatures are the way they are that I haven't thought of.
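
For illustration, a wrapper with the proposed signature would look something like this (a hypothetical sketch, not an actual patch; PipelineLabelEncoder is an invented name):

    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.preprocessing import LabelEncoder

    class PipelineLabelEncoder(BaseEstimator, TransformerMixin):
        def fit(self, X, y=None):
            # y is accepted (so Pipeline can pass it) but ignored
            self.encoder_ = LabelEncoder().fit(X)
            return self
        def transform(self, X):
            return self.encoder_.transform(X)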

@jnothman (Member)

I think you're right to fix that.


@jnothman (Member)

In #3113 we have decided this is not to be fixed because label encoding doesn't really belong in a Pipeline.


tutuca commented Jun 15, 2016

@jnothman, just so I know: what should I do instead if I need to vectorize a categorical feature in a pipeline?

@jnothman (Member)

You might be best off writing your own Pipeline-like code (perhaps inheriting from the existing Pipeline) to handle your specific case.


Kallin commented Jul 10, 2017

Instead of using LabelBinarizer in a pipeline, I just implemented my own transformer:

    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.preprocessing import LabelBinarizer

    class CustomBinarizer(BaseEstimator, TransformerMixin):
        def fit(self, X, y=None, **fit_params):
            # nothing to learn up front; the binarizer is fit in transform
            return self
        def transform(self, X):
            # fit a fresh LabelBinarizer on X and one-hot encode it
            return LabelBinarizer().fit(X).transform(X)

Seems to do the trick!
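
For example, it can then be dropped into a Pipeline (a usage sketch; the step name is illustrative):

    from sklearn.pipeline import Pipeline

    pipe = Pipeline([('onehot', CustomBinarizer())])
    X = ['cat', 'dog', 'fish', 'cat']
    pipe.fit_transform(X)  # one column per class seen in X
    # note: because transform() refits on whatever data it is given,
    # the columns depend on the classes present at transform time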

Edit: this is a better solution:
https://github.com/scikit-learn/scikit-learn/pull/7375/files#diff-1e175ddb0d84aad0a578d34553f6f9c6


jnothman commented Jan 29, 2018

I see that there have been a lot of negative reactions on this page. I think there has been a long-standing misunderstanding of the purpose of LabelBinarizer and LabelEncoder: these are for targets, not features. Although admittedly they were designed (and poorly named) before my time.
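
For the supported use on targets, for example:

    from sklearn.preprocessing import LabelEncoder

    y = ['spam', 'ham', 'spam', 'eggs']
    le = LabelEncoder()
    le.fit_transform(y)  # array([2, 1, 2, 0])
    le.classes_          # array(['eggs', 'ham', 'spam'], dtype='<U4')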

Although I think users could have been using CountVectorizer (or DictVectorizer with dataframe.to_dict(orient='records') if you're coming from a dataframe) for this purpose for a long time, we have recently merged a CategoricalEncoder (#9151) into master; it may be rolled into OneHotEncoder, together with a new OrdinalEncoder, before release (#10521).
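
For instance, the DictVectorizer route from a dataframe looks roughly like this (illustrative data):

    import pandas as pd
    from sklearn.feature_extraction import DictVectorizer

    df = pd.DataFrame({'color': ['red', 'blue', 'red'],
                       'size': [1.0, 2.0, 3.0]})
    vec = DictVectorizer(sparse=False)
    X = vec.fit_transform(df.to_dict(orient='records'))
    # string values are one-hot encoded, numeric values pass through:
    # the resulting columns are ['color=blue', 'color=red', 'size']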

I hope this satisfies the needs of a clearly disgruntled populace.

I must say that as someone who has been volunteering enormous quantities of free time to the development of this project for nearly five years now (and has recently been employed to work on it too), seeing this magnitude of negative reactions, rather than constructive contributions to the library, is quite saddening. Admittedly, my response above, that you should write a new Pipeline-like thing rather than a new transformer for categorical inputs, was a misunderstanding on my part (and could have been corrected by others), which I hope is understandable given the enormous workload of maintaining this project.
