more-transformers

My own list of "extra" transformers for use in scikit-learn pipelines.

Intro

When building scikit-learn pipelines I often feel I have to do a lot of my data preparation work outside the pipeline. Moreover, many scikit-learn transformers could be more beginner friendly if they returned pandas DataFrames instead of NumPy arrays.

With that in mind, this library includes a few additional transformers that are mostly thin wrappers around scikit-learn.

For example:

from more_transformers.preprocessing import StandardScaler

behaves identically to sklearn.preprocessing.StandardScaler but returns a pandas DataFrame with the same column names and index values as the original.
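To illustrate the idea, here is a minimal sketch of how such a wrapper could be written. It is not the library's actual source: the class name `DataFrameStandardScaler` and the sample data are made up for this example, and it assumes the input is always a pandas DataFrame.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler as SklearnStandardScaler


class DataFrameStandardScaler(SklearnStandardScaler):
    """Hypothetical wrapper: scales like sklearn, but returns a DataFrame."""

    def transform(self, X, copy=None):
        # Let scikit-learn do the actual scaling (returns a NumPy array).
        values = super().transform(X, copy=copy)
        # Re-attach the column names and index of the input DataFrame.
        return pd.DataFrame(values, columns=X.columns, index=X.index)


# Made-up sample data with a non-default index.
df = pd.DataFrame(
    {"height": [150.0, 160.0, 170.0], "weight": [50.0, 60.0, 70.0]},
    index=["a", "b", "c"],
)

# fit_transform is inherited and calls the overridden transform.
scaled = DataFrameStandardScaler().fit_transform(df)
print(scaled)
```

Because the sketch subclasses the scikit-learn class, fitting and all parameters are inherited unchanged; only the output container differs.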

As another example,

from more_transformers.decomposition import PCA

behaves the same as sklearn.decomposition.PCA but retains the input index and names the output columns pca_0, pca_1, ..., pca_n.

I've also added a few helpers of my own, mostly under more_transformers.common. For example,

from more_transformers.preprocessing import GetDummies

is a transformer version of pd.get_dummies. One advantage is that the test data is transformed to have the same columns that pd.get_dummies produces on the training data.
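The column-alignment behaviour can be sketched as follows. This is an assumption about how it might work, not the library's implementation: `GetDummiesSketch` is a hypothetical name, and the choice to fill missing columns with 0 and drop unseen categories is mine.

```python
import pandas as pd


class GetDummiesSketch:
    """Hypothetical stand-in for a get_dummies transformer."""

    def fit(self, X, y=None):
        # Remember the dummy columns produced by the training data.
        self.columns_ = pd.get_dummies(X, dtype=int).columns
        return self

    def transform(self, X):
        dummies = pd.get_dummies(X, dtype=int)
        # Align to the training columns: categories missing from X become
        # all-zero columns, categories unseen in training are dropped.
        return dummies.reindex(columns=self.columns_, fill_value=0)


# Made-up data: "purple" never appears in the training set.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["red", "purple"]})

encoder = GetDummiesSketch().fit(train)
encoded_test = encoder.transform(test)
print(encoded_test)
```

Calling pd.get_dummies directly on the test data would instead yield a color_purple column and no color_blue or color_green, which would not match what a model fitted on the training data expects.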

Also note

from more_transformers.common import ColumnSelector

allows for very flexible selection of columns in your pipeline. For example

ColumnSelector() # Selects all columns
ColumnSelector(['Age','Weight','Height']) # Selects these columns
ColumnSelector('number') # Selects all integer or float columns
ColumnSelector(lambda x: str(x).startswith('x_'))  # Selects columns starting with 'x_'
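One way the four selection modes could be dispatched is sketched below. The function `select_columns` is hypothetical and only mirrors the behaviour described above; it assumes that a string selector is passed to pandas' `select_dtypes`, which may not be how the library does it.

```python
import pandas as pd


def select_columns(df, selector=None):
    """Hypothetical helper mirroring the four selection modes listed above."""
    if selector is None:
        # No selector: keep every column.
        return df
    if callable(selector):
        # Predicate: keep columns whose name satisfies the function.
        return df[[col for col in df.columns if selector(col)]]
    if isinstance(selector, str):
        # String: treat it as a dtype, e.g. 'number' for ints and floats.
        return df.select_dtypes(include=selector)
    # Otherwise: treat it as an explicit list of column names.
    return df[list(selector)]


# Made-up sample data.
people = pd.DataFrame(
    {"Age": [30, 41], "x_1": [0.5, 1.5], "Name": ["Ann", "Bob"]}
)

all_cols = select_columns(people)
named = select_columns(people, ["Age", "Name"])
numeric = select_columns(people, "number")
prefixed = select_columns(people, lambda x: str(x).startswith("x_"))
print(list(numeric.columns), list(prefixed.columns))
```

The callable is checked before the string case so that any predicate on column names works, including ones that ignore the column's dtype.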
