Improve `FunctionTransformer` diagram representation #29032

timvink · 2024-05-16T18:07:53Z

Describe the workflow you want to enable

Currently, using multiple FunctionTransformers in a pipeline leads to an uninformative view:

import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

df = pd.DataFrame([[1,2,3], [4,5,6]], columns=['one','two','three']) # sample data
def a(df): return df+1 # 1st transformer
def b(df): return df*10 # 2nd transformer

make_pipeline(FunctionTransformer(a), FunctionTransformer(b))

I would like to see the name of the function being used in the visual blocks.

Describe your proposed solution

I would like to see something like this:

(or perhaps Function(<name of function>) or <name of function>() or FunctionTransformer_<name of function>)

A sample implementation might look like this:

from sklearn.preprocessing import FunctionTransformer
from sklearn.utils._estimator_html_repr import _VisualBlock
from functools import partial

class PrettyFunctionTransformer(FunctionTransformer):
    def _sk_visual_block_(self):
        return _VisualBlock(
            "single",
            self,
            names=self.func.func.__name__ if isinstance(self.func, partial) else self.func.__name__,
            name_details=str(self),
        )

Describe alternatives you've considered, if relevant

No response

Additional context

No response


glemaitre · 2024-05-16T19:59:10Z

Some of your suggestions are available when clicking on the transformers:

I don't know whether we should treat FunctionTransformer as a special case, since it is a really generic transformer, and extract this information to display it at the first level.

timvink · 2024-05-17T07:59:29Z

The use case I envision is defining a scikit-learn pipeline for feature engineering. Feature engineering should be done on the training data, but also at inference time. If you add it to the model pipeline, you get 1) easier deployment (no preprocessing) and 2) safer pipelines, as feature engineering would be applied on each split separately. If you use the memory argument to cache the feature engineering pipeline, you also don't get the downside of repeating the same computations.

These two pipelines are identical, but the visualization on the right is much clearer:

glemaitre · 2024-05-17T08:24:15Z

Convinced. We need to work out the details, but this is definitely better. I still think we should have the info that this is a FunctionTransformer in some way.

Charlie-XIAO · 2024-05-25T12:25:57Z

| Lazier way | Harder way |
| --- | --- |
| Just @timvink's implementation plus including the class name, wrapping in an inline-block and setting `white-space: pre-wrap`. Directly fits into the framework. | Maybe look a bit better? But this requires altering the structure a bit. In particular, adding a parameter `caption` to the visual blocks (default `None`) and render in the HTML. |
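
To make the "harder way" concrete, here is a minimal sketch of what an optional `caption` could look like. Everything in it is hypothetical: `VisualBlockSketch`, `render_label`, and the CSS class names are made up for illustration and are not scikit-learn's `_VisualBlock` or its HTML renderer. It only shows the idea of a field that defaults to `None` and adds markup when set.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class VisualBlockSketch:
    """Hypothetical stand-in for a diagram block (not scikit-learn's _VisualBlock)."""

    name: str
    caption: Optional[str] = None  # assumed new field; None keeps the current output


def render_label(block: VisualBlockSketch) -> str:
    """Build the label HTML, appending a caption element only when one is set."""
    html = f"<div class='label'>{escape(block.name)}</div>"
    if block.caption is not None:
        html += f"<div class='caption'>{escape(block.caption)}</div>"
    return html


plain = render_label(VisualBlockSketch("FunctionTransformer"))
captioned = render_label(VisualBlockSketch("FunctionTransformer", caption="a"))
print(plain)
print(captioned)
```

Because the default is `None`, blocks that never set a caption would render exactly as before in this sketch, which is the property that keeps the change backward compatible.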

glemaitre · 2024-05-31T19:05:40Z

I think that I prefer the harder way (unfortunately :)).

timvink · 2024-05-31T19:33:20Z

I also like 'the harder way' better.

Two further possible improvements:

- switch the titles: 'FunctionTransformer' should be the caption and the function names the titles. This way the repetition is in the small font and the transformer func name in the big
- show the partial function name. I don't think there is added value in showing that a function is a partial without showing the original function name. It's the same function but with different defaults. We can just use the .func.__name__. We use partials a lot as we create pipelines from configuration files (using hydra instantiate)
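
As a small illustration of the second point, using only the standard library: a `functools.partial` object does not get a `__name__` of its own, but the wrapped function stays reachable through its `.func` attribute. The `scale` function and the `factor` argument are made-up examples, not part of any pipeline discussed here.

```python
from functools import partial


def scale(values, factor=1):
    """Multiply every value by ``factor`` (hypothetical feature-engineering step)."""
    return [v * factor for v in values]


# Same function, different default, as when pipelines are built from config files.
times_ten = partial(scale, factor=10)

# partial objects do not automatically receive the wrapped function's __name__ ...
has_name = hasattr(times_ten, "__name__")
# ... but the original function is available via the documented .func attribute.
display_name = times_ten.func.__name__

print(has_name)
print(display_name)
print(times_ten([1, 2, 3]))
```

This covers a single level of `partial` only; other wrappers need their own handling.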

Charlie-XIAO · 2024-06-01T04:20:29Z

> I think that I prefer the harder way (unfortunately :)).

It's actually "fortunately" for me, as I also like the harder way but was afraid that people wouldn't think it's worth the complexity 🤣

> I also like 'the harder way' better.

Thanks for confirmation.

> switch the titles: 'FunctionTransformer' should be the caption and the function names the titles. This way the repetition is in the small font and the transformer func name in the big

This is what I initially did, but then I found the info icon tooltip actually shows "documentation of {name}" which in that case would be "documentation of func name" which I think is improper. I will definitely consider this if I can find an easy way to tweak the info icon tooltip text individually.

> show the partial function name. I don't think there is added value in showing that a function is a partial without showing the original function name. It's the same function but with different defaults. We can just use the .func.__name__. We use partials a lot as we create pipelines from configuration files (using hydra instantiate)

I'm hesitant about this one. I do agree that partial(...) does not provide (sufficient) useful information, but it's hard to cover all corner cases, given that partial is not the only way to construct a function from another one. E.g. np.vectorize would need func.ufunc.__name__. What about partial of partial, partial of partial of partial, vectorize of partial, etc.?

timvink · 2024-06-01T10:57:13Z

> I found the info icon tooltip actually shows "documentation of {name}" which in that case would be "documentation of func name" which I think is improper

Checking the Developer API for HTML representation, it seems we could override the _doc_link_template and _doc_link_url_param_generator methods for FunctionTransformer.

> What about partial of partial, partial of partial of partial, vectorize of partial, etc.?

We can implement a recursive function for those edge cases, like so:

sample implementation for `get_function_name`

import numpy as np
import functools

def get_function_name(func):
    """
    Retrieves the name of a function, supporting `np.vectorize` and `functools.partial`, 
    including nested variations.
    """
    # Check if the function has a `__name__` attribute directly
    if hasattr(func, '__name__'):
        return func.__name__
    
    # Check for functools.partial
    if isinstance(func, functools.partial):
        return get_function_name(func.func)
    
    # Check for np.vectorize
    if isinstance(func, np.vectorize):
        return get_function_name(func.pyfunc)
    
    # Check if the function has a `__wrapped__` attribute (for other decorators)
    if hasattr(func, '__wrapped__'):
        return get_function_name(func.__wrapped__)
    
    # If all else fails, return a placeholder name or indication
    return "<unknown_function>"

# Example Usage:
def example_function(x):
    return x

partial_func = functools.partial(example_function, x=2)
vectorized_func = np.vectorize(example_function)
partial_vectorized_func = functools.partial(vectorized_func, x=2)

print(get_function_name(example_function))           # Output: example_function
print(get_function_name(partial_func))               # Output: example_function
print(get_function_name(vectorized_func))            # Output: example_function
print(get_function_name(partial_vectorized_func))    # Output: example_function
print(get_function_name(lambda x: x))    # Output: <lambda>

That will deal with the vast majority of functions and it has a fallback.

timvink added the Needs Triage and New Feature labels May 16, 2024

glemaitre removed the Needs Triage label May 16, 2024
