PerColumnImputer can raise woodwork.exceptions.TypeConversionError if float values are imputed into Int64 data #4150

Open
tamargrey opened this issue Apr 14, 2023 · 0 comments

Comments

@tamargrey
Contributor

The PerColumnImputer can impute floating point values into integer data when a column uses the mean or median impute strategy. When this happens, we cannot simply reinitialize the original data's woodwork schema via X_t.ww.init(schema=original_schema.get_subset_schema(X_t.columns)), as we currently do: that call tries to apply Int64 to floating point data, which raises woodwork.exceptions.TypeConversionError.

We'll need to use _get_new_logical_types_for_imputed_data, as the other imputers do, to pick the correct logical types for the imputed data. Because the per-column imputer can use a different strategy for each column, we'll need to either change _get_new_logical_types_for_imputed_data to accept per-column strategies or call it individually for every column.
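To make the "call it individually for every column" option concrete, here is a rough, library-free sketch of per-column logical type selection. Everything in it is an assumption for illustration: logical_types_after_imputation, the two constant sets, the "category col" column, and the rule that integer types become "Double" under mean or median are hypothetical stand-ins, not evalml's or woodwork's actual API or the real behavior of _get_new_logical_types_for_imputed_data.

```python
# Hypothetical stand-in for per-column logical type selection. The names and
# the mapping below are illustrative only, not evalml's or woodwork's API.
FLOAT_PRODUCING_STRATEGIES = {"mean", "median"}
INTEGER_LOGICAL_TYPES = {"Integer", "IntegerNullable"}


def logical_types_after_imputation(original_types, impute_strategies):
    """Return the logical type each column should have after imputation.

    original_types maps column name -> logical type name (as a string).
    impute_strategies uses the same shape as PerColumnImputer's argument:
    column name -> {"impute_strategy": <strategy>}.
    """
    new_types = {}
    for column, logical_type in original_types.items():
        # Look up the strategy for this column only; columns without an
        # entry keep their original logical type.
        strategy = impute_strategies.get(column, {}).get("impute_strategy")
        produces_floats = strategy in FLOAT_PRODUCING_STRATEGIES
        if produces_floats and logical_type in INTEGER_LOGICAL_TYPES:
            # A mean or median of integers may be fractional, so the
            # integer type can no longer hold the imputed values.
            new_types[column] = "Double"
        else:
            new_types[column] = logical_type
    return new_types


original_types = {"int with nan": "IntegerNullable", "category col": "Categorical"}
strategies = {
    "int with nan": {"impute_strategy": "mean"},
    "category col": {"impute_strategy": "most_frequent"},
}
print(logical_types_after_imputation(original_types, strategies))
```

The point of the sketch is only that the decision depends on each column's own strategy, so a single imputer-wide strategy argument is not enough; the actual type mapping would come from the existing helper.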

Below is a test that produces the type conversion error:

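from evalml.pipelines.components import PerColumnImputer


def test_per_column_imputer_float_imputed_into_int(imputer_test_data):
    # Select a single integer column that contains missing values.
    X = imputer_test_data.ww[["int with nan"]]
    strategies = {
        "int with nan": {"impute_strategy": "mean"},
    }
    transformer = PerColumnImputer(impute_strategies=strategies)
    transformer.fit(X)
    # Currently raises woodwork.exceptions.TypeConversionError, since the
    # imputed mean is a float but the original schema expects Int64.
    transformer.transform(X)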