Adds recommended actions for InvalidTargetDataCheck and update _make_component_list_from_actions to address this action #1989
Merged
Changes from 23 commits
47 commits
c9e2e66 init (angela97lin)
3d76716 fix tests (angela97lin)
70299b5 release notes (angela97lin)
8ac18d3 add init code for target imputer (angela97lin)
a035a41 Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
1327230 welp (angela97lin)
b53407d hmm testing (angela97lin)
542ec07 Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
8c08bc8 fix some tests (angela97lin)
97e2f48 test renaming (angela97lin)
18497fb Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
ac28999 some updates, more tests to go (angela97lin)
f9c04e8 Merge branch '1881_fill_in_actions_cont' of github.com:alteryx/evalml… (angela97lin)
d0ed8ee fix tests, add impute strategy (angela97lin)
15ec313 lint mclint (angela97lin)
364fd95 Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
fbf7ead fix tests (angela97lin)
f681b28 codecov testing (angela97lin)
5f8f2b1 linting (angela97lin)
e68927f clean up and fix tests (angela97lin)
0944f08 remove unreachable (angela97lin)
31145e5 Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
d34e0c9 cleanup docstrings (angela97lin)
ff80ad1 Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
81ba4b8 address feedback, update 100% null or empty case, address target impu… (angela97lin)
ed223d9 a lot of cleanup (angela97lin)
8bd8633 Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
5306bf9 undo simpleimputer subclassing (angela97lin)
43a442f fix up tests (angela97lin)
9890722 merge (angela97lin)
3b9f13c oops (angela97lin)
cdc320c Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
da982d9 move release notes (angela97lin)
ff4b679 Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
c1919e3 merge (angela97lin)
cccb717 Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
6068a8c fixing from comments and rename from details to metadata (angela97lin)
1402507 Merge branch '1881_fill_in_actions_cont' of github.com:alteryx/evalml… (angela97lin)
65a301a fix test and add one for X not None (angela97lin)
9c2ff4e fix tests with indices (angela97lin)
aabdbd2 Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
fb82cee cleanup (angela97lin)
4f093c9 codecov (angela97lin)
56388f8 remove from component graph and cleanup (angela97lin)
2d72bc6 add another test (angela97lin)
5cdb59b clean up merge: (angela97lin)
f0cdcbd Merge branch 'main' into 1881_fill_in_actions_cont (angela97lin)
@@ -1,3 +1,4 @@
from .per_column_imputer import PerColumnImputer
from .simple_imputer import SimpleImputer
from .imputer import Imputer
from .target_imputer import TargetImputer
evalml/pipelines/components/transformers/imputers/target_imputer.py
95 changes: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
import pandas as pd
import woodwork as ww
from sklearn.impute import SimpleImputer as SkImputer

from evalml.pipelines.components.transformers import Transformer
from evalml.utils import (
    _convert_woodwork_types_wrapper,
    _retain_custom_types_and_initalize_woodwork,
    infer_feature_types
)


class TargetImputer(Transformer):
    """Imputes missing data according to a specified imputation strategy."""
    name = 'Target Imputer'
    hyperparameter_ranges = {"impute_strategy": ["mean", "median", "most_frequent"]}

    def __init__(self, impute_strategy="most_frequent", fill_value=None, random_seed=0, **kwargs):
        """Initializes a transformer that imputes missing data according to the specified imputation strategy.

        Arguments:
            impute_strategy (string): Impute strategy to use. Valid values include "mean", "median", "most_frequent", "constant" for
                numerical data, and "most_frequent", "constant" for object data types.
            fill_value (string): When impute_strategy == "constant", fill_value is used to replace missing data.
                Defaults to 0 when imputing numerical data and "missing_value" for strings or object data types.
            random_seed (int): Seed for the random number generator. Defaults to 0.
        """
        parameters = {"impute_strategy": impute_strategy,
                      "fill_value": fill_value}
        parameters.update(kwargs)
        imputer = SkImputer(strategy=impute_strategy,
                            fill_value=fill_value,
                            **kwargs)
        super().__init__(parameters=parameters,
                         component_obj=imputer,
                         random_seed=random_seed)

    def fit(self, X, y):
        """Fits imputer to target data. 'None' values are converted to np.nan before imputation and are
        treated as the same.

        Arguments:
            X (ww.DataTable, pd.DataFrame or np.ndarray): The input training data of shape [n_samples, n_features]. Ignored.
            y (ww.DataColumn, pd.Series): The target training data of length [n_samples]. Must not be None.

        Returns:
            self

        Raises:
            ValueError: If y is None.
        """
        if y is None:
            raise ValueError("y cannot be None")
        y = infer_feature_types(y)
        y = _convert_woodwork_types_wrapper(y.to_series()).to_frame()

        # Cast to category, since bool dtype doesn't support nans and sklearn errors if all cols are bool
        if (y.dtypes == bool).all():
            y = y.astype('category')

        self._component_obj.fit(y)
        return self

    def transform(self, X, y):
        """Transforms input target data by imputing missing values. 'None' and np.nan values are treated as the same.

        Arguments:
            X (ww.DataTable, pd.DataFrame): Features. Ignored.
            y (ww.DataColumn, pd.Series): Target data to impute.

        Returns:
            ww.DataColumn: Transformed y
        """
        y_ww = infer_feature_types(y)
        y = _convert_woodwork_types_wrapper(y_ww.to_series())
        y_df = y.to_frame()

        # Return early since bool dtype doesn't support nans and sklearn errors if all cols are bool
        if (y_df.dtypes == bool).all():
            return _retain_custom_types_and_initalize_woodwork(y_ww, y)

        transformed = self._component_obj.transform(y_df)
        if transformed.shape[1] == 0:
            return ww.DataColumn(pd.Series([]))

        # Rebuild the series with the original index so rows stay aligned with the input
        y_t = pd.Series(transformed[:, 0], index=y.index)
        return _retain_custom_types_and_initalize_woodwork(y_ww, y_t)

    def fit_transform(self, X, y):
        """Fits on y and transforms y.

        Arguments:
            X (ww.DataTable, pd.DataFrame): Features. Ignored.
            y (ww.DataColumn, pd.Series): Target data to impute.

        Returns:
            ww.DataColumn: Transformed y
        """
        return self.fit(X, y).transform(X, y)
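
The fit/transform contract in this file can be illustrated without evalml, woodwork, or scikit-learn. The following is a minimal sketch using only the Python standard library, not the component's actual implementation: `SketchTargetImputer` and `is_missing` are hypothetical names, plain lists stand in for `pd.Series`, and `statistics.mode` may break ties differently from scikit-learn's `most_frequent`. It shows the behaviours described in the docstrings: `X` is ignored, `None` and NaN count as the same missing value, a `None` target raises `ValueError`, and `transform` reuses the value learned in `fit`.

```python
import math
import statistics


def is_missing(value):
    """Treat None and float NaN as the same kind of missing entry."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class SketchTargetImputer:
    """Hypothetical stand-in for illustration; not evalml's TargetImputer."""

    _STATISTICS = {
        "mean": statistics.mean,
        "median": statistics.median,
        "most_frequent": statistics.mode,
    }

    def __init__(self, impute_strategy="most_frequent", fill_value=None):
        self.impute_strategy = impute_strategy
        self.fill_value = fill_value
        self._fill = None  # replacement value, learned in fit()

    def fit(self, X, y):
        # X is accepted only for interface compatibility and is ignored.
        if y is None:
            raise ValueError("y cannot be None")
        observed = [v for v in y if not is_missing(v)]
        if self.impute_strategy == "constant":
            if self.fill_value is not None:
                self._fill = self.fill_value
            else:
                # Defaults taken from the docstring: 0 for numbers,
                # "missing_value" for strings or other objects.
                numeric = all(isinstance(v, (int, float)) for v in observed)
                self._fill = 0 if numeric else "missing_value"
        else:
            self._fill = self._STATISTICS[self.impute_strategy](observed)
        return self

    def transform(self, X, y):
        # Reuse the value learned in fit(); element positions are preserved.
        return [self._fill if is_missing(v) else v for v in y]

    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X, y)


imputer = SketchTargetImputer(impute_strategy="median")
print(imputer.fit_transform(None, [1.0, None, 3.0, float("nan")]))  # [1.0, 2.0, 3.0, 2.0]
print(imputer.transform(None, [None, 7.0]))  # [2.0, 7.0]
```

The second call shows why `fit` and `transform` are kept separate: the median learned from the first target (2.0) is applied to new data rather than recomputed.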
Wanted a way to specify that we want to impute the target without relying on the name of the column
Makes sense!
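
The exchange above is about flagging the target explicitly instead of matching on a column name. A rough sketch of that idea follows. It assumes a hypothetical action record whose metadata carries an `is_target` flag; the dictionary layout, the `"IMPUTE_COL"` and `"DROP_COL"` codes, and the `make_component_list` helper are illustrative assumptions, not evalml's actual `_make_component_list_from_actions` API.

```python
def make_component_list(actions):
    """Hypothetical helper: turn action records into (component name, parameters) pairs."""
    components = []
    for action in actions:
        metadata = action.get("metadata", {})
        # The target is identified by an explicit flag, never by a column name.
        if action.get("code") == "IMPUTE_COL" and metadata.get("is_target", False):
            strategy = metadata.get("impute_strategy", "most_frequent")
            components.append(("Target Imputer", {"impute_strategy": strategy}))
    return components


actions = [
    {"code": "IMPUTE_COL", "metadata": {"column": None, "is_target": True, "impute_strategy": "mean"}},
    {"code": "DROP_COL", "metadata": {"column": "id"}},
]
print(make_component_list(actions))  # [('Target Imputer', {'impute_strategy': 'mean'})]
```

With a flag in the metadata, a feature column that happens to share the target's name cannot be mistaken for the target.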