Raise an error if the target column has an invalid logical type #4117

Open
gsheni opened this issue Mar 28, 2023 · 0 comments

  • As a user of EvalML, I expect EvalML to check the logical type of my target column to determine whether it is valid for the problem type.
    • If it is not valid but can be cast to a valid type, I expect EvalML to change the logical type.
    • If it is not valid and cannot be cast to a valid type, I expect EvalML to raise an error.
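
from evalml.problem_types import ProblemTypes
from woodwork.logical_types import (
    Boolean,
    BooleanNullable,
    Categorical,
    Double,
    Integer,
    IntegerNullable,
)


def check_target_logical_type(y, problem_type):
    """Check the Woodwork logical type of target `y` against `problem_type`.

    Returns `y`, recast to Categorical for multiclass problems if needed.
    Raises ValueError for regression and binary targets with an invalid type.
    """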
    if problem_type in [
        ProblemTypes.REGRESSION,
        ProblemTypes.TIME_SERIES_REGRESSION,
    ] and not any(
        isinstance(y.ww.schema.logical_type, x)
        for x in [
            Integer,
            IntegerNullable,
            Double,
        ]
    ):
        raise ValueError(
            "Regression problem type requires an Integer, IntegerNullable or Double target",
        )
    elif problem_type == ProblemTypes.MULTICLASS and not isinstance(
        y.ww.schema.logical_type,
        Categorical,
    ):
        y = y.ww.set_logical_type("Categorical")
    elif problem_type == ProblemTypes.BINARY and not any(
        isinstance(y.ww.schema.logical_type, x)
        for x in [
            Boolean,
            BooleanNullable,
            Categorical,
        ]
    ):
        raise ValueError(
            "Binary problem type requires a Boolean, BooleanNullable or Categorical target",
        )
    return y

Tests

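import pandas as pd
import pytest
import woodwork  # noqa: F401  (importing woodwork registers the `.ww` accessor)


def test_check_target_logical_type():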
    y = pd.Series([1, 2, 2, 3, 3, 1], dtype="int64")
    y.ww.init(logical_type="Integer")
    check_target_logical_type(y, ProblemTypes.REGRESSION)
    check_target_logical_type(y, ProblemTypes.TIME_SERIES_REGRESSION)

    with pytest.raises(ValueError, match="Binary problem type requires a"):
        check_target_logical_type(y, ProblemTypes.BINARY)
    new_y = check_target_logical_type(y, ProblemTypes.MULTICLASS)
    assert isinstance(new_y.ww.schema.logical_type, Categorical)

    y = pd.Series(["red", "blue", "blue"], dtype="category")
    y.ww.init(logical_type="Categorical")
    with pytest.raises(ValueError, match="Regression problem type requires a"):
        check_target_logical_type(y, ProblemTypes.REGRESSION)
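
For discussion, the three outcomes in the user story (accept, cast, raise) can also be written as a lookup table. This is only a dependency-free sketch: `Problem`, `VALID`, `CAST_TO` and `resolve_target_type` are hypothetical names, logical types are represented by their names as plain strings, and none of it is existing EvalML or Woodwork API.

```python
from enum import Enum


class Problem(Enum):
    # Hypothetical stand-in for evalml's ProblemTypes so the sketch runs on its own.
    REGRESSION = "regression"
    TIME_SERIES_REGRESSION = "time series regression"
    BINARY = "binary"
    MULTICLASS = "multiclass"


# Logical type names accepted as-is, mirroring the checks in the function above.
VALID = {
    Problem.REGRESSION: {"Integer", "IntegerNullable", "Double"},
    Problem.TIME_SERIES_REGRESSION: {"Integer", "IntegerNullable", "Double"},
    Problem.BINARY: {"Boolean", "BooleanNullable", "Categorical"},
    Problem.MULTICLASS: {"Categorical"},
}

# Problem types where an invalid target is cast instead of rejected.
CAST_TO = {Problem.MULTICLASS: "Categorical"}


def resolve_target_type(logical_type_name, problem):
    """Return the logical type name to use, or raise ValueError if none fits."""
    if logical_type_name in VALID[problem]:
        return logical_type_name  # already valid: keep it
    if problem in CAST_TO:
        return CAST_TO[problem]  # invalid but castable: change it
    allowed = ", ".join(sorted(VALID[problem]))
    raise ValueError(f"{problem.value} problem type requires one of: {allowed}")
```

With a table like this, supporting another problem type or allowing another cast would mean adding a dictionary entry instead of another `elif` branch.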