('Feature', ... , 'has a value outside the dataset.') caused by type mismatch #390

Open
dylan-kelahealth opened this issue Aug 22, 2023 · 3 comments

@dylan-kelahealth

[problem]

Out-of-range value errors are raised when the feature ranges and the query values have different types, even if the values are inside the ranges.

[illustrative example]

...
data_ranges = df.describe().loc[["min", "max"]].to_dict()
data_ranges = {
    x: [y, z]  # integer values
    for x, y, z in zip(
        df.columns, df.min(), df.max()
    )
}
...
d = dice_ml.Data(
    features=data_ranges,
    continuous_features=continuous,
    outcome_name=outcome,
)
exp = dice_ml.Dice(d, model, method="random")
counterfactuals = exp.generate_counterfactuals(query_df, total_CFs=4, desired_class="opposite")

The following error is raised when the interval bounds are int and the query values are floats of any precision:

('Feature', ... , 'has a value outside the dataset.')

However, the error is raised even when the feature value lies within the interval.

After casting the feature ranges to the same type as their query values, this error goes away.
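A minimal sketch of that workaround, in plain Python. The helper name `cast_ranges_to_query_types` and the dict-shaped query are my own assumptions for illustration and are not part of dice_ml; with a real `query_df` you would read the types from its columns instead.

```python
def cast_ranges_to_query_types(ranges, query):
    """Return a copy of `ranges` whose bounds share the type of the query value.

    Hypothetical helper (not a dice_ml API).
    `ranges` maps feature name -> [low, high]; `query` maps feature name -> value.
    """
    aligned = {}
    for name, (low, high) in ranges.items():
        target_type = type(query[name])  # e.g. float, or a NumPy scalar type
        aligned[name] = [target_type(low), target_type(high)]
    return aligned


data_ranges = {"age": [1, 10]}   # integer bounds, as in the example above
query = {"age": 3.5}             # float query value
aligned = cast_ranges_to_query_types(data_ranges, query)
```

The aligned dict could then be passed as `features=` in place of the integer ranges.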

[proposed fix]
Intervals should not be required to have the same type as the query values. For example, the interval

interval: list[int] = [1, 10]

should accept the value

value: np.float16 = 3.5

as within its range.
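To make the request concrete, here is a rough sketch of a range check that ignores the numeric type. It is not how dice_ml performs the check (I have not traced that code); `in_interval` is a hypothetical name, and converting everything to `float` is just one possible way to compare mixed numeric types.

```python
def in_interval(value, interval):
    """Type-agnostic range check (hypothetical, not dice_ml code).

    Converts the bounds and the value to Python floats before comparing,
    so int bounds and float (or NumPy scalar) values compare by magnitude.
    """
    low, high = interval
    return float(low) <= float(value) <= float(high)


inside = in_interval(3.5, [1, 10])    # float value, int bounds
outside = in_interval(11.0, [1, 10])  # genuinely out of range
```

NumPy scalars such as `np.float16(3.5)` support `float()`, so they would go through the same comparison.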

@gaugup
Collaborator

gaugup commented Sep 6, 2023

@dylan-kelahealth, the error looks correctly raised to me. Your model would have been trained on integer features, so I am not sure how it would interpret non-integer values if dice-ml generates floating-point values. Correct me if you think that is inaccurate.

@dylan-kelahealth
Author

@gaugup If this error is correctly raised, then the error message might benefit from revision (e.g. mentioning the type check). Several other issues report the same error message, and this behavior is not clarified in the documentation.

This was confusing because "value outside the dataset" implies that the value is outside the defined range. The range contains the query value unless the range is restricted to integers only, which it is not.

@gaugup
Collaborator

gaugup commented Sep 7, 2023

Thanks for clarifying. I think we should raise a better error message in this case that asks the user to align types.

Could you provide the exact notebook code that reproduces this error?

Thanks!
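For discussion, a sketch of what a more specific message could look like. This is only an illustration in plain Python: `check_feature_value` is a hypothetical function, the wording is a suggestion, and nothing here reflects the existing dice_ml validation code.

```python
def check_feature_value(name, value, interval):
    """Validate a query value, reporting type mismatches separately (hypothetical)."""
    low, high = interval
    bound_types = {type(low), type(high)}
    if bound_types != {type(value)}:
        expected = ", ".join(sorted(t.__name__ for t in bound_types))
        raise ValueError(
            f"Feature {name!r}: query value has type {type(value).__name__} "
            f"but the range bounds have type {expected}; "
            "cast them to a common type."
        )
    if not low <= value <= high:
        raise ValueError(
            f"Feature {name!r} has value {value!r} outside the range "
            f"[{low!r}, {high!r}]."
        )


try:
    check_feature_value("age", 3.5, [1, 10])
except ValueError as err:
    mismatch_message = str(err)
```

Splitting the two cases would let users tell a real out-of-range value apart from a type misalignment.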
