Increase compatibility of EBM with scikit-learn #518
Conversation
Signed-off-by: DerWeh <andreas.weh@web.de>
Raising a ValueError is compatible with scikit-learn and therefore the expected behavior. A TypeError is inappropriate in the Python world, as `type(X)` is correct (an `np.ndarray`); only `X.dtype` is wrong, and dtypes are a NumPy concept, not a native Python concept. Signed-off-by: DerWeh <andreas.weh@web.de>
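A minimal sketch of that convention (illustrative only, not interpret's actual validation code): the Python type of `X` is fine, so only a bad dtype is reported, and as a `ValueError`:

```python
import numpy as np

def validate_numeric_dtype(X):
    # Hypothetical helper: type(X) is np.ndarray, which is correct, so a
    # TypeError would be misleading; the dtype being non-numeric is a value
    # problem, which scikit-learn conventionally reports as a ValueError.
    X = np.asarray(X)
    if not np.issubdtype(X.dtype, np.number):
        raise ValueError(f"X has unsupported dtype {X.dtype}")
    return X
```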
We still fail some scikit-learn tests, where I am not sure whether we intentionally deviate. I didn't remove the old (currently skipped) test. The private versions aren't tested at the moment; they fail with some cryptic errors which I do not understand. Sorry, but I don't really understand the DP versions, so I cannot easily fix that part. I think we need these tests back before continuing to work on #514, as #514 would mean a rather drastic refactor.
On the scikit-learn tests:
We can remove the old test_scikit_learn_compatibility test that wasn't enabled. I can have a look at the DP tests once the rest of this PR is in.
Support of 2d y where `y.shape[-1] == 1` is desired. This reverts commit 230cddd. Signed-off-by: DerWeh <andreas.weh@web.de>
I updated the tags to indicate this, which allowed re-enabling the test. Please take a look and check whether the tags seem suitable now.
The scikit-learn fits floating point
I guess the ambiguity is whether we have a single sample and the dimension represents features, or a single feature and the dimension represents samples. Anyway, the test is skipped.
Done
Then I'll leave this up to you. I think the PR should be good to merge now, or do you want me to check whether we can be more permissive about older scikit-learn versions? As I said, you can validate that the correct tags are set.
Ah, now I remember why I did this: I converted y floats to strings because JSON doesn't differentiate between integers and floats. It just has a single number type. I wanted to be able to serialize to JSON and back to a Python object, and without indicating the original datatype it would be ambiguous whether the JSON value 1 should be restored in Python as an integer or a float. I figured integers would be more common than floats for y, so I converted floats to strings. If we want to support both integers and floats, we can do it by including an "output_type" field for classifiers in the JSON to differentiate between integers and floats here:
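A sketch of that idea, using a hypothetical `output_type` field (the field name and payload shape are illustrative, not interpret's actual serialization format):

```python
import json

def serialize_target(y_value):
    # Record whether the original Python value was an int or a float,
    # since JSON itself only has a single number type.
    output_type = "float" if isinstance(y_value, float) else "int"
    return json.dumps({"value": y_value, "output_type": output_type})

def deserialize_target(payload):
    obj = json.loads(payload)
    caster = float if obj["output_type"] == "float" else int
    return caster(obj["value"])
```

A round trip then restores the original Python type even when the JSON text for `1` and `1.0` would otherwise be ambiguous.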
If the rest of the PR is ready I can just merge it first and check afterwards. It's probably fine to use 0.24, but you never know what old environment someone might want to install interpret in, especially a cloud-based one where the user might not even be able to upgrade. I'll review now.
Just tried to test with an old version, which failed because some tests I skip didn't exist in the old version... If we compare the function names instead of the functions themselves, we can alleviate this problem. But more importantly: we actually don't need to bump the version at all; it suffices to increase the scikit-learn version in the
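Comparing names instead of function objects could look roughly like this (the skip-list entry is hypothetical; the point is that a name comparison still works when the check function doesn't exist in an older scikit-learn release):

```python
from functools import partial

# Hypothetical entry; the PR's real skip list may differ.
SKIPPED_CHECK_NAMES = {"check_sample_weights_invariance"}

def should_skip(check):
    # scikit-learn often passes checks around as functools.partial objects,
    # so unwrap before reading the name. Matching on the name avoids having
    # to import the check function, which may not exist in old versions.
    func = getattr(check, "func", check)
    return func.__name__ in SKIPPED_CHECK_NAMES
```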
Thanks @DerWeh -- It's really nice to finally have scikit-learn verification!
This PR adds the estimator checks for scikit-learn compatibility.
We add the following compatibilities:
a ValueError instead of a TypeError
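The machinery these estimator checks hook into can be exercised like so; shown here with a plain scikit-learn estimator, since the EBM classes are not imported in this sketch:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# Runs scikit-learn's battery of API-conformance checks and raises on
# failure; the PR applies the same checks to the EBM estimators.
check_estimator(LogisticRegression())
```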