Current status of missing values (in oblique trees) #263
Comments
Missing values aren't supported yet, and taking a look, I think this is a silent bug, so thanks for the report! I will need to go through and enable an error message to be raised. Right now it fails silently because there is no check at the Python level. At the Cython level, I would guess that the NaNs are somehow being represented as infinity, so this is definitely erroneous. Will submit a PR to fix.
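A minimal sketch of the kind of Python-level guard described above (the function name and message are illustrative, not the actual scikit-tree implementation):

```python
import numpy as np

def check_no_missing(X):
    """Hypothetical fit-time guard: reject inputs containing NaN."""
    X = np.asarray(X, dtype=np.float64)
    if np.isnan(X).any():
        raise ValueError(
            "Input X contains NaN; missing values are not yet "
            "supported by oblique trees."
        )
    return X
```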
PR #264 should fix this issue. Let me know what you think @vruusmann
All fine by me! I don't call the shots here anyway.
Then again, enabling missing-value support needs discussion: what should happen when computing "oblique features" if one or more projection-matrix terms evaluates to a missing value? (E.g. consider a PM row combining three input features, with two available and one missing.)
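The ambiguity above can be sketched with NumPy (the projection row and data are made up for illustration):

```python
import numpy as np

# An oblique feature is a weighted sum of input features, so a single
# missing input poisons the whole projection.
proj_row = np.array([0.5, -1.0, 2.0])  # PM row: three weighted inputs
x = np.array([1.0, np.nan, 3.0])       # two available, one missing

oblique_feature = proj_row @ x         # NaN propagates through the dot product

# One possible alternative policy: drop the missing terms entirely.
available = ~np.isnan(x)
partial = proj_row[available] @ x[available]  # 0.5*1.0 + 2.0*3.0 = 6.5
```

Either policy (propagate the NaN, or compute a partial projection) leads to different tree predictions, which is why this needs an explicit decision.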
This is a request for clarification.
The documentation for v0.7.X says that supervised trees (such as oblique decision trees) do not support missing values: https://github.com/neurodata/scikit-tree/blob/v0.7.0/doc/modules/supervised_tree.rst (jump to "Limitations compared to decision trees")
However, the following Python code runs without errors:
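The original snippet was not preserved in this thread; the following is a hedged reconstruction of the kind of code meant (import path and data are assumptions):

```python
import numpy as np

# Toy data with NaNs in the feature matrix.
X = np.array([[1.0, 2.0, np.nan],
              [np.nan, 1.0, 3.0],
              [2.0, np.nan, 0.5],
              [0.1, 0.2, 0.3]])
y = np.array([0, 1, 0, 1])

try:
    from sktree.tree import ObliqueDecisionTreeClassifier

    clf = ObliqueDecisionTreeClassifier(random_state=0)
    clf.fit(X, y)  # as of v0.7.x this appears to succeed despite the NaNs
    print(clf.predict(X))
except ImportError:
    # scikit-tree not installed; the NaN-bearing data above is the point.
    pass
```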
If missing values are really not supported, then I would expect the `ObliqueDecisionTreeClassifier.fit(X, y)` method to fail quickly and cleanly with an appropriate error message.

The `ObliqueTree.missing_go_to_left` attribute is definitely set. But I can see that its elements hold values that are not valid according to Scikit-Learn's missing-go-to-left conventions (i.e. all elements should be either `0` or `1`, but the "active" values appear to be arbitrary 1-byte integers with values up to `128`).

I was experimenting with a setup where an "oblique projection feature" evaluated to a missing value whenever any of its input features was missing, and then scoring the oblique tree using Scikit-Learn's missing-go-to-left algorithm. However, the predictions didn't agree, which suggests that Scikit-Tree is doing something differently.
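For reference, the missing-go-to-left convention mentioned above amounts to the following routing rule at each split node (a simplified sketch, not the actual Cython tree code):

```python
import numpy as np

def route(value, threshold, missing_go_to_left):
    """Route a sample at one split node, scikit-learn style.

    missing_go_to_left must be 0 or 1 (hence the convention the
    issue refers to); a NaN value is routed by that flag alone.
    """
    if np.isnan(value):
        return "left" if missing_go_to_left else "right"
    return "left" if value <= threshold else "right"
```

Under this rule, any per-node value outside {0, 1} in `missing_go_to_left` has no defined meaning, which is why byte values up to 128 look erroneous.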
TLDR: As of today, is it permitted to pass missing values into oblique tree-based estimators or not?