
Simplifying projection matrix rows #262

Open · vruusmann opened this issue Apr 25, 2024 · 4 comments

@vruusmann

Is your feature request related to a problem? Please describe.

While developing a PMML converter for oblique trees (see #255), I noticed that the projection matrix (retrievable via the ObliqueTree.proj_vecs attribute) contains two types of "axis-aligned split" definitions, i.e. projection matrix rows where only a single element is set to a non-zero value.

These two types, illustrated in the snippet after this list, are:

  • Positive/default axis-aligned split. The only non-zero row element is 1.0. For example, [0, 0, 1, 0].
  • Negative axis-aligned split. The only non-zero row element is -1.0. For example, [0, -1, 0, 0].
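A minimal numpy illustration of the two row types (the matrix values here are made up):

```python
import numpy as np

# Toy projection matrix: one oblique row plus one row of each
# axis-aligned type described above.
proj_vecs = np.array([
    [0.5, 0.0, -0.5, 0.0],   # oblique split (two non-zero elements)
    [0.0, 0.0, 1.0, 0.0],    # positive/default axis-aligned split
    [0.0, -1.0, 0.0, 0.0],   # negative axis-aligned split
])

for row in proj_vecs:
    nonzero = np.flatnonzero(row)
    if nonzero.size == 1:
        kind = "positive" if row[nonzero[0]] > 0 else "negative"
        print(f"axis-aligned ({kind}) split on feature {nonzero[0]}")
    else:
        print("oblique split")
```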

Describe the solution you'd like

I propose that all axis-aligned splits be standardized to the positive/default axis-aligned split representation.

Negating a split condition adds no information, but it makes the resulting oblique tree harder to interpret, because the associated split threshold value also appears negated:

  • Positive/default split: feature <= threshold
  • Negative split: -1 * feature <= -1 * threshold

In other words, the algorithm should not multiply standalone feature values by -1 during training; it should keep them as-is.
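A minimal sketch of this idea, using a hypothetical canonicalize_axis_aligned_rows helper (not sktree code); it is only valid when applied before the split threshold is searched:

```python
import numpy as np

def canonicalize_axis_aligned_rows(proj_mat):
    """Force every axis-aligned row (exactly one non-zero element)
    to use a +1 coefficient.  Safe only *before* the split threshold
    is searched, because the threshold is learned on the projected
    values; afterwards the stored thresholds would also need fixing."""
    proj_mat = proj_mat.copy()
    for row in proj_mat:
        nonzero = np.flatnonzero(row)
        if nonzero.size == 1:
            row[nonzero[0]] = 1.0  # the sign carries no information yet
    return proj_mat
```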

Describe alternatives you've considered

The current behaviour (SkTree 0.7.2) is okay, but the resulting oblique trees are unnecessarily complicated.

@adam2392
Collaborator

This is hard to support as of now because we currently sample oblique splits using a "density" hyperparameter that dictates how many non-zeros there are in the projection matrix. This sometimes leaves certain projection rows either all zeros, or with only a single +1/-1 entry. In order to flip all the -1 rows to +1, we would have to add an extra computation, slowing down the overall training of trees.
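For intuition, here is a toy version of such density-based sampling (a sketch, not sktree's actual sampler) showing how rows with a single -1 arise:

```python
import numpy as np

def sample_projection_row(n_features, density, rng):
    # Each entry is non-zero with probability `density`; non-zero
    # entries are +1 or -1 with equal probability.  A row can thus end
    # up all zeros, or with a single -1 as its only non-zero element.
    row = np.zeros(n_features)
    mask = rng.random(n_features) < density
    row[mask] = rng.choice([-1.0, 1.0], size=int(mask.sum()))
    return row

rng = np.random.default_rng(0)
print(sample_projection_row(4, density=0.25, rng=rng))
```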

I think this would have to be handled downstream if other packages want a "simple" interpretation of those specific splits?

@vruusmann
Author

> In order to flip all the -1 rows to +1, we would have to add an extra computation, slowing down the overall training of trees.

You would standardize the projection matrix only once. Basically, you'd iterate over the projection matrix row-wise, and if the "effective row length" is one (i.e. the row contains only one non-zero element), you'd set this one element to +1.0. No need to even check its actual value.

> I think this would have to be handled downstream if other packages want a "simple" interpretation of those specific splits?

I tried to "invert" these negative splits during PMML conversion.

Something like: if the input is -1 * feature <= -1 * threshold, then interpret it as feature > threshold. But my integration testing showed that this inversion is not sufficient, because the predictions (between the original and inverted splits) came out different. It looks like the threshold value itself also needs to be adjusted somehow.
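For what it's worth, the exact logical inverse of -1 * feature <= -1 * threshold is feature >= threshold, not feature > threshold; the two disagree precisely at feature == threshold, which may be where the predictions diverge. A quick numpy check:

```python
import numpy as np

t = 6.5
xs = np.array([6.4, 6.5, 6.6])

original   = (-1 * xs) <= (-1 * t)  # the split as stored: -f <= -t
inverted   = xs > t                 # the attempted inversion: f > t
equivalent = xs >= t                # the exact logical inverse: f >= t

print(original)    # [False  True  True]
print(inverted)    # [False False  True]  <- disagrees at f == t
print(equivalent)  # [False  True  True]
```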

@vruusmann
Author

vruusmann commented Apr 26, 2024

> This is hard to support as of now because we sample oblique splits currently using a "density" hyperparameter that dictates how many non-zeros there are in the projection matrix.

But would it be possible to add a training parameter that lets the data scientist indicate that she is willing to accept a slight performance penalty during model training, in order to get much simpler oblique trees for later prediction and interpretation?

> I think this would have to be handled downstream if other packages want a "simple" interpretation of those specific splits?

Right now, when you train a simple oblique decision tree classifier on the iris dataset, you get two types of splits per feature. For example, Sepal.Length <= 5.3 and -1 * Sepal.Length <= -6.5. Essentially, the "effective number" of features is doubled.

This makes the interpretation of oblique forests twice as hard as it could be.
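A sketch of how to quantify this (assuming, per this thread, that the fitted projection matrix is exposed as tree_.proj_vecs on sktree's ObliqueDecisionTreeClassifier):

```python
import numpy as np
from sklearn.datasets import load_iris
from sktree.tree import ObliqueDecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = ObliqueDecisionTreeClassifier(random_state=0).fit(X, y)

# Count the axis-aligned rows and how many of them are negative.
proj = np.asarray(clf.tree_.proj_vecs)
axis_aligned = proj[np.count_nonzero(proj, axis=1) == 1]
n_negative = int((axis_aligned.sum(axis=1) < 0).sum())
print(f"{len(axis_aligned)} axis-aligned rows, {n_negative} negative")
```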

@adam2392
Collaborator

adam2392 commented May 6, 2024

@jovo any thoughts on how to best handle this?
