Feature Requests #14

Open
5 of 13 tasks
xuyxu opened this issue Feb 2, 2021 · 17 comments · Fixed by #25 or #28

Comments

@xuyxu
Member

xuyxu commented Feb 2, 2021

This issue collects all feature requests. Anyone is welcome to work on the issues listed below. Please do not forget to add your contribution and name to CHANGELOG.rst.

If you want to work on a requested feature, please re-open the linked issue, and leave a comment below to let us know that you want to work on it.

New features

Python package

New language wrappers:

Fix

@tczhao
Contributor

tczhao commented Feb 2, 2021

I will work on the #4 regressor task

@xuyxu
Member Author

xuyxu commented Feb 3, 2021

That would be really nice @tczhao! Adding the regressor takes a lot of effort. Could you open a draft pull request and upload what you have done so far? I am willing to take part in developing this feature and to have deeper discussions there.

In addition, here are some things that may be helpful to you:

  • For regression, the augmented features are the out-of-bag predicted values from the cascade layer, which are unbounded (in contrast, the augmented features for classification, i.e., the class vectors, are bounded). This causes problems if we want to use binning for acceleration, because the binned values of unbounded features are very sensitive to the bin boundaries.
  • Use the RandomForestRegressor and ExtraTreeRegressor from Scikit-Learn first. The current version of this package only includes a reduced implementation of classification trees. I am willing to optimize it for regression trees once we have quickly verified its effectiveness on regression.
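
A minimal sketch of the first point, assuming Scikit-Learn's ensemble regressors stand in for the forests of one cascade layer: the out-of-bag predictions of each forest are collected and concatenated to the input as augmented features. This is only an illustration, not the package's implementation. The toy data, the forest settings, and the use of ExtraTreesRegressor (the ensemble class in sklearn.ensemble) are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Toy regression data (assumed sizes, for illustration only).
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)

# Two forests playing the role of one cascade layer.
# ExtraTreesRegressor needs bootstrap=True to produce out-of-bag predictions.
forests = [
    RandomForestRegressor(n_estimators=50, oob_score=True, random_state=0),
    ExtraTreesRegressor(
        n_estimators=50, bootstrap=True, oob_score=True, random_state=0
    ),
]

oob_columns = []
for forest in forests:
    forest.fit(X, y)
    # oob_prediction_ holds one out-of-bag prediction per training sample.
    oob_columns.append(np.asarray(forest.oob_prediction_).reshape(-1, 1))

# Augmented input for the next layer: original features + OOB predictions.
X_aug = np.hstack([X] + oob_columns)
print(X_aug.shape)
```

Unlike class probabilities, the two extra columns take values on the scale of y, so their range is not known in advance.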

@NiMaZi
Contributor

NiMaZi commented Feb 3, 2021

I'm working on #13

@tczhao
Contributor

tczhao commented Feb 8, 2021

Thanks, will have a draft ready in 2 days

@tczhao
Contributor

tczhao commented Feb 10, 2021

Maybe we can skip the "Build wheels for Python 2.7" task, since Python 2.7 has not been maintained since 2020-01.

@xuyxu
Member Author

xuyxu commented Feb 10, 2021

Wheels for Python 2.7 are not included in the build-wheels CI. I have created a separate branch for those who are interested ;-)

EDIT: This is actually a feature request from several users in the industrial community, who told me that Python 2.7 is still the most frequently used Python version in their environments.

xuyxu closed this as completed in #25 on Feb 11, 2021
xuyxu reopened this on Feb 11, 2021
@davidlkl

davidlkl commented Feb 11, 2021

Hi,

Thanks @tczhao for the hard work!

I would just like to understand whether it is sufficient to supply a custom loss via predictor_kwargs (in other words, does any other part of CascadeForestRegressor use MSE by default?).

Thanks
David

@xuyxu
Member Author

xuyxu commented Feb 11, 2021

I think it is relatively easy to add the Mean Absolute Error (MAE), which is also available in Scikit-Learn. For custom loss functions, a new splitting criterion should be implemented for decision trees.

Maybe we can add another parameter to CascadeForestClassifier and CascadeForestRegressor (e.g., criterion), which specifies the splitting criterion for the decision trees in the model.
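
For reference, a minimal sketch of how such a parameter already works in Scikit-Learn, assuming the proposed criterion argument would behave similarly. It only uses RandomForestRegressor and does not touch this package. The toy data is made up, and the version switch is there because Scikit-Learn renamed "mae" to "absolute_error" in release 1.0.

```python
import numpy as np
import sklearn
from sklearn.ensemble import RandomForestRegressor

# Scikit-Learn 1.0 renamed the MAE criterion from "mae" to "absolute_error".
major, minor = (int(part) for part in sklearn.__version__.split(".")[:2])
mae_criterion = "absolute_error" if (major, minor) >= (1, 0) else "mae"

# Small synthetic regression problem (assumed, for illustration only).
rng = np.random.RandomState(0)
X = rng.uniform(-1.0, 1.0, size=(120, 4))
y = X[:, 0] + rng.normal(scale=0.1, size=120)

# Default criterion (mean squared error) versus mean absolute error.
forest_mse = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
forest_mae = RandomForestRegressor(
    n_estimators=10, criterion=mae_criterion, random_state=0
).fit(X, y)

pred_mse = forest_mse.predict(X)
pred_mae = forest_mae.predict(X)
print(forest_mse.criterion, forest_mae.criterion)
```

Only the node-splitting rule changes between the two models. A fully custom loss would still need its own criterion implementation, as noted above.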

@T-Allen-sudo
Contributor

I will work on the package for macOS (#6, #32)

@xuyxu
Member Author

xuyxu commented Feb 12, 2021

Thanks ;-). You may find the documentation on cibuildwheel helpful when working on the CI: build-wheels.

@chendingyan
Contributor

Hi @xuyxu,
I found that in the current master branch, the input y is checked by deepforest.cascade._check_target_values. However, when I pass a sequence of integers as y, it is identified as "multiclass" instead of "continuous". In my view, y in a regression problem can consist of either floats or integers, so this check may cause serious errors in the future.
The image shows the example from the type_of_target function in sklearn.utils.multiclass.
[image: type_of_target example]
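
The behavior described above can be reproduced with a short sketch that calls type_of_target directly. The two target arrays are made-up examples, not data from the package.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Regression-style targets: one with float values, one with integer values.
y_float = np.array([0.5, 1.7, 3.2, 4.1])
y_int = np.array([1, 2, 3, 4])

kind_float = type_of_target(y_float)
kind_int = type_of_target(y_int)

print(kind_float)  # continuous
print(kind_int)    # multiclass
```

Any check that accepts only "continuous" therefore rejects integer-valued regression targets.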

@xuyxu
Member Author

xuyxu commented Mar 5, 2021

Hi @chendingyan, I agree with you on this point; the current check may be too strict. Any ideas on how to improve it?

@chendingyan
Contributor

Hi @xuyxu, if you use "type_of_target" to check the input y values, I would also accept "multiclass" and "multiclass-multioutput" for univariate and multivariate regression respectively, and additionally check that the values in the NumPy array are numeric.
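
A rough sketch of what I mean, assuming a standalone helper. The name check_regression_target and the error messages are hypothetical, and this is not the existing _check_target_values code.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Target types accepted for regression in this sketch (an assumption based on
# the proposal above: integer targets are reported as "multiclass").
ALLOWED_TYPES = {
    "continuous",
    "continuous-multioutput",
    "multiclass",
    "multiclass-multioutput",
}


def check_regression_target(y):
    """Hypothetical helper: validate y for regression and return its type."""
    y = np.asarray(y)

    # Reject non-numeric arrays (e.g. strings) before inspecting the values.
    if not np.issubdtype(y.dtype, np.number):
        raise ValueError("Regression targets must be numeric, got %s." % y.dtype)

    target_type = type_of_target(y)
    if target_type not in ALLOWED_TYPES:
        raise ValueError("Unsupported target type: %s." % target_type)
    return target_type


print(check_regression_target([1, 2, 3, 4]))
print(check_regression_target([0.5, 1.7, 3.2]))
print(check_regression_target([[1, 2], [3, 4], [5, 6]]))
```

Note that an integer target with only two distinct values is reported as "binary" by type_of_target, so that type would have to be allowed as well if such targets should pass.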

@xuyxu
Member Author

xuyxu commented Mar 5, 2021

That's a nice idea, and it should be easy to implement. I would appreciate it very much if you could contribute a PR for this enhancement ;-)

@chendingyan
Contributor

I have submitted a PR~

@chendingyan
Contributor

Hi @xuyxu, could you help me check my PR? How can I pass the code quality check?

@xuyxu
Member Author

xuyxu commented Mar 5, 2021

Thanks for the PR @chendingyan, I will fix the code quality problem.
