Add method for regression #571

Open
chjq201410695 opened this issue May 18, 2019 · 13 comments
Labels
Type: Enhancement Indicates new feature requests

Comments

@chjq201410695

chjq201410695 commented May 18, 2019

As the title says. I found a method implemented in R:
https://github.com/paobranco/Pre-processingApproachesImbalanceRegression

and the corresponding paper:
https://www.semanticscholar.org/paper/SMOTE-for-Regression-Torgo-Ribeiro/43cda672b9ac0833086e19c90d42c2c0fbc361c6

@glemaitre
Member

I am not opposed to it.

@glemaitre
Member

Closing in favor of #105.

@bwang482
Copy link

bwang482 commented Jul 30, 2019

Hi @glemaitre am I right that currently only BalancedRandomForestClassifier from imblearn.ensemble can take real numbers as y for regression problems? Other ensemble models such as RUSBoostClassifier cannot do this? The oversampling strategies cannot do this either?

Thanks!

@chkoar
Member

chkoar commented Jul 30, 2019

> Hi @glemaitre am I right that currently only BalancedRandomForestClassifier from imblearn.ensemble can take real numbers as y for regression problems? Other ensemble models such as RUSBoostClassifier cannot do this?

@bluemonk482 the names of the models you mentioned end with Classifier. That implies they are applicable to classification tasks.

> The oversampling strategies cannot do this either?

Currently no, but we are interested in including an implementation of such a method.

@bwang482

Thanks @chkoar !

I assume it is more complex than simply changing class BalancedRandomForestClassifier(RandomForestClassifier) to class BalancedRandomForestClassifier(RandomForestRegressor) in https://github.com/scikit-learn-contrib/imbalanced-learn/blob/c0aa81c40173bd28b863ccc1b82bbafcacb240c4/imblearn/ensemble/_forest.py ???

@glemaitre
Member

glemaitre commented Jul 30, 2019 via email

@bwang482

Understood. Thanks @glemaitre !

@akatav

akatav commented Oct 30, 2019

@glemaitre this thread is such a godsend for me! So, I understand there is currently no way to generate synthetic data for regression problems, where the output variable Y is continuous. Is that correct?
Can the machine learning experts here suggest a way around this sort of problem? More details are in my post: https://stats.stackexchange.com/questions/433740/regression-on-unevenly-distributed-high-dimensional-dataset

@glemaitre glemaitre reopened this Nov 17, 2019
@glemaitre glemaitre changed the title would you add some methods for regression problem Add method for regression Nov 17, 2019
@glemaitre
Member

I am reopening this issue. We could make a generic tool that quantizes the target and allows applying any sampler, perhaps as a meta-estimator. This would require what is called a relevance function.
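For illustration, a relevance function maps each target value to a score in [0, 1] that says how "rare" or important it is. The sketch below is a hypothetical, simplified stand-in: it scores values by their distance from the median, scaled by the IQR, and flags samples above a threshold as rare. It is not the piecewise cubic (PCHIP) relevance function used in the Torgo et al. papers, and the names `boxplot_relevance` and `rare_mask` are made up for this example.

```python
import numpy as np


def boxplot_relevance(y, k=1.5):
    """Hypothetical, simplified relevance phi(y) in [0, 1].

    Values at the median score 0; values at or beyond median +/- k * IQR
    score 1; relevance grows linearly in between. This is a linear
    stand-in, not the PCHIP-based relevance from Torgo et al.
    """
    y = np.asarray(y, dtype=float)
    q1, median, q3 = np.percentile(y, [25, 50, 75])
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate spread: only values different from the median matter.
        return (y != median).astype(float)
    # Distance from the median, scaled so that k * IQR maps to 1.
    return np.clip(np.abs(y - median) / (k * iqr), 0.0, 1.0)


def rare_mask(y, threshold=0.8, k=1.5):
    """Flag samples whose relevance reaches `threshold` as rare."""
    return boxplot_relevance(y, k) >= threshold


y = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 100], dtype=float)
print(rare_mask(y))  # only the extreme value 100 is flagged
```

A quantizing meta-estimator could then treat "rare" vs. "normal" (or finer relevance bins) as class labels and hand them to any existing sampler.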

@glemaitre glemaitre added the Type: Enhancement Indicates new feature requests label Nov 17, 2019
@glemaitre glemaitre added this to the 0.7 milestone Nov 17, 2019
@ogencoglu

I believe these are relevant for this issue:

  • Torgo, Luís, et al. "Smote for regression." Portuguese conference on artificial intelligence. Springer, Berlin, Heidelberg, 2013.

  • Torgo, Luís, et al. "Resampling strategies for regression." Expert Systems 32.3 (2015): 465-476.

  • Branco, Paula. "Re-sampling approaches for regression tasks under imbalanced domains." Unpublished Master's thesis, Department of Computer Science, Faculty of Sciences, University of Porto (2014).

  • Branco, Paula Oliveira, Luís Torgo, and Rita Paula Ribeiro. "SMOGN: a pre-processing approach for imbalanced regression." (2017).
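As a rough picture of what SMOTER (the first reference) adds over classic SMOTE: a synthetic point is interpolated between a rare seed and one of its rare nearest neighbours, and its target is a distance-weighted average of the two targets, so the endpoint closer to the new point gets more weight. The sketch below is a simplified, hypothetical `smoter_oversample` under stated assumptions: it takes a precomputed boolean `rare` mask, pools all rare samples (the paper treats low and high extremes as separate sets), uses one random interpolation gap per synthetic point as in classic SMOTE, uses brute-force neighbour search, and omits SMOTER's undersampling of normal cases.

```python
import numpy as np


def smoter_oversample(X, y, rare, n_per_seed=2, k=5, seed=None):
    """Simplified SMOTER-style oversampling of rare samples (sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rare = np.asarray(rare, dtype=bool)
    Xr, yr = X[rare], y[rare]
    if len(Xr) < 2:
        return X, y  # nothing to interpolate between
    k = min(k, len(Xr) - 1)
    # Pairwise distances among rare samples, excluding self-matches.
    dists = np.linalg.norm(Xr[:, None, :] - Xr[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    new_X, new_y = [], []
    for i in range(len(Xr)):
        for _ in range(n_per_seed):
            j = rng.choice(neighbours[i])
            gap = rng.random()
            x_new = Xr[i] + gap * (Xr[j] - Xr[i])
            d_i = np.linalg.norm(x_new - Xr[i])
            d_j = np.linalg.norm(x_new - Xr[j])
            # Each target is weighted by the distance to the *other* endpoint.
            if d_i + d_j > 0:
                y_new = (d_j * yr[i] + d_i * yr[j]) / (d_i + d_j)
            else:
                y_new = (yr[i] + yr[j]) / 2  # duplicate points
            new_X.append(x_new)
            new_y.append(y_new)
    new_X = np.asarray(new_X).reshape(-1, X.shape[1])
    return np.vstack([X, new_X]), np.concatenate([y, np.asarray(new_y)])
```

Because each synthetic target is a convex combination of two rare targets, it always stays within the range of the rare region it was generated from.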

@glevv

glevv commented Feb 10, 2021

https://github.com/paobranco

She has written several papers on the topic and implemented some of the methods in R.

@glemaitre glemaitre modified the milestones: 0.8, 0.9 Feb 18, 2021
@glevv

glevv commented Apr 28, 2021

I think the simplest way to do it without adding new methods is to discretize the target (uniformly or with k-means; quantiles won't do, since quantile bins hold roughly equal counts and leave nothing to rebalance), fit an oversampler, and then apply an inverse transform (assign the bin midpoint values instead of bin numbers).

It should work through Pipeline and TargetTransformer.
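The discretize, oversample, inverse-transform idea above can be sketched in plain NumPy. This is a minimal illustration under assumptions: uniform-width bins, a hand-written random oversampler standing in for any classification sampler, and hypothetical helper names (`discretize_uniform`, `random_oversample`, `bin_midpoints`) that do not exist in imbalanced-learn.

```python
import numpy as np


def discretize_uniform(y, n_bins=5):
    """Map continuous targets to uniform-width bin labels 0..n_bins-1."""
    y = np.asarray(y, dtype=float)
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    # Digitize on the inner edges so the maximum value lands in the last bin.
    labels = np.digitize(y, edges[1:-1])
    return labels, edges


def random_oversample(X, labels, seed=None):
    """Duplicate rows of minority bins until every non-empty bin matches the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = [np.arange(len(labels))]  # keep all original rows first
    for c, n in zip(classes, counts):
        members = np.flatnonzero(labels == c)
        idx.append(rng.choice(members, size=target - n, replace=True))
    idx = np.concatenate(idx)
    return np.asarray(X)[idx], np.asarray(labels)[idx]


def bin_midpoints(labels, edges):
    """Inverse transform: replace bin labels by the midpoint of each bin."""
    mids = (edges[:-1] + edges[1:]) / 2
    return mids[labels]


y = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 10.0])
X = y.reshape(-1, 1)
labels, edges = discretize_uniform(y, n_bins=2)    # bins [0, 5) and [5, 10]
X_res, labels_res = random_oversample(X, labels, seed=0)
y_res = bin_midpoints(labels_res, edges)           # targets become 2.5 or 7.5
```

Since `random_oversample` places the original rows first, one could keep the true targets for those rows and use midpoints only for the generated ones; the midpoint mapping matters most for samplers such as SMOTE, which produce new samples that carry only a bin label.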

@pavelkomarov

I also vote for SMOTER. I don't want to have to download a different package https://pypi.org/project/smogn/ to do SMOTE with regression problems.

8 participants