Add method for regression #571

chjq201410695 · 2019-05-18T03:30:40Z

As the title says. I found an R implementation of such methods:
https://github.com/paobranco/Pre-processingApproachesImbalanceRegression

and the corresponding paper:
https://www.semanticscholar.org/paper/SMOTE-for-Regression-Torgo-Ribeiro/43cda672b9ac0833086e19c90d42c2c0fbc361c6

glemaitre · 2019-06-07T12:22:03Z

I am not opposed to it.

glemaitre · 2019-06-11T22:45:30Z

closing in favor of #105

bwang482 · 2019-07-30T12:52:26Z

Hi @glemaitre am I right that currently only BalancedRandomForestClassifier from imblearn.ensemble can take real numbers as y for regression problems? Other ensemble models such as RUSBoostClassifier cannot do this? The oversampling strategies cannot do this either?

Thanks!

chkoar · 2019-07-30T12:58:11Z

> Hi @glemaitre am I right that currently only BalancedRandomForestClassifier from imblearn.ensemble can take real numbers as y for regression problems? Other ensemble models such as RUSBoostClassifier cannot do this?

@bluemonk482 the names of the models you mentioned end with Classifier, which implies that they are applicable to classification tasks.

> The oversampling strategies cannot do this either?

Currently no, but we are interested in including an implementation of such a method.

bwang482 · 2019-07-30T13:13:45Z

Thanks @chkoar !

I assume it is more complex than simply changing class BalancedRandomForestClassifier(RandomForestClassifier) to class BalancedRandomForestClassifier(RandomForestRegressor) in https://github.com/scikit-learn-contrib/imbalanced-learn/blob/c0aa81c40173bd28b863ccc1b82bbafcacb240c4/imblearn/ensemble/_forest.py?

glemaitre · 2019-07-30T13:18:50Z

Yes, because you need to understand and design a proper resampling strategy in the context of regression, which is not straightforward, and there is almost no literature on this.


bwang482 · 2019-07-30T13:21:10Z

Understood. Thanks @glemaitre !

akatav · 2019-10-30T07:45:08Z

@glemaitre this thread is such a godsend for me! So, I understand there is presently no way to generate synthetic data for regression problems, where the output variable Y is a continuous value. Is that correct?
Can the expert machine learners here suggest some way out of this sort of problem? More details are in my post: https://stats.stackexchange.com/questions/433740/regression-on-unevenly-distributed-high-dimensional-dataset

glemaitre · 2019-11-17T11:48:40Z

I am reopening this issue. We could make a generic tool that quantizes the target and allows any sampler to be applied. We could think about a meta-estimator to do the job. This would require what is called a relevance function.
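To make the idea of a relevance function concrete, here is a minimal, dependency-free sketch. It is not an imbalanced-learn API and not the relevance function defined in the literature: `make_relevance`, `quantize_target`, and the 0.5 threshold are hypothetical names and values chosen for illustration, and the rule "far from the median means rare" is an assumption that will not suit every dataset.

```python
from statistics import median


def make_relevance(y):
    """Build a toy relevance function phi: target value -> [0, 1].

    Assumption: values far from the median are the rare, interesting ones.
    """
    center = median(y)
    max_dist = max(abs(v - center) for v in y)

    def phi(value):
        if max_dist == 0:  # constant target: nothing is rare
            return 0.0
        return min(abs(value - center) / max_dist, 1.0)

    return phi


def quantize_target(y, threshold=0.5):
    """Turn a continuous target into class labels: 1 = rare, 0 = common."""
    phi = make_relevance(y)
    return [1 if phi(v) >= threshold else 0 for v in y]


# Toy target: one extreme value among otherwise similar ones.
y = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 5.0, 1.1]
labels = quantize_target(y)
```

With labels like these, a classification sampler could in principle be applied using `labels` in place of the class vector. How the continuous target is carried through resampling is left open here.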

ogencoglu · 2020-01-24T06:56:46Z

I believe these are relevant for this issue:

Torgo, Luís, et al. "Smote for regression." Portuguese conference on artificial intelligence. Springer, Berlin, Heidelberg, 2013.
Torgo, Luís, et al. "Resampling strategies for regression." Expert Systems 32.3 (2015): 465-476.
Branco, Paula. "Re-sampling approaches for regression tasks under imbalanced domains." Unpublished Master's Thesis, Dep. Computer Science, Faculty of Sciences, University of Porto (2014).
Branco, Paula Oliveira, Luís Torgo, and Rita Paula Ribeiro. "SMOGN: a pre-processing approach for imbalanced regression." (2017).

glevv · 2021-02-10T18:16:15Z

https://github.com/paobranco

She wrote several papers on the topic and has some of them implemented in R.

glevv · 2021-04-28T10:02:06Z

I think the simplest way to do it without adding new methods is to discretize the target (uniformly or with k-means; quantiles won't do), then fit an oversampler, and then apply an inverse transform (assigning midrange bin values instead of bin numbers).

It should work through Pipeline and TargetTransformer.

pavelkomarov · 2021-07-12T20:34:47Z

I also vote for SMOTER. I don't want to have to download a different package https://pypi.org/project/smogn/ to do SMOTE with regression problems.

glemaitre closed this as completed Jun 11, 2019

glemaitre reopened this Nov 17, 2019

glemaitre changed the title ~~would you add some methods for regression problem~~ Add method for regression Nov 17, 2019

glemaitre added the Type: Enhancement label Nov 17, 2019

glemaitre added this to the 0.7 milestone Nov 17, 2019

glemaitre mentioned this issue Jun 22, 2020

Is there any technique for regression problem here? #729

Closed

glemaitre modified the milestones: 0.7, 0.8 Nov 26, 2020

glemaitre modified the milestones: 0.8, 0.9 Feb 18, 2021
