
Multiple output regression #2087

Closed
miguelmartin75 opened this issue Mar 8, 2017 · 55 comments

@miguelmartin75

miguelmartin75 commented Mar 8, 2017

How do I perform multiple output regression? Or is it simply not possible?

My current assumption is that I would have to modify the code-base such that XGMatrix supports a matrix as labels and that I would have to create a custom objective function.

My end goal is to perform regression that outputs two variables (a point) and to optimise a Euclidean loss. Would I be better off making two separate models (one for x coordinates and one for y coordinates)?

Or... would I be better off using a random forest regressor within sklearn or some other alternative algorithm?

@khotilov
Member

Multivariate/multilabel regression is not currently implemented (#574, #680).
Tianqi added some relevant placeholder data structures to the gbtree learner, but I guess no one has had the time to work out the machinery.

@jindongwang

A pity, since many competitions involve multiple outputs.

@MarkusBonsch

This would be a really nice feature to have.

@joel-thomas-wilson

Do we have any updates on this?

@hcho3
Collaborator

hcho3 commented Sep 7, 2018

I'm adding this feature to the feature request tracker: #3439. Hopefully, we can get to it at some point.

@JacobKempster

I agree - this feature would be extremely valuable (exactly what I need right now...)

@lenselinkbart

I also agree: while this is quite trivial to do in neural nets, it would be nice to be able to do it in xgboost as well.

@cp9612

cp9612 commented Mar 26, 2019

Would like to see this feature coming

@veonua

veonua commented Apr 15, 2019

Any reason why this was closed?

@hcho3
Collaborator

hcho3 commented Apr 15, 2019

@veonua See #3439.

@loretoparisi

loretoparisi commented Sep 24, 2019

In the meantime, is there any alternative, such as an ensemble of single-output models like this:

from sklearn import multioutput
from xgboost import XGBRegressor

# Fit a model and predict the lens values from the original features
model = XGBRegressor(n_estimators=2000, max_depth=20, learning_rate=0.01)
model = multioutput.MultiOutputRegressor(model)
model.fit(X_train, X_lens_train)
preds = model.predict(X_test)

from: https://gist.github.com/MLWave/4a3f8b0fee43d45646cf118bda4d202a

@jimmywan

In the meantime, is there any alternative, such as an ensemble of single-output models?

https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputRegressor.html

@cmottet

cmottet commented Jan 22, 2020

I am going to also weigh in and say that having such a feature would be extremely handy. The MultiOutputRegressor mentioned above is a nice wrapper for building multiple models at once, and it does work well for predicting target variables that are independent of one another. However, if the target variables are highly correlated, then you really want to build one model that predicts a vector.

@MxNl

MxNl commented Jan 7, 2021

Almost a year has passed since the last comment :-). That is why I want to repeat the wish for such an interesting feature. I would be happy to see it. Thanks anyway for all your work.

@hcho3 hcho3 reopened this Jan 21, 2021
@hcho3
Collaborator

hcho3 commented Jan 21, 2021

Reopening for visibility.

@kk26269

kk26269 commented Feb 4, 2021

Multivariate/multilabel regression is not currently implemented (#574, #680).
Tianqi added some relevant placeholder data structures to the gbtree learner, but I guess no one has had the time to work out the machinery.

Hello, I have used the scikit-learn estimator, passed my script (.py) written for multi-output regression to it, and was able to create endpoints.
I referred to the following repo:
https://github.com/qlanners/ml_deploy/tree/master/aws/scikit-learn/sklearn_estimators_locally.
The changes made are:

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# The last three columns are the targets; the rest are features
Y = dataset.iloc[:, -3:]
X = dataset.iloc[:, :-3]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=100)

gbr = GradientBoostingRegressor()
modelMOR = MultiOutputRegressor(estimator=gbr)
modelMOR.fit(X_train, Y_train)

@mirik123

mirik123 commented Jul 22, 2021

The MultiOutputRegressor is a poor alternative because it doesn't update the eval_set dataset together with the main training (X, y) dataset.

@trivialfis
Member

I would love to spend some time on this ...

@loretoparisi

I would love to spend some time on this ...

I have used this approach and it seems to work fine

#2087 (comment)

@StatMixedML

Is there any update on this? Can we make it a joint effort to get multi-output regression available? Irrespective of modelling the dependence between the several responses/y-variables, it would be great to have xgb.DMatrix accept a list or an np.array with more than one target column.

@trivialfis trivialfis self-assigned this Sep 14, 2021
@jameslamb
Contributor

To be honest, I also am not sure whether it's exactly the same. But it should be similar, right? Whether you are working on multiple tasks like "regression and classification" or multiple targets like "regression predicting y_1 and y_2", you still are in a situation like "find splits that balance gain across multiple loss functions".

To be honest, I haven't read the paper and am not planning to actively work on this (we have many other higher priorities in LightGBM right now).

@StatMixedML

@jameslamb

To be honest, I haven't read the paper and am not planning to actively work on this (we have many other higher priorities in LightGBM right now).

Sure, I understand that. I am not sure I'll find the time either. So maybe let's pause this and see if the community picks it up.

@trivialfis
Member

Did a quick scan over a couple of papers. I don't have a good understanding of various algorithms yet, but vector leaf seems to be the essential component of all proposed methods. I will try to prioritize it and share a roadmap for a path forward.

@StatMixedML

@trivialfis Ok nice! Looking forward

@StatMixedML

@trivialfis This might be an interesting approach to incorporate into XGBoost

SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems

The paper says

Moreover, the proposed methods are easy to implement upon modern boosting frameworks such as XGBoost

You can find the code here: https://github.com/sb-ai-lab/Py-Boost

@trivialfis
Member

@StatMixedML Thank you for the references. Here's an early version of vector leaves: #8616. No specific optimization yet.

@StatMixedML

@trivialfis Very nice, I'll have a look into it.

@trivialfis
Member

I will clean up the code in the coming days. There are some known issues that break existing code; at the moment, the only thing that works is the demo. It's for discussion and far from ready.

@StatMixedML

@trivialfis Sure, take your time. Let me know once I can use it.

Looking forward to it!

@StatMixedML

@trivialfis I have seen that you created a PR for a first version of the multi-target tree. This is awesome!!

Let me know once I can test it. Would be great to run some examples and compare accuracy and runtime. Willing to volunteer on this!

@lcrmorin

lcrmorin commented Feb 9, 2023

I am currently trying this. Should I expect any performance/memory gain over tuning multiple models?

@trivialfis
Member

Hi @StatMixedML @lcrmorin, thank you for volunteering! The PR is not ready yet; I still need to figure out some parts of the parameter interface and do more tests. If you really want to try the code, demo/guide-python/multioutput_regression.py would be a good starting place; see the rmse_model function and the parameters used in there.

@StatMixedML

StatMixedML commented Feb 10, 2023

@lcrmorin So the advantage of using multi-output models is that you don't have to train a separate model for each response variable. Also, as outlined in Multi-Target XGBoostLSS Regression, you can model dependencies between the different responses. What @trivialfis is currently working on is to speed-up the estimation using multi-target trees to better and efficiently scale to multiple response variables.


@lcrmorin

So that would also help inference time, right?

@StatMixedML

@lcrmorin We would expect to see the highest efficiency gains during training time, especially for HPO / cross-validation.

@trivialfis trivialfis moved this from Need prioritize to 2.0 In Progress in 2.0 Roadmap Mar 17, 2023
@trivialfis
Member

The bare-bones implementation is merged; please help test it out. :-)
You can find a reference to the nightly build in xgboost's Python installation document. The computational performance is not yet optimized, so please expect some quirks.

@trivialfis
Member

The link will be available once the CI passes

@trivialfis trivialfis moved this from 2.0 In Progress to 2.0 Done in 2.0 Roadmap Mar 22, 2023
@trivialfis
Member

A bug fix PR for prediction along with a small optimization: #8968 .

@hcho3 hcho3 pinned this issue Mar 29, 2023
@trivialfis trivialfis unpinned this issue Mar 30, 2023
@lcrmorin

Just being curious: would this allow some missing/masked targets? (I have multi-target time-series applications in mind, where longer-horizon targets are not available immediately.)

@trivialfis
Member

Not planned at the moment; the label is required to be dense. But I will mark that as a feature request and see if we can find a way to train boosting-tree models with missing labels.

@trivialfis
Member

Hi all, thank you for joining the discussion and for the helpful feedback! Let's continue the discussion in #9043.
