Research : Boosted-TabNet? #124

Open
Optimox opened this issue Jun 4, 2020 · 12 comments
Labels: enhancement (New feature or request), Research (Ideas to improve architecture)

Comments

@Optimox (Collaborator) commented Jun 4, 2020

Main Remark

The TabNet architecture uses sequential steps in order to mimic some kind of random forest paradigm.
But since boosting algorithms often outperform random forests, shouldn't we try to move towards boosting methods instead of the random forest approach?

Proposed Solutions

One solution I see here would be to predict different things at each step of the TabNet in order to perform boosting:

  • the first step would remain as it is now
  • the second step would try to predict the residuals (i.e. the difference between the actual target and the first step's predictions)
  • each following step would try to predict residuals as well (i.e. the difference between the actual target and the sum of the previous steps' predictions)

This looks like it could work quite easily for regression problems, but I'm not sure how it could work for classification tasks: you can't stay in the classification paradigm and try to predict residuals. If anyone knows about a specific loss function that would make that happen, I think it's worth a try!
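For regression, the residual idea can be sketched quickly. The snippet below is a minimal, hypothetical illustration (plain PyTorch, not the pytorch-tabnet internals): each `step` module stands in for a TabNet decision step, and step k is supervised on the residual left by the sum of the previous steps' outputs.

```python
import torch
import torch.nn as nn

class BoostedSteps(nn.Module):
    """Toy stand-in for TabNet steps: each step is a small regressor."""
    def __init__(self, input_dim, n_steps=3, hidden_dim=16):
        super().__init__()
        self.steps = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, 1))
            for _ in range(n_steps)
        )

    def forward(self, x):
        # one scalar prediction per step
        return [step(x).squeeze(-1) for step in self.steps]

def boosted_regression_loss(step_outputs, y):
    # Step 0 targets y; step k targets the residual y - sum(steps 0..k-1).
    loss = 0.0
    running_pred = torch.zeros_like(y)
    for out in step_outputs:
        residual = y - running_pred.detach()   # stop gradients through earlier steps
        loss = loss + nn.functional.mse_loss(out, residual)
        running_pred = running_pred + out
    return loss
```

The final prediction would then simply be the sum of the step outputs, exactly as in gradient boosting with a learning rate of 1.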

If you feel like this is interesting and would like to contribute, please share your ideas in comments or open a PR!

Optimox added the enhancement label on Jun 4, 2020
Optimox changed the title from "Boosted-TabNet?" to "Research : Boosted-TabNet?" on Jun 4, 2020
Optimox added the Research label on Jun 4, 2020
@AlexisMignon commented:

Why not try a straightforward application of gradient boosting? Each step fits the gradient of the loss function (as computed so far) and adds it (using line search) to the previous result. Only regression is needed internally (to fit the gradient), and it works for both regression and classification.
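As a rough sketch of this point (my own illustration, with sklearn trees as the weak regressors and a fixed learning rate standing in for the line search), binary classification only ever fits regressors to the negative gradient of the log-loss:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gb_classifier(X, y, n_stages=50, lr=0.1):
    """y is a {0, 1} array; F is the decision function (log-odds)."""
    F = np.zeros(len(y))
    stages = []
    for _ in range(n_stages):
        p = 1.0 / (1.0 + np.exp(-F))          # current probabilities
        residual = y - p                       # negative gradient of the log-loss w.r.t. F
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        F += lr * tree.predict(X)
        stages.append(tree)
    return stages                              # predict via sigmoid of the accumulated F
```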

@rasenganai commented:

Interesting. For classification, I think we can try the same approach as the gradient boosting algorithm and/or AdaBoost, as described in their papers, using a cross-entropy loss function?

In the case of the gradient boosting technique, the output of each step would be multiplied by a learning rate and summed to get the log-odds, on which we can apply a sigmoid to get the probability (0/1)?

In the case of AdaBoost, we could maybe use the same weighting formula as described in the paper.

It would also be interesting to somehow use the mask weights to give an "importance weight" to each step's contribution to the final prediction, since the mask heatmap shows that some masks are not as activated as others. It may improve decision making.
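One possible reading of this idea, as a hypothetical sketch (the `step_outputs` and `masks` names are assumptions, not the pytorch-tabnet API): derive a per-step weight from how strongly each attention mask fires and use it to weight that step's output in the final sum.

```python
import torch

def mask_weighted_prediction(step_outputs, masks):
    # step_outputs: list of (batch, output_dim) tensors, one per step
    # masks:        list of (batch, input_dim) attention masks, one per step
    strengths = torch.stack([m.sum(dim=1) for m in masks], dim=1)  # (batch, n_steps)
    weights = torch.softmax(strengths, dim=1)                      # per-sample step weights
    stacked = torch.stack(step_outputs, dim=1)                     # (batch, n_steps, output_dim)
    return (weights.unsqueeze(-1) * stacked).sum(dim=1)            # weighted sum over steps
```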

The training would also be different: boosted algorithms train one tree at a time and then use it in the boosting, whereas here all the weak learners would be learning simultaneously.

I would like to do some research and contribute to this.

#Abhishek-eBook

@Optimox (Collaborator, Author) commented Jun 9, 2020

@AlexisMignon approaching classification problems with regression could be a solution, but I feel like it's not satisfying, especially for multi-class classification...

@JaskaranSingh-Precily TabNet uses cross-entropy already, but you need integer targets to apply cross-entropy, so I don't see how a boosted version could use cross-entropy at every step. Could you explain and/or give some links to the literature? I probably just need to dig a bit deeper into how XGBoost deals with multi-class classification.

@Jaskaran170599 Not sure you'll double your chances of winning Abhishek's ebook that way, to be honest! :)

@rasenganai commented:

@Optimox I actually commented with the company account; that was not my personal account.

@rasenganai commented Jun 9, 2020

@Optimox I think the problem here is how to train the weak learners.

In boosted trees this is done with the Gini index (for training a weak tree), etc., and the cross-entropy is used at the level of the whole algorithm, i.e. to find the residuals on which another tree is trained.

But here each step requires gradients to train, unlike a tree (which only needs the Gini index).

A solution could be to train each step with cross-entropy (for 1 or 2 epochs or gradient steps, treating each step as a weak learner), predicting class probabilities, then using those probabilities to compute the residuals for the next step, which is trained with cross-entropy in the same way, and so on?
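One possible reading of this, as a hypothetical sketch: each step outputs logits, every step adds its logits to a running sum, and each step gets its own cross-entropy loss on that sum, with earlier steps detached so each step effectively fits whatever the previous ones are still missing.

```python
import torch
import torch.nn.functional as F

def stepwise_cross_entropy(step_logits, targets):
    # step_logits: list of (batch, n_classes) tensors, one per step
    # targets:     (batch,) tensor of class indices
    loss = 0.0
    running = torch.zeros_like(step_logits[0])
    for logits in step_logits:
        running = running.detach() + logits        # boosting-style cumulative logits
        loss = loss + F.cross_entropy(running, targets)
    return loss / len(step_logits)
```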

@AlexisMignon commented:

@Optimox You may want to have a look at the Friedman paper about gradient boosting:
https://statweb.stanford.edu/~jhf/ftp/trebst.pdf
You'll see what I meant by using regressors only.

@Jaskaran170599

> In the case of the gradient boosting technique, the output of each step would be multiplied by a learning rate and summed to get the log-odds, on which we can apply a sigmoid to get the probability (0/1)?

Exactly. The idea is to fit the decision function before the sigmoid is applied, and to compute the gradient with respect to the values of this decision function. So at each step, the weak learner is trained to fit the gradient (hence the regressor), and the result is added (with a weight) to the previous decision function. The class probabilities can then be computed by applying the sigmoid function for binary problems or the softmax for multi-class problems.
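Since the multi-class case was the sticking point above, here is a hedged sketch of how this scheme extends to several classes in the spirit of Friedman's multinomial deviance (my own illustration, with sklearn trees standing in for the weak learners): keep one decision function per class, fit a regressor to each class's negative gradient, and only apply the softmax at prediction time.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_multiclass_gb(X, y, n_classes, n_stages=50, lr=0.1):
    """y is an integer class array; F holds one decision function per class."""
    Y = np.eye(n_classes)[y]                       # one-hot targets
    F = np.zeros((len(y), n_classes))
    stages = []
    for _ in range(n_stages):
        P = np.exp(F - F.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)          # softmax probabilities
        trees = []
        for k in range(n_classes):
            residual = Y[:, k] - P[:, k]           # negative gradient for class k
            tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
            F[:, k] += lr * tree.predict(X)
            trees.append(tree)
        stages.append(trees)
    return stages                                  # predict with softmax over the accumulated F
```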

@rasenganai commented Jun 9, 2020

@AlexisMignon Yeah, and I think in the TabNet case that weak learner is one block of the architecture, and the main task, which differs from boosting algos, is how to train that block.

@bibhabasumohapatra commented:

@Optimox (Collaborator, Author) commented Dec 29, 2021

Thanks @bibhabasumohapatra, looks promising. Is there a research paper related to the repo?

@bibhabasumohapatra commented:

> Thanks @bibhabasumohapatra, looks promising. Is there a research paper related to the repo?

Yes.

@bibhabasumohapatra commented:

> Thanks @bibhabasumohapatra, looks promising. Is there a research paper related to the repo?

Yes.
https://arxiv.org/abs/2106.05239

@ShuyangenFrance commented:

https://github.com/tusharsarkar3/XBNet

This is good work, but rather a completely different design from my point of view.
