
Small data support for ensembles: use out-of-sample preds to train metalearner #1898

Closed
dsherry opened this issue Feb 25, 2021 · 0 comments
Labels
enhancement: An improvement to an existing feature. performance: Issues tracking performance improvements.


dsherry commented Feb 25, 2021

Problem
When automl is given few samples (fewer than 1k total), we should adjust our splitting strategy to make the most of the data we have available.

Background
#1746 tracks creating a separate 20% split for stacked ensemblers.

Proposal
If our data is "small", we should instead do the following:

  • Do not create a separate split for stacked ensemblers
  • Perform normal CV during pipeline training, as we do now
  • When training the stacked ensemble, use the complete set of each pipeline's predictions on the CV validation splits (i.e. its out-of-sample predictions) to train the metalearner.
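The steps above can be sketched as follows. This is an illustrative sketch only, not EvalML's implementation: the candidate pipelines, fold count, and metalearner choice are assumptions, and `cross_val_predict` stands in for collecting each pipeline's validation-split predictions during normal CV.

```python
# Sketch of the proposed "small data" ensembling strategy: rather than
# holding out a separate split for the stacked ensemble, reuse each
# pipeline's out-of-fold CV predictions as training data for the metalearner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, random_state=0)  # a "small" dataset

# Candidate pipelines (hypothetical stand-ins for AutoML-trained pipelines)
pipelines = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    LogisticRegression(max_iter=1000),
]

# Out-of-fold predictions: every row's prediction comes from a model that
# never saw that row during training, so no data is spent on a holdout.
meta_features = np.column_stack([
    cross_val_predict(p, X, y, cv=3, method="predict_proba")[:, 1]
    for p in pipelines
])

# Train the metalearner on the stacked out-of-sample predictions.
metalearner = LogisticRegression().fit(meta_features, y)
print(meta_features.shape)  # one column of out-of-fold probabilities per pipeline
```

Because every base-pipeline prediction is out-of-sample, the metalearner's training signal is not inflated by base models memorizing their own training rows, which is the usual motivation for this form of stacking.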

We still need to resolve: what data do we use to validate the ensembler itself?

@dsherry dsherry added enhancement An improvement to an existing feature. performance Issues tracking performance improvements. labels Feb 25, 2021
@dsherry dsherry closed this as completed Mar 4, 2021