Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend PipeOpLearnerCV for other resamplings #500

Open
pfistfl opened this issue Sep 9, 2020 · 4 comments · May be fixed by #513
Open

Extend PipeOpLearnerCV for other resamplings #500

pfistfl opened this issue Sep 9, 2020 · 4 comments · May be fixed by #513
Assignees

Comments

@pfistfl
Copy link
Sponsor Member

pfistfl commented Sep 9, 2020

levels = c("cv", "insample") is currently very limited and not flexibly extensible i.e. for mlr3spatiotmpcv resamplings.
Might need some mechanism to allow a subset of mlr_resamplings

@mb706
Copy link
Collaborator

mb706 commented Sep 11, 2020

problem to anticipate here is that not all resamplings create a prediction for every input row (#216 will get relevant here), and some may create multiple predictios (does mean averaging usually make sense?)

@pfistfl
Copy link
Sponsor Member Author

pfistfl commented Sep 11, 2020

This is rather urgent though, as the current version blocks e.g. mlr3spatiotempcv resampling and makes the PO non-extensible which popped up in at least one project.
We could perhaps allow all mlr_resamplings and just document that this might break?

In general, another possibility would be to

  • in case of e.g. holdout, just fill up the rest of the predictions with ǸA`
  • in case multiple predictions exist just create additional columns as needed.
    the current return format is a data.table anyway.

@mb706
Copy link
Collaborator

mb706 commented Sep 11, 2020

filling with NA is probably a good idea. I don't like multiple columns because the number of output columns must be the same in train() and predict. I guess if it's supposed to be quick we can just mean(). It would be nice though to do something sensible with se and prob. (But I guess we haven't solved that problem for PipeOpRegrAvg / PipeOpClassifAvg either.)

@pfistfl
Copy link
Sponsor Member Author

pfistfl commented Sep 14, 2020

The number of output cols: We can just store the number of outputs in train and enforce the same length in test.
Aggregating using mean or m̀ode` might also work (perhaps this could be an option),

@sumny sumny self-assigned this Sep 30, 2020
@sumny sumny linked a pull request Oct 1, 2020 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants