Task cbind breaks when task's backend has primary_key different to ..row_id #961

Open
sebffischer opened this issue Aug 31, 2023 · 3 comments

Comments

@sebffischer
Sponsor Member

sebffischer commented Aug 31, 2023

library(mlr3verse)
#> Loading required package: mlr3
library(data.table)

d = data.table(
  x = factor(letters[1:10]),
  y = rnorm(10),
  my_key = 1:10
)

backend = as_data_backend(d, primary_key = "my_key")

task = as_task_regr(backend, target = "y")

learner = as_learner(ppl("robustify") %>>% lrn("regr.rpart"))

learner$train(task)
#> Error: All backends to rbind must have the primary_key 'my_key'
#> This happened PipeOp encode's $train()

Created on 2023-08-31 with reprex v2.0.2

@mb706
Collaborator

mb706 commented Aug 31, 2023

probably an issue with Task$cbind()

@sebffischer sebffischer transferred this issue from mlr-org/mlr3pipelines Aug 31, 2023
@sebffischer
Sponsor Member Author

When a data.frame is passed to Task$cbind(), as_data_backend.data.frame() is called, which automatically sets the primary key to ..row_id
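
A minimal check of that default, as a sketch that assumes only that mlr3 is installed. The column name `z` is made up for illustration; no primary key is supplied, so the backend falls back to its own key name.

```r
library(mlr3)

# No primary_key given: as_data_backend() adds its own row id column
b = as_data_backend(data.frame(z = 1:3))

b$primary_key
#> [1] "..row_id"
```

This is the name that then disagrees with `my_key` in the reprex above.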

@sebffischer sebffischer changed the title from "PipeOpFeatureUnion does not work when underlying backend has primary_key different to ..row_id" to "Task cbind breaks when task's backend has primary_key different to ..row_id" Aug 31, 2023
@sebffischer
Sponsor Member Author

sebffischer commented Aug 31, 2023

we could handle both cases:

  1. A data.frame is passed to $cbind() --> then we create the primary key column under the name of the task backend's existing primary_key
  2. A backend is passed to $cbind() --> then we can use DataBackendRename in case the primary keys don't match and the primary key of the task's backend is not a column name in the backend passed to $cbind().
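
A rough sketch of case 1, not the actual mlr3 implementation. `new_cols_backend()` is a hypothetical helper, and it assumes the rows of the new data are in the same order as `task$row_ids`. Case 2 (renaming the key of a backend that is passed in) is not covered here.

```r
library(mlr3)
library(data.table)

# Hypothetical helper (not part of mlr3): wrap new columns in a backend
# that reuses the primary key name of the task's backend.
new_cols_backend = function(task, data) {
  pk = task$backend$primary_key
  data = as.data.table(data)  # copy, so the caller's data is not modified
  if (!pk %in% names(data)) {
    # Assumption: rows of `data` are ordered like task$row_ids
    set(data, j = pk, value = task$row_ids)
  }
  as_data_backend(data, primary_key = pk)
}

d = data.table(x = factor(letters[1:10]), y = rnorm(10), my_key = 1:10)
task = as_task_regr(as_data_backend(d, primary_key = "my_key"), target = "y")

b = new_cols_backend(task, data.frame(z = runif(10)))
b$primary_key
#> [1] "my_key"
```

With the key names matching, the new backend could be combined with the task's backend without hitting the primary_key check from the error message.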
