Imputation of dataframe with order factors fails #16

Open
sibipx opened this issue May 19, 2022 · 0 comments

Imputing a data frame that contains ordered factors fails with an error. See the example below, which uses the diamonds dataset from ggplot2.

I am not sure, but the problem seems to be in the class check. For an ordered factor, class() returns c("ordered", "factor") rather than just "factor", so it seems that ordered factors are not recognised as factors and are assigned regression models:

  newClasses <- sapply(dat[, vara, with = FALSE], class)
  modelTypes <- ifelse(newClasses[varn] == "factor", "Classification", 
    "Regression")

It would be more sensible to treat ordered factors like regular factors (multinomial classification). Thanks!
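To illustrate the suggestion, here is a minimal sketch of a class check based on is.factor(), which is TRUE for both ordered and unordered factors. It is not the package's actual code: the toy data frame and the names `dat` and `isFactor` are made up for illustration, and only base R is used.

```r
# Toy data standing in for a few diamonds columns (illustrative only).
dat <- data.frame(
  carat = c(0.23, 0.21, 0.29),
  cut   = factor(c("Good", "Ideal", "Fair"),
                 levels = c("Fair", "Good", "Ideal"), ordered = TRUE),
  color = factor(c("E", "E", "J"))
)

# An ordered factor carries two classes, so a plain comparison
# of class() with "factor" is fragile.
class(dat$cut)

# is.factor() does not depend on the length of the class vector.
isFactor <- vapply(dat, is.factor, logical(1))

# Assumption: ordered factors should get a classification model.
modelTypes <- ifelse(isFactor, "Classification", "Regression")
modelTypes
```

With a check like this, `cut` would be assigned "Classification" in the same way as the unordered `color` column.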

Example:

> library(miceRanger)
> library(ggplot2)
> 
> data(diamonds)
> 
> diamonds_miss <- amputeData(diamonds, perc = 0.3)
> 
> str(diamonds_miss)
Classes ‘data.table’ and 'data.frame':	53940 obs. of  10 variables:
 $ carat  : num  0.23 NA 0.23 0.29 NA 0.24 0.24 NA NA 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 NA 2 NA NA 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 NA NA 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 NA 5 NA 2 6 7 3 4 5 ...
 $ depth  : num  61.5 59.8 NA NA 63.3 62.8 NA 61.9 NA 59.4 ...
 $ table  : num  55 61 NA 58 NA 57 57 55 61 61 ...
 $ price  : int  326 326 327 334 335 336 336 337 337 NA ...
 $ x      : num  3.95 NA NA 4.2 4.34 NA NA 4.07 3.87 NA ...
 $ y      : num  3.98 3.84 4.07 4.23 NA 3.96 NA 4.11 NA 4.05 ...
 $ z      : num  NA 2.31 NA 2.63 2.75 2.48 2.47 2.53 2.49 NA ...
 - attr(*, ".internal.selfref")=<externalptr> 
> 
> is.factor(diamonds_miss$cut)
[1] TRUE
> class(diamonds_miss$cut)
[1] "ordered" "factor" 
> miceRanger::miceRanger(diamonds_miss, m = 2, maxiter = 2,
+                        returnModels = TRUE,
+                        verbose = TRUE)

Process started at 2022-05-19 17:39:38 
data.table 1.14.0 using 6 threads (see ?getDTthreads).  Latest news: r-datatable.com

dataset 1 
iteration 1 	 | carat | cut
dataset 2 
iteration 1 	 | carat | cutError in miceRanger::miceRanger(diamonds_miss, m = 2, maxiter = 2, returnModels = TRUE,  : 
  Evaluation failed with error <Error in get.knnx(data, query, k, algorithm): Data non-numeric
>. This is probably our fault - please open an issue at https://github.com/FarrellDay/miceRanger/issues with a reproduceable example.
> miceRanger::miceRanger(data.table(diamonds_miss), m = 2, maxiter = 2,
+                        returnModels = TRUE,
+                        verbose = TRUE)

Process started at 2022-05-19 17:41:29 

dataset 1 
iteration 1 	 | carat | cut
dataset 2 
iteration 1 	 | carat | cutError in miceRanger::miceRanger(data.table(diamonds_miss), m = 2, maxiter = 2,  : 
  Evaluation failed with error <Error in get.knnx(data, query, k, algorithm): Data non-numeric
>. This is probably our fault - please open an issue at https://github.com/FarrellDay/miceRanger/issues with a reproduceable example.
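
A possible workaround until this is fixed, sketched under the assumption that the error only affects columns of class "ordered": convert the ordered factors to unordered ones before imputing. I have not verified that this avoids the error. The toy data and the names `dat` and `ordCols` are illustrative, and the sketch uses a plain data.frame (a data.table would need as.data.frame() first or data.table syntax).

```r
# Toy data with one ordered factor and some missing values (illustrative only).
dat <- data.frame(
  cut   = factor(c("Good", NA, "Fair"),
                 levels = c("Fair", "Good", "Ideal"), ordered = TRUE),
  carat = c(0.23, NA, 0.29)
)

# Find the ordered factor columns.
ordCols <- names(dat)[vapply(dat, is.ordered, logical(1))]

# Drop the ordering but keep every level, including unused ones.
dat[ordCols] <- lapply(dat[ordCols], function(x) {
  factor(x, levels = levels(x), ordered = FALSE)
})

class(dat$cut)
```

After imputation, the ordering could be restored with factor(x, levels = ..., ordered = TRUE) if it is needed downstream.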