augment error with na.action = na.exclude in lm #1187

Open
wbvguo opened this issue Jan 12, 2024 · 1 comment
Labels
bug an unexpected problem or unintended behavior


wbvguo commented Jan 12, 2024

Dear broom maintainer,

the problem

I was running lm on a dataset with NA values and found that augment fails when the model is fit with na.action = na.exclude.

code

df <- data.frame(
  id = 1:10,
  x = rnorm(10),
  y = rnorm(10)
)

df$x[5] = NA

broom::augment(lm(y~x, data = df, na.action = na.exclude))

output

> Error in `$<-`:
! Assigned data `predict(x, na.action = na.pass, ...) %>% unname()` must be compatible with existing data.
✖ Existing data has 9 rows.
✖ Assigned data has 10 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in `vectbl_recycle_rhs_rows()`:
! Can't recycle input of size 10 to size 9.
Run `rlang::last_trace()` to see where the error occurred.

Removing the na.action = na.exclude option makes the call work. In fact, the following z1 and z2

z1 = lm(y~x, data = df, na.action = na.exclude)
z2 = lm(y~x, data = df) # the default na.action is na.omit

have the same model, coefficients, and residuals components, which makes me wonder how na.exclude and na.omit influence augment's behavior.
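
One way to see where the two fits diverge is to compare the accessor functions rather than the stored components. The following is a minimal sketch using only base R (`set.seed(1)` is added here for reproducibility and is not part of the original report). It assumes the default `options(na.action = "na.omit")`. The 9-versus-10 mismatch it shows looks consistent with the error above, on the assumption that augment's default `data` is `model.frame(x)`.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(id = 1:10, x = rnorm(10), y = rnorm(10))
df$x[5] <- NA

z1 <- lm(y ~ x, data = df, na.action = na.exclude)
z2 <- lm(y ~ x, data = df)  # default na.action is na.omit

# The fitted coefficients are identical.
all.equal(coef(z1), coef(z2))   # TRUE

# The stored na.action differs only in its class.
class(z1$na.action)             # "exclude"
class(z2$na.action)             # "omit"

# Accessors pad with NA under na.exclude, but not under na.omit.
length(residuals(z1))           # 10
length(residuals(z2))           # 9
length(predict(z1))             # 10

# The model frame drops the incomplete row in both cases.
nrow(model.frame(z1))           # 9
```

So the stored components match, but `residuals()`, `fitted()`, and `predict()` return one value per original row under na.exclude, while the model frame still has only the complete rows.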

I'm not entirely sure whether the issue above originates from the augment function or the lm function. I would greatly appreciate any insights or guidance you could offer on this matter. Thank you in advance for your assistance.

Thanks!

sessioninfo

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] purrr_1.0.2   broom_1.0.5   tidyr_1.3.0   dplyr_1.1.3   furrr_0.3.1   future_1.33.0

loaded via a namespace (and not attached):
 [1] parallelly_1.36.0 rstudioapi_0.15.0 knitr_1.44        magrittr_2.0.3    tidyselect_1.2.0  R6_2.5.1          rlang_1.1.1       fansi_1.0.5      
 [9] globals_0.16.2    tools_4.2.3       parallel_4.2.3    xfun_0.40         utf8_1.2.4        cli_3.6.1         digest_0.6.33     tibble_3.2.1     
[17] lifecycle_1.0.3   vctrs_0.6.4       codetools_0.2-19  glue_1.6.2        compiler_4.2.3    pillar_1.9.0      generics_0.1.3    backports_1.4.1  
[25] listenv_0.9.0     pkgconfig_2.0.3 

simonpcouch (Collaborator) commented

Thanks for the issue, @wbvguo!

You may find the documentation helpful here:

When the modeling was performed with na.action = "na.exclude", one should provide the original data as a second argument, at which point the augmented data will contain those rows (typically with NAs in place of the new columns).

As in:

library(broom)

df <- data.frame(
  id = 1:10,
  x = rnorm(10),
  y = rnorm(10)
)

df$x[5] = NA

m <- lm(y~x, data = df, na.action = na.exclude)
augment(m, df)
#> # A tibble: 10 × 9
#>       id      x      y .fitted .resid  .hat .sigma  .cooksd .std.resid
#>    <int>  <dbl>  <dbl>   <dbl>  <dbl> <dbl>  <dbl>    <dbl>      <dbl>
#>  1     1  0.593  0.278 -0.480   0.758 0.269  0.629  0.321        1.32 
#>  2     2  0.185 -0.639 -0.316  -0.323 0.168  0.712  0.0281      -0.527
#>  3     3 -0.830 -0.224  0.0929 -0.316 0.135  0.713  0.0200      -0.506
#>  4     4 -1.86   1.40   0.509   0.896 0.421  0.545  1.11         1.75 
#>  5     5 NA     -0.474 NA      NA     0      0.672 NA           NA    
#>  6     6 -0.687  0.583  0.0355  0.547 0.121  0.686  0.0519       0.868
#>  7     7 -0.680 -0.206  0.0324 -0.238 0.120  0.719  0.00977     -0.378
#>  8     8 -1.53  -0.596  0.375  -0.971 0.294  0.552  0.614       -1.72 
#>  9     9  0.605 -0.993 -0.485  -0.509 0.273  0.684  0.148       -0.887
#> 10    10  0.332 -0.219 -0.375   0.156 0.199  0.723  0.00834      0.259

Created on 2024-01-22 with reprex v2.1.0

Looks like we fail to raise an informative warning here, as is documented. Will make a note to look into this. :)
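
Until such a warning exists, a user-side guard can be sketched as below. This is not broom's implementation: `uses_na_exclude()` and `augment_checked()` are hypothetical names introduced here for illustration, and the check relies only on the class of the model's stored `na.action` component. The call to `broom::augment()` is guarded so the sketch still runs without broom installed.

```r
# Hypothetical helper (not part of broom): TRUE when the fit used na.exclude.
uses_na_exclude <- function(model) inherits(model$na.action, "exclude")

# Hypothetical wrapper: require the original data when rows were excluded.
augment_checked <- function(model, data) {
  if (uses_na_exclude(model) && missing(data)) {
    stop("Model was fit with na.action = na.exclude; ",
         "supply the original data as the second argument.",
         call. = FALSE)
  }
  if (missing(data)) broom::augment(model) else broom::augment(model, data = data)
}

set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(id = 1:10, x = rnorm(10), y = rnorm(10))
df$x[5] <- NA
m <- lm(y ~ x, data = df, na.action = na.exclude)

uses_na_exclude(m)  # TRUE

if (requireNamespace("broom", quietly = TRUE)) {
  augment_checked(m, df)  # same call as augment(m, df) above
}
```

Calling `augment_checked(m)` without the data stops early with a readable message instead of the recycling error.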

simonpcouch added the bug label on Jan 22, 2024