read_delim fails on non utf 8 charset when delim is NULL with R 4.3.1 #1508

Open
nbc opened this issue Aug 17, 2023 · 1 comment

nbc commented Aug 17, 2023

When called on https://raw.githubusercontent.com/tidyverse/readr/main/tests/testthat/enc-iso-8859-1.txt with delim = NULL, read_delim() should fail with the error:

Error: Could not guess the delimiter.

This works as expected with R 4.2, but on R 4.3.1 it fails with a different error:

Error in gsub("\"[^\"]*\"", "", lines) : input string 1 is invalid
In addition: Warning message:
In gsub("\"[^\"]*\"", "", lines) :
  unable to translate 'fran<e7>ais' to a wide string
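
A minimal sketch of what the warning seems to point at, assuming the lines reach gsub() as raw ISO-8859-1 bytes that have not yet been re-encoded (an inference from the message above, not something confirmed from readr's source). It builds the bytes of 'français' by hand, checks that they are not valid UTF-8, and re-encodes them with iconv() before applying the same pattern. All variable names are illustrative only.

```r
# Bytes of "français" in ISO-8859-1; 0xE7 is "ç" there,
# but a lone 0xE7 byte is not valid UTF-8.
latin1_bytes <- as.raw(c(0x66, 0x72, 0x61, 0x6e, 0xe7, 0x61, 0x69, 0x73))
raw_line <- rawToChar(latin1_bytes)

# The string fails UTF-8 validation, matching the 'fran<e7>ais' in the warning.
validUTF8(raw_line)

# Re-encode to UTF-8 before doing any regex work on it.
utf8_line <- iconv(raw_line, from = "ISO-8859-1", to = "UTF-8")
validUTF8(utf8_line)

# The pattern from the error message now runs on a valid string.
# There are no quoted sections here, so the text comes back unchanged.
stripped <- gsub("\"[^\"]*\"", "", utf8_line)
```

If this reading is right, the delimiter guessing step would need the lines converted from the locale's encoding before the regex is applied.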

Complete reprex:

library(readr)

readr::read_delim(
  "https://raw.githubusercontent.com/tidyverse/readr/main/tests/testthat/enc-iso-8859-1.txt",
  delim = NULL,
  locale = readr::locale(encoding = "ISO-8859-1")
)

This is my sessionInfo():

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-serial/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-serial/libopenblas-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] didoscalim_0.1.3.9000 testthat_3.1.10       devtools_2.4.5        usethis_2.2.2        

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.2 remotes_2.4.2.1   processx_3.8.2    callr_3.7.3       tzdb_0.4.0        vctrs_0.6.3       tools_4.3.1       ps_1.7.5          generics_0.1.3   
[10] curl_5.0.2        parallel_4.3.1    tibble_3.2.1      fansi_1.0.4       pkgconfig_2.0.3   desc_1.4.2        lifecycle_1.0.3   compiler_4.3.1    stringr_1.5.0    
[19] brio_1.1.3        progress_1.2.2    httpuv_1.6.11     htmltools_0.5.6   later_1.3.1       pillar_1.9.0      crayon_1.5.2      urlchecker_1.0.1  tidyr_1.3.0      
[28] ellipsis_0.3.2    cachem_1.0.8      sessioninfo_1.2.2 mime_0.12         tidyselect_1.2.0  digest_0.6.33     stringi_1.7.12    dplyr_1.1.2       diffobj_0.3.5    
[37] purrr_1.0.2       rematch2_2.1.2    rprojroot_2.0.3   fastmap_1.1.1     cli_3.6.1         magrittr_2.0.3    pkgbuild_1.4.2    utf8_1.2.3        readr_2.1.4      
[46] withr_2.5.0       prettyunits_1.1.1 waldo_0.5.1       promises_1.2.1    bit64_4.0.5       lubridate_1.9.2   timechange_0.2.0  httr_1.4.6        bit_4.0.5        
[55] hms_1.1.3         memoise_2.0.1     shiny_1.7.5       miniUI_0.1.1.1    profvis_0.3.8     rlang_1.1.1       Rcpp_1.0.11       xtable_1.8-4      glue_1.6.2       
[64] pkgload_1.3.2.1   rstudioapi_0.15.0 vroom_1.6.3       jsonlite_1.8.7    R6_2.5.1          fs_1.6.3         
@ramiromagno

Same here:

library(readr)

readr::read_delim(
  "https://raw.githubusercontent.com/tidyverse/readr/main/tests/testthat/enc-iso-8859-1.txt",
  delim = NULL,
  locale = readr::locale(encoding = "ISO-8859-1")
)
#> Warning in gsub("\"[^\"]*\"", "", lines): unable to translate 'fran<e7>ais' to
#> a wide string
#> Error in gsub("\"[^\"]*\"", "", lines): input string 1 is invalid

sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Arch Linux
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/libblas.so.3.11.0 
#> LAPACK: /usr/lib/liblapack.so.3.11.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Lisbon
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] readr_2.1.4
#> 
#> loaded via a namespace (and not attached):
#>  [1] crayon_1.5.2      vctrs_0.6.3       cli_3.6.1         knitr_1.44       
#>  [5] rlang_1.1.1       xfun_0.40         purrr_1.0.2       styler_1.10.2    
#>  [9] bit_4.0.5         glue_1.6.2        htmltools_0.5.6   hms_1.1.3        
#> [13] fansi_1.0.4       rmarkdown_2.25    R.cache_0.16.0    evaluate_0.21    
#> [17] tibble_3.2.1      tzdb_0.4.0        fastmap_1.1.1     yaml_2.3.7       
#> [21] lifecycle_1.0.3   compiler_4.3.1    fs_1.6.3          pkgconfig_2.0.3  
#> [25] rstudioapi_0.15.0 R.oo_1.25.0       R.utils_2.12.2    digest_0.6.33    
#> [29] R6_2.5.1          tidyselect_1.2.0  utf8_1.2.3        reprex_2.0.2     
#> [33] curl_5.0.2        parallel_4.3.1    vroom_1.6.3       pillar_1.9.0     
#> [37] magrittr_2.0.3    R.methodsS3_1.8.2 bit64_4.0.5       tools_4.3.1      
#> [41] withr_2.5.1

Created on 2023-09-29 with reprex v2.0.2
