Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: Check Excel sheets and consistency with readr #729

Open
olivroy opened this issue Apr 14, 2023 · 0 comments
Open

FR: Check Excel sheets and consistency with readr #729

olivroy opened this issue Apr 14, 2023 · 0 comments

Comments

@olivroy
Copy link

olivroy commented Apr 14, 2023

Hi,

I was wondering if it would be possible to add a check for Excel sheets?

When I miswrite the Excel sheet, I have to go investigate the Excel file, but I wrote a small helper to help for that with rlang::arg_match() that gives helpful error message.

Also, would it be possible to rename the path argument to file, just like in readr, and .name_repair to name_repair since all other arguments in read_excel() don't start with ..

Thanks

library(readxl)

read_excel(readxl_example("datasets.xlsx"), sheet = "iris2")
#> Error: Sheet 'iris2' not found


# Helper with consistency with readr
read_excel_check <- function(file, sheet = NULL, name_repair = "unique") {
  if (!is.null(sheet) && rlang::is_string(sheet)) {
      all_sheets <- readxl::excel_sheets(path = path)
      rlang::arg_match(sheet, all_sheets)
    }

  read_excel(
    path = file,
    sheet = sheet,
    .name_repair = name_repair
  )
}
read_excel_check(
  readxl_example("datasets.xlsx"),
  sheet = "iris2",
  name_repair = "universal"
  )
#> Error in `read_excel_check()`:
#> ! `sheet` must be one of "iris", "mtcars", "chickwts", or "quakes", not
#>   "iris2".
#> ℹ Did you mean "iris"?

Created on 2023-04-14 with reprex v2.0.2

Edit: This addition is costly in terms of performance. For a small datasets, it increases computing time by ~30%, but it reduces when the data is larger, and only check is sheet is not NULL

bench::mark(
  x1 = read_excel(readxl_example("datasets.xlsx"), sheet = "iris"),
  x2 = read_excel_check(readxl_example("datasets.xlsx"), sheet = "iris")
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x1           81.3ms   86.1ms     11.3     1.01MB     2.27
#> 2 x2          110.1ms  112.1ms      8.53  227.62KB     0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant