"readxl Workflows" vignette has a minor issue when checking for equality #702

Open
mafw opened this issue Jul 28, 2022 · 0 comments

The check in the vignette("readxl-workflows") article that is meant to confirm that the data read from the original Excel spreadsheet and from the CSV snapshot are equal fails: identical() returns FALSE. The check can be found here in the R Markdown file.

My guess is that this happens because the two objects have different classes and because readr seems to add a problems attribute to its result.

library(tidyverse)
library(readxl)

iris_xl <- readxl_example("datasets.xlsx") %>% 
  read_excel(sheet = "iris") %>% 
  write_csv("iris-raw.csv")

iris_xl
dir(pattern = "iris")

iris_alt <- read_csv("iris-raw.csv")
## readr leaves a note-to-self in `spec` that records its column guessing,
## so we remove that attribute before the check
attr(iris_alt, "spec") <- NULL
identical(iris_xl, iris_alt)
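
To check the guess above about classes and attributes, the two objects can be inspected directly. This is only a diagnostic sketch: it recreates the objects from the reprex, but writes the CSV to a temporary file (csv_path is a name introduced here) and passes show_col_types = FALSE to silence the column-specification message, which assumes readr 2.0 or later.

```r
library(readxl)
library(readr)

# Recreate the two objects from the reprex, using a temporary file
# instead of "iris-raw.csv" so nothing is left in the working directory
csv_path <- tempfile(fileext = ".csv")
iris_xl <- read_excel(readxl_example("datasets.xlsx"), sheet = "iris")
write_csv(iris_xl, csv_path)
iris_alt <- read_csv(csv_path, show_col_types = FALSE)

# Compare the class vectors of the two objects
class(iris_xl)
class(iris_alt)

# Names of attributes that iris_alt carries but iris_xl does not
extra_attrs <- setdiff(names(attributes(iris_alt)), names(attributes(iris_xl)))
extra_attrs
```

Any class or attribute that shows up on only one side is a candidate for why identical() returns FALSE.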

What would be the "right" way to make the call to base::identical() return TRUE? Calling as_tibble() on the iris_alt object before comparing them works, but maybe this is a bad idea?

identical(iris_xl, as_tibble(iris_alt))

Alternatively, I can follow the approach in the vignette by first removing the problems attribute. Then I can set the class of the iris_alt object equal to that of iris_xl.

attr(iris_alt, "problems") <- NULL
attr(iris_alt, "class") <- class(iris_xl)
identical(iris_xl, iris_alt)

In my view, an easy solution is to call base::all.equal() with check.attributes = FALSE. Even easier is to use the now-deprecated dplyr::all_equal().

all.equal(iris_xl, iris_alt, check.attributes = FALSE)
all_equal(iris_alt, iris_xl)

The question is whether it is important for the check to fail when the objects have differing attributes. There might be scenarios where attributes that differ between the data read from the original Excel spreadsheet and the data read from the CSV file cause problems down the road.
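
To make the trade-off concrete, here is a minimal base R sketch that uses neither readxl nor readr. The extra class name and the placeholder attribute are stand-ins chosen for illustration, not what readr actually attaches; the point is only that identical() is sensitive to class and attributes, while all.equal() with check.attributes = FALSE is meant to look at the values alone.

```r
# Two data frames with the same contents
a <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
b <- a

# Stand-ins for illustration only: an extra subclass and an extra attribute
class(b) <- c("extra_subclass", class(b))
attr(b, "note") <- "placeholder"

# Strict check: class and attributes take part in the comparison
strict <- identical(a, b)

# Lenient check: attributes are skipped, column values are compared
lenient <- isTRUE(all.equal(a, b, check.attributes = FALSE))

strict
lenient
```

Which of the two belongs in the vignette depends on whether a difference in class or attributes should count as a failed round trip.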
