Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting 'number stored as text' error in mixed columns #93

Open
cat-hand opened this issue Apr 19, 2023 · 4 comments
Open

Getting 'number stored as text' error in mixed columns #93

cat-hand opened this issue Apr 19, 2023 · 4 comments
Labels
bug Something isn't working priority: high Important or urgent
Milestone

Comments

@cat-hand
Copy link

I'm exporting rounded and suppressed numbers and am getting the 'number stored as text' error in all of my numeric columns, even for columns where there are no rounded or suppressed numbers. I tried using numFmt = "#,##0" in the openxlsx createStyle function and it hasn't helped.

It looks like it might be an openxlsx issue and seems like it's solved in openxlsx2.

Can anyone help?

@matt-dray
Copy link
Collaborator

Hi Cat, thanks for highlighting this. Are you able to provide a reproducible example of the issue?

I've created a small example below that generates a workbook with a table (in sheet 'Table_1') that has two columns: one column (mpg) that's numeric but has a suppressed value ('[c]') and one column (cyl) that is fully numeric. In the attached screenshot, you can see (as expected) the 'number stored as text' warning is in column mpg, but there's no warning in cyl. Do you get the same result?

(Sidenote: as it happens, I may be converting {a11ytables} to use {openxlsx2} instead of {openxlsx} in future (see #70). This issue may be another reason to make the change.)

# Prepare supporting tables
support_tables <- list(
  cover_df = tibble::tribble(
    ~subsection_title, ~subsection_content,
    "Purpose", "Example results for something.",
    "Workbook properties", "Some placeholder information.",
    "Contact", "Placeholder email"
  ),
  contents_df = tibble::tribble(
    ~"Sheet name", ~"Sheet title",
    "Notes", "Notes",
    "Table_1", "Example sheet title"
  ),
  notes_df = tibble::tribble(
    ~"Note number", ~"Note text",
    "[note 1]", "Placeholder note.",
    "[note 2]", "Placeholder note."
  )
)

# Prepare example dataset
table_df <- mtcars
table_df[["car [note 1]"]] <- row.names(mtcars)  # row names to named column
row.names(table_df) <- NULL  # remove row names
table_df <- table_df[1:5, c("car [note 1]", "mpg", "cyl")]  # add note to header
table_df[2, "mpg"] <- "[c]"  # make a value confidential in the mpg column
table_df["Notes"] <- c("[note 2]", rep(NA_character_, 4))  # include note column

# Create new a11ytable
my_a11ytable <-
  a11ytables::create_a11ytable(
    tab_titles = c(
      "Cover",
      "Contents",
      "Notes",
      "Table_1"
    ),
    sheet_types = c(
      "cover",
      "contents",
      "notes",
      "tables"
    ),
    sheet_titles = c(
      "Cover title (example)",
      "Contents",
      "Notes",
      "Example sheet title"
    ),
    blank_cells = c(
      NA_character_,
      NA_character_,
      NA_character_,
      "Blank cells mean that a row does not have a note."
    ),
    sources = c(
      NA_character_,
      NA_character_,
      NA_character_,
      "Example source."
    ),
    tables = list(
      support_tables$cover_df,
      support_tables$contents_df,
      support_tables$notes_df,
      table_df
    )
  )

# Generate workbook from a11ytable
my_wb <- a11ytables::generate_workbook(my_a11ytable)

# Preview output
openxlsx::openXL(my_wb)

image

@cat-hand
Copy link
Author

Thanks for the quick reply Matt! I think the problem is that all my 'numeric' columns are actually text by the time I create my a11ytable - I have a function which rounds and suppresses all of the figures, so they are all stored as strings after using that function. I read that openxlx bases the excel column type on the column type in the input data, so I'm assuming that this is what is causing the error.

Was hoping there was some kind of workaround but I guess not haha. I have loads of tables in my publication so getting rid of the error manually isn't awesome. openxlsx2 sounds very promising though!

For info, this is essentially how that function works:

library(dplyr)
table_df <- mtcars %>%
  mutate(across(where(is.numeric), ~case_when(
    # in this example, figures less than 5 are suppressed
    .x < 5 ~ "[c]",
    # figures less than 6 are 'unreliable' so are shown but marked with 'u'
    .x < 6 ~ paste(janitor::round_half_up(.x, 1), "[u]"),
    TRUE ~ as.character(janitor::round_half_up(.x, 0)))))

@matt-dray
Copy link
Collaborator

Okey dokey, I think I understand: columns get converted to character class even if they remain unsuppressed. I think there's a few ways to convert numbers-only columns back to numeric en masse. Below is some demo code that uses type.convert() to do this (see how numeric column y gets converted to character in table_supp, but then ends up as numeric at the end). Let me know if I haven't quite understood.

suppressPackageStartupMessages(library(tidyverse))
library(janitor, warn.conflicts = FALSE)

set.seed(1337)

tbl <- tibble(
  id = paste0("id_", 1:10),
  x  = runif(10, 0, 10),  # will contain values to be suppressed
  y  = runif(10, 6, 10)   # won't need any suppression
)

tbl
#> # A tibble: 10 × 3
#>    id        x     y
#>    <chr> <dbl> <dbl>
#>  1 id_1  5.76   9.92
#>  2 id_2  5.65   9.97
#>  3 id_3  0.740  9.31
#>  4 id_4  4.54   6.78
#>  5 id_5  3.73   9.93
#>  6 id_6  3.31   6.10
#>  7 id_7  9.48   9.89
#>  8 id_8  2.81   9.70
#>  9 id_9  2.45   7.36
#> 10 id_10 1.46   6.99

tbl_supp <- tbl |> 
  mutate(
    across(
      where(is.numeric),
      \(value) case_when(
        value < 5 ~ "[c]",                                  # suppress
        value < 6 ~ paste(round_half_up(value, 1), "[u]"),  # 'unreliable' flag
        TRUE      ~ as.character(round_half_up(value, 0))   # otherwise round
      )
    )
  )

tbl_supp  # note 'y' is character but contains only numbers
#> # A tibble: 10 × 3
#>    id    x       y    
#>    <chr> <chr>   <chr>
#>  1 id_1  5.8 [u] 10   
#>  2 id_2  5.6 [u] 10   
#>  3 id_3  [c]     9    
#>  4 id_4  [c]     7    
#>  5 id_5  [c]     10   
#>  6 id_6  [c]     6    
#>  7 id_7  9       10   
#>  8 id_8  [c]     10   
#>  9 id_9  [c]     7    
#> 10 id_10 [c]     7

type.convert(tbl_supp, as.is = TRUE)  # note 'y' is now numeric
#> # A tibble: 10 × 3
#>    id    x           y
#>    <chr> <chr>   <int>
#>  1 id_1  5.8 [u]    10
#>  2 id_2  5.6 [u]    10
#>  3 id_3  [c]         9
#>  4 id_4  [c]         7
#>  5 id_5  [c]        10
#>  6 id_6  [c]         6
#>  7 id_7  9          10
#>  8 id_8  [c]        10
#>  9 id_9  [c]         7
#> 10 id_10 [c]         7

@cat-hand
Copy link
Author

Thank Matt, I didn't know about that function 😃 will give that a go

@matt-dray matt-dray added the bug Something isn't working label Aug 29, 2023
@matt-dray matt-dray added the priority: high Important or urgent label Nov 5, 2023
@matt-dray matt-dray added this to the v0.4 milestone Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority: high Important or urgent
Projects
None yet
Development

No branches or pull requests

2 participants