Getting 'number stored as text' error in mixed columns #93

cat-hand · 2023-04-19T12:24:01Z

I'm exporting rounded and suppressed numbers and am getting the 'number stored as text' error in all of my numeric columns, even for columns where there are no rounded or suppressed numbers. I tried using numFmt = "#,##0" in the openxlsx createStyle function and it hasn't helped.

It looks like it might be an openxlsx issue and seems like it's solved in openxlsx2.

Can anyone help?

matt-dray · 2023-04-19T14:01:33Z

Hi Cat, thanks for highlighting this. Are you able to provide a reproducible example of the issue?

I've created a small example below that generates a workbook with a table (in sheet 'Table_1') that has two columns: one column (mpg) that's numeric but has a suppressed value ('[c]') and one column (cyl) that is fully numeric. In the attached screenshot, you can see (as expected) the 'number stored as text' warning is in column mpg, but there's no warning in cyl. Do you get the same result?

(Sidenote: as it happens, I may be converting {a11ytables} to use {openxlsx2} instead of {openxlsx} in future (see #70). This issue may be another reason to make the change.)

# Prepare supporting tables
support_tables <- list(
  cover_df = tibble::tribble(
    ~subsection_title, ~subsection_content,
    "Purpose", "Example results for something.",
    "Workbook properties", "Some placeholder information.",
    "Contact", "Placeholder email"
  ),
  contents_df = tibble::tribble(
    ~"Sheet name", ~"Sheet title",
    "Notes", "Notes",
    "Table_1", "Example sheet title"
  ),
  notes_df = tibble::tribble(
    ~"Note number", ~"Note text",
    "[note 1]", "Placeholder note.",
    "[note 2]", "Placeholder note."
  )
)

# Prepare example dataset
table_df <- mtcars
table_df[["car [note 1]"]] <- row.names(mtcars)  # row names to named column
row.names(table_df) <- NULL  # remove row names
table_df <- table_df[1:5, c("car [note 1]", "mpg", "cyl")]  # add note to header
table_df[2, "mpg"] <- "[c]"  # make a value confidential in the mpg column
table_df["Notes"] <- c("[note 2]", rep(NA_character_, 4))  # include note column

# Create new a11ytable
my_a11ytable <-
  a11ytables::create_a11ytable(
    tab_titles = c(
      "Cover",
      "Contents",
      "Notes",
      "Table_1"
    ),
    sheet_types = c(
      "cover",
      "contents",
      "notes",
      "tables"
    ),
    sheet_titles = c(
      "Cover title (example)",
      "Contents",
      "Notes",
      "Example sheet title"
    ),
    blank_cells = c(
      NA_character_,
      NA_character_,
      NA_character_,
      "Blank cells mean that a row does not have a note."
    ),
    sources = c(
      NA_character_,
      NA_character_,
      NA_character_,
      "Example source."
    ),
    tables = list(
      support_tables$cover_df,
      support_tables$contents_df,
      support_tables$notes_df,
      table_df
    )
  )

# Generate workbook from a11ytable
my_wb <- a11ytables::generate_workbook(my_a11ytable)

# Preview output
openxlsx::openXL(my_wb)

cat-hand · 2023-04-19T15:04:10Z

Thanks for the quick reply Matt! I think the problem is that all my 'numeric' columns are actually text by the time I create my a11ytable - I have a function which rounds and suppresses all of the figures, so they are all stored as strings after using that function. I read that openxlx bases the excel column type on the column type in the input data, so I'm assuming that this is what is causing the error.

Was hoping there was some kind of workaround but I guess not haha. I have loads of tables in my publication so getting rid of the error manually isn't awesome. openxlsx2 sounds very promising though!

For info, this is essentially how that function works:

library(dplyr)
table_df <- mtcars %>%
  mutate(across(where(is.numeric), ~case_when(
    # in this example, figures less than 5 are suppressed
    .x < 5 ~ "[c]",
    # figures less than 6 are 'unreliable' so are shown but marked with 'u'
    .x < 6 ~ paste(janitor::round_half_up(.x, 1), "[u]"),
    TRUE ~ as.character(janitor::round_half_up(.x, 0)))))

matt-dray · 2023-04-19T16:47:57Z

Okey dokey, I think I understand: columns get converted to character class even if they remain unsuppressed. I think there's a few ways to convert numbers-only columns back to numeric en masse. Below is some demo code that uses type.convert() to do this (see how numeric column y gets converted to character in table_supp, but then ends up as numeric at the end). Let me know if I haven't quite understood.

suppressPackageStartupMessages(library(tidyverse))
library(janitor, warn.conflicts = FALSE)

set.seed(1337)

tbl <- tibble(
  id = paste0("id_", 1:10),
  x  = runif(10, 0, 10),  # will contain values to be suppressed
  y  = runif(10, 6, 10)   # won't need any suppression
)

tbl
#> # A tibble: 10 × 3
#>    id        x     y
#>    <chr> <dbl> <dbl>
#>  1 id_1  5.76   9.92
#>  2 id_2  5.65   9.97
#>  3 id_3  0.740  9.31
#>  4 id_4  4.54   6.78
#>  5 id_5  3.73   9.93
#>  6 id_6  3.31   6.10
#>  7 id_7  9.48   9.89
#>  8 id_8  2.81   9.70
#>  9 id_9  2.45   7.36
#> 10 id_10 1.46   6.99

tbl_supp <- tbl |> 
  mutate(
    across(
      where(is.numeric),
      \(value) case_when(
        value < 5 ~ "[c]",                                  # suppress
        value < 6 ~ paste(round_half_up(value, 1), "[u]"),  # 'unreliable' flag
        TRUE      ~ as.character(round_half_up(value, 0))   # otherwise round
      )
    )
  )

tbl_supp  # note 'y' is character but contains only numbers
#> # A tibble: 10 × 3
#>    id    x       y    
#>    <chr> <chr>   <chr>
#>  1 id_1  5.8 [u] 10   
#>  2 id_2  5.6 [u] 10   
#>  3 id_3  [c]     9    
#>  4 id_4  [c]     7    
#>  5 id_5  [c]     10   
#>  6 id_6  [c]     6    
#>  7 id_7  9       10   
#>  8 id_8  [c]     10   
#>  9 id_9  [c]     7    
#> 10 id_10 [c]     7

type.convert(tbl_supp, as.is = TRUE)  # note 'y' is now numeric
#> # A tibble: 10 × 3
#>    id    x           y
#>    <chr> <chr>   <int>
#>  1 id_1  5.8 [u]    10
#>  2 id_2  5.6 [u]    10
#>  3 id_3  [c]         9
#>  4 id_4  [c]         7
#>  5 id_5  [c]        10
#>  6 id_6  [c]         6
#>  7 id_7  9          10
#>  8 id_8  [c]        10
#>  9 id_9  [c]         7
#> 10 id_10 [c]         7

cat-hand · 2023-04-20T06:55:13Z

Thank Matt, I didn't know about that function 😃 will give that a go

JanMarvin mentioned this issue Jun 8, 2023

add to_string() to convert variables with labels attribute JanMarvin/openxlsx2#639

Merged

matt-dray added the bug Something isn't working label Aug 29, 2023

matt-dray added the priority: high Important or urgent label Nov 5, 2023

matt-dray added this to the v0.4 milestone Jan 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting 'number stored as text' error in mixed columns #93

Getting 'number stored as text' error in mixed columns #93

cat-hand commented Apr 19, 2023

matt-dray commented Apr 19, 2023

cat-hand commented Apr 19, 2023

matt-dray commented Apr 19, 2023

cat-hand commented Apr 20, 2023

Getting 'number stored as text' error in mixed columns #93

Getting 'number stored as text' error in mixed columns #93

Comments

cat-hand commented Apr 19, 2023

matt-dray commented Apr 19, 2023

cat-hand commented Apr 19, 2023

matt-dray commented Apr 19, 2023

cat-hand commented Apr 20, 2023