Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

format_tt mishandles Inf, NaN, and NA #218

Closed
avehtari opened this issue Apr 3, 2024 · 6 comments
Closed

format_tt mishandles Inf, NaN, and NA #218

avehtari opened this issue Apr 3, 2024 · 6 comments

Comments

@avehtari
Copy link

avehtari commented Apr 3, 2024

With plain tt() all as expected

> library(tinytable)
> data.frame(x=1, y=Inf) |> tt()

+---+-----+
| x | y   |
+===+=====+
| 1 | Inf |
+---+-----+ 
> data.frame(x=1, y=NaN) |> tt()

+---+-----+
| x | y   |
+===+=====+
| 1 | NaN |
+---+-----+ 
> data.frame(x=1, y=NA) |> tt()

+---+----+
| x | y  |
+===+====+
| 1 | NA |
+---+----+ 

with format_tt() not as expected

> data.frame(x=1, y=Inf) |> tt() |> format_tt()
Error in if (any(abs(x - round(x)) > (.Machine$double.eps)^0.5)) return(FALSE) : 
  missing value where TRUE/FALSE needed
> data.frame(x=1, y=NaN) |> tt() |> format_tt()

+---+---+
| x | y |
+===+===+
| 1 |   |
+---+---+ 
> data.frame(x=1, y=NA) |> tt() |> format_tt()

+---+---+
| x | y |
+===+===+
| 1 |   |
+---+---+ 

Setting some tinytable options cause the same problem with tt()

options(tinytable_format_num_fmt = "significant_cell", tinytable_format_digits = 2, tinytable_tt_digits=2)
> data.frame(x=1, y=Inf) |> tt()
Error in if (any(abs(x - round(x)) > (.Machine$double.eps)^0.5)) return(FALSE) : 
  missing value where TRUE/FALSE needed
@vincentarelbundock
Copy link
Owner

vincentarelbundock commented Apr 3, 2024

Thanks a lot for the report. I can confirm that I see the same error in the Inf case only.

To clarify, do you agree that the default behavior of format_tt() should be to not raise a warning or error, and should convert:

  • NA -> ""
  • NaN -> ""
  • Inf -> "Inf"

Keep in mind that the default argument is format_tt(replace_na=""). If you agree, then the fix I just pushed on Github should do the trick:

library(tinytable)

data.frame(x = 1, y = Inf) |> tt() |> format_tt()
x y
1 Inf
data.frame(x = 1, y = NaN) |> tt() |> format_tt()
x y
1
data.frame(x = 1, y = NA) |> tt() |> format_tt()
x y
1

@avehtari
Copy link
Author

avehtari commented Apr 3, 2024

I had not noticed replace_na="" default. I think it makes sense for NA. I would prefer replacing NaN -> "NaN" as Inf and NaN are both numeric values and possible results from arithmetic operations (part of the IEC 60559 standard) and thus it would be good to have the same behavior for them. It is a bit confusing that is.na(NaN) gives TRUE, while NaN is not a missing value in the sense as NA is a missing value. See also more about this in my suggestion of how to improve R documentation in avehtari/r-source#1

Did you change the behavior on tt() as currently tinytable_tt_digits option changes the behavior of tt()

> data.frame(x=1, y=NaN) |> tt()

+---+-----+
| x | y   |
+===+=====+
| 1 | NaN |
+---+-----+ 
> options(tinytable_tt_digits=2)
> data.frame(x=1, y=NaN) |> tt()

+---+---+
| x | y |
+===+===+
| 1 |   |
+---+---+ 

@vincentarelbundock
Copy link
Owner

vincentarelbundock commented Apr 3, 2024

Thanks for the link. Interesting discussion.

Yes, I had modified the default display of NaN while fixing the bug, but I now agree with you that this is unadvisable. NaN should have the same behavior as Inf, and display the "NaN" characters as a string.

Here's a new proposal:

  • replace_na argument is deprecated in favor of a more general replace argument. We maintain backward compatibility and raise a warning to indicate the name change.
  • By default, replace="" and only replaces NA by an empty cell.
  • Users can supply a different string to replace NA, such as format_tt(replace="-")
  • Users can also supply a named list to replace different values with different strings.

Examples below.

You can try it by installing the branch in this PR: #220

What do you think?

library(tinytable)

x <- data.frame(x = 1:5, y = c(pi, NA, NaN, -Inf, Inf))

dict <- list("-" = c(NA, NaN), "-∞" = -Inf, "" = Inf)

tt(x) |> format_tt(replace = "-")

    +---+----------+
    | x | y        |
    +===+==========+
    | 1 | 3.141593 |
    +---+----------+
    | 2 |      -   |
    +---+----------+
    | 3 |      NaN |
    +---+----------+
    | 4 |     -Inf |
    +---+----------+
    | 5 |      Inf |
    +---+----------+ 

tt(x) |> format_tt(replace = dict)

    +---+----------+
    | x | y        |
    +===+==========+
    | 1 | 3.141593 |
    +---+----------+
    | 2 | -        |
    +---+----------+
    | 3 | -        |
    +---+----------+
    | 4 | -|
    +---+----------+
    | 5 ||
    +---+----------+ 

@avehtari
Copy link
Author

avehtari commented Apr 4, 2024

I tested running the PR

  • replace argument looks great and useful
  • do you plan adding a global option for replace? It would be useful in notebooks with many tables
  • options(tinytable_tt_digits=2) is still changing the NA behavior of plain tt(), which is an unexpected side effect of setting the number of digits

@vincentarelbundock
Copy link
Owner

Thanks for trying it out!

I have added global options for all arguments in format_tt().

The digits docs are now clarified and the behavior more consistent. It does require some attention, but I think the new behavior does make some sense:

?tt:

digits: Number of significant digits to keep for numeric variables.
        When ‘digits’ is an integer, ‘tt()’ calls ‘format_tt(x,
        digits = digits)’ before proceeding to draw the table. Note
        that this will apply all default argument values of
        ‘format_tt()’, such as replacing ‘NA’ by "". Users who need
        more control can use the ‘format_tt()’ function instead.

?format_tt:

replace: Logical, String or Named list of vectors
            • TRUE: Replace ‘NA’ by an empty string.
            • FALSE: Print ‘NA’ as the string "NA".
            • String: Replace ‘NA’ entries by the user-supplied string.
            • Named list: Replace matching elements of the vectors in
            the list by theirs names. Example:
                • ‘list("-" = c(NA, NaN), "-∞" = -Inf, "∞" = Inf)’

This means you get:

library(tinytable)

x <- data.frame(x = pi, y = NA)

options(tinytable_tt_digits = 2)
tt(x)

| x   | y   |
|-----|-----|
| 3.1 |     |

options(tinytable_format_replace = "zzz")
tt(x)

| x   | y   |
|-----|-----|
| 3.1 | zzz |

options(tinytable_format_replace = FALSE)
tt(x)

| x   | y   |
|-----|-----|
| 3.1 | NA  |

@avehtari
Copy link
Author

avehtari commented Apr 4, 2024

I'm fine with this approach. I think the docs are clear. Thanks for making the changes so quickly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants