Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should rbindlist(..., fill=TRUE) return NA_logical_ in list columns? #4198

Closed
sritchie73 opened this issue Jan 25, 2020 · 6 comments
Closed
Labels
non-atomic column e.g. list columns, S4 vector columns question rbindlist

Comments

@sritchie73
Copy link
Contributor

From #4196:

When filling a list column, rbindlist departs from the behaviour of all other column types, and returns NULL elements instead of NA:

> A = data.table(c1=0, c2=list(1:3))
> B = data.table(c1=1)
> rbind(A,B,fill=TRUE)
      c1     c2
   <num> <list>
1:     0  1,2,3
2:     1   

Expected:

> A = data.table(c1=0, c2=list(1:3))
> B = data.table(c1=1)
> rbind(A,B,fill=TRUE)
      c1     c2
   <num> <list>
1:     0  1,2,3
2:     1  NA   

Should we change this behaviour for list columns to fill the rows with NA values to match the behaviour of fill=TRUE for other column types?

@sritchie73
Copy link
Contributor Author

PR with fix provided if we want to make the change

@jangorecki
Copy link
Member

jangorecki commented Jan 25, 2020

Current behaviour seems fine to me.

> str(as.integer(NULL)[1L])
 int NA
> str(as.list(NULL)[1L])
List of 1
 $ : NULL

IMO it should not be NA because:

  • it changes the type from a missing field (undefined) to a logical vector
  • it changes the length from 0 length to length 1

@mattdowle
Copy link
Member

mattdowle commented Feb 18, 2020

I'm leaning towards Jan's point. Current behavior of empty element is actually a list's way of representing missing (there isn't any object to point to). We could construct an example where each item of the list was a logical vector, each item being the result of some computation. In such a case, 3 different states might need to be represented: length 0 logical, length 1 NA logical, and missing computation. If length 1 NA logical was used for missing, those 2 couldn't be distinguished.

Would changing the print method suffice? Instead of nothing being printed, how about NULL ? Printing NA could again imply a length 1 NA logical, whereas NULL would be unambiguous, consistent with what base R prints for empty list items, and would give a further visual reminder that it was a list column.

@MichaelChirico
Copy link
Member

also agree w Jan, in particular about length 0 --> length 1.

I'm using lengths(x)>0 a lot to filter rows by empty list columns.

we could I guess put logical() there instead, is there any advantage of logical() over NULL though?

@sritchie73
Copy link
Contributor Author

What if we just instead add a sentence to the documentation noting the behaviour in the case of a missing list column:

Current entry for the fill argument:

TRUE fills missing columns with NAs. By default FALSE. When TRUE, use.names is set to TRUE.

Proposed:

TRUE fills missing columns with NAs, or NULL for missing list columns. By default FALSE. When TRUE, use.names is set to TRUE.

@mattdowle
Copy link
Member

Doc change looks good. Plus the print method change I suggested too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
non-atomic column e.g. list columns, S4 vector columns question rbindlist
Projects
None yet
Development

No branches or pull requests

5 participants