Allow space as a separator #76

jonocarroll · 2018-07-23T13:25:10Z

Closes #61.

Simple fix to guess_sep to allow space to be used as a separator. Tested with the example provided.

to RealAge
513 59.608
513 84.18
0 85.23
119 74.764
116 65.356
data.frame(
          to = c(513L, 513L, 0L, 119L, 116L),
     RealAge = c(59.608, 84.18, 85.23, 74.764, 65.356)
)
tibble::tribble(
   ~to, ~RealAge,
  513L,   59.608,
  513L,    84.18,
    0L,    85.23,
  119L,   74.764,
  116L,   65.356
  )

I also modified a test file to exercise this:

datapasta:::guess_sep(readr::read_lines(file = "tests/testthat/brisbane_weather_empty_lines_spaces.txt"))
#> [1] " "
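
As a rough illustration of the idea (this is not datapasta's actual guess_sep(), whose internals are not shown in this thread), a separator guesser that also considers a single space might look like the sketch below. The function name guess_sep_sketch, the candidate list, and the "most frequent candidate wins" rule are all assumptions made for this example.

```r
# Hypothetical sketch only; not the implementation in this PR.
guess_sep_sketch <- function(lines, candidates = c("\t", ",", "|", ";", " ")) {
  # Trim each line and drop empty ones so they do not skew the counts.
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0L) return(NA_character_)

  # Count literal (fixed = TRUE, so no regex) occurrences of `sep` per line.
  count_sep <- function(sep) {
    lengths(regmatches(lines, gregexpr(sep, lines, fixed = TRUE)))
  }

  # Total occurrences of each candidate across all lines.
  totals <- vapply(candidates, function(sep) sum(count_sep(sep)), integer(1))
  if (max(totals) == 0L) return(NA_character_)

  # which.max() returns the first maximum, so on a tie the candidates listed
  # earlier win; the space is deliberately listed last.
  candidates[[which.max(totals)]]
}

guess_sep_sketch(c("to RealAge", "513 59.608", "0 85.23"))
```

Listing the space last means it is only chosen when no conventional delimiter occurs more often, which is one way to keep existing behaviour for comma- or tab-separated input.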

suitable test file closes MilesMcBain#61

jonocarroll · 2018-07-23T14:30:31Z

I tested this with a random example from Stack Overflow, and it uncovered a couple of bugs in my implementation. First, multiple spaces may be used as a separator. Second, if the SO content is pasted from R output, it may include row names, in which case that column has no header.

I have pushed a fix which accounts for both of these and creates a quoted "NA" column header in its place. This probably also fixes the scenario where NA is used as a column header by accident.
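
The two fixes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code pushed to the branch: the helper name split_on_spaces is hypothetical, and it assumes the first non-empty line is the header and that at least one data row follows.

```r
# Hypothetical helper; the name and structure are illustrative only.
split_on_spaces <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  stopifnot(length(lines) >= 2L)          # need a header plus one data row

  # Treat any run of one or more spaces as a single separator.
  fields <- strsplit(lines, " +")
  header <- fields[[1]]
  body   <- fields[-1]
  n_cols <- max(lengths(body))

  # R prints row names without a header, so the header line can be one
  # field short; pad it with a quoted "NA" placeholder in that case.
  if (length(header) == n_cols - 1L) {
    header <- c("NA", header)
  }

  list(header = header, body = do.call(rbind, body))
}

parsed <- split_on_spaces(c(
  "   chr       genome region",
  "1 chr1 hg19_refGene    CDS",
  "2 chr1 hg19_refGene   exon"
))
parsed$header
```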

Example:

   chr       genome region
1 chr1 hg19_refGene    CDS
2 chr1 hg19_refGene   exon
3 chr1 hg19_refGene    CDS
4 chr1 hg19_refGene   exon
5 chr1 hg19_refGene    CDS
6 chr1 hg19_refGene   exon

data.frame(stringsAsFactors=FALSE,
        "NA" = c(1L, 2L, 3L, 4L, 5L, 6L),
         chr = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1"),
      genome = c("hg19_refGene", "hg19_refGene", "hg19_refGene",
                 "hg19_refGene", "hg19_refGene", "hg19_refGene"),
      region = c("CDS", "exon", "CDS", "exon", "CDS", "exon")
)
tibble::tribble(
  ~"NA",   ~chr,        ~genome, ~region,
   1L, "chr1", "hg19_refGene",   "CDS",
   2L, "chr1", "hg19_refGene",  "exon",
   3L, "chr1", "hg19_refGene",   "CDS",
   4L, "chr1", "hg19_refGene",  "exon",
   5L, "chr1", "hg19_refGene",   "CDS",
   6L, "chr1", "hg19_refGene",  "exon"
  )

You could go one step further and detect when the first column is a regular integer sequence 1:nrow(d) with no column name and auto-name it "rownames".

jonocarroll · 2018-07-23T14:34:38Z

The added quotes around NA for tibbles throw off the indentation; that will need looking at.

MilesMcBain · 2018-07-28T06:54:44Z

I'll need to have a think about this one. My first reaction is row names that are 1:n are redundant so just blow them away.

But then dealing with the actual row names still needs to happen.

Also: Is it my imagination or is there another addin already that does this?

jonocarroll · 2019-02-20T22:43:02Z

This PR now has an interaction with #87 -- what about dropping rownames if they are simply 1:nrow(x) but otherwise keeping them as rownames for a data.frame, and as a "rownames" column if tibble?
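
One possible shape for that rule, as a sketch only: the function name handle_rownames and its as_tibble flag are hypothetical, the first column of d is assumed to hold the pasted row names, and the tibble case is represented by a plain data.frame with a "rownames" column so the example has no package dependencies.

```r
# Hypothetical sketch of the proposed rule; not code from this PR.
handle_rownames <- function(d, as_tibble = FALSE) {
  rn   <- d[[1]]   # assumed: first column holds the pasted row names
  rest <- d[-1]

  # Row names equal to 1:nrow(d) carry no information, so drop them.
  is_default <- identical(as.character(rn), as.character(seq_len(nrow(d))))
  if (is_default) return(rest)

  if (as_tibble) {
    # Tibbles do not keep row names, so retain them as an ordinary column.
    names(d)[1] <- "rownames"
    return(d)
  }

  # For a data.frame, restore them as real row names
  # (this errors if the values contain duplicates or NA).
  rownames(rest) <- as.character(rn)
  rest
}

d <- data.frame(V1 = c("x", "y", "z"), val = 1:3, stringsAsFactors = FALSE)
handle_rownames(d)
handle_rownames(d, as_tibble = TRUE)
```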

caldwellst · 2020-01-10T12:43:07Z

Hey all, I found this after running into the same issue while trying to copy from SO (where it seems this feature request originated). However, unlike in #61, I wasn't able to Paste as tibble at all: it now generates the following warning and returns NULL. The branch from this PR still works as intended.

# Using CRAN version 3.0.0
tribble_paste()
#> Could not paste clipboard as tibble. Text could not be parsed as table.
#> NULL

# Using jonocarroll's feature/space_separator branch
tibble::tribble(
  ~ID, ~Type,     ~Group, ~Week, ~Value,
  111,   "A",   "Pepper",    -1,     10,
  112,   "B",     "Salt",     2,     20,
  113,   "C",    "Curry",     4,     40,
  114,   "D", "Rosemary",     9,     90,
  211,   "A",   "Pepper",    -1,     15,
  212,   "B",     "Salt",     2,     30,
  214,   "D", "Rosemary",     9,    135
)

# And readr
readr::read_table(
  "ID  Type  Group      Week    Value
   111 A      Pepper     -1      10
   112 B      Salt        2      20
   113 C      Curry       4      40
   114 D      Rosemary    9      90
   211 A      Pepper     -1      15
   212 B      Salt        2      30
   214 D      Rosemary    9      135"
)
#> # A tibble: 7 x 4
#>   ID    Type  `Group      Week` Value
#>   <lgl> <chr> <chr>             <dbl>
#> 1 NA    111 A Pepper     -1        10
#> 2 NA    112 B Salt        2        20
#> 3 NA    113 C Curry       4        40
#> 4 NA    114 D Rosemary    9        90
#> 5 NA    211 A Pepper     -1        15
#> 6 NA    212 B Salt        2        30
#> 7 NA    214 D Rosemary    9       135

Anyway, not really new findings, but just wanted to show additional interest in this feature. Thanks for the great package!

jonocarroll added 2 commits July 23, 2018 22:53

⚡ adds support for space as a separator

12c72fe

suitable test file closes MilesMcBain#61

allow wrong number of header lines (NA)

938f0f5
