Allow space as a separator #76

Open: wants to merge 2 commits into base: master

Conversation

jonocarroll
Contributor

Closes #61.

Simple fix to guess_sep to allow space to be used as a separator. Tested with the example provided.

to RealAge
513 59.608
513 84.18
0 85.23
119 74.764
116 65.356
data.frame(
          to = c(513L, 513L, 0L, 119L, 116L),
     RealAge = c(59.608, 84.18, 85.23, 74.764, 65.356)
)
tibble::tribble(
   ~to, ~RealAge,
  513L,   59.608,
  513L,    84.18,
    0L,    85.23,
  119L,   74.764,
  116L,   65.356
  )

I also modified a test file so it can be used to exercise this:

datapasta:::guess_sep(readr::read_lines(file = "tests/testthat/brisbane_weather_empty_lines_spaces.txt"))
#> [1] " "
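For readers following along, here is a minimal sketch of how a separator guesser could accept a space as a candidate. It is not datapasta's actual `guess_sep` implementation; the function name `guess_sep_sketch`, the candidate list, and the scoring rule (same field count on every non-empty line) are all assumptions made for illustration.

```r
# Hypothetical helper, NOT datapasta's real guess_sep().
guess_sep_sketch <- function(lines, candidates = c("\t", ",", "|", ";", " ")) {
  # Ignore empty lines, as pasted text often contains them.
  lines <- lines[nzchar(trimws(lines))]
  score <- vapply(candidates, function(sep) {
    # Number of fields on each line when splitting on this separator.
    n_fields <- lengths(strsplit(trimws(lines), sep, fixed = TRUE))
    # Assumed rule: a plausible separator gives more than one field
    # and the same number of fields on every line.
    if (length(unique(n_fields)) == 1L && n_fields[1] > 1L) n_fields[1] else 0L
  }, integer(1))
  if (all(score == 0L)) return(NULL)
  candidates[which.max(score)]
}

example <- c("to RealAge", "513 59.608", "513 84.18",
             "0 85.23", "119 74.764", "116 65.356")
guess_sep_sketch(example)
```

Listing the space last means that, on a tie, the more conventional separators win, which seems like a reasonable default for pasted tables.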

@jonocarroll
Contributor Author

jonocarroll commented Jul 23, 2018

I tested this with a random example from Stack Overflow and it uncovered a couple of bugs in my implementation. Firstly, multiple spaces may be used as a separator. Secondly, if the SO content is pasted in from R then it may include rownames, in which case the first column has no header.

I have pushed a fix which accounts for both of these and creates a quoted "NA" column header in its place. This probably also fixes the scenario where NA is used as a column header by accident.

Example:

   chr       genome region
1 chr1 hg19_refGene    CDS
2 chr1 hg19_refGene   exon
3 chr1 hg19_refGene    CDS
4 chr1 hg19_refGene   exon
5 chr1 hg19_refGene    CDS
6 chr1 hg19_refGene   exon

data.frame(stringsAsFactors=FALSE,
        "NA" = c(1L, 2L, 3L, 4L, 5L, 6L),
         chr = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1"),
      genome = c("hg19_refGene", "hg19_refGene", "hg19_refGene",
                 "hg19_refGene", "hg19_refGene", "hg19_refGene"),
      region = c("CDS", "exon", "CDS", "exon", "CDS", "exon")
)
tibble::tribble(
  ~"NA",   ~chr,        ~genome, ~region,
   1L, "chr1", "hg19_refGene",   "CDS",
   2L, "chr1", "hg19_refGene",  "exon",
   3L, "chr1", "hg19_refGene",   "CDS",
   4L, "chr1", "hg19_refGene",  "exon",
   5L, "chr1", "hg19_refGene",   "CDS",
   6L, "chr1", "hg19_refGene",  "exon"
  )

You could go one step further and detect when the first column is a regular integer sequence 1:nrow(d) with no column name and auto-name it "rownames".
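The two fixes and the suggestion above could be sketched roughly as follows. This is only an illustration in base R, not the code in this PR: `parse_ws_table_sketch` is a hypothetical name, and the rule "a header one field shorter than the body means printed row names" is an assumption.

```r
# Hypothetical helper, not the PR's implementation.
parse_ws_table_sketch <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]
  # Split on runs of whitespace so aligned, multi-space output works.
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  header <- fields[[1]]
  body   <- fields[-1]
  n_body <- unique(lengths(body))
  stopifnot(length(n_body) == 1L)

  # Assumption: a header one field short means R printed row names.
  padded <- length(header) == n_body - 1L
  if (padded) header <- c("NA", header)
  stopifnot(length(header) == n_body)

  # Build each column and let R guess its type.
  cols <- lapply(seq_len(n_body), function(j) {
    utils::type.convert(vapply(body, `[`, character(1), j), as.is = TRUE)
  })
  names(cols) <- header

  # Auto-name an unnamed first column that is exactly 1..n.
  if (padded && identical(cols[[1]], seq_along(body))) {
    names(cols)[1] <- "rownames"
  }
  do.call(data.frame, c(cols, list(stringsAsFactors = FALSE,
                                   check.names = FALSE)))
}

so_lines <- c("   chr       genome region",
              "1 chr1 hg19_refGene    CDS",
              "2 chr1 hg19_refGene   exon",
              "3 chr1 hg19_refGene    CDS",
              "4 chr1 hg19_refGene   exon",
              "5 chr1 hg19_refGene    CDS",
              "6 chr1 hg19_refGene   exon")
parsed <- parse_ws_table_sketch(so_lines)
names(parsed)
```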

@jonocarroll
Contributor Author

The added quotes around NA for tibbles throw off the indentation, so that will need looking at.
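One possible way to keep the columns aligned is to measure the header cell as it will actually be printed, quotes and tilde included, before padding. The sketch below is an assumption about how that could look, not datapasta's formatting code; `render_header_sketch` and `align_column_sketch` are hypothetical names.

```r
# Hypothetical helper: render a tribble header cell, quoting names that
# are not syntactically valid (make.names() would alter them).
render_header_sketch <- function(nm) {
  if (make.names(nm) == nm) paste0("~", nm) else paste0('~"', nm, '"')
}

# Pad the header and its cells to one common width.
align_column_sketch <- function(name, cells) {
  head_cell <- render_header_sketch(name)
  # Width comes from the rendered header, not from the raw name.
  width <- max(nchar(c(head_cell, cells)))
  formatC(c(head_cell, cells), width = width)  # right-justified
}

align_column_sketch("NA", c("1L", "2L", "3L"))
```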

@MilesMcBain
Owner

I'll need to have a think about this one. My first reaction is that row names that are just 1:n are redundant, so we should simply blow them away.

But then dealing with the actual row names still needs to happen.

Also: Is it my imagination or is there another addin already that does this?

@jonocarroll
Contributor Author

This PR now has an interaction with #87 -- what about dropping rownames if they are simply 1:nrow(x) but otherwise keeping them as rownames for a data.frame, and as a "rownames" column if tibble?
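A rough sketch of that policy, in base R only. It assumes the candidate row names have already been parsed into the first column; `handle_rownames_sketch` and its `target` argument are hypothetical, and the tibble case is represented by a plain data.frame with a "rownames" column.

```r
# Hypothetical helper illustrating the proposed policy.
handle_rownames_sketch <- function(df, target = c("data.frame", "tibble")) {
  target <- match.arg(target)
  rn   <- df[[1]]   # assumed: first column holds the candidate row names
  rest <- df[-1]
  n    <- nrow(df)

  # Row names that are simply 1:n carry no information, so drop them.
  if (is.numeric(rn) && isTRUE(all.equal(as.numeric(rn), as.numeric(seq_len(n))))) {
    return(rest)
  }
  if (target == "data.frame") {
    # Keep meaningful row names as real row names (they must be unique).
    rownames(rest) <- as.character(rn)
    rest
  } else {
    # Tibbles do not keep row names, so use an ordinary column instead.
    names(df)[1] <- "rownames"
    df
  }
}

d_seq   <- data.frame(rn = 1:3, x = c("a", "b", "c"), stringsAsFactors = FALSE)
d_named <- data.frame(rn = c("r1", "r2", "r3"), x = 1:3, stringsAsFactors = FALSE)

handle_rownames_sketch(d_seq)
handle_rownames_sketch(d_named, "data.frame")
handle_rownames_sketch(d_named, "tibble")
```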

@caldwellst

Hey all, I found this after running into the same issue while trying to copy from SO (where it seems this feature request originated). However, unlike in #61, I wasn't able to Paste as tibble at all, as it now generates the following warning and returns NULL. The feature branch from this PR still works as intended.

# Using CRAN version 3.0.0
tribble_paste()
#> Could not paste clipboard as tibble. Text could not be parsed as table.
#> NULL

# Using jonocarroll's feature/space_separator branch
tibble::tribble(
  ~ID, ~Type,     ~Group, ~Week, ~Value,
  111,   "A",   "Pepper",    -1,     10,
  112,   "B",     "Salt",     2,     20,
  113,   "C",    "Curry",     4,     40,
  114,   "D", "Rosemary",     9,     90,
  211,   "A",   "Pepper",    -1,     15,
  212,   "B",     "Salt",     2,     30,
  214,   "D", "Rosemary",     9,    135
)

# And readr
readr::read_table(
  "ID  Type  Group      Week    Value
   111 A      Pepper     -1      10
   112 B      Salt        2      20
   113 C      Curry       4      40
   114 D      Rosemary    9      90
   211 A      Pepper     -1      15
   212 B      Salt        2      30
   214 D      Rosemary    9      135"
)
#> # A tibble: 7 x 4
#>   ID    Type  `Group      Week` Value
#>   <lgl> <chr> <chr>             <dbl>
#> 1 NA    111 A Pepper     -1        10
#> 2 NA    112 B Salt        2        20
#> 3 NA    113 C Curry       4        40
#> 4 NA    114 D Rosemary    9        90
#> 5 NA    211 A Pepper     -1        15
#> 6 NA    212 B Salt        2        30
#> 7 NA    214 D Rosemary    9       135

Anyway, not really new findings, but just wanted to show additional interest in this feature. Thanks for the great package!
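For comparison, base R's `read.table()` treats any run of whitespace as the separator by default (`sep = ""`) and ignores leading whitespace, so it should split the same text into five columns. This is offered only as a cross-check, not as what datapasta does internally.

```r
txt <- "ID  Type  Group      Week    Value
   111 A      Pepper     -1      10
   112 B      Salt        2      20
   113 C      Curry       4      40
   114 D      Rosemary    9      90
   211 A      Pepper     -1      15
   212 B      Salt        2      30
   214 D      Rosemary    9      135"

# Default sep = "" means "one or more whitespace characters".
parsed_base <- utils::read.table(text = txt, header = TRUE,
                                 stringsAsFactors = FALSE)
str(parsed_base)
```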

Successfully merging this pull request may close these issues.

Read tables from Stack Overflow