New inverted grammar, starting with header cells #31

nacnudus · 2019-12-10T20:01:09Z

The current unpivotr grammar starts from the point of view of data cells, and searches for associated headers. This imitates databaker, because that approach is useful in the most common case (in my experience), where:

1. The header cells surround the data cells.
2. There are more different headers than you care to hardcode into a script.

At long last, there is an example of a consistent schema that breaks (1) and doesn't suffer from (2).

Tweet
Gist (fork)

Untidy data

Tidy version

Thoughts

1. Locate each type of header by filtering, e.g. `character == "Species:"`. Error if not unique (see step 4 for when whole tables repeat, as in the example).
2. Describe the domain of the header over related data cells by its direction and limit, e.g. `direction = "W"` and `limit = 1` or `limit = Inf`. Unlike the existing grammar, the direction is from the point of view of the header cell, rather than the data cells.
3. Given a set of headers so described, unpivotr would resolve the data cells to the matching headers.
4. If the whole table repeats, as in the example above, the same technique would apply as now -- identify a corner cell of each table, nest, and unpivot one at a time.
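
To make steps 1 and 2 concrete, here is a minimal, language-agnostic sketch in Python. It is not unpivotr code and not a proposed API: the cell layout, the function names `locate_header` and `domain`, and the sample values are all hypothetical. It also assumes a header's domain is limited to the cells in its own row or column, which is a simplification of what a real grammar would need.

```python
import math

# Hypothetical cell representation: one dict per spreadsheet cell.
cells = [
    {"row": 1, "col": 1, "value": "Species:"},
    {"row": 1, "col": 2, "value": "setosa"},
    {"row": 2, "col": 1, "value": "Sepal.Length"},
    {"row": 3, "col": 1, "value": 5.1},
    {"row": 4, "col": 1, "value": 4.9},
]

# Unit steps (d_row, d_col) for each direction, seen from the header cell.
STEPS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def locate_header(cells, predicate):
    """Step 1: find a header by filtering; error if it is not unique."""
    matches = [c for c in cells if predicate(c)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one header cell, found {len(matches)}")
    return matches[0]


def domain(cells, header, direction, limit=math.inf):
    """Step 2: cells covered by the header, looking outwards from the header.

    Assumption: the domain is the run of cells in the header's own row
    (E/W) or column (N/S), at most `limit` steps away.
    """
    d_row, d_col = STEPS[direction]
    covered = []
    for c in cells:
        dr = c["row"] - header["row"]
        dc = c["col"] - header["col"]
        if d_row != 0:
            if dc != 0:
                continue  # not in the header's column
            distance = dr * d_row
        else:
            if dr != 0:
                continue  # not in the header's row
            distance = dc * d_col
        if 1 <= distance <= limit:
            covered.append(c)
    return sorted(covered, key=lambda c: (c["row"], c["col"]))


species = locate_header(cells, lambda c: c["value"] == "Species:")
sepal_length = locate_header(cells, lambda c: c["value"] == "Sepal.Length")

species_values = [c["value"] for c in domain(cells, species, "E", limit=1)]
length_values = [c["value"] for c in domain(cells, sepal_length, "S")]
```

Here `limit=1` picks up only the single cell next to the label, while the default unlimited domain runs to the end of the column. Step 3, resolving each data cell against several headers at once, is deliberately left out because the thread does not specify how that would work.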

jl5000 · 2019-12-10T23:47:28Z

Do we know if there are any other datasets with this structure or if it's an evil one-off? I've never seen a structure like this before.

nacnudus · 2019-12-11T09:53:34Z

That's a reasonable point, although it isn't how nerd-sniping works 😄

danstrobridge-Weston · 2020-10-17T17:38:19Z

I often get this sort of semi-structured format when working with spreadsheets / text files generated by exporting pivoted tables from PDF. I'm eager to test the readr::melt functionality for dealing with it on my next project that can afford to pay me for some development time.

bedantaguru mentioned this issue Jan 16, 2020

Create Example Data on which read_cells works (as expected) r-rudra/tidycells#5
