Experiment with approaches to view Record-heterogeneity in csv files. (ragged csvs) #79

Open
alexhallam opened this issue Oct 1, 2021 · 2 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@alexhallam
Owner

alexhallam commented Oct 1, 2021

Here I use the term "ragged csv" in the same sense as miller.

In a standard csv, a missing cell (NA) is commonly written by omitting the value but keeping the commas, so the row still contains ",,". @lithiumfrost uploaded a "ragged csv" in which the omitted data came with no commas or any other delimiter. See line 361 in the image below.

image
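
To make the distinction concrete, here is a minimal sketch that counts the fields in each record. The sample data is made up for illustration (it is not taken from the uploaded file), `field_counts` is a hypothetical helper, and Python is used only because it keeps the example short.

```python
import csv
import io

# Hypothetical sample data, not taken from the uploaded file.
SAMPLE = (
    "station,temp,wind\n"
    "A,1.5,10\n"  # complete record
    "B,,12\n"     # missing value, comma retained: still 3 fields
    "C\n"         # ragged record: values and commas both omitted
)


def field_counts(text):
    """Return the number of fields in each record, header included."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


counts = field_counts(SAMPLE)
print(counts)  # [3, 3, 3, 1]
```

The row with a retained comma still matches the header width, while the ragged row comes up short. That mismatch is what a viewer would need to detect.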

It was mentioned that miller has an option to work with ragged csvs.

#75 (comment)

This is an open issue to think about how tv should work with these types of files.

Proposal

Since tv is based on pillar, I would like to stand on the shoulders of giants and see what the creators of the fantastic GNU-R pillar library decided to do. There are two components:

  1. Truncated warnings when parsing
  2. A pretty print of readable records

Here is the output.

image

Original Test file:
en_climate_hourly_AB_3012209_05-2021_P1H.csv

@alexhallam
Owner Author

Next steps:

  1. See if I can wrap the reading of the csv in an error type.
  2. Pass the error along and continue reading.
  3. Print the errors (truncated if there are more than 5; add all errors to the current debug option), then print the readable data.
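
The three steps above could look roughly like the sketch below. It is an illustration of the control flow only, not tv's implementation: `read_lenient`, `summarize`, and the `debug` parameter are hypothetical names, and treating "wrong number of fields" as the only error is an assumption. The limit of 5 comes from step 3.

```python
import csv
import io

MAX_SHOWN = 5  # from step 3: truncate when there are more than 5 errors


def read_lenient(text):
    """Parse CSV text, keeping well-formed records and collecting errors.

    Assumes the text has a header row and that a record is "readable"
    when its field count matches the header.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    good, errors = [], []
    for row in reader:
        if len(row) == len(header):
            good.append(row)
        else:
            # Steps 1 and 2: record the problem, then keep reading.
            errors.append(
                f"line {reader.line_num}: expected {len(header)} fields, "
                f"found {len(row)}"
            )
    return header, good, errors


def summarize(errors, debug=False):
    """Step 3: return all errors in debug mode, otherwise at most MAX_SHOWN."""
    shown = errors if debug else errors[:MAX_SHOWN]
    lines = list(shown)
    hidden = len(errors) - len(shown)
    if hidden > 0:
        lines.append(f"... and {hidden} more (use debug to see all)")
    return lines


text = "a,b,c\n1,2,3\n4\n5,6,7\n"
header, good, errors = read_lenient(text)
for line in summarize(errors):
    print(line)   # the one ragged record, reported by line number
print(header)
for row in good:
    print(row)    # the two readable records
```

Collecting errors in a list instead of stopping at the first one is what allows the readable data to be printed after the warnings.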

@alexhallam
Owner Author

@lithiumfrost I wanted to keep you in the loop. I am currently leaning heavily toward the above proposal. This will close #75.
