Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some unit tests #35

Merged
merged 1 commit into from
Jun 18, 2018
Merged

Fix some unit tests #35

merged 1 commit into from
Jun 18, 2018

Conversation

martindiphoorn
Copy link
Collaborator

No description provided.

@benoitx benoitx merged commit 30e41bb into Appendium:master Jun 18, 2018
@benoitx
Copy link
Contributor

benoitx commented Jun 18, 2018

Thanks for this Martin. It is greatly appreciated.

What do you think should happen to 'invalid' lines?
e.g. "hello,world

At the moment it will parse as [hello,world], single column, no double quote...

and this one:
"col1",col2",col3 returns col1 col2" col3

AAARRRRGGG

@benoitx
Copy link
Contributor

benoitx commented Jun 18, 2018

My thinking is: leave the parsing of the line as is BUT have the DataSet parser detect if we have an ODD number of Qualifiers (e.g. " or | or whatever). It is the number is odd -> Do not parse the line and raise an error, the line is malformed. What do you think?

@martindiphoorn
Copy link
Collaborator Author

martindiphoorn commented Jun 18, 2018

If i use a library like flatpack i would be happy that this decision is left to the user of the library. So I would make it a setting. Something like failOnError setting and skipLineOnErrror.

failOnError=true: Stop processing and raise an exception
failOnError=false: Check the skipLineOnError value
skipLineOnError=true: Skip the line if there is an error in it and continue
skipLineOnError=false: Do our best to get the right value

Just an idea, it gives the user some flexibility. And every use case is different.

Some examples:

col1,"col2",col3       // good
col1,"co""l2",col3    // good
"col1,col2,col3"       // valid, its just one column
"col1"",""col2,col3"  // valid, its just one column
"col1,coll2,col3       // invalid, when starting with qualifier it should end also  with it
col1",col2,col3        // invalid, end but no starting
co""l1,col2,col3       // invalid, if qualifiers are escaped the field should be wrapped with it

The big question is, what should be the best effort result?

CSV lint i was talking about is here:
https://csvlint.io/

@martindiphoorn
Copy link
Collaborator Author

Just thinking, should we create an seperate unit test for checking RFC compatibility?

https://www.ietf.org/rfc/rfc4180.txt

@benoitx
Copy link
Contributor

benoitx commented Jun 18, 2018 via email

benoitx added a commit that referenced this pull request Jun 18, 2018
Issue #35 add tests for invalid line.
benoitx added a commit that referenced this pull request Jun 18, 2018
benoitx added a commit that referenced this pull request Jun 18, 2018
@benoitx
Copy link
Contributor

benoitx commented Jul 4, 2018

Hi Martin
Thinking of doing a release but do you know how to build the Maven site with TRAVIS CI? Getting stuck on this... Thanks
Benoit

@martindiphoorn
Copy link
Collaborator Author

martindiphoorn commented Jul 5, 2018

Hi Benout,

i never used TRAVIS CI. I have experience with Gitlab CI.
If you can give me some pointers. I can look in to it.

Martin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants