Header Detection Improvement #45

Open
wants to merge 1 commit into
base: master

@ben-bitdotio ben-bitdotio commented Aug 17, 2021

Summary:
Added type resolution between float and int so that a column mixing the two is no longer treated as having incompatible types.

Tests:
Verified that the following file is correctly predicted to have a header via Detector.has_header().

col1,col2,col3
hello,"hello world", 1.2
world,"hello world", 1.2
test,"hello world 您", 1
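To illustrate why the third column matters, here is a minimal sketch of a type-voting header heuristic in the style of Python's csv.Sniffer. It is not the CleverCSV implementation: the names cell_type, resolve, and has_header_sketch are hypothetical, and the sketch assumes rectangular, comma-delimited input. With int and float resolved to float, col3 is consistently numeric and its non-numeric first cell votes for a header; without resolution the column is discarded and no vote is cast.

```python
import csv
import io

SAMPLE = (
    "col1,col2,col3\n"
    'hello,"hello world", 1.2\n'
    'world,"hello world", 1.2\n'
    'test,"hello world 您", 1\n'
)


def cell_type(value):
    """Return int or float if the cell parses as a number, else None."""
    for cast in (int, float):
        try:
            cast(value)
            return cast
        except ValueError:
            continue
    return None


def resolve(a, b, resolve_numeric=True):
    """Merge two observed cell types into one column type (hypothetical helper)."""
    if a is b:
        return a
    if resolve_numeric and {a, b} == {int, float}:
        return float  # int and float are compatible: widen to float
    return None  # incompatible types: the column cannot vote


def has_header_sketch(sample, resolve_numeric=True):
    """Vote per column: a numeric column with a non-numeric first cell suggests a header."""
    reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
    rows = [row for row in reader if row]  # skip blank lines
    if len(rows) < 2:
        return False
    header, body = rows[0], rows[1:]
    votes = 0
    for col, name in enumerate(header):
        col_type = cell_type(body[0][col])
        for row in body[1:]:
            col_type = resolve(col_type, cell_type(row[col]), resolve_numeric)
        if col_type is None:
            continue  # text or inconsistent column: no evidence either way
        votes += 1 if cell_type(name) is None else -1
    return votes > 0


print(has_header_sketch(SAMPLE))
print(has_header_sketch(SAMPLE, resolve_numeric=False))
```

The resolve_numeric flag exists only to contrast the two behaviours in this sketch; it is not an option in the library.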

Update: I will be unable to contribute to this discussion under this account after today. It appears that I'm unable to modify the assignees list, but @ellie-bitio should be able to follow up if necessary.

@GjjvdBurg
Collaborator

Thanks for opening an issue on this and creating a PR @ben-bitdotio! The header detection code could definitely be improved, but I've been waiting until I have a dataset to evaluate the accuracy of different algorithms. This fix seems pretty harmless though, so I think we can merge it for now.

Would you be able to add a unit test to tests/test_unit/test_detect.py that fails without your fix but passes with your fix? That would be a nice confirmation that it works as expected (the example you give above could work as a test case). Thank you!
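A rough sketch of what such a test could look like, built from the example file in the PR description. The class name, test name, and the trailing newline in the sample are assumptions, and the existing test module may organise its cases differently; the only library call used is Detector.has_header(), as named in the description above.

```python
import importlib.util
import unittest

# Example file from the PR description (trailing newline assumed).
SAMPLE = (
    "col1,col2,col3\n"
    'hello,"hello world", 1.2\n'
    'world,"hello world", 1.2\n'
    'test,"hello world 您", 1\n'
)


@unittest.skipUnless(
    importlib.util.find_spec("clevercsv"), "clevercsv is not installed"
)
class HeaderDetectionTestCase(unittest.TestCase):  # hypothetical class name
    def test_has_header_mixed_int_float(self):
        import clevercsv

        detector = clevercsv.Detector()
        # Expected to fail without the int/float resolution and pass with it.
        self.assertTrue(detector.has_header(SAMPLE))
```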

(cc-ing @ellie-bitio as suggested)
