Show sniffed delimiter on exception #1011

Open
wataash opened this issue Jan 9, 2019 · 2 comments

@wataash

wataash commented Jan 9, 2019

```sh
# colA,colB
# aaaaa...aaaaa zzzzz...zzzzz  \
# ...                           } 10 or 100 rows
# aaaaa...aaaaa zzzzz...zzzzz  /
#
# \___________/ \___________/
#  1000chars     1000chars

# 10 rows
# "," is used as delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(10)]" | csvstat
# => ok

# 100 rows
# " " is used as delimiter
python3 -c "print('colA,colB') ; [print('a'*1000 + ' ' + 'z'*1000) for _ in range(100)]" | csvstat
# => Row 0 has 3 values, but Table only has 2 columns.
```

In the latter case, the sample is trimmed and loses the header colA,colB, so a space (" ") is used as the delimiter.
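To look at the inference in isolation, here is a minimal sketch that runs Python's standard `csv.Sniffer` on the same two inputs. It is not csvkit's code: the candidate delimiter set `",\t; :|"` is an assumption, and the whole generated text is sniffed with no sniff limit, so it does not model how csvkit selects its sample.

```python
import csv

# Assumption: a restricted set of candidate delimiters, similar in spirit to
# what agate-style sniffing passes to csv.Sniffer. Not taken from csvkit.
CANDIDATES = ",\t; :|"


def make_input(rows: int) -> str:
    """Rebuild the report's input: a comma-separated header, then space-separated rows."""
    lines = ["colA,colB"] + ["a" * 1000 + " " + "z" * 1000 for _ in range(rows)]
    return "\n".join(lines) + "\n"


def sniffed_delimiter(sample: str):
    """Return the delimiter csv.Sniffer infers, or None if it cannot decide."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=CANDIDATES).delimiter
    except csv.Error:
        return None


if __name__ == "__main__":
    for rows in (10, 100):
        # The whole generated text is sniffed; no sniff limit is applied here.
        print(f"{rows} rows -> {sniffed_delimiter(make_input(rows))!r}")
```

Because the input contains no quote characters, the stdlib sniffer falls back to a frequency heuristic: it looks for a candidate character that occurs the same number of times on nearly every line. Every data row contains exactly one space and the header contains none, so the space's consistency score grows with the number of rows.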

This behavior was hard for me to figure out. So how about showing which delimiter is used in:

  1. Debug output
     $ csvstat -v ...
     inferred delimiter: ' '
  2. Error message
     $ csvstat -v ...
     Row 0 has 3 values, but Table only has 2 columns (delimiter: ' ').
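The debug-output idea could be as small as one log call next to the sniffing step. A minimal sketch with Python's `logging` module; the logger name `csvtool` and the candidate delimiters are assumptions, and mapping a verbose flag (like the `-v` in the proposal) to the DEBUG level is left to the caller.

```python
import csv
import logging

# Hypothetical logger name; a real tool would use its own.
logger = logging.getLogger("csvtool")

# Assumption: candidate delimiters passed to csv.Sniffer.
CANDIDATES = ",\t; :|"


def sniff_dialect(sample: str):
    """Sniff a dialect and report the inferred delimiter at DEBUG level.

    Returns the sniffed dialect, or None if no delimiter could be inferred.
    """
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=CANDIDATES)
    except csv.Error:
        logger.debug("could not infer a delimiter")
        return None
    logger.debug("inferred delimiter: %r", dialect.delimiter)
    return dialect


if __name__ == "__main__":
    # A verbose flag could map to DEBUG; it is hard-wired here for the demo.
    logging.basicConfig(level=logging.DEBUG, format="%(message)s")
    sniff_dialect("colA colB\n1 2\n3 4\n")  # logs: inferred delimiter: ' '
```

Keeping the message in one helper means every command that sniffs would report the delimiter the same way, instead of each one formatting it separately.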

And how about showing a warning when the input exceeds SNIFF_LIMIT?

$ csvstat -v ...
warning: input (XXX bytes) exceeds SNIFF_LIMIT (YYY bytes), delimiter guessing may be incorrect (NOTE: SNIFF_LIMIT can be changed by -y flag)
warning: guessed delimiter: ' '
Row 0 has 3 values, but Table only has 2 columns.
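A minimal sketch of the proposed warning, assuming sniffing works on a prefix of the input. `sniff_limited` and its `sniff_limit` argument are hypothetical stand-ins for a SNIFF_LIMIT setting; the sketch counts characters of decoded text rather than bytes and returns the warning lines instead of printing them, so the caller decides where they go.

```python
import csv
import sys

# Assumption: candidate delimiters passed to csv.Sniffer.
CANDIDATES = ",\t; :|"


def sniff_limited(text: str, sniff_limit=None):
    """Sniff at most `sniff_limit` characters of `text` (hypothetical helper).

    Returns (delimiter or None, list of warning messages).
    """
    messages = []
    sample = text
    if sniff_limit is not None and len(text) > sniff_limit:
        # Only a prefix is sniffed, so warn that the guess may be off.
        sample = text[:sniff_limit]
        messages.append(
            f"input ({len(text)} chars) exceeds SNIFF_LIMIT ({sniff_limit} chars), "
            "delimiter guessing may be incorrect"
        )
    try:
        delimiter = csv.Sniffer().sniff(sample, delimiters=CANDIDATES).delimiter
    except csv.Error:
        delimiter = None
    messages.append(f"guessed delimiter: {delimiter!r}")
    return delimiter, messages


if __name__ == "__main__":
    text = "colA,colB\n" + "1,2\n" * 20
    delimiter, messages = sniff_limited(text, sniff_limit=40)
    for message in messages:
        print(f"warning: {message}", file=sys.stderr)
```

Cutting the text at a fixed size usually leaves a partial last line (here `1,`), which still counts in the sniffer's frequency statistics; dropping everything after the last newline in the sample is one possible refinement.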
wataash changed the title "Want delimiter to be shown when raise or -v" to "Want delimiter to be shown on exception" on Jan 9, 2019
jpmckinney added this to the 1.0.4 milestone on Feb 7, 2019
@jpmckinney
Member

Thanks - we'll try to do this as part of the next version.

jpmckinney changed the title "Want delimiter to be shown on exception" to "Show sniffed delimiter on exception" on Oct 17, 2023
jpmckinney removed this from the Next version milestone on Oct 17, 2023
@jpmckinney
Member

Hmm, agate raises ValueError for errors like "Row 0 has 3 values, but Table only has 2 columns." in agate/table/__init__.py. We'd have to introduce a new error class (subclassing ValueError, in case anyone catches these). We'd also have to handle it in many places, because we need access to the reader to print the dialect.
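A rough sketch of that shape: a ValueError subclass raised by the table layer, which never sees the dialect, and a caller that owns the `csv.reader` and attaches the delimiter before re-raising. All names here (`RowLengthError`, `build_table`, `load`) are hypothetical, this is not agate's or csvkit's code, and rejecting only rows wider than the header is a simplification for the sketch.

```python
import csv
import io


class RowLengthError(ValueError):
    """Hypothetical error for rows wider than the header.

    Subclassing ValueError keeps existing `except ValueError` handlers working.
    """

    def __init__(self, row_index: int, n_values: int, n_columns: int):
        super().__init__(row_index, n_values, n_columns)
        self.row_index = row_index
        self.n_values = n_values
        self.n_columns = n_columns
        self.delimiter = None  # filled in later by code that can see the reader

    def __str__(self) -> str:
        msg = (f"Row {self.row_index} has {self.n_values} values, "
               f"but Table only has {self.n_columns} columns")
        if self.delimiter is not None:
            msg += f" (delimiter: {self.delimiter!r})"
        return msg + "."


def build_table(rows, column_names):
    """Stand-in for the table layer: it validates rows but never sees the dialect."""
    for i, row in enumerate(rows):
        if len(row) > len(column_names):
            raise RowLengthError(i, len(row), len(column_names))
    return [list(row) for row in rows]


def load(text: str, delimiter: str):
    """Stand-in for the command-line layer: it owns the reader, so it can add the delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    try:
        return build_table(list(reader), header)
    except RowLengthError as err:
        # csv reader objects expose the dialect they parse with.
        err.delimiter = reader.dialect.delimiter
        raise
```

Only the layer that creates the reader needs to change; the table layer keeps raising the same message it always did, and the delimiter appears only when a caller has one to report.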

> Debug output

This is a good idea. As above, we'd have to add it in a lot of places. Happy to merge a PR!

> and, how about showing warning of excessing SNIFF_LIMIT?:

The snifflimit was reduced in 1.0.7 to avoid sniffing huge files (which is very slow). So, this warning would now be emitted too frequently to be useful.
