Incomprehensible UnicodeDecodeError #575

fsaad opened this issue Jul 28, 2017 · 0 comments


fsaad commented Jul 28, 2017

Download the file t1.csv. The offending character is in the last column of the last line.

probcomp-1:/scratch/fsaad/sandbox/preproc% cat t1.csv
tag,version,custom,abstract,datatype,iord,crdr,tlabel
RedeemableCommonStockMember,0001654954-17-000551,1,0,member,D,,Redeemable Common Stock
RedeemableCommonStockValue,0001654954-17-000551,1,0,monetary,I,C,"Common stock subject to possible redemption, at $200,004; 38,364 shares issued and outstanding at redemption value as of October 31, 2016, none as of October 31, 2015"
SupplementalDisclosureOfNoncashInvestingAndFinancingActivitiesAbstract,0001654954-17-000551,1,1,,,,SUPPLEMENTAL DISCLOSURE OF NON�_CASH INVESTING AND FINANCING ACTIVITIES:

Loading the data in bayeslite gives:

probcomp-1:/scratch/fsaad/sandbox/preproc% python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bayeslite
>>> bdb = bayeslite.bayesdb_open(':memory:')
>>> bdb.execute('CREATE TABLE t FROM \'t1.csv\'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/bayesdb.py", line 228, in execute
    self.tracer, self._do_execute, string, bindings)
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/bayesdb.py", line 236, in _maybe_trace
    return meth(string, bindings)
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/bayesdb.py", line 277, in _do_execute
    cursor = bql.execute_phrase(self, phrase, bindings)
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/bql.py", line 113, in execute_phrase
    bdb, phrase.name, phrase.csv, header=True, create=True)
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/read_csv.py", line 37, in bayesdb_read_csv_file
    ifnotexists=ifnotexists)
  File "/scratch/fsaad/bayeslite/build/lib.linux-x86_64-2.7/bayeslite/read_csv.py", line 121, in bayesdb_read_csv
    bdb.sql_execute(sql, [unicode(v, 'utf8').strip() for v in row])
  File "/scratch/fsaad/.pyenv2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8c in position 30: invalid start byte
>>> 
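
The traceback reports only a byte value and an offset within one field, which makes the bad row hard to find. Here is a minimal sketch of how the offending byte could be located, using only the Python standard library. The helper `find_undecodable` and the `sample` bytes are hypothetical illustrations; they are not part of bayeslite, and the sample only imitates the last row of t1.csv.

```python
def find_undecodable(data, encoding='utf-8'):
    """Return (line_number, byte_offset_in_line, byte_value) for the first
    byte in `data` that fails to decode, or None if everything decodes."""
    for lineno, line in enumerate(data.split(b'\n'), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError as e:
            # e.start is the offset of the first undecodable byte in `line`.
            return lineno, e.start, bytearray(line)[e.start]
    return None


# Hypothetical sample: a two-line CSV whose last field contains a stray
# 0x8c byte, the same byte value reported in the traceback.
sample = b'tag,tlabel\nA,SUPPLEMENTAL DISCLOSURE OF NON\x8c_CASH\n'

location = find_undecodable(sample)
if location is not None:
    lineno, offset, value = location
    print('line %d, byte offset %d: 0x%02x' % (lineno, offset, value))

# Decoding with errors='replace' substitutes U+FFFD for the bad byte
# instead of raising, at the cost of losing the original character.
lenient = sample.decode('utf-8', errors='replace')
```

Whether a loader should reject such input with a clearer message (for example, one naming the row and column) or decode leniently is a design choice; the sketch only shows that both are possible with the standard codecs.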