UnicodeError when run example: python -m unittest discover tests/ #225

Open
ruleGreen opened this issue Nov 29, 2019 · 2 comments

@ruleGreen

UnicodeEncodeError: 'ascii' codec can't encode character '\u0100' in position 6: ordinal not in range(128)

======================================================================
ERROR: test_weighted_layers (test_elmo.TestWeightedLayers)

Traceback (most recent call last):
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 118, in test_weighted_layers
self._check_weighted_layer(1.0, do_layer_norm=True, use_top_only=False)
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 28, in _check_weighted_layer
batcher = Batcher(vocab_file, 50)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 204, in __init__
lm_vocab_file, max_token_length
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 117, in __init__
super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 29, in __init__
for line in f:
File "/research/d2/hrwang/pythonlib/anaconda3/envs/tensorflow-gpu/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 820: ordinal not in range(128)

======================================================================
ERROR: test_weighted_layers_no_norm (test_elmo.TestWeightedLayers)

Traceback (most recent call last):
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 121, in test_weighted_layers_no_norm
self._check_weighted_layer(1.0, do_layer_norm=False, use_top_only=False)
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 28, in _check_weighted_layer
batcher = Batcher(vocab_file, 50)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 204, in __init__
lm_vocab_file, max_token_length
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 117, in __init__
super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 29, in __init__
for line in f:
File "/research/d2/hrwang/pythonlib/anaconda3/envs/tensorflow-gpu/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 820: ordinal not in range(128)

======================================================================
ERROR: test_weighted_layers_top_only (test_elmo.TestWeightedLayers)

Traceback (most recent call last):
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 124, in test_weighted_layers_top_only
self._check_weighted_layer(None, do_layer_norm=False, use_top_only=True)
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 28, in _check_weighted_layer
batcher = Batcher(vocab_file, 50)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 204, in __init__
lm_vocab_file, max_token_length
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 117, in __init__
super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 29, in __init__
for line in f:
File "/research/d2/hrwang/pythonlib/anaconda3/envs/tensorflow-gpu/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 820: ordinal not in range(128)

======================================================================
ERROR: test_bilm (test_model.TestBidirectionalLanguageModel)

Traceback (most recent call last):
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_model.py", line 56, in test_bilm
batcher = Batcher(vocab_file, 50)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 204, in __init__
lm_vocab_file, max_token_length
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 117, in __init__
super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 29, in __init__
for line in f:
File "/research/d2/hrwang/pythonlib/anaconda3/envs/tensorflow-gpu/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 820: ordinal not in range(128)

======================================================================
ERROR: test_bilm_token (test_model.TestBidirectionalLanguageModelTokenInput)

Traceback (most recent call last):
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_model.py", line 161, in test_bilm_token
fout.write('\n'.join(all_tokens))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2022' in position 488: ordinal not in range(128)


Ran 24 tests in 200.365s

FAILED (errors=14)

@ruleGreen
Author

packages in environment

Name                  Version         Build         Channel
_libgcc_mutex         0.1             main
backports-weakref     1.0rc1          pypi_0        pypi
bilm                  0.1.post5       pypi_0        pypi
bleach                1.5.0           pypi_0        pypi
ca-certificates       2019.10.16      0
certifi               2018.8.24       py35_1
h5py                  2.10.0          pypi_0        pypi
html5lib              0.9999999       pypi_0        pypi
libedit               3.1.20181209    hc058e9b_0
libffi                3.2.1           hd88cf55_4
libgcc-ng             9.1.0           hdf63c60_0
libstdcxx-ng          9.1.0           hdf63c60_0
markdown              2.2.0           pypi_0        pypi
ncurses               6.1             he6710b0_1
numpy                 1.17.4          pypi_0        pypi
openssl               1.0.2t          h7b6447c_1
pip                   19.3.1          pypi_0        pypi
protobuf              3.11.0          pypi_0        pypi
python                3.5.6           hc3d631a_0
readline              7.0             h7b6447c_5
setuptools            40.2.0          py35_0
six                   1.13.0          pypi_0        pypi
sqlite                3.30.1          h7b6447c_0
tensorflow-gpu        1.2.0           pypi_0        pypi
tk                    8.6.8           hbc83047_0
tqdm                  4.39.0          py_0
werkzeug              0.16.0          pypi_0        pypi
wheel                 0.31.1          py35_0
xz                    5.2.4           h14c3975_4
zlib                  1.2.11          h7b6447c_3

@tsteffek

Might be too late for you, but maybe someone else runs into the same problems.

I had a similar problem with one of the other tests and fixed it by specifying the encoding for the file.
So in your case, adding encoding='utf-8' to the open call in

with open(filename) as f:

will probably fix it.
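
For anyone landing here later, here is a minimal sketch of that suggestion. It is not the actual bilm code: `read_vocab` and `write_tokens` are hypothetical helpers standing in for the `open` call at bilm/data.py line 29 and the `fout.write` call at tests/test_model.py line 161 shown in the tracebacks. When no encoding is passed, `open()` falls back to `locale.getpreferredencoding(False)`, which on Python 3.5 is typically ASCII under the C/POSIX locale, and that would explain the 'ascii' codec errors on both the read and the write side.

```python
import locale
import os
import tempfile


def read_vocab(filename):
    """Hypothetical helper: read one token per line, decoding as UTF-8."""
    # Passing encoding explicitly avoids depending on the process locale.
    with open(filename, encoding='utf-8') as f:
        return [line.rstrip('\n') for line in f]


def write_tokens(filename, tokens):
    """Hypothetical helper: write tokens one per line, encoding as UTF-8."""
    with open(filename, 'w', encoding='utf-8') as fout:
        fout.write('\n'.join(tokens))


# The encoding open() would use if none were given; ASCII here would
# reproduce the errors reported above.
print(locale.getpreferredencoding(False))

# Round-trip the two non-ASCII characters named in the error messages.
tokens = ['the', '\u2022', '\u0100']
path = os.path.join(tempfile.mkdtemp(), 'vocab.txt')
write_tokens(path, tokens)
assert read_vocab(path) == tokens
```

If editing the source is not an option, setting a UTF-8 locale in the shell before running the tests (for example via the LANG or LC_ALL environment variables) may also change the default, but that depends on which locales are installed on the machine.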
