UnicodeError when running example: python -m unittest discover tests/ #225

ruleGreen · 2019-11-29T09:52:24Z

UnicodeEncodeError: 'ascii' codec can't encode character '\u0100' in position 6: ordinal not in range(128)

======================================================================
ERROR: test_weighted_layers (test_elmo.TestWeightedLayers)

Traceback (most recent call last):
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 118, in test_weighted_layers
self._check_weighted_layer(1.0, do_layer_norm=True, use_top_only=False)
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 28, in _check_weighted_layer
batcher = Batcher(vocab_file, 50)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 204, in __init__
lm_vocab_file, max_token_length
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 117, in __init__
super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 29, in __init__
for line in f:
File "/research/d2/hrwang/pythonlib/anaconda3/envs/tensorflow-gpu/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 820: ordinal not in range(128)

======================================================================
ERROR: test_weighted_layers_no_norm (test_elmo.TestWeightedLayers)

Traceback (most recent call last):
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 121, in test_weighted_layers_no_norm
self._check_weighted_layer(1.0, do_layer_norm=False, use_top_only=False)
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 28, in _check_weighted_layer
batcher = Batcher(vocab_file, 50)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 204, in __init__
lm_vocab_file, max_token_length
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 117, in __init__
super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 29, in __init__
for line in f:
File "/research/d2/hrwang/pythonlib/anaconda3/envs/tensorflow-gpu/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 820: ordinal not in range(128)

======================================================================
ERROR: test_weighted_layers_top_only (test_elmo.TestWeightedLayers)

Traceback (most recent call last):
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 124, in test_weighted_layers_top_only
self._check_weighted_layer(None, do_layer_norm=False, use_top_only=True)
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_elmo.py", line 28, in _check_weighted_layer
batcher = Batcher(vocab_file, 50)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 204, in __init__
lm_vocab_file, max_token_length
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 117, in __init__
super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 29, in __init__
for line in f:
File "/research/d2/hrwang/pythonlib/anaconda3/envs/tensorflow-gpu/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 820: ordinal not in range(128)

======================================================================
ERROR: test_bilm (test_model.TestBidirectionalLanguageModel)

Traceback (most recent call last):
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_model.py", line 56, in test_bilm
batcher = Batcher(vocab_file, 50)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 204, in __init__
lm_vocab_file, max_token_length
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 117, in __init__
super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs)
File "/research/d2/hrwang/biLM/bilm-tf/bilm/data.py", line 29, in __init__
for line in f:
File "/research/d2/hrwang/pythonlib/anaconda3/envs/tensorflow-gpu/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 820: ordinal not in range(128)

======================================================================
ERROR: test_bilm_token (test_model.TestBidirectionalLanguageModelTokenInput)

Traceback (most recent call last):
File "/research/d2/hrwang/biLM/bilm-tf/tests/test_model.py", line 161, in test_bilm_token
fout.write('\n'.join(all_tokens))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2022' in position 488: ordinal not in range(128)

Ran 24 tests in 200.365s

FAILED (errors=14)

ruleGreen · 2019-11-29T09:53:55Z

packages in environment

tsteffek · 2020-03-27T12:37:25Z

Might be too late for you, but maybe someone else runs into the same problems.

I had a similar problem with one of the other tests and fixed it by specifying the encoding when opening the file.
So in your case, adding encoding='utf-8' to the open call in

bilm-tf/bilm/data.py

Line 27 in 7cffee2

with open(filename) as f:

will probably fix it.
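
For anyone who wants to try the idea outside the repo first, here is a minimal sketch of it. It is not the bilm-tf code: the helper names write_tokens and read_tokens and the sample tokens are made up for illustration. It assumes the failures come from open() falling back to an ASCII default encoding, which is what the 'ascii' codec messages in the tracebacks suggest. The same keyword argument should apply to the write in tests/test_model.py that raises the UnicodeEncodeError, though I have not checked that against the repo.

```python
import os
import tempfile


def write_tokens(path, tokens):
    # Hypothetical helper: write one token per line, always as UTF-8,
    # regardless of the locale's default encoding.
    with open(path, "w", encoding="utf-8") as fout:
        fout.write("\n".join(tokens))


def read_tokens(path, encoding="utf-8"):
    # Hypothetical helper: read one token per line with an explicit encoding.
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Made-up sample tokens; '\u2022' and '\u0100' are the characters named
# in the error messages above.
tokens = ["<S>", "</S>", "the", "\u2022", "\u0100"]

with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, "vocab.txt")
    write_tokens(vocab_path, tokens)

    # With an explicit UTF-8 encoding the file round-trips cleanly.
    loaded = read_tokens(vocab_path)

    # Forcing ASCII reproduces the kind of failure seen in the tracebacks.
    try:
        read_tokens(vocab_path, encoding="ascii")
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True
```

Passing the encoding explicitly keeps the behaviour the same on every machine, whereas relying on the default makes it depend on the locale settings of the environment.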
