Getting UnicodeDecodeError accessing trie read from file #18

Open
jottos opened this issue Jan 26, 2015 · 4 comments

Comments

jottos commented Jan 26, 2015

Hi, I'm consistently getting the following error when trying to access a trie after loading or reading it from a file.

./read_trie_test.py
Traceback (most recent call last):
  File "./read_trie_test.py", line 18, in <module>
    print(t.restore_key(0))
  File "marisa_trie.pyx", line 324, in marisa_trie.Trie.restore_key (src/marisa_trie.cpp:6365)
  File "marisa_trie.pyx", line 334, in marisa_trie.Trie.restore_key (src/marisa_trie.cpp:6299)
  File "marisa_trie.pyx", line 62, in marisa_trie._get_key (src/marisa_trie.cpp:1615)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 10: invalid start byte

I get the same error if the following code is used...

  for k in t.keys():
      print(k)

and again the same error if I use:

  t['someKey']  # or t[u'somekey']

The trie file reads in without any error. I've written the file using both trie.save() and trie.write(), and when writing the file I used codecs.open() and its write() to force UTF-8 encoding.

I'm not sure whether this is similar to issue #10.

Author

jottos commented Jan 26, 2015

OK, never mind. I was taking the examples a little too literally.

I was loading a saved BytesTrie into a constructed Trie(). Once I switched to a constructed BytesTrie(), it worked fine.
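
The 0xff byte in the traceback fits this explanation. As a rough sketch, assuming BytesTrie stores each pair as the UTF-8 key, a 0xff separator byte, and then the raw payload (my understanding of the implementation, not something stated in this thread), a plain Trie would try to decode that whole entry as UTF-8 text and fail at the separator. The helper names below are hypothetical and the snippet uses only the standard library:

```python
# Assumption: a BytesTrie entry is laid out as key bytes + 0xff + payload.
VALUE_SEPARATOR = b"\xff"  # assumed separator byte, never valid in UTF-8


def pack_entry(key, payload):
    """Build one entry the way a BytesTrie is assumed to store it."""
    return key.encode("utf-8") + VALUE_SEPARATOR + payload


def decode_as_plain_key(raw):
    """Decode an entry the way a plain Trie would: as one UTF-8 string.

    Returns (text, None) on success or (None, error) on failure.
    """
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


raw = pack_entry("someKey123", b"payload")
text, error = decode_as_plain_key(raw)
print(error)  # the failure position equals the length of the encoded key
```

With a 10-byte key, the decode error lands on position 10, which matches the position reported in the traceback above.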

Member

kmike commented Jan 26, 2015

I'm glad it is not a bug in the marisa-trie source code :)
Do you have any suggestions about how to change the docs to make them more clear regarding this?

Author

jottos commented Jan 29, 2015

So am I :)

As for the documentation: at the end of the load/save section, I'd just call out that a trie constructed with Trie() will not correctly load a RecordTrie or a BytesTrie file, even though the load itself does not fail. You need to construct the same trie class that was used to save the file.

Alternatively, the load() methods could raise an exception if a trie file of the wrong type is presented.


rspeer commented Apr 10, 2017

Part of the problem here is that the BytesTrie class should offer a static method for loading. The thought process that I think both jottos and I encountered was:

  • Okay, my trie is saved, now I want to load it
  • Huh, that's weird, the load method requires you to already have a trie
  • I guess I'll create an empty trie first, how do I do that? Oh right, marisa_trie.Trie().

If you could call BytesTrie.load('trie.marisa') as a static method, it would be easier to not go astray.
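
Until something like that exists, a small wrapper can give a similar one-step feel. This is only a sketch: `load_trie` is a hypothetical helper, not part of marisa-trie, and it assumes nothing about the trie class beyond a constructor and a `load(path)` method.

```python
def load_trie(trie_cls, path, *args, **kwargs):
    """Construct a trie of the requested class, then load a saved file into it.

    trie_cls: the class that was used to save the file (for example BytesTrie).
    args/kwargs: forwarded to the constructor, for classes that need extra
    arguments before they can be constructed.
    """
    trie = trie_cls(*args, **kwargs)
    trie.load(path)
    return trie
```

A call such as `load_trie(marisa_trie.BytesTrie, 'trie.marisa')` keeps the class name visible at the point of loading, which makes it harder to reach for a plain `Trie()` by accident. It does not check that the file actually matches the class, so passing the wrong class would still fail later in the same way.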
