Repeated keys #40

Open
antongulikov opened this issue Oct 19, 2017 · 2 comments

Comments

@antongulikov
Contributor

antongulikov commented Oct 19, 2017

import marisa_trie
a = [(u'1', b'1'), (u'1', b'2')]  # BytesTrie values must be bytes
tr = marisa_trie.BytesTrie(a)
print(tr.keys())

This will output [u'1', u'1'].

I think this function should return [u'1'].

I'm ready to fix it if someone confirms that this is a bug.

@superbobry
Member

Woah, looks like a bug to me. Interestingly, it is not present in marisa_trie.Trie:

>>> marisa_trie.Trie(["foo", "foo"]).keys()
['foo']
>>> marisa_trie.BytesTrie([("foo", b"1"), ("foo", b"2")]).keys()
['foo', 'foo']
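
Until this is settled, a caller could de-duplicate on their own side. The following is only a minimal sketch: it assumes nothing beyond the trie exposing a keys() method that returns a list, and unique_keys is a hypothetical helper, not part of marisa_trie.

```python
def unique_keys(trie):
    """Return trie.keys() with duplicates dropped, keeping first-seen order."""
    seen = set()
    result = []
    for key in trie.keys():
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

With the BytesTrie from the example above, unique_keys(trie) would be expected to yield each key once, at the cost of one extra pass over the list.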

@antongulikov
Contributor Author

antongulikov commented Oct 20, 2017

@superbobry
At the same time, the tests expect this behavior: https://github.com/pytries/marisa-trie/blob/0.7.5/tests/test_bytes_trie.py#L106.

By the way, I figured out why this happens:
For the Trie class everything works as expected. For BytesTrie and the classes that inherit from it, the constructor transforms each (key, value) pair into new_key = key + separator + value. The keys and iterkeys methods then iterate over the set of new_keys and cut each one back to the original key:

for i in range(0, ag.key().length()):
  if raw_key[i] == self._c_value_separator:
    key = raw_key[:i].decode('utf8')

As a result, the same key is appended to res several times, once for each value stored under it.
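
The mechanism described above can be reproduced without the library. The sketch below is a pure-Python imitation, not the actual marisa_trie code: the SEPARATOR value, the sorted list standing in for the trie, and all function names are assumptions made for illustration. It shows how splitting each combined key at the separator yields repeated keys, and one possible way to drop the repeats.

```python
SEPARATOR = b"\xff"  # assumed separator byte; it never occurs in UTF-8 text


def build_raw_keys(pairs):
    """Imitate the constructor: one combined key per (key, value) pair."""
    return sorted(key.encode("utf8") + SEPARATOR + value for key, value in pairs)


def keys_with_duplicates(raw_keys):
    """Cut each combined key at the first separator, as the quoted loop does."""
    res = []
    for raw_key in raw_keys:
        i = raw_key.index(SEPARATOR)
        res.append(raw_key[:i].decode("utf8"))
    return res


def keys_deduplicated(raw_keys):
    """Same as above, but remember which keys were already emitted."""
    seen = set()
    res = []
    for key in keys_with_duplicates(raw_keys):
        if key not in seen:
            seen.add(key)
            res.append(key)
    return res


pairs = [("foo", b"1"), ("foo", b"2"), ("bar", b"3")]
raw = build_raw_keys(pairs)
print(keys_with_duplicates(raw))  # ['bar', 'foo', 'foo']
print(keys_deduplicated(raw))     # ['bar', 'foo']
```

Whether the real methods should de-duplicate this way is a design question for the maintainers, since the linked test currently expects the repeated keys.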
