Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge luikore/hat-trie? #20

Open
canadaduane opened this issue Sep 15, 2014 · 7 comments
Open

Merge luikore/hat-trie? #20

canadaduane opened this issue Sep 15, 2014 · 7 comments

Comments

@canadaduane
Copy link
Contributor

I'm really happy to have found hat-trie and want to give back. However, my use case is one that uses luikore's ruby gem, backed by a modified hat-trie. Rather than give back to just one repository, I wonder if it would be possible to merge his changes back and then discuss possible improvements in documentation and/or readability in code? Related: luikore/triez#4

@kmike
Copy link
Contributor

kmike commented Sep 15, 2014

luikore's fork has some nice features, but when I last checked it I didn't like the API: instead of being based on callbacks it should have been based on iterator objects (which hold a state of the iteration).

Callback-based C API is slower, and it limits what can be done in a wrapper. With a callback-based interface you can't implement iterkeys() / iteritems() / itervalues() methods and step-by-step walking guided by Python code (or at least it is far from being straightforward). There was a similar API limitation in datrie, and as I recall after switching to iterator objects the relevant library parts became faster (~10% when used from C, more when using with Python), and it became possible to implement features like .iterkey(prefix) efficiently.

@canadaduane
Copy link
Contributor Author

It looks to me like his hattrie_walk is implemented with a callback, while hattrie_iter_with_prefix is done with an iterator. I'm not a C expert, however, so I'd defer to you if you told me different :) If I have that right, perhaps you could include hattrie_iter_with_prefix? And then we could work on an iterator-based hattrie_walk?

@kmike
Copy link
Contributor

kmike commented Sep 15, 2014

I'm not a C expert either, and I'm not an author of this repo - let's wait for @dcjones :)

@kmike
Copy link
Contributor

kmike commented Sep 15, 2014

See also: #10

@canadaduane
Copy link
Contributor Author

Ah! I see, you implemented https://github.com/kmike/hat-trie ? Very cool. It seems like there would be a lot of benefit in making sure dcjones' repository works for all derivatives then (Python, Ruby, etc.). Thanks!

@dcjones
Copy link
Owner

dcjones commented Sep 16, 2014

I'll have a go at merging @luikore's changes. At a glance, it looks like it just adds features without breaking backwards compatibility, so it should be possible to have one version that works for everyone.

The use of callbacks does make it harder to write language bindings, but it seems limited to hattrie_walk function. Since @luikore also added a hattrie_iter_with_prefix function, maybe it isn't necessary to use that function for bindings anyway, since you could implement .iterkeys(prefix) without it.

@luikore Would you be open to this? I think it would be great if we pooled our efforts, unless there is a compelling reason to maintain two version. I'd be happy to give you commit access to this repo as well.

@luikore
Copy link
Contributor

luikore commented Sep 17, 2014

@dcjones Thanks! I'd like to make a PR with iter_with_prefix first. (or name it iter_under_prefix ?)

Though, the walk function does a different thing from iter_with_prefix. For example, if we walk into a trie {"foobar" => 1} with "fooxip", it iterates over "f", "fo", "foo" and then stops, then we know "foobar" and "fooxip" have a longest common prefix "foo". With this and a postfix-trie, we can implement a longest-common-substring algorithm 😃 It is badly named. I wish some one has better idea on the name...

On the callback interface: it is very simple to implement, and easier to lock, and not always slower (function call is very fast on some modern CPUs, and heap iterator may make a page fault ...), and (mainly) I was too lazy to extract the loop context into a new iterator. But I like the unified API too 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants