-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
50% speed increase for suggest call #41
Conversation
…ed suggest to use self.checkExact rather than self.check as we would only be looking for exact matches here
good work :) one fix you can do is that in https://github.com/Thomas101/Typo.js/blob/bb9862bd6d0c81fa0ebdc848bdd0ebb7b222f644/typo/typo.js#L932 it's enought to do index[v]=true; which will save some memory. But in general the problem with your solution is memory usage. doing ed1.forEach(function(word) { ed2.add(word) }); creates a huge array (easily over 100,000 words), your WordList then adds another array (index) of the same size and another hash (words) of the same size, and Set probably does the same internally. All this on top of the dictionaries that typo loads to memory. In #45 I optimized the memory use of suggest to ~ 2* edit1 that is ~1000 words. This can be further optimized to 1 * edit1 - when you think about it we can run 'known' inside the iteration that builds ed2 without storing the ed2 values as we never need them again. I found I got huge speed gains as well - I think not holding these huge structures results in less paging and everything going much faster. But that kind of solution and async implementation still leaves the issue of duplicate entries. I wonder if a more clever edit1 algorithm can not produce duplicate in the first place. Or something like https://github.com/itslenny/SymSpell.js/blob/master/dist/SymSpell.js Cheers |
I agree on the fix, I don't know what came over me there for adding the value in twice! This patch was the lowest hanging fruit I could find with a pretty big bang for buck in terms of speed improvements. There's definately further optimisation that can be done on the suggestion algorithm as a whole. Both in terms of speed and memory. |
Thanks for the contribution @Thomas101 and @kofifus, and sorry for the delay in merging. |
I had to revert this -- the quality of the |
The results should have been the same. Do you have any sample suggestion sets? Were you using node or the browser? |
I was running the tests in the browser via the Chrome extension. An example: |
I realized today why the suggestions changed -- Typo was using the fact that a word added to the |
I've done some performance tweaking for the suggest call. No changes to api, no changes to results. Algorithm remains the same. This removes some redundancy in processing.
edits1
by about a half just by removing duplicatesself.checkExact
rather thanself.check
as we will only be looking for exact matches here andself.check
just introduces extra processing.Here's some stats from some test words...
takr
es6: 52ms
polyfil: 54ms
original: 115ms
55% faster
hoouse
es6: 134ms
polyfil: 133ms
original: 200ms
33% faster
suggestionsr
es6: 590ms
polyfil: 598ms
original: 1201ms
51% faster
dsdsfdsfdsfdfsdfsdfsdf
es6: 3105ms
polyfil: 3116ms
original: 5676ms
46% faster