50% speed increase for suggest call #41

Thomas101 · 2016-07-02T11:54:57Z

I've done some performance tweaking for the suggest call. No changes to api, no changes to results. Algorithm remains the same. This removes some redundancy in processing.

Now uses a stripped down version of es6 sets (where available) or a polyfill when required. This reduces the amount of suggestions generate by edits1 by about a half just by removing duplicates
When checking the suggested results it now uses self.checkExact rather than self.check as we will only be looking for exact matches here and self.check just introduces extra processing.

Here's some stats from some test words...

takr
es6: 52ms
polyfil: 54ms
original: 115ms
55% faster

hoouse
es6: 134ms
polyfil: 133ms
original: 200ms
33% faster

suggestionsr
es6: 590ms
polyfil: 598ms
original: 1201ms
51% faster

dsdsfdsfdsfdfsdfsdfsdf
es6: 3105ms
polyfil: 3116ms
original: 5676ms
46% faster

…ed suggest to use self.checkExact rather than self.check as we would only be looking for exact matches here

kofifus · 2016-09-04T22:26:04Z

good work :)

one fix you can do is that in https://github.com/Thomas101/Typo.js/blob/bb9862bd6d0c81fa0ebdc848bdd0ebb7b222f644/typo/typo.js#L932

it's enought to do index[v]=true; which will save some memory.

But in general the problem with your solution is memory usage. doing ed1.forEach(function(word) { ed2.add(word) }); creates a huge array (easily over 100,000 words), your WordList then adds another array (index) of the same size and another hash (words) of the same size, and Set probably does the same internally. All this on top of the dictionaries that typo loads to memory.

In #45 I optimized the memory use of suggest to ~ 2* edit1 that is ~1000 words. This can be further optimized to 1 * edit1 - when you think about it we can run 'known' inside the iteration that builds ed2 without storing the ed2 values as we never need them again. I found I got huge speed gains as well - I think not holding these huge structures results in less paging and everything going much faster.

But that kind of solution and async implementation still leaves the issue of duplicate entries. I wonder if a more clever edit1 algorithm can not produce duplicate in the first place. Or something like https://github.com/itslenny/SymSpell.js/blob/master/dist/SymSpell.js

Cheers

Thomas101 · 2016-09-05T14:00:23Z

I agree on the fix, I don't know what came over me there for adding the value in twice! This patch was the lowest hanging fruit I could find with a pretty big bang for buck in terms of speed improvements. There's definately further optimisation that can be done on the suggestion algorithm as a whole. Both in terms of speed and memory.

cfinke · 2016-09-20T00:41:30Z

Thanks for the contribution @Thomas101 and @kofifus, and sorry for the delay in merging.

cfinke · 2016-09-20T00:52:21Z

I had to revert this -- the quality of the suggest() results dropped noticeably. @Thomas101 Was there an intended change to functionality, or was it supposed to be performance changes only?

Thomas101 · 2016-09-20T16:52:02Z

The results should have been the same. Do you have any sample suggestion sets? Were you using node or the browser?

cfinke · 2016-09-25T00:11:07Z

I was running the tests in the browser via the Chrome extension.

An example: suggest( 'speling' ) returned ["seeking", "spieling", "pelting"], which does not seem as helpful as the old [ "spelling", "spieling", "spewing" ]. All of the old suggestions were one modification away from the argument to suggest(), but two of the new suggestions are two modifications away.

cfinke · 2017-02-13T05:23:28Z

I realized today why the suggestions changed -- Typo was using the fact that a word added to the edits1() return value multiple times in order to weight that word as a more useful spelling suggestion. The WordList class can't take advantage of this because it uses a hash to store those values.

Updated suggest to use es6 sets or a polyfill if not available. Chang…

bb9862b

…ed suggest to use self.checkExact rather than self.check as we would only be looking for exact matches here

cfinke merged commit 985c0db into cfinke:master Sep 20, 2016

cfinke mentioned this pull request Sep 20, 2016

Revert "50% speed increase for suggest call" #51

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

50% speed increase for suggest call #41

50% speed increase for suggest call #41

Thomas101 commented Jul 2, 2016

kofifus commented Sep 4, 2016 •

edited

Thomas101 commented Sep 5, 2016

cfinke commented Sep 20, 2016

cfinke commented Sep 20, 2016

Thomas101 commented Sep 20, 2016

cfinke commented Sep 25, 2016

cfinke commented Feb 13, 2017

50% speed increase for suggest call #41

50% speed increase for suggest call #41

Conversation

Thomas101 commented Jul 2, 2016

kofifus commented Sep 4, 2016 • edited

Thomas101 commented Sep 5, 2016

cfinke commented Sep 20, 2016

cfinke commented Sep 20, 2016

Thomas101 commented Sep 20, 2016

cfinke commented Sep 25, 2016

cfinke commented Feb 13, 2017

kofifus commented Sep 4, 2016 •

edited