High CPU and memory usage with long misspelled words #56

Open
jhuckaby opened this issue Feb 12, 2017 · 7 comments

@jhuckaby

Hello, and thank you for developing this module. We need more pure JavaScript solutions like this!

I noticed that when trying to look up suggestions for long words, the library seems to chug resources and lag pretty hard. The word djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf takes over 7 seconds to process using typo-js, and the Node process ends up eating over 800 MB of RAM.

Example code:

var Typo = require("typo-js");
var dictionary = new Typo( "en_US" );

var time_start = (new Date()).getTime();
var word = 'djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf';

var correct = dictionary.check(word);

if (!correct) {
	console.log( word + " is NOT spelled correctly." );
	var suggestions = dictionary.suggest(word);
	console.log( "Suggestions: " + JSON.stringify(suggestions) );
}
else {
	console.log( word + " is spelled correctly." );
}

var elapsed = Math.floor( (new Date()).getTime() - time_start );
var mem = process.memoryUsage();

console.log( elapsed + "ms elapsed, " + mem.rss + " bytes used" );

Output:

djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf is NOT spelled correctly.
Suggestions: []
7743ms elapsed, 857202688 bytes used

This is on a late 2016 MacBook Pro (2.9 GHz Intel Core i7) with OS X 10.12.3 and Node.js v6.9.1.

@cfinke
Owner

cfinke commented Feb 12, 2017

Thanks for the report; I've confirmed the same buggy behavior on my own machine.

Not only does it take longer to retrieve suggestions for longer words, the time per letter itself increases with the number of letters in the misspelled word, so the total time grows much faster than linearly:

5 letters: 22 ms per letter
6 letters: 17 ms per letter
7 letters: 19 ms per letter
8 letters: 23 ms per letter
9 letters: 24 ms per letter
10 letters: 27 ms per letter
11 letters: 30 ms per letter
12 letters: 49 ms per letter
13 letters: 49 ms per letter
14 letters: 51 ms per letter
15 letters: 63 ms per letter
16 letters: 59 ms per letter
17 letters: 62 ms per letter
18 letters: 76 ms per letter
19 letters: 72 ms per letter
20 letters: 75 ms per letter
21 letters: 86 ms per letter
22 letters: 110 ms per letter
23 letters: 179 ms per letter
24 letters: 201 ms per letter
25 letters: 210 ms per letter
26 letters: 196 ms per letter
27 letters: 114 ms per letter
28 letters: 149 ms per letter
29 letters: 243 ms per letter
30 letters: 286 ms per letter
31 letters: 218 ms per letter
32 letters: 183 ms per letter
33 letters: 268 ms per letter
34 letters: 324 ms per letter
35 letters: 333 ms per letter
36 letters: 239 ms per letter
37 letters: 352 ms per letter
38 letters: 354 ms per letter
39 letters: 366 ms per letter
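
One possible explanation for this growth can be sketched with simple counting. The block below assumes a Norvig-style generator that builds every string one edit away (delete, transpose, replace, insert) over a 26-letter alphabet and then expands each of those again for two edits. That is an assumption about the general approach, not a description of the actual Typo.js code, and the function names are hypothetical.

```javascript
// Assumption: Norvig-style candidate generation over a 26-letter alphabet.
// This illustrates the growth rate only; it is not Typo.js's actual code.
var ALPHABET_SIZE = 26;

// Raw (not de-duplicated) strings one edit away from a word of length n.
function rawEdits1Count(n) {
	var deletions = n;
	var transpositions = Math.max(n - 1, 0);
	var replacements = ALPHABET_SIZE * n;
	var insertions = ALPHABET_SIZE * (n + 1);
	return deletions + transpositions + replacements + insertions;
}

// Rough estimate for two edits: every edit-1 string is expanded again.
// Edit-1 strings vary in length by one, so this is only approximate.
function roughEdits2Count(n) {
	return rawEdits1Count(n) * rawEdits1Count(n);
}

[5, 10, 20, 37].forEach(function (n) {
	console.log(
		n + " letters: " + rawEdits1Count(n) + " edit-1, ~" +
		roughEdits2Count(n) + " edit-2 candidates"
	);
});
```

Under these assumptions the edit-2 candidate count grows roughly with the square of the word length, reaching about four million strings for a 37-letter word. That would be consistent with both the rising per-letter times and the memory usage reported above, although the link is a guess rather than a measurement.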

@cfinke
Owner

cfinke commented Feb 13, 2017

bf580be helps by not saving edit-2-distance possible suggestions in memory unless they're actual dictionary words.
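
The idea described in that commit message can be sketched roughly as follows. This is a minimal illustration, not the code from bf580be: `edits1` and `knownEdits2` are hypothetical names, the alphabet is assumed to be a-z, and the dictionary is modeled as a plain `Set`.

```javascript
// Hypothetical sketch; these names are illustrative, not the Typo.js API.
var ALPHABET = "abcdefghijklmnopqrstuvwxyz";

// All strings one edit (delete, transpose, replace, insert) away from `word`.
function edits1(word) {
	var results = new Set();
	for (var i = 0; i <= word.length; i++) {
		var head = word.slice(0, i);
		var tail = word.slice(i);
		if (tail.length > 0) results.add(head + tail.slice(1)); // deletion
		if (tail.length > 1) {
			results.add(head + tail[1] + tail[0] + tail.slice(2)); // transposition
		}
		for (var j = 0; j < ALPHABET.length; j++) {
			var c = ALPHABET[j];
			if (tail.length > 0) results.add(head + c + tail.slice(1)); // replacement
			results.add(head + c + tail); // insertion
		}
	}
	return results;
}

// Expand one edit-1 string at a time and keep only real dictionary words,
// so the full edit-2 candidate set is never held in memory at once.
function knownEdits2(word, dictionary) {
	var known = new Set();
	edits1(word).forEach(function (first) {
		edits1(first).forEach(function (second) {
			if (dictionary.has(second)) known.add(second);
		});
	});
	return known;
}

var dictionary = new Set(["hello", "help", "hell", "world"]);
console.log(Array.from(knownEdits2("helo", dictionary)));
```

Note that this only changes what is retained, not how many candidates are generated, so it would be expected to lower memory usage without changing the lookup time much.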

@jhuckaby
Author

Still takes about 7 seconds to look up suggestions for djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf on my machine, but you have definitely reduced the memory footprint. It's down to 200 MB. Nice job!

djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf is NOT spelled correctly.
Suggestions: []
6867ms elapsed, 211562496 bytes used

I wonder if there is a way to reject suggestions for long words faster, since the dictionary contains so few long words. For example, you could index all words by character length, so a word only has to be compared against other words of similar length.
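
A minimal sketch of that length-index idea, assuming the dictionary is available as a flat array of words. `buildLengthIndex`, `levenshtein`, and `suggestByLength` are hypothetical names, not part of Typo.js. The distance function here is plain Levenshtein, which counts a transposition as two edits, so results can differ slightly from a generator that treats a swap as one edit.

```javascript
// Hypothetical sketch; none of these functions exist in Typo.js.

// Group dictionary words by length: Map<number, string[]>.
function buildLengthIndex(words) {
	var index = new Map();
	words.forEach(function (w) {
		if (!index.has(w.length)) index.set(w.length, []);
		index.get(w.length).push(w);
	});
	return index;
}

// Plain Levenshtein distance (insert, delete, replace) using rolling rows.
function levenshtein(a, b) {
	var prev = [];
	for (var j = 0; j <= b.length; j++) prev.push(j);
	for (var i = 1; i <= a.length; i++) {
		var curr = [i];
		for (var k = 1; k <= b.length; k++) {
			var cost = a[i - 1] === b[k - 1] ? 0 : 1;
			curr.push(Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost));
		}
		prev = curr;
	}
	return prev[b.length];
}

// A word within maxDistance edits can differ in length by at most
// maxDistance, so only those length buckets need to be scanned.
function suggestByLength(index, word, maxDistance) {
	var suggestions = [];
	var min = word.length - maxDistance;
	var max = word.length + maxDistance;
	for (var len = min; len <= max; len++) {
		(index.get(len) || []).forEach(function (candidate) {
			if (levenshtein(word, candidate) <= maxDistance) {
				suggestions.push(candidate);
			}
		});
	}
	return suggestions;
}

var index = buildLengthIndex(["hello", "help", "hell", "world"]);
console.log(suggestByLength(index, "helo", 2));
console.log(suggestByLength(index, "djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf", 2));
```

With this layout a 37-letter input only touches the buckets for lengths 35 to 39, which are likely to be nearly empty in an English dictionary. The cost moves to short words, where the buckets are large, so a real implementation would probably need to combine this with another strategy.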

@cfinke
Owner

cfinke commented Feb 13, 2017

4aa8162 improves memory usage further, reducing usage for the word djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf by about 15% on my machine. Lookup time is still in the 6-7 second range for me.

@jhuckaby
Author

Yup, down to about 166 MB of RAM now, but it still takes 7 seconds. Good progress, though!

djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf is NOT spelled correctly.
Suggestions: []
7065ms elapsed, 166289408 bytes used

blurymind added a commit to blurymind/ace_spell_check_js that referenced this issue Dec 12, 2018
…uggestions and is way more maintained

Typo js is very very slow at suggestions for long misspelled words! It's also not as popular atm.
See cfinke/Typo.js#56
@blurymind

blurymind commented Dec 12, 2018

sorry guys, had to move to nspell, as this bug was causing noticeable lag here:
InfiniteAmmoInc/Yarn#80

nspell doesn't have the delay when suggesting difficult words. Will consider going back to typo.js if this is fixed 👍

My suggestion here is to have a look at how nspell does it and copy the approach

The issue for me is not the cpu/memory, as much as the HUGE delay (over 7 seconds looks like a bug to the user)

@FeepingCreature

You guys might want to grab my changes from over at hunspell-spellchecker.

Lookup for alternatives is a lot faster if you pre-build a dictionary tree.
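
For readers unfamiliar with the technique, here is a generic sketch of a pre-built dictionary tree (trie) with a bounded edit-distance search. It is not the hunspell-spellchecker code, which may work quite differently. `buildTrie` and `searchTrie` are hypothetical names, and the distance used is plain Levenshtein.

```javascript
// Hypothetical sketch of a trie search; not taken from any library.

// Build the tree once, up front. Each node maps a character to a child.
function buildTrie(words) {
	var root = { children: {}, word: null };
	words.forEach(function (w) {
		var node = root;
		for (var i = 0; i < w.length; i++) {
			var ch = w[i];
			if (!node.children[ch]) {
				node.children[ch] = { children: {}, word: null };
			}
			node = node.children[ch];
		}
		node.word = w; // marks the end of a complete dictionary word
	});
	return root;
}

// Walk the tree while carrying one Levenshtein row per node. A branch is
// abandoned as soon as every cell in its row exceeds maxDistance.
function searchTrie(root, target, maxDistance) {
	var results = [];
	var firstRow = [];
	for (var j = 0; j <= target.length; j++) firstRow.push(j);

	function visit(node, ch, prevRow) {
		var row = [prevRow[0] + 1];
		for (var k = 1; k <= target.length; k++) {
			var cost = target[k - 1] === ch ? 0 : 1;
			row.push(Math.min(row[k - 1] + 1, prevRow[k] + 1, prevRow[k - 1] + cost));
		}
		if (node.word !== null && row[target.length] <= maxDistance) {
			results.push(node.word);
		}
		if (Math.min.apply(null, row) <= maxDistance) {
			Object.keys(node.children).forEach(function (next) {
				visit(node.children[next], next, row);
			});
		}
	}

	Object.keys(root.children).forEach(function (ch) {
		visit(root.children[ch], ch, firstRow);
	});
	return results;
}

var trie = buildTrie(["hello", "help", "hell", "world"]);
console.log(searchTrie(trie, "helo", 1));
console.log(searchTrie(trie, "djfhjfhdjfhskdfhskhdfksjdfhksdjfhksdf", 2));
```

The appeal of this design is that the work depends on how much of the dictionary shares a prefix close to the input, rather than on the number of possible edits. A long string of random letters should be pruned after the first few characters.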
