StopIteration error when using word_segmentation #79

avacaondata · 2020-04-28T15:54:31Z

Hi, I'm trying to use symspellpy to correct some Spanish texts. I've loaded a dictionary of Spanish words and their absolute frequencies, and it seems to have loaded correctly. However, when I call word_segmentation, the following error appears no matter what text I pass in:

StopIteration Traceback (most recent call last)
in
----> 1 result = symspell.word_segmentation('holaadiós')

~/miniconda/envs/bertology/lib/python3.7/site-packages/symspellpy/symspellpy.py in word_segmentation(self, phrase, max_edit_distance, max_segmentation_word_length, ignore_token)
1001 compositions[idx].distance_sum + separator_len + top_ed,
1002 compositions[idx].log_prob_sum + top_log_prob)
-> 1003 idx = next(circular_index)
1004 return compositions[idx]
1005

StopIteration:

avacaondata · 2020-04-28T15:55:17Z

To make it easier, here is the full code:

symspell.load_dictionary('CREA_total.TXT', term_index=0, count_index=1, separator='\t', encoding='latin-1')
result = symspell.word_segmentation('holaadiós')
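One way to test the "it seems to be correctly loaded" assumption without symspellpy is to count how many lines of the file yield a term and an integer count for the given term_index, count_index and separator. The helper below is hypothetical and is not symspellpy's parser; it assumes, as we understand it, that entries whose count column is not a plain integer are skipped during loading.

```python
def count_parsable_entries(lines, term_index=0, count_index=1, separator=" "):
    """Count lines that split into a non-empty term and an integer count.

    Hypothetical diagnostic, not part of symspellpy. It approximates how a
    frequency-dictionary line is split into columns.
    """
    parsed = 0
    for line in lines:
        parts = line.rstrip("\r\n").split(separator)
        # Not enough columns for the requested indices: unusable line.
        if len(parts) <= max(term_index, count_index):
            continue
        term = parts[term_index].strip()
        try:
            # Counts such as "9,999,518" or "12.5" are rejected here.
            int(parts[count_index].strip())
        except ValueError:
            continue
        if term:
            parsed += 1
    return parsed
```

For example, `with open('CREA_total.TXT', encoding='latin-1') as f: print(count_parsable_entries(f, 0, 1, '\t'))`. A result of 0 would suggest that the column indices or the separator do not match the file.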

rebouvet · 2020-07-15T08:54:23Z

Hi,
I have the exact same issue with another dictionary.
Have you found any fix?
Thanks

mammothb · 2020-07-26T22:51:00Z

@rebouvet Hi, can you upload a sample of the dictionary which causes the error so I can try and debug?

lucaslrolim · 2020-09-15T21:36:51Z

@mammothb Same problem here using this dictionary: https://raw.githubusercontent.com/hermitdave/FrequencyWords/master/content/2018/pt_br/pt_br_full.txt

vection · 2020-11-10T08:52:33Z

Has anyone managed to solve this? I also get a StopIteration error when loading a French dictionary and calling word_segmentation. I used this one: link
import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell(max_dictionary_edit_distance=2, count_threshold=10, prefix_length=7)
dictionary_path = pkg_resources.resource_filename("symspellpy", "fr-100k.txt")
sym_spell.load_dictionary(dictionary_path)
sym_spell.word_segmentation('mama mia')

Error:

/symspellpy.py in word_segmentation(self, phrase, max_edit_distance, max_segmentation_word_length, ignore_token)
1091 top_ed),
1092 compositions[idx].log_prob_sum + top_log_prob)
-> 1093 idx = next(circular_index)
1094 return compositions[idx]
1095
StopIteration:

mammothb · 2020-11-21T02:24:20Z

@lucaslrolim I was able to run word_segmentation without a StopIteration error using the following code:

import os.path

from symspellpy.symspellpy import SymSpell

# Set max_dictionary_edit_distance to avoid spelling correction
sym_spell = SymSpell(max_dictionary_edit_distance=0, prefix_length=7)
dictionary_path = os.path.join(
    os.path.dirname(os.path.realpath(__file__)), "symspellpy", "pt_br_full.txt"
)

# term_index is the column of the term and count_index is the
# column of the term frequency
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1, encoding="utf8")

# a sentence without any spaces
input_term = "thequickbrownfoxjumpsoverthelazydog"
result = sym_spell.word_segmentation(input_term)
print("{}, {}, {}".format(result.corrected_string, result.distance_sum,
                          result.log_prob_sum))

and the output is

the quick brown fox jumps overthe lazy dog, 7, -73.85138966727551

Initially, I ran into the StopIteration error myself when I used the wrong path for the dictionary, so please check that you're using the correct path to the dictionary file. load_dictionary returns False if the dictionary file could not be found.
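A minimal standard-library sketch of why an empty dictionary can surface as StopIteration rather than a clearer error. It assumes, based on the traceback, that circular_index is an itertools.cycle over a buffer whose size is capped by the longest loaded term, which would be 0 when nothing was loaded. An itertools.cycle only stops if it has nothing to repeat. The function names are hypothetical and this is not symspellpy's code; only the index bookkeeping is modelled.

```python
from itertools import cycle


def longest_term_length(terms):
    """Length of the longest dictionary term; 0 when nothing was loaded."""
    return max((len(term) for term in terms), default=0)


def final_composition_slot(phrase, terms):
    """Simplified stand-in for the circular-buffer indexing in word_segmentation.

    No segmentation is performed; this only shows when next() can fail.
    """
    # Assumption: the buffer size is capped by the longest loaded term.
    size = min(longest_term_length(terms), len(phrase))
    circular_index = cycle(range(size))
    idx = -1
    for _ in range(len(phrase)):
        # With size == 0, cycle() has nothing to repeat, so next() raises StopIteration.
        idx = next(circular_index)
    return idx
```

Here `final_composition_slot("holaadiós", [])` raises StopIteration, while any non-empty term list lets the loop run to completion.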

mammothb · 2020-11-21T02:29:32Z

@vection I see you're replacing "frequency_dictionary_en_82_765.txt" from the sample code with your own dictionary. However, pkg_resources only finds the dictionaries that are shipped with the symspellpy package. Since "fr-100k.txt" is not included in the symspellpy package, it returns an invalid path. You can construct your own path to your dictionary and pass that to load_dictionary.

For example,

dictionary_path = "/full/path/to/fr-100k.txt"

sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)

should work.
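To fail early instead of at word_segmentation time, one option is to check both that the path exists and that load_dictionary reports success. The helper below is hypothetical (not part of symspellpy); it only assumes that `sym_spell.load_dictionary(corpus, term_index=..., count_index=..., ...)` returns False when the file could not be loaded, as noted earlier in this thread.

```python
from pathlib import Path


def load_dictionary_checked(sym_spell, path, term_index=0, count_index=1, **kwargs):
    """Load a frequency dictionary, raising instead of failing silently.

    Hypothetical helper. Extra keyword arguments (e.g. separator, encoding)
    are forwarded unchanged to sym_spell.load_dictionary.
    """
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"dictionary file not found: {resolved}")
    ok = sym_spell.load_dictionary(
        str(resolved), term_index=term_index, count_index=count_index, **kwargs
    )
    if not ok:
        raise RuntimeError(f"load_dictionary returned False for {resolved}")
    return resolved
```

For example, `load_dictionary_checked(sym_spell, "/full/path/to/fr-100k.txt")` would raise FileNotFoundError with the resolved path, rather than leaving an empty dictionary behind for word_segmentation.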

mammothb · 2020-11-21T02:32:23Z

@alexvaca0 @rebouvet Could you check whether you have a similar problem, i.e. the dictionary path not pointing to the right location, as I described in #79 (comment)?

mammothb added the "question" (Further information is requested) label on Nov 29, 2021
