StopIteration error when using word_segmentation #79

Open
avacaondata opened this issue Apr 28, 2020 · 8 comments
Labels
question Further information is requested

Comments

@avacaondata

Hi, I'm trying to use symspellpy to correct some Spanish texts. I've loaded a dictionary of Spanish words and their absolute frequencies, and it seems to load correctly. However, when I call word_segmentation, the following error appears regardless of the input text:


StopIteration Traceback (most recent call last)
in
----> 1 result = symspell.word_segmentation('holaadiós')

~/miniconda/envs/bertology/lib/python3.7/site-packages/symspellpy/symspellpy.py in word_segmentation(self, phrase, max_edit_distance, max_segmentation_word_length, ignore_token)
1001 compositions[idx].distance_sum + separator_len + top_ed,
1002 compositions[idx].log_prob_sum + top_log_prob)
-> 1003 idx = next(circular_index)
1004 return compositions[idx]
1005

StopIteration:

@avacaondata
Author

To make it easier, here is the full code:

symspell.load_dictionary('CREA_total.TXT', term_index=0, count_index=1, separator='\t', encoding='latin-1')
result = symspell.word_segmentation('holaadiós')

@rebouvet

Hi,
I have the exact same issue with another dictionary.
Have you found any fix?
Thanks

@mammothb
Owner

@rebouvet Hi, can you upload a sample of the dictionary which causes the error so I can try and debug?

@lucaslrolim

@vection

vection commented Nov 10, 2020

Has anyone managed to solve this? I also get a StopIteration error when loading a French dictionary and calling word_segmentation. I used this one: link
sym_spell = SymSpell(max_dictionary_edit_distance=2, count_threshold=10, prefix_length=7)
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "fr-100k.txt")
sym_spell.load_dictionary(dictionary_path)
sym_spell.word_segmentation('mama mia')

Error:

/symspellpy.py in word_segmentation(self, phrase, max_edit_distance, max_segmentation_word_length, ignore_token)
1091 top_ed),
1092 compositions[idx].log_prob_sum + top_log_prob)
-> 1093 idx = next(circular_index)
1094 return compositions[idx]
1095
StopIteration:

@mammothb
Owner

@lucaslrolim I was able to run word_segmentation without a StopIteration error using the following code:

import os.path

from symspellpy.symspellpy import SymSpell

# Set max_dictionary_edit_distance to 0 to avoid spelling correction
sym_spell = SymSpell(max_dictionary_edit_distance=0, prefix_length=7)
dictionary_path = os.path.join(
    os.path.dirname(os.path.realpath(__file__)), "symspellpy", "pt_br_full.txt"
)

# term_index is the column of the term and count_index is the
# column of the term frequency
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1, encoding="utf8")

# a sentence without any spaces
input_term = "thequickbrownfoxjumpsoverthelazydog"
result = sym_spell.word_segmentation(input_term)
print("{}, {}, {}".format(result.corrected_string, result.distance_sum,
                          result.log_prob_sum))

and the output is

the quick brown fox jumps overthe lazy dog, 7, -73.85138966727551

Initially, I ran into the StopIteration error when I used the wrong path for the dictionary. Perhaps you'd like to check if you're using the correct path for the dictionary file. load_dictionary will return False if the dictionary file could not be found.
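One plausible reason a missing dictionary surfaces as StopIteration rather than a clearer error: the traceback fails on `idx = next(circular_index)`. If that iterator is a cycle over a range sized by the longest dictionary word (an assumption about the internals, not something confirmed in this thread), then an empty dictionary gives an empty cycle. A minimal stdlib-only sketch of that behaviour, where `make_circular_index` and `first_index` are hypothetical stand-ins and not symspellpy functions:

```python
from itertools import cycle


def make_circular_index(longest_word_length, phrase):
    """Hypothetical stand-in for the index iterator seen in the traceback."""
    # Assumption: the buffer size is bounded by the longest dictionary word.
    array_size = min(longest_word_length, len(phrase))
    return cycle(range(array_size))


def first_index(longest_word_length, phrase):
    """Return the first index, or None if the iterator is empty."""
    circular_index = make_circular_index(longest_word_length, phrase)
    try:
        return next(circular_index)
    except StopIteration:
        # cycle() over an empty range is exhausted immediately.
        return None


# Dictionary loaded: the longest word has, say, 10 characters.
print(first_index(10, "holaadiós"))  # 0
# Dictionary not loaded: the longest word length is 0.
print(first_index(0, "holaadiós"))  # None
```

If this assumption holds, the error would say nothing about the input text, which matches the report that it appears for every phrase.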

@mammothb
Owner

@vection I see you're swapping out the "frequency_dictionary_en_82_765.txt" from the sample code with your own dictionary. However, pkg_resources only finds the dictionaries that are shipped with the symspellpy package. As "fr-100k.txt" is not included in the symspellpy package, it will return an invalid path. You can construct your own path to your dictionary and pass that to load_dictionary.

For example,

dictionary_path = "/full/path/to/fr-100k.txt"

sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)

should work.
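To catch a bad path before word_segmentation is ever called, the result of load_dictionary can be checked explicitly. This is a minimal sketch that relies only on the behaviour described in this thread (load_dictionary returns False when the file cannot be found). `load_or_fail` is a hypothetical helper, not part of symspellpy:

```python
from pathlib import Path


def load_or_fail(sym_spell, dictionary_path, **load_kwargs):
    """Load a dictionary and raise instead of failing silently.

    Hypothetical helper: `sym_spell` is any object exposing
    load_dictionary(path, **kwargs), such as a SymSpell instance.
    """
    path = Path(dictionary_path)
    # Fail early if the path does not point to an existing file.
    if not path.is_file():
        raise FileNotFoundError(f"Dictionary file not found: {path}")
    # load_dictionary signals failure through its return value.
    if not sym_spell.load_dictionary(str(path), **load_kwargs):
        raise RuntimeError(f"load_dictionary returned False for: {path}")
    return path


# Usage sketch (requires symspellpy to be installed):
# from symspellpy.symspellpy import SymSpell
# sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
# load_or_fail(sym_spell, "/full/path/to/fr-100k.txt",
#              term_index=0, count_index=1)
```

With a check like this, a wrong path raises FileNotFoundError at load time instead of a StopIteration later inside word_segmentation.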

@mammothb
Owner

@alexvaca0 @rebouvet May I know if you have a similar problem with the dictionary path not pointing to the right location, as I described in #79 (comment)?

mammothb added the question label Nov 29, 2021