New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
find_concordance() return empty list for left_context #2088
Comments
Could you provide a sample input and desired output so that we can add to the regression test? |
Input:
Output (NLTK 3.3):
Desired Output:
|
Thanks @BLKSerene for reporting the bug! Ah, there's a cool fix here. Instead of the if-else. We can clip the minimum bound to a left_context = self._tokens[max(0, i-context):i] Adding the doctest to https://github.com/nltk/nltk/blob/develop/nltk/test/concordance.doctest for the continuous integration / regression test would be very helpful =) Patching https://github.com/nltk/nltk/issues/2088
The left slice of the left context should be clip to 0 if the `i-context` < 0.
>>> from nltk import Text, word_tokenize
>>> jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
>>> text = Text(word_tokenize(jane_eyre))
>>> text.concordance_list('taking')[0].left
['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of'] |
Patched in #2103. Thanks @BLKSerene and @dnc1994! |
When the first occurrence of the search term is at the beginning of the text (for example at offset 7), suppose the width parameter is set to 20, then [i-context:i] would be evaluated as [-13:7].
In this case, if the text consists more than 20 words, the variable left_context would be an empty list, rather than a list containing the first 7 words of the text.
A simple fix would do:
The text was updated successfully, but these errors were encountered: