find_concordance() return empty list for left_context #2088

BLKSerene · 2018-08-20T11:12:42Z

if offsets:
            for i in offsets:
                query_word = self._tokens[i]
                # Find the context of query word.
                left_context = self._tokens[i-context:i]

When the first occurrence of the search term is at the beginning of the text (for example at offset 7), suppose the width parameter is set to 20, then [i-context:i] would be evaluated as [-13:7].
In this case, if the text consists more than 20 words, the variable left_context would be an empty list, rather than a list containing the first 7 words of the text.

A simple fix would do:

if offsets:
    for i in offsets:
        query_word = self._tokens[i]
        # Find the context of query word.
        if i - context < 0:
            left_context = self._tokens[:i]
        else:
            left_context = self._tokens[i-context:i]

The text was updated successfully, but these errors were encountered:

alvations · 2018-08-20T14:14:03Z

Could you provide a sample input and desired output so that we can add to the regression test?

BLKSerene · 2018-08-21T04:12:58Z

Input:

jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
text = nltk.Text(nltk.word_tokenize(jane_eyre))
text.concordance('taking')
text.concordance_list('taking')[0]

Output (NLTK 3.3):

Displaying 1 of 1 matches:
    taking a walk that day . We had been wander
ConcordanceLine(left=[],
                query='taking',
                right=['a', 'walk', 'that', 'day', '.', 'We', 'had', 'been', 'wandering', ',', 'indeed', ',', 'in', 'the', 'leafless', 'shrubbery', 'an', 'hour'],
                offset=7,
                left_print='',
                right_print='a walk that day . We had been wande',
                line=' taking a walk that day . We had been wande')

Desired Output:

Displaying 1 of 1 matches:
    Chapter 1 THERE was no possibility of taking a walk that day . We had been wander
ConcordanceLine(left=['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of'],
                query='taking',
                right=['a', 'walk', 'that', 'day', '.', 'We', 'had', 'been', 'wandering', ',', 'indeed', ',', 'in', 'the', 'leafless', 'shrubbery', 'an', 'hour'],
                offset=7,
                left_print='Chapter 1 THERE was no possibility of',
                right_print='a walk that day . We had been wande',
                line='Chapter 1 THERE was no possibility of taking a walk that day . We had been wande')

alvations · 2018-08-23T07:02:25Z

Thanks @BLKSerene for reporting the bug!

Ah, there's a cool fix here. Instead of the if-else. We can clip the minimum bound to a max(), e.g.

left_context = self._tokens[max(0, i-context):i]

Adding the doctest to https://github.com/nltk/nltk/blob/develop/nltk/test/concordance.doctest for the continuous integration / regression test would be very helpful =)

Patching https://github.com/nltk/nltk/issues/2088
The left slice of the left context should be clip to 0 if the `i-context` < 0.

>>> from nltk import Text, word_tokenize
>>> jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
>>> text = Text(word_tokenize(jane_eyre))
>>> text.concordance_list('taking')[0].left
['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of']

alvations · 2018-09-19T00:06:41Z

Patched in #2103. Thanks @BLKSerene and @dnc1994!

alvations added good first issue corpus bug labels Aug 23, 2018

dnc1994 mentioned this issue Sep 1, 2018

Fix find_concordance() returning empty list for left_context #2103

Merged

alvations closed this as completed Sep 19, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

find_concordance() return empty list for left_context #2088

find_concordance() return empty list for left_context #2088

BLKSerene commented Aug 20, 2018 •

edited

alvations commented Aug 20, 2018

BLKSerene commented Aug 21, 2018 •

edited

alvations commented Aug 23, 2018 •

edited

alvations commented Sep 19, 2018

find_concordance() return empty list for left_context #2088

find_concordance() return empty list for left_context #2088

Comments

BLKSerene commented Aug 20, 2018 • edited

alvations commented Aug 20, 2018

BLKSerene commented Aug 21, 2018 • edited

alvations commented Aug 23, 2018 • edited

alvations commented Sep 19, 2018

BLKSerene commented Aug 20, 2018 •

edited

BLKSerene commented Aug 21, 2018 •

edited

alvations commented Aug 23, 2018 •

edited