Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

find_concordance() return empty list for left_context #2088

Closed
BLKSerene opened this issue Aug 20, 2018 · 4 comments
Closed

find_concordance() return empty list for left_context #2088

BLKSerene opened this issue Aug 20, 2018 · 4 comments

Comments

@BLKSerene
Copy link
Contributor

BLKSerene commented Aug 20, 2018

if offsets:
            for i in offsets:
                query_word = self._tokens[i]
                # Find the context of query word.
                left_context = self._tokens[i-context:i]

When the first occurrence of the search term is at the beginning of the text (for example at offset 7), suppose the width parameter is set to 20, then [i-context:i] would be evaluated as [-13:7].
In this case, if the text consists more than 20 words, the variable left_context would be an empty list, rather than a list containing the first 7 words of the text.

A simple fix would do:

if offsets:
    for i in offsets:
        query_word = self._tokens[i]
        # Find the context of query word.
        if i - context < 0:
            left_context = self._tokens[:i]
        else:
            left_context = self._tokens[i-context:i]
@alvations
Copy link
Contributor

Could you provide a sample input and desired output so that we can add to the regression test?

@BLKSerene
Copy link
Contributor Author

BLKSerene commented Aug 21, 2018

Input:

jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
text = nltk.Text(nltk.word_tokenize(jane_eyre))
text.concordance('taking')
text.concordance_list('taking')[0]

Output (NLTK 3.3):

Displaying 1 of 1 matches:
    taking a walk that day . We had been wander
ConcordanceLine(left=[],
                query='taking',
                right=['a', 'walk', 'that', 'day', '.', 'We', 'had', 'been', 'wandering', ',', 'indeed', ',', 'in', 'the', 'leafless', 'shrubbery', 'an', 'hour'],
                offset=7,
                left_print='',
                right_print='a walk that day . We had been wande',
                line=' taking a walk that day . We had been wande')

Desired Output:

Displaying 1 of 1 matches:
    Chapter 1 THERE was no possibility of taking a walk that day . We had been wander
ConcordanceLine(left=['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of'],
                query='taking',
                right=['a', 'walk', 'that', 'day', '.', 'We', 'had', 'been', 'wandering', ',', 'indeed', ',', 'in', 'the', 'leafless', 'shrubbery', 'an', 'hour'],
                offset=7,
                left_print='Chapter 1 THERE was no possibility of',
                right_print='a walk that day . We had been wande',
                line='Chapter 1 THERE was no possibility of taking a walk that day . We had been wande')

@alvations
Copy link
Contributor

alvations commented Aug 23, 2018

Thanks @BLKSerene for reporting the bug!

Ah, there's a cool fix here. Instead of the if-else. We can clip the minimum bound to a max(), e.g.

left_context = self._tokens[max(0, i-context):i]

Adding the doctest to https://github.com/nltk/nltk/blob/develop/nltk/test/concordance.doctest for the continuous integration / regression test would be very helpful =)

Patching https://github.com/nltk/nltk/issues/2088
The left slice of the left context should be clip to 0 if the `i-context` < 0.

>>> from nltk import Text, word_tokenize
>>> jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
>>> text = Text(word_tokenize(jane_eyre))
>>> text.concordance_list('taking')[0].left
['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of']

@alvations
Copy link
Contributor

Patched in #2103. Thanks @BLKSerene and @dnc1994!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants