The word frequencies are calculated by this function:
    def _build_frequency_dist(self, phrase_list):
        """Builds frequency distribution of the words in the given body of text.
        :param phrase_list: List of List of strings where each sublist is a
                            collection of words which form a contender phrase.
        """
        self.frequency_dist = Counter(chain.from_iterable(phrase_list))
Tracing back to where phrase_list is built:
    def _generate_phrases(self, sentences):
        """Method to generate contender phrases given the sentences of the text
        document.
        :param sentences: List of strings where each string represents a
                          sentence which forms the text.
        :return: Set of string tuples where each tuple is a collection
                 of words forming a contender phrase.
        """
        phrase_list = set()
        # Create contender phrases from sentences.
        for sentence in sentences:
            word_list = [word.lower() for word in wordpunct_tokenize(sentence)]
            phrase_list.update(self._get_phrase_list_from_words(word_list))
        return phrase_list
Clearly, phrase_list is a set, so it contains only unique phrases. If a phrase repeats in the text, the repeated occurrences are dropped before counting, and the frequency values, as tested by me, come out wrong.
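To illustrate the effect outside of Rake, here is a minimal sketch using only the standard library. The phrase tuples are made-up sample data, not output from the library; the point is only to compare counting over a set of phrases with counting over a list that keeps repeats.

```python
from collections import Counter
from itertools import chain

# Hypothetical contender phrases (made-up data for illustration);
# the phrase ("keyword", "extraction") occurs twice in the imagined text.
phrases = [
    ("keyword", "extraction"),
    ("rapid", "automatic", "keyword", "extraction"),
    ("keyword", "extraction"),
]

# Counting over a set: the repeated phrase collapses into one entry.
freq_from_set = Counter(chain.from_iterable(set(phrases)))

# Counting over a list: every occurrence of the phrase contributes.
freq_from_list = Counter(chain.from_iterable(phrases))

print(freq_from_set["keyword"])   # 2
print(freq_from_list["keyword"])  # 3
```

With the set, "keyword" is counted once per unique phrase, so the second occurrence of ("keyword", "extraction") is lost.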
I have modified the Rake() object so that the calculations are correct. @csurfer, kindly assign me this issue so I can create a pull request.