A Question About data_processor.populate_word_blacklist #72

Doragd · 2019-11-02T12:07:47Z

def populate_word_blacklist(word_index):
    blacklisted_words = set()
    blacklisted_words |= set(global_config.predefined_word_index.values())
    if global_config.filter_sentiment_words:
        blacklisted_words |= lexicon_helper.get_sentiment_words()
    if global_config.filter_stopwords:
        blacklisted_words |= lexicon_helper.get_stopwords()

global_config.predefined_word_index.values() returns the indices of the words, not the words themselves.
Also, at this point global_config.predefined_word_index is actually equal to the full word_index, not just {'<unk>': 0, '<sos>': 1, '<eos>': 2}.
Therefore, I think blacklisted_words ends up containing unnecessary entries and no longer matches what a blacklist is supposed to mean.
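
To make the concern concrete, here is a minimal sketch using plain dicts as stand-ins for global_config.predefined_word_index and word_index. The helper names blacklist_from_values and blacklist_from_keys and the sample vocabulary are hypothetical and are not taken from the repository; the sketch only shows how values() and keys() differ on such a mapping.

```python
# Hypothetical stand-in for the special tokens only.
special_tokens = {'<unk>': 0, '<sos>': 1, '<eos>': 2}

# Hypothetical stand-in for a full vocabulary (special tokens plus real words),
# which is what predefined_word_index reportedly equals at this point.
full_word_index = {'<unk>': 0, '<sos>': 1, '<eos>': 2, 'good': 3, 'movie': 4}


def blacklist_from_values(word_index):
    # Mirrors the pattern in the quoted snippet: collects the dict values,
    # which are integer indices rather than word strings.
    return set(word_index.values())


def blacklist_from_keys(word_index):
    # Alternative: collects the dict keys, which are the word strings.
    return set(word_index.keys())


by_values = blacklist_from_values(full_word_index)
by_keys_full = blacklist_from_keys(full_word_index)
by_keys_special = blacklist_from_keys(special_tokens)

print(by_values)        # a set of integers
print(by_keys_full)     # every word in the vocabulary
print(by_keys_special)  # only the three special tokens
```

With values(), the set holds integers, so it would not match word strings in a membership check. With keys() on the full vocabulary, every word would be blacklisted. If the intent is to blacklist only the special tokens, taking the keys of a mapping that holds just those tokens seems closer, though that is a guess about the intent rather than a confirmed fix.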


Doragd changed the title ~~An Question About data_processor.populate_word_blacklist~~ A Question About data_processor.populate_word_blacklist Nov 2, 2019

