Consistent pos argument between wn.synsets() and WordNetLemmatizer.lemmatize() #1978

alvations · 2018-03-14T02:31:21Z

Currently, there's some inconsistency in how POS is treated by wn.synsets() and WordNetLemmatizer.lemmatize(), e.g.

>>> from nltk.corpus import wordnet as wn
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()

# Accepts None and lets pos be underspecified.
>>> wn.synsets('running', pos=None)
[Synset('run.n.05'), Synset('run.n.07'), Synset('running.n.03'), Synset('running.n.04'), Synset('track.n.11'), Synset('run.v.01'), Synset('scat.v.01'), Synset('run.v.03'), Synset('operate.v.01'), Synset('run.v.05'), Synset('run.v.06'), Synset('function.v.01'), Synset('range.v.01'), Synset('campaign.v.01'), Synset('play.v.18'), Synset('run.v.11'), Synset('tend.v.01'), Synset('run.v.13'), Synset('run.v.14'), Synset('run.v.15'), Synset('run.v.16'), Synset('prevail.v.03'), Synset('run.v.18'), Synset('run.v.19'), Synset('carry.v.15'), Synset('run.v.21'), Synset('guide.v.05'), Synset('run.v.23'), Synset('run.v.24'), Synset('run.v.25'), Synset('run.v.26'), Synset('run.v.27'), Synset('run.v.28'), Synset('run.v.29'), Synset('run.v.30'), Synset('run.v.31'), Synset('run.v.32'), Synset('run.v.33'), Synset('run.v.34'), Synset('ply.v.03'), Synset('hunt.v.01'), Synset('race.v.02'), Synset('move.v.13'), Synset('melt.v.01'), Synset('ladder.v.01'), Synset('run.v.41'), Synset('running.a.01'), Synset('running.s.02'), Synset('running.a.03'), Synset('running.a.04'), Synset('linear.s.05'), Synset('running.s.06')]


# Doesn't accept None and raises a KeyError
>>> wnl.lemmatize('running', pos=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/nltk/stem/wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1774, in _morphy
    exceptions = self._exception_map[pos]
KeyError: None
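
The traceback ends in a plain dictionary lookup keyed by pos, so None is simply a missing key. Below is a minimal sketch of that failure mode. It uses a hypothetical toy map (`toy_exception_map` and `lookup_exceptions` are illustrative names, and the entries are made up), not NLTK's real data:

```python
# Toy stand-in for a per-POS exception map. The keys mirror WordNet's POS
# tags; the contents are invented for illustration only.
toy_exception_map = {
    "n": {"geese": ["goose"]},
    "v": {"ran": ["run"]},
    "a": {},
    "r": {},
}


def lookup_exceptions(pos):
    # Same shape as the failing line in the traceback: a bare dict lookup.
    return toy_exception_map[pos]


try:
    lookup_exceptions(None)
except KeyError as err:
    message = f"KeyError: {err}"

print(message)
```

Any fix therefore has to decide what None means before the lookup happens, which is the question raised below.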

I'm not sure how best to make WordNetLemmatizer.lemmatize() accept None, though.

What should the expected behavior of pos=None be? Should it default to pos='n'? If so, we can make the change at https://github.com/nltk/nltk/blob/develop/nltk/stem/wordnet.py#L39:

from nltk.corpus.reader.wordnet import NOUN
...

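    def lemmatize(self, word, pos=None):
        # Treat an unspecified POS as a noun (the proposed default).
        pos = NOUN if pos is None else pos
        lemmas = wordnet._morphy(word, pos)
        # Return the shortest candidate, or the input unchanged if none is found.
        return min(lemmas, key=len) if lemmas else word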

53X · 2018-06-02T15:55:17Z

@alvations, I think we should make the expected behavior for pos=None the same as for pos='n'. This is because, if we don't pass the pos argument to the WordNetLemmatizer, a pos value of noun is automatically assumed.

ekaf · 2024-01-12T09:00:00Z

@alvations and @53X, a more consistent interpretation of pos=None could be nice, but in that case, the default should not be "n", but rather "Any pos".

Please consider the morphy() wrapper in corpus/reader/wordnet.py: it uses itertools.chain to collect the lemmas from all the possible pos'es, and that is the behaviour users would normally expect when no particular pos is specified. By contrast, a user who wants only nouns would specify pos="n".
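
As a rough illustration of the "Any pos" reading, here is a minimal sketch that chains candidates across all POS tags when pos is None. The names `candidate_lemmas`, `ALL_POS` and `toy_morphy` are hypothetical, and the toy table stands in for WordNet so that the snippet runs without NLTK data. It returns every candidate and does not decide how a single lemma should be picked:

```python
from itertools import chain

ALL_POS = ("n", "v", "a", "r")  # noun, verb, adjective, adverb


def candidate_lemmas(word, pos, morphy):
    """Collect lemmas for one POS, or for every POS when pos is None."""
    tags = ALL_POS if pos is None else (pos,)
    seen = []
    for lemma in chain.from_iterable(morphy(word, tag) for tag in tags):
        if lemma not in seen:  # keep first-seen order, drop duplicates
            seen.append(lemma)
    return seen


def toy_morphy(word, pos):
    # Hypothetical stand-in for a per-POS morphy function.
    table = {
        ("running", "n"): ["running"],
        ("running", "v"): ["run"],
        ("running", "a"): ["running"],
    }
    return table.get((word, pos), [])


print(candidate_lemmas("running", None, toy_morphy))  # ['running', 'run']
print(candidate_lemmas("running", "v", toy_morphy))   # ['run']
```

With NLTK and the WordNet corpus installed, the per-POS function seen in the traceback above (`_morphy(word, pos)`) could presumably be passed as `morphy` in place of the toy one.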

ekaf · 2024-01-12T10:07:26Z

Ideally, to get consistent behaviour across the WordNet Morphy-related wrappers, "WordNetLemmatizer.lemmatize()" could just be an alias for the morphy() wrapper from wordnet.py.

Actually, I find that the name "WordNetLemmatizer" is not adequate, since this wrapper eventually undoes the WordNet filtering done by _morphy(), and ends up just accepting any garbage input. So although "WordNetLemmatizer" uses _morphy(), it is unfortunate if it is perceived as a canonical wrapper for it.
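
To make the "accepts any garbage input" point concrete, here is a minimal sketch contrasting the fallback in the lemmatize() body quoted earlier in this thread with a stricter variant. `lenient_lemmatize`, `strict_lemmatize` and `toy_morphy` are hypothetical names, and the toy table is an assumption standing in for WordNet:

```python
def toy_morphy(word, pos):
    # Hypothetical stand-in: only knows one noun form.
    table = {("dogs", "n"): ["dog"]}
    return table.get((word, pos), [])


def lenient_lemmatize(word, pos, morphy):
    # Same shape as the quoted lemmatize() body: when no lemma is found,
    # the input is handed back unchanged, so unknown strings pass through.
    lemmas = morphy(word, pos)
    return min(lemmas, key=len) if lemmas else word


def strict_lemmatize(word, pos, morphy):
    # Stricter variant: signal "not found" instead of echoing the input.
    lemmas = morphy(word, pos)
    return min(lemmas, key=len) if lemmas else None


print(lenient_lemmatize("dogs", "n", toy_morphy))   # dog
print(lenient_lemmatize("xyzzy", "n", toy_morphy))  # xyzzy
print(strict_lemmatize("xyzzy", "n", toy_morphy))   # None
```

The lenient form is convenient in pipelines that must always return a token, while the strict form preserves the information that the word was not recognised.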

ekaf · 2024-01-14T09:23:51Z

PR #3225 proposes to add two standard "morphy" modes to the WordNetLemmatizer class, for users who want a standard morphy lemmatizer with a more consistent pos argument.
On the other hand, lemmatize() is probably best left unchanged, to accommodate the many users who are accustomed to its non-standard features.

alvations added the stem/lemma label Mar 14, 2018

ekaf mentioned this issue Jan 12, 2024

A potential edge case for WordNetLemmatizer.lemmatize() #3227

Closed

ekaf mentioned this issue Jan 14, 2024

Avoid recursive suffix stripping in wordnet morphy #3225

Open
