Consistent pos argument between wn.synsets() and WordNetLemmatizer.lemmatize() #1978

Open
alvations opened this issue Mar 14, 2018 · 4 comments


alvations commented Mar 14, 2018

Currently, there's some inconsistency in how POS is treated in wn.synsets() and WordNetLemmatizer.lemmatize(), e.g.

>>> from nltk.corpus import wordnet as wn
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()

# Accepts None and leaves pos underspecified.
>>> wn.synsets('running', pos=None)
[Synset('run.n.05'), Synset('run.n.07'), Synset('running.n.03'), Synset('running.n.04'), Synset('track.n.11'), Synset('run.v.01'), Synset('scat.v.01'), Synset('run.v.03'), Synset('operate.v.01'), Synset('run.v.05'), Synset('run.v.06'), Synset('function.v.01'), Synset('range.v.01'), Synset('campaign.v.01'), Synset('play.v.18'), Synset('run.v.11'), Synset('tend.v.01'), Synset('run.v.13'), Synset('run.v.14'), Synset('run.v.15'), Synset('run.v.16'), Synset('prevail.v.03'), Synset('run.v.18'), Synset('run.v.19'), Synset('carry.v.15'), Synset('run.v.21'), Synset('guide.v.05'), Synset('run.v.23'), Synset('run.v.24'), Synset('run.v.25'), Synset('run.v.26'), Synset('run.v.27'), Synset('run.v.28'), Synset('run.v.29'), Synset('run.v.30'), Synset('run.v.31'), Synset('run.v.32'), Synset('run.v.33'), Synset('run.v.34'), Synset('ply.v.03'), Synset('hunt.v.01'), Synset('race.v.02'), Synset('move.v.13'), Synset('melt.v.01'), Synset('ladder.v.01'), Synset('run.v.41'), Synset('running.a.01'), Synset('running.s.02'), Synset('running.a.03'), Synset('running.a.04'), Synset('linear.s.05'), Synset('running.s.06')]


# Doesn't accept None and raises a KeyError.
>>> wnl.lemmatize('running', pos=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/nltk/stem/wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1774, in _morphy
    exceptions = self._exception_map[pos]
KeyError: None

I'm not sure how best to make WordNetLemmatizer.lemmatize() accept None, though.

What should the expected behavior of pos=None be? Should it default to pos='n'? If so, we can make the change at https://github.com/nltk/nltk/blob/develop/nltk/stem/wordnet.py#L39:

from nltk.corpus.reader.wordnet import NOUN
...

    def lemmatize(self, word, pos=None):
        # Treat an unspecified pos as a noun before calling _morphy.
        pos = NOUN if pos is None else pos
        lemmas = wordnet._morphy(word, pos)
        return min(lemmas, key=len) if lemmas else word
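
To illustrate why the guard avoids the KeyError, here is a minimal self-contained sketch that does not use NLTK itself. `toy_exception_map` and `toy_morphy` are hypothetical stand-ins for the per-POS lookup seen in the traceback, with made-up data; only the None-to-noun defaulting mirrors the proposal.

```python
NOUN, VERB = "n", "v"

# Hypothetical stand-in for a per-POS table indexed by pos (toy data only).
toy_exception_map = {
    NOUN: {"geese": ["goose"]},
    VERB: {"ran": ["run"]},
}


def toy_morphy(word, pos):
    """Look up `word` in the toy table; raises KeyError if pos is not a key."""
    return toy_exception_map[pos].get(word, [])


def lemmatize(word, pos=None):
    """Sketch of the proposal: treat pos=None as NOUN before the lookup."""
    pos = NOUN if pos is None else pos
    lemmas = toy_morphy(word, pos)
    # Fall back to the input word when no lemma is found.
    return min(lemmas, key=len) if lemmas else word
```

With this guard, `lemmatize("geese", pos=None)` takes the noun path, whereas calling `toy_morphy("geese", None)` directly still raises `KeyError: None`, as in the traceback.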

53X commented Jun 2, 2018

@alvations, I think we should map the expected behavior for None to be the same as that of pos='n'. This is because if we don't pass the pos argument to the WordNetLemmatizer, the noun pos is automatically assumed.


ekaf commented Jan 12, 2024

@alvations and @53X, a more consistent interpretation of pos=None could be nice, but in that case, the default should not be "n", but rather "Any pos".

Please consider the morphy() wrapper in corpus/reader/wordnet.py: it uses itertools.chain to collect the lemmas from all the possible parts of speech, and that is the behaviour users would normally expect when no particular pos is specified. By contrast, a user who wants only nouns would specify pos="n".
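
As a rough illustration of the "any pos" interpretation, the sketch below chains per-POS lookups with itertools.chain. It is not the NLTK implementation: `toy_forms`, `toy_morphy_one_pos` and `toy_collect_lemmas` are hypothetical names with made-up data, and the real wrapper may post-process the results differently.

```python
from itertools import chain

POS_LIST = ["n", "v", "a", "r"]

# Hypothetical per-POS analyses (toy data, not taken from WordNet).
toy_forms = {
    "n": {"running": ["running"]},
    "v": {"running": ["run"]},
    "a": {"running": ["running"]},
    "r": {},
}


def toy_morphy_one_pos(form, pos):
    """Return the toy analyses of `form` for a single part of speech."""
    return toy_forms[pos].get(form, [])


def toy_collect_lemmas(form, pos=None):
    """pos=None means any POS: chain the analyses of every part of speech."""
    if pos is None:
        return list(
            chain.from_iterable(toy_morphy_one_pos(form, p) for p in POS_LIST)
        )
    return list(toy_morphy_one_pos(form, pos))
```

Here `toy_collect_lemmas("running")` gathers candidates from every part of speech, while `toy_collect_lemmas("running", pos="n")` is restricted to nouns.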


ekaf commented Jan 12, 2024

Ideally, to get consistent behaviour across the WordNet Morphy-related wrappers, "WordNetLemmatizer.lemmatize()" could just be an alias for the morphy() wrapper from wordnet.py.

Actually, I find that the name "WordNetLemmatizer" is not adequate, since this wrapper eventually undoes the WordNet filtering done by _morphy(), and ends up just accepting any garbage input. So although "WordNetLemmatizer" uses _morphy(), it is unfortunate if it is perceived as a canonical wrapper for it.
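
The point about accepting any input follows from the fallback expression `min(lemmas, key=len) if lemmas else word` quoted earlier in this thread. A minimal sketch, assuming a hypothetical `toy_morphy` with a one-entry toy lexicon in place of the real lookup:

```python
def toy_morphy(word, pos):
    """Return known lemmas for `word`, or [] when it is not in the toy lexicon."""
    toy_lexicon = {("dogs", "n"): ["dog"]}  # made-up data for illustration
    return toy_lexicon.get((word, pos), [])


def lemmatize(word, pos="n"):
    """Mirror the quoted fallback: return the input word when no lemma is found."""
    lemmas = toy_morphy(word, pos)
    return min(lemmas, key=len) if lemmas else word
```

The lookup rejects the unknown string by returning an empty list, but the wrapper then hands the string back unchanged, so the caller cannot tell a filtered-out word from a valid lemma.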


ekaf commented Jan 14, 2024

PR #3225 proposes to add two standard "morphy" modes to the WordNetLemmatizer class, for users who want a standard morphy lemmatizer with a more consistent pos argument.
On the other hand, lemmatize() is probably best left unchanged, to accommodate the many users who are accustomed to its non-standard features.
