lemmatization issue #1202

Open
MaJaPers opened this issue Jan 24, 2023 · 0 comments
MaJaPers commented Jan 24, 2023

I would expect lemmatizing a verb to return its dictionary form, i.e. the form with Mood: [indicative], Number: [singular], Person: [first], Tense: [present], VerbForm: [finite], Voice: [active]. However, CLTK does not produce that form, even though it correctly determines the mood. 'videtur' gives 'videtur' instead of 'video'.

```python
from cltk import NLP

cltk_nlp = NLP(language="lat", suppress_banner=True)
cltk_nlp.analyze('videtur').lemmata
# output: 'videtur'
```

The same applies to nouns: a plural should, I assume, be lemmatized to its singular. 'verba' gives 'verba' instead of 'verbum'.

```python
from cltk import NLP

cltk_nlp = NLP(language="lat", suppress_banner=True)
cltk_nlp.analyze('verba').lemmata
# output: 'verba'
```
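The expected behaviour can be pinned down as a small regression check. This is a sketch, not part of CLTK: `EXPECTED_LEMMAS`, `find_mismatches` and `identity_lemmatizer` are hypothetical names. The check accepts any callable that maps one word to one lemma, so it could wrap the CLTK pipeline, `LatinBackoffLemmatizer`, or another tool; the expected pairs are just the two from this issue.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical expected pairs, taken from the two examples in this issue.
EXPECTED_LEMMAS: Dict[str, str] = {
    "videtur": "video",
    "verba": "verbum",
}


def find_mismatches(
    lemmatize_word: Callable[[str], str],
    expected: Dict[str, str] = EXPECTED_LEMMAS,
) -> List[Tuple[str, str, str]]:
    """Return (word, expected_lemma, actual_lemma) for every word lemmatized incorrectly."""
    mismatches = []
    for word, expected_lemma in expected.items():
        actual = lemmatize_word(word)
        if actual != expected_lemma:
            mismatches.append((word, expected_lemma, actual))
    return mismatches


# Hypothetical stand-in for the reported pipeline behaviour:
# the surface form comes back unchanged.
def identity_lemmatizer(word: str) -> str:
    return word


print(find_mismatches(identity_lemmatizer))
# [('videtur', 'video', 'videtur'), ('verba', 'verbum', 'verba')]
```

With CLTK installed, the same check could wrap the objects from the snippets in this issue, e.g. `find_mismatches(lambda w: lemmatizer.lemmatize([w])[0][1])` for `LatinBackoffLemmatizer`, or `find_mismatches(lambda w: cltk_nlp.analyze(w).lemmata[0])` for the pipeline, assuming `.lemmata` returns a list with one lemma per token.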

However, `LatinBackoffLemmatizer` lemmatizes both words correctly:

```python
from cltk.lemmatize.lat import LatinBackoffLemmatizer

lemmatizer = LatinBackoffLemmatizer()

lemmatizer.lemmatize(['videtur'])
# output: [('videtur', 'video')]

lemmatizer.lemmatize(['verba'])
# output: [('verba', 'verbum')]
```
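Until the pipeline is fixed, one possible workaround is to fall back to `LatinBackoffLemmatizer` wherever the pipeline returns a word unchanged. This is a hypothetical sketch (`merge_lemmata` is not a CLTK function): it only relies on the `(token, lemma)` pair format shown above and takes both results as plain lists, so it runs without CLTK installed. The lemma slot is typed `Optional` just to be safe. Treating "unchanged" as "failed" is a crude heuristic: for a word that really is its own lemma, the fallback may swap one answer for another.

```python
from typing import List, Optional, Tuple


def merge_lemmata(
    tokens: List[str],
    pipeline_lemmata: List[str],
    backoff_pairs: List[Tuple[str, Optional[str]]],
) -> List[str]:
    """Prefer the pipeline lemma, but use the backoff lemma when the pipeline
    returned the surface form unchanged and the backoff found a different lemma."""
    if not (len(tokens) == len(pipeline_lemmata) == len(backoff_pairs)):
        raise ValueError("tokens, pipeline_lemmata and backoff_pairs must be aligned")
    merged = []
    for token, pipe_lemma, (_, backoff_lemma) in zip(tokens, pipeline_lemmata, backoff_pairs):
        if pipe_lemma == token and backoff_lemma and backoff_lemma != token:
            merged.append(backoff_lemma)
        else:
            merged.append(pipe_lemma)
    return merged


tokens = ["videtur", "verba"]
pipeline_lemmata = ["videtur", "verba"]  # pipeline output as reported in this issue
backoff_pairs = [("videtur", "video"), ("verba", "verbum")]  # backoff output shown above
print(merge_lemmata(tokens, pipeline_lemmata, backoff_pairs))
# ['video', 'verbum']
```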


Other Latin lemmatizers also get both words right:
```python
from lamonpy import Lamon
import simplemma

lamon = Lamon()

# lamonpy
score, tagged = lamon.tag('videtur')[0]
tagged[0][2]
# output: 'video'

score, tagged = lamon.tag('verba')[0]
tagged[0][2]
# output: 'verbum'

# simplemma
simplemma.lemmatize('videtur', lang='la')
# output: 'video'

simplemma.lemmatize('verba', lang='la')
# output: 'verbum'
```
