Describe the bug
In some cases, the tokenizer for the Latin pipeline does not split `!` off as a separate token.
To Reproduce
Steps to reproduce the behavior:
Install Python version 3.8
Install CLTK version 1.1.6 with pip
In a script or REPL, run the following code … (include literal copy-paste)
(venv) nitin@nkprasad:~/Documents/code/morcus/morcus-net$ python
Python 3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cltk
>>> nlp = cltk.NLP('lat')
𐤀 CLTK version '1.1.6'.
Pipeline for language 'Latin' (ISO: 'lat'): `LatinNormalizeProcess`, `LatinStanzaProcess`, `LatinEmbeddingsProcess`, `StopsProcess`, `LatinLexiconProcess`.
See error (include literal copy-paste)
>>> doc = nlp.analyze('Cautus esto, mi fili! Iam sequere me!')
>>> doc.morphosyntactic_features[4]
{}
>>> doc.pos[4]
'PUNCT'
>>> doc.tokens[4]
'fili!'
Expected behavior
`fili` should be a separate token from `!`. Because it is not, we are not getting the right PoS tag for `fili` (I assume it's processed as `PUNCT` instead).
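To check other sentences for the same problem, a small scan over `doc.tokens` may help. This is a minimal sketch, not part of CLTK: `find_fused_punctuation` and `FUSED_PUNCT` are hypothetical names, and the token list is illustrative (only index 4, `'fili!'`, comes from the output above; the other entries are assumed).

```python
import re

# Hypothetical helper (not a CLTK API): matches a token made of word
# characters immediately followed by punctuation, such as 'fili!'.
FUSED_PUNCT = re.compile(r"^\w+[!?.,;:]+$")


def find_fused_punctuation(tokens):
    """Return (index, token) pairs for tokens that still carry trailing punctuation."""
    return [(i, tok) for i, tok in enumerate(tokens) if FUSED_PUNCT.match(tok)]


# Illustrative token list: only 'fili!' at index 4 is taken from the report;
# the remaining entries are assumptions made for this example.
tokens = ["Cautus", "esto", ",", "mi", "fili!", "Iam", "sequere", "me", "!"]
print(find_fused_punctuation(tokens))
```

In a real session the list would come from `doc.tokens`. Abbreviations that legitimately end in a period would also be flagged, so the result is only a list of candidates to inspect.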
Desktop (please complete the following information):
OS and version: Ubuntu 20.04.05 LTS