[BUG] NER prediction : mismatch between number of input tokens and output labels #105

Open
fadhleryani opened this issue Nov 20, 2022 · 0 comments

fadhleryani (Collaborator) commented Nov 20, 2022

Howdy owo!

Check this out:
Given a string such as the following (note the misplaced shadda after أن):
'أوهاشي، أن ّ "لأجل تحقيق التنمية المرجوّة'
`simple_word_tokenize` returns 9 tokens. These are then passed to the NER predictor, which returns only 8 labels, so the tokens and labels can no longer be paired one-to-one.
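
A minimal sketch of how this could be reproduced and guarded against. `reproduce()` uses `simple_word_tokenize` and `NERecognizer` from CAMeL Tools and needs the package plus its pretrained NER model installed; it is not called automatically. The helpers `mark_only_tokens` and `check_alignment` are hypothetical names made up for this example, not part of CAMeL Tools. Treating the standalone shadda as the token that gets dropped is only a guess based on the input above, not a confirmed root cause.

```python
import unicodedata

SENTENCE = 'أوهاشي، أن ّ "لأجل تحقيق التنمية المرجوّة'


def mark_only_tokens(tokens):
    """Return indices of tokens made up solely of combining marks.

    Hypothetical helper: a stray diacritic such as a shadda (U+0651)
    that ends up as its own token is one suspect for the lost label.
    """
    return [
        i for i, tok in enumerate(tokens)
        if tok and all(unicodedata.category(ch).startswith('M') for ch in tok)
    ]


def check_alignment(tokens, labels):
    """Pair tokens with labels, failing loudly if the lengths differ."""
    if len(tokens) != len(labels):
        raise ValueError(
            f'{len(tokens)} tokens but {len(labels)} labels; '
            f'mark-only tokens at indices {mark_only_tokens(tokens)}'
        )
    return list(zip(tokens, labels))


def reproduce():
    """Run the tokenizer and NER predictor on the sentence from this issue.

    Imports are local so the helpers above work without CAMeL Tools.
    """
    from camel_tools.ner import NERecognizer
    from camel_tools.tokenizers.word import simple_word_tokenize

    tokens = simple_word_tokenize(SENTENCE)
    ner = NERecognizer.pretrained()
    labels = ner.predict_sentence(tokens)
    return tokens, labels
```

Calling `check_alignment(*reproduce())` should raise a `ValueError` if the mismatch described here still occurs, and the message lists any mark-only tokens so they can be inspected or filtered before prediction.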

Desktop:

  • OS: macOS 11.5.2
  • Python version: 3.9.13
  • CAMeL Tools version and installation source: 1.4.1 (from pip)