
[QUESTION] Disambiguation using unfactored bert model does not yield same results as using the Camelira Web Interface #130

Open
amsu2 opened this issue Dec 16, 2023 · 1 comment

amsu2 commented Dec 16, 2023

I installed the project and completed the full setup.

I used the example code from https://camel-tools.readthedocs.io/en/stable/api/disambig/bert.html#examples.

I tried out various input sentences. In almost every sentence, and particularly on verbs, the last letter is left without diacritization.

More importantly, every so often a word gets disambiguated completely differently from what the Camelira website produces, and the weightings are also different.

Example:
Input: وهي مدرسة
Output: وَهِيَ مَدْرَسَةٌ
Camelira Website Output: وَهِيَ مُدَرِّسَةٌ

For some words, not only are the weightings different (or the choice between two results that both score 1.0), but the analysis itself is completely different.
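The tie situation is easy to reproduce in isolation. Below is a minimal sketch using mock (score, diac) pairs in place of the real ScoredAnalysis objects that the BERT disambiguator returns (the mock scores and the helper function are my own illustration, not camel-tools API); it shows how a naive argmax silently hides the second reading when both analyses score 1.0:

```python
def top_diac(scored_analyses):
    """Return the diacritized form with the highest score.

    `scored_analyses` is a list of (score, diac) tuples standing in for
    the (score, analysis) pairs the disambiguator returns per word.
    """
    best = max(scored_analyses, key=lambda sa: sa[0])
    return best[1]

# Mock scores for the ambiguous word مدرسة:
# madrasah 'school' vs. mudarrisah 'teacher' (real scores come from the model).
mock = [(1.0, 'مَدْرَسَةٌ'), (1.0, 'مُدَرِّسَةٌ')]

# max() resolves ties to the first element, so the second 1.0 reading
# is never surfaced unless you inspect the full list.
print(top_diac(mock))
```

If the web interface breaks such ties differently (or re-ranks the full analysis list), that alone could explain some of the divergence I see.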

Example:
Input: مهمة
Output: مَهَمَّةً
Camelira Website Output: 30 variants of مُهِمَّةٌ; mahammah is not among them even once.

Thanks in advance for your help. I'm a CS student and have been interested in linguistics and Arabic for a few years now; I'm a big fan of your work. This would really help me.

Environment: Windows 10, Python 3.9

@Hamed1Hamed
I have the same issue!
