
[QUESTION] Why is output from dialect id system different from the ADIDA online interface? #141

fadhleryani opened this issue Apr 3, 2024 · 0 comments

camel_tools 1.5.2 on macOS 14.1.1

Using one of the preloaded example sentences in the ADIDA interface, for instance:
"بدي دوب قلي قلي بجنون بحبك انا مجنون ما بنسى حبك يوم"
I get a score of 95.9% for Beirut. When I try to predict the same sentence with camel_tools, I get a different result. For example, using model26, which I assume is the same model ADIDA uses:

```python
from camel_tools.dialectid import DIDModel26

did = DIDModel26.pretrained()
did.predict(['بدي دوب قلي قلي بجنون بحبك انا مجنون ما بنسى حبك يوم'])
```

I get the following scores (truncated):

```
[DIDPred(top='ALE', scores={'ALE': 0.2744463749182225, 'ALG': 0.0019964477414507772, 'ALX': 0.0017124356871910278, 'AMM': 0.04793813798943018, ...
```

Similarly, using model6 I also get different and lower scores than the online interface (though at least the top dialect is correct):

```python
from camel_tools.dialectid import DIDModel6

did = DIDModel6.pretrained()
did.predict(['بدي دوب قلي قلي بجنون بحبك انا مجنون ما بنسى حبك يوم'])
```

I get the following scores:

```
[DIDPred(top='BEI', scores={'BEI': 0.5475092868164938, 'CAI': 0.05423997031019218, 'DOH': 0.018378809169102468, 'MSA': 0.003793013408907513, 'RAB': 0.0018751946461352397, 'TUN': 0.37420372564916876})]
```
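For side-by-side comparison with ADIDA's percentage display, note that the `scores` dict in the `DIDPred` above already sums to 1, so each entry can be read directly as a probability. A minimal sketch of ranking and formatting these scores, using the exact values copied from the model6 run above (not part of the camel_tools API):

```python
# Scores copied verbatim from the DIDModel6 output above.
scores = {
    'BEI': 0.5475092868164938,
    'CAI': 0.05423997031019218,
    'DOH': 0.018378809169102468,
    'MSA': 0.003793013408907513,
    'RAB': 0.0018751946461352397,
    'TUN': 0.37420372564916876,
}

# The scores sum to 1, so each entry is a probability
# and can be shown as an ADIDA-style percentage.
assert abs(sum(scores.values()) - 1.0) < 1e-9

# Rank dialects from most to least likely and print percentages.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for dialect, score in ranked:
    print(f'{dialect}: {score:.1%}')
```

Ranked this way, model6 gives Beirut (BEI) about 54.8%, well below ADIDA's 95.9% for the same sentence, with much of the remaining mass on TUN.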
