Training on Arabic language #115

Open
lecidhugo opened this issue Dec 3, 2020 · 3 comments

Comments

@lecidhugo

Hello,
Is there any document or guide on how to train on Arabic?
Is this possible? If yes, what are the requirements?

Thanks in advance,

@kermitt2
Owner

kermitt2 commented Dec 3, 2020

Hello @lecidhugo!

You can create the resources for a new language with https://github.com/kermitt2/grisp
The readme describes the process. It's a Hadoop process that takes a few hours.

Once that's done, you can start an environment for Arabic with entity-fishing, and the knowledge base will be built automatically. Then you need to train a ranker model and a selector model, as described here -> https://nerd.readthedocs.io/en/latest/train.html#training-with-wikipedia

Loading markupFull is the time-consuming part: it is the DB that stores the text content of all the articles.

You don't need to create embeddings, if I remember correctly; it should work without them. However, they improve the disambiguation a bit. Creating them is also quite time-consuming (it should take about half a day for Arabic, given the number of articles).

There are 1,080,907 articles in Arabic, which is a pretty big number, so it should be doable and provide decent results.

@lecidhugo
Author

lecidhugo commented Dec 8, 2020

Thank you very much for your kind reply!
I will try to do it.

@kermitt2
Owner

kermitt2 commented May 3, 2022

Note that Arabic is now supported by default, with pre-trained models and KB resources available; see the documentation.
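
For anyone who wants to try the Arabic support, here is a minimal sketch of how a disambiguation query with an explicit language could be built. The helper name `build_disambiguation_query`, the sample sentence, and the local endpoint in the comment are assumptions for illustration, not something taken from this thread; please check the documentation for the exact query format of your version.

```python
import json


def build_disambiguation_query(text: str, lang: str = "ar") -> str:
    """Serialize a text query with an explicit language code (hypothetical helper)."""
    query = {
        "text": text,
        "language": {"lang": lang},
    }
    # ensure_ascii=False keeps the Arabic characters readable in the JSON string
    return json.dumps(query, ensure_ascii=False)


payload = build_disambiguation_query("وُلد نجيب محفوظ في القاهرة.")

# Assumption: a local entity-fishing server listening on port 8090.
# With the `requests` package installed, the payload could be sent as a form field:
# requests.post("http://localhost:8090/service/disambiguate",
#               files={"query": (None, payload)})
```

The block only builds the JSON payload, so it runs without a server; the network call is left as a comment because the host and port depend on your setup.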
