Hindi NER Support for Inltk #43

Open
avinsit123 opened this issue Apr 9, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@avinsit123

We are currently working on a research project on NER for Hindi, and we would like to extend our code and work to add Hindi NER support to iNLTK. Our current model (Embeddings->LSTM->CRF) is trained on this dataset with 14 tags, http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=2, and has an accuracy of around 70%. We are currently trying to improve the model's accuracy. Do you have any contribution guidelines for the project, or any specifics you would like in the NER model? Either way, we are really interested in contributing to the project.
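
For readers unfamiliar with the last stage of an Embeddings->LSTM->CRF tagger, here is a minimal sketch of how a CRF layer can pick a tag sequence with Viterbi decoding. It is not the code discussed in this issue: the three-tag set, the emission scores (standing in for LSTM outputs), and the transition scores are all made-up assumptions for illustration.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[i][t]   : score of tag t at token i (hypothetical LSTM output)
    transitions[p][c] : score of moving from tag p to tag c
    """
    # best[i][t] holds the best score of any path that ends in tag t at token i.
    best = [{t: emissions[0][t] for t in tags}]
    back = []  # back[i - 1][t] is the previous tag on that best path
    for i in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[i][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative tag set and scores (assumptions, not taken from the dataset).
tags = ["O", "B-PER", "I-PER"]
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I-PER"] = -10.0  # discourage I-PER directly after O

emissions = [
    {"O": 1.0, "B-PER": 0.8, "I-PER": 0.0},
    {"O": 0.2, "B-PER": 0.1, "I-PER": 1.0},
    {"O": 1.5, "B-PER": 0.0, "I-PER": 0.2},
]

decoded = viterbi(emissions, transitions, tags)
print(decoded)
```

With these scores, picking the best tag per token independently would give O followed by I-PER, which is not a valid BIO sequence. The transition penalty steers the decoder to B-PER, I-PER, O instead, which is the kind of consistency a CRF layer adds on top of per-token scores.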

@goru001
Owner

goru001 commented Apr 10, 2020

@avinsit123 Thanks for reaching out. It would be great to integrate your work into the iNLTK library.

In order to add support for Hindi NER, it would be great if you can:

  1. Open source your work with links to the train/test data, approach, trained model, and scripts to reproduce the results. Once you do this, I would like to take a look at it, and then we'll take it from there.
  2. Do you also want to support training of the model through iNLTK on custom data in addition to exposing the static model trained on IJCNLP dataset? If we want to do this, we'll have to think through this a bit more - happy to hear what your thoughts are.

Let me know what you think.

goru001 added the enhancement (New feature or request) label Apr 10, 2020
@avinsit123
Author

@goru001 We will mail you the material mentioned above once we have finished refining the model. So far we have trained our model with several embeddings (e.g. fastText, RoBERTa) using the flair NLP library.
It would also be great to add support in iNLTK so that users can train custom NER models.

@goru001
Owner

goru001 commented Apr 12, 2020

@avinsit123 Sure, will wait for your mail. Thanks!

@octalpixel

@avinsit123 Do you have any resources where I can get a similar NER dataset for Tamil?

@anuragshas
Contributor

@avinsit123 How about using word-level iNLTK embeddings and then XGBoost to classify the tokens?
