Hindi NER Support for Inltk #43

avinsit123 · 2020-04-09T06:50:03Z

We are currently working on a research project on NER in Hindi, and we would like to extend our code and work to add support for Hindi NER in iNLTK. Our current model (Embeddings -> LSTM -> CRF) is trained on this dataset http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=2 with 14 tags and has an accuracy of around 70%. We are currently trying to improve the model's accuracy. Do you have any contribution guidelines for the project, or any specifics you would like in the NER model? In any case, we are really interested in contributing to the project.
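For readers unfamiliar with the last stage of an Embeddings -> LSTM -> CRF tagger, here is a minimal sketch of CRF-style Viterbi decoding. It is not the model described in this comment: the tag set, the emission scores (which an LSTM would normally produce), and the transition scores are all made-up toy values, and `viterbi_decode` is a hypothetical helper name.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag]      : score of `tag` at position t (e.g. from an LSTM)
    transitions[prev][tag] : score of moving from `prev` to `tag`
    """
    if not emissions:
        return []
    tags = list(emissions[0])
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [dict(emissions[0])]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for tag in tags:
            # Pick the previous tag that maximizes the path score into `tag`.
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][tag])
            scores[tag] = best[-1][prev] + transitions[prev][tag] + emissions[t][tag]
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores for a three-token sentence (assumed values, for illustration only).
emissions = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 1.9},
    {"O": 0.5, "B-PER": 0.2, "I-PER": 1.5},
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.3},
]
transitions = {
    "O": {"O": 0.0, "B-PER": 0.0, "I-PER": -10.0},  # O -> I-PER is penalized
    "B-PER": {"O": 0.0, "B-PER": -1.0, "I-PER": 1.0},
    "I-PER": {"O": 0.0, "B-PER": -1.0, "I-PER": 0.5},
}
decoded = viterbi_decode(emissions, transitions)
print(decoded)
```

The transition scores are what let the CRF layer reject inconsistent sequences such as `O` followed directly by `I-PER`, which a per-token classifier alone cannot rule out.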

goru001 · 2020-04-10T16:10:37Z

@avinsit123 Thanks for reaching out. It would be great to integrate your work into the iNLTK library.

In order to add support for Hindi NER, it would be great if you can:

Open source your work, with links to the train/test data, the approach, the trained model, and the scripts needed to reproduce the results. Once you do this, I would like to take a look at it, and then we'll take it from there.
Do you also want to support training the model through iNLTK on custom data, in addition to exposing the static model trained on the IJCNLP dataset? If we want to do this, we'll have to think it through a bit more. Happy to hear your thoughts.

Let me know what you think.

avinsit123 · 2020-04-11T06:30:44Z

@goru001 We will mail you the material mentioned above once we have finished refining the model. So far we have trained our model with several embeddings, e.g. fastText, RoBERTa, etc., using the Flair NLP library.
It would also be great to add support in iNLTK so that users can train their own NER models on custom data.

goru001 · 2020-04-12T06:11:32Z

@avinsit123 Sure, will wait for your mail. Thanks!

octalpixel · 2020-04-18T16:58:11Z

@avinsit123 Do you have any resources where I can get a similar NER dataset for Tamil?

anuragshas · 2020-04-19T05:24:43Z

@avinsit123 How about using word-level iNLTK embeddings and then XGBoost to classify the tokens?
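One way to read this suggestion, sketched under assumptions: turn each token's embedding into a fixed-length feature row and hand the rows to a gradient-boosted classifier such as XGBoost. The context window below is my own assumption (the comment does not mention it), `window_features` is a hypothetical helper, and the vectors are toy values standing in for real iNLTK embeddings. The classifier itself is left out so the sketch stays dependency-free.

```python
from typing import List, Sequence


def window_features(
    vectors: Sequence[Sequence[float]], radius: int = 1
) -> List[List[float]]:
    """Concatenate each token's vector with its neighbours' vectors.

    Positions that fall outside the sentence are padded with zeros, so every
    row has length (2 * radius + 1) * dim.
    """
    if not vectors:
        return []
    dim = len(vectors[0])
    pad = [0.0] * dim
    rows: List[List[float]] = []
    for i in range(len(vectors)):
        row: List[float] = []
        for j in range(i - radius, i + radius + 1):
            row.extend(vectors[j] if 0 <= j < len(vectors) else pad)
        rows.append(row)
    return rows


# Toy 2-dimensional "embeddings" for a three-token sentence (assumed values).
toy_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
features = window_features(toy_vectors, radius=1)
for row in features:
    print(row)
```

Each row would then be paired with that token's NER tag as one training example. Unlike a CRF, a per-token classifier of this kind scores every token independently, so nothing prevents inconsistent tag sequences in its output.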

goru001 added the enhancement New feature or request label Apr 10, 2020
