Training on nlp4j-ner model #25

saravanakumar1 · 2017-01-20T12:20:59Z

Hi,
I need to add more data to the pre-existing model (en-ner.xz). Since that is not currently possible in Emory NLP4J, I have trained my own model (en-sam.xz) using the files below.
sam.zip

I used the following command to train the model:
java edu.emory.mathcs.nlp.bin.NLPTrain -mode ner -c home/config-train-sample.xml -t /home/sample-trn.tsv -d /home/sample-dev.tsv -m /home/en-sam.xz
A new model was created.
I need to know whether I used the correct files for training.

Please help me: how can I add this new model (en-sam.xz) alongside en-ner.xz using config-decode-en.xml?
I need to load the new model in the code (nlp4j/cli/src/main/java/edu/emory/mathcs/nlp/bin/NLPDemo.java) and test it.
@jdchoi77 and team
Thanks in advance.

jdchoi77 · 2017-01-26T15:27:10Z

Sorry for the late reply. The sample files would not train any good model since they are tiny. You should get the OntoNotes data from LDC and use the entire dataset to train a meaningful model:

https://catalog.ldc.upenn.edu/LDC2013T19

Please let me know if you have trouble extracting NER tags from the original OntoNotes data once you get it. Thanks.

saravanakumar1 · 2017-01-30T07:51:00Z

Will OntoNotes provide a dataset in the NLP4J training format, or should I create a training file in that format from the dataset provided?

Could you please review my configuration files?
sam.zip

I am trying to train a model in NER mode.

@jdchoi77 and team
Thanks in advance.

jdchoi77 · 2017-02-02T12:56:38Z

OntoNotes does not come in the format that you need. I have made the conversion script available, so please take a look at this page:

https://github.com/emorynlp/ddr/blob/master/md/conversion.md#merge

Please let me know if you have more questions. Thanks.

saravanakumar1 · 2017-02-03T05:29:34Z

Are my configuration files correct, or should I make any changes?
I am getting this error while loading my new model:
java.io.StreamCorruptedException: unexpected reset; recursion depth: 2
at java.io.ObjectInputStream.handleReset(ObjectInputStream.java:2049)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1323)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at edu.emory.mathcs.nlp.common.util.NLPUtils.getComponent(NLPUtils.java:98)
at edu.emory.mathcs.nlp.decode.AbstractNLPDecoder.init(AbstractNLPDecoder.java:120)
at edu.emory.mathcs.nlp.decode.AbstractNLPDecoder.<init>(AbstractNLPDecoder.java:83)
at edu.emory.mathcs.nlp.decode.NLPDecoder.<init>(NLPDecoder.java:36)

@jdchoi77 and team
Thanks in advance.

saravanakumar1 · 2017-02-06T06:19:47Z

Any updates?

jdchoi77 · 2017-02-07T13:16:19Z

Are you trying to decode or train? These errors are coming from the decoder. If you are trying to decode, could you send me your configuration file, input file, and command you ran?
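
The stack trace above ends in `ObjectInputStream`, so the decoder failed while deserializing the model file. One inexpensive check before revisiting the configuration is whether the file on disk starts with the XZ signature at all. The sketch below assumes the `.xz` model is an XZ-compressed stream, as the extension suggests; the class and method names (`XzMagicCheck`, `looksLikeXz`) are hypothetical and not part of NLP4J.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of NLP4J.
public class XzMagicCheck {
    // Six-byte signature that begins every .xz stream: FD '7' 'z' 'X' 'Z' 00.
    private static final byte[] XZ_MAGIC = {(byte) 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00};

    /** Returns true if the file begins with the XZ signature. */
    public static boolean looksLikeXz(Path file) throws IOException {
        byte[] head = new byte[XZ_MAGIC.length];
        int off = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (off < head.length) {
                int n = in.read(head, off, head.length - off);
                if (n < 0) {
                    break;
                }
                off += n;
            }
        }
        return off == XZ_MAGIC.length && Arrays.equals(head, XZ_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XzMagicCheck <model-file>");
            return;
        }
        Path model = Paths.get(args[0]);
        System.out.println(model + (looksLikeXz(model)
                ? " starts with the XZ signature"
                : " does NOT start with the XZ signature"));
    }
}
```

This only inspects the first six bytes. A file can pass the check and still fail to deserialize, for example if it was truncated after the header, so a negative result is more informative than a positive one.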

jdchoi77 · 2017-02-21T13:48:47Z

The sample file is there only for demo and is too small to be used for training. You should feed in your own data to train NER; you can obtain a large corpus from LDC for free:

https://catalog.ldc.upenn.edu/LDC2013T19

SakthivelAnand · 2017-02-22T04:14:28Z

Thank you @jdchoi77

SakthivelAnand · 2017-02-22T06:20:43Z

Hello,
I trained NER with the large LDC corpus, but it still does not train on the data, although it saves a model. I have enclosed my ner_config.xml file and the training and development data:
NLP4JDATA.zip

The command-line output is:

sh /home/appassembler/bin/nlptrain -mode ner -c /home/config_ner_train.xml -t /home/Output.stv -d /home/Output1.stv -m /home/NLP4JMODEL/en-sam.xz

Loading ambiguity classes
Loading word clusters
Loading word embeddings
Loading named entity gazetteers
Name not implemented for OnlineComponent. Input name - en-sam.xz will be ignored.
AdaGrad Mini-batch

Max epoch: 0
Mini-batch: 5
Feature cutoff: 2
Learning rate: 0.02
LOLS: fixed = 0, decaying rate = 0.95
RDA: 1.0E-5
Training: 0
0: Best: 0.00, epoch = -1
Saving the model
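
When a run like the one above saves a model without visibly training (`Max epoch: 0`, best score 0.00), it can help to rule out the data files before digging into the configuration. The sketch below counts sentences and tokens in a training file and reports the range of column counts. It assumes one token per line with tab-separated fields and blank lines between sentences; the class name `TsvSanityCheck` and its `Stats` holder are hypothetical and not part of NLP4J.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of NLP4J.
public class TsvSanityCheck {
    /** Counts gathered from one file. */
    public static final class Stats {
        public int sentences;
        public int tokens;
        public int minColumns;
        public int maxColumns;
    }

    /** Scans a file assumed to hold one token per line, blank line between sentences. */
    public static Stats scan(Path file) throws IOException {
        Stats s = new Stats();
        int min = Integer.MAX_VALUE;
        boolean inSentence = false;
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    inSentence = false; // a blank line closes the current sentence
                    continue;
                }
                if (!inSentence) {
                    s.sentences++;
                    inSentence = true;
                }
                s.tokens++;
                // Limit -1 keeps trailing empty fields so they are counted too.
                int cols = line.split("\t", -1).length;
                min = Math.min(min, cols);
                s.maxColumns = Math.max(s.maxColumns, cols);
            }
        }
        s.minColumns = (s.tokens == 0) ? 0 : min;
        return s;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TsvSanityCheck <file.tsv>");
            return;
        }
        Stats s = scan(Paths.get(args[0]));
        System.out.printf("sentences=%d tokens=%d columns=%d..%d%n",
                s.sentences, s.tokens, s.minColumns, s.maxColumns);
    }
}
```

If the file reports zero tokens, or the minimum and maximum column counts differ, the data is a likelier culprit than the configuration. If the counts look sensible, the column mapping and optimizer settings in the configuration file are the next place to look.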

SakthivelAnand · 2017-02-22T07:42:48Z

Hi,
I found the problem in my config.xml file. It is working now.
Thank you.
