Preliminary entity topic model implementation + Migration to scala 2.10 + Factorie NER spotters (trainable + trained models) #306

dirkweissenborn · 2014-05-26T16:03:59Z

The entity topic model implementation is very simple and preliminary, but it is fast and working. It relies on the statistical backend for training (spotter + stores). It currently needs a lot of memory, though; probabilistic count stores might be a nice way to reduce those requirements.
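To make the "probabilistic count stores" idea concrete, here is a minimal sketch of a count-min sketch, one common approximate counter. It is not part of this PR: the class name `CountMinSketch` and its parameters are hypothetical, and the existing stores do not work this way. It only illustrates the trade-off of fixed memory against counts that may be overestimated.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical illustration of a probabilistic count store (count-min sketch).
// Memory is fixed at depth * width longs, independent of the number of distinct keys.
final class CountMinSketch(depth: Int, width: Int) {
  require(depth > 0 && width > 0, "depth and width must be positive")

  private val table = Array.ofDim[Long](depth, width)

  // One hash function per row, obtained by seeding MurmurHash3 with the row index.
  private def bucket(key: String, row: Int): Int = {
    val h = MurmurHash3.stringHash(key, row)
    ((h % width) + width) % width // map a possibly negative hash into [0, width)
  }

  // Adds a non-negative count for the key to one bucket in every row.
  def add(key: String, count: Long = 1L): Unit = {
    require(count >= 0L, "counts must be non-negative")
    var row = 0
    while (row < depth) {
      table(row)(bucket(key, row)) += count
      row += 1
    }
  }

  // Minimum over all rows: never underestimates the true count,
  // but may overestimate it when the key collides with others in every row.
  def estimate(key: String): Long =
    (0 until depth).map(row => table(row)(bucket(key, row))).min
}
```

A larger `width` lowers the chance of collisions and a larger `depth` lowers the chance that every row collides at once, so both trade memory for accuracy. Whether overestimated counts are acceptable for the topic model is an open question here.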

I also migrated the whole project to Scala 2.10. Further testing would be nice, e.g., creating a Spotlight model (I don't have the raw counts).

Currently, disambiguation on CSAW reaches a precision of around 0.81 after 150 iterations of training, slightly below the 0.83 of our common-sense baseline. In other words, the context model actually hurts performance compared to using only surface-form-to-resource probabilities.

Edit: This now includes a new LinearChainCRFSpotter and a new spotter for pretrained factorie NER models: see PretrainedFactorieNerSpotter (e.g., SimpleNerSpotter, BilouConllNerSpotter).


dav009 · 2014-06-03T17:21:02Z

@dirkweissenborn would love to give this a try, any chance you can upload ner.factorie.model somewhere ?

dirkweissenborn · 2014-06-04T06:32:41Z

@dav009 if you want to use the pretrained models, just uncomment the factorie dependencies in the core pom.xml (see below). That should be enough to use either of the following spotters:

/**
* Fast, but not as sophisticated as BilouConllPretrainedCRFSpotter
*/
object SimpleNerSpotter extends PretrainedFactorieNerSpotter(DocumentAnnotatorPipeline.apply[ner.NerTag])


/**
* Slow but uses many features for NERTagging and is thus much more sophisticated compared to the SimpleNerSpotter
*/
object BilouConllNerSpotter extends PretrainedFactorieNerSpotter(DocumentAnnotatorPipeline.apply[ner.BilouConllNerTag])

<!-- don't download if not used, these are models. Uncomment these dependencies if you use a PretrainedFactorieNerSpotter-->
<!--dependency>
<groupId>cc.factorie.app.nlp</groupId>
<artifactId>ner</artifactId>
</dependency>

<dependency>
<groupId>cc.factorie.app.nlp</groupId>
<artifactId>pos</artifactId>
</dependency-->

tgalery · 2014-06-04T11:38:21Z

Hi @dirkweissenborn, we have been trying this locally to evaluate the new spotters. We uncommented these dependencies and set spotter=ner in the model.properties file. However, when the input is processed, we get annotations generated by a Default Spotter. Looking at this line,
https://github.com/dirkweissenborn/dbpedia-spotlight/blob/entity_topic_clean/core/src/main/scala/org/dbpedia/spotlight/db/SpotlightModel.scala#L134 it seems that the code presupposes that the spotter property is set to ner in the model.properties file, but it also presupposes the existence of ner.factorie.model in the folder, no? So I guess @dav009's question is this: would the factorie spotter be used without that file? If not, where could we get it?

tgalery · 2014-06-04T11:46:18Z

Also, when we set spotter=ner in the model.properties file, but we have an opennlp folder and lack a ner.factorie.model folder, we get:

Exception in thread "main" java.lang.IllegalArgumentException: no matching constructor found on class org.dbpedia.spotlight.db.concurrent.TokenizerActor for arguments []
    at akka.util.Reflect$.error$1(Reflect.scala:82)
    at akka.util.Reflect$.findConstructor(Reflect.scala:106)
    at akka.actor.NoArgsReflectConstructor.<init>(Props.scala:356)
    at akka.actor.IndirectActorProducer$.apply(Props.scala:305)
    at akka.actor.Props.producer(Props.scala:173)
    at akka.actor.Props.<init>(Props.scala:186)
    at akka.actor.Props$.apply(Props.scala:69)
    at org.dbpedia.spotlight.db.concurrent.TokenizerWrapper.<init>(TokenizerWrapper.scala:33)
    at org.dbpedia.spotlight.db.SpotlightModel$.fromFolder(SpotlightModel.scala:114)
    at org.dbpedia.spotlight.db.SpotlightModel.fromFolder(SpotlightModel.scala)
    at org.dbpedia.spotlight.web.rest.Server.initByModel(Server.java:297)
    at org.dbpedia.spotlight.web.rest.Server.main(Server.java:98)

dirkweissenborn · 2014-06-04T12:26:16Z

@tgalery yeah, the predefined spotters are not yet integrated into the SpotlightModel. The SpotlightModel integration only works for models trained with our own trainable model implementation. The pretrained model implementation is just a wrapper around factorie NER models. You can easily integrate the pretrained models, though: just change the spotter property to something else (e.g., pretrained-ner) and add a case to the SpotlightModel for that.
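As a rough sketch of what "adding a case" could look like: the snippet below dispatches on the `spotter` property with a pattern match. Everything except the property name and the values `ner` and `pretrained-ner` is hypothetical. `SpotterKind`, `SpotterSelection.select`, and the `hasTrainedNerModel` flag are stand-ins for illustration and do not mirror the actual code in SpotlightModel.scala, and the fallback to a default spotter is an assumption of this sketch.

```scala
import java.util.Properties

// Hypothetical stand-ins for the spotter implementations; not the project's real types.
sealed trait SpotterKind
case object TrainedNer extends SpotterKind     // spotter=ner, needs a trained ner.factorie.model
case object PretrainedNer extends SpotterKind  // spotter=pretrained-ner, the newly added case
case object DefaultSpotter extends SpotterKind // assumed fallback when nothing else matches

object SpotterSelection {
  // Picks a spotter kind from the "spotter" entry of model.properties.
  // hasTrainedNerModel is a hypothetical flag: true if ner.factorie.model exists in the model folder.
  def select(properties: Properties, hasTrainedNerModel: Boolean): SpotterKind =
    Option(properties.getProperty("spotter")).map(_.trim.toLowerCase) match {
      case Some("ner") if hasTrainedNerModel => TrainedNer
      case Some("pretrained-ner")            => PretrainedNer
      case _                                 => DefaultSpotter
    }
}
```

In the real integration, the `PretrainedNer` branch would presumably hand back one of the wrapper objects such as SimpleNerSpotter or BilouConllNerSpotter, which requires the factorie model dependencies to be on the classpath.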

tgalery · 2014-06-04T12:31:58Z

Cool, we'll do that then.

dirkweissenborn · 2014-06-16T14:04:29Z

There is a trained entity topic model here, for whoever is interested in testing. This is a compressed file containing both the model and the stores needed to create the EntityTopicDisambiguator. The model can be loaded through SimpleEntityTopicModel.fromFile(file). The stores can be loaded the same way as in the SpotlightModel. It should all be fairly easy.

jodaiber · 2014-07-13T14:48:44Z

Hey all, I would merge this as soon as the Scala version upgrade is tested. Let me know if any of you has the time to test this. The raw counts to try are here.

dirkweissenborn added 14 commits March 19, 2014 19:03

url decode redirects, disambiguations and incoming wiki titles

c9f0fcd

Merging branch master

e0ac7a8

Merge branch 'master' of https://github.com/dbpedia-spotlight/dbpedia…

7209c5b

…-spotlight

Migration to scala 2.10

6575c78

new entity topic model implementation in development

883507d

Local training for entity topic model

3969aeb

Spark training for entity topic model increased performance for parsing wikipages into entity topic models bug fixes in training

* local training for entity topic model working

449d488

* initial implementation of entity topic disambiguator

implementation of the entity topic model in first version finished. T…

25d6e74

…raining and disambiguation works, but it requires a lot of memory at the moment. Probabilistic count stores might help here.

remove relative paths

974d4c0

Minor bug fix + changing version to 0.7

bddb59e

simple test of linear chain ner spotter

95f1596

*Making ChainCRF generic, working also if no types are given.

3736535

*Serialization and Deserialization working *Integration in SpotlightModel

* migration of ChainCrf to factorie 1.0 and scala 2.10

54d2871

* New spotter for pretrained ner models from factorie: see PretrainedFactorieNerSpotter, SimpleNerSpotter, BilouConllNerSpotter

change scope to provided so factorie ner models do not end up in dist…

f2afe09

…ribution

dirkweissenborn changed the title ~~Preliminary entity topic model implementation + Migration to scala 2.10~~ Preliminary entity topic model implementation + Migration to scala 2.10 + Factorie NER spotters (trainable + trained models) May 27, 2014

dirkweissenborn added 2 commits May 27, 2014 12:09

bugfix

93e2218

commenting dependency of pretrained factorie models, because travis f…

1a51c71

…ailes to build spotlight, because the log grows too big

tgalery mentioned this pull request Jun 1, 2015

migrating to scala 2.10 #359

Merged
