This repository has been archived by the owner on Oct 20, 2018. It is now read-only.

Preliminary entity topic model implementation + Migration to scala 2.10 + Factorie NER spotters (trainable + trained models) #306

Open
wants to merge 16 commits into
base: master
from


dirkweissenborn
Member

The entity topic model implementation is very simple and preliminary, but it is fast and currently works. It relies on the statistical backend (spotter + stores) for training. It needs a lot of memory at the moment, though; probabilistic count stores might be a good way to reduce that requirement.
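To make the "probabilistic count store" idea concrete, here is a minimal count-min sketch in Scala. It is only an illustration of one such structure, not code from this PR: the class name `CountMinSketch` and its parameters are made up, and nothing here says the entity topic model should use exactly this variant.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical count-min sketch; NOT part of this PR or of dbpedia-spotlight.
// Memory is fixed at depth * width counters, no matter how many keys are added.
final class CountMinSketch(depth: Int, width: Int) {
  require(depth > 0 && width > 0, "depth and width must be positive")

  private val table = Array.ofDim[Long](depth, width)

  // Column of `key` in row `row`; the row index doubles as the hash seed.
  private def column(key: String, row: Int): Int = {
    val h = MurmurHash3.stringHash(key, row)
    ((h % width) + width) % width // keep the index non-negative
  }

  // Adds `count` to the key's counter in every row.
  def add(key: String, count: Long = 1L): Unit = {
    var row = 0
    while (row < depth) {
      table(row)(column(key, row)) += count
      row += 1
    }
  }

  // Minimum over all rows: never underestimates for non-negative counts,
  // but may overestimate when keys collide.
  def estimate(key: String): Long =
    (0 until depth).map(row => table(row)(column(key, row))).min
}
```

The trade-off is that counts become approximate (overestimates only), in exchange for a memory footprint that no longer grows with the number of distinct keys.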

I also migrated the whole project to Scala 2.10. Further testing would be welcome, e.g. creating a Spotlight model (I don't have the raw counts).

Currently, disambiguation on CSAW reaches a precision of around 0.81 after 150 iterations of training. That is slightly worse than our common-sense baseline at 0.83, which means that the context model actually hurts performance compared to using only surface-form-to-resource probabilities.
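For readers unfamiliar with that kind of baseline, the sketch below shows what "only surface-form-to-resource probabilities" can look like: pick the resource a surface form was most often annotated with, ignoring context. The object name, the `Counts` type, and the relative-frequency estimate are assumptions for illustration; the actual baseline in Spotlight reads these numbers from its stores.

```scala
// Hypothetical context-free baseline; names and types are illustrative only.
object SurfaceFormBaseline {
  // counts(surfaceForm)(resource) = how often surfaceForm was annotated with resource
  type Counts = Map[String, Map[String, Int]]

  // P(resource | surfaceForm), estimated by relative frequency.
  def probabilities(counts: Counts, surfaceForm: String): Map[String, Double] = {
    val byResource = counts.getOrElse(surfaceForm, Map.empty[String, Int])
    val total = byResource.values.sum.toDouble
    if (total == 0.0) Map.empty[String, Double]
    else byResource.map { case (resource, c) => resource -> c / total }
  }

  // Picks the most probable resource; None if the surface form is unknown.
  def disambiguate(counts: Counts, surfaceForm: String): Option[String] = {
    val probs = probabilities(counts, surfaceForm)
    if (probs.isEmpty) None else Some(probs.maxBy(_._2)._1)
  }
}
```

A context model only helps if it overrides this choice correctly more often than it overrides it wrongly, which is the comparison the two precision figures are making.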

Edit: This now includes a new LinearChainCRFSpotter and a new spotter for pretrained factorie NER models: see PretrainedFactorieNerSpotter (e.g., SimpleNerSpotter, BilouConllNerSpotter).

Spark training for entity topic model
increased performance for parsing wikipages into entity topic models
bug fixes in training
* initial implementation of entity topic disambiguator
…raining and disambiguation works, but it requires a lot of memory at the moment. Probabilistic count stores might help here.
* Serialization and deserialization working
* Integration in SpotlightModel
* New spotter for pretrained ner models from factorie: see PretrainedFactorieNerSpotter, SimpleNerSpotter, BilouConllNerSpotter
@dirkweissenborn dirkweissenborn changed the title Preliminary entity topic model implementation + Migration to scala 2.10 Preliminary entity topic model implementation + Migration to scala 2.10 + Factorie NER spotters (trainable + trained models) May 27, 2014
@dav009
Member

dav009 commented Jun 3, 2014

@dirkweissenborn I would love to give this a try. Any chance you can upload ner.factorie.model somewhere?

@dirkweissenborn
Member Author

@dav009 if you want to use the pretrained models, just uncomment the factorie dependencies in the core pom.xml (see below). This should be enough to use either of the following spotters:

/**
 * Fast, but not as sophisticated as BilouConllNerSpotter
 */
object SimpleNerSpotter extends PretrainedFactorieNerSpotter(DocumentAnnotatorPipeline.apply[ner.NerTag])


/**
 * Slow, but uses many features for NER tagging and is thus much more sophisticated than the SimpleNerSpotter
 */
object BilouConllNerSpotter extends PretrainedFactorieNerSpotter(DocumentAnnotatorPipeline.apply[ner.BilouConllNerTag])
<!-- don't download if not used, these are models. Uncomment these dependencies if you use a PretrainedFactorieNerSpotter-->
<!--dependency>
<groupId>cc.factorie.app.nlp</groupId>
<artifactId>ner</artifactId>
</dependency>

<dependency>
<groupId>cc.factorie.app.nlp</groupId>
<artifactId>pos</artifactId>
</dependency-->

@tgalery
Member

tgalery commented Jun 4, 2014

Hi @dirkweissenborn, we have been trying this locally to evaluate the new spotters. We uncommented these dependencies and set spotter=ner in the model.properties file. However, when the input is processed, we get annotations generated by a Default Spotter. Looking at this line,
https://github.com/dirkweissenborn/dbpedia-spotlight/blob/entity_topic_clean/core/src/main/scala/org/dbpedia/spotlight/db/SpotlightModel.scala#L134 it seems that the code presupposes not only that the spotter property is set to ner in the model.properties file, but also that ner.factorie.model exists in the folder, no? So I guess @dav009's question is this: would the factorie spotter be used without that file? If not, where could we get it?

@tgalery
Member

tgalery commented Jun 4, 2014

Also, when we set spotter=ner in the model.properties file but have an opennlp folder and no ner.factorie.model folder, we get:

Exception in thread "main" java.lang.IllegalArgumentException: no matching constructor found on class org.dbpedia.spotlight.db.concurrent.TokenizerActor for arguments []
    at akka.util.Reflect$.error$1(Reflect.scala:82)
    at akka.util.Reflect$.findConstructor(Reflect.scala:106)
    at akka.actor.NoArgsReflectConstructor.<init>(Props.scala:356)
    at akka.actor.IndirectActorProducer$.apply(Props.scala:305)
    at akka.actor.Props.producer(Props.scala:173)
    at akka.actor.Props.<init>(Props.scala:186)
    at akka.actor.Props$.apply(Props.scala:69)
    at org.dbpedia.spotlight.db.concurrent.TokenizerWrapper.<init>(TokenizerWrapper.scala:33)
    at org.dbpedia.spotlight.db.SpotlightModel$.fromFolder(SpotlightModel.scala:114)
    at org.dbpedia.spotlight.db.SpotlightModel.fromFolder(SpotlightModel.scala)
    at org.dbpedia.spotlight.web.rest.Server.initByModel(Server.java:297)
    at org.dbpedia.spotlight.web.rest.Server.main(Server.java:98)

@dirkweissenborn
Member Author

@tgalery yeah, the predefined spotters are not yet integrated within the SpotlightModel. The SpotlightModel integration only works for models trained with our own trainable model implementation. The pretrained model implementation is just a wrapper around factorie NER models. You can easily integrate the pretrained models, though. Just change the spotter property to something else (e.g. pretrained-ner) and add a case for it to the SpotlightModel.
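Roughly, adding such a case could look like the sketch below. This is not the actual SpotlightModel code: the `Spotter` trait, `NamedSpotter`, and `SpotterSelection.select` are simplified stand-ins made up for illustration, and the real `fromFolder` logic has different signatures. Only the property name `spotter` and the values `ner` and `pretrained-ner` come from this thread.

```scala
// All types below are hypothetical stand-ins, not the dbpedia-spotlight API.
trait Spotter { def name: String }

final case class NamedSpotter(name: String) extends Spotter

object SpotterSelection {
  // Chooses a spotter from the `spotter` property of model.properties.
  // Spotters are passed as thunks so that only the selected one is built,
  // which matters when constructing a spotter loads a large model.
  def select(
      properties: java.util.Properties,
      trainedNer: () => Spotter,
      pretrainedNer: () => Spotter,
      default: () => Spotter): Spotter =
    properties.getProperty("spotter", "default") match {
      case "ner"            => trainedNer()    // needs a trained ner.factorie.model
      case "pretrained-ner" => pretrainedNer() // e.g. SimpleNerSpotter in the real code
      case _                => default()
    }
}
```

With `spotter=pretrained-ner` in model.properties, the second branch would be taken and no ner.factorie.model file would be required.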

@tgalery
Member

tgalery commented Jun 4, 2014

Cool, we'll do that then.

@dirkweissenborn
Member Author

There is a trained entity topic model here for anyone interested in testing. It is a compressed file containing both the model and the stores needed to create the EntityTopicDisambiguator. The model can be loaded through SimpleEntityTopicModel.fromFile(file). The stores can be loaded the same way as in the SpotlightModel. It should all be fairly easy.

@jodaiber
Member

Hey all, I would merge this as soon as the Scala version upgrade is tested. Let me know if any of you has the time to test this. The raw counts to try are here.

@tgalery tgalery mentioned this pull request Jun 1, 2015
4 participants