Feature/adding relevance scores #300

tgalery · 2014-05-06T11:08:41Z

This PR adds a feature to dbpedia-spotlight: a relevance weight associated with each extracted annotation. To enable it, add the line relevance_scoring=default to the model.properties file in your model folder. If that line is not present, Spotlight behaves as usual.
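To illustrate the on/off switch described above, here is a minimal Python sketch that checks a model.properties file for the relevance_scoring key. The helper name relevance_scoring_mode is hypothetical. Its parser handles only simple key=value lines, not the full java.util.Properties syntax, and Spotlight's own Scala code may read the file differently.

```python
from typing import Optional


def relevance_scoring_mode(properties_text: str) -> Optional[str]:
    """Return the relevance_scoring value, or None when the key is absent.

    Hypothetical helper: a simplified reader for key=value lines only.
    """
    mode = None
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (.properties files use # or !).
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "relevance_scoring":
            mode = value.strip()  # a later duplicate key overrides an earlier one
    return mode


model_properties = """
# other model settings omitted
relevance_scoring=default
"""

mode = relevance_scoring_mode(model_properties)
if mode is None:
    print("relevance scoring disabled: annotate as usual")
else:
    print(f"relevance scoring enabled ({mode})")
```

A missing key yields None, which corresponds to the "behaves as usual" path.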

We use:

- the context vector overlap
- the number of times a topic is spotted in the text
- the overlap among the context words of the topics (for example, "Microsoft" being a context word for several of the topics)

We normalize the output score to the range 0-1 using min-max normalization (this step could be improved).
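The features above can be sketched as a small Python scorer. This is not the PR's implementation. The PR does not say how the three features are combined, so the equal-weight sum, the Jaccard overlap measure, the frequency scaling and all names here (jaccard, min_max, relevance_scores) are assumptions made for illustration. The sketch also covers the "no spot spotted" and "only one topic spotted" edge cases that appear in the commit log.

```python
from collections import Counter
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two word sets (Jaccard is an assumed choice of measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def min_max(raw: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; ties (e.g. a single topic) all map to 1.0."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {topic: 1.0 for topic in raw}
    return {topic: (s - lo) / (hi - lo) for topic, s in raw.items()}


def relevance_scores(
    spotted: List[str],
    context: Dict[str, Set[str]],
    text_words: Set[str],
) -> Dict[str, float]:
    """Hypothetical scorer: equal-weight sum of three features, then min-max."""
    if not spotted:
        return {}  # nothing spotted: nothing to score
    counts = Counter(spotted)
    topics = list(counts)
    max_count = max(counts.values())
    raw = {}
    for topic in topics:
        # Feature 1: overlap between the topic's context words and the text.
        text_overlap = jaccard(context[topic], text_words)
        # Feature 2: how often the topic was spotted, scaled to (0, 1].
        frequency = counts[topic] / max_count
        # Feature 3: mean overlap with the other topics' context words.
        others = [other for other in topics if other != topic]
        shared = (
            sum(jaccard(context[topic], context[other]) for other in others) / len(others)
            if others
            else 0.0
        )
        raw[topic] = text_overlap + frequency + shared
    return min_max(raw)


context = {
    "Bill_Gates": {"microsoft", "founder", "software"},
    "Windows": {"microsoft", "operating", "software"},
    "Seattle": {"city", "washington", "rain"},
}
spotted = ["Bill_Gates", "Windows", "Windows", "Seattle"]
text_words = {"microsoft", "founder", "software", "windows", "seattle"}

scores = relevance_scores(spotted, context, text_words)
print({topic: round(score, 3) for topic, score in scores.items()})
```

In this toy input, "microsoft" is a context word for both Bill_Gates and Windows, so the third feature lifts both. Seattle shares no context words with the text or the other topics and ends up at 0 after normalization.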

On some toy data that we had manually annotated, we found that this method gives results closer to human judgements than those given by other topic extractors such as Zemanta or Alchemy. If you think this is a good idea, we could manually annotate an established dataset, such as the Milne-Witten one, and write a paper that attempts to reproduce these results more formally.

jodaiber · 2014-05-13T16:32:35Z

This looks good, any open issues?

tgalery · 2014-05-13T17:08:34Z

Hi @jodaiber, no open issues that I know of. I could rebase on master, but I wonder whether the missing config would break the branch, as in the other PR.
It would be good for other people to test this to see if it's working as expected.

pablomendes · 2014-05-15T15:01:49Z

@tgalery, this is great! I have wanted to add relevance for a while now, but it was always trumped by other, more serious issues. Never got to it!

Luis Marujo shared a number of datasets with his LREC 2012 paper that we could try to use for evaluation, if their definition of relevancy is at all related to yours.

I would love to help with the paper however I can.

tgalery · 2014-05-19T08:43:51Z

Thanks @pablomendes! Glad to have such nice feedback. I'm going to take a look at the Marujo paper when I have a chance and see if I can reuse his dataset. I will keep you posted on how things develop here (I guess I can find your email on Google, yeah?). Meanwhile, if you or @jodaiber want to test the branch further and give feedback on the scores you get back, feel free to do so.

tgalery · 2014-06-06T15:06:37Z

Hi @pablomendes, I was taking a look at the Marujo dataset, which apparently can be found at https://github.com/snkim/AutomaticKeyphraseExtraction. However, when I opened the data, it didn't seem to contain the relevance scores.

I just took a look at the Roder et al. paper, and it seems that they have some publicly available datasets for keyword extraction and disambiguation. I was wondering whether collecting relevance judgements on that data through a Mechanical Turk interface would be a good idea. Alternatively, we could contact Marujo himself.


dav009 and others added 2 commits July 10, 2014 16:34

Fist go at scoring

1b465c6

adding relevance scores to rest flow outputing the relevance scores adding field for relevance-score calculate relevance only if specified in properties fix for no spot spotted topic cases only one topic spotted fix more comments adding context text interect filter

Changing class structure, minor fixes and tests

b6d9ea6
