
Disambiguation #33 (Open)

ziodave opened this issue Mar 11, 2021 · 12 comments

ziodave (Contributor) commented Mar 11, 2021

Hello @wetneb,

Can the following be improved or fixed by running the classifier on a larger dataset, or is it based only on the number of statements for each entity?

[two screenshots of disambiguation results]

Cheers,
David

wetneb (Member) commented Mar 12, 2021

It's hard to say looking at a single example! In general I would recommend training a model for a specific domain (which would also let you select more finely which entities can be annotated).

ziodave (Contributor, Author) commented Mar 22, 2021

@wetneb is there a reason you're not using the edges in OpenTapioca/Wikidata to run the classifier? Have you tried that already?

wetneb (Member) commented Mar 22, 2021

How would you turn them into features?

ziodave (Contributor, Author) commented Mar 22, 2021

I am not sure; I am trying to understand how to solve the reference case above.

I can see that Apple Inc. (Q312) has an edge to Steve Jobs (Q19837) and vice versa. I am trying to understand whether we can use these edges for the training.

Training is extremely complex on larger datasets: it first requires mapping dbpedia.org IDs to wikidata.org item IDs, and then running the classification. I am also afraid that an SVM is not suitable for classifying large datasets.
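For illustration, the ID-mapping step might look like this. A minimal sketch, assuming the public DBpedia SPARQL endpoint and the owl:sameAs links DBpedia publishes to Wikidata; the helper name is made up:

```python
import requests

# Sketch only: map a DBpedia resource IRI to a Wikidata QID through the
# owl:sameAs links that DBpedia publishes. Assumes the public endpoint
# at https://dbpedia.org/sparql; the function name is invented.
def dbpedia_to_wikidata(dbpedia_iri):
    query = """
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        SELECT ?wd WHERE {
            <%s> owl:sameAs ?wd .
            FILTER(STRSTARTS(STR(?wd), "http://www.wikidata.org/entity/"))
        }
    """ % dbpedia_iri
    resp = requests.get(
        "https://dbpedia.org/sparql",
        params={"query": query, "format": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()
    rows = resp.json()["results"]["bindings"]
    # Keep only the QID part of the Wikidata IRI, e.g. "Q19837".
    return rows[0]["wd"]["value"].rsplit("/", 1)[-1] if rows else None

print(dbpedia_to_wikidata("http://dbpedia.org/resource/Steve_Jobs"))  # Q19837
```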

wetneb (Member) commented Mar 22, 2021

Yes, edges are used: not directly as features of the classifier, but to boost the scores of collections of mentions that are interlinked. It's something I tried to explain in the paper, but it might well be unclear there; let me know if I can clarify any aspect of it.
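As a toy illustration of that mechanism (not the actual implementation; the scores, the edge set, and the alpha weight below are all made up):

```python
# Toy sketch of edge-based reinforcement: a candidate's score is boosted
# by the scores of linked candidates in *other* mentions. All numbers,
# QIDs and edges here are illustrative.
candidates = {
    "Apple":      {"Q312": 0.40, "Q89": 0.55},   # Apple Inc. vs the fruit
    "Steve Jobs": {"Q19837": 0.90},
}
# Wikidata edges between candidates (e.g. Q312 has "founded by" Q19837).
edges = {("Q312", "Q19837"), ("Q19837", "Q312")}
alpha = 0.3  # strength of the reinforcement (assumed)

boosted = {}
for mention, cands in candidates.items():
    boosted[mention] = {}
    for qid, score in cands.items():
        # Sum the scores of linked candidates appearing in other mentions.
        support = sum(
            s
            for other, c2 in candidates.items() if other != mention
            for q2, s in c2.items()
            if (qid, q2) in edges
        )
        boosted[mention][qid] = score + alpha * support

print(boosted)
# "Apple" -> Q312 gains 0.3 * 0.90 = 0.27 and overtakes Q89 (the fruit).
```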

ziodave (Contributor, Author) commented Mar 22, 2021

I see. I think there is some tuning to do then. Take, for example, this scenario:

"Apple is a fruit", works nicely:

image
image

"Apple Inc has been founded by Steve Jobs", works nicely:

image
image

but "Apple has been founded by Steve Jobs", doesn't work nicely:

image
image

The only thing that changes is that "Apple" in the last example is an alias of "Apple Inc.". As far as I understand from the paper, though, aliases have the same score as labels? Should "Apple has been founded by Steve Jobs" actually yield the same results as "Apple Inc has been founded by Steve Jobs"?
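To make the alias point concrete, here is a toy lexical index (a sketch, not OpenTapioca's code; the entity data is abridged) that treats labels and aliases identically:

```python
# Toy lexical index in which labels and aliases are indexed the same way,
# so the alias "Apple" and the label "Apple Inc." both retrieve Q312.
index = {}

def add_entity(qid, label, aliases=()):
    for surface in (label, *aliases):
        index.setdefault(surface.lower(), set()).add(qid)

add_entity("Q312", "Apple Inc.", aliases=("Apple",))
add_entity("Q89", "apple")

print(index["apple"])       # {'Q312', 'Q89'}: the alias makes "Apple" ambiguous
print(index["apple inc."])  # {'Q312'}: unambiguous
```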

wetneb (Member) commented Mar 22, 2021

Yes, it's a curious result: I would definitely expect "Apple has been founded by Steve Jobs" to be understood correctly. I think the training set probably did not contain extremely famous companies or people like those, so they are outliers that aren't scored well by the classifier.

ziodave (Contributor, Author) commented Mar 22, 2021

Yes, and this is the path I followed: I thought I could find a better dataset for training, but that increases complexity. Then I wondered: is there a way to optimize cases similar to this one? In the end, "Apple" is just an alias of "Apple Inc." :-)

wetneb (Member) commented Mar 22, 2021

I don't know exactly what you mean by "optimize cases similar to this one". Do you mean tweaking the classifier parameters manually to adjust for those cases?

ziodave (Contributor, Author) commented Mar 22, 2021

Yes, sorry. But is this a classifier issue? "Apple Inc" is detected, while "Apple" as an alias isn't.

Sorry, I might not be too expert with the classifier. I am asking a colleague of mine to join this thread too.

wetneb (Member) commented Mar 22, 2021

"Apple Inc" is more specific than "Apple" so it is normal that is detected more easily.

sareaghaei commented

> I don't know exactly what you mean by "optimize cases similar to this one". Do you mean tweaking the classifier parameters manually to adjust for those cases?

I think the reason some cases don't work well is not necessarily the classification. Changing the parameters may lower the evaluation metrics and, as a result, make the model less accurate on things that already work well.
