Add parameter support to get_dbpedia_uris() #30

Open
ChristophLeonhardt opened this issue Feb 22, 2024 · 4 comments

@ChristophLeonhardt
Collaborator

get_dbpedia_uris() currently passes the text and the confidence parameter to DBpedia Spotlight. However, there are more parameters which influence the results of the service. These are described in the paper by Mendes et al. (2011) and shown in examples on the DBpedia Spotlight GitHub wiki (https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Web-service and https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/User's-manual).

One of those parameters is "support", which sets a minimum threshold for the prominence of an entity in Wikipedia (Mendes et al. 2011, pp. 3-4). Including support might be useful. If I am not mistaken, it could be added to the query parameters built for the GET request in get_dbpedia_uris().
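
A minimal sketch of the idea, in Python rather than in the package's own code: it only builds the request URL and sends nothing. The endpoint URL, the function name `build_annotate_url` and the default confidence are assumptions made for illustration; only the parameter names `text`, `confidence` and `support` come from the discussion above.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public DBpedia Spotlight service (not taken from the package).
SPOTLIGHT_ANNOTATE_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_url(text, confidence=0.5, support=None,
                       base_url=SPOTLIGHT_ANNOTATE_URL):
    """Build a GET URL for an annotate request (hypothetical helper).

    `support` is only added to the query when the caller sets it, so
    existing calls that pass text and confidence alone stay unchanged.
    """
    params = {"text": text, "confidence": confidence}
    if support is not None:
        params["support"] = int(support)
    # urlencode percent-encodes the text and joins the pairs with "&".
    return f"{base_url}?{urlencode(params)}"


url_without_support = build_annotate_url("Berlin is the capital of Germany.")
url_with_support = build_annotate_url(
    "Berlin is the capital of Germany.", confidence=0.35, support=20
)
```

Leaving `support` out of the query when it is unset keeps the current behaviour as the default, which avoids having to pick a default threshold up front.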

@ablaette
Contributor

Adding this argument is not a problem; see the implementation for types. In the linked examples I see the values -1 and 20: what do they mean, and what would a reasonable default value be?

@ablaette
Contributor

I have implemented the argument. What would a telling example be to be able to explain the effects of using the parameter? Is the (preliminary) documentation sufficient?

ablaette pushed a commit that referenced this issue Feb 26, 2024
@ChristophLeonhardt
Collaborator Author

According to Mendes et al. (2011: 3-4), support refers to the minimum number of inlinks of a resource. I do not think that it is explained further, but I assume that this refers to the number of other pages linking to the resource? It is used to determine the "Prominence" of a resource, according to the paper (Mendes et al. 2011: 3).

I am not sure what -1 means in the examples. I assume that no filtering is applied here, but I am not sure why this would not be the case with support = 0.

In my trials with more prominent concepts such as city or county names, this number can be a lot higher. A support value of 500 seemed plausible to me, but I assume this was due to the selection of specific entities.

I would assume that 20, as suggested in the examples linked above, might be reasonable when used on ordinary entities.
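
To illustrate how the thresholds mentioned here (0, 20, 500) would differ in effect, here is a small sketch that applies a support threshold locally to a list of resources. Everything in it is an assumption: the resources and their support counts are invented, the `@support` key is assumed to be how the count appears in a JSON response, and the comparison `>=` is a guess (whether the service compares strictly or not is exactly what the -1 versus 0 question leaves open).

```python
def filter_by_support(resources, min_support):
    """Keep resources whose support count is at least `min_support`.

    Hypothetical helper: the service applies its own filter server-side;
    this only mimics the idea on already retrieved results.
    """
    return [r for r in resources if int(r["@support"]) >= min_support]


# Invented example data; URIs and counts are placeholders, not real results.
resources = [
    {"@URI": "http://example.org/resource/RareEntity", "@support": "12"},
    {"@URI": "http://example.org/resource/OrdinaryEntity", "@support": "150"},
    {"@URI": "http://example.org/resource/ProminentCity", "@support": "4200"},
]

kept_at_0 = filter_by_support(resources, 0)      # nothing is filtered out
kept_at_20 = filter_by_support(resources, 20)    # drops the rare entity
kept_at_500 = filter_by_support(resources, 500)  # keeps only the prominent one
```

With data shaped like this, a threshold of 20 removes only weakly linked resources, while 500 leaves just the most prominent ones, which matches the impression that 500 is only plausible for things like city or county names.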

@ablaette
Copy link
Contributor

So ... we should include Mendes et al. 2011 as a reference in the package!


2 participants