
Error on TestGtRDFReader #11

Open
franklarryx opened this issue May 25, 2018 · 10 comments

Comments

@franklarryx

Hi, I've tried some tests with the JedAI tool.
This tool is useful for my job, and I think it has great potential.
I've downloaded the attached files in N-Triples (.nt) format: source.nt and target.nt.
In the first step, I successfully executed the TestRdfReader class from the test package for both datasets. After that, I tried to execute the TestGtRDFReader class with the same datasets, but I got the following error:
Exception in thread "main" java.lang.IllegalArgumentException: loops not allowed
    at org.jgrapht.graph.AbstractBaseGraph.addEdge(AbstractBaseGraph.java:203)
    at org.scify.jedai.datareader.groundtruthreader.GtRDFReader.performReading(GtRDFReader.java:236)
    at org.scify.jedai.datareader.groundtruthreader.GtRDFReader.getDuplicatePairs(GtRDFReader.java:92)
    at org.scify.jedai.datareader.groundtruthreader.AbstractGtReader.getDuplicatePairs(AbstractGtReader.java:57)
    at org.scify.jedai.datareader.TestGtRDFReader.main(TestGtRDFReader.java:39)

datasets.zip

Thanks in advance!

@gpapadis
Collaborator

Hi, we are glad you are interested in JedAI!

I didn't have time to reproduce the error you mention. It is probably caused by a sameAs statement that connects an entity to itself. I guess you have modified TestGtRDFReader.java so that it reads both datasets. Which of the two datasets do you use as input for the GtRDFReader?
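The exception in the stack trace is thrown by JGraphT's `AbstractBaseGraph.addEdge`, which rejects an edge whose two endpoints are the same vertex when the graph does not allow loops. The sketch below mimics that check with a hypothetical `DuplicatePairs` class (not JGraphT's API and not JedAI code) and shows one possible guard: skipping sameAs links that resolve to the same entity ID instead of adding them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a duplicates graph that, like a JGraphT graph
// without loop support, refuses an edge from a vertex to itself.
class DuplicatePairs {
    private final List<int[]> edges = new ArrayList<>();

    // Mimics the failing call: a self-loop raises IllegalArgumentException.
    void addEdge(int id1, int id2) {
        if (id1 == id2) {
            throw new IllegalArgumentException("loops not allowed");
        }
        edges.add(new int[]{id1, id2});
    }

    // Defensive variant: silently skip links that resolve to the same entity ID.
    // Returns true only if the edge was actually added.
    boolean addEdgeSkippingLoops(int id1, int id2) {
        if (id1 == id2) {
            return false;
        }
        addEdge(id1, id2);
        return true;
    }

    int size() {
        return edges.size();
    }

    public static void main(String[] args) {
        DuplicatePairs pairs = new DuplicatePairs();
        pairs.addEdgeSkippingLoops(3, 7);  // kept
        pairs.addEdgeSkippingLoops(5, 5);  // self-loop, skipped
        System.out.println(pairs.size()); // prints 1
        try {
            pairs.addEdge(5, 5);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "loops not allowed"
        }
    }
}
```

Whether skipping is the right behaviour depends on the data: a self-referencing sameAs link carries no duplicate information, but many of them may indicate that the wrong file is being read as ground truth.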

Kind regards,
George

@franklarryx
Author

Hi, these datasets come from silkframework.org.
I have already checked that neither dataset contains any "sameAs" property.
I have debugged the code, and it seems that the issue comes from the following part of the GtRDFReader class (lines 229-234):

```java
final String sub = stmt.getSubject().toString();
final String obj = stmt.getObject().toString();

// add a new edge for every pair of duplicate entities
int entityId1 = urlToEntityId1.get(sub);
int entityId2 = urlToEntityId1.get(obj) + datasetLimit;
```

Thanks!
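The `+ datasetLimit` in the snippet suggests that entities of the second dataset are placed after those of the first in a single ID space. The following is a minimal sketch of that idea under assumptions: the maps are built from plain URI lists, `datasetLimit` is taken to be the size of the first dataset, and `indexUris`, `globalIdInDataset2` and the separate map `urlToEntityId2` are hypothetical names, not GtRDFReader's API. (The quoted snippet looks up both URIs in `urlToEntityId1`; this thread does not show how GtRDFReader fills that map.) The sketch also guards against a general Java pitfall: `Map.get` returns `null` for an unknown key, and assigning that `null` to an `int`, as in `int entityId1 = urlToEntityId1.get(sub);`, throws a `NullPointerException`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shared ID space for two datasets:
// dataset 1 entities get IDs 0..n1-1, dataset 2 entities get n1..n1+n2-1.
class CombinedIds {
    // Assigns consecutive local IDs (0, 1, 2, ...) to distinct URIs in order of appearance.
    static Map<String, Integer> indexUris(List<String> uris) {
        Map<String, Integer> index = new HashMap<>();
        for (String uri : uris) {
            index.putIfAbsent(uri, index.size());
        }
        return index;
    }

    // Global ID of a dataset-2 entity: its local ID shifted by the size of dataset 1.
    // Checks for null explicitly instead of letting auto-unboxing throw an NPE.
    static int globalIdInDataset2(Map<String, Integer> urlToEntityId2, String uri, int datasetLimit) {
        Integer local = urlToEntityId2.get(uri);
        if (local == null) {
            throw new IllegalArgumentException("URI not found in dataset 2: " + uri);
        }
        return local + datasetLimit;
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = indexUris(List.of("ex:a", "ex:b"));
        Map<String, Integer> d2 = indexUris(List.of("ex:x", "ex:y", "ex:z"));
        int datasetLimit = d1.size(); // 2
        System.out.println(globalIdInDataset2(d2, "ex:y", datasetLimit)); // prints 3
    }
}
```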

@gpapadis
Collaborator

Hi,

for some reason, I see lots of sameAs statements in the datasets you have uploaded.
I created a class here that tries to reproduce the error you mention:
https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/datareader/TestSilkData.java
So my question is: which file does gtFilePath point to in your case (line 21)?
On my computer, I ran TestSilkData.java without getting any exception.
The problem I see with setting
`gtFilePath = mainDir + "source.nt";`
is that I only get sameAs statements like the following:
[http://dbpedia.org/resource/Karma_%28film%29, http://www.w3.org/2002/07/owl#sameAs, http://data.linkedmdb.org/resource/film/7632]
where http://data.linkedmdb.org/resource/film/7632 is not included in any of the given datasets and causes problems.
I would be happy to help if you could clarify which dataset you use as ground truth, provided, of course, that this ground-truth file contains correct links of the form
URL_from_Dataset_1 sameAs URL_from_Dataset_2.
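The requirement that every ground-truth link have the form `URL_from_Dataset_1 sameAs URL_from_Dataset_2` can be checked before the file is used. Below is a minimal sketch assuming simple N-Triples lines whose subject, predicate and object are all IRIs; a real RDF parser would also handle literals, blank nodes and escapes, which this regex does not. `SameAsFilter` and `validLinks` are hypothetical names, and the `example.org` URIs in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Keeps only owl:sameAs links whose subject is in dataset 1 and object is in dataset 2.
class SameAsFilter {
    private static final String SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";
    // <subject> <predicate> <object> .   (IRIs only; assumption for this sketch)
    private static final Pattern TRIPLE =
            Pattern.compile("^<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*$");

    static List<String[]> validLinks(List<String> lines, Set<String> dataset1, Set<String> dataset2) {
        List<String[]> links = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TRIPLE.matcher(line.trim());
            if (!m.matches() || !m.group(2).equals(SAME_AS)) {
                continue; // not an IRI-only sameAs triple
            }
            String sub = m.group(1);
            String obj = m.group(3);
            // Drop links with an endpoint outside the datasets, and self-links.
            if (dataset1.contains(sub) && dataset2.contains(obj) && !sub.equals(obj)) {
                links.add(new String[]{sub, obj});
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "<http://dbpedia.org/resource/Karma_%28film%29> <http://www.w3.org/2002/07/owl#sameAs> <http://data.linkedmdb.org/resource/film/7632> .",
            "<http://example.org/d1/1> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/d2/9> .");
        Set<String> d1 = Set.of("http://dbpedia.org/resource/Karma_%28film%29", "http://example.org/d1/1");
        Set<String> d2 = Set.of("http://example.org/d2/9");
        System.out.println(validLinks(lines, d1, d2).size()); // prints 1
    }
}
```

In `main`, the Karma statement quoted above is dropped because its object URI is not in the second (made-up) dataset, while the second link is kept.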

Kind regards,
George

@franklarryx
Author

Many thanks for the reply!
My code was wrong.
My goal is to check how JedAI links the two datasets (source.nt and target.nt) in order to replace the Silk tool!

kind regards,
Frank

@gpapadis
Collaborator

You are welcome Frank! Let us know if we can assist you in any other way.

@franklarryx
Author

Hi George,

attached you can find the Java class you provided, which I modified to add block management and the similarity computation step.
I don't fully understand the results produced by the class (result.txt in the attachment). The similarity percentages seem low, but the links between the datasets are present!

What do you think? Any suggestions?

Thanks in advance,
Frank

classAndResult.zip

@gpapadis
Collaborator

gpapadis commented Jun 6, 2018

Hi Frank,

I am sorry for the late response.

I updated the TestSilkData.java class (https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/datareader/TestSilkData.java) with a more complete version of the code. The code you sent me didn't perform Entity Clustering, which is necessary to produce the final results. The absolute values of the similarities might be low, but what matters is their relative values. In the Clean-Clean ER scenario you are considering, Unique Mapping Clustering should be applied at the end so that for every entity, the best match is selected (i.e., the pair with the highest similarity), as long as this similarity exceeds a certain threshold.
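The selection rule described above can be sketched as a greedy pass over candidate pairs sorted by descending similarity, where each entity may be matched at most once. This is a minimal illustration under assumed names (`UniqueMappingSketch`, `Pair`, `cluster`) and made-up scores, not JedAI's own clustering implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Greedy one-to-one matching for Clean-Clean ER (illustrative sketch).
class UniqueMappingSketch {
    // A scored candidate pair: id1 from dataset 1, id2 from dataset 2.
    record Pair(int id1, int id2, double sim) {}

    static List<Pair> cluster(List<Pair> candidates, double threshold) {
        List<Pair> sorted = new ArrayList<>(candidates);
        // Highest similarity first: only the relative order of the scores matters.
        sorted.sort((a, b) -> Double.compare(b.sim(), a.sim()));
        Set<Integer> matched1 = new HashSet<>();
        Set<Integer> matched2 = new HashSet<>();
        List<Pair> matches = new ArrayList<>();
        for (Pair p : sorted) {
            if (p.sim() < threshold) {
                break; // all remaining pairs score even lower
            }
            // Accept the pair only if neither entity has been matched yet.
            if (!matched1.contains(p.id1()) && !matched2.contains(p.id2())) {
                matched1.add(p.id1());
                matched2.add(p.id2());
                matches.add(p);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Pair> candidates = List.of(
            new Pair(0, 10, 0.42), new Pair(0, 11, 0.35),
            new Pair(1, 10, 0.40), new Pair(1, 11, 0.38),
            new Pair(2, 12, 0.05));
        for (Pair p : cluster(candidates, 0.1)) {
            System.out.println(p.id1() + " -> " + p.id2()); // prints "0 -> 10", then "1 -> 11"
        }
    }
}
```

Entity 1's best-scoring partner (10, at 0.40) is already taken, so it is matched to 11 at 0.38; the pair at 0.05 falls below the threshold. The scores are low in absolute terms, but their ordering decides the result.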

Note that the new code tests a large number of configurations in order to find the one with the highest performance. As a result, it will take some time to complete. I ran it, but no meaningful results were produced, because the ground-truth reader cannot extract any pair of duplicates from the source.nt file that is used as the source of the groundtruth.
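Why an empty ground truth yields no meaningful results can be seen from the usual pair-based measures: recall divides by the number of ground-truth pairs, so with none it is undefined. A minimal sketch, assuming pairs are encoded as hypothetical `"id1|id2"` strings; it is not JedAI's own evaluation code.

```java
import java.util.HashSet;
import java.util.Set;

// Pair-based precision, recall and F1 against a ground truth (illustrative sketch).
class PairEvaluation {
    // Returns {precision, recall, f1}; NaN marks an undefined measure.
    static double[] evaluate(Set<String> detected, Set<String> groundTruth) {
        Set<String> truePositives = new HashSet<>(detected);
        truePositives.retainAll(groundTruth);
        double tp = truePositives.size();
        // Double division by zero yields NaN, e.g. recall with an empty ground truth.
        double precision = tp / detected.size();
        double recall = tp / groundTruth.size();
        // Conventionally F1 = 0 when both measures are 0; NaN propagates otherwise.
        double f1 = (precision + recall == 0) ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[]{precision, recall, f1};
    }

    public static void main(String[] args) {
        Set<String> detected = Set.of("0|10", "1|11");
        double[] scores = evaluate(detected, Set.of("0|10", "2|12"));
        System.out.println(scores[0] + " " + scores[1] + " " + scores[2]); // prints "0.5 0.5 0.5"
        double[] empty = evaluate(detected, Set.of());
        System.out.println(Double.isNaN(empty[1])); // prints "true"
    }
}
```

When a configuration search compares configurations by a measure like F1, every configuration scores NaN against an empty ground truth, so none can be ranked above the others.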

Kind regards,
George

@franklarryx
Copy link
Author

Thank you for your time!
Where can I find a simple RDF example to better understand your tool?

Kind regards,
Frank

@mthanos
Copy link
Contributor

mthanos commented Jun 11, 2018

Hi Frank,

You can find many relevant datasets at
http://oaei.ontologymatching.org/2009/,
which is also where we took many of our benchmarks from.

You can also check the following datasets along with the expected mappings:
oaeiIMidentity.zip
They were used for the OAEI instance matching track (http://oaei.ontologymatching.org/2014/im/index.html).

Best regards,
Manos.

@franklarryx
Copy link
Author

Many thanks for the indications and suggestions!

Kind regards,
Frank
