DESCRIBE of two resources causes Cartesian Product #4860

Open
VladimirAlexiev opened this issue Dec 19, 2023 · 0 comments
Labels
🐞 bug issue is a bug

Current Behavior

We're trying to get a description of an ontology and all terms defined in it, using this query:

PREFIX swapi: <https://swapi.co/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
DESCRIBE swapi: ?term { ?term rdfs:isDefinedBy swapi: }

The result is attached: query-result.txt (rename to .nt):

  • wc -l query-result.nt
    15087 total lines
  • sort query-result.nt|uniq|wc -l
    2280 unique lines (about 6.6x duplication on average)
  • sort query-result.nt|uniq -c|grep -v " 1 "|sort -r -n|less
    to view the top repeated lines
  • sort query-result.nt|uniq -c|grep " 114 "|grep -v isDefinedBy
    prints nothing, which shows that the most repeated lines (114x) are all rdfs:isDefinedBy
  • sort query-result.nt|uniq -c|grep " 114 "|sort|uniq|wc -l
    113: there are 113 terms, so the duplication is caused by some Cartesian Product that produces O(N^2) results
  • sort query-result.nt|uniq -c|grep " 2 "|less
    to see that the other duplicated lines are of the form
    <https://swapi.co/vocabulary/Clawdite> a <https://swapi.co/vocabulary/Species>
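
The checks above can be condensed into a single histogram of repetition counts. This is only a sketch: the function name dup_histogram is made up for illustration, and it assumes the file holds one triple per line (N-Triples), as query-result.nt does.

```shell
# Print "<repetition count> <number of distinct lines with that count>",
# highest repetition count first.
# Assumes one triple per line in the file passed as $1.
dup_histogram() {
  sort "$1" \
    | uniq -c \
    | awk '{ hist[$1]++ } END { for (n in hist) print n, hist[n] }' \
    | sort -k1,1nr
}
```

Running `dup_histogram query-result.nt` should make the 114x group stand out at the top, followed by the smaller groups such as the 2x rdf:type lines.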

Note: The ontology is a bit unusual since it has a "metaclass" Character and then per-species classes that describe typical characteristics of that species, see below.
But I think that's beside the point since the major duplication comes from DESCRIBE swapi: ?term

# grep -i clawdite query-result.nt|riot --formatted ttl
<https://swapi.co/resource/clawdite/70>
        a       <https://swapi.co/vocabulary/Clawdite> , <https://swapi.co/vocabulary/Character> .

<https://swapi.co/vocabulary/Clawdite>
        a       <https://swapi.co/vocabulary/Species> , <http://www.w3.org/2002/07/owl#Class> ;
        <http://www.w3.org/2000/01/rdf-schema#isDefinedBy>
                <https://swapi.co/ontology/> ;
        <http://www.w3.org/2000/01/rdf-schema#label>
                "Clawdite" ;
        <http://www.w3.org/2000/01/rdf-schema#subClassOf>
                <https://swapi.co/vocabulary/Sentient> , <https://swapi.co/vocabulary/Reptilian> ;
        <https://swapi.co/vocabulary/averageHeight>
                180.0 ;
        <https://swapi.co/vocabulary/averageLifespan>
                "70" ;
        <https://swapi.co/vocabulary/character>
                <https://swapi.co/resource/clawdite/70> ;
        <https://swapi.co/vocabulary/eyeColor>
                "yellow" ;
        <https://swapi.co/vocabulary/film>
                <https://swapi.co/resource/film/5> , <https://swapi.co/resource/film/6> ;
        <https://swapi.co/vocabulary/language>
                "Clawdite" ;
        <https://swapi.co/vocabulary/planet>
                <https://swapi.co/resource/planet/54> ;
        <https://swapi.co/vocabulary/skinColor>
                "yellow" , "green" .

Expected Behavior

No duplicates, or at most some triples returned twice (e.g. each isDefinedBy triple once when describing its subject and once when describing its object)
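
A quick way to check a result file against this expectation is to look at the highest repetition count. This is a sketch under the same one-triple-per-line assumption; max_repeat is a hypothetical helper name, not part of rdf4j or GraphDB.

```shell
# Print the largest number of times any single line occurs in the file $1.
# Prints 0 for an empty file.
max_repeat() {
  sort "$1" | uniq -c | awk '$1 > max { max = $1 } END { print max + 0 }'
}
```

For example, `[ "$(max_repeat query-result.nt)" -le 2 ] && echo "within expected duplication"` would succeed only if no triple appears more than twice.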

Steps To Reproduce

  • load SWAPI turtle: swapi.txt
  • Run the above query
  • Examine the results, or save them as N-Triples and repeat the analysis above using shell commands

Version

rdf4j 4.3.8 (GraphDB 10.4.2)

Are you interested in contributing a solution yourself?

None

Anything else?

No response

@VladimirAlexiev VladimirAlexiev added the 🐞 bug issue is a bug label Dec 19, 2023