Duplicate bindings on construct query #1300
Comments
Thanks for reporting!
Strictly speaking, this is not really a bug, since CONSTRUCT queries return RDF graphs, which are sets of triples. Comunica produces these duplicates because CONSTRUCT queries are built on top of SELECT queries: each solution instantiates the triple templates, so the same triple may be produced more than once. That being said, I agree that these duplicates can sometimes be inconvenient. So I'm converting this issue to a feature request where CONSTRUCT queries may optionally run in a DISTINCT mode in which duplicate triples are explicitly removed, at the cost of increased memory usage for large results.
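To illustrate the mechanism described above, here is a minimal sketch in plain TypeScript. It does not use Comunica's actual data model: the `Triple` and `Bindings` types, the `solve` and `instantiate` helpers, and the toy graph with its `ex:` and `schema:` names are all assumptions made up for this example. It evaluates the WHERE clause of the reported query shape as a join, instantiates the template once per solution, and compares the number of emitted triples with the number of distinct ones.

```typescript
// Hypothetical minimal types; not Comunica's actual data model.
type Triple = [string, string, string];
type Bindings = Record<string, string>;

const S = 'ex:dataset';
const DIST = 'schema:distribution';

// Toy source graph (made up): two triples about the dataset, two about its distribution.
const graph: Triple[] = [
  [S, 'schema:name', '"Example"'],
  [S, DIST, 'ex:dist1'],
  ['ex:dist1', 'schema:encodingFormat', '"text/turtle"'],
  ['ex:dist1', 'schema:contentUrl', 'ex:file'],
];

// Template of: CONSTRUCT WHERE { S ?p ?o ; DIST ?d . ?d ?e ?f }
const template: Triple[] = [
  [S, '?p', '?o'],
  [S, DIST, '?d'],
  ['?d', '?e', '?f'],
];

// Evaluate the WHERE clause the way a SELECT would: join the three patterns.
function solve(g: Triple[]): Bindings[] {
  const out: Bindings[] = [];
  for (const [s1, p, o] of g) {
    if (s1 !== S) continue;
    for (const [s2, p2, d] of g) {
      if (s2 !== S || p2 !== DIST) continue;
      for (const [s3, e, f] of g) {
        if (s3 !== d) continue;
        out.push({ '?p': p, '?o': o, '?d': d, '?e': e, '?f': f });
      }
    }
  }
  return out;
}

// Substitute the variables of every template triple for one solution.
function instantiate(t: Triple[], b: Bindings): Triple[] {
  return t.map(
    triple => triple.map(term => (b[term] !== undefined ? b[term] : term)) as Triple,
  );
}

const solutions = solve(graph);
const emitted: Triple[] = [];
for (const b of solutions) {
  emitted.push(...instantiate(template, b));
}

// DISTINCT mode: keep one copy per serialized triple.
const distinct = new Set(emitted.map(t => JSON.stringify(t)));

console.log(solutions.length, emitted.length, distinct.size); // 4 12 4
```

With only four source triples, the join already yields four solutions and twelve emitted triples, of which four are distinct. The number of solutions grows multiplicatively with the number of triples on each side of the join, which matches the Cartesian-product effect described in the original post.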
I ran into this issue as well recently. My solution was to put the resulting triples into an N3.Store and then iterate over the data from there. Not efficient, but it got the job done.
Just as a suggestion based on experience with jacoscaz/quadstore#155: when implementing the removal of duplicate triples, consider giving users a way to set the maximum size of the set, so that Comunica has a chance to throw an error and recover instead of potentially crashing the process by running out of memory.
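One way the suggestion above could look is sketched below. This is not an existing Comunica API: the `BoundedDistinctFilter` class, its `maxSize` parameter, and the `dedupe` helper are hypothetical names, and triples are represented as already-serialized strings to keep the example self-contained.

```typescript
// Hypothetical helper; neither this class nor `maxSize` exists in Comunica.
class BoundedDistinctFilter {
  private readonly seen = new Set<string>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  // Returns true the first time a key is seen, false for repeats.
  // Throws instead of growing the set beyond maxSize entries.
  accept(key: string): boolean {
    if (this.seen.has(key)) {
      return false;
    }
    if (this.seen.size >= this.maxSize) {
      throw new RangeError(`Distinct set would exceed ${this.maxSize} entries`);
    }
    this.seen.add(key);
    return true;
  }
}

// Remove duplicates from serialized triples, failing fast when the bound is hit.
function dedupe(triples: string[], maxSize: number): string[] {
  const filter = new BoundedDistinctFilter(maxSize);
  return triples.filter(t => filter.accept(t));
}

const input = ['a p b', 'a p b', 'a p c', 'a p b'];
const deduped = dedupe(input, 10);
console.log(deduped); // [ 'a p b', 'a p c' ]
```

Because the error is thrown before the set grows past the limit, a caller can catch it and fall back to, for example, returning the results without deduplication.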
A bounty has been placed on this issue via the Comunica Association (see original post).
I have done a bit of work toward this on this branch. Changes to packages/actor-init-query/lib/ActorInitQuery.ts were a first step to implementing deduplication using N3.Store.
Issue type:
Description:
I get unexpected results with:
comunica-sparql https://lab.coret.org/rdf/c1.jsonld -q 'CONSTRUCT WHERE { <https://lab.coret.org/id/comunica_testcase_1> ?p ?o ; <http://schema.org/distribution> ?d . ?d ?e ?f }'
The output contains a lot of duplicate triples (like some kind of Cartesian product):
The source graph in Turtle being:
The expected result (as given by Apache Jena and GraphDB):
The issue is not with the comunica-sparql CLI tool, but with the Comunica core. This issue is a slimmed-down version of the issue we encounter with the NDE Dataset Register (netwerk-digitaal-erfgoed/dataset-register#831), where we use Comunica a lot. In this particular case the number of bindings explodes above our set maximum of 50000.
Environment:
Bounty
A bounty has been placed on this issue by: