Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blank node in algebra can break query #1234

Open
Peeja opened this issue Jul 26, 2023 · 2 comments
Open

Blank node in algebra can break query #1234

Peeja opened this issue Jul 26, 2023 · 2 comments

Comments

@Peeja
Copy link
Contributor

Peeja commented Jul 26, 2023

Issue type:

  • 🐛 Bug

Description:

Given the data:

@prefix swapi: <http://swapi.dev/documentation#>.

_:b0 swapi:eye_color "blue";
     swapi:name "Luke Skywalker".

the query:

PREFIX swapi: <http://swapi.dev/documentation#>
SELECT ?eye_color WHERE {
  [] swapi:name "Luke Skywalker";
    swapi:eye_color ?eye_color .
}

should return a single binding of ?eye_color to "blue". It should return the same binding when the equivalent SPARQL algebra is given instead of the string query. However, in the following tests, while Comunica always returns the binding for the string query, it returns 0 results for the algebra query when:

  • The data was generated by constructing quads manually, rather than by parsing Turtle, or
  • Something else has match()ed the source and read some data from it before the query is executed.

Presumably there's still something to this that I haven't grokked yet, but that's the bizarre behavior I'm seeing right now.

Importantly (and demonstrated below), if you replace the blank node in the query with a variable, the binding is always returned.

Tests
import { Readable } from "node:stream";

import { QueryEngine } from "@comunica/query-sparql-rdfjs";
import { describe, it, expect } from "@jest/globals";
import * as N3 from "n3";
import { translate, type Algebra } from "sparqlalgebrajs";

import type * as RDF from "@rdfjs/types";

/**
 * Read all results from an RDF Stream and return them as a promise of an array.
 */
export const readAll = <R>(stream: RDF.ResultStream<R>) =>
  new Promise<R[]>((resolve) => {
    const quads: R[] = [];
    stream
      .on("data", (result: R) => {
        quads.push(result);
      })
      .on("end", () => {
        resolve(quads);
      });
  });

describe("querying with a string or with algebra", () => {
  const df = N3.DataFactory;
  const engine = new QueryEngine();

  const bindings = async (
    source: RDF.Source,
    sparql: string | Algebra.Operation
  ) =>
    readAll(
      await engine.queryBindings(sparql, {
        sources: [source],
      })
    );

  (
    [
      [
        "blank node",
        /* sparql */ `
          PREFIX swapi: <http://swapi.dev/documentation#>
          SELECT ?eye_color WHERE {
            [] swapi:name "Luke Skywalker";
               swapi:eye_color ?eye_color .
          }
        `,
      ],
      [
        "variable",
        /* sparql */ `
        PREFIX swapi: <http://swapi.dev/documentation#>
        SELECT ?eye_color WHERE {
          ?luke swapi:name "Luke Skywalker";
                swapi:eye_color ?eye_color .
        }
      `,
      ],
    ] as const
  ).forEach(([thing, sparqlString]) => {
    const sparqlAlgebra = translate(sparqlString);

    describe(`using a ${thing} in the query`, () => {
      describe("starting with Quads", () => {
        it("should be the same result", async () => {
          const store = new N3.Store();

          store.addQuads([
            df.quad(
              df.blankNode("b0"),
              df.namedNode("http://swapi.dev/documentation#eye_color"),
              df.literal("blue")
            ),
            df.quad(
              df.blankNode("b0"),
              df.namedNode("http://swapi.dev/documentation#name"),
              df.literal("Luke Skywalker")
            ),
          ]);

          expect(await bindings(store, sparqlAlgebra)).toStrictEqual(
            await bindings(store, sparqlString)
          );
        });
      });

      describe("starting with Turtle", () => {
        const data = /* turtle */ `
          @prefix swapi: <http://swapi.dev/documentation#>.
    
          [] swapi:eye_color "blue";
             swapi:name "Luke Skywalker".
        `;

        it("should be the same result", async () => {
          const store = new N3.Store();
          const parser = new N3.StreamParser();
          store.import(parser.import(Readable.from(data)));

          expect(await bindings(store, sparqlAlgebra)).toStrictEqual(
            await bindings(store, sparqlString)
          );
        });

        it("should still be the same result when an unrelated `match()` is read", async () => {
          const store = new N3.Store();
          const parser = new N3.StreamParser();
          store.import(parser.import(Readable.from(data)));

          // Why does this make the test fail?
          await readAll(store.match());

          expect(await bindings(store, sparqlAlgebra)).toStrictEqual(
            await bindings(store, sparqlString)
          );
        });
      });
    });
  });
});
Full test output
  querying with a string or with algebra
    using a blank node in the query
      starting with Quads
        ✕ should be the same result (31 ms)
      starting with Turtle
        ✓ should be the same result (17 ms)
        ✕ should still be the same result when an unrelated `match()` is read (10 ms)
    using a variable in the query
      starting with Quads
        ✓ should be the same result (9 ms)
      starting with Turtle
        ✓ should be the same result (4 ms)
        ✓ should still be the same result when an unrelated `match()` is read (8 ms)

  ● querying with a string or with algebra › using a blank node in the query › starting with Quads › should be the same result

    expect(received).toStrictEqual(expected) // deep equality

    - Expected  - 20
    + Received  +  1

    - Array [
    -   Bindings {
    -     "dataFactory": DataFactory {
    -       "blankNodeCounter": 0,
    -       "blankNodePrefix": "df_47_",
    -     },
    -     "entries": Immutable.Map {
    -       "eye_color": Object {
    -         "datatype": Object {
    -           "termType": "NamedNode",
    -           "value": "http://www.w3.org/2001/XMLSchema#string",
    -         },
    -         "language": "",
    -         "termType": "Literal",
    -         "value": "blue",
    -       },
    -     },
    -     "type": "bindings",
    -   },
    - ]
    + Array []

      81 |           ]);
      82 |
    > 83 |           expect(await bindings(store, sparqlAlgebra)).toStrictEqual(
         |                                                        ^
      84 |             await bindings(store, sparqlString)
      85 |           );
      86 |         });

      at Object.<anonymous> (src/repro.test.ts:83:56)

  ● querying with a string or with algebra › using a blank node in the query › starting with Turtle › should still be the same result when an unrelated `match()` is read

    expect(received).toStrictEqual(expected) // deep equality

    - Expected  - 20
    + Received  +  1

    - Array [
    -   Bindings {
    -     "dataFactory": DataFactory {
    -       "blankNodeCounter": 0,
    -       "blankNodePrefix": "df_47_",
    -     },
    -     "entries": Immutable.Map {
    -       "eye_color": Object {
    -         "datatype": Object {
    -           "termType": "NamedNode",
    -           "value": "http://www.w3.org/2001/XMLSchema#string",
    -         },
    -         "language": "",
    -         "termType": "Literal",
    -         "value": "blue",
    -       },
    -     },
    -     "type": "bindings",
    -   },
    - ]
    + Array []

      113 |           await readAll(store.match());
      114 |
    > 115 |           expect(await bindings(store, sparqlAlgebra)).toStrictEqual(
          |                                                        ^
      116 |             await bindings(store, sparqlString)
      117 |           );
      118 |         });

      at Object.<anonymous> (src/repro.test.ts:115:56)

Environment:

"@comunica/query-sparql-rdfjs": "2.8.1",
"sparqlalgebrajs": "4.2.0"
@github-actions
Copy link

Thanks for reporting!

@rubensworks rubensworks added this to Triage in Maintenance Jul 26, 2023
@rubensworks
Copy link
Member

This issue occurs because we assume blank nodes in queries to have been rewritten to variables.
(See https://github.com/comunica/comunica/blob/a3b1dbdf14cb0485310fc2fcc271e6361a21c8de/packages/actor-query-parse-sparql/lib/ActorQueryParseSparql.ts#L32C49-L32C64)

I see two solutions to this problem:

  1. We document that algebra objects must not contain blank nodes, and use variables instead.
  2. We add a translation step before query execution in actor-init-query that translates bnodes into variables.

Option 1 is the easiest, but option 2 is probably the cleanest.

@rubensworks rubensworks moved this from Triage to To Do (prio:low) in Maintenance Apr 19, 2024
@rubensworks rubensworks moved this from To Do (prio:low) to To Do (prio:medium) in Maintenance Apr 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Maintenance
  
To Do (prio:medium)
Development

No branches or pull requests

2 participants