Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero-or-more paths with variables slow due to lack of distinct subjects index #1286

Open
ericprud opened this issue Oct 31, 2023 · 12 comments
Open

Comments

@ericprud
Copy link

Issue type:

  • 🐌 Performance issue

Description:

Running a query on 804 tree-ish Triples takes 150-180 seconds (vs. about 1 second in Jena).

time ./node_modules/.bin/comunica-sparql \
  http://localhost/checkouts/ericprud/AMIA-2023/playground/manifests/{Observation/smoker-1_smoking-2023-06-20.ttl,Patient/smoker-1.ttl,Observation/smoker-1_smoking-2022-05-19.ttl,Observation/smoker-1_A1c.ttl,DiagnosticReport/smoker-1_history_1924-11-23.ttl} \
  -f ~/checkouts/ericprud/AMIA-2023/playground/manifests/smoker-codings.rq 
[
{ "patient":"http://localhost/checkouts/ericprud/AMIA-2023/playground/manifests/Patient/smoker-1",
  "smokingYears":"\"31\"^^http://www.w3.org/2001/XMLSchema#decimal",
  "birthDate":"\"1924-11-23\"^^http://www.w3.org/2001/XMLSchema#date",
  "packsPerDay":"\"0.25\"^^http://www.w3.org/2001/XMLSchema#decimal",
  "patientAge":"\"98.93856947296372\"^^http://www.w3.org/2001/XMLSchema#decimal",
  "obsId":"\"smoker-1_smoking-2023-06-20\"",
  "smokingAssessmentDate":"\"2023-06-20T00:00:00-07:00\"^^http://www.w3.org/2001/XMLSchema#dateTime",
  "packYears":"\"7\"^^http://www.w3.org/2001/XMLSchema#decimal",
  "dateQuit":"\"2014-07-07T00:14:15-07:00\"^^http://www.w3.org/2001/XMLSchema#dateTime"
}
]

real	2m49,022s
user	3m18,134s
sys	0m3,375s

I replicated this in a webapp for your clicking pleasure; pick ❨a smoker❩ on the left and then ❨smoker-coding❩ on the right. The ⟦Run⟧ button will be updated with the run time so no need to pull out a stopwatch.

That query lightly exercised a sub-select because it included both Observation/smoker-1_smoking-2022-05-19 and Observation/smoker-1_smoking-2023-06-20. If I eliminate one of those, the query takes ~4s. You can replicate this in the webapp by selecting Observation/smoker-1_smoking-2022-05-19 in the drop-down to the right of "Input Data", clearing it out, and running the smoker-coding query. (If you clear out Observation/smoker-1_smoking-2023-06-20 instead, it'll still run in 4s, but you'll get no results 'cause of a filter looking for recent observations).

Running in the debugger gave me the impression that the default config has a lot of actors (Federated, Hypermedia, StringSource). custom_config_cli suggested to me that I could strip some of that out in a custom config. I fail whenever I try to point at a different config:

/tmp/toy/comunica-sparql-query-rdfjs-toy/node_modules/@comunica/actor-init-query/lib/QueryEngineBase.js:72
            if (this.actorInitQuery.contextKeyShortcuts[key]) {
                                                       ^
TypeError: Cannot read properties of undefined (reading 'sources')
...

with e.g.

new QueryEngine({ configPath: Path.join(__dirname, './query-sparql-rdfjs-default-config.json')}```

Environment:

@github-actions
Copy link

Thanks for reporting!

@rubensworks rubensworks added this to Triage in Maintenance Oct 31, 2023
@rubensworks
Copy link
Member

Ouch, that's a bit too slow indeed.

I suspect that it will caused by one of the following:

  • The subquery with limit might not actually get limited
  • The optional block is causing issues
  • The BIND is causing issues with query planning
  • The filter is not pushed down

@ericprud
Copy link
Author

ericprud commented Nov 1, 2023

Ouch, that's a bit too slow indeed.

I suspect that it will caused by one of the following:

  • The subquery with limit might not actually get limited
  • The optional block is causing issues
  • The BIND is causing issues with query planning
  • The filter is not pushed down

I banged around in the webapp and eliminated all of the above bullets except for the OPTIONAL from the query and got a result in 92s:

PREFIX fhir: <http://hl7.org/fhir/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?obsId ?patient ?birthDate ?smokingAssessmentDate ?packYears
       ?smokingYears ?packsPerDay ?dateQuit {

      # Observations with code of Tobacco smoking status
      ?obs a fhir:Observation;
        fhir:nodeRole fhir:treeRoot;
        fhir:id [ fhir:v ?obsId ];
        fhir:status [ fhir:v "final" ];
        fhir:code [
          fhir:coding [ rdf:rest*/rdf:first [
            a <http://loinc.org/rdf#72166-2>;
          ] ];
        ];
        fhir:subject [ fhir:link ?patient ];
        fhir:effective [ fhir:v ?smokingAssessmentDate ].
    
      # Ex-smoker (finding)
      ?obs fhir:value [
          fhir:coding [ rdf:rest*/rdf:first [
            a <http://snomed.info/id/8517006> ;
          ] ]
        ].

      # Pack years
      ?obs fhir:component [ rdf:rest*/rdf:first [
        fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
          a <http://snomed.info/id/401201003>
        ] ] ];
        fhir:value [
          fhir:value [ fhir:v ?packYears ];
          fhir:unit [ fhir:v "{PackYears}" ];
          fhir:system [ fhir:v "http://unitsofmeasure.org"^^xsd:anyURI ];
          fhir:code [ fhir:v "{PackYears}" ]
        ];
      ] ] .

      # Smoking years
      ?obs fhir:component [ rdf:rest*/rdf:first [
        fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
          a <http://loinc.org/rdf#88029-4>
        ] ] ];
        fhir:value [
          fhir:value [ fhir:v ?smokingYears ];
          fhir:unit [ fhir:v "Years Used" ];
          fhir:system [ fhir:v "http://unitsofmeasure.org"^^xsd:anyURI ];
          fhir:code [ fhir:v "a" ]
        ];
      ] ].

      # Packs per day
      ?obs fhir:component [ rdf:rest*/rdf:first [
        fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
          a <http://loinc.org/rdf#8663-7>
        ] ] ];
        fhir:value [
          fhir:value [ fhir:v ?packsPerDay ];
          fhir:unit [ fhir:v "Packs/Day" ];
          fhir:system [ fhir:v "http://snomed.info/sct"^^xsd:anyURI ];
          fhir:code [ fhir:v "228963008" ]
        ];
      ] ].

      # Date quit
      OPTIONAL {
        ?obs fhir:component [ rdf:rest*/rdf:first [
          fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
            a <http://loinc.org/rdf#74010-0>
          ] ] ];
          fhir:value [ fhir:v ?dateQuit ];
        ] ].
      }
  ?patient fhir:birthDate [ fhir:v ?birthDate ] .
}

patient smokingYears birthDate packsPerDay obsId smokingAssessmentDate packYears dateQuit
0 https://ericprud.github.io/AMIA-2023/playground/manifests/Patient/smoker-1 31 1924-11-23 0.25 smoker-1_smoking-2023-06-20 2023-06-20T00:00:00-07:00 7
1 https://ericprud.github.io/AMIA-2023/playground/manifests/Patient/smoker-1 31 1924-11-23 0.25 smoker-1_smoking-2022-05-19 2022-05-19T00:00:00-07:00 7

Getting rid of the OPTIONAL didn't change anything, nor did removing the join on patient.

Eliminating the part of the query that matches the last component (conveniently at the bottom) reduced it to 24s

      # Packs per day
      ?obs fhir:component [ rdf:rest*/rdf:first [
        fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
          a <http://loinc.org/rdf#8663-7>
        ] ] ];
        fhir:value [
          fhir:value [ fhir:v ?packsPerDay ];
          fhir:unit [ fhir:v "Packs/Day" ];
          fhir:system [ fhir:v "http://snomed.info/sct"^^xsd:anyURI ];
          fhir:code [ fhir:v "228963008" ]
        ];
      ] ].

Getting rid of the previous one didn't change anything but the one before that (pack years) brought it down to 1.3s:

      # Pack years
      ?obs fhir:component [ rdf:rest*/rdf:first [
        fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
          a <http://snomed.info/id/401201003>
        ] ] ];
        fhir:value [
          fhir:value [ fhir:v ?packYears ];
          fhir:unit [ fhir:v "{PackYears}" ];
          fhir:system [ fhir:v "http://unitsofmeasure.org"^^xsd:anyURI ];
          fhir:code [ fhir:v "{PackYears}" ]
        ];
      ] ] .

These numbers (especially the shorter ones which aren't so tedious to cycle) are stable to within 15%.

Finally, removing the 1st component got us to .4s:

      # Pack years
      ?obs fhir:component [ rdf:rest*/rdf:first [
        fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
          a <http://snomed.info/id/401201003>
        ] ] ];
        fhir:value [
          fhir:value [ fhir:v ?packYears ];
          fhir:unit [ fhir:v "{PackYears}" ];
          fhir:system [ fhir:v "http://unitsofmeasure.org"^^xsd:anyURI ];
          fhir:code [ fhir:v "{PackYears}" ]
        ];
      ] ] .

Re-adding the bottom OPTIONAL (Date quit) brought it up to 1s; re-adding the 2nd to last (packs/day) got it up to 4.4s.
There's something exponential here. Any ideas?

@rubensworks
Copy link
Member

I suspect the rdf:rest*/rdf:first property path expression may be the main cause of the issue here.
Some optimization effort should be invested in that part.

Concretely, executing this part requires knowledge of all distinct subjects in the triple store.
But Comunica's default indexes only enable triple pattern queries, so requesting distinct subjects is quite expensive.
So the bottleneck is probably here: https://github.com/comunica/comunica/blob/master/packages/actor-query-operation-path-zero-or-more/lib/ActorQueryOperationPathZeroOrMore.ts#L35-L41

@ericprud
Copy link
Author

ericprud commented Nov 6, 2023

Can you think of some horrible hack I can try in order to get through a demo on the 10th? I'm currently using

<script src="//rdf.js.org/comunica-browser/versions/v2/engines/query-sparql/comunica-browser.js"></script>

but could build locally or fiddle with confs (though I've not successfully followed the instructions to customize the config yet).

@rubensworks
Copy link
Member

There's no easy solution atm I'm afraid, besides avoiding zero-or-more property path expressions.

@ericprud
Copy link
Author

ericprud commented Nov 6, 2023

Many thanks for spending cycles on this. I'll see if I can push the magic into Java in time. Barring that, some hand-waving and slide-ware...

@ericprud
Copy link
Author

ericprud commented Nov 6, 2023

I picked a shorter (~5s) query:

PREFIX fhir: <http://hl7.org/fhir/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?obsId ?patient ?birthDate ?smokingAssessmentDate
 ?packYears
# ?smokingYears
# ?packsPerDay
 ?dateQuit
 ?patientAge {

  {
    SELECT ?obsId ?patient ?smokingAssessmentDate
 ?packYears
# ?smokingYears
# ?packsPerDay
 ?dateQuit
 {

      # Observations with code of Tobacco smoking status
      ?obs a fhir:Observation;
        fhir:nodeRole fhir:treeRoot;
        fhir:id [ fhir:v ?obsId ];
        fhir:status [ fhir:v "final" ];
        fhir:code [
          fhir:coding [ rdf:rest*/rdf:first [
            a <http://loinc.org/rdf#72166-2>;
          ] ];
        ];
        fhir:subject [ fhir:link ?patient ];
        fhir:effective [ fhir:v ?smokingAssessmentDate ].

      # Pack years
      ?obs fhir:component [ rdf:rest*/rdf:first [
        fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
          a <http://snomed.info/id/401201003>
        ] ] ];
        fhir:value [
          fhir:value [ fhir:v ?packYears ];
          fhir:unit [ fhir:v "{PackYears}" ];
          fhir:system [ fhir:v "http://unitsofmeasure.org"^^xsd:anyURI ];
          fhir:code [ fhir:v "{PackYears}" ]
        ];
      ] ] .

      # Date quit
      OPTIONAL {
        ?obs fhir:component [ rdf:rest*/rdf:first [
          fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
            a <http://loinc.org/rdf#74010-0>
          ] ] ];
          fhir:value [ fhir:v ?dateQuit ];
        ] ].
      }

    }
    ORDER BY DESC(?smokingAssessmentDate)
    LIMIT 1
  }

  # Assessment within last year
  BIND( day(NOW()) - day(?smokingAssessmentDate)
        + 365.25/12*(month(NOW()) - month(?smokingAssessmentDate)
        + 12*(year(NOW()) - year(?smokingAssessmentDate)))
       AS ?smokingAssessmentAge)
  FILTER ( ?smokingAssessmentAge < 365.25 )

  # Patient in age range
  ?patient fhir:birthDate [ fhir:v ?birthDate ] .
  BIND( (day(NOW()) - day(?birthDate)
         + 365.25/12*(month(NOW()) - month(?birthDate)
         + 12*(year(NOW()) - year(?birthDate)))
        )/365.25
       AS ?patientAge)
  FILTER (?patientAge > 50 && ?patientAge < 100)

  # ... Diagnosis, Service, ServiceRequest
}
and ran Chromium's profiler. I picked a longish block at random:

codings-mini-profile
and expanded everything longer than 2ms:
codings-mini-profile-expanded

@ericprud
Copy link
Author

ericprud commented Nov 6, 2023

I got all obsessive and chased down all those rolled-up lines above (you'll want ~280 columns or more to see this; observing a wide screen from low earth orbit should do the job):

0.0ms 0.0 % 21.9ms 100.0% Timer Fired                                    comunica-browser.js:1538697	taskscheduler:9					createTaskScheduler	queueMicrotask : (task) => resolved.then(task);
1.7ms 7.6 % 21.9ms 100.0%   Run Microtasks                                                                                      			
0.0ms 0.0 %  8.6ms  39.2%     mediate                                    comunica-browser.js:373613	Mediator.ts:86					mediate			return actor.runObservable(action);
0.2ms 0.7 %  8.1ms  36.9%       runObservable                            comunica-browser.js:370161	Actor.ts:83					runObservable		const output: Promise<O> = this.run(action);
0.0ms 0.0 %  7.9ms  36.1%         run                                    comunica-browser.js:336814	ActorQueryOperationTyped.ts:56			run			const output: IQueryOperationResult = await this.runOperation(operation, subContext);
0.0ms 0.0 %  5.2ms  23.5%           runOperation                         comunica-browser.js:135780	ActorQueryOperationQuadpattern.ts:221		runOperation		const result = await this.mediatorResolveQuadPattern.mediate({ pattern: patternInner, context });
0.0ms 0.0 %  2.8ms  12.9%             mediate                            comunica-browser.js:373613	Mediator.ts:85					mediate			const actor: A = await this.mediateActor(action);
0.0ms 0.0 %  2.8ms  12.9%               mediateActor                     comunica-browser.js:373547	Mediator.ts:70					mediateActor		return await this.mediateWith(action, this.publish(action));
0.0ms 0.0 %  2.7ms  12.2%                 publish                        comunica-browser.js:373384	Mediator.ts:52					publish			const actors: IActorReply<A, I, T, O>[] = this.bus.publish(action);
0.0ms 0.0 %  2.7ms  12.2%                   publish                      comunica-browser.js:371384	Bus.ts:109					publish			return this.actors.map((actor: A): IActorReply<A, I, T, O> => ({ actor, reply: actor.test(action) }));
0.0ms 0.0 %  2.7ms  12.2%                     (anonymous)                comunica-browser.js:371412	
2.0ms 9.2 %  2.0ms   9.2%                       test                     comunica-browser.js:305138	ActorRdfResolveQuadPatternStringSource.ts:34	test			const source = getContextSource(action.context);
0.7ms 3.0 %  0.7ms   3.0%                       test                     comunica-browser.js:290624	ActorRdfResolveQuadPatternHypermedia.ts:63	test			const sources = hasContextSingleSource(action.context);
0.0ms 0.0 %  0.2ms   0.7%               mediateWith                      comunica-browser.js:473577	MediatorRace:14					mediateWith		return new Promise((resolve, reject) => {...
0.2ms 0.7 %  0.2ms   0.7%                 (anonymous)                    comunica-browser.js:473603	                                		
0.0ms 0.0 %  2.2ms   9.8%             (anonymous)                        comunica-browser.js:135826	ActorQueryOperationJoin.ts:31			runOperation λ		output: await this.mediatorQueryOperation.mediate({ operation: subOperation, context }),
0.0ms 0.0 %  2.2ms   9.8%               mediate                          comunica-browser.js:373613	Mediator.ts:85					mediate			const actor: A = await this.mediateActor(action);
0.0ms 0.0 %  2.2ms   9.8%                 mediateActor                   comunica-browser.js:373547	Mediator.ts:70					mediateActor		return await this.mediateWith(action, this.publish(action));
0.0ms 0.0 %  2.0ms   9.1%                   publish                      comunica-browser.js:373384	Mediator.ts:52					publish			const actors: IActorReply<A, I, T, O>[] = this.bus.publish(action);
0.0ms 0.0 %  2.0ms   9.1%                     publish                    comunica-browser.js:372691	Busindexed.ts:64				publish			return actors.map((actor: A): IActorReply<A, I, T, O> => ({ actor, reply: actor.test(action) }));
0.0ms 0.0 %  2.0ms   9.1%                       (anonymous)              comunica-browser.js:372810	Busindexed.ts:64				publish λ		return actors.map((actor: A): IActorReply<A, I, T, O> => ({ actor, reply: actor.test(action) }));
1.0ms 4.6 %  2.0ms   9.1%                         test                   comunica-browser.js:168404	ActorQueryOperationSparqlEndpoint.ts:60	test			
1.0ms 4.5 %  1.0ms   4.5%                           testOperation        comunica-browser.js:86735 	ActorAbstractPath.ts:41				testOperation		operation.predicate.type !== this.predicateType
0.0ms 0.0 %  0.2ms   0.8%                   mediateWith                  comunica-browser.js:472522	MediatorNumber.ts:49				mediateWith		
0.0ms 0.0 %  0.2ms   0.8%             isPathArbitraryLengthDistinct      comunica-browser.js:87060 	ActorAbstractPath.ts:67				isPathArbitraryLengthDistinct
0.0ms 0.0 %  1.8ms   8.1%           runOperation                         comunica-browser.js:123559	ActorQueryOperationDistinctHash.ts:27
0.0ms 0.0 %  0.7ms   3.0%           runOperation                         comunica-browser.js:161434	ActorQueryOperationQuadpattern.ts:204
0.2ms 0.8 %  0.2ms   0.8%           getSource                            comunica-browser.js:285313	ActorRdfResolveQuadPatternFederated.ts:78
0.1ms 0.7 %  0.2ms   0.7%           runOperation                         comunica-browser.js:145925	ActorQueryOperationPathLink.ts:18
0.0ms 0.0 %  0.5ms   2.3%       (anonymous)                              comunica-browser.js:468952	MediatorCombineUnion.ts:31			mediate λ		const results: O[] = await Promise.all(testResults.map(result => result.actor.runObservable(action)));
0.0ms 0.0 %  6.7ms  30.5%     run                                        comunica-browser.js:356101	ActorRdfResolveQuadPatternSource.ts:27					return await this.getOutput(source, action.pattern, action.context);
0.0ms 0.0 %  6.7ms  30.5%       getOutput                                comunica-browser.js:356221	ActorRdfResolveQuadPatternSource.ts:41					const data = source.match(pattern.subject, pattern.predicate, pattern.object, pattern.graph, context);
0.0ms 0.0 %  4.6ms  21.0%         match                                  comunica-browser.js:287470	FederatedQuadSource.ts:230			match			const proxyIt: Promise<AsyncIterator<RDF.Quad>[]> = Promise.all(this.sources.map(async(source, sourceIndex) => {
0.0ms 0.0 %  2.1ms   9.8%           (anonymous)                          comunica-browser.js:287997	FederatedQuadSource.ts:262			match λ			output = await this.mediatorResolveQuadPattern.mediate({ pattern, context });
0.0ms 0.0 %  2.0ms   9.2%             mediate                            comunica-browser.js:373613	Mediator.ts:85					mediate			const actor: A = await this.mediateActor(action);
0.0ms 0.0 %  2.0ms   9.2%               mediateActor                     comunica-browser.js:373547	Mediator.ts:70					mediateActor		return await this.mediateWith(action, this.publish(action));
0.0ms 0.0 %  2.0ms   9.2%                 publish                        comunica-browser.js:373384	Mediator.ts:52					publish			const actors: IActorReply<A, I, T, O>[] = this.bus.publish(action);
0.0ms 0.0 %  2.0ms   9.2%                   publish                      comunica-browser.js:371384	Bus.ts:109					publish			return this.actors.map((actor: A): IActorReply<A, I, T, O> => ({ actor, reply: actor.test(action) }));
0.0ms 0.0 %  2.0ms   9.2%                     (anonymous)                comunica-browser.js:371412	
1.9ms 8.5 %  1.9ms   8.5%                       test                     comunica-browser.js:285145	ActorRdfResolveQuadPatternFederated.ts:70	test			const sources = getContextSources(action.context);
0.2ms 0.7 %  0.2ms   0.7%                       test                     comunica-browser.js:305138	ActorRdfResolveQuadPatternStringSource.ts:34	test			const source = getContextSource(action.context);
0.1ms 0.6 %  0.1ms   0.6%             test                               comunica-browser.js:302007	ActorRdfResolveQuadPatternRdfJsSource.ts:19	test
0.0ms 0.0 %  1.8ms   8.0%           setMetadata                          comunica-browser.js:303522	RdfJsQuadSource.ts:74				test			cardinality = await this.source.countQuads(
0.2ms 0.7 %  1.8ms   8.0%             countQuads                         n3.min.js:49264           	<no sourcemap>
1.6ms 7.3 %  1.6ms   7.3%               _countInIndex                    n3.min.js:44912           	
0.0ms 0.0 %  0.7ms   3.1%           i                                    comunica-browser.js:340636	ClosableTransformIterator.ts:14	constructor
0.0ms 0.0 %  2.1ms   9.5%         match                                  comunica-browser.js:303097	RdfJsQuadSource.ts:55				match			this.setMetadata(it, subject, predicate, object, graph)
0.0ms 0.0 %  1.9ms   8.8%           setMetadata                          comunica-browser.js:303522	RdfJsQuadSource.ts:74				test			cardinality = await this.source.countQuads(
0.0ms 0.0 %  1.9ms   8.8%             countQuads                         n3.min.js:49264           	
1.9ms 8.8 %  1.9ms   8.8%               _countInIndex                    n3.min.js:44912           	
0.0ms 0.0 %  0.1ms   0.7%           match                                n3.min.js:49144           	
1.4ms 6.6 %  2.7ms  12.1%     Function Call                                                        	
0.0ms 0.0 %  0.9ms   4.3%       (anonymous)                              comunica-browser.js:1520267	
0.0ms 0.0 %  0.9ms   4.3%         a.emit                                 comunica-browser.js:621787	
0.0ms 0.0 %  0.2ms   1.0%       (anonymous)                              comunica-browser.js:1520127	
0.0ms 0.0 %  0.0ms   0.2%       _init                                    comunica-browser.js:1525843	
0.0ms 0.0 %  0.0ms   0.0%       (anonymous)                              comunica-browser.js:1525573	
0.0ms 0.0 %  0.0ms   0.0%       a.emit                                   comunica-browser.js:621787	
0.4ms 1.7 %  1.2ms   5.3%     (anonymous)                                comunica-browser.js:153731	ActorQueryOperationPathZeroOrMore.ts:75 OR 92	 			await this.getSubjectAndObjectBindingsPredicateStar(
0.0ms 0.0 %  0.7ms   3.0%       getSubjectAndObjectBindingsPredicateStar comunica-browser.js:89125 	ActorAbstractPath.ts:311
0.0ms 0.0 %  0.5ms   2.3%         (anonymous)                            comunica-browser.js:89449 	ActorAbstractPath.ts:317						await this.mediatorQueryOperation.mediate({ operation: path, context }),
0.0ms 0.0 %  0.5ms   2.3%           mediate                              comunica-browser.js:373613	Mediator.ts:85					mediate			const actor: A = await this.mediateActor(action);
0.0ms 0.0 %  0.5ms   2.3%             mediateActor                       comunica-browser.js:373547	Mediator.ts:70					mediateActor		return await this.mediateWith(action, this.publish(action));
0.0ms 0.0 %  0.5ms   2.3%               publish                          comunica-browser.js:373384	Mediator.ts:52					publish			const actors: IActorReply<A, I, T, O>[] = this.bus.publish(action);
0.0ms 0.0 %  0.5ms   2.3%                 publish                        comunica-browser.js:372691	Busindexed.ts:67				publish			return actors.map((actor: A): IActorReply<A, I, T, O> => ({ actor, reply: actor.test(action) }));
0.0ms 0.0 %  0.5ms   2.3%                   (anonymous)                  comunica-browser.js:372810	Busindexed.ts:67				publish λ		                  (actor: A): IActorReply<A, I, T, O> => ({ actor, reply: actor.test(action) }
-.2ms 1.1 %  0.5ms   2.3%                     test                       comunica-browser.js:336497	ActorQueryOperationTyped.ts:35						return this.testOperation(operation, action.context);
0.2ms 0.8 %  0.2ms   0.8%                       testOperation            comunica-browser.js:86735 	ActorAbstractPath.ts:41				testOperation		operation.predicate.type !== this.predicateType
0.1ms 0.4 %  0.1ms   0.4%                       Function Call                                      	
0.2ms 0.7 %  0.2ms   0.7%           i                                    comunica-browser.js:1036184	TermUtil.js:25					termToString
0.1ms 0.6 %  0.1ms   0.6%         Function Call                                                    	
0.0ms 0.0 %  0.3ms   1.5%       (anonymous)                              comunica-browser.js:1528044	
0.0ms 0.0 %  0.3ms   1.2%     runOperation                               comunica-browser.js:135780	ActorQueryOperationJoin.ts:29			runOperation		const entries: IJoinEntry[] = (await Promise.all(operationOriginal.input
0.0ms 0.0 %  0.2ms   0.7%     setMetadata                                comunica-browser.js:303522	
0.0ms 0.0 %  0.2ms   0.7%     runOperation                               comunica-browser.js:152942	
0.0ms 0.0 %  0.2ms   0.7%     (anonymous)                                comunica-browser.js:287997	
0.0ms 0.2 %  0.1ms   0.3%     Optimize Code
ActorAbstractPath.ts:41 testOperation operation.predicate.type !== this.predicateType

appears to use about 4.5% of the time. Around twice that is used in any of:

ActorRdfResolveQuadPatternStringSource.ts:34 test const source = getContextSource(action.context);
ActorRdfResolveQuadPatternFederated.ts:70 test const sources = getContextSources(action.context);
N3 countQuads

@rubensworks
Copy link
Member

Thanks for the additional details @ericprud!
This seems to confirm my earlier hypothesis that the bottleneck is determining the distinct subjects in the datasource.

@rubensworks rubensworks changed the title query exponentially slower on subselect/limit 1 with two matches Zero-or-more paths with variables slow due to lack of distinct subjects index Nov 7, 2023
@RickBioInf
Copy link

@rubensworks were you able to work on this issue? We have it as well.

@rubensworks
Copy link
Member

@RickBioInf No. But you're welcome to submit a PR :-)

@rubensworks rubensworks moved this from Triage to To Do (prio:medium) in Maintenance Apr 19, 2024
@rubensworks rubensworks removed this from To Do (prio:medium) in Maintenance Apr 19, 2024
@rubensworks rubensworks added this to To do (prio:low) in Development via automation Apr 19, 2024
@rubensworks rubensworks moved this from To do (prio:low) to To do (prio:high) in Development Apr 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development
  
To do (prio:high)
Development

No branches or pull requests

3 participants