Zero-or-more paths with variables slow due to lack of distinct subjects index #1286
Thanks for reporting! |
Ouch, that's a bit too slow indeed. I suspect that it will be caused by one of the following:
|
I banged around in the webapp and eliminated all of the above bullets except for the OPTIONAL from the query and got a result in 92s:
PREFIX fhir: <http://hl7.org/fhir/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?obsId ?patient ?birthDate ?smokingAssessmentDate ?packYears
?smokingYears ?packsPerDay ?dateQuit {
# Observations with code of Tobacco smoking status
?obs a fhir:Observation;
fhir:nodeRole fhir:treeRoot;
fhir:id [ fhir:v ?obsId ];
fhir:status [ fhir:v "final" ];
fhir:code [
fhir:coding [ rdf:rest*/rdf:first [
a <http://loinc.org/rdf#72166-2>;
] ];
];
fhir:subject [ fhir:link ?patient ];
fhir:effective [ fhir:v ?smokingAssessmentDate ].
# Ex-smoker (finding)
?obs fhir:value [
fhir:coding [ rdf:rest*/rdf:first [
a <http://snomed.info/id/8517006> ;
] ]
].
# Pack years
?obs fhir:component [ rdf:rest*/rdf:first [
fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
a <http://snomed.info/id/401201003>
] ] ];
fhir:value [
fhir:value [ fhir:v ?packYears ];
fhir:unit [ fhir:v "{PackYears}" ];
fhir:system [ fhir:v "http://unitsofmeasure.org"^^xsd:anyURI ];
fhir:code [ fhir:v "{PackYears}" ]
];
] ] .
# Smoking years
?obs fhir:component [ rdf:rest*/rdf:first [
fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
a <http://loinc.org/rdf#88029-4>
] ] ];
fhir:value [
fhir:value [ fhir:v ?smokingYears ];
fhir:unit [ fhir:v "Years Used" ];
fhir:system [ fhir:v "http://unitsofmeasure.org"^^xsd:anyURI ];
fhir:code [ fhir:v "a" ]
];
] ].
# Packs per day
?obs fhir:component [ rdf:rest*/rdf:first [
fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
a <http://loinc.org/rdf#8663-7>
] ] ];
fhir:value [
fhir:value [ fhir:v ?packsPerDay ];
fhir:unit [ fhir:v "Packs/Day" ];
fhir:system [ fhir:v "http://snomed.info/sct"^^xsd:anyURI ];
fhir:code [ fhir:v "228963008" ]
];
] ].
# Date quit
OPTIONAL {
?obs fhir:component [ rdf:rest*/rdf:first [
fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
a <http://loinc.org/rdf#74010-0>
] ] ];
fhir:value [ fhir:v ?dateQuit ];
] ].
}
?patient fhir:birthDate [ fhir:v ?birthDate ] .
}
Getting rid of the OPTIONAL didn't change anything, nor did removing the join on patient. Eliminating the part of the query that matches the last component (conveniently at the bottom) reduced it to 24s:
# Packs per day
?obs fhir:component [ rdf:rest*/rdf:first [
fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
a <http://loinc.org/rdf#8663-7>
] ] ];
fhir:value [
fhir:value [ fhir:v ?packsPerDay ];
fhir:unit [ fhir:v "Packs/Day" ];
fhir:system [ fhir:v "http://snomed.info/sct"^^xsd:anyURI ];
fhir:code [ fhir:v "228963008" ]
];
] ].
Getting rid of the previous one didn't change anything, but the one before that (pack years) brought it down to 1.3s:
# Pack years
?obs fhir:component [ rdf:rest*/rdf:first [
fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
a <http://snomed.info/id/401201003>
] ] ];
fhir:value [
fhir:value [ fhir:v ?packYears ];
fhir:unit [ fhir:v "{PackYears}" ];
fhir:system [ fhir:v "http://unitsofmeasure.org"^^xsd:anyURI ];
fhir:code [ fhir:v "{PackYears}" ]
];
] ] .
These numbers (especially the shorter ones, which aren't so tedious to cycle) are stable to within 15%. Finally, removing the 1st component got us to 0.4s:
# Pack years
?obs fhir:component [ rdf:rest*/rdf:first [
fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
a <http://snomed.info/id/401201003>
] ] ];
fhir:value [
fhir:value [ fhir:v ?packYears ];
fhir:unit [ fhir:v "{PackYears}" ];
fhir:system [ fhir:v "http://unitsofmeasure.org"^^xsd:anyURI ];
fhir:code [ fhir:v "{PackYears}" ]
];
] ] .
Re-adding the bottom OPTIONAL (Date quit) brought it up to 1s; re-adding the 2nd to last (packs/day) got it up to 4.4s. |
I suspect the zero-or-more path (rdf:rest*) is the main cause. Concretely, executing this part requires knowledge of all distinct subjects in the triple store. |
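To make the cost concrete, here is a minimal sketch of evaluating a zero-or-more path when both ends are unbound, as can happen with the blank nodes around `rdf:rest*` in the query. It is a toy in-memory evaluator, not Comunica's implementation; the `Triple` type, the `zeroOrMoreUnbound` function, and the sample data are all made up for illustration. The point is that a zero-length path matches every node, so the evaluator has to enumerate all distinct terms before it can do anything else.

```typescript
// Toy illustration only; names and data are hypothetical, not Comunica code.
type Triple = { s: string; p: string; o: string };

// Evaluate `?x <pred>* ?y` with both ?x and ?y unbound.
function zeroOrMoreUnbound(triples: Triple[], pred: string): Array<[string, string]> {
  // A zero-length path matches every node, so all distinct subjects and
  // objects are needed up front. Without an index this is a full scan.
  const nodes = new Set<string>();
  for (const t of triples) {
    nodes.add(t.s);
    nodes.add(t.o);
  }

  // Adjacency list restricted to the path's predicate.
  const next = new Map<string, string[]>();
  for (const t of triples) {
    if (t.p !== pred) continue;
    const list = next.get(t.s) || [];
    list.push(t.o);
    next.set(t.s, list);
  }

  // Depth-first walk from every node; each start node reaches itself.
  const results: Array<[string, string]> = [];
  for (const start of Array.from(nodes)) {
    const seen = new Set<string>([start]);
    const stack: string[] = [start];
    while (stack.length > 0) {
      const node = stack.pop() as string;
      results.push([start, node]);
      for (const o of next.get(node) || []) {
        if (!seen.has(o)) {
          seen.add(o);
          stack.push(o);
        }
      }
    }
  }
  return results;
}

// A two-element list: l1 -> l2 -> nil, with members a and b.
const sample: Triple[] = [
  { s: "l1", p: "rest", o: "l2" },
  { s: "l2", p: "rest", o: "nil" },
  { s: "l1", p: "first", o: "a" },
  { s: "l2", p: "first", o: "b" },
];
const pairs = zeroOrMoreUnbound(sample, "rest");
```

Even the nodes that never appear in a `rest` triple (`a`, `b`) contribute a solution each, which is why the number of intermediate bindings grows with the size of the whole store rather than with the size of the lists.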
Can you think of some horrible hack I can try in order to get through a demo on the 10th? I'm currently using <script src="//rdf.js.org/comunica-browser/versions/v2/engines/query-sparql/comunica-browser.js"></script> but could build locally or fiddle with confs (though I've not successfully followed the instructions to customize the config yet). |
There's no easy solution atm I'm afraid, besides avoiding zero-or-more property path expressions. |
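One way to avoid the zero-or-more operator, assuming the RDF lists in the data have a known small maximum length, is to unroll `rdf:rest*/rdf:first` into a bounded alternative of fixed-length paths. The helper below is a hypothetical sketch (`unrollListPath` is not part of Comunica or any library), and whether the rewritten query is actually faster in Comunica has not been measured here.

```typescript
// Hypothetical helper: expands `rest*/first` into a bounded alternative.
// Assumption: no list in the data is longer than maxLength members;
// members beyond that index would silently fail to match.
function unrollListPath(rest: string, first: string, maxLength: number): string {
  if (maxLength < 1) {
    throw new Error("maxLength must be at least 1");
  }
  const alternatives: string[] = [];
  for (let depth = 0; depth < maxLength; depth++) {
    // `depth` hops over `rest`, then one hop over `first`.
    const steps: string[] = [];
    for (let i = 0; i < depth; i++) {
      steps.push(rest);
    }
    steps.push(first);
    alternatives.push(steps.join("/"));
  }
  return "(" + alternatives.join("|") + ")";
}

const bounded = unrollListPath("rdf:rest", "rdf:first", 3);
// bounded is "(rdf:first|rdf:rest/rdf:first|rdf:rest/rdf:rest/rdf:first)"
```

The resulting string can be substituted wherever the query text contains `rdf:rest*/rdf:first`. The trade-off is that the query becomes tied to an assumed maximum list length, so this is a demo-grade hack rather than a general fix.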
Many thanks for spending cycles on this. I'll see if I can push the magic into Java in time. Barring that, some hand-waving and slide-ware... |
I picked a shorter (~5s) query:
PREFIX fhir: <http://hl7.org/fhir/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?obsId ?patient ?birthDate ?smokingAssessmentDate
?packYears
# ?smokingYears
# ?packsPerDay
?dateQuit
?patientAge {
{
SELECT ?obsId ?patient ?smokingAssessmentDate
?packYears
# ?smokingYears
# ?packsPerDay
?dateQuit
{
# Observations with code of Tobacco smoking status
?obs a fhir:Observation;
fhir:nodeRole fhir:treeRoot;
fhir:id [ fhir:v ?obsId ];
fhir:status [ fhir:v "final" ];
fhir:code [
fhir:coding [ rdf:rest*/rdf:first [
a <http://loinc.org/rdf#72166-2>;
] ];
];
fhir:subject [ fhir:link ?patient ];
fhir:effective [ fhir:v ?smokingAssessmentDate ].
# Pack years
?obs fhir:component [ rdf:rest*/rdf:first [
fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
a <http://snomed.info/id/401201003>
] ] ];
fhir:value [
fhir:value [ fhir:v ?packYears ];
fhir:unit [ fhir:v "{PackYears}" ];
fhir:system [ fhir:v "http://unitsofmeasure.org"^^xsd:anyURI ];
fhir:code [ fhir:v "{PackYears}" ]
];
] ] .
# Date quit
OPTIONAL {
?obs fhir:component [ rdf:rest*/rdf:first [
fhir:code [ fhir:coding [ rdf:rest*/rdf:first [
a <http://loinc.org/rdf#74010-0>
] ] ];
fhir:value [ fhir:v ?dateQuit ];
] ].
}
}
ORDER BY DESC(?smokingAssessmentDate)
LIMIT 1
}
# Assessment within last year
BIND( day(NOW()) - day(?smokingAssessmentDate)
+ 365.25/12*(month(NOW()) - month(?smokingAssessmentDate)
+ 12*(year(NOW()) - year(?smokingAssessmentDate)))
AS ?smokingAssessmentAge)
FILTER ( ?smokingAssessmentAge < 365.25 )
# Patient in age range
?patient fhir:birthDate [ fhir:v ?birthDate ] .
BIND( (day(NOW()) - day(?birthDate)
+ 365.25/12*(month(NOW()) - month(?birthDate)
+ 12*(year(NOW()) - year(?birthDate)))
)/365.25
AS ?patientAge)
FILTER (?patientAge > 50 && ?patientAge < 100)
# ... Diagnosis, Service, ServiceRequest
}
and ran Chromium's profiler. I picked a longish block at random: |
I got all obsessive and chased down all those rolled-up lines above (you'll want ~280 columns or more to see this; observing a wide screen from low earth orbit should do the job):
appears to use about 4.5% of the time. Around twice that is used in any of:
|
Thanks for the additional details @ericprud! |
subselect/limit 1
with two matches
@rubensworks were you able to work on this issue? We have it as well. |
@RickBioInf No. But you're welcome to submit a PR :-) |
Issue type:
Description:
Running a query on 804 tree-ish Triples takes 150-180 seconds (vs. about 1 second in Jena).
I replicated this in a webapp for your clicking pleasure; pick ❨a smoker❩ on the left and then ❨smoker-coding❩ on the right. The ⟦Run⟧ button will be updated with the run time so no need to pull out a stopwatch.
That query lightly exercised a sub-select because it included both Observation/smoker-1_smoking-2022-05-19 and Observation/smoker-1_smoking-2023-06-20. If I eliminate one of those, the query takes ~4s. You can replicate this in the webapp by selecting Observation/smoker-1_smoking-2022-05-19 in the drop-down to the right of "Input Data", clearing it out, and running the smoker-coding query. (If you clear out Observation/smoker-1_smoking-2023-06-20 instead, it'll still run in 4s, but you'll get no results 'cause of a filter looking for recent observations.)
Running in the debugger gave me the impression that the default config has a lot of actors (Federated, Hypermedia, StringSource). custom_config_cli suggested to me that I could strip some of that out in a custom config. I fail whenever I try to point at a different config:
with e.g.
Environment: