Dynamic inner joins can create excessive number of requests #1196

Open
jeswr opened this issue Apr 13, 2023 · 3 comments
Comments

@jeswr
Member

jeswr commented Apr 13, 2023

Issue type:

As shown here, performing dynamic inner joins can result in orders of magnitude more requests to a TPF endpoint than are required to select the entire dataset. This can cause major performance degradation, as network latency is the main bottleneck in such situations.

We need a way to ensure that cases such as those in the example repo are evaluated as a plain inner join over the patterns ?s ex:worksFor ?o1 and ?o1 ex:name ?o, rather than as a dynamic inner join.

  • 🐌 Performance issue

Description:


Environment:

@github-actions

Thanks for reporting!

@rubensworks rubensworks added this to Triage in Maintenance Apr 13, 2023
@rubensworks
Member

Looks similar to #548.

Note that this is mainly a research problem (for which some solutions exist already), not really an implementation problem.

Adding it to the list in #846

@jeswr
Member Author

jeswr commented Apr 14, 2023

> Note that this is mainly a research problem (for which some solutions exist already), not really an implementation problem.

Yup. My 2c: a key missing heuristic seems to be the estimated time required to retrieve all the data for a quad pattern, measured in number of requests, which would be Math.ceil(cardinality / pageSize). For dynamic joins, the equivalent is the estimated time required to do the join in terms of the number of requests required, which would be (cardinality of first stream) * Math.ceil((approx cardinality of each stream requested as part of the join) / pageSize).
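To make the two estimates concrete, here is a minimal sketch of how they could be compared. The function names and the numbers are hypothetical and only illustrate the arithmetic; this is not how Comunica currently estimates join cost.

```typescript
// Illustrative sketch only: these helper names are hypothetical and are not Comunica APIs.

/** Requests needed to page through every match of a single quad pattern. */
function requestsForFullScan(cardinality: number, pageSize: number): number {
  if (pageSize <= 0) {
    throw new RangeError("pageSize must be positive");
  }
  return Math.ceil(cardinality / pageSize);
}

/**
 * Requests needed for a dynamic join: each binding from the first stream
 * triggers its own paged lookup of the second pattern.
 * Follows the formula proposed in this thread as-is, so a lookup with an
 * estimated cardinality of 0 is counted as 0 requests.
 */
function requestsForDynamicJoin(
  firstCardinality: number,
  perBindingCardinality: number,
  pageSize: number,
): number {
  return firstCardinality * requestsForFullScan(perBindingCardinality, pageSize);
}

/** Requests needed to fetch both patterns in full and join them locally. */
function requestsForLocalJoin(
  leftCardinality: number,
  rightCardinality: number,
  pageSize: number,
): number {
  return requestsForFullScan(leftCardinality, pageSize)
    + requestsForFullScan(rightCardinality, pageSize);
}

// Made-up numbers: 10,000 matches for each pattern, 100 results per page,
// and roughly one ex:name per ?o1.
const pageSize = 100;
const dynamicJoinRequests = requestsForDynamicJoin(10_000, 1, pageSize);
const localJoinRequests = requestsForLocalJoin(10_000, 10_000, pageSize);

console.log({ dynamicJoinRequests, localJoinRequests });
```

With these assumed cardinalities the dynamic join needs one request per binding of the first stream, while fetching both patterns in full needs only a few hundred, which is the kind of gap a request-count heuristic would be meant to catch.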

Is there a way of achieving this as part of the dataset cardinality work being added in #1194?

@rubensworks rubensworks moved this from Triage to To Do (prio:low) in Maintenance Apr 19, 2024
@rubensworks rubensworks removed this from To Do (prio:low) in Maintenance Apr 19, 2024
@rubensworks rubensworks added this to To do (prio:low) in Development via automation Apr 19, 2024
@rubensworks rubensworks removed this from To do (prio:low) in Development Apr 19, 2024