-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Discussion] Support lateral joins and parent_as in association's joins_query #4401
base: master
Are you sure you want to change the base?
[Discussion] Support lateral joins and parent_as in association's joins_query #4401
Conversation
Without having too much time to look at it right now, my feeling is we can probably support something like this but in a simpler way. Since we already generate queries for preloaded associations, we should be able to do wrap it in a subquery when we detect a lateral join. And I believe we already have the binding since this will have to be a single query preload so we should be able to use that in parent_as. |
For preloads I'm not so worried about it, for me it's mainly about the |
So the idea is that, if someone does lateral_join: x in assoc(x, ...), we automatically write it as lateral join? Or what if we introduce a lateral_assoc helper? |
The usage of my custom assoc works like this: in the schema: defmodule Book do
import Book.CommentStatsAssoc
schema "book" do
comment_stats :comment_stats
end
end and in the query: from b in Book, join: s in assoc(b, :comment_stats) then the query would be planned like this: from b in Book,
as: :book, #alias is actually a ref added by the planner
inner_lateral_join: s in subquery(
from c in Comment,
where: c.book_id == parent_as(:book).id,
select: %{count: count(c.id)}
),
on: true For me, it would be nice if I didn't have to write inner_lateral_join in the query, since it's deeply embedded in our codebase already. However, I can see how having your join "upgraded" to a lateral join could also be confusing for developers. To be able to define a lateral joined association directly in the schema would be great too! |
I also thought about the problem with the parent_as, maybe instead of adding an alias to the parent table all the time, the planner could transform an alias found in the query returned by def joins_query(assoc) do
from b in Book, as: :__lateral_assoc_book__,
inner_lateral_join: s in subquery(
from c in Comment,
where: c.book_id == parent_as(:__lateral_assoc_book__).id,
select: %{count: count(c.id)}
)
end This is what I initially tried to write, but then I saw that the planner only takes the joins from the returned query and discards anything else. If the aliases were transformed as well, it could work (also if the lateral join references another join instead of the root table). Then we don't need to introduce another callback with more arity. |
The issue is that the Ecto.Association API is private, we use it internally, but it isn't public. So I am worried about adding new callbacks that ourselves don't use, it would even be hard to test them. So I would prefer to surface the feature using proper APIs with some docs. :) |
Yeah adding a new callback isn't my favorite solution either. I think transforming the aliases is a much nicer and more readable solution. I can also look into how a lateral helper could work which uses the new capabilities internally. schema do
lateral_one :comment_stats do
from b in Book,
as: :book,
inner_lateral_join: s in subquery(
from c in Comment,
where: c.book_id == parent_as(:book).id,
select: %{count: count(c.id)}
),
on: true
end
end and a similar |
It feels off to me to force the join type in the schema. It's so common to want to use the same table for many different joins that I think people will almost always have to duplicate the association just to do what they want. |
As soon as you have this association that uses a subquery with a filter that references some column from other queries, you can't use anything but lateral joins in SQL anyway. It should be clear to users that a lateral association can only be used with lateral joins. |
Right, which is why I proposed |
For example:
sometimes I want the association to be joined in an inner join sometimes I want the association to be joined in a lateral join If I define it in the schema then I need 2 associations. If the join is specified in the query then I only need 1 association and I can have a helper that wraps it in a subquery and uses parent_as when I want the lateral join. |
How would you join a subquery that can only be used in a lateral join with an inner join? I feel like we're not on the same page. |
Something you might not be aware of is that today Ecto from p in Post, preload: :comments So what I am saying is that there is a helper that lets you leverage this query when you want to use the association in a lateral join. It should wrap it in a subquery and deal with I think you are very fixed on the subquery being part of the association definition. That is exactly what I am saying shouldn't happen. |
Now you can create a custom Ecto.Association that uses lateral joins:
Motivation
Today I've been playing around with a custom Ecto.Association (as a module behaviour) to solve a performance issue.
The gist of the performance issue is that we've been using a postgres view which has a schema in elixir, and we define a simple has_one association from another schema to that view, to be able to preload this view or sometimes add joins to it.
The problem with this approach is that due to the nature of the view's query, the postgresql planner doesn't filter the rows by the foreign key written in the ON-clause of the join, but enumerates all the records of the view in memory (120k on our machine) and only then applies the filter from the on clause (leaving only one record).
Of course, this takes a lot of time, about 1 seconds for this query vs. about 1ms for an optimized query that uses only index scans.
The optimized query uses
JOIN LATERAL
and applies the filter inside the subquery, so the planner can use indexes on that table.It would be a very simple fix for us, if custom associations could support lateral joins and parent_as, because we'd only have to write the assoc and leave the rest of the source code untouched.
In my opinion, only 2 changes are necessary, and I've already written some code in ecto that shows promising results.
Both changes are in the query planner, when it calls the joins_query callback of Ecto.Association:
First, the planner needs to merge a desired lateral join from the association with the desired join direction from the query. Only left and inner lateral joins are supported, so if the query wants an outer or right join, while the assoc wants a lateral join, we can raise some error.
Secondly, to support referencing a column from the assoc source, there needs to be some way to tell the joins_query callback the index of the assoc source. I solved this by adding a second argument to the callback, for backwards compatibility, the unary joins_query callback will be called when the binary isn't defined. But maybe there's a better solution for this that doesn't need to change the API so much.
I submit this PR after 3 hours of work, before putting in more effort, to see if it would be an acceptable way of solving this problem first.