Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] Support lateral joins and parent_as in association's joins_query #4401

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

narrowtux
Copy link
Contributor

@narrowtux narrowtux commented Apr 10, 2024

Now you can create a custom Ecto.Association that uses lateral joins:

defmodule MyAssoc do
  @behaviour Ecto.Association

  def joins_query(%{owner: owner, owner_key: owner_key, related: related} = assoc, join_parent) do
    import Ecto.Query
    subquery = from r in related,
      where: field(r, ^owner_key) == field(parent_as(^join_parent), :id),
      join: p in assoc(r, :profile),
      select: %{
        data: fragment("jsonb_object_agg(?, ?)", p.slug, r.data)
      },
      group_by: field(r, ^owner_key)

    from o in owner, inner_lateral_join: ^subquery(subquery), on: true
  end

  # ...
end

Motivation

Today I've been playing around with a custom Ecto.Association (as a module behaviour) to solve a performance issue.

The gist of the performance issue is that we've been using a postgres view which has a schema in elixir, and we define a simple has_one association from another schema to that view, to be able to preload this view or sometimes add joins to it.
The problem with this approach is that due to the nature of the view's query, the postgresql planner doesn't filter the rows by the foreign key written in the ON-clause of the join, but enumerates all the records of the view in memory (120k on our machine) and only then applies the filter from the on clause (leaving only one record).
Of course, this takes a lot of time, about 1 seconds for this query vs. about 1ms for an optimized query that uses only index scans.

The optimized query uses JOIN LATERAL and applies the filter inside the subquery, so the planner can use indexes on that table.

It would be a very simple fix for us, if custom associations could support lateral joins and parent_as, because we'd only have to write the assoc and leave the rest of the source code untouched.

In my opinion, only 2 changes are necessary, and I've already written some code in ecto that shows promising results.

Both changes are in the query planner, when it calls the joins_query callback of Ecto.Association:

First, the planner needs to merge a desired lateral join from the association with the desired join direction from the query. Only left and inner lateral joins are supported, so if the query wants an outer or right join, while the assoc wants a lateral join, we can raise some error.

Secondly, to support referencing a column from the assoc source, there needs to be some way to tell the joins_query callback the index of the assoc source. I solved this by adding a second argument to the callback, for backwards compatibility, the unary joins_query callback will be called when the binary isn't defined. But maybe there's a better solution for this that doesn't need to change the API so much.

I submit this PR after 3 hours of work, before putting in more effort, to see if it would be an acceptable way of solving this problem first.

@greg-rychlewski
Copy link
Member

Without having too much time to look at it right now, my feeling is we can probably support something like this but in a simpler way.

Since we already generate queries for preloaded associations, we should be able to do wrap it in a subquery when we detect a lateral join. And I believe we already have the binding since this will have to be a single query preload so we should be able to use that in parent_as.

@narrowtux
Copy link
Contributor Author

For preloads I'm not so worried about it, for me it's mainly about the join: x in assoc(y, :z) syntax, that we use all over the codebase for this association to the view.

@josevalim
Copy link
Member

So the idea is that, if someone does lateral_join: x in assoc(x, ...), we automatically write it as lateral join? Or what if we introduce a lateral_assoc helper?

@narrowtux
Copy link
Contributor Author

narrowtux commented Apr 11, 2024

The usage of my custom assoc works like this:

in the schema:

defmodule Book do
  import Book.CommentStatsAssoc 

  schema "book" do
    comment_stats :comment_stats    
  end 
end

and in the query:

from b in Book, join: s in assoc(b, :comment_stats)

then the query would be planned like this:

from b in Book, 
  as: :book, #alias is actually a ref added by the planner
  inner_lateral_join: s in subquery(
    from c in Comment, 
      where: c.book_id == parent_as(:book).id, 
      select: %{count: count(c.id)}
  ),
  on: true

For me, it would be nice if I didn't have to write inner_lateral_join in the query, since it's deeply embedded in our codebase already. However, I can see how having your join "upgraded" to a lateral join could also be confusing for developers.

To be able to define a lateral joined association directly in the schema would be great too!

@narrowtux
Copy link
Contributor Author

narrowtux commented Apr 11, 2024

I also thought about the problem with the parent_as, maybe instead of adding an alias to the parent table all the time, the planner could transform an alias found in the query returned by joins_query into an alias from the assoc parent ix to the same name.

def joins_query(assoc) do
  from b in Book, as: :__lateral_assoc_book__,
    inner_lateral_join: s in subquery(
      from c in Comment, 
        where: c.book_id == parent_as(:__lateral_assoc_book__).id, 
        select: %{count: count(c.id)}
    )
end

This is what I initially tried to write, but then I saw that the planner only takes the joins from the returned query and discards anything else. If the aliases were transformed as well, it could work (also if the lateral join references another join instead of the root table).

Then we don't need to introduce another callback with more arity.

@josevalim
Copy link
Member

The issue is that the Ecto.Association API is private, we use it internally, but it isn't public. So I am worried about adding new callbacks that ourselves don't use, it would even be hard to test them. So I would prefer to surface the feature using proper APIs with some docs. :)

@narrowtux
Copy link
Contributor Author

narrowtux commented Apr 11, 2024

Yeah adding a new callback isn't my favorite solution either. I think transforming the aliases is a much nicer and more readable solution.

I can also look into how a lateral helper could work which uses the new capabilities internally.

schema do
  lateral_one :comment_stats do
    from b in Book, 
      as: :book,
      inner_lateral_join: s in subquery(
        from c in Comment, 
          where: c.book_id == parent_as(:book).id, 
          select: %{count: count(c.id)}
      ),
      on: true
  end
end

and a similar lateral_many macro as well.

@greg-rychlewski
Copy link
Member

It feels off to me to force the join type in the schema. It's so common to want to use the same table for many different joins that I think people will almost always have to duplicate the association just to do what they want.

@narrowtux
Copy link
Contributor Author

As soon as you have this association that uses a subquery with a filter that references some column from other queries, you can't use anything but lateral joins in SQL anyway.

It should be clear to users that a lateral association can only be used with lateral joins.

@josevalim
Copy link
Member

Right, which is why I proposed lateral_assoc(...) (which we can validate as part of a lateral join). :)

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Apr 11, 2024

For example:

posts has an association on comments

sometimes I want the association to be joined in an inner join

sometimes I want the association to be joined in a lateral join

If I define it in the schema then I need 2 associations. If the join is specified in the query then I only need 1 association and I can have a helper that wraps it in a subquery and uses parent_as when I want the lateral join.

@narrowtux
Copy link
Contributor Author

sometimes I want the association to be joined in an inner join

How would you join a subquery that can only be used in a lateral join with an inner join? I feel like we're not on the same page.

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Apr 11, 2024

Something you might not be aware of is that today Ecto will store has a default query for each association. It's how separate query preloads work when you do something like this

from p in Post, preload: :comments

So what I am saying is that there is a helper that lets you leverage this query when you want to use the association in a lateral join. It should wrap it in a subquery and deal with parent_as.

I think you are very fixed on the subquery being part of the association definition. That is exactly what I am saying shouldn't happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants