Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiencing slow leaves call on a model with closure tree #370

Open
cthornbe opened this issue Dec 29, 2020 · 2 comments
Open

Experiencing slow leaves call on a model with closure tree #370

cthornbe opened this issue Dec 29, 2020 · 2 comments

Comments

@cthornbe
Copy link

cthornbe commented Dec 29, 2020

I have a model Product which is using closure tree on an application backed with Postgres. When we attempt to call Product.leaves the returning request is EXTREMELY slow on our production environment. Making a call to the same Product.self_and_descendants is very speedy and is much faster to iterate over self_and_descendants and next if leaf? is false.

The information below is run on my local development environment with significantly less rows.

Here is the database model with the appropriate indexes listed.

class Product < ApplicationRecord
  include Trees
  acts_as_tree with_advisory_lock: false, order: 'sort_order', dependent: :nullify, numeric_order: true
end

# == Schema Information
#
# Table name: products
#
#  id                   :integer          not null, primary key
#  sort_order           :integer
#
# Indexes
#
#  index_products_on_parent_id      (parent_id)
#

When loading a product and doing a Product.leaves.explain this is what is returned:

=> EXPLAIN for: SELECT "products".* FROM "products" INNER JOIN "product_hierarchies" ON "products"."id" = "product_hierarchies"."descendant_id" INNER JOIN ( SELECT ancestor_id FROM "product_hierarchies" GROUP BY ancestor_id HAVING MAX("product_hierarchies".generations) = 0 ) AS leaves ON ("products".id = leaves.ancestor_id) WHERE "product_hierarchies"."ancestor_id" = $1 ORDER BY "product_hierarchies".generations ASC, sort_order [["ancestor_id", 78281]]
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=6390.01..6390.02 rows=3 width=334)
   Sort Key: product_hierarchies.generations, products.sort_order
   ->  Hash Join  (cost=6015.84..6389.98 rows=3 width=334)
         Hash Cond: (product_hierarchies_1.ancestor_id = products.id)
         ->  HashAggregate  (cost=5882.84..6060.99 rows=14252 width=8)
               Group Key: product_hierarchies_1.ancestor_id
               Filter: (max(product_hierarchies_1.generations) = 0)
               ->  Seq Scan on product_hierarchies product_hierarchies_1  (cost=0.00..4475.56 rows=281456 width=8)
         ->  Hash  (cost=132.84..132.84 rows=13 width=338)
               ->  Nested Loop  (cost=0.71..132.84 rows=13 width=338)
                     ->  Index Only Scan using product_anc_desc_idx on product_hierarchies  (cost=0.42..24.68 rows=13 width=8)
                           Index Cond: (ancestor_id = 78281)
                     ->  Index Scan using products_pkey on products  (cost=0.29..8.31 rows=1 width=330)
                           Index Cond: (id = product_hierarchies.descendant_id)
(14 rows)

Looking at the SQL it appears the INNER JOIN on the leaves call is not scoped at all. It is doing a SELECT ancestor_id FROM "product_hierarchies" GROUP BY ancestor_id HAVING MAX("product_hierarchies".generations) = 0.

Which results in searching the entire table in the explain.

-> Seq Scan on product_hierarchies product_hierarchies_1 (cost=0.00..4475.56 rows=281456 width=8)

To compare, the number of rows being evaluated on my production environment are 42,149,108 instead of 281,456.

Is anyone else having any similar issues? My thought is there should be a way to tweak the sub-select to make it a little more performant either by scoping to somewhere else in the tree or potentially able to utilize an index.

Any help would be greatly appreciated.

@cthornbe
Copy link
Author

cthornbe commented Jan 4, 2021

We ended up overriding the leaves call on our Product model and used the following to resolve some of the slowness. Anecdotally, we were running at appropriately 35+ seconds down to 23 milliseconds utilizing the optimized SQL below.

Does it make sense to look at implementing scoping similar to this in the gem?

  def leaves
    s = self_and_descendants.joins(<<-SQL.squish)
          INNER JOIN (
            SELECT ancestor_id
            FROM #{_ct.quoted_hierarchy_table_name}
            WHERE ancestor_id IN (
              SELECT id
                FROM "products"
                  INNER JOIN "product_hierarchies" ON "products"."id" = "product_hierarchies"."descendant_id"
                WHERE "product_hierarchies"."ancestor_id" = #{id} AND ("products"."id" != #{id})
                ORDER BY "product_hierarchies".generations ASC, sort_order
            )
            GROUP BY ancestor_id
            HAVING MAX(#{_ct.quoted_hierarchy_table_name}.generations) = 0
          ) #{_ct.t_alias_keyword} leaves ON (#{_ct.quoted_table_name}.id = leaves.ancestor_id)
    SQL
    _ct.scope_with_order(s.readonly(false))
  end

@cthornbe
Copy link
Author

cthornbe commented Jan 4, 2021

@garrinmf ^^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant