Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved polymorphic and STI index suggestions #109

Open
Epigene opened this issue Jun 20, 2019 · 3 comments
Open

Improved polymorphic and STI index suggestions #109

Epigene opened this issue Jun 20, 2019 · 3 comments
Assignees

Comments

@Epigene
Copy link

Epigene commented Jun 20, 2019

Hi!

I am using v2.1.8 and polymorphic indexes are still suggested despite them already being present in different order, see #107

Furthermore, I believe the suggestion should be in the format [:poly_type, :poly_id]. This is somewhat of a controversial topic, but the reasoning for putting the type first is that it is more useful:

Pros:

  • Index can be used for searches only by type (in case we want a count of how many records by types there are)

Cons:

  • Index can not be used to search by just :poly_id, but this should be a non-problem since it can be argued that searching by just polymorphic ID does not guarantee that the correct record is selected.

Additionally, there are false positives when using Rails' STI pattern. If a table has a :type column loldba proposes an [:id, :type] index, which is redundant, especially if an index on just :type is present (and :id is indexed by default).

@plentz plentz self-assigned this Jul 1, 2019
@plentz
Copy link
Owner

plentz commented Mar 29, 2020

Sorry for taking so long to reply @Epigene. I just merged #107 which fixes the order problem.

About the ordering of the index itself, can you provide some references for it? I'm pretty sure that modern databases can use only to search by type even if it's on the second position of the index.

@Epigene
Copy link
Author

Epigene commented Jul 29, 2020

Sorry for taking so long to get back to you on this topic, @plentz. I believe the order of index is as relevant as ever and will probably never be an issue of DB engine modernity.

MySQL docs state:

The name index is an index over the last_name and first_name columns. The index can be used for lookups in queries that specify values in a known range for combinations of last_name and first_name values. It can also be used for queries that specify just a last_name value because that column is a leftmost prefix of the index (as described later in this section). Therefore, the name index is used for lookups in the following queries[...]

Similarly, PSQL docs state:

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.

This is easy to see using the [:last_name, :first_name] index example and these four records, in order:

Doe, John
Doe, Ziggy
Example, Fred
Example, John

A query WHERE first_name = 'John' can not (efficiently) use the index because to get all Johns the DB engine must read through all last names anyway.

@Epigene
Copy link
Author

Epigene commented Jul 29, 2020

To be clear, I'm arguing that an index of order [:poly_type, :poly_id] is better than [:poly_id, :poly_type] because it supports an efficient WHERE poly_type = 'MyModel', which is a common use-case, whereas [:poly_id, :poly_type] index supports an efficient WHERE poly_id = '1', which is unlikely to be a use-case.

Both indexes fare equally well when queries have both terms like WHERE poly_type = 'MyModel' AND poly_id = '1'.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants