
Determine which indices are needed for efficient execution of a query #3343

Open
mitar opened this issue Apr 17, 2024 · 5 comments
Labels
enhancement (New feature or request) · triage (New issues that haven't been reviewed)

Comments


mitar commented Apr 17, 2024

What do you want to change?

Currently, SQL databases generally operate in a mode where every query is possible: they create an execution plan at runtime, based on data statistics and available indices. There are prepared statements you can use, but they are generally still per-connection.

But in many cases, the set of queries an app uses is limited. sqlc recognizes this and takes advantage of it to build client-side type-safe code to interface with SQL. In a way, it compiles SQL into target code (e.g., Go) and abstracts SQL away.

I would like to suggest that sqlc could go a step further. At compilation time it could use data statistics to a) optimize the query itself and b) create a SQL migration to add/remove the indices needed to execute those queries efficiently.

I know that doing this would be a huge effort, but I think it would also be very powerful. This way, a version of my app would be bundled not just with (compiled and optimized) queries, but also with the set of indices (and the data schema) those queries need for performant execution.

One approach would be for sqlc, at compilation time, to connect to a database with data, install hypothetical indices, EXPLAIN all available queries (perhaps with test parameters provided by the developer), determine which indices are needed, and then also create a migration to install (or optionally remove) those indices.
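To make this concrete: PostgreSQL's hypopg extension can register index definitions as metadata only, and plain EXPLAIN (without ANALYZE) will consider them when planning. A minimal sketch of the probing step, assuming hypopg is installed; the DSN, table, and index definition here are made up:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/jackc/pgx/v5/stdlib" // assumed driver; any PostgreSQL driver works
)

func main() {
	// Placeholder DSN for a test database loaded with realistic data.
	db, err := sql.Open("pgx", "postgres://localhost:5432/app_test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Register a hypothetical index: hypopg stores only its metadata,
	// so nothing is actually built on disk.
	if _, err := db.Exec(`SELECT hypopg_create_index('CREATE INDEX ON authors (id)')`); err != nil {
		log.Fatal(err)
	}

	// Plain EXPLAIN (without ANALYZE) takes hypothetical indexes into account.
	rows, err := db.Query(`EXPLAIN SELECT * FROM authors WHERE id = 42`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var line string
		if err := rows.Scan(&line); err != nil {
			log.Fatal(err)
		}
		// An "Index Scan using <...>" line naming the hypothetical index
		// means it would be used and is worth creating for real.
		fmt.Println(line)
	}
}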

What database engines need to be changed?

PostgreSQL

What programming language backends need to be changed?

Go

mitar added the enhancement and triage labels on Apr 17, 2024

mitar commented Apr 17, 2024

There seems to be something similar already: https://github.com/ankane/dexter. But it is not built into a compilation step.


lyda commented Apr 25, 2024

Can I suggest a slightly different process?

How about generating a second function for each SELECT that does an EXPLAIN? So for example:

-- name: GetAuthor :one
SELECT * FROM authors
WHERE id = $1 LIMIT 1;

would generate a GetAuthor function but also an ExplainGetAuthor function that returns the EXPLAIN results for that query. You could then add a test program that runs all those Explain statements and reports the results in your build process, whenever that is appropriate. I can imagine that folks won't always have DB access at compile time, but a build pipeline would/could.
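To illustrate the shape this could take: a hypothetical ExplainGetAuthor, sketched in the style of sqlc's generated Go code. This is not real sqlc output today; the Queries struct is simplified to keep the sketch self-contained:

package db

import (
	"context"
	"database/sql"
)

// Queries mirrors sqlc's generated struct; in real generated code the db
// field is the DBTX interface, but *sql.DB keeps this sketch standalone.
type Queries struct {
	db *sql.DB
}

const explainGetAuthor = `EXPLAIN (FORMAT JSON)
SELECT * FROM authors
WHERE id = $1 LIMIT 1`

// ExplainGetAuthor runs the same statement as GetAuthor under EXPLAIN
// and returns the planner's JSON plan, which PostgreSQL emits as a
// single row. The parameter is bound as usual.
func (q *Queries) ExplainGetAuthor(ctx context.Context, id int64) (string, error) {
	var plan string
	err := q.db.QueryRowContext(ctx, explainGetAuthor, id).Scan(&plan)
	return plan, err
}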

In addition, this would allow people to track changes over time, decide which queries need to be optimal, etc.


mitar commented Apr 27, 2024

@lyda An interesting proposal. In a way it makes sense: it is like the benchmarks you would have in a test suite, where tests fail if performance drops below expectations.

It would mean that developers would have to add indices manually, and then during testing you could check:

  • Are there indices which are needed but do not exist?
  • Are there indices which exist but are never used? (See the sketch after this list.)
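For the second check, PostgreSQL's cumulative statistics already track index usage. A rough sketch; pg_stat_user_indexes and its idx_scan column are standard, the function name is made up:

package dbtest

import (
	"database/sql"
	"fmt"
)

// reportUnusedIndexes lists indices that have never been used for an
// index scan since the statistics were last reset. idx_scan in
// pg_stat_user_indexes counts how often each index was scanned.
func reportUnusedIndexes(db *sql.DB) error {
	rows, err := db.Query(`
		SELECT relname, indexrelname
		FROM pg_stat_user_indexes
		WHERE idx_scan = 0`)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var table, index string
		if err := rows.Scan(&table, &index); err != nil {
			return err
		}
		fmt.Printf("never used: index %s on table %s\n", index, table)
	}
	return rows.Err()
}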

This could also be something similar to "coverage": do all your tests, taken together, cover the indices you have, and are there indices they could cover but which are missing? So you would run the whole test suite and check that (instead of having explicit tests to validate indices; I worry that the calls you would make there would just repeat the calls you make during regular tests).

Which brings me to the final point:

This might best live outside of a particular program/test suite/sqlc. You have test cases to test your code (which can include running against a realistic dump of data). Then you:

  • Create a database instance and instrument it with extensions to record all queries made (see the sketch at the end of this comment).
  • Run your test suite.
  • Check all the queries: were there slow queries, were there queries which used sequential scans, are there indices no query ever used?
  • Repeat, this time creating all possible hypothetical indices.
  • Check whether any query would use any of the hypothetical indices.

sqlc would then only have to support some hook to provide information about the schema, from which all the hypothetical indices could be generated. But one could probably also find a way to parse the schema from SQL without sqlc to figure that out.
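The "instrument and inspect" steps could be as simple as the sketch below, assuming the pg_stat_statements extension is loaded (it must be listed in shared_preload_libraries); the function name and the LIMIT are arbitrary:

package dbtest

import (
	"database/sql"
	"fmt"
)

// slowestQueries reports what a test run actually executed, using the
// pg_stat_statements extension (mean_exec_time exists in PostgreSQL 13+).
// Run SELECT pg_stat_statements_reset() before the test suite to start clean.
func slowestQueries(db *sql.DB) error {
	rows, err := db.Query(`
		SELECT query, calls, mean_exec_time
		FROM pg_stat_statements
		ORDER BY mean_exec_time DESC
		LIMIT 20`)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var (
			query  string
			calls  int64
			meanMs float64
		)
		if err := rows.Scan(&query, &calls, &meanMs); err != nil {
			return err
		}
		fmt.Printf("%9.3f ms avg  %6d calls  %s\n", meanMs, calls, query)
	}
	return rows.Err()
}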


lyda commented Apr 30, 2024

Exactly. sqlc will provide the Explain* functions but then it's up to you to run one or more of them in whatever environment you see fit. I think the real win here is if we can parse the response to EXPLAIN and provide meaningful feedback to the developer.
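For PostgreSQL at least, that parsing is straightforward, since EXPLAIN (FORMAT JSON) returns a regular tree. A sketch that flags sequential scans; the field names ("Plan", "Node Type", "Plans", "Relation Name") come from PostgreSQL's JSON explain output, so CockroachDB would need its own variant:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// planNode models the subset of PostgreSQL's EXPLAIN (FORMAT JSON)
// output needed to walk the plan tree.
type planNode struct {
	NodeType     string     `json:"Node Type"`
	RelationName string     `json:"Relation Name"`
	Plans        []planNode `json:"Plans"`
}

// seqScans collects relations that the plan reads with a sequential scan.
func seqScans(n planNode, out []string) []string {
	if n.NodeType == "Seq Scan" {
		out = append(out, n.RelationName)
	}
	for _, child := range n.Plans {
		out = seqScans(child, out)
	}
	return out
}

func main() {
	// In practice this string would come from a generated Explain* function.
	plan := `[{"Plan": {"Node Type": "Seq Scan", "Relation Name": "authors"}}]`

	var parsed []struct {
		Plan planNode `json:"Plan"`
	}
	if err := json.Unmarshal([]byte(plan), &parsed); err != nil {
		log.Fatal(err)
	}
	for _, rel := range seqScans(parsed[0].Plan, nil) {
		fmt.Printf("feedback: %q is read with a sequential scan; consider an index\n", rel)
	}
}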

I might try and do it for cockroachdb if I get some time. Bank holiday coming up in a few weeks.


lyda commented Apr 30, 2024

Note: This could even be run in production systems to track how certain statements are being optimised over time.
