Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent autoparameterization for selected literals #1209

Open
ethan-readyset opened this issue Apr 4, 2024 · 0 comments
Open

Prevent autoparameterization for selected literals #1209

ethan-readyset opened this issue Apr 4, 2024 · 0 comments
Assignees

Comments

@ethan-readyset
Copy link
Contributor

Description

Currently, our autoparameterization rewrite pass always replaces every literal it can with a parameter. Although this is desirable in many cases, there are some cases where we might want to fine tune which literals are turned into parameters. For example, consider this query:

SELECT t.c1 FROM t JOIN s ON s.c2 = t.c2 WHERE s.c3 = 'some string' AND t.c3 = true

If we know that this query will always be invoked with t.c3 always set to true, we may want to keep that as a literal, which would allow our push filters MIR rewrite pass to move that filter above the join, which has the potential to vastly improve replay performance.

I think there are two parts to this work:

First, we'd need to make updates to the autoparameterization rewrite pass such that it only replaces select literals with parameters. We could invent some new syntax for this, but I think the easiest way to accomplish this would be to skip autoparameterization entirely if a query already has at least one parameter.

Second, we'd need to ensure that incoming queries are routed to the proper cache. Any query coming in will necessarily have all of its literals (or at least the ones in supported positions) converted to parameters, since we'd have no way of knowing which literals to autoparameterize. Consider the following example:

# We have the following table
CREATE TABLE t (x int, y int, z bool);
CREATE TABLE s (x int, y int, z bool);

# We have this cache
SELECT t.x FROM t JOIN s ON t.y = s.y WHERE t.z = true AND s.z = $1

# We receive this query
SELECT t.x FROM t JOIN s ON t.y = s.y WHERE t.z = true AND s.z = true

Upon receiving that query, we'd replace both literals in the where clause with parameters, which (I think) means that it would not match the existing cache in the query status cache (which has only one parameter). We'd need to do something here to ensure that the incoming query is matched to the existing cache.

This also opens up the possibility to incoming queries matching more than one cache. Consider the situation where, in addition to the above, we also have the following cache:

SELECT t.x FROM t JOIN s ON t.y = s.y WHERE t.z = $1 AND s.z = $2

The incoming query SELECT t.x FROM t JOIN s ON t.y = s.y WHERE t.z = true AND s.z = true could match either of these two caches, so we'd need to pick which one to actually use. One strategy would be to try to use the matching cache with the fewest number of parameters. Alternatively, we could disallow such "overlapping" caches entirely, which would prevent this from happening. Offhand, I can't imagine many use cases for overlapping caches like this, so that might be the way to go.

Change in user-visible behavior

Yes

Requires documentation change

Yes

@ethan-readyset ethan-readyset self-assigned this Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant