You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, our autoparameterization rewrite pass always replaces every literal it can with a parameter. Although this is desirable in many cases, there are some cases where we might want to fine tune which literals are turned into parameters. For example, consider this query:
SELECT t.c1 FROM t JOIN s ON s.c2 = t.c2 WHERE s.c3 = 'some string' AND t.c3 = true
If we know that this query will always be invoked with t.c3 always set to true, we may want to keep that as a literal, which would allow our push filters MIR rewrite pass to move that filter above the join, which has the potential to vastly improve replay performance.
I think there are two parts to this work:
First, we'd need to make updates to the autoparameterization rewrite pass such that it only replaces select literals with parameters. We could invent some new syntax for this, but I think the easiest way to accomplish this would be to skip autoparameterization entirely if a query already has at least one parameter.
Second, we'd need to ensure that incoming queries are routed to the proper cache. Any query coming in will necessarily have all of its literals (or at least the ones in supported positions) converted to parameters, since we'd have no way of knowing which literals to autoparameterize. Consider the following example:
# We have the following tableCREATETABLEt (x int, y int, z bool);
CREATETABLEs (x int, y int, z bool);
# We have this cacheSELECTt.xFROM t JOIN s ONt.y=s.yWHEREt.z= true ANDs.z= $1# We receive this querySELECTt.xFROM t JOIN s ONt.y=s.yWHEREt.z= true ANDs.z= true
Upon receiving that query, we'd replace both literals in the where clause with parameters, which (I think) means that it would not match the existing cache in the query status cache (which has only one parameter). We'd need to do something here to ensure that the incoming query is matched to the existing cache.
This also opens up the possibility to incoming queries matching more than one cache. Consider the situation where, in addition to the above, we also have the following cache:
SELECTt.xFROM t JOIN s ONt.y=s.yWHEREt.z= $1ANDs.z= $2
The incoming query SELECT t.x FROM t JOIN s ON t.y = s.y WHERE t.z = true AND s.z = true could match either of these two caches, so we'd need to pick which one to actually use. One strategy would be to try to use the matching cache with the fewest number of parameters. Alternatively, we could disallow such "overlapping" caches entirely, which would prevent this from happening. Offhand, I can't imagine many use cases for overlapping caches like this, so that might be the way to go.
Change in user-visible behavior
Yes
Requires documentation change
Yes
The text was updated successfully, but these errors were encountered:
Description
Currently, our autoparameterization rewrite pass always replaces every literal it can with a parameter. Although this is desirable in many cases, there are some cases where we might want to fine tune which literals are turned into parameters. For example, consider this query:
SELECT t.c1 FROM t JOIN s ON s.c2 = t.c2 WHERE s.c3 = 'some string' AND t.c3 = true
If we know that this query will always be invoked with
t.c3
always set totrue
, we may want to keep that as a literal, which would allow our push filters MIR rewrite pass to move that filter above the join, which has the potential to vastly improve replay performance.I think there are two parts to this work:
First, we'd need to make updates to the autoparameterization rewrite pass such that it only replaces select literals with parameters. We could invent some new syntax for this, but I think the easiest way to accomplish this would be to skip autoparameterization entirely if a query already has at least one parameter.
Second, we'd need to ensure that incoming queries are routed to the proper cache. Any query coming in will necessarily have all of its literals (or at least the ones in supported positions) converted to parameters, since we'd have no way of knowing which literals to autoparameterize. Consider the following example:
Upon receiving that query, we'd replace both literals in the where clause with parameters, which (I think) means that it would not match the existing cache in the query status cache (which has only one parameter). We'd need to do something here to ensure that the incoming query is matched to the existing cache.
This also opens up the possibility to incoming queries matching more than one cache. Consider the situation where, in addition to the above, we also have the following cache:
The incoming query
SELECT t.x FROM t JOIN s ON t.y = s.y WHERE t.z = true AND s.z = true
could match either of these two caches, so we'd need to pick which one to actually use. One strategy would be to try to use the matching cache with the fewest number of parameters. Alternatively, we could disallow such "overlapping" caches entirely, which would prevent this from happening. Offhand, I can't imagine many use cases for overlapping caches like this, so that might be the way to go.Change in user-visible behavior
Yes
Requires documentation change
Yes
The text was updated successfully, but these errors were encountered: