Enhance BQ Connector to Pushdown Predicates while Materializing View #10523
Comments
I believe the reason for the current impl is that the view materialization cache is keyed on the generated SELECT SQL. Once we start pushing down predicates, the materialized cache will probably not be used at all unless the values passed in the predicates match exactly. However, the current behaviour is also suboptimal: as long as the total data returned by the various queries on the view is less than the size of the entire view, the proposed behaviour would be cheaper.

I believe the only "perfect" solution would be for the Storage Read API to support reading views. cc: @davidrabinowitz any information on whether that's something that can be done?

cc: @ebyhr Do you have more ideas? Maybe we can bypass the Storage Read API for views and use the BigQuery client directly, but that's a very big lift since we'd need to implement something to convert the Avro data returned by the client and plug it into the page source.
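To make the cache-key point concrete, here is a sketch of how the key diverges once predicates are pushed down (the view and column names are made up for illustration):

    -- Today the materialization cache is keyed on the generated SELECT SQL.
    -- Without pushdown, every query over the view generates the same SQL,
    -- so they all share one materialization:
    --   key: SELECT col1, id FROM someView
    -- With pushdown, each distinct predicate value produces a distinct key,
    -- defeating the cache unless the values match exactly:
    SELECT col1 FROM someView WHERE id = 1;  -- key: ... FROM someView WHERE id = 1
    SELECT col1 FROM someView WHERE id = 2;  -- key: ... FROM someView WHERE id = 2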
I agree, but there is always a chance of serving stale data from the cache if the underlying data keeps changing, and if we keep the TTL too low, we might end up refreshing the cache all the time. As far as I know, the Google BQ team is already working on adding support for reading simple views through the Storage Read API, but there won't be any support for complex views.
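For reference, the lifetime of the materialization cache is configurable per catalog; a minimal example (to the best of my knowledge these are the documented property names for the Trino BigQuery connector, with 15m assumed as the default TTL):

    # etc/catalog/bigquery.properties
    bigquery.views-enabled=true
    bigquery.views-cache-ttl=15m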
@hashhar If the view materialization cache key is the main issue, can we modify the logic to key the cache on the "original" query instead of the "optimized" (i.e. predicate-pushed-down) query? Or is that too complicated a code change?
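A sketch of that idea in terms of the SQL involved (hypothetical behaviour, not what the connector does today): key the cache on the unfiltered view SQL so every query shares one materialization, and apply each query's predicate only when reading the temp table:

    -- Shared cache key for all queries over the view:
    --   SELECT col1, id FROM someView
    -- Materialization job, executed once per TTL window:
    CREATE TABLE tmp_dataset.someView_mv AS SELECT col1, id FROM someView;
    -- Each client query then reads the temp table with its own predicate:
    SELECT col1 FROM tmp_dataset.someView_mv WHERE id = 1;
    SELECT col1 FROM tmp_dataset.someView_mv WHERE id = 2;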
@ayushbilala Are you still working on this? We added support for the query table function.
I tried the query function; this is my query SQL:

    SELECT * FROM TABLE(
      bigquery.system.query(
        query => 'SELECT block_timestamp, transaction_hash
                  FROM `my.table`
                  where block_timestamp >= "2022-10-09"
                  LIMIT 1000'
      )
    );

and it ran with an error:
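One quoting note on calls like the one above, separate from whatever the elided error was: since the inner query is passed as a varchar literal, any single quotes inside it must be doubled. A variant of the same call (same table name as in the comment above):

    SELECT * FROM TABLE(
      bigquery.system.query(
        query => 'SELECT block_timestamp, transaction_hash
                  FROM `my.table`
                  where block_timestamp >= ''2022-10-09''
                  LIMIT 1000'
      )
    );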
@myfjdthink See #12346
Let me close this issue because the query table function covers this use case.
I also encountered this error; how can it be resolved?

    2023-09-19T10:15:37.275Z DEBUG stage-scheduler io.trino.execution.QueryStateMachine Query 20230919_101533_00021_cdqm7 failed
The BQ connector in its current state does not make use of predicates during view materialization. So if the view is 10 TB in size, the connector will try to materialize all 10 TB of data even when the SQL query submitted by the client contains filters.
Example:
    SELECT col1, col2, col3 from someHugeBQTable where col1='someVal1';
As part of the materialization step, the BQ connector will submit a BQ job with the query below, without the where clause (SQL query creation happens here):

    SELECT col1, col2, col3 from someHugeBQTable;

This materializes the entire table, adding substantially to both query time and cost.
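As the comments above note, one way to avoid materializing the full view in the meantime is the query table function, which hands the filter to BigQuery to evaluate directly; a sketch using the example table above (catalog name assumed to be bigquery):

    SELECT * FROM TABLE(
      bigquery.system.query(
        query => 'SELECT col1, col2, col3 FROM someHugeBQTable WHERE col1 = ''someVal1'''
      )
    );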