
Enhance BQ Connector to Pushdown Predicates while Materializing View #10523

Closed
ayushbilala opened this issue on Jan 10, 2022 · 9 comments
Labels: enhancement (New feature or request)

@ayushbilala (Contributor)

The BQ connector in its current state does not make use of predicates during view materialization. So, if the table is 10 TB in size, the connector will try to materialize all 10 TB of data even when the SQL query submitted by the client contains filters.

Example:
SELECT col1, col2, col3 from someHugeBQTable where col1='someVal1';

As part of the materialization step, the BQ connector will submit a BQ job with the query below, without the WHERE clause (the SQL query creation happens here).

SELECT col1, col2, col3 from someHugeBQTable;

This materializes the entire table, which adds significantly to both query time and cost.
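For comparison, a predicate-aware materialization step could submit a query like the one below instead. This is only an illustrative sketch based on the example above; the exact SQL the connector generates would depend on how the pushed-down constraint is translated.

SELECT col1, col2, col3
FROM someHugeBQTable
WHERE col1 = 'someVal1';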

ayushbilala changed the title from "Enhance BQ Connector to Pushdown Predicates while Materialising View" to "Enhance BQ Connector to Pushdown Predicates while Materializing View" on Jan 10, 2022
ayushbilala self-assigned this on Jan 10, 2022
@hashhar (Member) commented on Jan 10, 2022

I believe the reason for the current implementation is that the view materialization cache is keyed on the generated SELECT SQL. Once we start pushing down predicates, the materialised cache will probably not be used at all unless the values passed to the predicates match exactly.
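To illustrate (a sketch using the example from this issue): two client queries that differ only in the predicate value would generate two different materialization statements, and therefore two different cache keys, so the second query could not reuse the first materialization.

-- generated for WHERE col1 = 'someVal1'
SELECT col1, col2, col3 FROM someHugeBQTable WHERE col1 = 'someVal1';

-- generated for WHERE col1 = 'someVal2' (different key, cache miss)
SELECT col1, col2, col3 FROM someHugeBQTable WHERE col1 = 'someVal2';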

However, the current behaviour is also suboptimal. As long as the sum of the data returned by the various queries against the view is less than the size of the entire view, the proposed behaviour might be cheaper. For example, three queries that each return 100 GB from a 10 TB view would materialize 300 GB in total instead of 10 TB.

I believe the only "perfect" solution to this would be for the Storage Read API to support reading views. cc @davidrabinowitz: any information on whether that's something that can be done?

cc: @ebyhr Do you have more ideas? Maybe we can bypass the Storage Read API for views and use the BigQuery client directly. But it's a very big lift since we'd need to implement something to convert the Avro data returned by the client and plug it into the page source.

hashhar added the enhancement (New feature or request) label on Jan 10, 2022
@ayushbilala (Contributor, Author)

I agree, but there is always a chance of serving stale data from the cache to the client if the underlying data keeps changing. If we keep the TTL too low, we might end up refreshing the cache all the time.

As far as I know, the Google BQ team is already working on adding support for reading simple views through the Storage Read API, but there won't be any support for complex views.

@cozos commented on Jan 12, 2022

@hashhar If the view materialization cache key is the main issue, can we modify the logic to read the cache based on the "original" query instead of the "optimized" (i.e. predicate-pushed-down) query? Is that too complicated a code change?

@ebyhr (Member) commented on Sep 28, 2022

@ayushbilala Are you still working on this? We added support for the query pass-through function and a query-based reader for views. Are they insufficient for your use case?

@myfjdthink

I tried the query function; this is my SQL query:

SELECT * FROM TABLE(
    bigquery.system.query(
      query => 'SELECT block_timestamp, transaction_hash
FROM `my.table` 
where block_timestamp >= "2022-10-09"
LIMIT 1000'
    )
  );

and it failed with the following error:

Error executing query:
SQL Error [65536]: Query failed (#20221010_085346_00084_9haf5): Unhandled type for LongTimestampWithTimeZone: timestamp(6) with time zone

cc @ebyhr

@ebyhr (Member) commented on Oct 10, 2022

@myfjdthink See #12346

@ebyhr (Member) commented on Dec 7, 2022

Let me close this issue because the query pass-through function and the query-based reader for views are sufficient in my opinion. Please feel free to reopen if it's still required.

ebyhr closed this as not planned on Dec 7, 2022
@cw1427 commented on Sep 20, 2023

I also encountered this error. How can it be resolved?

2023-09-19T10:15:37.275Z DEBUG stage-scheduler io.trino.execution.QueryStateMachine Query 20230919_101533_00021_cdqm7 failed
io.trino.spi.TrinoException: Unhandled type for LongTimestampWithTimeZone: timestamp(6) with time zone
at io.trino.plugin.bigquery.BigQueryQueryPageSource.appendTo(BigQueryQueryPageSource.java:215)
at io.trino.plugin.bigquery.BigQueryQueryPageSource.getNextPage(BigQueryQueryPageSource.java:148)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:299)
at io.trino.operator.Driver.processInternal(Driver.java:395)
at io.trino.operator.Driver.lambda$process$8(Driver.java:298)
at io.trino.operator.Driver.tryWithLock(Driver.java:694)
at io.trino.operator.Driver.process(Driver.java:290)
at io.trino.operator.Driver.processForDuration(Driver.java:261)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:887)
at io.trino.execution.executor.timesharing.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
at io.trino.execution.executor.timesharing.TimeSharingTaskExecutor$TaskRunner.run(TimeSharingTaskExecutor.java:565)
at io.trino.$gen.Trino_424____20230919_085020_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)

@ebyhr (Member) commented on Sep 20, 2023

@cw1427 The type is unsupported. See #12346. You need to convert to other types in the query table function or in BigQuery views.
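For example, one possible workaround (a sketch assuming a string representation of the timestamp is acceptable for your use case) is to cast the TIMESTAMP column to a supported type inside the inner BigQuery query:

SELECT * FROM TABLE(
    bigquery.system.query(
      query => 'SELECT CAST(block_timestamp AS STRING) AS block_timestamp, transaction_hash
FROM `my.table`
WHERE block_timestamp >= "2022-10-09"
LIMIT 1000'
    )
  );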
