refactor to use more logic from google-cloud-bigquery #339

Closed
1 of 2 tasks
tswast opened this issue Nov 6, 2020 · 2 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. type: process A process-related concern. May include testing, release, or the like.

Comments

@tswast
Collaborator

tswast commented Nov 6, 2020

The BigQuery backend has improved time-to-first-byte performance for query results. I'm working on implementing the corresponding client changes in googleapis/python-bigquery#362.

pandas-gbq will not be able to take advantage of many of these performance improvements unless it uses the return value from QueryJob.result() or even QueryJob.to_dataframe() directly.

Related: #149

@tswast tswast added type: process A process-related concern. May include testing, release, or the like. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: process A process-related concern. May include testing, release, or the like. labels Nov 9, 2020
@tswast
Collaborator Author

tswast commented Dec 17, 2020

Even if we're not using the jobs.query method, I think it still makes sense to reuse much of the code that was recently added to google-cloud-bigquery.

read_gbq: Using QueryJob.to_dataframe() directly should be possible in every case except when max_results is set, now that progress bar support has been added in googleapis/python-bigquery#343. After googleapis/python-bigquery#296 is implemented, it can be used in that case too.
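A minimal sketch of what that delegation could look like, not pandas-gbq's actual implementation. The helper name `read_query` and its parameters are hypothetical; `client` is assumed to behave like a `google.cloud.bigquery.Client`, so the calls used are `Client.query()`, `QueryJob.to_dataframe()`, and `QueryJob.result(max_results=...)` followed by `RowIterator.to_dataframe()`.

```python
def read_query(client, sql, max_results=None, progress_bar_type=None):
    """Run `sql` and return a DataFrame, delegating to google-cloud-bigquery.

    Hypothetical helper: `client` is assumed to be a
    google.cloud.bigquery.Client (or an object with the same methods).
    """
    job = client.query(sql)

    if max_results is None:
        # QueryJob.to_dataframe() waits for the job and downloads every row,
        # optionally showing a progress bar.
        return job.to_dataframe(progress_bar_type=progress_bar_type)

    # With a row limit, go through QueryJob.result(max_results=...) and
    # convert the returned row iterator instead.
    rows = job.result(max_results=max_results)
    return rows.to_dataframe(progress_bar_type=progress_bar_type)
```

With a real client this would be called as `read_query(bigquery.Client(), "SELECT 1 AS x")`. Keeping the `max_results` branch separate mirrors the exception described above.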

to_gbq: Using Client.load_table_from_dataframe() (and, for streaming support, Client.insert_rows_from_dataframe(), see #300) should now be possible, because google-cloud-bigquery added CSV serialization as an option in googleapis/python-bigquery#383.
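A rough sketch of the upload side under the same assumptions, again not pandas-gbq's actual code. The helper name `write_dataframe` and the `streaming` flag are hypothetical; `client` is assumed to behave like a `google.cloud.bigquery.Client`, so the calls used are `Client.load_table_from_dataframe()`, `Client.get_table()`, and `Client.insert_rows_from_dataframe()`.

```python
def write_dataframe(client, dataframe, destination, job_config=None, streaming=False):
    """Upload `dataframe` to `destination`, delegating to google-cloud-bigquery.

    Hypothetical helper: `client` is assumed to be a
    google.cloud.bigquery.Client (or an object with the same methods).
    """
    if streaming:
        # The streaming API needs the table schema, so fetch the table first.
        table = client.get_table(destination)
        # insert_rows_from_dataframe() returns one list of errors per chunk.
        chunk_errors = client.insert_rows_from_dataframe(table, dataframe)
        errors = [err for chunk in chunk_errors for err in chunk]
        if errors:
            raise RuntimeError(f"streaming insert failed: {errors}")
        return None

    # Load job path: start the job, then block until it finishes.
    load_job = client.load_table_from_dataframe(
        dataframe, destination, job_config=job_config
    )
    return load_job.result()
```

To request CSV serialization, a caller would presumably pass something like `bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.CSV)` as `job_config`; whether that is the right default for pandas-gbq is not settled in this thread.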

@tswast tswast changed the title from "Take advantage of query performance optimations" to "refactor to use more logic from google-cloud-bigquery" Dec 17, 2020
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Jul 17, 2021
@tswast tswast added type: process A process-related concern. May include testing, release, or the like. and removed type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Nov 19, 2021
@tswast
Collaborator Author

tswast commented Jan 19, 2022

Closing as a duplicate of #118

@tswast tswast closed this as completed Jan 19, 2022