increase the default chunk size for uploads #775

Closed
tswast opened this issue Jul 15, 2021 · 3 comments
Labels: api: bigquery, type: feature request

tswast commented Jul 15, 2021

See: googleapis/google-resumable-media-python#238

The BigQuery client library currently defaults to 1MB chunks.

_DEFAULT_CHUNKSIZE = 1048576 # 1024 * 1024 B = 1 MB

The bq CLI currently defaults to 100MB chunks (via googleapiclient). https://github.com/googleapis/google-api-python-client/blob/11d78e0fe9290d1a8c516a0e2f2e019dbd1877f9/googleapiclient/http.py#L69

Perhaps this could be the cause of the 10x performance difference for file uploads?
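
To put rough numbers on that hypothesis, here is a minimal sketch of how many chunk requests an upload needs at each default. The 1 GB file size and the `chunk_count` helper are assumptions made for illustration, and the sketch assumes one request per chunk; it does not measure real throughput.

```python
MB = 1024 * 1024


def chunk_count(file_size: int, chunk_size: int) -> int:
    """Return how many chunks are needed to cover file_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Integer ceiling division, avoids float rounding on large sizes.
    return -(-file_size // chunk_size)


# Hypothetical 1 GB file, chosen only for illustration.
file_size = 1024 * MB

for label, size in [("1 MB chunks", 1 * MB), ("100 MB chunks", 100 * MB)]:
    print(f"{label}: {chunk_count(file_size, size)} requests")
```

Under those assumptions the 1 MB default needs 1024 requests where the 100 MB default needs 11, so per-request overhead is paid roughly 100 times more often.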

CC @KevinTydlacka

product-auto-label bot added the api: bigquery label Jul 15, 2021
tswast added the type: feature request label Jul 15, 2021

tswast commented Jul 15, 2021

Looks like the Python GCS client defaults to 40MB, so this might not be the whole story. https://github.com/googleapis/python-storage/blob/c027ccf4279fb05e041754294f10744b7d81beea/google/cloud/storage/fileio.py#L27

tseaver commented Jul 15, 2021

That is the default for the BlobReader / BlobWriter wrappers: the default for the underlying blob methods is 100 MB:

_DEFAULT_CHUNKSIZE = 104857600  # 1024 * 1024 B * 100 = 100 MB
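
If a caller overrides the chunk size instead of relying on these defaults, Cloud Storage resumable uploads expect each chunk except the last to be a multiple of 256 KiB. A minimal sketch of rounding a requested size to a valid value follows; `round_chunk_size` is a hypothetical helper, not part of any of the libraries mentioned here, and whether the same constraint applies to BigQuery load uploads is an assumption.

```python
# 256 KiB, the chunk granularity used by Cloud Storage resumable uploads.
_CHUNK_MULTIPLE = 256 * 1024


def round_chunk_size(requested: int) -> int:
    """Round requested down to a multiple of 256 KiB (hypothetical helper).

    Requests smaller than one multiple are raised to exactly 256 KiB.
    """
    if requested <= 0:
        raise ValueError("requested chunk size must be positive")
    rounded = (requested // _CHUNK_MULTIPLE) * _CHUNK_MULTIPLE
    return max(_CHUNK_MULTIPLE, rounded)


# Both defaults quoted in this thread are already valid multiples.
print(round_chunk_size(1048576))    # 1 MB default
print(round_chunk_size(104857600))  # 100 MB default
```

Rounding down rather than up keeps the result within whatever memory budget the caller had in mind when choosing the size.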

tswast commented Jul 26, 2021

fixed by #799
