Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove num_retries parameter from load_table_from_*() methods #1071

Open
plamut opened this issue Nov 18, 2021 · 12 comments
Open

Remove num_retries parameter from load_table_from_*() methods #1071

plamut opened this issue Nov 18, 2021 · 12 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p3 Desirable enhancement or fix. May not be included in next release. semver: major Hint for users that this is an API breaking change. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@plamut
Copy link
Contributor

plamut commented Nov 18, 2021

When dealing with missing timeout for resumable upload requests, I noticed several comments that the num_retries parameter is deprecated and is planned to be removed. If this is still the case, we can use the opportunity and remove it in v3.

Examples:

@plamut plamut added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Nov 18, 2021
@plamut plamut self-assigned this Nov 18, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Nov 18, 2021
@plamut plamut added the semver: major Hint for users that this is an API breaking change. label Nov 18, 2021
@plamut
Copy link
Contributor Author

plamut commented Nov 18, 2021

@tswast Those comments were added at least two years ago, and we haven't actually been issuing any deprecation warnings.

Are the comments still valid?

@tswast
Copy link
Contributor

tswast commented Nov 18, 2021

Looks like they were there even 4 years ago. googleapis/google-cloud-python#4136

Edit: even before that: googleapis/google-cloud-python#3555

Edit: I think we actually inherited it from GCS: googleapis/google-cloud-python#3378

Edit: This was the PR with original deprecation/removal: googleapis/google-cloud-python#3362

@tswast
Copy link
Contributor

tswast commented Nov 18, 2021

I imagine the intent was to deprecate it in favor of a retry object? Doesn't look like we actually did that, though.

Maybe in v3 we add the retry logic -- potentially similar to job_retry in query, seeing as this involves starting the stream from the beginning, which is slightly different from the per-request retry object?

If we get that working, then I think in v3 we just start warning for num_retries rather than remove it, seeing as we haven't actually had a replacement available.

@tswast
Copy link
Contributor

tswast commented Nov 18, 2021

@tswast
Copy link
Contributor

tswast commented Nov 18, 2021

Looks like google-cloud-storage has a converter from Retry -> RetryStrategy.

https://github.com/googleapis/python-storage/blob/af9c9dc83d8582167b74105167af17c9809455de/google/cloud/storage/blob.py#L1912

We might want to do something similar.

Edit: See where this was added in googleapis/python-storage#447

@plamut
Copy link
Contributor Author

plamut commented Nov 19, 2021

I imagine the intent was to deprecate it in favor of a retry object? Doesn't look like we actually did that, though.

That's quite possible. A retry strategy is much more expressive than a plain timeout value.

I agree that issuing deprecation warnings first would be a more user-friendly approach. I wouldn't necessarily include this in the 3.0 release, though, maybe just the deprecation warning? IMHO there's more value in finalizing the first v3 release by the end of the year, and only include this feature in 3.0.0 if the capacity allows for it, i.e. if it doesn't cause delays.

@tswast
Copy link
Contributor

tswast commented Nov 23, 2021

Agreed that it's a low priority and that we'd probably only include the deprecation warning + retry feature in v3. A stable v3 is definitely more important, and this could happen post-v3 release.

@plamut plamut removed their assignment Feb 15, 2022
@chalmerlowe
Copy link
Contributor

Gonna mark this as a P3 priority based on the above conversation.

@chalmerlowe chalmerlowe added the priority: p3 Desirable enhancement or fix. May not be included in next release. label Aug 24, 2022
@chalmerlowe
Copy link
Contributor

chalmerlowe commented Sep 18, 2023

This is still on the back burner. If we intend to do this, the first steps are:

  • identify where the num_retries parameter is still being used AND needs to be removed
  • add deprecation warnings to the code relative to those uses

@Linchin
Copy link
Contributor

Linchin commented Feb 5, 2024

I wonder if we still need to remove num_retries, as it is passed as a parameter max_retries into google resumable media, which doesn't seem to be deprecated any time soon. We might unify and clarify the multiple levels of retries we use in the load functions, but that would be its own issue.

@tswast @chalmerlowe
Please let me know what you think about this :)

@tswast
Copy link
Contributor

tswast commented Feb 6, 2024

@chalmerlowe , @Linchin is correct. There are at least three types of retries that are relevant to load jobs:

  1. Individual API request retries. Our client uses the naming convention retry for these.
  2. Resumable media retries. There is a multi-api request protocol for uploading files to Google, shared by Drive, BigQuery, and Cloud Storage. Errors can occur mid-upload, in which case we have to seek back to the start of the file and try the upload again. This is currently managed by the num_retries parameter. Which is oddly named because it predates the retries in (1)
  3. Job level retries. These get a new job ID and try the whole job again if the job failed for a retryable reason (such as rate limiting). This is currently only implemented for query jobs via the job_retry parameter. It would be very beneficial to do this for load jobs also.

I would be in favor of deprecating num_retries in favor of a new name that reflects the purpose of being specific to resumable media, but not removing it.

I do recommend reaching out to the GCS team to see if they have plans for improving retry support in resumable media. I recall there was a project along those lines in 2020.

@tswast
Copy link
Contributor

tswast commented Feb 6, 2024

#1071 (comment)

Just re-read the whole thread and this comment appears to be the most relevant. I don't know if it makes sense to add a new parameter or reuse the same retry object for two different purposes. Would be good to sync with GCS in that regard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p3 Desirable enhancement or fix. May not be included in next release. semver: major Hint for users that this is an API breaking change. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

4 participants