Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: apply timeout to all resumable upload requests #1070

Merged
merged 5 commits into from Nov 24, 2021

Conversation

plamut
Copy link
Contributor

@plamut plamut commented Nov 18, 2021

Fixes #1067.

PR checklist:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@plamut plamut requested a review from a team November 18, 2021 10:20
@plamut plamut requested a review from a team as a code owner November 18, 2021 10:20
@plamut plamut requested a review from tswast November 18, 2021 10:20
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Nov 18, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Nov 18, 2021
@tswast
Copy link
Contributor

tswast commented Nov 18, 2021

Test failure:

____________________________ test_upload_chunksize _____________________________

client = <google.cloud.bigquery.client.Client object at 0x7f73ab1c0710>

    def test_upload_chunksize(client):
        with mock.patch("google.cloud.bigquery.client.ResumableUpload") as RU:
            upload = RU.return_value

            upload.finished = False

            def transmit_next_chunk(transport):
                upload.finished = True
                result = mock.MagicMock()
                result.json.return_value = {}
                return result

            upload.transmit_next_chunk = transmit_next_chunk
            f = io.BytesIO()
>           client.load_table_from_file(f, "foo.bar")

tests/unit/test_client.py:8469:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
google/cloud/bigquery/client.py:2443: in load_table_from_file
    file_obj, job_resource, num_retries, timeout, project=project
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <google.cloud.bigquery.client.Client object at 0x7f73ab1c0710>
stream = <_io.BytesIO object at 0x7f73ab14f048>
metadata = {'configuration': {'load': {'destinationTable': {'datasetId': 'foo', 'projectId': 'PROJECT', 'tableId': 'bar'}}}, 'jobReference': {'jobId': 'c2ad6c61-6202-4c0e-a6af-8310c4aabd61', 'projectId': 'PROJECT'}}
num_retries = 6, timeout = None, project = 'PROJECT'

    def _do_resumable_upload(
        self, stream, metadata, num_retries, timeout, project=None
    ):
        """Perform a resumable upload.

        Args:
            stream (IO[bytes]): A bytes IO object open for reading.

            metadata (Dict): The metadata associated with the upload.

            num_retries (int):
                Number of upload retries. (Deprecated: This
                argument will be removed in a future release.)

            timeout (float):
                The number of seconds to wait for the underlying HTTP transport
                before using ``retry``.

            project (Optional[str]):
                Project ID of the project of where to run the upload. Defaults
                to the client's project.

        Returns:
            requests.Response:
                The "200 OK" response object returned after the final chunk
                is uploaded.
        """
        upload, transport = self._initiate_resumable_upload(
            stream, metadata, num_retries, timeout, project=project
        )

        while not upload.finished:
>           response = upload.transmit_next_chunk(transport, timeout=timeout)
E           TypeError: transmit_next_chunk() got an unexpected keyword argument 'timeout'

google/cloud/bigquery/client.py:2853: TypeError
=============================== warnings summary ===============================

I think we'll have to bump the minimum version of the resumable media library.

@tseaver
Copy link
Contributor

tseaver commented Nov 18, 2021

@tswast The feature was added (by @plamut) in June 2020, and released with version 0.6.0 of google-resumable-media, which is our minimum version.

The test failure comes from the broken stub version in the testcase.

may be repeated several times using the same timeout each time.

Can also be passed as a tuple (connect_timeout, read_timeout).
See :meth:`requests.Session.request` documentation for details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, does the same apply to our other API requests? I know we use requests library for those too, but via google-cloud-core.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, it does. The timeout argument gets passed down to google.cloud._http.JSONConnection, which accepts a two-tuple.

I think we did not advertise this second option in BigQuery and just went with Optional[Union[int, float]] to avoid leaking transport implementation details, but I'm not 100%.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

google-cloud-core definitely exposes both float and tuple for timeout.

Note that its current docs link (under cloud.google.com) doesn't show those details.

@plamut plamut merged commit 3314dfb into googleapis:main Nov 24, 2021
@plamut plamut deleted the iss-1067 branch November 24, 2021 15:12
abdelmegahedgoogle pushed a commit to abdelmegahedgoogle/python-bigquery that referenced this pull request Apr 17, 2023
* fix: apply timeout to all resumable upload requests

* Fix stub in test case

* Improve timeout type and other type annotations

* Annnotate return type of _do_resumable_upload()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provided timeout should be applied to all resumable upload requests
3 participants