Implement initial "waiting" logs with tqdm? #327

Closed
max-sixty opened this issue Sep 11, 2020 · 4 comments
Labels
api: bigquery (Issues related to the googleapis/python-bigquery-pandas API.)
type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)

Comments

@max-sixty
Contributor

Currently the initial "waiting" logs are emitted roughly every second. Could we instead implement this as a tqdm "progress bar", albeit one without progress? That would be more elegant.

We could also have a hierarchical progress bar, with each of the two steps being a descendant of the parent. This would let us drop the final log messages, since tqdm would already leave the total time on screen.

INFO:pandas_gbq.gbq:  Elapsed 6.71 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 7.88 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 9.05 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 10.23 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 11.42 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 12.6 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 13.6 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 14.61 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 15.78 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 16.95 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 18.11 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 19.28 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 20.44 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 21.61 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 22.76 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 23.91 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 25.1 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 26.25 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 27.41 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 28.6 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 29.8 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 30.99 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 32.01 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 33.2 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 34.36 s. Waiting...
Downloading: 100%|████████████████████████████████████████████| 2373289/2373289 [00:29<00:00, 79844.08rows/s]
INFO:pandas_gbq.gbq:Total time taken 66.53 s.
Finished at 2020-09-11 14:17:54.
@tswast
Collaborator

tswast commented Sep 11, 2020

We don't know ahead of time how long a query will take, so the current tqdm logic won't work.

Looks like there are a couple of open issues for indefinite progress bars at: tqdm/tqdm#427 tqdm/tqdm#925

A couple of options:

  • Add "spinner" feature to tqdm and use that.
  • Do some kind of exponential backoff on "waiting..." logging: once per second at first, slowing to once per minute?
  • Use print statements with the right console codes to rewrite lines in-place to update the elapsed time instead of logging.
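
As a rough illustration of the backoff option, here is a minimal sketch of a logging schedule that starts at one message per second and slows to one per minute. The function names and the doubling factor are hypothetical choices for this example, not anything taken from pandas-gbq.

```python
from typing import Iterator, List


def waiting_log_intervals(
    initial: float = 1.0, factor: float = 2.0, maximum: float = 60.0
) -> Iterator[float]:
    """Yield the seconds to wait between successive "Waiting..." messages.

    The interval starts at `initial`, is multiplied by `factor` after each
    message, and is capped at `maximum`. All three defaults are assumptions.
    """
    interval = initial
    while True:
        yield interval
        interval = min(interval * factor, maximum)


def log_times(total_wait: float) -> List[float]:
    """Return the elapsed times at which a message would be logged."""
    times = []
    elapsed = 0.0
    for interval in waiting_log_intervals():
        elapsed += interval
        if elapsed > total_wait:
            break
        times.append(elapsed)
    return times


# A 35-second wait produces five messages instead of one per second.
print(log_times(35.0))  # [1.0, 3.0, 7.0, 15.0, 31.0]
```

With these defaults the interval reaches the one-minute cap after six messages, so a long-running query settles at one log line per minute.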

@max-sixty
Contributor Author

Good point @tswast

I just looked back at some code I wrote a few years ago and found this (coincidentally, for waiting on jobs from Google Cloud!). Now that I've seen the links above, though, maybe I should be more cautious about suggesting it's easy to have an indefinite progress bar...

import datetime
import time

from tqdm import tqdm


def wait_for_job(job, timeout_in_seconds=None):
    # https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/bigquery/cloud-client/snippets.py
    if timeout_in_seconds:
        start = datetime.datetime.now()
        timeout = start + datetime.timedelta(0, timeout_in_seconds)

    # bar_format omits the bar itself, so `total` is only a placeholder;
    # the line shows just the job description and the elapsed time.
    with tqdm(
        bar_format="Waiting for {desc} Elapsed: {elapsed}", total=10000
    ) as progress:
        while True:
            job.reload()  # Refreshes the state via a GET request.
            progress.set_description(str(job))  # Also redraws the line.
            if job.state == "DONE":
                if job.error_result:
                    raise RuntimeError(job.errors)
                progress.bar_format = "Completed {desc}. Elapsed: {elapsed}"
                return
            if timeout_in_seconds:
                if datetime.datetime.now() > timeout:
                    raise SystemError(f"Timed out after {timeout_in_seconds} seconds")
            time.sleep(1)  # Poll once per second.

tswast added the "type: feature request" label Oct 6, 2020
@tswast
Collaborator

tswast commented Oct 26, 2020

I had some more thoughts about this. The UI uses the job statistics to show progression through the various stages. Filed googleapis/python-bigquery#343 to see if we can implement this in google-cloud-bigquery, since it'll be relevant for the %%bigquery magics, too.
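
To make the stage-based idea concrete, here is a small sketch that turns a list of query stages into a progress summary. It assumes each stage object exposes a `status` attribute whose value is "COMPLETE" once the stage has finished; the helper names and the stand-in stage objects are hypothetical, and the real job statistics may be shaped differently.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def stage_progress(stages: Iterable) -> Tuple[int, int]:
    """Return (completed, total) for stage-like objects with a `status`."""
    stages = list(stages)
    completed = sum(1 for stage in stages if stage.status == "COMPLETE")
    return completed, len(stages)


def describe_progress(stages: Iterable) -> str:
    """Build a one-line progress message from the stage statuses."""
    completed, total = stage_progress(stages)
    if total == 0:
        # No stages reported yet, so there is nothing to count.
        return "Query is waiting; no stages reported yet"
    return f"Query stages complete: {completed}/{total}"


# Stand-in objects for illustration only; a real job would supply these.
example_stages = [
    SimpleNamespace(name="S00: Input", status="COMPLETE"),
    SimpleNamespace(name="S01: Join", status="RUNNING"),
    SimpleNamespace(name="S02: Output", status="PENDING"),
]
print(describe_progress(example_stages))  # Query stages complete: 1/3
```

Unlike an elapsed-time spinner, a count like this gives a progress bar a real total to work with, though the number of stages may not be known until the query has started.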

@tswast
Collaborator

tswast commented Nov 16, 2020

I just merged googleapis/python-bigquery#352, which will be available in google-cloud-bigquery 2.4.0 (googleapis/python-bigquery#381).

product-auto-label bot added the "api: bigquery" label Jul 17, 2021