How to follow progress of query download when stdout is not available #1654

Open
JakeSummers opened this issue Sep 5, 2023 · 0 comments
Labels
api: bigquery (Issues related to the googleapis/python-bigquery API.)
priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)

JakeSummers commented Sep 5, 2023

Issue Summary

I would like to follow the progress of query result downloads.

Currently I am doing:

    query_result: QueryJob = client.query(query)

    df = query_result.result().to_dataframe(
        progress_bar_type="tqdm"
    )

But this only reports progress as a tqdm progress bar on the console.

I would like to have some kind of mechanism to follow progress when stdout is not available.

Possible Solution 1 - Logs

A minimal solution could be to add a log statement into the code.

Maybe this would work:

Line 1819 of google/cloud/bigquery/table.py

        try:
            progress_bar = get_progress_bar(
                progress_bar_type, "Downloading", self.total_rows, "rows"
            )

            record_batches = []
            for record_batch in self.to_arrow_iterable(
                bqstorage_client=bqstorage_client
            ):
                record_batches.append(record_batch)
                
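                # NEW LINE: log each batch (stdlib logging takes %-style args, not keyword fields;
                # progress_bar may be None, so the total comes from self.total_rows)
                logger.debug("Downloaded batch of %d rows (total expected: %s)", record_batch.num_rows, self.total_rows)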
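
To make the idea concrete outside the library, here is a rough, self-contained sketch of the same loop. It is not the library's code: FakeBatch, collect_batches and the logger name are made up for illustration, and FakeBatch only stands in for a record batch that exposes num_rows. Unlike the one-line change above, it logs a running total rather than the size of each batch.

```python
import logging
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("bigquery.download")


@dataclass
class FakeBatch:
    """Stand-in for a record batch; only num_rows is used here."""

    num_rows: int


def collect_batches(
    batches: Iterable[FakeBatch], total_rows: Optional[int]
) -> List[FakeBatch]:
    """Collect batches and log the running row count after each one."""
    collected: List[FakeBatch] = []
    completed = 0
    for batch in batches:
        collected.append(batch)
        completed += batch.num_rows
        # %s for the total so that an unknown total (None) still formats.
        logger.debug("Downloaded %d of %s rows", completed, total_rows)
    return collected
```

In the real code path the batches would come from self.to_arrow_iterable(...), as in the excerpt above, and the log records could then be routed to a file or any other handler when no console is attached.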

Possible Solution 2 - Callback

A better solution would be to add a callback argument to to_dataframe, like this:

    def log_progress(completed_items: int, total_items: int) -> None:
        # This lets me do whatever I want here :)
        logger.debug("Downloaded data", completed=completed_items, total_items=total_items)

    query_result: QueryJob = client.query(query)

    df = query_result.result().to_dataframe(
        progress_callback=log_progress
    )
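
One reason a callback is attractive is that the caller decides how often to report. As a sketch of what such a callback might look like, the factory below logs only when progress crosses a new percentage step, so a download with thousands of batches does not flood the log. Note the assumptions: progress_callback is the parameter proposed in this issue and does not exist in the library today, and make_throttled_callback, step_percent and the logger name are hypothetical. The sketch also assumes the callback receives cumulative counts.

```python
import logging
from typing import Callable

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("bigquery.download")


def make_throttled_callback(step_percent: int = 10) -> Callable[[int, int], None]:
    """Return a callback that logs only when progress reaches a new step."""
    last_step = -1

    def log_progress(completed_items: int, total_items: int) -> None:
        nonlocal last_step
        if total_items <= 0:
            return  # total unknown or empty result; nothing sensible to report
        percent = 100 * completed_items // total_items
        step = percent // step_percent
        if step > last_step:
            last_step = step
            logger.info(
                "Downloaded %d of %d rows (%d%%)",
                completed_items,
                total_items,
                percent,
            )

    return log_progress
```

If the proposed parameter existed, this could be passed as progress_callback=make_throttled_callback(step_percent=25).
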

product-auto-label bot added the api: bigquery label Sep 5, 2023
chalmerlowe added the priority: p3 and type: feature request labels Sep 29, 2023