
Add support for Server Side Cursors (a.k.a. stream results) #407

Open
jlynchMicron opened this issue Feb 1, 2022 · 4 comments
Labels
api: bigquery — Issues related to the googleapis/python-bigquery-sqlalchemy API.
type: feature request — ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@jlynchMicron

jlynchMicron commented Feb 1, 2022

Is your feature request related to a problem? Please describe.
Please support the SQLAlchemy stream_results cursor feature to break up large queries that would otherwise overwhelm a system's memory.

Describe the solution you'd like
An implementation similar to this post: https://pythonspeed.com/articles/pandas-sql-chunking/

import pandas as pd
from sqlalchemy import create_engine

def process_sql_using_pandas():
    engine = create_engine(
        "postgresql://postgres:pass@localhost/example"
    )
    conn = engine.connect().execution_options(
        stream_results=True)

    for chunk_dataframe in pd.read_sql(
            "SELECT * FROM users", conn, chunksize=1000):
        print(f"Got dataframe w/{len(chunk_dataframe)} rows")
        # ... do something with dataframe ...

if __name__ == '__main__':
    process_sql_using_pandas()

Describe alternatives you've considered
Currently I have to perform an upload after every SELECT DISTINCT query I run to ensure that my next query "chunk" will not fetch the same rows again.
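As an illustration of that manual-chunking workaround (not code from this issue), here is a sketch of keyset pagination using the stdlib sqlite3 DB-API: each query resumes after the last id seen, so no row is fetched twice. The table name, id column, and chunk size are all hypothetical.

```python
import sqlite3

def iter_chunks(conn, chunk_size=1000):
    """Yield rows in chunks via keyset pagination: each query resumes
    after the last id seen, so no row is fetched twice."""
    last_id = -1
    while True:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # resume after the highest id in this chunk

# Demo with an in-memory database holding 2500 rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(2500)])
chunks = list(iter_chunks(conn, chunk_size=1000))
print([len(c) for c in chunks])  # → [1000, 1000, 500]
```

Unlike OFFSET-based paging, this stays fast on large tables because each query is an indexed range scan, and it is immune to duplicates as long as the id column is unique and ordered.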

Related Issue in Pandas: pandas-dev/pandas#35689

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Feb 1, 2022
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Feb 2, 2022
@meredithslota meredithslota added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels Feb 3, 2022
@jlynchMicron
Author

jlynchMicron commented Feb 9, 2022

It appears that this feature actually works and is implemented by this function:

Someone please correct me if I'm wrong. I believe I have it working in a project I am using, but I do not have a good way to verify that my machine is not pulling down the entire query result first and then just iterating through it.

@tswast
Collaborator

tswast commented Feb 28, 2022

The DB-API works as you suggest, pulling down only a page at a time as needed, but I'm unsure how this interacts with the SQLAlchemy connector. Can keep this open for further investigation.
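To illustrate the page-at-a-time DB-API behavior described above, here is a generic sketch using the stdlib sqlite3 driver. This is an assumption-laden illustration, not BigQuery-specific code: the PEP 249 cursor.fetchmany interface shown here is the same one the BigQuery DB-API exposes, but the table and page size below are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [(i,) for i in range(250)])

cur = conn.cursor()
cur.execute("SELECT id FROM users ORDER BY id")

page_sizes = []
while True:
    # fetchmany pulls only `size` rows per call, keeping memory bounded
    page = cur.fetchmany(size=100)
    if not page:
        break
    page_sizes.append(len(page))

print(page_sizes)  # → [100, 100, 50]
```

The open question in this issue is whether SQLAlchemy's stream_results execution option actually routes through this paged path for the BigQuery dialect, or whether the full result set is materialized first.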

@balajivenkatesh

@jlynchMicron Would you be able to share an example of using create_cursor to read in chunks?

@jlynchMicron
Author

@balajivenkatesh This is roughly what I think you need to do; create_cursor happens under the hood in this pandas example:

import pandas as pd
import sqlalchemy as sa

with engine.connect() as conn:
    # Request a server-side cursor so rows stream in pages
    conn = conn.execution_options(stream_results=True)
    df: pd.DataFrame
    for df in pd.read_sql(sql, conn, chunksize=query_chunksize):
        ...  # process each chunk here
