feature request: pandas connector #155

Open
tswast opened this issue Oct 14, 2020 · 6 comments · May be fixed by #226
Assignees
Labels
api: spanner Issues related to the googleapis/python-spanner API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tswast
Contributor

tswast commented Oct 14, 2020

Is your feature request related to a problem? Please describe.

I'd like to be able to run a query against a Spanner database and download (possibly large-ish -- MBs to GBs) results to a pandas DataFrame. Specifically, I'd like to eventually use this as a component in an ibis connector, but it'd also be useful for general data processing pipelines.

Describe the solution you'd like

It seems that StreamedResultSet is the most natural place to put a to_dataframe method, similar to the RowIterator.to_dataframe method in the BigQuery client library.

Since pandas needn't be required to use this client library, the import should be conditional

https://github.com/googleapis/python-bigquery/blob/fb401bd94477323bba68cf252dd88166495daf54/google/cloud/bigquery/table.py#L29-L32

and the dependency listed in "extras".

https://github.com/googleapis/python-bigquery/blob/fb401bd94477323bba68cf252dd88166495daf54/setup.py#L50
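A minimal sketch of the conditional-import pattern described above, not the actual client library code. The helper name `rows_to_dataframe` and its arguments are hypothetical: `rows` stands in for whatever a streamed result set yields, and `column_names` for its field names. The matching "extras" entry would presumably be something like `extras_require={"pandas": ["pandas"]}` in `setup.py`, which is also an assumption here.

```python
try:
    import pandas
except ImportError:  # pandas is optional; the rest of the module still imports
    pandas = None


def rows_to_dataframe(rows, column_names):
    """Build a DataFrame from an iterable of row sequences.

    Hypothetical helper: `rows` is any iterable of equal-length sequences
    and `column_names` lists the column labels in the same order.
    """
    if pandas is None:
        # Fail only when the pandas-specific feature is actually requested.
        raise ValueError("pandas is required for rows_to_dataframe")
    # Materialize the iterable so a one-shot stream is consumed exactly once.
    return pandas.DataFrame(list(rows), columns=list(column_names))
```

With this shape, users who never call the helper do not need pandas installed, and those who do get a clear error instead of an import failure at module load time.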

Describe alternatives you've considered

It's possible this is simpler than I realize, so maybe it could just be a code sample.

If there were a SQLAlchemy connector (a much bigger project than read-only pandas dataframe), then pandas support is basically free via pandas.read_sql.

Additional context

Related StackOverflow questions:

@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/python-spanner API. label Oct 14, 2020
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Oct 16, 2020
@larkee larkee added priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Oct 21, 2020
@larkee
Contributor

larkee commented Dec 15, 2020

Thank you for your patience!

We've started work on an SQLAlchemy connector and will continue with work for a couple of quarters. As mentioned, this should cover your use case.

@tswast tswast linked a pull request Feb 17, 2021 that will close this issue
4 tasks
@daniellehanks
Contributor

> Thank you for your patience!
>
> We've started work on an SQLAlchemy connector and will continue with work for a couple of quarters. As mentioned, this should cover your use case.

I just stumbled across this. I've wanted this for a long time and it would be a game changer for my company. Is there an issue we can track for updates on this development?

@larkee
Contributor

larkee commented Mar 9, 2021

@daniellehanks Thank you for sharing your interest in this work! The SQLAlchemy connector work is being done here. You can follow the progress there. As noted in the README, it is still under development and is not ready for production use.

@ansh0l
Member

ansh0l commented Jan 17, 2022

@larkee @vi3k6i5 : Given python-spanner-SQLAlchemy is now GA, is this use case covered?

@tswast
Contributor Author

tswast commented Jan 18, 2022

It'd be good to check that it does indeed work with pandas.read_sql in a code sample or something. I would expect it to work, though.

Since Spanner is row-oriented, I don't see there being all that much of a performance reason to avoid SQLAlchemy (compared to BigQuery which is column-oriented).
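The `pandas.read_sql` check suggested above could look roughly like the sketch below. It has not been run against Spanner: an in-memory SQLite connection stands in for the database so the example is self-contained, and `query_to_dataframe`, `make_demo_connection`, and the `singers` table are hypothetical names. For Spanner, the assumption is that a SQLAlchemy engine created with the Spanner dialect would be passed in place of the SQLite connection.

```python
import sqlite3


def query_to_dataframe(sql, connection):
    """Run `sql` on `connection` and return the result as a DataFrame."""
    # Imported lazily so that pandas stays an optional dependency.
    import pandas as pd

    return pd.read_sql(sql, connection)


def make_demo_connection():
    """Create a small in-memory SQLite database as a stand-in for Spanner."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singers (singer_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO singers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
    )
    conn.commit()
    return conn
```

Calling `query_to_dataframe("SELECT singer_id, name FROM singers", make_demo_connection())` returns a two-row DataFrame; whether the same call behaves identically with the Spanner dialect is exactly what such a sample would need to confirm.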

@asthamohta
Contributor

@IlyaFaer Can you check for this?

@asthamohta asthamohta assigned IlyaFaer and unassigned larkee Jul 8, 2022
7 participants