Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Handle passing of arrays to in statements more efficiently in SQLAlchemy 1.4 and higher #253

Merged
merged 24 commits into from Aug 25, 2021

Conversation

jimfulton
Copy link
Contributor

@jimfulton jimfulton commented Aug 17, 2021

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #194 🦕

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Aug 17, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 17, 2021
@jimfulton jimfulton changed the title feat: Handle passing of arrays to in statements more efficient. feat: Handle passing of arrays to in statements more efficient in SQLAlchemy 1.4 and higher Aug 17, 2021
@jimfulton jimfulton changed the title feat: Handle passing of arrays to in statements more efficient in SQLAlchemy 1.4 and higher feat: Handle passing of arrays to in statements more efficiently in SQLAlchemy 1.4 and higher Aug 17, 2021
@jimfulton jimfulton marked this pull request as ready for review August 17, 2021 18:29
@jimfulton jimfulton requested a review from a team as a code owner August 17, 2021 18:29
@jimfulton
Copy link
Contributor Author

jimfulton commented Aug 17, 2021

For reviewers:

When dealing with an in expression, like sqlalchemy.literal(-1).in_(list(range(99999))) (see test_huge_in), SQLAlchemy wants to turn each element of the passed list into a separate parameter, which is inefficient and breaks for large lists.

BigQuery lets you pass array parameters, which is much better for this case.

See the comment here: https://github.com/googleapis/python-bigquery-sqlalchemy/pull/253/files#diff-efe1987a73ae5b54aa817dc8c70541025df89d9d08601c8ab33fa84ad8c09de7R421-R441

For SQLAlchemy 1.4, it's fairly easy to disable this behavior. (I beat my head against earlier versions and gave up. :)) The only trick is to make sure we call UNNEST on the parameter in SQL. We have to do that in the non-array case too, although it's more complicated.

Finally, in-related tests:

  • Have to expect different SQL for SQLAlchemy 1.4 and earlier versions.
  • For unit tests, because we're using sqlite, we don't get correct results, because sqlite doesn't support arrays, so we only check the SQL generated.

@jimfulton jimfulton requested a review from plamut August 17, 2021 18:37
def test_select_in_lit(faux_conn, last_query):
faux_conn.execute(sqlalchemy.select([sqlalchemy.literal(1).in_([1, 2, 3])]))
last_query(
"SELECT %(param_1:INT64)s IN UNNEST(%(param_2:INT64)s) AS `anon_1`",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the type of param_2 be ARRAY<INT64>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the BQ DB API does special handling of arrays.

It sees that we have a scalar type of INT64 and that we have a sequence, and then creates a ArrayQueryParameter.

It happens that since I added struct support, passing ARRAY<INT64> would probably work (because I have to handle structs of arrays). But just usng INT64 works too.

@jimfulton jimfulton merged commit 7692704 into googleapis:master Aug 25, 2021
@jimfulton jimfulton deleted the no-expand branch August 25, 2021 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support the passing of arrays as single arguments to BigQuery
2 participants