Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Avoid implicit join when using join with unnest #924

Merged
merged 4 commits into from Dec 19, 2023

Conversation

snapiri
Copy link
Contributor

@snapiri snapiri commented Nov 26, 2023

When using JOIN with UNNEST statements, and then creating a SELECT statement based on it, the UNNESTed table will appear twice in the FROM clause, causing an implicit join of the table with itself

Sample of problematic code:

from sqlalchemy import ARRAY, Column, create_engine, func, Integer, MetaData, Table, column, select

db_metadata = MetaData()
engine = create_engine("bigquery://", future=True)

column_defs1 = [Column("id", Integer, nullable=True)]
column_defs2 = [Column("ids", ARRAY(Integer), nullable=True), Column("dummy", Integer, nullable=True)]

table1 = Table("table1", db_metadata, *column_defs1)
table2 = Table("table2", db_metadata, *column_defs2)

unnested_col_name = "unnested_ids"
unnested_ids = func.unnest(table2.c.ids).alias(unnested_col_name)
unnested_id_col = column(unnested_col_name)

q = select(table1.c.id, table2.c.dummy).select_from(unnested_ids.join(table1, table1.c.id == unnested_id_col))

# This is where things go bad
q = select("*").select_from(q.subquery())
compiled = q.compile(engine)
print(str(compiled))

Generates the following output:

SELECT *
FROM (SELECT `table1`.`id` AS `id`, `table2`.`dummy` AS `dummy`
FROM `table2`, `table2` `table2_1`, unnest(`table2_1`.`ids`) AS `unnested_ids` JOIN `table1` ON `table1`.`id` = `unnested_ids`) AS `anon_1`

Co-Authored-By: @yuval-bavli

Fixes #368

@snapiri snapiri requested review from a team as code owners November 26, 2023 14:30
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. labels Nov 26, 2023
@snapiri
Copy link
Contributor Author

snapiri commented Nov 26, 2023

@tswast / @prash-mi Would appreciate your review if possible.

@snapiri snapiri changed the title fix: avoid implicit join when using join with unnest fix: Avoid implicit join when using join with unnest Nov 28, 2023
@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 28, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 28, 2023
@snapiri
Copy link
Contributor Author

snapiri commented Dec 7, 2023

@chalmerlowe / @Linchin,
kind reminder - could you please review?

@Linchin Linchin added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 7, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 7, 2023
@Linchin Linchin added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 11, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 11, 2023
@Linchin
Copy link
Contributor

Linchin commented Dec 11, 2023

Hi @snapiri, cover test is also failing

nox > coverage report --show-missing --fail-under=100
Name                                     Stmts   Miss Branch BrPart  Cover   Missing
------------------------------------------------------------------------------------
sqlalchemy_bigquery/__init__.py             12      0      0      0   100%
sqlalchemy_bigquery/_helpers.py             43      0     14      0   100%
sqlalchemy_bigquery/_struct.py              47      0     16      0   100%
sqlalchemy_bigquery/_types.py               53      0     18      0   100%
sqlalchemy_bigquery/base.py                497      0    152      2    99%   275->281, 276->281
sqlalchemy_bigquery/geography.py            51      0     14      0   100%
sqlalchemy_bigquery/parse_url.py           125      0     56      0   100%
sqlalchemy_bigquery/version.py               1      0      0      0   100%
tests/unit/__init__.py                       0      0      0      0   100%
tests/unit/conftest.py                      49      0      4      0   100%
tests/unit/fauxdbi.py                      214      0     70      0   100%
tests/unit/test__struct.py                  33      0      0      0   100%
tests/unit/test_catalog_functions.py        77      0      6      0   100%
tests/unit/test_comments.py                 28      0      0      0   100%
tests/unit/test_compiler.py                 68      0      0      0   100%
tests/unit/test_compliance.py               61      0      8      0   100%
tests/unit/test_dialect_types.py             7      0      2      0   100%
tests/unit/test_engine.py                   30      0      0      0   100%
tests/unit/test_geography.py                58      0      0      0   100%
tests/unit/test_helpers.py                  94      0      0      0   100%
tests/unit/test_like_reescape.py            17      0      0      0   100%
tests/unit/test_parse_url.py                90      0     10      0   100%
tests/unit/test_select.py                  157      0     22      0   100%
tests/unit/test_sqlalchemy_bigquery.py      76      0      4      0   100%
tests/unit/test_version.py                   6      0      0      0   100%
tests/unit/test_view.py                      7      0      0      0   100%
------------------------------------------------------------------------------------
TOTAL                                     1901      0    396      2    99%
Coverage failure: total of 99 is less than fail-under=100

When using JOIN with UNNEST statements, and then creating a SELECT statement
based on it, the UNNESTed table will appear twice in the FROM clause,
causing an implicit join of the table with itself
@snapiri
Copy link
Contributor Author

snapiri commented Dec 12, 2023

Hi @Linchin,

Thanks for the feedback.
I fixed cover and added more tests.
Would appreciate any feedback

@snapiri
Copy link
Contributor Author

snapiri commented Dec 18, 2023

@tswast / @Linchin / @chalmerlowe
Anyone care to review?

@tswast tswast added the kokoro:run Add this label to force Kokoro to re-run the tests. label Dec 19, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Dec 19, 2023
@snapiri
Copy link
Contributor Author

snapiri commented Dec 19, 2023

@tswast - Not sure I know how to read the Kokoro results - I see that the system-3.8 tests failed with the following error:

google.api_core.exceptions.ServiceUnavailable: 503 GET https://bigquery.googleapis.com/bigquery/v2/projects/precise-truck-742/jobs/72b3d73c-939c-4d7f-ab50-06f073ecac58?location=US&prettyPrint=false: The service is currently unavailable.

Also I see that there were 6 runs where the first failed and the 2-6 succeeded.

Anything I should change?

@tswast
Copy link
Collaborator

tswast commented Dec 19, 2023

@snapiri That looks like a backend issue, not anything related to this code. I'll trigger a re-run.

@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 19, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 19, 2023
@tswast tswast added the owlbot:run Add this label to trigger the Owlbot post processor. label Dec 19, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Dec 19, 2023
@tswast tswast merged commit ac74a34 into googleapis:main Dec 19, 2023
14 of 15 checks passed
@snapiri snapiri deleted the 368-handle-inner-join-unnest branch December 19, 2023 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unnest Function In BigQuery SQLAlchemy Plugin Joins Twice When Selecting Other Columns in CTE or Subquery
4 participants