Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add MERGE DDL #429

Closed
wants to merge 4 commits into from
Closed

feat: Add MERGE DDL #429

wants to merge 4 commits into from

Conversation

munro
Copy link

@munro munro commented Mar 17, 2022

help wanted!! ⛑

Throwing over some code I'm currently using to generate MERGE statements.

Overall I really like that API, and it feels really slick when I've been using it.

One thing that is missing is ThenUpdate / ThenInsert doesn't check if the key actually exists on the into table—which may be desirable in the instance the developer didn't describe the table in SQLAlchemy (I often use it like that). But it would be nice to check if the column does exist if the schema is defined.

Fixes #428 🦕🙌🆒

@munro munro requested a review from a team as a code owner March 17, 2022 18:57
@munro munro requested review from a team and shollyman March 17, 2022 18:57
@google-cla
Copy link

google-cla bot commented Mar 17, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Mar 17, 2022
@munro munro force-pushed the feature/merge_ddl branch 2 times, most recently from 9b3090a to 1e0f2a6 Compare March 17, 2022 19:07
@shollyman shollyman requested a review from tswast March 17, 2022 19:12
A MERGE statement is a DML statement that can combine INSERT, UPDATE, and DELETE
operations into a single statement and perform the operations atomically.

This commit introduces support for generating MERGE statements programmatically
with SQLAlchemy.

Refs: googleapis#428
@tswast
Copy link
Collaborator

tswast commented Mar 22, 2022

@munro Thanks so much for the contribution. I hope to review soon. Unfortunately, I cannot merge this without the account associated with your commit (I see a kw.com email) signing the Google CLA at https://cla.developers.google.com/clas

Copy link
Collaborator

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! A few initial comments, though the CLA issue will also need to be resolved.

sqlalchemy_bigquery/merge.py Show resolved Hide resolved

@dataclass(frozen=True)
class MergeThenDelete(MergeThen):
...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is meant as a TODO, can we add a comment indicating that?

sqlalchemy_bigquery/merge.py Outdated Show resolved Hide resolved
sqlalchemy_bigquery/merge.py Show resolved Hide resolved
from sqlalchemy.sql.selectable import Alias, Select, TableClause


class MergeThen:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great to get some docstrings for the classes in this module explaining some of the purpose of the classes and a bit about the overall design.

tests/unit/test_merge.py Outdated Show resolved Hide resolved
tests/unit/test_merge.py Show resolved Hide resolved
@tswast tswast added kokoro:run Add this label to force Kokoro to re-run the tests. owlbot:run Add this label to trigger the Owlbot post processor. labels Mar 22, 2022
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Mar 22, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Mar 22, 2022
@parthea parthea changed the title [Fixes #428] Add MERGE DDL feat: Add MERGE DDL Mar 30, 2022
@pykenny
Copy link

pykenny commented Sep 23, 2022

Any updates to this PR?

Also, merge statement in BigQuery allows more complicated case branching like:

MERGE INTO target AS t
USING source AS s
ON t.id == s.id
-- Special cases
WHEN MATCHED AND s.expire_at <= CURRENT_DATETIME() THEN DELETE
WHEN NOT MATCHED BY SOURCE AND t.expire_at <= CURRENT_DATETIME() THEN DELETE
WHEN NOT MATCHED AND s.flag_a = 2 THEN INSERT ...
WHEN NOT MATCHED AND s.flag_a = 3 THEN INSERT ...
-- Defaults
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...
-- Unreachable cases
WHEN NOT MATCHED AND s.flag_a = 2 THEN INSERT ...
WHEN NOT MATCHED AND s.flag_a = 3 THEN INSERT ...
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...

Which means:

  • Filtering condition not only is applicable to "matched" but also can be applied to the other two cases ("not matched by source/target").
  • All three merge situation clauses, no matter having additional filtering condition or not, can be passed for multiple times as long as they're having valid syntax (and parser doesn't care about unreachable cases)

@pykenny
Copy link

pykenny commented Sep 23, 2022

NOT MATCHED BY SOURCE and MATCHED cases allows delete and update (but not insert) operations, while NOT MATCHED (BY TARGET) only allows insert operation. I think that constraint is absent in current implementation as well.

@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Sep 23, 2022
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Sep 23, 2022
@munro
Copy link
Author

munro commented Sep 23, 2022

@pykenny Oh wow, thank you, I didn't even know you could have multiple of the same when conditions. I read the documentation [1] and made it match.

I don't like the syntax (currently).

merge = Merge(
    target=target,
    source=source,
    on=into.c.b == using.c.b
    when=[
        Merge.WhenMatched(Merge.ThenInsert({"foo": source.c.foo}), condition=target.c.foo > 10),
    ]
)

I really wanted to avoid 'piping', but IDK, it may be nicer...?

merge = Merge(
    target=target,
    source=source,
    on=into.c.b == using.c.b
)

merge = merge.when_matched(target.foo > 10).then_insert({"foo": source.foo})
merge = merge.when_matched_not_matched_by_source().then_delete()

... it does kinda look nice. 😄

Edited: went with this!!!

Also contemplating a 'partial' piping syntax:

Merge(
    target=target,
    source=source,
    on=into.c.b == using.c.b,
    when=[
        Merge.WhenMatched(target.foo > 10).then_insert({"foo": source.foo})
    ]
)

[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement

@munro
Copy link
Author

munro commented Sep 23, 2022

@pykenny OK I went with the pipe syntax, it feels really nice after all. All I got for now, code still really needs help & documentation — not sure if you have any time but it would be a huge help! ❤️‍🔥

Comment on lines +57 to +61
def when_matched_not_matched_by_target(self, condition: MergeConditionType = None):
return MergeWhenNotMatchedByTarget(self, condition)

def when_matched_not_matched_by_source(self, condition: MergeConditionType = None):
return MergeWhenNotMatchedBySource(self, condition)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't they be when_not_matched_by_target and when_not_matched_by_source? 🤔

def _compile_when_not_matched_by_source(
self: WhenNotMatchedBySource, compiler: SQLCompiler, **kwargs
):
return "WHEN MATCHED NOT MATCHED BY SOURCE" + _compile_when(
Copy link

@pykenny pykenny Sep 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHEN MATCHED NOT MATCHED BY SOURCE -> WHEN NOT MATCHED BY SOURCE

MergeConditionType = Optional[ColumnElement[sa.Boolean]]


class Merge(ClauseElement):
Copy link

@pykenny pykenny Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClauseElement instance is not executable, so passing a Merge instance to Connection.execute() doesn't work.

updates = {"id": source.id, "name": source.name}
stmt = (
  Merge(target, source, on=target.id == source.id)
  .when_matched()
  .then_update(updates)
  .when_not_matched_by_target()
  .then_insert(updates)
)
connection.execute(stmt) # Ooops

How about inheriting UpdateBase instead?

@pykenny
Copy link

pykenny commented Dec 15, 2022

@munro Would it be possible to make Merge support ORM models as well? Users may want to write this:

from sqlalchemy.orm import aliased
...

alias_dest = aliased(MyModelDest)
alias_source = aliased(MyModelSource)

stmt = Merge(
    alias_dest,
    alias_source,
    on=...,
).when_matched().then_update(...)

@meredithslota
Copy link

Unfortunately, we can't accept contributions without a signed CLA. This has been open for a while with a couple comments to this effect, and while I can tell it would be helpful, we can't accept it as-is. See #429 (comment) and our CONTRIBUTING guide for more info.

@munro
Copy link
Author

munro commented Dec 30, 2023

@munro Thanks so much for the contribution. I hope to review soon. Unfortunately, I cannot merge this without the account associated with your commit (I see a kw.com email) signing the Google CLA at https://cla.developers.google.com/clas

@meredithslota I just signed the CLA! [1] Signed it because I see activity in the PR #428

[1] https://cla.developers.google.com/clas/new?domain=DOMAIN_GOOGLE&kind=KIND_INDIVIDUAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for MERGE DDL
5 participants