New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add MERGE DDL #429
feat: Add MERGE DDL #429
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). For more information, open the CLA check for this pull request. |
9b3090a
to
1e0f2a6
Compare
A MERGE statement is a DML statement that can combine INSERT, UPDATE, and DELETE operations into a single statement and perform the operations atomically. This commit introduces support for generating MERGE statements programmatically with SQLAlchemy. Refs: googleapis#428
1e0f2a6
to
a94e17d
Compare
@munro Thanks so much for the contribution. I hope to review soon. Unfortunately, I cannot merge this without the account associated with your commit (I see a kw.com email) signing the Google CLA at https://cla.developers.google.com/clas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! A few initial comments, though the CLA issue will also need to be resolved.
sqlalchemy_bigquery/merge.py
Outdated
|
||
@dataclass(frozen=True) | ||
class MergeThenDelete(MergeThen): | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is meant as a TODO, can we add a comment indicating that?
sqlalchemy_bigquery/merge.py
Outdated
from sqlalchemy.sql.selectable import Alias, Select, TableClause | ||
|
||
|
||
class MergeThen: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be great to get some docstrings for the classes in this module explaining some of the purpose of the classes and a bit about the overall design.
Any updates to this PR? Also, merge statement in BigQuery allows more complicated case branching like: MERGE INTO target AS t
USING source AS s
ON t.id == s.id
-- Special cases
WHEN MATCHED AND s.expire_at <= CURRENT_DATETIME() THEN DELETE
WHEN NOT MATCHED BY SOURCE AND t.expire_at <= CURRENT_DATETIME() THEN DELETE
WHEN NOT MATCHED AND s.flag_a = 2 THEN INSERT ...
WHEN NOT MATCHED AND s.flag_a = 3 THEN INSERT ...
-- Defaults
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...
-- Unreachable cases
WHEN NOT MATCHED AND s.flag_a = 2 THEN INSERT ...
WHEN NOT MATCHED AND s.flag_a = 3 THEN INSERT ...
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ... Which means:
|
|
@pykenny Oh wow, thank you, I didn't even know you could have multiple of the same when conditions. I read the documentation [1] and made it match.
merge = Merge( target=target, source=source, on=into.c.b == using.c.b when=[ Merge.WhenMatched(Merge.ThenInsert({"foo": source.c.foo}), condition=target.c.foo > 10), ] ) I really wanted to avoid 'piping', but IDK, it may be nicer...? merge = Merge(
target=target,
source=source,
on=into.c.b == using.c.b
)
merge = merge.when_matched(target.foo > 10).then_insert({"foo": source.foo})
merge = merge.when_matched_not_matched_by_source().then_delete() ... it does kinda look nice. 😄 Edited: went with this!!!
Merge( target=target, source=source, on=into.c.b == using.c.b, when=[ Merge.WhenMatched(target.foo > 10).then_insert({"foo": source.foo}) ] ) [1] https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement |
@pykenny OK I went with the pipe syntax, it feels really nice after all. All I got for now, code still really needs help & documentation — not sure if you have any time but it would be a huge help! ❤️🔥 |
…on. Added pipe syntax!
fc5a8b9
to
5f7105a
Compare
def when_matched_not_matched_by_target(self, condition: MergeConditionType = None): | ||
return MergeWhenNotMatchedByTarget(self, condition) | ||
|
||
def when_matched_not_matched_by_source(self, condition: MergeConditionType = None): | ||
return MergeWhenNotMatchedBySource(self, condition) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't they be when_not_matched_by_target
and when_not_matched_by_source
? 🤔
def _compile_when_not_matched_by_source( | ||
self: WhenNotMatchedBySource, compiler: SQLCompiler, **kwargs | ||
): | ||
return "WHEN MATCHED NOT MATCHED BY SOURCE" + _compile_when( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WHEN MATCHED NOT MATCHED BY SOURCE
-> WHEN NOT MATCHED BY SOURCE
MergeConditionType = Optional[ColumnElement[sa.Boolean]] | ||
|
||
|
||
class Merge(ClauseElement): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClauseElement
instance is not executable, so passing a Merge
instance to Connection.execute()
doesn't work.
updates = {"id": source.id, "name": source.name}
stmt = (
Merge(target, source, on=target.id == source.id)
.when_matched()
.then_update(updates)
.when_not_matched_by_target()
.then_insert(updates)
)
connection.execute(stmt) # Ooops
How about inheriting UpdateBase
instead?
@munro Would it be possible to make from sqlalchemy.orm import aliased
...
alias_dest = aliased(MyModelDest)
alias_source = aliased(MyModelSource)
stmt = Merge(
alias_dest,
alias_source,
on=...,
).when_matched().then_update(...) |
Unfortunately, we can't accept contributions without a signed CLA. This has been open for a while with a couple comments to this effect, and while I can tell it would be helpful, we can't accept it as-is. See #429 (comment) and our CONTRIBUTING guide for more info. |
@meredithslota I just signed the CLA! [1] Signed it because I see activity in the PR #428 [1] https://cla.developers.google.com/clas/new?domain=DOMAIN_GOOGLE&kind=KIND_INDIVIDUAL |
help wanted!! ⛑
Throwing over some code I'm currently using to generate
MERGE
statements.Overall I really like that API, and it feels really slick when I've been using it.
One thing that is missing is
ThenUpdate
/ThenInsert
doesn't check if the key actually exists on theinto
table—which may be desirable in the instance the developer didn't describe the table in SQLAlchemy (I often use it like that). But it would be nice to check if the column does exist if the schema is defined.Fixes #428 🦕🙌🆒