Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Implementation of batch ddl in dbapi #1092

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

feat!: Implementation of batch ddl in dbapi #1092

wants to merge 4 commits into from

Conversation

ankiaga
Copy link
Contributor

@ankiaga ankiaga commented Feb 2, 2024

We are also adding a boolean property buffer_ddl_statements (disabled by default) which is changing the behavior of dbapi as currently it behave as per the property being enabled. We need to override the flag to enable it in SqlAlchemy and Django. More details about this flag below:

If the connection is in auto commit mode then this flag doesn't have any significance as ddl statements would be executed as they come.
For connection not in auto commit mode:

  •     If enabled ddl statements would be buffered at client and not executed at cloud spanner. When a non ddl statement comes or a transaction is committed then all the existing buffered ddl statements would be executed.
    
  •     If disabled then its an error to execute ddl statement in autocommit mode.
    

More details here https://docs.google.com/document/d/1mSVC_AhPpdSA0xUoj4LUt2o2tVM_LDv6IdssiwWnbmw

@ankiaga ankiaga requested review from a team as code owners February 2, 2024 04:37
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: spanner Issues related to the googleapis/python-spanner API. labels Feb 2, 2024
@@ -61,7 +106,7 @@ def execute_statement(self, parsed_statement: ParsedStatement):
raise ProgrammingError("Only DML statements are allowed in batch DML mode.")
self._statements.append(parsed_statement.statement)

def run_batch_dml(self):
def run_batch(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a breaking change?

We can probably get away with it now, but this is something that we need to be more careful with, and we should try to mark whatever is not for public consumption as such as much as possible.

Either by adding a _ at the start of the method name, or by explicitly adding documentation that the method is not for public use, and can change at any moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Added the following documentation on the method This method is internal and not for public use as it can change anytime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of renaming the method, we could mark it as deprecated and create a new one with the _ prefix which indicates that it is internal. Please avoid a major version bump if at all possible as it is disruptive for downstream users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the reason because of which we are proposing a major version bump. Please check https://docs.google.com/document/d/1mSVC_AhPpdSA0xUoj4LUt2o2tVM_LDv6IdssiwWnbmw for details

@@ -126,6 +134,15 @@ def autocommit(self):
"""
return self._autocommit

@property
def buffer_ddl_statements(self):
"""Whether to buffer ddl statements for this connection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs more documentation. I (think I) understand what it means. To someone who is not familiar with Cloud Spanner, it is not clear what this means. It is also not clear when they should enable/disable this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -91,7 +95,9 @@ class Connection:
should end a that a new one should be started when the next statement is executed.
"""

def __init__(self, instance, database=None, read_only=False):
def __init__(
self, instance, database=None, read_only=False, buffer_ddl_statements=False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buffer_ddl_statements=False means that we are changing the default behavior in a significant way. That requires a major version bump, and should be clearly called out in the release notes. The description of the PR will automatically be included in the release notes, so please add information about this change there.

Also, in order to force a major version bump, you can add a ! to the title like this: feat!: Support explicit DDL batching

(Also note that the title of the PR will be included in the release notes, and as such is intended for public consumption. We should therefore try to keep that title as descriptive as possible for external users so they understand what the change is.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added description and changed title to force a major version bump

self._batch_ddl_executor = BatchDdlExecutor(self)

@check_not_closed
def execute_batch_ddl_statement(self, parsed_statement: ParsedStatement):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this method to be public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment that its an internal method to this and other methods as well

google/cloud/spanner_dbapi/connection.py Show resolved Hide resolved
if auto_commit:
self._conn.autocommit = True

self._cursor.execute("start batch ddl")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make more sense if this statement already causes an error when auto_commit=False, instead of letting the error only happen when run batch is executed. Or is there some scenario where this could become a valid DDL batch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what if during start batch ddl auto_commit was True but at run batch it was False. The reverse is not possible as if it was False during start batch ddl then the connection is in transaction and one cant change the auto commit mode when a transaction is in progress

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that my main takeaway from that is: It should not be possible to change the value of auto_commit while a batch is in progress. That should be the case for both for DDL batches and DML batches.

Take for example DML batches, and assume the following script:

set auto_commit=True;
start batch dml;
insert into foo (..);
set auto_commit=False;
insert into bar (...);
run batch;
rollback;

What should be the result of the above script?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change to not allow changing autocommit mode when a batch is in progress. Also raising error when starting a ddl batch when connection is not in autocommit mode

Comment on lines 494 to 496
self._cursor.execute("DROP TABLE Table_1")
assert table_2.exists() is True
self._cursor.execute("DROP TABLE Table_2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: combine the two DROP TABLE statements into one batch, that is more efficient to execute on Spanner. (So first verify that both tables exist, and then drop them)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@pytest.mark.parametrize("autocommit", [True, False])
def test_ddl_execute(self, autocommit, dbapi_database):
"""Check that DDL statement results in successful execution for execute
method in autocommit mode while it's a noop in non-autocommit mode."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should include the fact that the connection also uses buffered DDL. And that again means that the DDL statement is not a no-op (I think), but it is being buffered. And the test should then verify tha the statement is actually buffered, and that it is not really a no-op.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1196,7 +1232,7 @@ def test_ddl_executemany_autocommit_false(self, dbapi_database):
table = dbapi_database.table("DdlExecuteManyAutocommit")
assert table.exists() is False

def test_ddl_execute(self, dbapi_database):
def test_ddl_then_non_ddl_execute(self, dbapi_database):
"""Check that DDL statement followed by non-DDL execute statement in
non autocommit mode results in successful DDL statement execution."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also update the comments of these test cases, that they require buffer_ddl=True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1339,31 +1375,23 @@ def test_json_array(self, dbapi_database):
op = dbapi_database.update_ddl(["DROP TABLE JsonDetails"])
op.result()

@pytest.mark.noautofixt
def test_DDL_commit(self, shared_instance, dbapi_database):
def test_DDL_commit(self, dbapi_database):
"""Check that DDLs in commit mode are executed on calling `commit()`."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this also is only true if buffer_ddl=True. In other cases, this would cause an error (?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats correct. Added the comment

@ankiaga ankiaga changed the title feat: Implementation of batch ddl in dbapi feat!: Implementation of batch ddl in dbapi Feb 5, 2024
"""Executes all the buffered statements on the active dml batch by
making a call to Spanner.

This method is internal and not for public use as it can change anytime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here and elsewhere

Suggested change
This method is internal and not for public use as it can change anytime.
This method is internal and not for public use

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be that you have not pushed that change yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I did not. Thanks done

Comment on lines 140 to 148
If the connection is in auto commit mode then this flag doesn't have any
significance as ddl statements would be executed as they come.

For connection not in auto commit mode:
If enabled ddl statements would be buffered at client and not executed
at cloud spanner. When a non ddl statement comes or a transaction is
committed then all the existing buffered ddl statements would be executed.

If disabled then its an error to execute ddl statement in autocommit mode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If the connection is in auto commit mode then this flag doesn't have any
significance as ddl statements would be executed as they come.
For connection not in auto commit mode:
If enabled ddl statements would be buffered at client and not executed
at cloud spanner. When a non ddl statement comes or a transaction is
committed then all the existing buffered ddl statements would be executed.
If disabled then its an error to execute ddl statement in autocommit mode.
This flag determines how DDL statements are handled when auto_commit=False:
1. buffer_ddl_statements=True: DDL statements are buffered in the client until the
next non-DDL statement, or until the transaction is committed. Executing a
non-DDL statement causes the connection to send all buffered DDL statements
to Spanner, and then to execute the non-DDL statement. Note that although the
DDL statements are sent as one batch to Spanner, they are not guaranteed to be
atomic. See https://cloud.google.com/spanner/docs/reference/rpc/google.spanner.admin.database.v1#google.spanner.admin.database.v1.UpdateDatabaseDdlRequest
for more details on DDL batches.
2. buffer_ddl_statements=False: Executing DDL statements is not allowed and an
error will be raised. When the connection is in auto_commit=False mode, a
transaction is active, and DDL statements cannot be executed in a transaction.
This flag is ignored when auto_commit=True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

google/cloud/spanner_dbapi/cursor.py Show resolved Hide resolved
@@ -152,7 +154,7 @@ def test_commit_exception(self):
try:
self._conn.commit()
except Exception:
pass
self._conn.database._pool._sessions.get()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, would you mind adding a comment to the code here to explain why we are doing this.

if auto_commit:
self._conn.autocommit = True

self._cursor.execute("start batch ddl")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that my main takeaway from that is: It should not be possible to change the value of auto_commit while a batch is in progress. That should be the case for both for DDL batches and DML batches.

Take for example DML batches, and assume the following script:

set auto_commit=True;
start batch dml;
insert into foo (..);
set auto_commit=False;
insert into bar (...);
run batch;
rollback;

What should be the result of the above script?

@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Feb 9, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Feb 9, 2024
@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Feb 12, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/python-spanner API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants