New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transactions via an internal global session var #2569
base: master
Are you sure you want to change the base?
Transactions via an internal global session var #2569
Conversation
Still lots of questions...
* Maintain state of session when entering and exiting transaction * Update any collection methods used in mongoengine that accept session * Update any database methods used in mongoengine that accept session * Cascades should be accounted for as those methods are wrappers which call the same low-level pymongo colletion APIs updated above * Any signals that try to read or write should also be wrapped in the transaction
Something important for us to keep in mind: pymongo is extremely upfront that sessions are neither fork-safe nor thread-safe. At minimum the docs would need to mirror this, but from an implementation stand point, minor protections could be added here by leveraging python thread's local storage. That said...it wouldn't actually provide any real protections in a threaded/forked environment, given that behind the scenes pymongo is still the one in control. |
Whoops - forgot to give travis the "replicaSet" treatment! |
OK - I see what's going on - per the docs
So - the test suite is accurate! 😄 I'm testing against a more recent version of Mongo, which is why the tests pass locally. I think it makes sense to bubble up the exception to a user that they're violating the expectations of the version of Mongo they're using. Should I just have a conditional assertion in my tests that this case should throw an exception for versions<4.4 but >=4.4 the operation is allowed and successful? |
Digging through the repo, looks like this would be an acceptable approach. |
* Mongo 3 doesn't support transactions * Mongo 4 does, but 4.4 specifically started allowing colleciton creation in a transaction
This update accounts for the fact that at least one workflow runs against Mongo 4.0.23
is there an estimate on when this might be merged into master? |
pymongo added support for transactions in 3.7
Apologize for the delay, I had a first look at the proposal and it seems to be going in the right direction. From what I understand the Session is indeed documented as not thread safe, so you can't share a same Session among different threads, but you can let different threads instantiate their own session. So using thread locals should do the trick here. Each thread would create its own Session as needed. We would also need a test that simulates this (I would mix ~ 10 threads, with random delays, some of them saving objects, some of them aborting due to an exception), then assert at the end that only expected items were saved. |
No apologies necessary! Thanks for the feedback - I'll try to address them as soon as I can. Great idea about thread locals, btw! |
Use an extension of `threading.locals` to handle initialized sessions within a thread to ensure session isolation from other threads. Also, given that pymongo supports nested sessions + nested transactions, I added a deque to track the "current" session state and mimic this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @juannyG, PyMongo maintainer here, this is great work! I'm adding a few thoughts on this feature and how it can be improved in the future.
-
threadlocal is good but using contextvars would be better. Since contextvars requires Python 3.7+ we can rewrite it once mongoengine drops support for 3.6.
-
It would be good to add an api to set a session without starting a transaction, that way users can use sessions without transactions. This can be added as a follow on feature. like:
with run_with_session() as session: Model1.objects.get() Model2.objects.get()
-
The current implementation will reuse the local session even when using a different connection alias. I believe this is fine since pymongo will raise an error but it would be great to document the limitation so users have a reference point when they encounter this error.
-
I do think we should seriously consider abstracting session.with_transaction (added in pymongo 3.9) rather than start_transaction(), otherwise mongoengine users will need to write complex retry logic. We could add this version of
run_in_transaction
and later introduce a second methodrun_with_transaction
but I believe it will be confusing to have two apis. The big difference api-wise is that with_transaction accepts a callable instead of being a context manager. I propose we merged this PR and then update therun_in_transaction
api in a follow PR to use with_transaction, like this:def run_in_transaction( callback, alias=DEFAULT_CONNECTION_NAME, session_kwargs=None, transaction_kwargs=None ): conn = get_connection(alias) session_kwargs = session_kwargs or {} with conn.start_session(**session_kwargs) as session: transaction_kwargs = transaction_kwargs or {} transaction_kwargs["callback"] = callback _set_session(session) try: session.with_transaction(**transaction_kwargs): finally: _clear_session()
Then the updated api would be used like this:
def callback(): a_doc.update(name="a2") b_doc.update(name="b2") run_in_transaction(callback)
mongoengine/connection.py
Outdated
@@ -264,6 +267,8 @@ def disconnect(alias=DEFAULT_CONNECTION_NAME): | |||
if alias in _connection_settings: | |||
del _connection_settings[alias] | |||
|
|||
_local_sessions.clear_all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this? I would expect the session to only be removed from threadlocal when exiting the run_in_transaction block. Otherwise a session could unexpectedly disappear within a block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah...in retrospect I think I was trying to protect from the very weird use case of: someone calls disconnect
inside of a transaction.
This is definitely over board for that use case. I think, instead, it's OK to say "don't call disconnect in a transaction - this will result in undefined behavior"
mongoengine/context_managers.py
Outdated
@contextmanager | ||
def run_in_transaction( | ||
alias=DEFAULT_CONNECTION_NAME, session_kwargs=None, transaction_kwargs=None | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a docstring here would be nice. (Could be done in a follow up PR.)
mongoengine/pymongo_support.py
Outdated
@@ -67,7 +71,7 @@ def count_documents( | |||
def list_collection_names(db, include_system_collections=False): | |||
"""Pymongo>3.7 deprecates collection_names in favour of list_collection_names""" | |||
if PYMONGO_VERSION >= (3, 7): | |||
collections = db.list_collection_names() | |||
collections = db.list_collection_names(session=connection._get_session()) | |||
else: | |||
collections = db.collection_names() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing session here.
tests/test_context_managers.py
Outdated
assert "b2" == B.objects.get(id=b_doc.id).name | ||
|
||
@requires_mongodb_gte_44 | ||
def test_collection_creation_via_upsersts_across_databases_in_transaction(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upsersts -> upserts
tests/test_context_managers.py
Outdated
pass | ||
except pymongo.errors.OperationFailure as op_failure: | ||
""" | ||
If there's a TransientTransactionError, retry - the lock could not be acquired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is automatically handled by session.with_transaction (added in pymongo 3.9) and is a good example of why we probably want to use with_transaction() rather than start_transaction(). Otherwise we'll be forcing mongoengine users into the painful act of diagnosing these errors and writing this retry logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - to be frank, this test was a pain 🤣
Thank you so much for the feedback @ShaneHarvey Great call about using I do see an implication with this approach: if you want to use mongoengine transactions, you have to be using |
* Don't clear out all sessions when disconnecting one alias * Add missing session kwargs * Fix `upserts` typo
Include warning about trying to run a transaction across different connections
I believe I've addressed the feedback provided, but do let me know if anything else comes to mind. I'm more than happy to do a follow up PR modifying the API to use My only concern with doing it as a follow up is that it would be a non-backwards compatible change and the the implication I mentioned above ( Looking forward to your thoughts! |
This PR #2768 |
Yes I totally agree that it's better done in this PR to avoid breaking changes or code churn.
This is okay. You could even check the pymongo version at runtime (like mongoengine does here) and give a nice error message if you like. |
pymongo documentation explicitly states the callback of `with_transaction` should not start new transactions
It's not a context manager anymore!
I've completed the modifications so that we now abstract The other additional change I made was to move Let me know if anything else comes to mind! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests/test_connection.py
Outdated
is now how mongod is run for the test suites for transactions. | ||
""" | ||
connect( | ||
"mongoenginetest2", alias="t2", host="127.0.0.1", directConnection=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change still needed? It seems unrelated to the run_in_transaction feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this test I needed to make a change in order for it to pass. It seemed to me that we needed to account for how pymongo deals with connections to replica sets (which is needed for transactions) as of >= 4.0 https://pymongo.readthedocs.io/en/stable/migrate-to-pymongo4.html#mongoclient
Without it, this assertion fails because mongo_connections["t2"].address[0]
is equal to localhost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thanks for the explanation. The issue is that without directConnection=True, the client will discover the real server address of the replica set member so mongo_connections["t2"].address[0]
is equal to localhost
not 127.0.0.1
. This seems fine, although the test could be rewritten to just do this instead of checking the address property:
assert mongo_connections["t1"] is not mongo_connections["t2"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 great call. I went ahead and made this update as well
Not a context mgr anymore
again apologize for the delay, I'm finally getting back to this. @ShaneHarvey I find the "callback api" (i.e I agree that the retry logic is great to abstract from end users but Pymongo should provide the most flexible and pythonic approach. Using callback and lambda's also interfer with mypy so unless I'm missing something it really provides no advantages... In fact if Pymongo would offer the "context manager API" (that would include the retry logic), we could easily write a "callback" wrapper in just a few lines of Python but we can't do the opposite and derive a context manager out of the "callback api" which is a pity... |
@bagerard Thank you for the feedback. Would you prefer the implementation I had prior to the callback modification - or is there something else you have in mind? |
PyMongo does have a context manager API but the drawback is that is not possible to have retries with such an API as mentioned above. Retries are not possible because the context block can only ever be run once; a context manager can not re-run the block it contains. This is why we built the with_transaction api that accept a callback that can be executed multiple times if needed. |
I missed that it was actually re-calling the method 😬 , I understand now, thanks for (re)-clarifying. In practice when you just chain a bunch of pure MongoDB operations, using the callback thing is a perfect fit. When used in a more involved application, like a complex web application (which is what we are building on top of MongoEngine at work), I don't think it would be a sane default to just retry a chain of calls, simply failing the overall transaction and API call sounds like a much much safer thing to do (otherwise you must be careful that first execution is not going to leave stuff behind that could interfer with the second execution). Other obvious issue with this automatic re-call, is that pymongo is silently doing that, without even logging the error or the fact that it's retrying 😬 , again makes sense for a chain of pure db operations, but a nightmare to debug in a more complex setup. Long story short, I think it's a must to provide a simple context manager API for opening the transaction in MongoEngine, without any default retry logic. Next to that we can also support the "callback-api" but it's a nice to have IMO. |
I understand this concern but I don't think this is as big of a concern as you say since the primary use case for transaction is a bunch of pure MongoDB operations. Doing other work inside the transaction extends its lifespan and increases the chance of write conflicts. Also, basically all users will need to write retry logic so the same problem still exists in either API.
We are actually working on adding logging into PyMongo right now: https://jira.mongodb.org/browse/PYTHON-3113 The huge benefit to using the callback approach is that it makes applications much more resilient to transient errors (which are very common when using transactions). Users will be much less likely to have to write custom and error prone transaction retry logic. By opting for a context api you'll end up with a lot of users scratching their heads diagnosing transient write conflict errors. I would say that if we do end up adding two APIs, then the callback based one should still be the recommended approach in the docs and examples. The context based one could be a "power user" feature for those who want to customize their own retry behavior. |
transaction_kwargs["callback"] = callback | ||
_set_session(session) | ||
try: | ||
session.with_transaction(**transaction_kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyMongo's with_transaction returns the callback's value so I believe this should be return session.with_transaction(...)
. Otherwise it becomes unwieldy to return a value.
One other idea: if you're very concerned about the retry behavior, one alternative is to expose a configurable knob like this: run_in_transaction(callback, retry_enabled=False/True) That would at least let you expose a single txn api instead of 2 different ones. I'm ambivalent but it's worth a thought. |
I'm expecting MongoEngine to be used mostly in web applications and users will be opening their transactions quite early, wrapping all their business logic. In the middle of the chain of calls we may have disk operations, network calls, use of globals/threads/threadlocals, stateful objects, etc. Safe repetition of call is not something you get out of the box so I can't recommend that setup in a general purpose ORM like MongoEngine. That being said, I don't have much experience with transactions in Mongo, is there anything in MongoDB transactions that is different from SQL transactions and that would make them fail more often? We don't see such callback api with repeated calls for dealing with transactions in extremely popular framework like SQLAlchemy or Django, how do you explain this? |
Hi yall. is there any update on this MR? Is there anything I can do to help push this along? |
I was waiting to see if there would be a response to @bagerard 's last question - it seemed to me that we were still waiting on alignment. All that said: @bagerard to keep things moving, I can roll this back to a state that I think reflects more of what you were expecting wrt "simple context manager without retry logic". More than happy to get that done. The test cases also include examples of how a user would have to deal with retrying transaction failures. |
An implementation of one of my proposals, resolves #2248 - Add support for transactions using global sessions
same low-level pymongo collection APIs updated above
transaction