Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transaction API #2248

Open
bagerard opened this issue Jan 19, 2020 · 22 comments · May be fixed by #2569
Open

Transaction API #2248

bagerard opened this issue Jan 19, 2020 · 22 comments · May be fixed by #2569

Comments

@bagerard
Copy link
Collaborator

bagerard commented Jan 19, 2020

I'd like to open the discussion on the interface that we want to use for supporting the use of transactions/sessions within MongoEngine.

Items to keep in mind:

  • Session needs to be used in all queries (write/update obviously but also any read queries)
  • There can be multiple sessions used concurrently, (e.g: 1 per database, not sure we want to support multiple sessions within 1 database)
  • Need to keep in mind the signals (pre/post_save, etc) that could benefit from this
  • Need to keep in mind the deletion rules (CASCADE, etc) that could benefit from this
  • Need to keep in mind that different databases can be connected to different models (+things like switch_database)

Assuming a test class like

class Person(Document):
    name = StringField()

I see 3 options:

  1. Explicitly pass the session object to each and every queryset/queries
with start_session_with_transaction() as session:
    person = Person()
    person.save(session=session)
    
    Person.objects(name='test', session=session)
    Person.objects.aggregate([a_pipeline], session=session)
    Person.objects().update(name="Updated Test User", session=session)
    Person.objects.insert(Person(), session=session)

Pros:

  • Explicit

Cons:

  • Users are forced to pass the session around in the whole code base
  • Person.objects(name='test', session=session) looks a bit weird
  • Means that we need to account for contradictory cases like Person.objects(name='test', session=session1).update(name='test2', session=session2)
  1. Allow to bind a Queryset to a session with Document.objects().session(session)
with start_session_with_transaction() as session:
    person = Person()
    person.save(session=session)
    
    # Binds a session to a queryset
    qs = Person.objects().session(session)
    qs.filter(name='test')
    qs.count()
    qs.aggregate([your_fancy_pipeline])

    Person.objects().session(session).update(name="Updated Test User")
    Person.objects().session(session).insert(Person())

Pros:

  • more or less explicit

Cons:

  • Users are forced to pass the session around in the whole code base, every class involved in a request needs to be manually bound to the session
  • I find it error prone, its very easy to forget it
  • Person.objects().session(session).insert(Person()) looks quite ugly
  1. Implicitly bind all underlying operations to a session (à la django)
with run_in_transaction(alias="default"):
    person = Person()
    person.save()

    qs = Person.objects()
    (...)

Pros:

  • simple to use
  • compatible with existing code base - i.e: easy adoption by users
  • mimic django (django being an inspiration for this project)
  • less error prone since user only has to define a

Cons:

  • very implicit
  • does not comply with a repository pattern / unit of work approach
  • unclear how we'll manage different databases

Of course, we would provide a decorator in addition to the context manager.

Personally I initially thought that 2) would be the best but I quickly came across a few examples that looked unnatural, so I'd suggest to go for 3)

Let me know what you think! (especially @wojcikstefan @erdenezul but happy to gather more feedbacks)

@angieduan
Copy link

Option 3 looks better.

@xqzhou
Copy link

xqzhou commented Feb 29, 2020

Option 3 looks good

@luking
Copy link

luking commented Mar 16, 2020

I like option 3

@SuCice
Copy link

SuCice commented May 18, 2020

I like option 3 too

@ajdas-code
Copy link

I vote for option 3 too

@ibmboy19
Copy link

Vote for option 3!

@coughlinjake
Copy link

Is there any coding progress? I'm REALLY interested in using MongoDB's transactions.... so interested that I'd offer to code it but I'm still fairly new to Python.

@bagerard
Copy link
Collaborator Author

bagerard commented Aug 7, 2020

Thanks for your interest @coughlinjake so far, I've only do a small proof of concept. I was in fact waiting to have some decisions from the maintainer (ping @wojcikstefan). I'm pretty busy now but if I manage to start at some point on this, I'll check if we can share the workload

@ajdas-code
Copy link

ajdas-code commented Aug 7, 2020 via email

@ajdas-code
Copy link

ajdas-code commented Aug 17, 2020 via email

@rjrakshit24
Copy link

rjrakshit24 commented Nov 30, 2020

Hi @bagerard , I am not sure if I am on right track or not, but I modified the library for transaction support. Following is the snippet for using the session.

k=get_connection()
  with k.start_session() as mongosession:
    with mongosession.start_transaction():
      prevData=getToLogValue(dtype=cls._ParentClass._fields.get('XYZ'),value=p['XYZ'])
      prevData={cls._ParentClass._toForm.get('XYZ'):prevData}
      item.update(session=mongosession, **n)
      newData=getToLogValue(dtype=cls._ParentClass._fields.get('XYZ'),value=n['XYZ'])
      newData={cls._ParentClass._toForm.get('XYZ'):newData}
      itemLog=cls._ItemLog(user=user, action='updated', newData=newData, prevData=prevData, recId=item)
      itemLog.save(session=mongosession)

P.S.:

  • It is not mandatory to provide a session variable when you don't want sequential execution (transaction execution).
  • CASCADE operation is also handled in case of delete
  • Have used for read as well.

@byte-voyager
Copy link

byte-voyager commented Dec 31, 2020

Is there any coding progress? Hope to realize this feature. Thanks

@shellcodesniper
Copy link

option 3 will be gooooooood

mcsimps2 added a commit to mcsimps2/mongoengine that referenced this issue Mar 15, 2021
This allows for the use of the update and modify MongoEngine functions to use a transaction.

Relevant: MongoEngine#2248
@byte-voyager
Copy link

Will this feature be merged?

@amazingguni
Copy link

option 2 looks good for me

mcsimps2 added a commit to mcsimps2/mongoengine that referenced this issue Jun 1, 2021
The session argument in the `save` function allows document insertions and updates within a transaction that go through the `save` method.  Note this does not enable transactions for the `update`, `modify`, and `delete` methods on the Document class.

Relevant: MongoEngine#2248
mcsimps2 added a commit to mcsimps2/mongoengine that referenced this issue Jun 1, 2021
The session attribute is used in _cursor_args to pass a transaction session to PyMongo whenever a queryset is iterated over.  It also replaces the session argument to the update and modify functions.  This means the following functions now support transactions: `insert, update, modify, __iter__, in_bulk`.  `delete` does not support sessions yet.

Relevant: MongoEngine#2248
mcsimps2 added a commit to mcsimps2/mongoengine that referenced this issue Jun 1, 2021
The session attribute is used in _cursor_args to pass a transaction session to PyMongo whenever a queryset is iterated over.  It also replaces the session argument to the update and modify functions.  This means the following functions now support transactions: `insert, update, modify, __iter__, in_bulk`.  `delete` does not support sessions yet.

Relevant: MongoEngine#2248
mcsimps2 added a commit to mcsimps2/mongoengine that referenced this issue Jun 2, 2021
A TransactionSession can be used as a context manager to wrap a series of statements that need to be performed as part of a transaction.  It will automatically commit upon exiting the context successfully or rollback if an error is encountered.  Recommended usage is 
```
with TransactionSesssion() as session: 
  User.objects.session(session).update(...)
```

Related: MongoEngine#2248
@juannyG
Copy link

juannyG commented Aug 30, 2021

I've been playing around with an implementation for option 3, without getting very deep into the guts of mongoengine (yet 😈 ) and I have some interface/usage thoughts/questions based on poking I've done.

Transactions are fine to run across multiple DBs.

from mongoengine import *

conn1 = connect()
db1 = conn1.db1
db1.test1.insert_one({'foo1': 'bar1'})

db2 = conn1.db2
db2.test2.insert_one({'foo2': 'bar2'})


with conn1.start_session() as session:
    with session.start_transaction():
        db1.test1.insert_one({'foo1-1': 'bar1-1'}, session=session)
        db2.test2.insert_one({'foo2-2': 'bar2-2'}, session=session)

The primary go/no-go point is the client instance (aka connection). So if you tried to run a transaction against two collections/databases across two completely different client instances - no good:

from mongoengine import *

conn1 = connect()
db1 = conn1.db1
db1.test1.insert_one({'foo1': 'bar1'})

conn2 = connect(alias='other-default', port=27018)
db2 = conn2.db2
db2.test2.insert_one({'foo2': 'bar2'})

with conn1.start_session() as session:
    with session.start_transaction():
        db1.test1.insert_one({'foo1-1': 'bar1-1'}, session=session)
        db2.test2.insert_one({'foo2-2': 'bar2-2'}, session=session)

InvalidOperation: Can only use session with the MongoClient that started it

So that's good - internally, there are already checks that sessions are "consistent."

Fun fact - this doesn't raise an exception and goes through fine:

with conn1.start_session() as session:
    with session.start_transaction():
        db1.test1.insert_one({'foo1-1': 'bar1-1'}, session=session)
        db2.test2.insert_one({'foo2-2': 'bar2-2'}, session=None)

Implementation samples

Global _session instance

_session = None

def _set_session(session):
    global _session
    _session = session


def _get_session():
    global _session
    return _session


@contextmanager
def run_in_transaction(alias=DEFAULT_CONNECTION_NAME):
    conn = get_connection(alias)
    with conn.start_session() as session:
        with session.start_transaction():
            _set_session(session)
            yield
            _set_session(None)

def disconnect(alias=DEFAULT_CONNECTION_NAME):
    # normal disconnect stuff...
    _session = None

Sessions by alias

_sessions = {}

def _set_session(session, alias=DEFAULT_CONNECTION_NAME):
    global _sessions
    _sessions[alias] = session


def _get_session(alias=DEFAULT_CONNECTION_NAME):
    global _sessions
    return _sessions.get(alias)


@contextmanager
def run_in_transaction(alias=DEFAULT_CONNECTION_NAME):
    conn = get_connection(alias)
    with conn.start_session() as session:
        with session.start_transaction():
            _set_session(session, alias)
            yield
            _set_session(None, alias)


def disconnect(alias=DEFAULT_CONNECTION_NAME):
    # normal disconnect stuff...
    if alias in _sessions:
        del _sessions[alias]

Analysis

From a complexity stand point - I believe a global session would easier to manager, because

  • deeper within the mongoengine machinery, we'll only need to provide pymongo the appropriate session kwarg via session=_get_session()
  • on disconnect we simply set the _session var back to None

Given the cross-client protections pymongo provides, there's really only one situation that I can think of that would warrant maintaining sessions by alias - someone starts a transaction for conn1 and within it calls disconnect(alias="conn2"):

with conn1.start_session() as session:
    with session.start_transaction():
        db1.test1.insert_one({'foo1-1': 'bar1-1'}, session=session)
        disconnect(alias='conn2')
        // ... more db1 session things

But...this seems really weird to me though. My toy example above can be resolved with moving the disconnect out of the context block. The more complex case is the idea that there is a conditional within the transaction that, when met, would require closing the connection to another host. Again - this seems like an odd outlier case that I'm not sure warrants tracking sessions via connection.

If we were to go about tracking sessions via alias I think the primary point of complexity occurs in the lower-level mongoengine API for querysets and documents (and anything else we'd want to make "session conscious") - how does one provide the _get_session(alias=...) method the correct alias value?

While digging around in these lower levels - it does not seem easy to get the alias. That said:

  • I have found that a document has a _get_db method.
  • the pymongo Database object returned has access to the pymongo Client
  • the alias could be retrieved by iterating over the connections and returning the alias for the matching Client

I know this is a lot - and I welcome any and all questions. I've just been thinking about this pretty intensely for the past few days and I wanted to get my ideas out there, so (1) I don't forget all of this and (2) in case anyone has any feedback that might push me in different directions.

@juannyG
Copy link

juannyG commented Sep 5, 2021

I deleted my last comment regrading coupling of databases/connections 🤦 - missed something simple and I went down a rabbit hole - sorry for the noise ❤️

@juannyG juannyG linked a pull request Sep 25, 2021 that will close this issue
@daillouf
Copy link

@juannyG I would just like to push a very pragmatic, almost non technical argument :

  • an ORM is a huge feature, 95 % of all projects will require it
  • of all work done with an orm, quite a lot can be done without transactions, but also, quite a lot of the same work would be done better with transactions, I’d say it’s more than 10 % of use cases that could benefit from it.
  • of all work done with transactions, only a few will need to be done across different mongo connections. It certainly will happen, and it will certainly be asked for. Simply because it’s murphy’s law, anything that can happen will happen, but, it’s probably going to be 10% or less of all work using transactions, and I think even less.

So, while it certainly will be needed in the future, it’s not as important to manage connections as having a transaction API.

You’ll know best if the effort is worth it, I just saw you were wondering about implementing it in your PR, so this is just to say that I think that, yes session-by-connection will be asked for, but it’s a corner case

@ShaneHarvey
Copy link
Contributor

My vote would be for 3) as well but instead of a global session, consider using a ContextVar so that multiple threads can execute transactions concurrently:

from contextvars import ContextVar

_session = ContextVar("_session", default=None)

def _set_session(session):
    if _session.get():
        raise InvalidOperation("nested run_in_transaction calls are not allowed")
    return _session.set(session)

def _get_session():
    return _session.get()
...

You may also want to consider using the ClientSession.with_transaction api because it handles retries automatically.

@vainu-jussi
Copy link
Contributor

vainu-jussi commented Oct 20, 2022

My 2 cents:

Are these options mutually exclusive?

I think you would like have option 2 anyways and it's quite easy to implement. It goes in line with pymongo==mongo interfaces and is a lot less magical. I suggested that this would be done first. If some one would do this would be open to merging it in? As this issue has ben open for almost three years we could just implement the simple one?

Option 3 is the more complex and magical and I can image there are many pitfalls that are not instantly obvious. Accidentially wrapping too much in one session etc.

@vainu-jussi
Copy link
Contributor

One more point. Sometimes you don't need transaction, but just explicit session. Right now it's not possible run long queries with mongoengine at all as cursor will timeout after 30 minutes even with no_cursor_timeout=True.

https://www.mongodb.com/docs/manual/reference/method/cursor.noCursorTimeout/#session-idle-timeout-overrides-nocursortimeout

If a session is idle for longer than 30 minutes, the MongoDB server marks that session as expired and may close it at any time. When the MongoDB server closes the session, it also kills any in-progress operations and open cursors associated with the session. This includes cursors configured with noCursorTimeout() or a maxTimeMS() greater than 30 minutes.

@RempelOliveira
Copy link

Hi everyone, it's already 2024 and this seems to be a forgotten topic, any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.