Skip to content

EP07 Database transaction API

Paul Haesler edited this page Oct 19, 2022 · 11 revisions

ODC-EP 07 - Database transaction API

Overview

Database transactions are currently handled internally to Index Drivers and not exposed to higher architectural levels.

This EP proposes an API to allow user-managed transactions. The API consists solely of new methods, and new optional arguments to existing methods where the default (i.e. not supplying the new arguments) corresponds to existing behaviour. This proposal is therefore fully backwards compatible.

This proposal is a more detailed writeup of an enhancement already proposed in the ODCv2 Road Map.

Proposed By

Paul Haesler (@SpacemanPaul)

State

  • Under Discussion
  • In Progress
  • Completed
  • Rejected
  • Deferred

Motivation

Currently, high-level indexing tools (e.g. in odc-tools) cannot make use of database transactions to ensure index consistency and have limited efficiency of bulk indexing because each dataset must be indexed in a separate database transaction.

Proposal

  1. A new abstract Transaction class.

    • Has abstract methods begin_transaction, commit, and rollback (intended for internal use - use as context manager preferred).
    • SHALL implement ContextHandler semantics (i.e. __enter__ and __exit__ functions.)
    • SHALL carry awareness of whether the transaction it represents is active, committed or rolledback.
    • SHALL carry methods to construct special exceptions that will be caught in __exit__ and trigger a commit or rollback: commit_exception and rollback_exception.
    • Implemented for postgres and postgis drivers (as wrappers around the existing PostgresDbAPI/PostgisDbAPI classes)
  2. A new abstract method on AbstractIndex: transaction()

    • Returns a Transaction context handler with an active transaction.
    • Implemented for postgres and postgis drivers
    • Raises an exception on Index Drivers that do not support transactions.
    • A link to the Transaction is maintained in thread-local storage by the index driver.
    • If there is an active transaction in thread-local storage, then all database access through that index driver from that thread is performed within the context of that transaction. (There is already a helper method) Code Examples:
# Context manager
with dc.index.transaction() as trans:
   # Archive old datasets and add new ones in single transaction
   dc.index.datasets.archive([old_ds1.id, old_ds2.id], transaction=trans)
   dc.index.datasets.add(ds1)
   dc.index.datasets.add(ds2)

   # If execution gets to here, the transaction is committed.
   # If an exception was raised by any of the above methods, the transaction is rolled back.
# Handling commits and rollbacks with exceptions
with dc.index.transaction() as trans:
   # Archive old datasets and add new ones in single transaction
   dc.index.datasets.archive([old_ds1.id, old_ds2.id])
   dc.index.datasets.add(ds1)
   if dont_need_ds2(ds1, ds2):
       # Commit immediately and exit context handler.
       raise trans.commit_exception("Shortcut commit")
   elif problematic(ds1, ds2):
       # Rollback immediately and exit context handler
       raise trans.rollback_exception("Shortcut rollback")
   elif really_problematic(ds1, ds2):
       # Other exceptions trigger rollback as well, but
       # are not caught by `__exit__` and so are propagated on
       # up the stack.
       raise ValueError("Uncaught exceptions trigger rollback too.")

   dc.index.datasets.add(ds2)
   # If execution gets to here, the transaction is committed.
# Manual low-level transaction management (not recommended, but supported)
trans = dc.index.transaction()
# Transaction is activated by ContextManager semantics - No `with` means we need to call begin manually.
trans.begin()
dc.index.datasets.update(ds1)
dc.index.datasets.update(ds2)
trans.rollback()
# ds1 and ds2 are NOT updated in the database

# No active thread-local transaction, so update() manages it's own internal transaction.
# (ds3 IS updated in the database)
dc.index.datasets.update(ds3)

# Manually start a new transaction:
trans.begin()
dc.index.datasets.update(ds4)
dc.index.datasets.update(ds5)
trans.commit()

# ds4 and ds5 ARE updated in the database
# Transaction is now inactive. No further cleanup required.

Feedback

  • Robert Woodcock (@woodcockr)

    • This looks good to me. Straightforward context and if need be manual handling. Provides the opportunity for the transaction benefit. Further optimisation can be performed through incremental changes to other code. I'm not overly familiar with all the database code in ODC these days but I am familiar with the impact of not having the transaction handling. Get's my vote.
  • @Kirill888 suggests using thread-local storage instead of the user managing the Transaction manually.

    • merged

Voting

Approved (with @Kirill888's amendment, now merged into this EP) in Steering Council meeting September 2022.

Enhancement Proposal Team

  • Paul Haesler (@SpacemanPaul)

Links

Clone this wiki locally