Skip to content
This repository has been archived by the owner on Mar 22, 2023. It is now read-only.

FEAT: transactions #810

Open
igchor opened this issue Oct 6, 2020 · 5 comments
Open

FEAT: transactions #810

igchor opened this issue Oct 6, 2020 · 5 comments

Comments

@igchor
Copy link
Contributor

igchor commented Oct 6, 2020

FEAT: transactions

Rationale

This is an alternative solution to batch_put: #708 more similar to what RocksDB offers.

Description

The idea is to expose a transaction class which can be used to group puts (and maybe also removes and updates?) into one atomic action.

API Changes

C++

/* For pmemobj transactions we can only have one tx per 
 * thread but having this as an explicit structure 
 * (instead of relying on tls and closure-like tx is a 
 * more generic solution imho). Also, this is quite similar
 * to RocksDB api.
 */
class transaction { // or class batch
    status put(string_view key, string_view value);

    /* (Optional? what about isolation?) */
    status remove(string_view key);

    status commit();
    void abort();
};

class db {
    transaction begin_transaction(void* options); // TODO: should options be struct or config?
};

C

pmemkv_tx* pmemkv_tx_begin(pmemkv_db *db, void *options);
status pmemkv_tx_put(pmemkv_tx* tx, const char* k, size_t kb, const char* v, size_t vb);
status pmemkv_tx_remove(pmemkv_tx *tx, const char *k, size_t kb); // OPTIONAL
status pmemkv_tx_commit(pmemkv_tx *tx);
void pmemkv_tx_abort(pmemkv_tx *tx);

Implementation details

/* There are several ways we can implement put for the transactions.
 * We can perform in-place updates using undo logs - this is pretty
 * simple for single threaded engines (for concurrent ones we
 * would need to delay modification to commit - once we have all the locks
 * since only then we can start the obj transaction). However this would mean
 * that all non-transactional operations (e.g. from the same thread) would
 * see the non-commited state (which is inconsistent with RocksDB) unless we ban such operations...
 * Moreover in this approach, for non-existent elements allocations are done in a
 * separate transactions (since we cannot call cmap.insert() in a tx) which
 * has performance penalty.
 * 
 * An alternative solution would be to start a obj tx in pmem::kv::transaction constructor.
 * On put, we can allocate appropriate node type and put it in a persistent_tls
 * (which will act as redo log). On pmem::kv::transaction::commit() we first commit
 * the changes to redo log and finish the transaction. After that, we iterate over
 * the redo log and insert the elements in the map (we can also acquire all the locks
 * first but since we do not have a snapshot isolation for reads I believe this is not
 * necessary) using something like insert(node_type&&) which will just replace the pointers.
 * 
 * For certain engines (e.g. csmap) we cannot just allocate the node and replace the existing
 * ones in the datastructure (since we do not have garbage collection). However, we can still
 * allocate the value (pmem::obj::string) and just call insert_or_assing(std::move(key), std::move(value));
 * 
 * This approach means we have more allocations (even if the node already exists we will
 * allocate anyway) but all allocations will be in the same tx so the cost should be
 * acceptable.
 */
transaction::transaction() {
    this->obj_tx = new pmem::obj::transaction::manual();

    persistent_tls[tx.id] = make_persistent()
}

transaction::put(string_view key, string_view value) {
    persistent_tls[tx.id].push_back(make_persistent<map::node_type>(key, value));
}

transaction::commit() {
    pmem::obj::transaction::commit();
    delete this->obj_tx;

    // Optional? We do not need full isolation since we do not have range reads nor
    // snapshot isolation.
    // std::vector<lock> locks;
    // for (auto &e : persistent_tls[tx.id]) {
    //     auto it = map.find(e.key);
    //     if (it != map.end())
    //         locks.push_back(it->second.lock);
    // }

    for (auto &e : persistent_tls[tx.id]) {
        auto ret = map.insert(std::move(e));
        // Optional?
        // if (ret.second) // element inserted
        //   locks.push_back(ret.first->second.lock)
    }

    pmem::obj::transaction::run(pop, [&]{
        delete_persistent(persistent_tls[tx.id]);
    });
}
@karczex
Copy link

karczex commented Oct 7, 2020

I suggest to expand this API with transactional operations inside lambda expression, and get of not yet committed key:

...
auto *kv = new pmem::kv::db() ;
kv->open("enigne", std::move(cfg));
auto s = kv->transaction([&](pmem:::kv::transaction t) {
        t-> put(key, value); //commit when lambda expression ends
        kv->get(key); // returns value as is outside transaction 
        t->get(key); // returns value submitted by transactional put. This may be very useful if you update part of value, and want to abort on some conditions.   
    }); //commit 

@igchor
Copy link
Contributor Author

igchor commented Oct 7, 2020

@karczex one thing to consider for closure transaction is nesting. For current engines I think we should just allow for single tx per thread but we can relax this in future.

When we have an explicit transaction class I would like to be able to do something like this:

auto tx1 = db.begin_transaction();
tx1.put("k1", "v1"):

auto tx2 = db.begin_transaction();
tx2.put("k2", "v2"):

tx2.commit()

// tx1 is not commited yet, "k1" is not visible

tx1.commit();

For closure-like tx we could either do the same thing (e.g. inner tx is independent) or we flatten the transactions like in C++ (so that, the commit is actually delayed to the outer tx commit). I would prefer the first approach but it will not be consistent with libpmemobj-cpp...

@igchor
Copy link
Contributor Author

igchor commented Oct 7, 2020

As we talked on the meeting, we can just ban from using nested transactions in case of lambdas. This is probably the simplest approach.

@pbalcer
Copy link
Member

pbalcer commented Oct 12, 2020

What you should think about is read or read for write type operations for transactions. If those aren't possible due to technical reasons, I'm not sure if there's much of a benefit to having transactional API vs a simple batch put.

@igchor
Copy link
Contributor Author

igchor commented Oct 12, 2020

@pbalcer I was thinking about exposing read/read_for_update via iterators (see read_range in #809). So, there would be no explicit get() method for a transaction but you could mix put's and updates (via iterators) in a single tx. Moreover, with transnational API, we can also mix put's and remove's (we don't have to implement it now, but we can extend that in future).

Anyway, I think the batch_put (in C API at least) would actually look pretty similar. We would have to expose some batch structure (instead of transaction structure) and allow users to call put on that.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants