future plans (geo, kafka, wal, map-reduce) #38

glycerine · 2016-11-21T16:40:20Z

I watched @SergejJurecko's excellent talk on ActorDB from February 2016 here:

video: https://www.percona.com/resources/videos/actordb-alternative-view-distributed-database

slides: https://www.percona.com/live/data-performance-conference-2016/sites/default/files/slides/ActorDB.pdf

In the future plans slide, there are a number of exciting features discussed:

1. Geo replication
2. Kafka like pubsub
3. WAL for LMDB
4. Map-reduce (luajit)

Could you comment on the state of these? In particular, I would find (2) and (4) useful, and the kafka-like pubsub very useful.

Depending on the state of the implementation, I could be interested in contributing, though these days I mostly write Go. I've worked with a couple of pub-sub systems in the past, namely mangos (https://github.com/go-mangos/mangos) and NATS (nats.io). I've also written my own distributed job scheduler, which is half the work of map-reduce (github.com/glycerine/goq).

SergejJurecko · 2016-11-21T17:56:59Z

At the time of that talk I had a queue implementation that would have been Kafka-like, but I decided to scrap it for something better.

For the last few months I have worked on a general-purpose C library that can be used to implement a Kafka-like server, a WAL for a database, or anything else that needs to write to disk really fast in a FIFO manner.

It should result in an order of magnitude better performance. Worker threads no longer touch any IO directly and do not require mutexes. When an actor writes data to the WAL, it is suspended using libctx. Once the write is done, it is switched back in for processing.

In the meantime, the worker thread is free to do processing for other actors, so workers never block on IO. There is a dedicated write IO thread that uses the async IO primitives of the local system. There is a pool of threads for blocking operations (like reading from LMDB).

All in all, a design that is completely hardware scalable. The more cores or disk IO you throw at it, the better it will run. Threads no longer get in each other's way. Allocations are predictable, and thread synchronization is just a lock-free queue.

The library works well and it already works with SQLite. What's missing is Raft replication for it. For that I will almost certainly use https://github.com/willemt/raft

Once the replication aspect is finalized it will be open sourced.

After that, some decisions must be made about what the next iteration of ActorDB will look like. I am leaning towards Rust for the actual fast path, that is, receiving requests (thrift/mysql protocol), parsing, and actor execution+replication. The higher-level stuff like shard balancing would be nice to keep as is. Nothing wrong with using a high-level (slower) language for it.

As for the map-reduce bit: it seems to me that Apache Spark's RDD concept is the way of the future. An RDD implementation as a library (with a C API) seems to me like a better way to go.
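To make the RDD idea a bit more concrete, here is a toy single-machine sketch in Go (1.18+ for generics). It only shows the core split between lazy transformations (`Map`, `Filter`) and actions (`Reduce`) that trigger evaluation. Partitioning, distribution and lineage-based recovery, which are what make real RDDs interesting, are left out. `Dataset` and the function names are hypothetical and do not correspond to Spark's API or to any planned ActorDB API.

```go
package main

import "fmt"

// Dataset is a toy stand-in for an RDD: it holds a recipe for producing
// its elements, not the elements themselves.
type Dataset[T any] struct {
	each func(yield func(T))
}

// FromSlice wraps an in-memory slice as the source of a pipeline.
func FromSlice[T any](xs []T) Dataset[T] {
	return Dataset[T]{each: func(yield func(T)) {
		for _, x := range xs {
			yield(x)
		}
	}}
}

// Map is a lazy transformation: nothing runs until an action is called.
func Map[T, U any](d Dataset[T], f func(T) U) Dataset[U] {
	return Dataset[U]{each: func(yield func(U)) {
		d.each(func(x T) { yield(f(x)) })
	}}
}

// Filter is a lazy transformation that keeps matching elements.
func Filter[T any](d Dataset[T], keep func(T) bool) Dataset[T] {
	return Dataset[T]{each: func(yield func(T)) {
		d.each(func(x T) {
			if keep(x) {
				yield(x)
			}
		})
	}}
}

// Reduce is an action: it runs the whole pipeline and folds the results.
func Reduce[T, A any](d Dataset[T], init A, f func(A, T) A) A {
	acc := init
	d.each(func(x T) { acc = f(acc, x) })
	return acc
}

func main() {
	nums := FromSlice([]int{1, 2, 3, 4, 5})
	squares := Map(nums, func(x int) int { return x * x })
	odd := Filter(squares, func(x int) bool { return x%2 == 1 })
	sum := Reduce(odd, 0, func(a, x int) int { return a + x })
	fmt.Println(sum) // 1 + 9 + 25 = 35
}
```

Because each stage only describes work, a library can decide later where and how to run it, which is what makes the model a good fit for a C API sitting on top of sharded storage.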
