future plans (geo, kafka, wal, map-reduce) #38

Open
glycerine opened this issue Nov 21, 2016 · 1 comment

glycerine commented Nov 21, 2016

I watched @SergejJurecko's excellent talk on ActorDB from February 2016:

video: https://www.percona.com/resources/videos/actordb-alternative-view-distributed-database

slides: https://www.percona.com/live/data-performance-conference-2016/sites/default/files/slides/ActorDB.pdf

In the future plans slide, there are a number of exciting features discussed:

  1. Geo replication

  2. Kafka like pubsub

  3. WAL for LMDB

  4. Map-reduce (luajit)

Could you comment on the state of these? In particular, I would find (2) and (4) useful, and the Kafka-like pubsub especially so.

Depending on the state of the implementation, I could be interested in contributing, though these days I mostly write Go. I've worked with a couple of pub-sub systems in the past, namely mangos (https://github.com/go-mangos/mangos) and NATS (nats.io). I've also written my own distributed job scheduler, which is half the work of map-reduce (github.com/glycerine/goq).


SergejJurecko commented Nov 21, 2016

At the time of that talk I had a queue implementation that would have been Kafka-like, but I decided to scrap it for something better.

For the last few months I have worked on a general-purpose C library that can be used to implement a Kafka-like server, a WAL for a database, or anything else that needs to write to disk really fast in a FIFO manner.
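To make the "write to disk in a FIFO manner" idea concrete, here is a minimal sketch of an append-only record log that can serve either as a WAL or as a Kafka-like topic where each consumer remembers its own offset. It is not the library described above: the `AppendLog` class, its method names, and the 4-byte length-prefix framing are all assumptions made for illustration.

```python
import os
import struct

# Assumed framing: each record is a 4-byte little-endian length, then the payload.
HEADER = struct.Struct("<I")


class AppendLog:
    """Hypothetical append-only record log; not the actual library's format."""

    def __init__(self, path):
        self.path = path
        # Append mode: every write lands at the current end of the file.
        self._out = open(path, "ab")
        self._end = os.path.getsize(path)

    def append(self, payload: bytes) -> int:
        """Append one record and return the byte offset where it starts."""
        offset = self._end
        self._out.write(HEADER.pack(len(payload)) + payload)
        self._out.flush()
        os.fsync(self._out.fileno())  # make the record durable before returning
        self._end += HEADER.size + len(payload)
        return offset

    def read_from(self, offset: int):
        """Yield (next_offset, payload) for each record starting at offset."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    return  # clean end of log
                (length,) = HEADER.unpack(header)
                payload = f.read(length)
                if len(payload) < length:
                    return  # torn tail record: stop instead of yielding partial data
                offset += HEADER.size + length
                yield offset, payload

    def close(self):
        self._out.close()
```

A consumer that stores the last `next_offset` it saw can resume with `read_from(next_offset)` after a restart, which is the essence of the Kafka-style consumption model. Calling `fsync` on every record is the simplest durable choice; a fast implementation would batch several records per sync.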

It should result in an order of magnitude better performance. Worker threads no longer touch any IO directly and do not require mutexes. When an actor writes data to the WAL, it is suspended using libctx. Once the write is done, it is switched back in for processing.

In the meantime the worker thread is free to do processing for other actors, so workers never block on IO. There is a dedicated write IO thread that uses the async IO primitives of the local system. There is also a pool of threads for blocking operations (like reading from LMDB).

All in all, it is a design that scales fully with the hardware. The more cores or disk IO you throw at it, the better it will run. Threads no longer get in each other's way. Allocations are predictable, and thread synchronization is just a lock-free queue.
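The suspend-on-write design described above can be sketched roughly as follows. This is an illustration only, not the C library: Python coroutines stand in for the context-switched actors, `queue.Queue` (which does use a lock internally) stands in for the lock-free queue, and `WriterThread`, `actor`, and `main` are hypothetical names. The sink is assumed to be any object with an `append(bytes)` method.

```python
import asyncio
import queue
import threading


class WriterThread:
    """Hypothetical dedicated IO thread: the only place that touches the sink."""

    def __init__(self, sink):
        self._sink = sink                # assumed: any object with append(bytes)
        self._requests = queue.Queue()   # stand-in for the lock-free queue
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, payload: bytes) -> asyncio.Future:
        """Called on the event loop; the future resolves once the write is done."""
        loop = asyncio.get_running_loop()
        done = loop.create_future()
        self._requests.put((payload, loop, done))
        return done

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:
                return  # shutdown sentinel
            payload, loop, done = item
            self._sink.append(payload)  # blocking IO happens only on this thread
            # Hand the result back to the loop thread, which resumes the actor.
            loop.call_soon_threadsafe(done.set_result, len(payload))

    def close(self):
        self._requests.put(None)
        self._thread.join()


async def actor(name, writer, events):
    for i in range(2):
        events.append(f"{name} submit {i}")
        await writer.write(f"{name}:{i}".encode())  # the actor is suspended here
        events.append(f"{name} resumed {i}")


async def main():
    sink, events = [], []
    writer = WriterThread(sink)
    try:
        # One event loop plays the role of a single worker thread serving two actors.
        await asyncio.gather(actor("a", writer, events), actor("b", writer, events))
    finally:
        writer.close()
    return sink, events
```

Running `asyncio.run(main())` shows actor `b` submitting its first write while actor `a` is still suspended, so the worker never sits idle waiting for the disk.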

The library works well and already works with SQLite. What's missing is Raft replication for it. For that I will almost certainly use https://github.com/willemt/raft

Once the replication aspect is finalized it will be open sourced.

After that, some decisions must be made about what the next iteration of ActorDB will look like. I am leaning towards Rust for the actual fast path, that is, receiving requests (Thrift/MySQL protocol), parsing, and actor execution plus replication. The higher-level shard balancing logic would be nice to keep as is. There is nothing wrong with using a high-level (slower) language for it.

As for the map-reduce bit: it seems to me that Apache Spark's RDD concept is the way of the future. An RDD implementation as a library (with a C API) seems to me like a better way to go.
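For readers unfamiliar with the RDD idea, the toy below shows its core shape: data split into partitions, transformations recorded lazily as a lineage, and actions that replay the lineage per partition. It is a single-process sketch with hypothetical names (`MiniRDD` and its methods), not Spark's API and not a proposal for the C API mentioned above.

```python
from functools import reduce


class MiniRDD:
    """Toy single-process stand-in for an RDD: partitions plus a lazy lineage."""

    def __init__(self, partitions, lineage=()):
        self._partitions = partitions  # list of lists, one list per partition
        self._lineage = lineage        # tuple of iterator-to-iterator steps

    def map(self, fn):
        # Transformations only extend the lineage; no data is touched yet.
        return MiniRDD(self._partitions, self._lineage + (lambda it: map(fn, it),))

    def filter(self, pred):
        return MiniRDD(self._partitions, self._lineage + (lambda it: filter(pred, it),))

    def _compute(self, partition):
        # Replay the recorded lineage over a single partition.
        it = iter(partition)
        for step in self._lineage:
            it = step(it)
        return it

    def collect(self):
        """Action: materialize every partition, in partition order."""
        return [x for p in self._partitions for x in self._compute(p)]

    def reduce(self, fn):
        """Action: reduce each partition on its own, then combine the partials."""
        partials = []
        for p in self._partitions:
            items = list(self._compute(p))
            if items:
                partials.append(reduce(fn, items))
        return reduce(fn, partials)
```

For example, `MiniRDD([[1, 2, 3], [4, 5]]).filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()` returns `[4, 16]`. Because each partition is computed independently from its lineage, a real implementation can spread partitions across workers and recompute a lost one, which is what makes the concept attractive as a reusable library.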
