
ActorDB-geo #35

Open

spog opened this issue Nov 15, 2016 · 7 comments

Comments

@spog

spog commented Nov 15, 2016

While exploring your impressive project, I dared to think about a potential geo-replication scenario in ActorDB.

The basic idea is to have all data available at all locations:

  • All nodes active/writable
  • Maybe "eventually" consistent locations
  • Potentially via per-cluster geo-replication (see the attached picture, actordb-geo)

I thought such a geo-scenario might work for ActorDB, since a particular actor (i.e. a mobile user) typically operates from one location at a time. However, groups of mobile users might be active at different locations simultaneously.

I am more of a system-level programmer and do not have much knowledge of databases, so please excuse me if you find this post irrelevant.

@SergejJurecko
Contributor

Geo-replication where all locations are active and serving data can be achieved by spreading out the individual clusters. So with 3 nodes in a cluster, place each node in a separate location. This will of course mean writes have higher latency. But if you send queries to the node the user is closest to, reads will generally be quick (unless the safe flag is used, in which case the read has to be verified with the other nodes).
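A minimal sketch of the nearest-node routing this implies, from the client side (the host names, port, and latency probe are assumptions for illustration, not ActorDB API):

```python
import socket
import time

# Hypothetical 3-node cluster, one node per location.
NODES = ["db.eu.example.com", "db.us.example.com", "db.asia.example.com"]
PORT = 33306  # assumed client port; check your ActorDB configuration

def probe_latency(host, port=PORT, timeout=2.0):
    """Rough proximity estimate: TCP connect time, or None if unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

def pick_read_node(nodes=NODES):
    """Send plain reads to the lowest-latency node. A read with the safe
    flag still has to be verified with other nodes, so node choice buys
    less there."""
    timings = [(probe_latency(n), n) for n in nodes]
    reachable = [(t, n) for t, n in timings if t is not None]
    if not reachable:
        raise RuntimeError("no reachable nodes")
    return min(reachable)[1]
```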

What you have in mind is another layer on top, let's say locations. So we would have: nodes -> clusters -> locations

Some clusters would be active in one location, and the other locations would just be backups for that cluster. We have been thinking about something similar.
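To make that concrete, such a layout could be described roughly like this (a hypothetical data structure only; ActorDB has no "locations" layer today):

```python
# Hypothetical nodes -> clusters -> locations layout. Each cluster is
# active in one location; the other location holds a passive backup.
TOPOLOGY = {
    "cluster1": {
        "active_location": "dc1",
        "replicas": {
            "dc1": ["n1", "n2", "n3"],    # serves reads and writes
            "dc2": ["n4", "n5", "n6"],    # passive backup of cluster1
        },
    },
    "cluster2": {
        "active_location": "dc2",
        "replicas": {
            "dc2": ["n7", "n8", "n9"],
            "dc1": ["n10", "n11", "n12"],
        },
    },
}
```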

The main question is what happens when a location goes offline. Making that work seamlessly is quite a difficult task. It also requires clients to have location-reconnect logic in them. Writes must also replicate across at least two locations (adding latency).

Eventually consistent locations mean that when one goes offline and some writes land on a new location, you may have just thrown away writes from the offline location. You told the client the write was complete, but if it did not manage to replicate to another location, it may still disappear.

You either sacrifice latency or you sacrifice consistency. I guess the best option would be to allow both scenarios and leave the choice up to the user.

@spog
Author

spog commented Nov 16, 2016

Thank you very much for your response.

Would it help to add internal location information:

  1. To each node of a cluster, for the currently possible scenario of spreading a 3-node cluster over 3 locations, to automate choosing the closest node for optimised reads?
  2. To each cluster (the same for all nodes of a cluster), for the additional "locations" layer, to help automate the client reconnection logic?

A bold suggestion for location info:

  • location ID (e.g. 1, 2, 3)
  • connection priority order (e.g. if the client's closest location is 1, the next closest is 3 and the next closest after that is 2, and so on for all locations); a client-side sketch follows below
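A rough sketch of the reconnect logic such metadata would enable (the node map and the `connect` callback are hypothetical, not an existing ActorDB feature):

```python
# Hypothetical map from location ID to that location's nodes.
LOCATION_NODES = {
    1: ["n1.dc1", "n2.dc1", "n3.dc1"],
    2: ["n1.dc2", "n2.dc2", "n3.dc2"],
    3: ["n1.dc3", "n2.dc3", "n3.dc3"],
}

def connect_with_priority(priority_order, connect):
    """Walk locations in the client's priority order (e.g. [1, 3, 2]),
    falling through to the next location only when every node in the
    current one is unreachable. `connect` returns None on failure."""
    for location_id in priority_order:
        for node in LOCATION_NODES[location_id]:
            conn = connect(node)
            if conn is not None:
                return conn
    raise RuntimeError("all locations unreachable")
```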

And regarding location consistency, I agree that this requirement is highly application-dependent.

thanks again, Samo

@SergejJurecko
Contributor

One relevant point for this discussion: if you have a 3-node cluster and all your client connections go to one node, the replication leaders for actors will mostly live on that node. Since we use Raft, any node can be either a leader (executes SQL, reads and writes) or a follower (passively receives data from the leader on writes).

@spog
Author

spog commented Nov 16, 2016

... so nodes would not be naturally load balanced?

@SergejJurecko
Contributor

The other option would mean 2/3 of queries would be proxied by the server you are talking to, increasing latency and giving a bigger chance of something failing.
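The tradeoff can be illustrated by the client's connection strategy (illustrative only; node names are made up):

```python
import itertools

# Illustrative only. Connecting every client to one node keeps actor
# leaders on that node (no proxying, but no balance); rotating
# connections across the cluster balances leaders, but then roughly
# 2/3 of queries land on a follower and get proxied to the leader.
NODES = ["n1", "n2", "n3"]
_round_robin = itertools.cycle(NODES)

def next_connection_target():
    """Round-robin node choice for new client connections."""
    return next(_round_robin)
```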

@catroot

catroot commented Jan 9, 2017

To support cross-DC replication, we would need to extend the striping across clusters with parity. Never mind the extra network latency: DCs can have good uplinks, and writes may not be very intensive or massive. There are other solutions for realtime data processing.

I like the idea of DC redundancy, and here is my point of view.
As I see it, in order to achieve true DC redundancy and failover, ActorDB needs to support one thing:

  • The ability to rebalance shards, both automated (after a configurable timeout waiting for the lost cluster to come back alive) and manually forced (e.g. changing the redundancy level, or kicking a whole cluster out if needed; e.g. shrinking a 7-cluster setup to 5 or 6 clusters if N-2 redundancy is required, or something like this).

This implies slightly different logic: a setup of 3 clusters with 3 nodes per cluster, distributed across different DCs, could deliver a redundancy level high enough to survive the failure of one DC plus one node per surviving DC. Clusters would store data with parity, like RAID 5 does, while nodes keep operating in mirror mode as they do now. If a setup has only 2 clusters, the data is simply striped between them, as it is now.

With this we could change the setup config on the fly, adding/removing clusters and their nodes, and move the redundancy level from a dangerous plain stripe up to N minus any level, trading read-speed improvement against the required level of safety.
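A minimal sketch of the RAID 5-style parity idea over cluster shards (purely illustrative; today ActorDB stripes data across clusters without parity):

```python
# XOR parity across equal-sized shard blocks, one block per cluster,
# in the spirit of RAID 5.
def xor_parity(blocks):
    """Compute the parity block for a list of equal-sized byte blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover_lost_block(surviving_blocks, parity):
    """Rebuild the block of a failed cluster: XOR the parity block
    with all surviving data blocks."""
    return xor_parity(list(surviving_blocks) + [parity])
```

An N-cluster setup would then keep serving after losing one cluster, at the cost of one cluster's worth of capacity for parity.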

@SergejJurecko
Contributor

Thank you for the information. We will keep it in mind for future versions.
