Support: Configuration issues with multiple nodes #20
Comments
Answer for Q2
A new node needs only one alive member to discover the entire cluster. This feature is provided by memberlist.
I didn't test this but I think NodeA cannot discover NodeB again in this scenario. We need an automatic peer discovery system here.
A single alive node should be enough for discovery, but it's good to maintain a static list of peers if possible. Clearly, we need a service discovery integration here. I may work on a possible Consul integration; it's already on my task list for v0.3.x. It looks easy to implement: https://sreeninet.wordpress.com/2016/04/17/service-discovery-with-consul/
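The bootstrap step described above might look like this in embedded mode (a sketch, not authoritative: `config.New` and the `Peers` field are taken from the olric `config` package as I understand it, so double-check against your version):

```go
import "github.com/buraksezer/olric/config"

// One reachable seed is enough: memberlist gossips the rest of
// the membership to the new node after it joins.
c := config.New("local")
c.Peers = []string{"node1.olric:3322"} // memberlist port of any alive member
```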
Answer for Q3
Backup count is controlled by Config.ReplicaCount. It's 1 by default. Feel free to set it to 2 if you want a replica of every entry in the cache.
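As a config fragment, the setting above is just (assuming the embedded olric `config` package):

```go
c := config.New("local")
c.ReplicaCount = 2 // two copies of every entry: the primary and one replica
```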
Answer for Q4
Good catch! Actually, Olric registers these structs[0] before encoding, but the remote side also needs to register the structs before decoding. I missed this detail. I'm going to document this behaviour of Gob.

[0] https://github.com/buraksezer/olric/blob/master/serializer/serializer.go#L47

Answer for Q1
What do you think? Would you like to give more details about your deployment scenarios?
A list of domain names or IP addresses of the other nodes, like this: `node1.olric:3322, node2.olric:3322`. Please note that this port number is used by memberlist.

This is a bit confusing. I'm going to prepare a more detailed answer on this topic when I have time. Currently you should use the same values for

I may add more details for Q1. Thank you!
Thanks for confirming that the peer list is just a bootstrap list, and for suggesting that you might look into Consul service discovery! Very helpful to know.
I tried setting the count to 2 and performing the following steps:
Is there some long interval I need to wait for the async replication? The stats on NodeB don't end up reflecting the same size as NodeA to indicate a replication of the cache item.
This current cluster test isn't 100% realistic of a production deployment, since the nodes would run on different hosts. However, sometimes I run a low-configured dev instance alongside a production instance on the same node. So when dealing with the embedded Olric configuration, the more unique ports that have to be claimed, the more port configuration options I have to expose through my app and be aware of to avoid startup clashes. These Olric ports are in addition to my primary application TCP port... so it's... a lot of ports :-)
Is this Name/port broadcast to the other nodes through the member communication? My reason for asking is: if I didn't need the option of connecting a client to a specific TCP port, or I was happy to look at the log to see the actual port, is it feasible for
Actually, I picked two ports. The first one is for Olric; it's used for client communication and for syncing data/state between Olric nodes. It's 3320. The second one is for the memberlist library. It's 3322. The peer discovery and failure detection system (that is, memberlist) uses that port. Olric only processes NodeJoin/NodeLeave events from the memberlist library. There is no other business logic here. I don't fully understand why you need to select random ports for Olric nodes. To my understanding, you only need to use a unique domain name or IP address for the Olric nodes in your configuration file.
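Put together, the two-port layout described above could look like this in an embedded configuration (a sketch against the olric `config` package; the exact field names may differ between versions):

```go
c := config.New("local")
c.BindPort = 3320                  // Olric itself: client traffic and data/state sync
c.MemberlistConfig.BindPort = 3322 // memberlist: peer discovery and failure detection
```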
Yes, it is. It's a hack, but I use the following in the tests to get a random port from the operating system. You may want to use it to generate random ports before creating a configuration for an Olric node:

```go
import "net"

// getRandomPort asks the operating system for a free TCP port by
// binding to port 0, then releases the listener and returns the
// port number it was assigned.
func getRandomPort() (string, error) {
	addr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	l, err := net.ListenTCP("tcp", addr)
	if err != nil {
		return "", err
	}
	defer l.Close()

	_, port, err := net.SplitHostPort(l.Addr().String())
	if err != nil {
		return "", err
	}
	return port, nil
}
```

I imagine something like that:

```go
port, err := getRandomPort()
// handle err
cfg := config.Config{
	Name: hostname + ":" + port,
}
```

This is how I implement integration tests to test multiple Olric nodes: https://github.com/buraksezer/olric/blob/master/dmap_test.go#L174

Note: I'm going to write something more detailed during the day. It's 08:00 AM in Istanbul :)
If you want to run two applications containing Olric on the same host with the default settings, one of them will fail with port conflicts. In order to avoid that, one must uniquely set 2 different ports. Actually, I ended up using the same random port selection logic before I saw your example. Because I don't necessarily need external client access to Olric, and the nodes all broadcast the port around anyway, I just default the client port to random.
There may be two different answers for that question:

1- memberlist can be tuned to be more responsive to network events. I prefer to use the defaults. The following lines are from the `DefaultLANConfig` function in memberlist/config.go:

```go
TCPTimeout:              10 * time.Second,       // Timeout after 10 seconds
IndirectChecks:          3,                      // Use 3 nodes for the indirect ping
RetransmitMult:          4,                      // Retransmit a message 4 * log(N+1) nodes
SuspicionMult:           4,                      // Suspect a node for 4 * log(N+1) * Interval
SuspicionMaxTimeoutMult: 6,                      // For 10k nodes this will give a max timeout of 120 seconds
PushPullInterval:        30 * time.Second,       // Low frequency
ProbeTimeout:            500 * time.Millisecond, // Reasonable RTT time for LAN
ProbeInterval:           1 * time.Second,        // Failure check every second
DisableTcpPings:         false,                  // TCP pings are safe, even with mixed versions
AwarenessMaxMultiplier:  8,                      // Probe interval backs off to 8 seconds
GossipNodes:             3,                      // Gossip to 3 nodes
GossipInterval:          200 * time.Millisecond, // Gossip more rapidly
GossipToTheDeadTime:     30 * time.Second,       // Same as push/pull
GossipVerifyIncoming:    true,
GossipVerifyOutgoing:    true,
```

These variables are also used by HashiCorp's Serf tool. From the documentation:
2- When a node goes down, Olric doesn't try to create a new copy of the data from the primary copy. Instead, it applies a technique called read-repair: when you call get on a key, if the backups are out of sync or don't have the key, the primary owner sends the recent key/value pair to its replicas. I tried to explain it in the readme: https://github.com/buraksezer/olric#read-repair-on-dmaps

The size may differ, but the key/value pair is available, so it's okay. This is a subtle and harmless "bug" in the storage engine.
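The read-repair flow described above can be sketched in a few lines of plain Go. This is a toy model, not Olric's implementation: `node`, `entry`, and `getWithReadRepair` are invented names, and real Olric repairs partitions, not whole maps.

```go
package main

import "fmt"

// entry is a versioned key/value pair; newer timestamps win.
type entry struct {
	value     string
	timestamp int64
}

// node is a toy in-memory store standing in for a cluster member.
type node struct {
	data map[string]entry
}

// getWithReadRepair reads from the primary and, if a replica is
// missing the key or holds a stale version, pushes the primary's
// copy to it as a side effect of the read.
func getWithReadRepair(primary *node, replicas []*node, key string) (string, bool) {
	e, ok := primary.data[key]
	if !ok {
		return "", false
	}
	for _, r := range replicas {
		re, found := r.data[key]
		if !found || re.timestamp < e.timestamp {
			r.data[key] = e // repair the out-of-sync replica
		}
	}
	return e.value, true
}

func main() {
	primary := &node{data: map[string]entry{"k": {"v2", 2}}}
	replica := &node{data: map[string]entry{"k": {"v1", 1}}} // stale copy

	v, _ := getWithReadRepair(primary, []*node{replica}, "k")
	fmt.Println(v)                       // v2
	fmt.Println(replica.data["k"].value) // v2: repaired by the read
}
```

The point is that replication lag is healed lazily, on access, which is why stats on two nodes can disagree while reads still return the right value.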
Ok,
I think a clarification is needed. When a node goes down, Olric doesn't make a new copy for the missing replicas immediately. These kinds of inconsistencies are covered under an umbrella term called entropy. There are different approaches to decreasing entropy in a distributed system. Olric currently implements quorum-based replica control and read-repair techniques to keep entropy low. A merkle-tree based anti-entropy system is planned to make a full scan and repair. So when a node goes down, the replica count for its keys decreases. That's clear. Olric doesn't make a full sync immediately. Instead, when you call put or get functions on the keys, Olric ensures that the replication factor is satisfied and the most recent version of the key/value pair is replicated over the cluster. Please take a look at this scenario:
I couldn't find an optimal way to make a full sync of backup partitions in the case of network partitioning. We may try to find a way if it's vital for you.
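For completeness, the quorum-based replica control mentioned above reduces to a counting rule: an operation succeeds once enough owners acknowledge it. A toy illustration, not Olric's code (if I recall correctly, the config exposes knobs such as `ReadQuorum` and `WriteQuorum` for this, but verify against your version):

```go
package main

import "fmt"

// quorumSatisfied reports whether enough replicas acknowledged an
// operation to meet the configured quorum.
func quorumSatisfied(acks, quorum int) bool {
	return acks >= quorum
}

func main() {
	// With ReplicaCount=3 and a write quorum of 2, a write still
	// succeeds when one owner is down, because two acks arrive.
	fmt.Println(quorumSatisfied(2, 2)) // true
	fmt.Println(quorumSatisfied(1, 2)) // false
}
```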
Thank you for this detailed explanation. I thought I had confirmed a kind of automatic value replication when I alternated between taking down NodeA and NodeB and confirmed I would still get a cache hit on the other node. It seemed, with repair enabled, that it was keeping a copy on both nodes.
I couldn't find an optimal way for a node to decide to make a full replica sync. In read-repair, keys are replicated and updated on the backup nodes if needed, and read-repair is triggered by a get call. I'm not willing to increase the coordination level between the cluster members, but we may still work on a new way to handle it.
Hey @justinfx, I hope you are well. I have been working on Kubernetes integration for a couple of weeks. It seems to work fine. You may want to check out this: https://github.com/buraksezer/olric#kubernetes
@buraksezer, very cool! I've been doing a lot of k8s investigation lately. It's great to see that the new plugins are continuing the pattern of providing both a plugin and a library interface. My primary integration of Olric is embedded in a distributed app, so I imagine I will need the k8s discovery plugin for that.
@justinfx thank you! olric-cloud-plugin will provide a service discovery interface for k8s and many other cloud providers, including Amazon AWS, Google Cloud, and Microsoft Azure. It uses hashicorp/go-discover under the hood. You can use the plugin in your distributed app; it's designed for this purpose. I'm expanding the documentation these days. I'll test and add olric-nats-plugin in a couple of days. Keep in touch!
Hi @justinfx, it has been a long time since our last contact. I saw your recent PR to the olric-nats-plugin repository. I wonder whether you use Olric in production?
Hi @buraksezer. Well, I had that one project where I introduced Olric and set up a Helm deployment for Kubernetes. But then we ran into some unrelated infrastructure problems with putting it on Kubernetes, so I ended up shelving the new version. There is probably still a way to deploy this to our older setup, so I could possibly revisit it again soon. But I do still want to use Olric in production, and I have a new project that will need a distributed cache in front of database accesses. Do you have any production success stories from other projects yet? Interested to know where it is in serious use.
We have a few users. One of the biggest is IBM Cloud; see #82 and #106 for details. Another is LilithGames. They are building a proxy tool that uses Olric, but I don't have any idea about the current status of the project: https://github.com/LilithGames/spiracle

Open source projects by indie developers and communities:
Some of them are probably inactive. There may be unknown users. Olric has ~300 clones per day. |
Thanks for that info. Congrats on receiving so much activity! Well, if I end up using Olric on this new project, then I am sure you will hear from me. I'm hoping to use it as a cache embedded in the application nodes, instead of the previous version that used Varnish in front of the application nodes.
I have a number of questions about the Config, now that I actually have Olric embedded in my application and working, and am spinning up 2 local instances to form a cluster experiment.
Question 1
I've found (just as you documented) that the memberlist configuration is complicated. There are a lot of ports that need to be allocated for each instance in order not to conflict with another instance, and also so the instances can actually connect into a cluster. For each instance on the same host I have to set:

- `Config.Name` to something like `0.0.0.0:<unique port>`
- `Config.MemberlistConfig.BindPort` to a unique port
- `Config.Peers` to point at the unique bind port of the other instance

Do I need to worry about `Config.MemberlistConfig.AdvertisePort`? It seems not to conflict when I don't adjust it on either host. But it WILL conflict and error if I am not using peers and two node instances on the same host use the same `AdvertisePort`.

What is the difference between the Name port and the Bind port, and do I really need to make sure to have 2 unique ports per instance?
Question 2
For `Peers`, is that just a seed list where I only have to point it at one other node for the new node to connect to the entire cluster? Or does every node need a complete list of every other node, like a memcached configuration?

When I start NodeA, I may not have started it with a known peer list. Then I start NodeB with a peer list pointing at NodeA (great!). When I start NodeA up again, I need to make sure to point it at NodeB (I think?).

I'm hoping I only need a single other seed node in the peer list. My goal would be to somehow point the peer list at a single service discovery endpoint to find another node in the active cluster. Otherwise it would be more difficult to dynamically scale the service if you always need to know the address of the other nodes when you scale.

Aside from the seed node question, the node discovery problem may be outside the scope of Olric. Likely what I would end up doing is having each Olric node register itself with Consul, and then it would check Consul for one other node to use as its peer list.
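The Consul idea above (each node registers itself, then asks Consul for one healthy peer) mostly needs a lookup step. As a sketch, here is a stdlib-only helper that extracts `host:port` peers from the JSON shape returned by Consul's `/v1/health/service/<name>?passing` endpoint. `peersFromConsulHealth` and `serviceEntry` are invented names, and only the fields needed here are modeled:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceEntry models just the fields we need from one element of
// Consul's health-service response.
type serviceEntry struct {
	Service struct {
		Address string `json:"Address"`
		Port    int    `json:"Port"`
	} `json:"Service"`
}

// peersFromConsulHealth turns a health-service response body into
// a list of "host:port" strings suitable for a peer/seed list.
func peersFromConsulHealth(body []byte) ([]string, error) {
	var entries []serviceEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		return nil, err
	}
	peers := make([]string, 0, len(entries))
	for _, e := range entries {
		peers = append(peers, fmt.Sprintf("%s:%d", e.Service.Address, e.Service.Port))
	}
	return peers, nil
}

func main() {
	// A trimmed-down example payload of the shape Consul returns.
	body := []byte(`[{"Service":{"Address":"10.0.0.5","Port":3322}}]`)
	peers, err := peersFromConsulHealth(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(peers) // [10.0.0.5:3322]
}
```

In practice you would fetch the body with an HTTP GET against the local Consul agent (or use the official hashicorp/consul/api client) and feed the result into the peer list.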
Question 3
When I start local NodeA and NodeB, I can confirm that an item cached on NodeA can be retrieved by NodeB. However I can't seem to find the configuration option that gets NodeA to actually back up the item to NodeB. That is, NodeB isn't storing the cached item, so when NodeA goes down, the next request to NodeB is a cache miss. What is the combination of config options that will result in backing up the cached item to at least one other node?
Question 4
When I cache an item on NodeA and then retrieve it for the first time on NodeB, the gob encoder on NodeB spits out an error, `gob: name not registered for interface: "*mylib.MyCachedItem"`, which results in a cache miss. Once NodeB then fills the cache, subsequent requests work correctly since the gob encoder knows the type. To fix this, I had to add this to my application:

Should this be documented somewhere in Olric? The fact that it uses gob is kind of an implementation detail.
Thanks for the support!