too few peers are available to reach consensus: 0 of 1 #35
This SPOF behavior only occurs if I do not enable caching as an argument to my volume mount command. The trade-off therefore seems to be: with caching, HA but slower propagation of file changes; without caching, a SPOF but near-synchronous propagation of file changes.
Hi @tarlano! Sorry it took me so long to reply. I just asked @mefyl about the consensus; he's the expert. However, I'm concerned with the last error (the handshake). When you say
Hi @tarlano. Here's my suspicion from what I see: your replication factor might be 1, in which case every block only has one copy, and shutting down the node that has the root block will indeed make the network unusable. Can you please check with
Regarding the fact that it works with caching: I suppose you start the volume, it works, then you shut down the node with the root block, and it keeps working? That would be normal, since the other nodes cached the root block, so the volume will stay up until the cache is invalidated. If you start both other nodes without ever giving them access to the third one holding the root block, I predict it would not work. But those are just guesses, let's first check that replication factor :)
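To make the replication-factor suspicion above concrete, here is a minimal sketch of the arithmetic. It assumes that consensus on a block requires a strict majority of the nodes holding a copy of it; that rule, and the function names `quorum_size` and `can_reach_consensus`, are illustrative assumptions and not taken from Infinit's code.

```python
# Hypothetical illustration only: assumes a strict-majority quorum over the
# replicas of a block. This is NOT Infinit's actual implementation.

def quorum_size(replicas: int) -> int:
    """Smallest strict majority among `replicas` copies of a block."""
    if replicas < 1:
        raise ValueError("a block needs at least one replica")
    return replicas // 2 + 1


def can_reach_consensus(replicas: int, reachable: int) -> bool:
    """True if enough replica holders are reachable to form a quorum."""
    return reachable >= quorum_size(replicas)


if __name__ == "__main__":
    # Replication factor 1: the single holder of the root block is down.
    print(can_reach_consensus(replicas=1, reachable=0))
    # Replication factor 3: one holder is down, two remain reachable.
    print(can_reach_consensus(replicas=3, reachable=2))
```

Under this assumption, a replication factor of 1 leaves no room for any node failure, which would match the "0 of 1" in the error, while a factor of 3 tolerates the loss of one holder.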
Sorry for the late reply. I am not copying over the infinit home, and I am also not using the hub. I currently have a 17-node cluster using kouncil and a caching setup. The current caching setup is working with a replication value of 3, but since I would like the changes to be more synchronous, I will configure another cluster later today to use kelips and no caching, to get the information you asked for.
Here are the details of my current caching cluster. I am deploying four exported files via a deployment RPM. The exported files are for the user, silo, network, and volume respectively. I then import the four files with a post-install bash script in the RPM. All devices are client/server devices, and all devices have the same user, which is an admin with read+write, to get around any passport issues.
Here is the infinit.silo that I import on each device.
Here is the infinit.volume that I import on each device.
After the deployment RPM is installed I start the node with the following config
Note that the peer file (peeraddresses.conf), passed via the --peer argument, is generated using consul template and contains the IP addresses of all the other devices/peers in the network cluster. Here is an example with my IPs obfuscated.
The endpoints file (endpoints) contains just the local device's endpoints. Here is the output of the network export from one of the running devices. As you can see the
Thanks for your help! |
I want to update you both. I didn't have the chance to switch the configuration, but even with the caching config I wrote about previously, I am still seeing the following error message in the error log for the
The interesting thing about
Have you seen anything along these lines in the past?
Tony
Hello,
I have a three device cluster. Each device has its own filesystem silo and joins to a single network that I created with kelips.
The volume creation and mount on all devices succeed, and I can write/read from all three devices to the fuse directory and I see the files in each filesystem. I can read and write to the files on any of the devices and see them on the others.
The issue is that if I stop the device that originally did the volume mount with --allow-root-creation, I can no longer see the files on the other two devices.
I turned on debugging and I see the following messages
Can someone tell me how to get all three devices in the consensus? From the error message it seems as if only the device that created the root block is in the quorum.
Additionally I did see the following failure message, before stopping the root block creating device, but everything seemed to work properly from the replication perspective.
The version is
$ infinit --version
0.8.0