
Why does the bucky servers command say "Number of replicas: 100"? #37

Open
tantra35 opened this issue Sep 18, 2021 · 1 comment


tantra35 commented Sep 18, 2021

Bucky tools declares that it only works with clusters using replication factor 1, but when we invoke bucky servers, we see discouraging output:

vagrant@123124234:~/builddocker$ ./bucky servers -h carbon-a:2023
Buckd daemons are using port: 2023
Hashing algorithm: [carbon: 3 nodes, 100 replicas, 300 ring members carbon-a:2023=a carbon-b:2023=b carbon-c:2023=c]
Number of replicas: 100
Found these servers:
        carbon-a
        carbon-b
        carbon-c

Is cluster healthy: true

But why 100? In the source code we see that when any hashring is constructed, the replication factor is set to 100 and never changes. For example, for the carbon hashring: https://github.com/jjneely/buckytools/blob/master/hashing/hashing.go#L84-L92

// NewCarbonHashRing sets up a new CarbonHashRing and returns it.
func NewCarbonHashRing() *CarbonHashRing {
	var chr = new(CarbonHashRing)
	chr.ring = make([]RingEntry, 0, 10)
	chr.nodes = make([]Node, 0, 10)
	chr.replicas = 100  // is this a bug?

	return chr
}

and SetReplicas is never called. Why is that?


deniszh commented Sep 18, 2021

Hi @tantra35

These are replicas in the hash ring, not carbon replicas. This code just mimics the Python code from carbon - https://github.com/graphite-project/carbon/blob/9fad18df5731271aab6f5c81d32eddcecdc1a695/lib/carbon/hashing.py#L57
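
For illustration only, here is a minimal sketch of what those 100 "replicas" mean (my own simplification, not the actual md5-based implementation in hashing.go): each server is inserted into the ring 100 times as virtual nodes, purely to spread metrics more evenly across servers, which is why bucky servers reports 3 nodes, 100 replicas, 300 ring members. A metric is still stored on exactly one server, so this number has nothing to do with data replication.

// Simplified sketch of a consistent hash ring with virtual-node "replicas".
// Illustrative only; buckytools/carbon use md5 and their own key format.
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ringEntry struct {
	position uint32
	server   string
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// buildRing inserts every server `replicas` times into the ring, so 3
// servers with 100 replicas yield 300 ring members.
func buildRing(servers []string, replicas int) []ringEntry {
	ring := make([]ringEntry, 0, len(servers)*replicas)
	for _, s := range servers {
		for i := 0; i < replicas; i++ {
			ring = append(ring, ringEntry{hashKey(fmt.Sprintf("%s:%d", s, i)), s})
		}
	}
	sort.Slice(ring, func(a, b int) bool { return ring[a].position < ring[b].position })
	return ring
}

// getNode returns the single server that owns a metric: the first ring
// entry at or after the metric's hash, wrapping around. Each metric is
// stored exactly once, i.e. this is still replication factor 1.
func getNode(ring []ringEntry, metric string) string {
	pos := hashKey(metric)
	i := sort.Search(len(ring), func(j int) bool { return ring[j].position >= pos })
	if i == len(ring) {
		i = 0
	}
	return ring[i].server
}

func main() {
	servers := []string{"carbon-a", "carbon-b", "carbon-c"}
	ring := buildRing(servers, 100)
	fmt.Println("ring members:", len(ring)) // 300
	fmt.Println("servers.cpu.loadavg ->", getNode(ring, "servers.cpu.loadavg"))
}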

Unfortunately buckytools does not support replication factor >1, and even in the go-graphite fork we didn't fix this. In practice, having 2 identical clusters with RF=1 is much easier to operate than a single cluster with RF=2. See e.g. "Improving the backend" in https://grafana.com/blog/2019/03/21/how-booking.com-handles-millions-of-metrics-per-second-with-graphite/ :

One resilient and failure-safe approach to storing data for a backend is Replication Factor 2. However, the backend tools the team was using to do the operational work on Graphite didn’t work with Replication Factor 2. They experimented with using Replication Factor 1, sending it twice to split the server fleet manually into two equal parts and sending it out to different parts.
In order to choose which approach to use, they created a replication factor test to calculate the potential for data loss in case of server failure. For a group of eight servers, the team found that with Replication Factor 2, you lose a smaller amount of data than with Replication Factor 1. But when two servers fail with Replication Factor 2, there will always be a small percentage of data that is definitely not available. With Replication Factor 1, the probability that data is lost when two servers fail is only 15%. The team opted for using Replication Factor 1 in two different sets of servers to reduce the probability of losing data.
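
A rough back-of-the-envelope reading of that 15% figure (my own interpretation, not spelled out in the post): with eight servers split into two mirrored halves of four, a metric is lost only when both servers of one of the four mirrored pairs fail. If two of the eight servers fail at random, the chance they form a mirrored pair is 4 / C(8,2) = 4/28 ≈ 14%, which is about the quoted number.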
