Can I use buckytools to rebalance a fnv1a_ch cluster? #17

Closed
mwtzzz-zz opened this issue Jun 26, 2017 · 62 comments

@mwtzzz-zz

The carbon-c-relay developers mentioned that I could use this to rebalance a fnv1a_ch hashring. But when I run buckyd, I get the following message:

[root@ec2-xxx radar123 bin]$ setuidgid uuu ./buckyd -node ec2-xxx.compute-1.amazonaws.com -hash fnv1a_ch
2017/06/25 22:08:54 Invalide hash type. Supported types: [carbon jump_fnv1a]

Does buckytools support this type of hash? If not, do you know how I can rebalance my cluster when adding a new cache host?

@jjneely
Owner

jjneely commented Jun 27, 2017

I presently support the carbon_ch and jump_fnv1a hashing algorithms, in carbon-c-relay speak. Normally, yeah, this would definitely be the tool to use, but I don't have that hash type implemented.

However, it probably wouldn't take much code if you are interested. I've already got the fnv1a hashing function coded in and working as part of the jump hash. You'd just need to implement a hashing.HashRing interface; plugging it into the buckyd and bucky commands should be fairly straightforward.

Glad to give you a hand getting some patches together.
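For a rough picture, here is a minimal sketch of an FNV-1a based consistent-hash ring in Go, built on the standard hash/fnv package. The interface shape, the replica key format, and the ring-position scheme are assumptions for illustration only, not the actual buckytools or carbon-c-relay implementation:

package hashing

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// fnv32a returns the 32-bit FNV-1a hash of s, using the standard library.
func fnv32a(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// ringEntry is one replica position on the ring, owned by a server.
type ringEntry struct {
	position uint32
	server   string
}

// Ring is a toy consistent-hash ring keyed by FNV-1a hashes.
type Ring struct {
	entries []ringEntry // kept sorted by position
}

// AddNode places a server on the ring at several replica positions.
// The "server:replica" key format here is a guess, for illustration only.
func (r *Ring) AddNode(server string, replicas int) {
	for i := 0; i < replicas; i++ {
		pos := fnv32a(fmt.Sprintf("%s:%d", server, i))
		r.entries = append(r.entries, ringEntry{position: pos, server: server})
	}
	sort.Slice(r.entries, func(i, j int) bool {
		return r.entries[i].position < r.entries[j].position
	})
}

// GetNode returns the server owning the first ring position >= hash(metric),
// wrapping around to the start of the ring when necessary.
func (r *Ring) GetNode(metric string) string {
	h := fnv32a(metric)
	i := sort.Search(len(r.entries), func(i int) bool {
		return r.entries[i].position >= h
	})
	if i == len(r.entries) {
		i = 0
	}
	return r.entries[i].server
}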

@mwtzzz-zz
Author

I'm definitely interested. I've got a 12-node cache cluster that's hitting the ceiling on throughput. I need to add a 13th node, but I can't do it unless I can rebalance the cluster.

Do you need anything from me in terms of putting together a patch?

@mwtzzz-zz
Author

I'd like to gently encourage some progress on this.... We just got notice that one of the instances in our relay cluster is scheduled for termination in the next week. We've got to get the data off of there and onto a new node. Hopefully I can do this with buckytools if it's ready by that time. Otherwise I'm going to have to use rsync or some similar brute-force method and cut over to the new host when most or all of the data has been transferred.

@grobian
Contributor

grobian commented Jul 12, 2017

I wasn't aware of this issue; I took a stab at it in PR #18.

@mwtzzz-zz
Author

I'll test this PR ...

@mwtzzz-zz
Author

@grobian your patch didn't work for me. When I run something like bucky list it fails to retrieve anything.

@jjneely
Owner

jjneely commented Jul 28, 2017

Sorry for the delay here.... I should be able to spend some time on this in the coming week, although I know that cuts it close for that EC2 termination.

@mwtzzz-zz
Author

@jjneely I already completed the migration, but I definitely still need buckytools to support fnv1a_ch, for two reasons: (a) there are duplicate metrics spread around the cluster that need to be consolidated, and (b) if we ever need to scale horizontally I need to be able to add more hosts and rebalance the cluster.

@grobian
Contributor

grobian commented Aug 11, 2017

@mwtzzz can you explain exactly how you set up bucky? Here's what I did to test this in a very simple manner.

on the server:

buckyd -node <graphite1> -prefix <path/to/carbon/whisper> -hash fnv1a -b :5678 <graphite1>

then from the client

env BUCKYHOST="<graphite1>:5678" bucky du -r '<tld>'

that returned on the client something like:

2017/08/11 15:41:15 Results from nut:5678 not available. Sleeping.
2017/08/11 15:41:15 Results from nut:5678 not available. Sleeping.
2017/08/11 15:41:16 nut:5678 returned 350 metrics
2017/08/11 15:41:16 Progress: 100/350 28.57%
2017/08/11 15:41:16 Progress: 200/350 57.14%
2017/08/11 15:41:16 Progress: 300/350 85.71%
2017/08/11 15:41:16 Du operation complete.
2017/08/11 15:41:16 912254000 Bytes
2017/08/11 15:41:16 869.99 MiB
2017/08/11 15:41:16 0.85 GiB

Does something like this work for you at all? I admit I don't fully understand the hostnames and how they are used by bucky, but it looks as if buckyd tells bucky where to connect to, so ensure buckyd has a correct list of hostnames for the hash-ring hosts.

@jjneely
Owner

jjneely commented Aug 17, 2017

FNV1a support is now merged in as 0.4.0. Bug reports appreciated. Also note the change in how hashrings are specified: as a list of SERVER[:PORT][=INSTANCE] strings.
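For illustration, a rough sketch of how such a SERVER[:PORT][=INSTANCE] string could be split apart (this is not the actual buckytools parser, just the idea):

package main

import (
	"fmt"
	"strings"
)

// Node holds the parts of a SERVER[:PORT][=INSTANCE] hashring member.
type Node struct {
	Server, Port, Instance string
}

// parseNode splits "server[:port][=instance]" into its parts.
// A missing port or instance is left empty.
func parseNode(s string) Node {
	var n Node
	// The instance, if any, follows the last '='.
	if i := strings.LastIndex(s, "="); i >= 0 {
		n.Instance = s[i+1:]
		s = s[:i]
	}
	// The port, if any, follows the last ':'.
	if i := strings.LastIndex(s, ":"); i >= 0 {
		n.Port = s[i+1:]
		s = s[:i]
	}
	n.Server = s
	return n
}

func main() {
	for _, arg := range []string{"radar-be-a:1905=a", "graphite1", "graphite2:2104"} {
		fmt.Printf("%s -> %+v\n", arg, parseNode(arg))
	}
}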

jjneely closed this as completed Aug 17, 2017
@mwtzzz-zz
Author

Ah! ... Just getting around to seeing this (I got pulled away on other stuff at work)... Sorry I missed this earlier. Let me take a look at it today.

@mwtzzz-zz
Author

mwtzzz-zz commented Oct 3, 2017

-bash-4.2$ ./bucky servers
Buckd daemons are using port: 4242
Hashing algorithm: [fnv1a: 12 nodes, 100 replicas, 1200 ring members radar-be-a:1905=a radar-be-b:1905=b ...]
Number of replicas: 100
Found these servers:
        radar-be-a
        radar-be-b
        ...
Is cluster healthy: false
2017/10/03 15:32:17 Cluster is inconsistent.

If I run any other command (list, inconsistent, etc), the following appears:

-bash-4.2$ ./bucky inconsistent
2017/10/03 15:34:35 Warning: Cluster is not healthy!
2017/10/03 15:34:35 Results from radar-be-c:4242 not available. Sleeping.
2017/10/03 15:34:35 Results from radar-be-k:4242 not available. Sleeping.

@mwtzzz-zz
Author

mwtzzz-zz commented Oct 3, 2017

I see file descriptor 5 for the buckyd process iterating through the whisper files.... Perhaps it just takes time before buckyd has results ready?

On a different note, the bucky du -r command is working:

-bash-4.2$ ./bucky du -r '^niseidx\.adgroups\.multiplier\.1856660\.'
2017/10/03 15:53:11 radar-be-c:4242 returned 2 metrics
2017/10/03 15:53:12 radar-be-b:4242 returned 1 metrics
2017/10/03 15:53:12 radar-be-d:4242 returned 0 metrics
2017/10/03 15:53:12 radar-be-k:4242 returned 0 metrics
2017/10/03 15:53:12 radar-be-l:4242 returned 1 metrics
2017/10/03 15:53:13 radar-be-f:4242 returned 1 metrics
2017/10/03 15:53:13 radar-be-e:4242 returned 0 metrics
2017/10/03 15:53:13 radar-be-h:4242 returned 1 metrics
2017/10/03 15:53:13 radar-be-a:4242 returned 2 metrics
2017/10/03 15:53:13 radar-be-i:4242 returned 1 metrics
2017/10/03 15:53:13 radar-be-g:4242 returned 1 metrics
2017/10/03 15:53:13 radar-be-j:4242 returned 2 metrics
2017/10/03 15:53:13 Du operation complete.
2017/10/03 15:53:13 645600 Bytes
2017/10/03 15:53:13 0.62 MiB
2017/10/03 15:53:13 0.00 GiB

@jjneely
Owner

jjneely commented Oct 3, 2017

If you are asking about the sleeping bits, yes, it takes a bit for buckyd to build a cache. The bucky CLI will wait for them.

@deniszh
Contributor

deniszh commented Oct 3, 2017

@mwtzzz: could you please share your relay config and buckyd command-line options?

@mwtzzz-zz
Author

-bash-4.2$ ./bucky inconsistent -v -h radar-be-c:4242 -s
2017/10/03 16:06:01 Warning: Cluster is not healthy!
2017/10/03 16:06:54 radar-be-c.mgmt.xad.com:4242 returned 12102076 metrics
Killed

This "Killed" is occurring automatically after a minute or so.

@mwtzzz-zz
Author

mwtzzz-zz commented Oct 3, 2017

@deniszh My relay config looks like this with 12 nodes:

cluster radar-be
  fnv1a_ch
    radar-be-a:1905=a
    radar-be-b:1905=b
    ...
  ;

My buckyd command line (which I am running identically on each of the 12 hosts) looks like this (notice I'm not using -n):

/usr/local/src/buckyd -hash fnv1a -p /media/ephemeral0/carbon/storage/whisper radar-be-a:1905=a radar-be-b:1905=b ... <12 hosts total>

@deniszh
Contributor

deniszh commented Oct 3, 2017

Why not use -n?

@mwtzzz-zz
Author

mwtzzz-zz commented Oct 3, 2017

@deniszh mostly because it's more work for me to include it and I thought it was optional (?). But if it's necessary, I'll definitely include it.... Should I put it in?

@mwtzzz-zz
Author

Ah, my bad. I just noticed in the documentation that -n defaults to whatever hostname -I says, which is not what I want.... I'll restart all the daemons with the right -n.

@mwtzzz-zz
Author

OK, this is looking much better. bucky servers now shows a healthy cluster.... I'll keep playing with the commands; I'm going to see if there are any inconsistencies and try to fix them. Assuming it's working, I will then add a 13th node to the cluster and do a rebalance.

@azhiltsov

Be aware of #19 @mwtzzz

@mwtzzz-zz
Author

mwtzzz-zz commented Oct 3, 2017

@azhiltsov I'm assuming the rebalance would make use of bucky-fill at some point, which could possibly corrupt some of my archive sums?... It looks like Civil made a PR with his fix; I might just go ahead and merge that into my local copy.

@mwtzzz-zz
Author

mwtzzz-zz commented Oct 4, 2017

I'm having an issue running bucky inconsistent. After several minutes it shows "Killed":

-bash-4.2$ ./bucky inconsistent -h radar-be-c:4242 
Killed

The output from the command shows a bunch of "Results from radar-be-x not available. Sleeping", then shows a metrics count for only six of the twelve nodes:

2017/10/04 07:17:42 radar-be-c:4242 returned 12139223 metrics
2017/10/04 07:21:34 radar-be-b:4242 returned 14213010 metrics
2017/10/04 07:23:49 radar-be-k:4242 returned 15511087 metrics
2017/10/04 07:23:54 radar-be-l:4242 returned 14627005 metrics
2017/10/04 07:23:57 radar-be-e:4242 returned 15568388 metrics
2017/10/04 07:25:03 radar-be-d:4242 returned 14510385 metrics

The buckyd log file doesn't show much, some "get /metrics" and:

2017/10/04 06:55:19 172.17.33.75:59447 - - GET /metrics
2017/10/04 06:55:19 Scaning /media/ephemeral0/carbon/storage/whisper for metrics...
2017/10/04 07:16:02 172.17.33.75:18013 - - GET /metrics
2017/10/04 07:16:56 Scan complete.
2017/10/04 07:16:57 172.17.33.75:18259 - - GET /metrics

I ran it a second time. This time only 4 of the nodes returned metrics before "Killed."

@deniszh
Contributor

deniszh commented Oct 4, 2017

Check your syslog. It looks like the OOM killer, i.e. bucky consumes too much memory, which is entirely possible for 12-15 million metrics x 12 nodes....
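(Rough back-of-the-envelope, assuming on the order of 100 bytes per metric name: 12 nodes x 12-15 million metrics is roughly 150-180 million names, i.e. about 15-18 GB of raw strings before Go's per-string and map overhead, so a bucky process well into the tens of gigabytes is entirely plausible.)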

@jjneely
Owner

jjneely commented Oct 4, 2017

Was about to write the same thing. The client you are running the bucky CLI on doesn't have enough memory.

@mwtzzz-zz
Author

mwtzzz-zz commented Oct 4, 2017

Ah, good suggestion. Indeed it was oom-killer that nuked it. It looks like bucky was consuming 20G+ of RAM on this 64GB system.

Increasing the RAM on these instances is not an option. Do you have any suggestions on how I can get it to work? Does bucky really need 20G+ of RAM?

@deniszh
Contributor

deniszh commented Oct 4, 2017

Spawn another instance with enough RAM? It only needs to host bucky, not buckyd or graphite.

@mwtzzz-zz
Author

mwtzzz-zz commented Oct 5, 2017

Copy/paste error; the full paste is as follows. bucky inconsistent reports:
radar-be-i:4242: atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts

  • radar-be-i: carbon-relay is writing here
  • radar-be-f: bucky thinks this is where it should go

(Note that "radar-be-f" is the shortened hostname; I removed the domain before posting.)

@mwtzzz-zz
Author

bucky rebalance --no-op shows this:

2017/10/05 13:49:18 [radar-be-i:4242] atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts => radar-be-f

@mwtzzz-zz
Author

Are there any tools we can use to see what carbon-relay is doing to calculate the hashring node, to see what bucky is doing, and to find out why they're giving different results? I see buckytools has a fnv1a_test.go but it seems to be missing a module.

@grobian
Contributor

grobian commented Oct 6, 2017

Yes, if you launch your carbon-c-relay with the -t (test) flag, it will prompt for data input and show you how it would route the input (assuming it is a metric string). So, in your case, just start the relay with -t -f <conffile> and paste atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts. That will show you where the relay thinks it should be routed to. If this is different, then we need to examine what's going on in this case.

@mwtzzz-zz
Author

mwtzzz-zz commented Oct 6, 2017

@grobian thanks for letting me know about the test flag. Very useful. Now I know what's going on. First of all, it's not bucky's fault: bucky is computing the correct placement of the metric based on the information it's given.

The problem is that we are rewriting the metric name on the backend right before it goes to the local cache on that node. Our front end relay is sending the metric with "agg" prepended to the name. The backend relay receives this metric and then removes "agg" before writing it to its local cache. Bucky doesn't know about this rewrite, so it thinks the metric is on the wrong node. Technically it is on the wrong node given the metric name. But it is on the right node if the name has "agg" prepended to it.

So my problem is: how to rebalance this cluster, placing metrics whose name contains xxxx.sum_all.hosts onto the node where they would go if the name contained agg.xxxx.sum_all.hosts. Any thoughts?

Here are the details:
atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum.yyy arrives at the front-end carbon-c-relay. An aggregate rule sums it and rewrites the metric name as agg.*.sum_all.hosts. This metric is then passed on to the back-end relay. As you can see, it is routed to the radar-be-i node:

agg.atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
match
    ^agg\. [strncmp: agg.]
    -> agg.atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
    fnv1a_ch(radar-be)
        radar-be-i:1905

The metric arrives at the radar-be-i node, where it is summed again, "agg" is stripped from the metric name, and it is written to a local whisper file as atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts:

agg.atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
aggregation
    ^(us-east-1\.)?agg\.(.+)$ (regex) -> agg.atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
    sum(\2) -> atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts 
atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
match
    * -> atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts
    fnv1a_ch(cache)
        127.0.0.1:2107
    stop

The culprits here are the following rules in the relay conf file:

aggregate
    ^(us-east-1\.)?agg\.(.+)$
  compute sum write to \2
  ;
match ^agg\. send to blackhole stop;
match * send to cache stop;

It rewrites the name to \2 and then sends it to the local cache. I suppose right before the last rule I could insert a new rule that sends anything with "sum_all.hosts" back to the relay so that it gets routed to the correct host according to the hash. This is the only thing I can think of, unless bucky has (or could have?) a way to balance a cluster based on rewrite rules.
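To make the mismatch concrete, here is a tiny illustrative Go snippet. It only shows that the stored name and the agg.-prefixed name the relay actually routed on produce different FNV-1a hashes, so any consistent-hash scheme may place them on different nodes; it does not reproduce carbon-c-relay's exact ring-position math:

package main

import (
	"fmt"
	"hash/fnv"
)

func fnv32a(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func main() {
	stored := "atlantic_exchange.usersync.cookiepartner.TAPAD.syncs.sum_all.hosts"
	routed := "agg." + stored // the name the front-end relay hashed on

	// The two names hash to different values, so a consistent-hash ring
	// can (and usually will) assign them to different servers.
	fmt.Printf("%-8s %08x\n", "stored", fnv32a(stored))
	fmt.Printf("%-8s %08x\n", "routed", fnv32a(routed))
}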

@deniszh
Contributor

deniszh commented Oct 6, 2017

It rewrites the name to \2 and then sends it to the local cache. I suppose right before the last rule I could insert a new rule that sends anything with "sum_all.hosts" back to the relay so that it gets routed to the correct host according to the hash. This is the only thing I can think of, unless bucky has (or could have?) a way to balance a cluster based on rewrite rules.

Indeed. You should send the new metric back to the relay and not to the local cache.
Or you could have separate whisper storage for local metrics, but that's quite ugly IMO.

@mwtzzz-zz
Author

The good news is that bucky is doing things correctly. I'm looking forward to being able to add more nodes to the cluster and use it to rebalance.

@mwtzzz-zz
Author

In testing this out, I came across a new, unrelated issue. My carbon-cache instances write their own metrics to /media/ephemeral0/carbon/storage/whisper/carbon.radar702 as per a directive I set in the carbon-cache config file:
CARBON_METRIC_PREFIX = carbon.radar702
carbon-cache appears to write its own metrics directly to disk, bypassing the relay (correct me if I'm wrong). Unfortunately, bucky looks at these and thinks they belong on a different node:

2017/10/06 16:28:39 [radar-be-a:4242] carbon.radar702.agents.ip-172-22-17-20-2105.errors => radar-be-b

Is there a way to deal with this?

@jjneely
Owner

jjneely commented Oct 7, 2017

You are correct about carbon-cache.py: it writes its own metrics directly to disk and they cannot go through the relay. Usually these are prefixed with carbon.agents., and bucky has an exception for them in the rebalance code.
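The idea is roughly the following (a sketch only; the actual check in bucky's rebalance code may look different):

package main

import (
	"fmt"
	"strings"
)

// skipForRebalance reports whether a metric should be left where it is during
// a rebalance. carbon-cache.py writes its self-instrumentation metrics straight
// to disk, so names under carbon.agents. never pass through the relay's hash
// and should not be flagged as misplaced.
func skipForRebalance(metric string) bool {
	return strings.HasPrefix(metric, "carbon.agents.")
}

func main() {
	fmt.Println(skipForRebalance("carbon.agents.radar702.errors"))             // true
	fmt.Println(skipForRebalance("atlantic_exchange.usersync.syncs.sum.yyy")) // false
}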

@mwtzzz-zz
Author

I made the change from carbon.radar702 to carbon.agents.radar702 and it works perfectly.

Excellent tool. I used it on our small QA cluster and it rebalanced 45,000 out of 320,000 metrics in a matter of a couple of seconds.

@jjneely
Owner

jjneely commented Oct 7, 2017

Okay, so what issues remain here? The rebalance and corruption?

@mwtzzz-zz
Author

No issues remaining. It seems to be working correctly. Thanks for your help on this, much appreciated!

@mwtzzz-zz
Author

@jjneely I've noticed a new issue. I completed a rebalance on our main production cluster. Everything went great, except there is a handful (about 700) of metrics that the relays are putting on node "radar-be-k" while bucky thinks they should be on node "radar-be-i". The curious thing is that this is only happening on the one node; the other eleven nodes don't have this discrepancy.

I ran some of the metric names through carbon-c-relay -t -f on both the front-end and back-end relays for testing, and they always hash them to radar-be-k. So the relay is putting them in the spot it thinks they should go.

In this case, it seems bucky is incorrect about the placement.

@grobian
Contributor

grobian commented Oct 8, 2017

We'd need the exact metric name, so we can debug the hash on both c-relay and bucky.

@mwtzzz-zz
Author

That's what I figured. The metric names include our EC2 instance hostnames. Can I private-message you directly?

@grobian
Contributor

grobian commented Oct 8, 2017

Yes, of course. Email is fine too.

@mwtzzz-zz
Author

@grobian I just sent you an email from my gmail account.

@grobian
Contributor

grobian commented Oct 9, 2017

All three metrics you sent me end up on the same hash position (4379), and more annoyingly, that hash position is occupied by two of your servers, f and g. Now, you mention k and i, so that's slightly odd, but it could very well be that carbon-c-relay is choosing the last matching entry whereas the bucky implementation picks the first. This situation is exactly graphite-project/carbon@024f9e6, which I chose NOT to implement in carbon-c-relay because it would make the input order of servers define the outcome of the ring positions.

@grobian
Contributor

grobian commented Oct 9, 2017

The likely reason is that carbon-c-relay nowadays uses a binary search, which means it approaches the duplicates from the right instead of from the left, as the original bisect-left did.

@grobian
Contributor

grobian commented Oct 9, 2017

Python's bisect_left implements this:

def bisect_left(a, x, lo=0, hi=None):
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if a[mid] < x: lo = mid+1
        else: hi = mid
    return lo

In other words, if carbon-c-relay uses a binary search it does it wrong, because it should select the leftmost matching key.
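For what it's worth, Go's sort.Search already gives bisect-left semantics (the smallest index whose position is >= the key). A toy example with a duplicated ring position, using made-up values, shows why the tie-break matters:

package main

import (
	"fmt"
	"sort"
)

type entry struct {
	pos    uint16
	server string
}

func main() {
	// A sorted toy ring with two servers occupying the same position,
	// as in the collision described above (values are made up).
	ring := []entry{{1000, "a"}, {4379, "f"}, {4379, "g"}, {9000, "k"}}

	key := uint16(4379)

	// sort.Search returns the smallest index i with ring[i].pos >= key,
	// i.e. bisect_left semantics: the leftmost matching entry ("f").
	i := sort.Search(len(ring), func(i int) bool { return ring[i].pos >= key })
	fmt.Println("bisect-left picks:", ring[i].server)

	// A search that lands on the rightmost duplicate would pick "g" instead,
	// which is how two correct-looking implementations can disagree.
}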

@mwtzzz-zz
Author

Is there anything I should do on my end to correct this?

@mwtzzz-zz
Author

mwtzzz-zz commented Oct 9, 2017 via email

@grobian
Contributor

grobian commented Oct 10, 2017

It seems to me both bucky and carbon-c-relay implement bisectLeft wrongly. The focus for me is now to align at least bucky and carbon-c-relay (which are not aligned at the moment). I'm thinking about how to align the algorithm in carbon-c-relay with the bucky implementation (which equals the old carbon-c-relay implementation, iirc), as I suspect this will result in minimal (perhaps zero) changes in routing, at least for some of my boundary tests. Yesterday I got stuck trying to understand why my algorithm is so extremely complex compared to the simplicity the Python folks came up with. Their implementation is a downright disaster because a lot of metrics will change destination.
Side note: this change isn't that bad if we separate carbon_ch from fnv1a. carbon-c-relay is "authoritative" for fnv1a_ch since it introduced it. For carbon_ch it seems we're just doing it wrong. But I'd love someone to tell me whether that's right or wrong.

@mwtzzz-zz
Author

Thanks for looking at it. On my end, there are currently about 800 metrics that bucky has misplaced. That's 800 out of 150 million, so a very low percentage. As you mentioned, if any changes are made to the relay, it would be great if they result in minimal changes in routing.

grobian added a commit to grobian/carbon-c-relay that referenced this issue Oct 11, 2017
When collisions occur, we would stable-sort them such that the ring
would always be the same, regardless of input order.  However, the
binary search method (historical mistake) could end up on a duplicate
pos, and take the server as response, clearly not honouring the contract
of returning the /first/ >= pos match.
This change ensures collisions on pos are voided, and basically restores
pre-binary-search distribution introduced in v3.1.

This change should match the ring output with what bucky expects
jjneely/buckytools#17
@grobian
Contributor

grobian commented Oct 11, 2017

Now the only thing necessary is for bucky to ensure that servers are sorted/ordered in the same way carbon-c-relay does; then the output should be the same.

@mwtzzz-zz
Author

@grobian do you want me to test that commit?

@grobian
Contributor

grobian commented Oct 13, 2017

Yes please, it should result in those 800 metrics being sent to the nodes bucky wants them to be.

@mwtzzz-zz
Author

Is it going to preserve the locations of the other metrics?

@grobian
Contributor

grobian commented Oct 14, 2017

If you want to be sure, try building a list of metric names (you can get them off disk with find; replace all / with . and strip .wsp), then run it through carbon-c-relay -t and ensure all of them return the host you grabbed the conf from. Put differently, if you do this for your current running version and for the latest HEAD, you should find that the output of diff -u old new is rather small, where new points to the box and old points to another.
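If it helps, here is a small sketch of the find / replace-slashes / strip-.wsp step in Go (the whisper root is just the example path from earlier in this thread):

package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

func main() {
	// Example whisper root; adjust to your own storage path.
	root := "/media/ephemeral0/carbon/storage/whisper"

	filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || !strings.HasSuffix(path, ".wsp") {
			return err
		}
		// <root>/a/b/c.wsp -> a.b.c
		rel, _ := filepath.Rel(root, path)
		metric := strings.ReplaceAll(strings.TrimSuffix(rel, ".wsp"), "/", ".")
		fmt.Println(metric)
		return nil
	})
	// Feed the output to `carbon-c-relay -t -f <conffile>` to see where each
	// metric would be routed, and diff the results between relay versions.
}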
