
Latency spikes at every hour due to connection refreshes #1463

Closed

wwchen opened this issue Jul 13, 2017 · 6 comments

Labels: api: bigtable (Issues related to the googleapis/java-bigtable-hbase API.)

Comments

@wwchen commented Jul 13, 2017

Context: We connect to Bigtable from on-premises datacenters, with a connection count of 180 (the default CPU multiplier). We are currently on "com.google.cloud.bigtable" % "bigtable-hbase-1.2" % "0.9.6.2" and will upgrade to the latest ("bigtable-hbase-1.x" % "1.0.0-pre1") soon.

Every hour we see a big spike in latency, and we believe some kind of connection refresh is happening:

[image: latency graph showing the hourly spikes]

Can the client do this in the background so it doesn't impact request timings?

@sduskis (Contributor) commented Jul 14, 2017

Short answer: It's not technically feasible to create a generic solution for this in the short term, but I can walk you through a workaround.

Why you have this hourly issue: connections are shut down every hour, and that's something the Cloud Bigtable team can't control. gRPC, which is our underlying infrastructure, performs a reconnect. The single client-side connection initiates a serial set of new connections between components and requires server-side caches to be repopulated.

The Cloud Bigtable client doesn't control the connection lifecycle, since gRPC handles the reconnect much better than the Cloud Bigtable client could. Back in the early days of Cloud Bigtable we did the reconnect ourselves, but we ran into corner cases that we couldn't solve; those corner cases usually related to long-running operations like scans, or to periods when the server side was under load and many operations took longer than normal. Once gRPC provided an automatic reconnect, we gave up on our sometimes-broken solution and used gRPC's better implementation.

In your case (which has been discussed internally), the problem is much more pronounced given the physical distance between your client and your cluster. This problem is negligible for GCE users in the same zone.

What a client can do to fix this: an HBase Connection has multiple gRPC Channels under the covers. Each gRPC Channel needs to be primed with an RPC that's not part of a serving path, and then put into rotation for use. The normal configuration is 2.5 Channels per CPU, so on a 4-CPU machine, 10 requests would be required to prime the Connection.

An infrastructure-level change would require the client to have a generic RPC to prime a connection, but sadly that generic RPC does not exist yet, and given the difficulty of finding all of the corner cases, I don't see a near-term fix in the core client.

That said, you can build something similar yourself. You can create a new Connection every 30-40 minutes, run a few exists() calls on that connection to prime the underlying gRPC channels, and then swap the new Connection in for your main operations.
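
To make that concrete, here is a minimal sketch of that rotation, assuming the bigtable-hbase `BigtableConfiguration.connect()` entry point. The class name, the priming table, the 10 priming calls, and the 35-minute interval are illustrative choices, not values prescribed above:

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import com.google.cloud.bigtable.hbase.BigtableConfiguration;

/** Rotates the Bigtable HBase Connection before gRPC's hourly reconnect can hit live traffic. */
public class RefreshingConnection implements AutoCloseable {
  private final AtomicReference<Connection> current = new AtomicReference<>();
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  private final String projectId;
  private final String instanceId;
  private final TableName primingTable; // any small, existing table works for priming

  public RefreshingConnection(String projectId, String instanceId, String primingTable)
      throws IOException {
    this.projectId = projectId;
    this.instanceId = instanceId;
    this.primingTable = TableName.valueOf(primingTable);
    current.set(primedConnection());
    // Refresh well inside the hourly window so the swap happens off the serving path.
    scheduler.scheduleAtFixedRate(this::refresh, 35, 35, TimeUnit.MINUTES);
  }

  /** Returns the currently primed Connection for serving traffic. */
  public Connection get() {
    return current.get();
  }

  private Connection primedConnection() throws IOException {
    Connection conn = BigtableConfiguration.connect(projectId, instanceId);
    try (Table table = conn.getTable(primingTable)) {
      // A few cheap exists() calls warm the underlying gRPC channels; the rows don't need to exist,
      // the calls just have to reach the server and are spread across the channel pool.
      for (int i = 0; i < 10; i++) {
        table.exists(new Get(Bytes.toBytes("prime-" + i)));
      }
    }
    return conn;
  }

  private void refresh() {
    try {
      Connection fresh = primedConnection();
      Connection old = current.getAndSet(fresh);
      // In production you would drain in-flight requests on the old connection before closing it.
      old.close();
    } catch (IOException e) {
      // If priming the new connection fails, keep serving on the old one.
    }
  }

  @Override
  public void close() throws IOException {
    scheduler.shutdownNow();
    current.get().close();
  }
}
```

Callers would hold a single `RefreshingConnection` and call `get()` per request, so each request always lands on a connection whose channels are already warm.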

@wwchen (Author) commented Jul 14, 2017

Got it, thanks @sduskis for the explanation, the best I've gotten so far. Your proposed workaround is something I've considered implementing, and it may very well be what we do until we have the opportunity to move to GCE. I'm glad to hear it was discussed internally and that I now have a more realistic sense of prioritization. 👏

@wwchen wwchen closed this as completed Jul 14, 2017
@google-cloud-label-sync google-cloud-label-sync bot added the api: bigtable Issues related to the googleapis/java-bigtable-hbase API. label Jan 31, 2020
@liufuyang

Seems like this is still a problem. Will there be a fix for this issue in the near future? Is the solution still to create a new client every 30 to 40 minutes? @sduskis

@igorbernstein2 (Collaborator)

We implemented a solution for this in googleapis/java-bigtable#115.
We are currently in the process of migrating the HBase adapter to use java-bigtable instead of bigtable-client-core, so it will inherit this feature.
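
For java-bigtable users, a minimal sketch of enabling that feature, assuming the `setRefreshingChannel` option that the PR surfaced on `BigtableDataSettings`; in newer releases the option may be deprecated or on by default, so check the current docs:

```java
import java.io.IOException;

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.BigtableDataSettings;

public class PrimedBigtableClient {
  public static void main(String[] args) throws IOException {
    // Enable background channel refresh/priming so gRPC reconnects happen off the request path.
    BigtableDataSettings settings = BigtableDataSettings.newBuilder()
        .setProjectId("my-project")    // placeholder project id
        .setInstanceId("my-instance")  // placeholder instance id
        .setRefreshingChannel(true)    // option from googleapis/java-bigtable#115; may be deprecated/default in newer versions
        .build();

    try (BigtableDataClient client = BigtableDataClient.create(settings)) {
      // Use the client normally; channels are swapped out before they expire.
    }
  }
}
```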

@liufuyang

Oh cool, that is good to know 👍 Thanks for the update

@liufuyang commented Mar 13, 2020

@igorbernstein2 Do you know for sure that this setting solves the hourly issue, or does it only reduce it to some degree? We tried to use it and it doesn't seem to really help; we still see hourly latency peaks. 🤔 (Perhaps I should ping you on another issue page, as we are not really using HBase; we basically use java-bigtable, and we have a Java backend server that connects to Bigtable. :))
