
Latency spikes at every hour due to connection refreshes #1463

Closed

wwchen opened this issue Jul 13, 2017 · 6 comments

Labels: api: bigtable (Issues related to the googleapis/java-bigtable-hbase API.)

Comments

@wwchen commented Jul 13, 2017

Context: We connect to Bigtable from on-premises datacenters, with a connection count of 180 (the default CPU multiplier). We are currently on "com.google.cloud.bigtable" % "bigtable-hbase-1.2" % "0.9.6.2" and will upgrade to the latest ("bigtable-hbase-1.x" % "1.0.0-pre1") soon.

Every hour we see a big spike in latency, and we believe some kind of connection refresh is happening:

[image: latency graph showing the hourly spikes]

Can the client do this in the background so it doesn't impact request timings?

@sduskis (Contributor) commented Jul 14, 2017

Short answer: It's not technically feasible to create a generic solution for this in the short term, but I can walk you through a workaround.

Why you have this hourly issue: connections are shut down every hour, and that's something the Cloud Bigtable team can't control. gRPC, which is our underlying infrastructure, performs a reconnect. The single client-side connection initiates a serial set of new connections between components and requires server-side caches to be repopulated.

The Cloud Bigtable client doesn't control the connection lifecycle, since gRPC handles the reconnect much better than the Cloud Bigtable client could. Back in the early days of Cloud Bigtable we did the reconnect ourselves, but we ran into corner cases that we couldn't solve; those corner cases usually related to long-running operations like scans, or to periods when the server side was under load and many operations took longer than normal. Once gRPC provided an automatic reconnect, we gave up on our sometimes-broken solution and used gRPC's better implementation.

In your case (which has been discussed internally), the problem is much more pronounced given the physical distance between your client and your cluster. This problem is negligible for GCE users in the same zone.

What a client can do to fix this: an HBase Connection has multiple gRPC Channels under the covers. Each gRPC Channel needs to be primed with an RPC that's not part of a serving path, and then put into rotation for use. The normal configuration is 2.5 Channels per CPU, so on a 4-CPU machine, 10 requests would be required to prime the Connection.

An infrastructure-level change would require the client to have a generic RPC to prime a connection, but sadly that generic RPC does not exist yet, and given the difficulty of finding all of the corner cases, I don't see a near-term fix in the core client.

That said, you can build something similar yourself. You can create a new Connection every 30-40 minutes, run a few exists() calls on that connection to prime the underlying gRPC channels, and then swap the new Connection in for your main operations.
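
To make that concrete, here is a minimal sketch of that rotation, assuming the bigtable-hbase `BigtableConfiguration.connect()` entry point. The class name, the priming table, the 10 priming calls, and the 35-minute interval are illustrative choices, not values prescribed above:

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import com.google.cloud.bigtable.hbase.BigtableConfiguration;

/** Rotates the Bigtable HBase Connection before gRPC's hourly reconnect can hit live traffic. */
public class RefreshingConnection implements AutoCloseable {
  private final AtomicReference<Connection> current = new AtomicReference<>();
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  private final String projectId;
  private final String instanceId;
  private final TableName primingTable; // any small, existing table works for priming

  public RefreshingConnection(String projectId, String instanceId, String primingTable)
      throws IOException {
    this.projectId = projectId;
    this.instanceId = instanceId;
    this.primingTable = TableName.valueOf(primingTable);
    current.set(primedConnection());
    // Refresh well inside the hourly window so the swap happens off the serving path.
    scheduler.scheduleAtFixedRate(this::refresh, 35, 35, TimeUnit.MINUTES);
  }

  /** Returns the currently primed Connection for serving traffic. */
  public Connection get() {
    return current.get();
  }

  private Connection primedConnection() throws IOException {
    Connection conn = BigtableConfiguration.connect(projectId, instanceId);
    try (Table table = conn.getTable(primingTable)) {
      // A few cheap exists() calls warm the underlying gRPC channels; the rows don't need to exist,
      // the calls just have to reach the server and are spread across the channel pool.
      for (int i = 0; i < 10; i++) {
        table.exists(new Get(Bytes.toBytes("prime-" + i)));
      }
    }
    return conn;
  }

  private void refresh() {
    try {
      Connection fresh = primedConnection();
      Connection old = current.getAndSet(fresh);
      // In production you would drain in-flight requests on the old connection before closing it.
      old.close();
    } catch (IOException e) {
      // If priming the new connection fails, keep serving on the old one.
    }
  }

  @Override
  public void close() throws IOException {
    scheduler.shutdownNow();
    current.get().close();
  }
}
```

Callers would hold a single `RefreshingConnection` and call `get()` per request, so each request always lands on a connection whose channels are already warm.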

@wwchen (Author) commented Jul 14, 2017

Got it, thanks @sduskis for the explanation, the best I've gotten so far. Your proposed workaround is something I've considered implementing, and it may very well be what we do until we have the opportunity to move to GCE. I'm glad to hear it was discussed internally and that I now have a more realistic sense of prioritization. 👏

@wwchen wwchen closed this as completed Jul 14, 2017
@google-cloud-label-sync google-cloud-label-sync bot added the api: bigtable Issues related to the googleapis/java-bigtable-hbase API. label Jan 31, 2020
@liufuyang

Seems like this is still a problem. Will there be a fix for this issue in the near future? Is the solution still to create a new client every 30 to 40 minutes? @sduskis

@igorbernstein2 (Collaborator)

We implemented a solution for this in googleapis/java-bigtable#115.
We are currently in the process of migrating the HBase adapter to use java-bigtable instead of bigtable-client-core, so it will inherit this feature.
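
For java-bigtable users, a minimal sketch of enabling that feature, assuming the `setRefreshingChannel` option that the PR surfaced on `BigtableDataSettings`; in newer releases the option may be deprecated or on by default, so check the current docs:

```java
import java.io.IOException;

import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.BigtableDataSettings;

public class PrimedBigtableClient {
  public static void main(String[] args) throws IOException {
    // Enable background channel refresh/priming so gRPC reconnects happen off the request path.
    BigtableDataSettings settings = BigtableDataSettings.newBuilder()
        .setProjectId("my-project")    // placeholder project id
        .setInstanceId("my-instance")  // placeholder instance id
        .setRefreshingChannel(true)    // option from googleapis/java-bigtable#115; may be deprecated/default in newer versions
        .build();

    try (BigtableDataClient client = BigtableDataClient.create(settings)) {
      // Use the client normally; channels are swapped out before they expire.
    }
  }
}
```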

@liufuyang

Oh cool, that is good to know 👍 Thanks for the update

@liufuyang commented Mar 13, 2020

@igorbernstein2 Do you know for sure that this setting solves the hourly issue, or does it only reduce it to some degree? We tried to use it and it doesn't seem to really help; we still see hourly latency peaks. 🤔 (Perhaps I should ping you on another issue page, as we are not really using HBase; we basically use java-bigtable, and we have a Java backend server that connects to Bigtable. :))
