Updated hbase:1.4 image causing connection failures #42

Open
seglo opened this issue Mar 11, 2020 · 5 comments

@seglo

seglo commented Mar 11, 2020

An update to the harisekhon/hbase:1.4 image broke our Alpakka HBase connector integration test some time after March 8th (the date of the last successful HBase integration test build). An earlier cached version of the image (from 2 months ago) seems to work fine.

akka/alpakka#2185

Our test clients are failing with several different error messages, but I think the underlying errors are connection timeouts. We update our hosts file to point hbase to 127.0.0.1. This worked with the old cached version of the image I had, but it no longer works either locally or on Travis.

A connection timeout:

[error] Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
[error] Wed Mar 11 10:29:06 EDT 2020, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68437: Call to hbase/127.0.0.1:16020 failed on connection exception: java.net.ConnectException: Connection refused row 'person2,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hbase,16020,1583936752151, seqNum=0
[error]     at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:329)
[error]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:242)
[error]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
[error]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
[error]     at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:275)
[error]     at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:436)
[error]     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:310)
[error]     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:196)
[error]     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
[error]     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isTableAvailable(ConnectionManager.java:1057)
[error]     at org.apache.hadoop.hbase.client.HBaseAdmin.isTableAvailable(HBaseAdmin.java:1537)
[error]     at akka.stream.alpakka.hbase.impl.HBaseCapabilities.$anonfun$getOrCreateTable$1(HBaseCapabilities.scala:53)
[error]     at akka.stream.alpakka.hbase.impl.HBaseCapabilities.twr(HBaseCapabilities.scala:26)
[error]     at akka.stream.alpakka.hbase.impl.HBaseCapabilities.twr$(HBaseCapabilities.scala:24)
[error]     at akka.stream.alpakka.hbase.impl.HBaseFlowStage$$anon$1.twr(HBaseFlowStage.scala:25)
[error]     at akka.stream.alpakka.hbase.impl.HBaseCapabilities.getOrCreateTable(HBaseCapabilities.scala:51)
[error]     at akka.stream.alpakka.hbase.impl.HBaseCapabilities.getOrCreateTable$(HBaseCapabilities.scala:49)
[error]     at akka.stream.alpakka.hbase.impl.HBaseFlowStage$$anon$1.getOrCreateTable(HBaseFlowStage.scala:25)
[error]     at akka.stream.alpakka.hbase.impl.HBaseFlowStage$$anon$1.table$lzycompute(HBaseFlowStage.scala:31)
[error]     at akka.stream.alpakka.hbase.impl.HBaseFlowStage$$anon$1.akka$stream$alpakka$hbase$impl$HBaseFlowStage$$anon$$table(HBaseFlowStage.scala:31)
[error]     at akka.stream.alpakka.hbase.impl.HBaseFlowStage$$anon$1$$anon$3.$anonfun$onPush$1(HBaseFlowStage.scala:48)
[error]     at scala.collection.Iterator.foreach(Iterator.scala:941)
[error]     at scala.collection.Iterator.foreach$(Iterator.scala:941)
[error]     at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
[error]     at scala.collection.IterableLike.foreach(IterableLike.scala:74)
[error]     at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
[error]     at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
[error]     at akka.stream.alpakka.hbase.impl.HBaseFlowStage$$anon$1$$anon$3.onPush(HBaseFlowStage.scala:46)
[error]     at akka.stream.impl.fusing.GraphInterpreter.processPush(GraphInterpreter.scala:523)
[error]     at akka.stream.impl.fusing.GraphInterpreter.processEvent(GraphInterpreter.scala:480)
[error]     at akka.stream.impl.fusing.GraphInterpreter.execute(GraphInterpreter.scala:376)
[error]     at akka.stream.impl.fusing.GraphInterpreterShell.runBatch(ActorGraphInterpreter.scala:606)
[error]     at akka.stream.impl.fusing.ActorGraphInterpreter$SimpleBoundaryEvent.execute(ActorGraphInterpreter.scala:47)
[error]     at akka.stream.impl.fusing.ActorGraphInterpreter$SimpleBoundaryEvent.execute$(ActorGraphInterpreter.scala:43)
[error]     at akka.stream.impl.fusing.ActorGraphInterpreter$BatchingActorInputBoundary$OnNext.execute(ActorGraphInterpreter.scala:85)
[error]     at akka.stream.impl.fusing.GraphInterpreterShell.processEvent(ActorGraphInterpreter.scala:581)
[error]     at akka.stream.impl.fusing.ActorGraphInterpreter.akka$stream$impl$fusing$ActorGraphInterpreter$$processEvent(ActorGraphInterpreter.scala:749)
[error]     at akka.stream.impl.fusing.ActorGraphInterpreter$$anonfun$receive$1.applyOrElse(ActorGraphInterpreter.scala:764)
[error]     at akka.actor.Actor.aroundReceive(Actor.scala:539)
[error]     at akka.actor.Actor.aroundReceive$(Actor.scala:537)
[error]     at akka.stream.impl.fusing.ActorGraphInterpreter.aroundReceive(ActorGraphInterpreter.scala:671)
[error]     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:612)
[error]     at akka.actor.ActorCell.invoke(ActorCell.scala:581)
[error]     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268)
[error]     at akka.dispatch.Mailbox.run(Mailbox.scala:229)
[error]     ... 3 more
[error] Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68437: Call to hbase/127.0.0.1:16020 failed on connection exception: java.net.ConnectException: Connection refused row 'person2,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hbase,16020,1583936752151, seqNum=0
[error]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:178)
[error]     at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
[error]     ... 3 more
[error] Caused by: java.net.ConnectException: Call to hbase/127.0.0.1:16020 failed on connection exception: java.net.ConnectException: Connection refused
[error]     at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165)
[error]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:389)
[error]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:94)
[error]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
[error]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
[error]     at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
[error]     at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
[error]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:422)
[error]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:327)
[error]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94)
[error]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571)
[error]     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:37059)
[error]     at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:405)
[error]     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:274)
[error]     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
[error]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
[error]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388)
[error]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362)
[error]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142)
[error]     ... 4 more
[error] Caused by: java.net.ConnectException: Connection refused
[error]     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[error]     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[error]     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
[error]     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
[error]     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
[error]     at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.setupConnection(BlockingRpcConnection.java:256)
[error]     at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.setupIOstreams(BlockingRpcConnection.java:437)
[error]     at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.writeRequest(BlockingRpcConnection.java:540)
[error]     at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.tracedWriteRequest(BlockingRpcConnection.java:520)
[error]     at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.access$200(BlockingRpcConnection.java:85)
[error]     at org.apache.hadoop.hbase.ipc.BlockingRpcConnection$4.run(BlockingRpcConnection.java:724)
[error]     at org.apache.hadoop.hbase.ipc.HBaseRpcControllerImpl.notifyOnCancel(HBaseRpcControllerImpl.java:240)
[error]     at org.apache.hadoop.hbase.ipc.BlockingRpcConnection.sendRequest(BlockingRpcConnection.java:699)
[error]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:420)
[error]     ... 15 more

Another test complains about HADOOP_HOME. Setting it was never necessary before, so it seems odd that it would be now.

--> [docs.scaladsl.HBaseStageSpec: HBase stage must write write entries to a sink] Start of log messages of test that [Failed(org.scalatest.concurrent.Futures$FutureConcept$$anon$1: A timeout occurred waiting for a future to complete. Queried 11 times, sleeping 500000000 nanoseconds between each query.)]
10:27:51.314 INFO  [default-dispatcher-2] akka.event.slf4j.Slf4jLogger          Slf4jLogger started
10:27:51.327 DEBUG [default-dispatcher-2] akka.event.EventStream                logger log1-Slf4jLogger started
10:27:51.329 DEBUG [default-dispatcher-2] akka.event.EventStream                Default Loggers started
10:27:51.492 DEBUG [pool-1-thread-1     ] logcapture                            enabling CapturingAppender
10:27:51.631 DEBUG [pool-1-thread-1     ] org.apache.hadoop.util.Shell          Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
        at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:329)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:354)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
        at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1437)
        at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:67)
        at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:81)
        at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:96)
        at docs.scaladsl.HBaseStageSpec.<init>(HBaseStageSpec.scala:102)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:450)
        at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:304)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
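
The log line above is emitted at DEBUG level by `org.apache.hadoop.util.Shell`, so it may be unrelated to the connection failures. As a minimal sketch, assuming only the two setting names that appear in the exception text, the following reports whether either one is visible to the test JVM. The class `HadoopHomeCheck` and the lookup order are illustrative assumptions, not part of Hadoop, HBase, or Alpakka.

```java
// Hypothetical diagnostic; not part of Hadoop, HBase, or Alpakka.
final class HadoopHomeCheck {
    /** Returns the first non-empty candidate, or null if both are unset or empty. */
    static String firstSet(String sysProp, String envVar) {
        if (sysProp != null && !sysProp.isEmpty()) return sysProp;
        if (envVar != null && !envVar.isEmpty()) return envVar;
        return null;
    }

    public static void main(String[] args) {
        // Names taken from the exception message: hadoop.home.dir and HADOOP_HOME.
        String home = firstSet(System.getProperty("hadoop.home.dir"),
                               System.getenv("HADOOP_HOME"));
        System.out.println(home == null
                ? "neither hadoop.home.dir nor HADOOP_HOME is set"
                : "hadoop home: " + home);
    }
}
```

Running it inside the same forked JVM configuration as the tests would show whether the environment differs between the working and failing setups.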

The HBase stdout indicates that each of our tests attempts a connection, but none succeeds.

hbase_1                        | 2020-03-11 14:35:40,351 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /192.168.160.1:53018
hbase_1                        | 2020-03-11 14:35:40,357 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /192.168.160.1:53018
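
These log lines show ZooKeeper accepting the client on port 2181, while the stack trace earlier reports "Connection refused" on port 16020. A minimal sketch for checking which of those two ports accepts TCP connections from the test host, assuming `hbase` resolves to 127.0.0.1 as described above. `PortProbe` is a hypothetical helper, and the 2-second timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic; not part of HBase or Alpakka.
final class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host could not be resolved
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "hbase";
        int[] ports = {2181, 16020}; // ports taken from the logs in this issue
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable: "
                    + canConnect(host, port, 2000));
        }
    }
}
```

If 2181 is reachable but 16020 is not, that would be consistent with the errors above, although it would not by itself explain why the newer image behaves differently.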

Here's the full log of hbase stdout: https://pastebin.com/x0dy7d8J

This is the hash of the image we're currently using:

harisekhon/hbase                                                                                                                                 1.4                                        0ae79dcd8e6b        6 days ago          243MB

Please let me know if I can provide any additional troubleshooting info or context.

@HariSekhon HariSekhon self-assigned this Mar 11, 2020
@HariSekhon
Owner

DockerHub is dying, with image builds backed up for days, so I've just pushed the latest 1.4 version myself.

Please pull and check against that most recent version (re-running the build for the existing commit is usually sufficient). I've run a full suite of tests from the Advanced Nagios Plugins repo against this HBase 1.4 image.

@HariSekhon
Owner

HariSekhon commented Mar 11, 2020

Later in the month I'll be working on re-enabling the extensive automatic test suites against these images. They were exceeding CI time limits, but GitHub Actions is more flexible.

@seglo
Author

seglo commented Mar 11, 2020

@HariSekhon Thanks for the reply. I re-ran a Travis job and it doesn't seem to have made a difference. We don't have Travis set up to cache Docker images, so it pulls the image each time.

https://travis-ci.com/github/akka/alpakka/jobs/296475571

I tried locally too, with the same results. The age on the image still shows 6 (now 7) days.

harisekhon/hbase                                                                                                                                 1.4                                        e6435aac09c1        7 days ago          243MB

@seglo
Author

seglo commented Mar 26, 2020

Hi @HariSekhon. Were you able to run your test suites yet?

@bdevetak

@HariSekhon: I think this is the same issue as #43
