
Uncaught exception for AWS S3 connection #2

Open
dbtucker opened this issue May 2, 2016 · 6 comments

Comments

@dbtucker

dbtucker commented May 2, 2016

I'm attempting to test the Sink connector in the latest Kafka 0.9.0.1 framework. When launching a standalone connect worker task with the S3 sink configured (using the example *.properties files modified for my environment), the following exception is thrown at startup:

Exception in thread "WorkerSinkTask-s3-sink-0" java.lang.NoSuchFieldError: INSTANCE
at com.amazonaws.http.conn.SdkConnectionKeepAliveStrategy.getKeepAliveDuration(SdkConnectionKeepAliveStrategy.java:48)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:535)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:822)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:576)
at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:362)
at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:328)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:307)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3643)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1148)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1037)
at com.deviantart.kafka_connect_s3.S3Writer.fetchOffset(S3Writer.java:100)
at com.deviantart.kafka_connect_s3.S3SinkTask.recoverPartition(S3SinkTask.java:193)
at com.deviantart.kafka_connect_s3.S3SinkTask.recoverAssignment(S3SinkTask.java:181)
at com.deviantart.kafka_connect_s3.S3SinkTask.start(S3SinkTask.java:84)
at org.apache.kafka.connect.runtime.WorkerSinkTask.joinConsumerGroupAndStart(WorkerSinkTask.java:154)
at org.apache.kafka.connect.runtime.WorkerSinkTaskThread.execute(WorkerSinkTaskThread.java:54)
at org.apache.kafka.connect.util.ShutdownableThread.run(ShutdownableThread.java:82)

I suspect something fundamental in the AWS configuration, but all the CLI operations (aws s3 *) work fine against the bucket specified in the *.properties file.
NOTE: the bucket is private to my account ... not world-accessible.

I tried updating to the latest AWS artifact (1.10.43, upgraded from 1.10.37 in your latest check-in). No improvement.

@dbtucker
Author

dbtucker commented May 6, 2016

More details if it helps:

The S3Writer.fetchOffset function should probably catch things of type Throwable, not just Exception (given that java.lang.Error also inherits from Throwable). Even when I modified the code to forward an IOException for that specific error case, the connector still fails. It is as if creating a new set of files in an empty bucket is not working.

Any thoughts as to the root cause of this?
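To illustrate the hierarchy behind this: NoSuchFieldError is a java.lang.Error, not an Exception, so a `catch (Exception e)` block never sees it. A minimal, self-contained sketch (class and method names are mine, not the connector's):

```java
public class CatchDemo {
    // Classify a throwable by which branch of the hierarchy it lives on.
    static String classify(Throwable t) {
        if (t instanceof Exception) return "Exception";
        if (t instanceof Error) return "Error";
        return "Throwable";
    }

    public static void main(String[] args) {
        // NoSuchFieldError extends Error, so `catch (Exception e)` skips it:
        System.out.println(classify(new NoSuchFieldError("INSTANCE"))); // Error
        // IOException extends Exception, so it IS caught:
        System.out.println(classify(new java.io.IOException("net")));   // Exception
    }
}
```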

@banks
Contributor

banks commented May 6, 2016

Hi David, thanks for the report.

Honestly, it looks like you have more ideas about this than I do! This was the first real thing I've ever written in Java, and I mostly took other Confluent connectors as gospel in terms of structure and what needed to be caught.

I'm not sure I agree that just catching everything is quite the right solution, though. The whole point of exceptions is to let you handle what you understand while allowing things you know nothing about to be handled by some other part of the code that knows the safest resolution. Catching IOException makes sense - if the network call fails we don't want to blow up. But catching all Throwables and pretending there is no error sounds like a bad idea (at least to a Java non-expert) - what if a memory allocation failure or corruption caused the error and we just caught it and carried on blindly?
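A hedged sketch of the pattern described above - catch the recoverable IOException, let Errors propagate. OffsetFetcher and fetchOffsetOrZero are hypothetical names for illustration, not the connector's actual API:

```java
import java.io.IOException;

public class OffsetRecovery {
    /** Hypothetical stand-in for the S3 offset lookup. */
    interface OffsetFetcher {
        long fetch() throws IOException;
    }

    // Handle what we understand (offset file missing or unreachable) and let
    // Errors such as NoSuchFieldError propagate: a broken classpath should
    // crash the task, not be silently swallowed.
    static long fetchOffsetOrZero(OffsetFetcher fetcher) {
        try {
            return fetcher.fetch();
        } catch (IOException e) {
            return 0L; // recoverable: treat as "no offsets committed yet"
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchOffsetOrZero(() -> 42L));
        System.out.println(fetchOffsetOrZero(() -> { throw new IOException("down"); }));
    }
}
```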

At a high level, the error seems to hint towards API version issues - it's deep in AWS SDK code and is a type error for a field - but you say you already tried latest AWS SDK version so not sure what else to suggest.

Did you manage to get the system tests running? That might be useful although it's not totally trivial. See https://github.com/DeviantArt/kafka-connect-s3/tree/master/system_test

Sorry I can't help more off the top of my head. I may get a chance to re-test and see if I can reproduce, although this was working for me a few weeks ago. I've also actually since moved on from DeviantArt so I won't be using this code in production any time soon (or have much spare time to debug it I guess).

Not sure if @metamode, @kojik1010 or @redstonemercury would be able to look into it more.

@ChenShuai1981

I met the same issue as David. @banks, could you attach your jar file here for us to try? I also tried the latest aws-java-sdk 1.10.76 version, but it doesn't help.

@banks
Contributor

banks commented May 10, 2016

@ChenShuai1981 sadly I don't have that easily available right now as it was work done for a previous employer.

On the other hand I googled the error and saw this: https://caffinc.github.io/2015/12/sqs-instance-exception/

"I finally realized that this was a problem with the version of the Apache HTTP Client that was on my colleague's machine. She had a dependency on v4.2 of the Apache HTTP Client in her code, while the Amazon AWS SDK used v4.3.6."

Does that help either of you?
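When two copies of httpclient end up on the classpath, a quick way to see which one actually won is to ask the loaded class for its code source. A small diagnostic sketch (not part of the connector; the class name is just a convenient probe):

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns the jar or directory a class was loaded from,
     *  "bootstrap/JDK" for core classes (null code source),
     *  or "not on classpath" if it can't be loaded at all. */
    static String locationOf(String className) {
        try {
            CodeSource src = Class.forName(className)
                    .getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap/JDK" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // On a Connect worker this reveals which jar supplied httpclient
        // when both the S3 and HDFS connector jars are present:
        System.out.println(locationOf("org.apache.http.impl.client.AbstractHttpClient"));
        System.out.println(locationOf("java.lang.String")); // bootstrap/JDK
    }
}
```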

@dbtucker
Author

The issue is indeed one of jar overlap. The Confluent Platform includes a version of the AWS SDK that conflicts with the version built into the connector. I was able to use vanilla Apache Kafka 0.9 and correctly get things working. I will log an issue on the Confluent side to isolate their AWS SDK from the connector worker tasks.

FWIW, I was able to use the FileStreamSourceConnector (via both connect-console-source and connect-file-source) to create data for ingestion.

  • Decide on a topic, and use the connect-standalone.sh script to deploy a quick source connector to seed your topic.
  • Confirm that there is data there using the kafka-console-consumer.sh script against the topic name. You'll see lines of the form {"schema":{"type":"string","optional":false},"payload":"topic=console-test4"}
  • Then launch the s3 sink connector (again using connect-standalone.sh)

The S3 bucket should have both an offset file along with a directory containing the actual data (in the index.json and gzip format described in the README).

Feeding a continuous stream of text is as simple as setting a file path in the connect-file-source.properties file, matching the topic setting in that file to the one in connect-s3-sink.properties, and then deploying both connectors with
bin/connect-standalone.sh config/connect-standalone.properties config/connect-s3-sink.properties config/connect-file-source.properties

You may see some warning messages about the file not existing ... but simply copying any file to that target file name will have the desired effect. One file after another can be uploaded to S3 cleanly.

@dbtucker
Author

The jar collision is with the HDFS connector distributed with the Confluent Platform. Moving the $PLATFORM_HOME/share/java/kafka-connect-hdfs directory out of the way allowed the S3 connector to function properly in my environment.

The other issue is the need for a property setting for "local.buffer.dir". I default it to /tmp on my Mac.
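If it helps anyone else, a hypothetical excerpt of the sink's properties file with the buffer directory set; the property name local.buffer.dir comes from the comment above, and the path is just an example:

```properties
# Hypothetical excerpt of connect-s3-sink.properties.
# Local staging directory for chunk files before upload; /tmp is an example.
local.buffer.dir=/tmp
```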

orsher referenced this issue in personali/kafka-connect-cloud-storage Nov 14, 2017
iamnoah pushed a commit to iamnoah/kafka-connect-s3 that referenced this issue Dec 11, 2020 (…-configuration: "fix typo in boolean expression")