S3 TransferManager incompatible with AWS X-Ray #313

Open
Zhenye-Na opened this issue Dec 18, 2021 · 7 comments

@Zhenye-Na

Hello

We are currently using X-Ray for the services we own. One of our APIs involves file transfer, so I added a dependency on the S3 TransferManager. However, this throws the X-Ray "SegmentNotFoundException".

I spent a little time looking for the root cause. It turns out that TransferManager creates a thread pool, and X-Ray is not able to gather context for the threads that TransferManager created.

I am wondering whether a solution for this already exists. I have checked the following resources, but had no luck.

Resources:

  1. https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-java-multithreading.html
  2. https://github.com/aws-samples/eb-java-scorekeep/blob/xray/src/main/java/scorekeep/MoveFactory.java#L70-L79
  3. https://stackoverflow.com/questions/53841672/aws-xray-sdk-issue-failed-to-begin-subsegment-named-amazon-s3-segment-cannot
  4. S3 TransferManager incompatible with AWS X-Ray aws-sdk-java#1572
@willarmiros
Contributor

Hi @Zhenye-Na,

Thank you for raising this. You're on the right track. Basically, the X-Ray SDK stores segment context using ThreadLocal. It uses this context to capture outgoing AWS SDK requests and generate a subsegment for them. If there's no context available, the SDK throws a SegmentNotFoundException. If TransferManager creates a thread pool and uses new threads to send requests, then the X-Ray SDK will attempt to capture them and fail because the ThreadLocal is empty on those threads, causing this exception.
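
To make the mechanism concrete, here is a minimal sketch in plain Java. It does not use the X-Ray SDK: `CURRENT_SEGMENT` is a hypothetical stand-in for the per-thread context the SDK keeps, and the class and method names are made up for illustration. It only shows that a value set in a ThreadLocal on the calling thread is not visible from a thread created by a pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalContextDemo {
    // Hypothetical stand-in for the per-thread segment context (assumption, not SDK code).
    private static final ThreadLocal<String> CURRENT_SEGMENT = new ThreadLocal<>();

    // Sets a segment on the calling thread, then reports what a pool thread sees.
    static String segmentSeenByPoolThread() throws Exception {
        CURRENT_SEGMENT.set("request-segment");
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            // The task runs on a new worker thread, which has its own ThreadLocal slot.
            Callable<String> task = CURRENT_SEGMENT::get;
            return pool.submit(task).get();
        } finally {
            pool.shutdown();
            CURRENT_SEGMENT.remove();
        }
    }

    public static void main(String[] args) throws Exception {
        // prints "pool thread sees: null"
        System.out.println("pool thread sees: " + segmentSeenByPoolThread());
    }
}
```

The worker thread sees `null` even though the caller set a value, which is the situation in which the SDK finds no segment and throws.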

To ignore this error, you can set the env var AWS_XRAY_CONTEXT_MISSING=IGNORE_ERROR, though of course this will cause some requests to not be instrumented. I'm not sure if the AWS SDK exposes enough of their implementation for us to hook into the new thread pool and capture these requests, nor do I think we'd have the bandwidth to extend our instrumentation to support this case. However I would recommend you open this feature request in the OpenTelemetry Java repo as well since they have an AWS SDK instrumentation that could be extended to support this.

@Zhenye-Na
Author

Hello @willarmiros

Thank you so much for your reply and for confirming the experiments I did. After this, we decided to temporarily bypass the SegmentNotFoundException by using the low-level API that the S3 team provides for multipart uploads, and X-Ray has worked well with it so far.

I will open a feature request in the repo you mentioned above. However, I am not very familiar with the terminology or the detailed process for solving this problem. Do you mind if I cc you later in the new issue I raise for the OpenTelemetry team?

Thank you so much!

Merry Xmas 🎅

@Zhenye-Na
Author

Adding some details from my own experiments for anyone who comes across this issue:

  1. Instead of setting the env var, I called AWSXRay.withContextMissingStrategy(IgnoreErrorXXXStrategy). This does not throw any exceptions, which is nice, but the request times out.
  2. In the code where TransferManager creates its thread pool, I tried to retrieve the traceEntity of the GlobalRecorder and call beginSubsegment() in each thread that TransferManager created. The result was either a timeout or an exception.
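
For reference, the general pattern behind the second experiment (capture the context on the submitting thread, install it on the worker thread) can be sketched in plain Java as below. This is only an illustration under assumptions: `TRACE_ENTITY` is a hypothetical stand-in for the SDK's per-thread state, and `withCallerContext` and `runInPool` are made-up helper names. With the X-Ray SDK, the capture and install steps would correspond to `AWSXRay.getTraceEntity()` and `AWSXRay.setTraceEntity(...)`, as in the multithreading guide linked at the top of this issue. Whether TransferManager can be made to run its internal tasks through such a wrapper is not established here, and the experiment above did not succeed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ContextPropagationDemo {
    // Hypothetical stand-in for the per-thread trace entity (assumption, not SDK code).
    static final ThreadLocal<String> TRACE_ENTITY = new ThreadLocal<>();

    // Wraps a task so it runs with the trace entity of the thread that created the wrapper.
    static <T> Callable<T> withCallerContext(Callable<T> task) {
        final String captured = TRACE_ENTITY.get(); // read on the submitting thread
        return () -> {
            String previous = TRACE_ENTITY.get();
            TRACE_ENTITY.set(captured);             // install on the worker thread
            try {
                return task.call();
            } finally {
                // Restore the old value so a reused pool thread does not leak the context.
                if (previous == null) {
                    TRACE_ENTITY.remove();
                } else {
                    TRACE_ENTITY.set(previous);
                }
            }
        };
    }

    // Runs a task in a one-thread pool and returns the trace entity the task observed.
    static String runInPool(boolean propagate) throws Exception {
        TRACE_ENTITY.set("request-segment");
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Callable<String> task = TRACE_ENTITY::get;
            Callable<String> toSubmit = propagate ? withCallerContext(task) : task;
            return pool.submit(toSubmit).get();
        } finally {
            pool.shutdown();
            TRACE_ENTITY.remove();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without propagation: " + runInPool(false)); // null
        System.out.println("with propagation: " + runInPool(true));     // request-segment
    }
}
```

The wrapper only helps when the code that submits the task can be wrapped, which is the difficulty with a thread pool that a library creates and uses internally.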

@willarmiros
Contributor

> Do you mind if I cc you later in the new issue I raised for OpenTelemetry team?

No problem

> but the request is timed out.

Hmm so just adding X-Ray instrumentation and the ignore error strategy caused the request to time out? That's strange. It might have something to do with how transferManager works. Feel free to post some reproduction code, but glad you have a workaround for now!

@Zhenye-Na
Author

open-telemetry/opentelemetry-java-instrumentation#6104

Issue created in OpenTelemetry Java, let's see how this goes.

@Zhenye-Na
Author

I also raised a ticket in AWS SDK v2 to see if there is a chance to get this fixed:

aws/aws-sdk-java-v2#3217

@Zhenye-Na
Author

I am wondering whether this issue will be included in the roadmap?

Or are there any workarounds if we would like to continue using X-Ray in a multi-threaded environment?


2 participants