
Wrong version of aws-java-sdk-bundle in sagemaker-spark 1.4.5 #149

Open
jobvisser03 opened this issue Aug 30, 2022 · 1 comment
jobvisser03 commented Aug 30, 2022

System Information

  • Spark or PySpark: 3.3.0
  • SDK Version: 1.4.5
  • Spark Version: 3.3.0

Describe the problem

I just spent three days trying to fix this, but to no avail. My setup on an AWS notebook instance:
jars:
aws-java-sdk-bundle-1.11.901.jar
aws-java-sdk-core-1.12.262.jar
aws-java-sdk-kms-1.12.262.jar
aws-java-sdk-s3-1.12.262.jar
aws-java-sdk-sagemaker-1.12.262.jar
aws-java-sdk-sagemakerruntime-1.12.262.jar
aws-java-sdk-sts-1.12.262.jar
hadoop-aws-3.3.1.jar
sagemaker-spark_2.12-spark_3.3.0-1.4.5.jar
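To make the setup concrete, here is a minimal sketch of how jars like these might be put on a PySpark session's classpath from a notebook. The jar directory path is an assumption for illustration, not something stated in the issue:

```python
# Hypothetical sketch: build a spark.jars value from the jar list above.
# The directory is an assumed location, not taken from the issue.
jar_dir = "/home/ec2-user/jars"  # assumption
jars = ",".join(f"{jar_dir}/{name}" for name in [
    "aws-java-sdk-bundle-1.11.901.jar",
    "hadoop-aws-3.3.1.jar",
    "sagemaker-spark_2.12-spark_3.3.0-1.4.5.jar",
])
# A session would then pick the jars up via (requires pyspark):
# spark = SparkSession.builder.config("spark.jars", jars).getOrCreate()
```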

Problem:

Based on the workarounds suggested in the article above, I tried four things:

  1. Upgrade aws-java-sdk-bundle to version 1.12.262 to match the other jars → didn't work
  2. Downgrade httpclient to version 4.5.10 → didn't work
  3. Set "-Dcom.amazonaws.sdk.disableCertChecking=true" so the aws-java-sdk skips SSL certificate checking (SSLPeerUnverifiedException on S3 actions aws-sdk-java-v2#1786) → didn't work
  4. Read from a bucket whose name doesn't contain dots (.) → works
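One further workaround not in the list above, commonly suggested for bucket names containing dots, is S3A path-style access: addressing the bucket as `https://s3.<region>.amazonaws.com/<bucket>/...` instead of `https://<bucket>.s3.amazonaws.com/...`, so the hostname matches the `*.s3.amazonaws.com` wildcard certificate. This is a hedged sketch, not a confirmed fix for this setup; the endpoint region is an assumption:

```python
# Sketch: Hadoop S3A settings (as Spark conf keys) for path-style access.
# Whether this resolves the certificate error depends on the Hadoop/SDK versions.
s3a_conf = {
    # Use path-style URLs so the bucket name is not part of the TLS hostname.
    "spark.hadoop.fs.s3a.path.style.access": "true",
    # Path-style access generally needs an explicit regional endpoint.
    "spark.hadoop.fs.s3a.endpoint": "s3.eu-west-1.amazonaws.com",  # assumed region
}
# These would be applied via SparkSession.builder.config(k, v) for each pair.
```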

Minimal repro / logs

22/08/30 11:00:22 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: s3a://comp.data.sci.data.tst/some/folder/export_date=20220822.
org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://comp.data.sci.data.tst/some/folder/export_date=20220822: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <comp.data.sci.data.tst.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <comp.data.sci.data.tst.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3351)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
	at scala.Option.getOrElse(Option.scala:189)

  • Exact command to reproduce:
    Works:
    df = spark.read.parquet("s3a://aws-bucket-with-dashes/file_0_1_0.snappy.parquet")
    Doesn't work:
    df = spark.read.parquet("s3a://aws.bucket.with.dots/file_0_1_0.snappy.parquet")

It's not possible to rename the bucket, because many data consumers depend on it.

@steveloughran

  1. You shouldn't be duplicating the SageMaker jars alongside the SDK bundle: the bundle contains everything and is shaded precisely to avoid transitive-dependency issues.
  2. It's probably a problem with other things on your classpath.
  3. S3A connector support for buckets with dots in their names is incomplete and won't be fixed.
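The classpath conflict described in point 1 can be checked mechanically: the shaded `aws-java-sdk-bundle` jar should not coexist with the individual unshaded `aws-java-sdk-*` jars. A small sketch against the jar list from the report:

```python
# Sketch: flag the conflict between the shaded SDK bundle and
# individual (unshaded) aws-java-sdk-* jars on the same classpath.
jars = [
    "aws-java-sdk-bundle-1.11.901.jar",
    "aws-java-sdk-core-1.12.262.jar",
    "aws-java-sdk-kms-1.12.262.jar",
    "aws-java-sdk-s3-1.12.262.jar",
    "aws-java-sdk-sagemaker-1.12.262.jar",
    "aws-java-sdk-sagemakerruntime-1.12.262.jar",
    "aws-java-sdk-sts-1.12.262.jar",
    "hadoop-aws-3.3.1.jar",
    "sagemaker-spark_2.12-spark_3.3.0-1.4.5.jar",
]
has_bundle = any(j.startswith("aws-java-sdk-bundle") for j in jars)
unshaded = [
    j for j in jars
    if j.startswith("aws-java-sdk-") and not j.startswith("aws-java-sdk-bundle")
]
if has_bundle and unshaded:
    # Per the comment above: keep only the bundle, drop the individual jars.
    print("Conflict: remove the individual SDK jars:", unshaded)
```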

@jobvisser03 jobvisser03 changed the title Read using S3A doesn't work; SdkClientException: Unable to execute HTTP request: Certificate for ... doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com] Wrong version of aws-java-sdk-bundle in sagemaker-spark 1.4.5 Feb 13, 2023