AmazonClientException: Unable to complete transfer: Connection pool shut down #207

Open
dkazakevich opened this issue Jan 22, 2019 · 1 comment

@dkazakevich commented Jan 22, 2019

We have a Spark job that runs every day and loads a list of datasets from DB2 into COS:
df.write.mode(SaveMode.Overwrite).format("parquet").save("cos://staging.dev/snapshots/temp/dataset/parquet")
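For context, the session is configured roughly like this (the fs.stocator.* / fs.cos.* keys follow the Stocator documentation; the endpoint and credentials are placeholders, not our exact values):

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("db2-to-cos-snapshots")
  // Route the cos:// scheme to the Stocator connector
  .config("spark.hadoop.fs.stocator.scheme.list", "cos")
  .config("spark.hadoop.fs.cos.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
  .config("spark.hadoop.fs.stocator.cos.impl", "com.ibm.stocator.fs.cos.COSAPIClient")
  .config("spark.hadoop.fs.stocator.cos.scheme", "cos")
  // "dev" is the service name from the cos://staging.dev/ URI; values below are placeholders
  .config("spark.hadoop.fs.cos.dev.endpoint", "https://<cos-endpoint>")
  .config("spark.hadoop.fs.cos.dev.access.key", "<ACCESS_KEY>")
  .config("spark.hadoop.fs.cos.dev.secret.key", "<SECRET_KEY>")
  .getOrCreate()

// df is loaded from DB2 over JDBC; the failing step is the write above.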
It works fine with small datasets, but for big ones (about 30 GB) it throws an AmazonClientException:

2019-01-22 00:35:52 WARN  WatchConnectionManager:192 - Exec Failure
java.io.EOFException
	at okio.RealBufferedSource.require(RealBufferedSource.java:60)
	at okio.RealBufferedSource.readByte(RealBufferedSource.java:73)
	at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:113)
	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:97)
	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2019-01-22 00:35:52 INFO  WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
2019-01-22 00:38:45 WARN  TaskSetManager:66 - Lost task 11.0 in stage 18.0 (TID 751, 172.30.198.214, executor 6): org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:254)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:168)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: saving output snapshots/temp/dataset/parquet/dt=2019-01-22/ts=1548115246780/part-00011-41e2b878-bd4f-4e46-9997-8fac71582b7a-attempt_20190122003826_0018_m_000011_0.c000.snappy.parquet com.amazonaws.AmazonClientException: Unable to complete transfer: null
	at com.ibm.stocator.fs.cos.COSOutputStream.close(COSOutputStream.java:173)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.close(HadoopPositionOutputStream.java:64)
	at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:685)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:122)
	at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:165)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:57)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:74)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:244)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:239)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:245)
	... 10 more

2019-01-22 00:39:05 WARN  TaskSetManager:66 - Lost task 29.0 in stage 18.0 (TID 763, 172.30.198.214, executor 6): org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:254)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:168)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: saving output snapshots/temp/dataset/parquet/dt=2019-01-22/ts=1548115246780/part-00029-41e2b878-bd4f-4e46-9997-8fac71582b7a-attempt_20190122003854_0018_m_000029_0.c000.snappy.parquet com.amazonaws.AmazonClientException: Unable to complete transfer: Connection pool shut down
	at com.ibm.stocator.fs.cos.COSOutputStream.close(COSOutputStream.java:173)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.close(HadoopPositionOutputStream.java:64)
	at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:685)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:122)
	at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:165)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:57)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:74)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:244)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:239)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:245)
	... 10 more
@Dewsmen commented Jun 28, 2019

Hi, I have the same issue. Some additional details and a probable workaround:

  1. The same code works if I use "csv" instead of "parquet".
  2. There is no exception if I disable the SSL connection to COS with fs.cos.connection.ssl.enabled=false (sketch below).

I hope this gives you some clue.
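
In case it helps, a minimal sketch of applying the workaround at runtime (only the fs.cos.connection.ssl.enabled key above is confirmed; the rest just mirrors the original write):

// Workaround: talk to COS over plain HTTP instead of HTTPS.
// Only reasonable against a private/trusted endpoint.
spark.sparkContext.hadoopConfiguration
  .set("fs.cos.connection.ssl.enabled", "false")

// Rerun the same write that previously failed:
df.write.mode(SaveMode.Overwrite).format("parquet")
  .save("cos://staging.dev/snapshots/temp/dataset/parquet")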

P.S. This looks like the same issue as #200, "Spark 2.3 app in k8s cluster: parquet->COS throws exception".
