Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with S3 directory as spark.sparklens.data.dir #31

Open
prateekgupta opened this issue Feb 27, 2019 · 2 comments
Open

Issue with S3 directory as spark.sparklens.data.dir #31

prateekgupta opened this issue Feb 27, 2019 · 2 comments

Comments

@prateekgupta
Copy link

Hi,
I tried specifying an S3 location as spark.sparklens.data.dir and I am getting the following error.
The issue is reproduced in every job.

As per my understanding, this is the code which is dumping the spark lens json file - https://github.com/qubole/sparklens/blob/master/src/main/scala/com/qubole/sparklens/QuboleJobListener.scala#L135
This should work with S3 directories as well.

19/02/27 07:23:18 ERROR Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.ExceptionInInitializerError
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createAndTrack(TemporaryDirectoriesGenerator.java:145)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createTemporaryDirectories(TemporaryDirectoriesGenerator.java:94)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:642)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:932)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:810)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:171)
at com.qubole.sparklens.QuboleJobListener.dumpData(QuboleJobListener.scala:136)
at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:164)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
Caused by: java.lang.IllegalStateException: Shutdown in progress
at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66)
at java.lang.Runtime.addShutdownHook(Runtime.java:211)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoryShutdownHook.(TemporaryDirectoryShutdownHook.java:17)
... 22 more
19/02/27 07:23:18 ERROR Utils: throw uncaught fatal error in thread SparkListenerBus
java.lang.ExceptionInInitializerError
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createAndTrack(TemporaryDirectoriesGenerator.java:145)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createTemporaryDirectories(TemporaryDirectoriesGenerator.java:94)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:642)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:932)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:810)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:171)
at com.qubole.sparklens.QuboleJobListener.dumpData(QuboleJobListener.scala:136)
at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:164)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
Caused by: java.lang.IllegalStateException: Shutdown in progress
at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66)
at java.lang.Runtime.addShutdownHook(Runtime.java:211)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoryShutdownHook.(TemporaryDirectoryShutdownHook.java:17)
... 22 more
19/02/27 07:23:18 INFO SparkContext: SparkContext already stopped.
Exception in thread "SparkListenerBus" java.lang.ExceptionInInitializerError
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createAndTrack(TemporaryDirectoriesGenerator.java:145)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoriesGenerator.createTemporaryDirectories(TemporaryDirectoriesGenerator.java:94)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:642)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:932)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:913)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:810)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:171)
at com.qubole.sparklens.QuboleJobListener.dumpData(QuboleJobListener.scala:136)
at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:164)
at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
Caused by: java.lang.IllegalStateException: Shutdown in progress
at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66)
at java.lang.Runtime.addShutdownHook(Runtime.java:211)
at com.amazon.ws.emr.hadoop.fs.files.TemporaryDirectoryShutdownHook.(TemporaryDirectoryShutdownHook.java:17)
... 22 more
19/02/27 07:23:18 INFO YarnClusterSchedulerBackend: Shutting down all executors

@iamrohit
Copy link
Contributor

@prateekgupta I suspect that this is something to do with how the filesystem is being initialised, very likely missing some mandatory configuration. We are using new Configuration() which should pick up the default config. I could suggest printing the configuration from the filesystem object in sparklens code as well as in your spark application to find out the difference.

Sparklens also work using event log files. To unblock yourself (assuming you have access to event log files for your application), you could use this alternative method instead.

@mdmohsin2005khan
Copy link

mdmohsin2005khan commented Jul 14, 2020

I tried giving s3a instead of just s3 and it worked for me, sparklens was writing the metrics to s3 location.

Example: --conf spark.sparklens.data.dir=s3a://bucket-name/prefixes/sparklens/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants