
Graph generator txt output format error #171

Open
Arash-Afshar opened this issue May 1, 2018 · 4 comments

Comments

@Arash-Afshar

Spark-Bench version (version number, tag, or git commit hash)

spark-bench_2.3.0_0.4.0-RELEASE

Details of your cluster setup (Spark version, Standalone/Yarn/Local/Etc)

Spark 2.2.0, Yarn

Scala version on your cluster

Your exact configuration file (with system details anonymized for security)

spark-bench = {
  spark-submit-config = [{
    spark-args = {
      master = "yarn"
      executor-memory = 5G
      num-executors = 5
    }
    workload-suites = [
      {
        descr = "Graph Gen"
        benchmark-output = "console"
        workloads = [
          {
            name = "graph-data-generator"
            vertices = 1000
            output = "hdfs:///one-thousand-vertex-graph.txt"
          }
        ]
      }
    ]
  }]
}

Relevant stacktrace

18/04/30 22:21:00 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (**********:40656) with ID 1
18/04/30 22:21:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager **********:40021 with 2.5 GB RAM, BlockManagerId(1, *********, 40021, None)
18/04/30 22:21:15 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
Exception in thread "main" java.lang.Exception: Unrecognized or unspecified save format. Please check the file extension or add a file format to your arguments: Some(hdfs:///one-thousand-vertex-graph.txt)
at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyFormatOrThrow(SparkFuncs.scala:92)
at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyOutput(SparkFuncs.scala:35)
at com.ibm.sparktc.sparkbench.workload.Workload$class.run(Workload.scala:49)
at com.ibm.sparktc.sparkbench.datageneration.GraphDataGen.run(GraphDataGen.scala:90)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:98)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:98)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially(SuiteKickoff.scala:98)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$2.apply(SuiteKickoff.scala:72)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$2.apply(SuiteKickoff.scala:67)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.run(SuiteKickoff.scala:67)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially$1.apply(MultipleSuiteKickoff.scala:38)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially$1.apply(MultipleSuiteKickoff.scala:38)
at scala.collection.immutable.List.foreach(List.scala:381)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$.com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially(MultipleSuiteKickoff.scala:38)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$run$1.apply(MultipleSuiteKickoff.scala:28)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$run$1.apply(MultipleSuiteKickoff.scala:25)
at scala.collection.immutable.List.foreach(List.scala:381)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$.run(MultipleSuiteKickoff.scala:25)
at com.ibm.sparktc.sparkbench.cli.CLIKickoff$.main(CLIKickoff.scala:30)
at com.ibm.sparktc.sparkbench.cli.CLIKickoff.main(CLIKickoff.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/04/30 22:21:15 INFO spark.SparkContext: Invoking stop() from shutdown hook

Description of your problem and any other relevant info

Despite using "hdfs:///one-thousand-vertex-graph.txt" as the output path, it complains about an unrecognized output format.

@Arash-Afshar
Author

I tracked it down to this line:

def verifyFormat(outputDir: String, fileFormat: Option[String] = None): Boolean = {

When the graph data generator is invoked, the output file has a .txt extension, but the function defined at that line does not recognize .txt as a valid extension.
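To illustrate the failure mode, here is a minimal sketch of extension-based format detection. This is not the actual SparkFuncs code; the format table and helper names are assumptions made for illustration. The point is that any suffix absent from the table, such as .txt, falls through to the "Unrecognized or unspecified save format" error seen in the stacktrace.

```scala
// Sketch (hypothetical, simplified): infer the save format from the
// path suffix; unlisted suffixes like ".txt" yield None and then an error.
object FormatCheck {
  private val knownFormats = Map(
    ".csv"     -> "csv",
    ".parquet" -> "parquet",
    ".orc"     -> "orc"
  )

  // Returns Some(format) if the path ends in a known suffix, else None.
  def inferFormat(outputDir: String): Option[String] =
    knownFormats.collectFirst {
      case (suffix, fmt) if outputDir.endsWith(suffix) => fmt
    }

  // Mirrors the behavior reported above: an unknown suffix throws.
  def verifyFormatOrThrow(outputDir: String): String =
    inferFormat(outputDir).getOrElse(
      throw new Exception(
        "Unrecognized or unspecified save format. Please check the file " +
        s"extension or add a file format to your arguments: Some($outputDir)"
      )
    )
}
```

Under this sketch, `inferFormat("hdfs:///one-thousand-vertex-graph.txt")` returns None, so `verifyFormatOrThrow` throws exactly as in the log above.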

@justorez

justorez commented May 1, 2018

I don't think it supports the text format. You could try changing the output file suffix to .csv.

@Arash-Afshar
Author

That would not work. The graph data generator documentation states that the output should be *.txt:
https://codait.github.io/spark-bench/workloads/data-generator-graph/

I have also tried a non-txt extension, and it failed with a different error message telling me to choose txt.
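The situation described above can be sketched as two validation checks that no single path can satisfy. This is a hypothetical illustration, not the actual spark-bench code: the generic output check rejects .txt, while the graph generator's own check (per its documentation) accepts only .txt.

```scala
// Hypothetical sketch of the conflict: two checks with an empty intersection.
object ConflictSketch {
  // Generic save-format check (assumed): .txt is not a recognized format.
  def genericCheckPasses(path: String): Boolean =
    Seq(".csv", ".parquet").exists(path.endsWith)

  // Graph-data-generator check (per its docs): output must be .txt.
  def graphGenCheckPasses(path: String): Boolean =
    path.endsWith(".txt")

  // No extension can pass both, so every config fails one way or the other.
  def bothPass(path: String): Boolean =
    genericCheckPasses(path) && graphGenCheckPasses(path)
}
```

Under these assumptions, `bothPass` is false for both "graph.txt" and "graph.csv", matching the two different failures reported in this thread.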

@hzhuang1

It could be fixed by this pull request: #180
