
Graph generator txt output format error #171

Open
Arash-Afshar opened this issue May 1, 2018 · 4 comments

Comments

@Arash-Afshar

Spark-Bench version (version number, tag, or git commit hash)

spark-bench_2.3.0_0.4.0-RELEASE

Details of your cluster setup (Spark version, Standalone/Yarn/Local/Etc)

Spark 2.2.0, Yarn

Scala version on your cluster

Your exact configuration file (with system details anonymized for security)

spark-bench = {
  spark-submit-config = [{
    spark-args = {
      master = "yarn"
      executor-memory = 5G
      num-executors = 5
    }
    workload-suites = [
      {
        descr = "Graph Gen"
        benchmark-output = "console"
        workloads = [
          {
            name = "graph-data-generator"
            vertices = 1000
            output = "hdfs:///one-thousand-vertex-graph.txt"
          }
        ]
      }
    ]
  }]
}

Relevant stacktrace

18/04/30 22:21:00 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (**********:40656) with ID 1
18/04/30 22:21:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager **********:40021 with 2.5 GB RAM, BlockManagerId(1, *********, 40021, None)
18/04/30 22:21:15 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
Exception in thread "main" java.lang.Exception: Unrecognized or unspecified save format. Please check the file extension or add a file format to your arguments: Some(hdfs:///one-thousand-vertex-graph.txt)
at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyFormatOrThrow(SparkFuncs.scala:92)
at com.ibm.sparktc.sparkbench.utils.SparkFuncs$.verifyOutput(SparkFuncs.scala:35)
at com.ibm.sparktc.sparkbench.workload.Workload$class.run(Workload.scala:49)
at com.ibm.sparktc.sparkbench.datageneration.GraphDataGen.run(GraphDataGen.scala:90)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:98)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially$1.apply(SuiteKickoff.scala:98)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runSerially(SuiteKickoff.scala:98)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$2.apply(SuiteKickoff.scala:72)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$2.apply(SuiteKickoff.scala:67)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$.run(SuiteKickoff.scala:67)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially$1.apply(MultipleSuiteKickoff.scala:38)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially$1.apply(MultipleSuiteKickoff.scala:38)
at scala.collection.immutable.List.foreach(List.scala:381)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$.com$ibm$sparktc$sparkbench$workload$MultipleSuiteKickoff$$runSuitesSerially(MultipleSuiteKickoff.scala:38)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$run$1.apply(MultipleSuiteKickoff.scala:28)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$$anonfun$run$1.apply(MultipleSuiteKickoff.scala:25)
at scala.collection.immutable.List.foreach(List.scala:381)
at com.ibm.sparktc.sparkbench.workload.MultipleSuiteKickoff$.run(MultipleSuiteKickoff.scala:25)
at com.ibm.sparktc.sparkbench.cli.CLIKickoff$.main(CLIKickoff.scala:30)
at com.ibm.sparktc.sparkbench.cli.CLIKickoff.main(CLIKickoff.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/04/30 22:21:15 INFO spark.SparkContext: Invoking stop() from shutdown hook

Description of your problem and any other relevant info

Despite using "hdfs:///one-thousand-vertex-graph.txt" as the output path, it complains about an unrecognized output format.

@Arash-Afshar
Author

I tracked it down to this line:

def verifyFormat(outputDir: String, fileFormat: Option[String] = None): Boolean = {

When the graph data generator is invoked, the output file has a .txt extension, but the function defined at that line does not recognize .txt as a valid extension.
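To illustrate the failure mode, here is a minimal sketch of extension-based format detection. This is not the actual SparkFuncs code; the format table and helper names are assumptions made for illustration. The point is that any suffix absent from the table, such as .txt, falls through to the "Unrecognized or unspecified save format" error seen in the stacktrace.

```scala
// Sketch (hypothetical, simplified): infer the save format from the
// path suffix; unlisted suffixes like ".txt" yield None and then an error.
object FormatCheck {
  private val knownFormats = Map(
    ".csv"     -> "csv",
    ".parquet" -> "parquet",
    ".orc"     -> "orc"
  )

  // Returns Some(format) if the path ends in a known suffix, else None.
  def inferFormat(outputDir: String): Option[String] =
    knownFormats.collectFirst {
      case (suffix, fmt) if outputDir.endsWith(suffix) => fmt
    }

  // Mirrors the behavior reported above: an unknown suffix throws.
  def verifyFormatOrThrow(outputDir: String): String =
    inferFormat(outputDir).getOrElse(
      throw new Exception(
        "Unrecognized or unspecified save format. Please check the file " +
        s"extension or add a file format to your arguments: Some($outputDir)"
      )
    )
}
```

Under this sketch, `inferFormat("hdfs:///one-thousand-vertex-graph.txt")` returns None, so `verifyFormatOrThrow` throws exactly as in the log above.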

@justorez

justorez commented May 1, 2018

I don't think it supports the text format. You could try changing the output file suffix to .csv.

@Arash-Afshar
Author

That would not work. The graph data generator documentation states that the output should be *.txt:
https://codait.github.io/spark-bench/workloads/data-generator-graph/

I have also tried a non-txt extension, and it failed with a different error message telling me to choose txt.
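The situation described above can be sketched as two validation checks that no single path can satisfy. This is a hypothetical illustration, not the actual spark-bench code: the generic output check rejects .txt, while the graph generator's own check (per its documentation) accepts only .txt.

```scala
// Hypothetical sketch of the conflict: two checks with an empty intersection.
object ConflictSketch {
  // Generic save-format check (assumed): .txt is not a recognized format.
  def genericCheckPasses(path: String): Boolean =
    Seq(".csv", ".parquet").exists(path.endsWith)

  // Graph-data-generator check (per its docs): output must be .txt.
  def graphGenCheckPasses(path: String): Boolean =
    path.endsWith(".txt")

  // No extension can pass both, so every config fails one way or the other.
  def bothPass(path: String): Boolean =
    genericCheckPasses(path) && graphGenCheckPasses(path)
}
```

Under these assumptions, `bothPass` is false for both "graph.txt" and "graph.csv", matching the two different failures reported in this thread.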

@hzhuang1

It could be fixed by this pull request: #180
