
failed to run linear regression workload #134

Open
akasaki opened this issue Dec 30, 2017 · 4 comments

akasaki commented Dec 30, 2017

Spark-Bench Version: spark-bench_2.1.1_0.2.2

Spark Version on Your Cluster: 2.1.1.2.6.1.0

Scala Version on Your Spark Cluster: 2.11.8

date_gen_lr.conf

spark-bench = {
export SPARK_HOME=/usr/hdp/current/spark2-client

export SPARK_MASTER_HOST=yarn
  spark-submit-config = [{
    workload-suites = [
      {
        descr = "Data-generator-lr"
        benchmark-output = "console"
        parallel=true
        repeat=1
        workloads = [
          {
            name = "data-generation-lr"
            rows = 10
            cols=12
            partitions=1
            output="hdfs:///tmp/lr-small-test.parquet"
          }
        ]
      }
    ]
  }]
}

lr.conf

spark-bench = {
export SPARK_HOME=/usr/hdp/current/spark2-client

export SPARK_MASTER_HOST=yarn
  spark-submit-config = [{
    workload-suites = [
      {
        descr = "lr"
        benchmark-output = "console"
        parallel=true
        repeat=1
        workloads = [
          {
            name = "lr-bml"
            input = "/tmp/lr-small.parquet/part-00000-1c1b2d41-6590-4938-ad2e-07005310f75b.snappy.parquet"
            testfile = "/tmp/lr-small-test.parquet/part-00000-62d3a63c-58fd-4de8-89a5-ce2cf1c1f298.snappy.parquet"
            output = "/tmp/lr-results-small.csv"
          }
        ]
      }
    ]
  }]
}

Relevant Stack Trace (If Applicable)

Description of Problem, Any Other Info

I was trying to run the linear regression workload, but it failed. Basically, I used the Data Generator - Linear Regression to create two small files, one as the input file and one as the test file (as indicated in lr.conf). Can anyone help me fix the error? Thanks a lot.
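
As a sanity check, here is a minimal spark-shell snippet (paths taken from the configs above; purely illustrative, not part of spark-bench) to inspect what the generator actually wrote:

// spark-shell: inspect the generated Parquet data (paths as in the configs above)
val train = spark.read.parquet("hdfs:///tmp/lr-small.parquet")
val test  = spark.read.parquet("hdfs:///tmp/lr-small-test.parquet")
train.printSchema()              // column layout as written by the data generator
train.show(5, truncate = false)  // first few generated rows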

The error message is attached below:

  *** SPARK-SUBMIT: [/usr/hdp/current/spark2-client/bin/spark-submit, --class, com.ibm.sparktc.sparkbench.cli.CLIKickoff, --master, yarn, /home/ubuntu/spark-bench_2.1.1_0.2.2-RELEASE/lib/spark-bench-2.1.1_0.2.2-RELEASE.jar, {"spark-bench":{"spark-submit-config":[{"workload-suites":[{"benchmark-output":"console","descr":"lr","parallel":true,"repeat":1,"workloads":[{"input":"/tmp/lr-small.parquet/part-00000-1c1b2d41-6590-4938-ad2e-07005310f75b.snappy.parquet","name":"lr-bml","output":"/tmp/lr-results-small.csv","testfile":"/tmp/lr-small-test.parquet/part-00000-62d3a63c-58fd-4de8-89a5-ce2cf1c1f298.snappy.parquet"}]}]}]}}]
17/12/30 21:44:20 INFO SparkContext: Running Spark version 2.1.1.2.6.1.0-129
17/12/30 21:44:21 INFO SecurityManager: Changing view acls to: ubuntu
17/12/30 21:44:21 INFO SecurityManager: Changing modify acls to: ubuntu
17/12/30 21:44:21 INFO SecurityManager: Changing view acls groups to: 
17/12/30 21:44:21 INFO SecurityManager: Changing modify acls groups to: 
17/12/30 21:44:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ubuntu); groups with view permissions: Set(); users  with modify permissions: Set(ubuntu); groups with modify permissions: Set()
17/12/30 21:44:21 INFO Utils: Successfully started service 'sparkDriver' on port 52637.
17/12/30 21:44:21 INFO SparkEnv: Registering MapOutputTracker
17/12/30 21:44:21 INFO SparkEnv: Registering BlockManagerMaster
17/12/30 21:44:21 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/12/30 21:44:21 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/12/30 21:44:21 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-bebc9aff-5865-40d5-9b4e-55879b7e659c
17/12/30 21:44:21 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
17/12/30 21:44:22 INFO SparkEnv: Registering OutputCommitCoordinator
17/12/30 21:44:22 INFO log: Logging initialized @3145ms
17/12/30 21:44:22 INFO Server: jetty-9.2.z-SNAPSHOT
17/12/30 21:44:22 INFO Server: Started @3284ms
17/12/30 21:44:22 INFO ServerConnector: Started ServerConnector@797501a{HTTP/1.1}{0.0.0.0:4040}
17/12/30 21:44:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1734f68{/jobs,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@31c269fd{/jobs/json,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@47747fb9{/jobs/job,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4e9658b5{/jobs/job/json,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@20312893{/stages,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@c41709a{/stages/json,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@54ec8cc9{/stages/stage,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1a6f5124{/stages/stage/json,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@ec2bf82{/stages/pool,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6cc0bcf6{/stages/pool/json,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@32f61a31{/storage,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@669253b7{/storage/json,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@51a06cbe{/storage/rdd,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@49a64d82{/storage/rdd/json,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@66d23e4a{/environment,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4d9d1b69{/environment/json,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@251f7d26{/executors,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@52d10fb8{/executors/json,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1fe8d51b{/executors/threadDump,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@22680f52{/executors/threadDump/json,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@39c11e6c{/static,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2b46a8c1{/,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@29caf222{/api,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@69c43e48{/jobs/job/kill,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3a80515c{/stages/stage/kill,null,AVAILABLE,@Spark}
17/12/30 21:44:22 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.0.2:4040
17/12/30 21:44:22 INFO SparkContext: Added JAR file:/home/ubuntu/spark-bench_2.1.1_0.2.2-RELEASE/lib/spark-bench-2.1.1_0.2.2-RELEASE.jar at spark://10.0.0.2:52637/jars/spark-bench-2.1.1_0.2.2-RELEASE.jar with timestamp 1514670262631
17/12/30 21:44:24 INFO RMProxy: Connecting to ResourceManager at hadoop-b9c9c4da-8852-4731-92cb-ec26bebd9f5e.novalocal/10.0.0.2:8050
17/12/30 21:44:24 INFO Client: Requesting a new application from cluster with 3 NodeManagers
17/12/30 21:44:24 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
17/12/30 21:44:24 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/12/30 21:44:24 INFO Client: Setting up container launch context for our AM
17/12/30 21:44:24 INFO Client: Setting up the launch environment for our AM container
17/12/30 21:44:24 INFO Client: Preparing resources for our AM container
17/12/30 21:44:26 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://hadoop-b9c9c4da-8852-4731-92cb-ec26bebd9f5e.novalocal:8020/hdp/apps/2.6.1.0-129/spark2/spark2-hdp-yarn-archive.tar.gz
17/12/30 21:44:26 INFO Client: Source and destination file systems are the same. Not copying hdfs://hadoop-b9c9c4da-8852-4731-92cb-ec26bebd9f5e.novalocal:8020/hdp/apps/2.6.1.0-129/spark2/spark2-hdp-yarn-archive.tar.gz
17/12/30 21:44:26 INFO Client: Uploading resource file:/tmp/spark-bbbba650-e4a6-4f05-a522-58295e254e83/__spark_conf__391304631160105821.zip -> hdfs://hadoop-b9c9c4da-8852-4731-92cb-ec26bebd9f5e.novalocal:8020/user/ubuntu/.sparkStaging/application_1513645674201_0052/__spark_conf__.zip
17/12/30 21:44:26 INFO SecurityManager: Changing view acls to: ubuntu
17/12/30 21:44:26 INFO SecurityManager: Changing modify acls to: ubuntu
17/12/30 21:44:26 INFO SecurityManager: Changing view acls groups to: 
17/12/30 21:44:26 INFO SecurityManager: Changing modify acls groups to: 
17/12/30 21:44:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ubuntu); groups with view permissions: Set(); users  with modify permissions: Set(ubuntu); groups with modify permissions: Set()
17/12/30 21:44:26 INFO Client: Submitting application application_1513645674201_0052 to ResourceManager
17/12/30 21:44:27 INFO YarnClientImpl: Submitted application application_1513645674201_0052
17/12/30 21:44:27 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1513645674201_0052 and attemptId None
17/12/30 21:44:28 INFO Client: Application report for application_1513645674201_0052 (state: ACCEPTED)
17/12/30 21:44:28 INFO Client: 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1514670266800
	 final status: UNDEFINED
	 tracking URL: http://hadoop-b9c9c4da-8852-4731-92cb-ec26bebd9f5e.novalocal:8088/proxy/application_1513645674201_0052/
	 user: ubuntu
17/12/30 21:44:29 INFO Client: Application report for application_1513645674201_0052 (state: ACCEPTED)
17/12/30 21:44:30 INFO Client: Application report for application_1513645674201_0052 (state: ACCEPTED)
17/12/30 21:44:31 INFO Client: Application report for application_1513645674201_0052 (state: ACCEPTED)
17/12/30 21:44:32 INFO Client: Application report for application_1513645674201_0052 (state: ACCEPTED)
17/12/30 21:44:32 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
17/12/30 21:44:32 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop-b9c9c4da-8852-4731-92cb-ec26bebd9f5e.novalocal, PROXY_URI_BASES -> http://hadoop-b9c9c4da-8852-4731-92cb-ec26bebd9f5e.novalocal:8088/proxy/application_1513645674201_0052), /proxy/application_1513645674201_0052
17/12/30 21:44:32 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
17/12/30 21:44:33 INFO Client: Application report for application_1513645674201_0052 (state: RUNNING)
17/12/30 21:44:33 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 10.0.0.13
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1514670266800
	 final status: UNDEFINED
	 tracking URL: http://hadoop-b9c9c4da-8852-4731-92cb-ec26bebd9f5e.novalocal:8088/proxy/application_1513645674201_0052/
	 user: ubuntu
17/12/30 21:44:33 INFO YarnClientSchedulerBackend: Application application_1513645674201_0052 has started running.
17/12/30 21:44:33 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33888.
17/12/30 21:44:33 INFO NettyBlockTransferService: Server created on 10.0.0.2:33888
17/12/30 21:44:33 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/12/30 21:44:33 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.0.2, 33888, None)
17/12/30 21:44:33 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.0.2:33888 with 2004.6 MB RAM, BlockManagerId(driver, 10.0.0.2, 33888, None)
17/12/30 21:44:33 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.0.2, 33888, None)
17/12/30 21:44:33 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.0.2, 33888, None)
17/12/30 21:44:33 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@666618d6{/metrics/json,null,AVAILABLE,@Spark}
17/12/30 21:44:33 INFO EventLoggingListener: Logging events to hdfs:///spark2-history/application_1513645674201_0052
17/12/30 21:44:37 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.0.13:39702) with ID 1
17/12/30 21:44:37 INFO BlockManagerMasterEndpoint: Registering block manager hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal:42603 with 5.2 GB RAM, BlockManagerId(1, hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal, 42603, None)
17/12/30 21:44:38 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.0.14:42956) with ID 2
17/12/30 21:44:38 INFO BlockManagerMasterEndpoint: Registering block manager hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal:36659 with 5.2 GB RAM, BlockManagerId(2, hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, 36659, None)
17/12/30 21:44:52 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
17/12/30 21:44:52 INFO SharedState: Warehouse path is 'file:/home/ubuntu/spark-bench_2.1.1_0.2.2-RELEASE/spark-warehouse/'.
17/12/30 21:44:52 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1f0e987b{/SQL,null,AVAILABLE,@Spark}
17/12/30 21:44:52 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7707089b{/SQL/json,null,AVAILABLE,@Spark}
17/12/30 21:44:52 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4d25a6b8{/SQL/execution,null,AVAILABLE,@Spark}
17/12/30 21:44:52 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5b62354c{/SQL/execution/json,null,AVAILABLE,@Spark}
17/12/30 21:44:52 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@65536160{/static/sql,null,AVAILABLE,@Spark}
17/12/30 21:44:53 INFO SparkContext: Starting job: parquet at SparkFuncs.scala:54
17/12/30 21:44:53 INFO DAGScheduler: Got job 0 (parquet at SparkFuncs.scala:54) with 1 output partitions
17/12/30 21:44:53 INFO DAGScheduler: Final stage: ResultStage 0 (parquet at SparkFuncs.scala:54)
17/12/30 21:44:53 INFO DAGScheduler: Parents of final stage: List()
17/12/30 21:44:53 INFO DAGScheduler: Missing parents: List()
17/12/30 21:44:54 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at SparkFuncs.scala:54), which has no missing parents
17/12/30 21:44:54 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 89.0 KB, free 2004.5 MB)
17/12/30 21:44:54 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.5 KB, free 2004.5 MB)
17/12/30 21:44:54 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.0.2:33888 (size: 33.5 KB, free: 2004.6 MB)
17/12/30 21:44:54 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996
17/12/30 21:44:54 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at SparkFuncs.scala:54)
17/12/30 21:44:54 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
17/12/30 21:44:54 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal, executor 1, partition 0, PROCESS_LOCAL, 6288 bytes)
17/12/30 21:44:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal:42603 (size: 33.5 KB, free: 5.2 GB)
17/12/30 21:44:58 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 4266 ms on hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal (executor 1) (1/1)
17/12/30 21:44:58 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/12/30 21:44:58 INFO DAGScheduler: ResultStage 0 (parquet at SparkFuncs.scala:54) finished in 4.287 s
17/12/30 21:44:58 INFO DAGScheduler: Job 0 finished: parquet at SparkFuncs.scala:54, took 4.753065 s
17/12/30 21:44:59 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 351.9 KB, free 2004.1 MB)
17/12/30 21:44:59 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 30.8 KB, free 2004.1 MB)
17/12/30 21:44:59 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.0.0.2:33888 (size: 30.8 KB, free: 2004.5 MB)
17/12/30 21:44:59 INFO SparkContext: Created broadcast 1 from textFile at LogisticRegressionWorkload.scala:70
17/12/30 21:45:00 INFO ContextCleaner: Cleaned accumulator 49
17/12/30 21:45:00 INFO ContextCleaner: Cleaned accumulator 48
17/12/30 21:45:00 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 10.0.0.2:33888 in memory (size: 33.5 KB, free: 2004.6 MB)
17/12/30 21:45:00 INFO BlockManagerInfo: Removed broadcast_0_piece0 on hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal:42603 in memory (size: 33.5 KB, free: 5.2 GB)
17/12/30 21:45:01 INFO CodeGenerator: Code generated in 463.027923 ms
17/12/30 21:45:01 INFO FileInputFormat: Total input paths to process : 1
17/12/30 21:45:01 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 352.0 KB, free 2003.9 MB)
17/12/30 21:45:01 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 30.8 KB, free 2003.9 MB)
17/12/30 21:45:01 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.0.0.2:33888 (size: 30.8 KB, free: 2004.5 MB)
17/12/30 21:45:01 INFO SparkContext: Created broadcast 2 from textFile at LogisticRegressionWorkload.scala:70
17/12/30 21:45:01 INFO FileInputFormat: Total input paths to process : 1
17/12/30 21:45:02 INFO CodeGenerator: Code generated in 27.93171 ms
17/12/30 21:45:02 INFO CodeGenerator: Code generated in 21.46953 ms
17/12/30 21:45:02 INFO SparkContext: Starting job: count at LogisticRegressionWorkload.scala:89
17/12/30 21:45:02 INFO DAGScheduler: Registering RDD 7 (cache at LogisticRegressionWorkload.scala:81)
17/12/30 21:45:02 INFO DAGScheduler: Registering RDD 20 (count at LogisticRegressionWorkload.scala:89)
17/12/30 21:45:02 INFO DAGScheduler: Got job 1 (count at LogisticRegressionWorkload.scala:89) with 1 output partitions
17/12/30 21:45:02 INFO DAGScheduler: Final stage: ResultStage 3 (count at LogisticRegressionWorkload.scala:89)
17/12/30 21:45:02 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 2)
17/12/30 21:45:02 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2)
17/12/30 21:45:02 INFO DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[7] at cache at LogisticRegressionWorkload.scala:81), which has no missing parents
17/12/30 21:45:02 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 17.2 KB, free 2003.8 MB)
17/12/30 21:45:02 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 7.1 KB, free 2003.8 MB)
17/12/30 21:45:02 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.0.0.2:33888 (size: 7.1 KB, free: 2004.5 MB)
17/12/30 21:45:02 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:996
17/12/30 21:45:02 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[7] at cache at LogisticRegressionWorkload.scala:81)
17/12/30 21:45:02 INFO YarnScheduler: Adding task set 1.0 with 2 tasks
17/12/30 21:45:02 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, executor 2, partition 0, NODE_LOCAL, 6201 bytes)
17/12/30 21:45:02 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal, executor 1, partition 1, NODE_LOCAL, 6201 bytes)
17/12/30 21:45:02 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal:42603 (size: 7.1 KB, free: 5.2 GB)
17/12/30 21:45:02 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal:42603 (size: 30.8 KB, free: 5.2 GB)
17/12/30 21:45:03 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal, executor 1): java.lang.NumberFormatException: For input string: "%labelfeature%type%%size5indices5list%element5values5list"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:284)
	at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:72)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:72)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:72)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:71)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

17/12/30 21:45:03 INFO TaskSetManager: Starting task 1.1 in stage 1.0 (TID 3, hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal, executor 1, partition 1, NODE_LOCAL, 6201 bytes)
17/12/30 21:45:03 INFO TaskSetManager: Lost task 1.1 in stage 1.0 (TID 3) on hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal, executor 1: java.lang.NumberFormatException (For input string: "%labelfeature%type%%size5indices5list%element5values5list") [duplicate 1]
17/12/30 21:45:03 INFO TaskSetManager: Starting task 1.2 in stage 1.0 (TID 4, hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, executor 2, partition 1, NODE_LOCAL, 6201 bytes)
17/12/30 21:45:03 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal:36659 (size: 7.1 KB, free: 5.2 GB)
17/12/30 21:45:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal:36659 (size: 30.8 KB, free: 5.2 GB)
17/12/30 21:45:06 INFO TaskSetManager: Lost task 1.2 in stage 1.0 (TID 4) on hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, executor 2: java.lang.NumberFormatException (For input string: "%labelfeature%type%%size5indices5list%element5values5list") [duplicate 2]
17/12/30 21:45:06 INFO TaskSetManager: Starting task 1.3 in stage 1.0 (TID 5, hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, executor 2, partition 1, NODE_LOCAL, 6201 bytes)
17/12/30 21:45:06 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, executor 2): java.lang.NumberFormatException: For input string: "PAR1��"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:284)
	at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:72)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:72)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:72)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:71)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

17/12/30 21:45:06 INFO TaskSetManager: Starting task 0.1 in stage 1.0 (TID 6, hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal, executor 1, partition 0, NODE_LOCAL, 6201 bytes)
17/12/30 21:45:06 INFO TaskSetManager: Lost task 0.1 in stage 1.0 (TID 6) on hadoop-becd9f60-1605-4492-bf15-2e928fbfca2e.novalocal, executor 1: java.lang.NumberFormatException (For input string: "PAR1��") [duplicate 1]
17/12/30 21:45:06 INFO TaskSetManager: Starting task 0.2 in stage 1.0 (TID 7, hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, executor 2, partition 0, NODE_LOCAL, 6201 bytes)
17/12/30 21:45:06 INFO TaskSetManager: Lost task 1.3 in stage 1.0 (TID 5) on hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, executor 2: java.lang.NumberFormatException (For input string: "%labelfeature%type%%size5indices5list%element5values5list") [duplicate 3]
17/12/30 21:45:06 ERROR TaskSetManager: Task 1 in stage 1.0 failed 4 times; aborting job
17/12/30 21:45:06 INFO YarnScheduler: Cancelling stage 1
17/12/30 21:45:06 INFO YarnScheduler: Stage 1 was cancelled
17/12/30 21:45:06 INFO DAGScheduler: ShuffleMapStage 1 (cache at LogisticRegressionWorkload.scala:81) failed in 4.406 s due to Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 5, hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, executor 2): java.lang.NumberFormatException: For input string: "%labelfeature%type%%size5indices5list%element5values5list"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:284)
	at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:72)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:72)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:72)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:71)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
17/12/30 21:45:06 INFO DAGScheduler: Job 1 failed: count at LogisticRegressionWorkload.scala:89, took 4.504117 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 5, hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, executor 2): java.lang.NumberFormatException: For input string: "%labelfeature%type%%size5indices5list%element5values5list"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:284)
	at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:72)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:72)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:72)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:71)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
	at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
	at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
	at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
	at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
	at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$3.apply(LogisticRegressionWorkload.scala:89)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$3.apply(LogisticRegressionWorkload.scala:89)
	at com.ibm.sparktc.sparkbench.utils.GeneralFunctions$.time(GeneralFunctions.scala:67)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload.doWorkload(LogisticRegressionWorkload.scala:89)
	at com.ibm.sparktc.sparkbench.workload.Workload$class.run(Workload.scala:55)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload.run(LogisticRegressionWorkload.scala:60)
	at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runParallel$1.apply(SuiteKickoff.scala:91)
	at com.ibm.sparktc.sparkbench.workload.SuiteKickoff$$anonfun$com$ibm$sparktc$sparkbench$workload$SuiteKickoff$$runParallel$1.apply(SuiteKickoff.scala:91)
	at scala.collection.parallel.AugmentedIterableIterator$class.map2combiner(RemainsIterator.scala:115)
	at scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:62)
	at scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1054)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
	at scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1051)
	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
	at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341)
	at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673)
	at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:378)
	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:443)
	at scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:426)
	at scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:56)
	at scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:958)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
	at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
	at scala.collection.parallel.ParIterableLike$ResultMapping.tryLeaf(ParIterableLike.scala:953)
	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
	at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NumberFormatException: For input string: "%labelfeature%type%%size5indices5list%element5values5list"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.lang.Double.parseDouble(Double.java:538)
	at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:284)
	at scala.collection.immutable.StringOps.toDouble(StringOps.scala:29)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:72)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1$$anonfun$2.apply(LogisticRegressionWorkload.scala:72)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:72)
	at com.ibm.sparktc.sparkbench.workload.ml.LogisticRegressionWorkload$$anonfun$load$1.apply(LogisticRegressionWorkload.scala:71)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
17/12/30 21:45:06 INFO TaskSetManager: Lost task 0.2 in stage 1.0 (TID 7) on hadoop-40e1e9a7-499f-4e75-90fa-b8cccc465b3d.novalocal, executor 2: java.lang.NumberFormatException (For input string: "PAR1��") [duplicate 2]
17/12/30 21:45:06 INFO YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
17/12/30 21:45:06 INFO SparkContext: Invoking stop() from shutdown hook
17/12/30 21:45:06 INFO ServerConnector: Stopped Spark@797501a{HTTP/1.1}{0.0.0.0:4040}
17/12/30 21:45:06 INFO SparkUI: Stopped Spark web UI at http://10.0.0.2:4040
17/12/30 21:45:06 INFO YarnClientSchedulerBackend: Interrupting monitor thread
17/12/30 21:45:06 INFO YarnClientSchedulerBackend: Shutting down all executors
17/12/30 21:45:06 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
17/12/30 21:45:06 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
17/12/30 21:45:06 INFO YarnClientSchedulerBackend: Stopped
17/12/30 21:45:06 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/12/30 21:45:06 INFO MemoryStore: MemoryStore cleared
17/12/30 21:45:06 INFO BlockManager: BlockManager stopped
17/12/30 21:45:06 INFO BlockManagerMaster: BlockManagerMaster stopped
17/12/30 21:45:06 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/12/30 21:45:06 INFO SparkContext: Successfully stopped SparkContext
17/12/30 21:45:06 INFO ShutdownHookManager: Shutdown hook called
17/12/30 21:45:06 INFO ShutdownHookManager: Deleting directory /tmp/spark-bbbba650-e4a6-4f05-a522-58295e254e83
Exception in thread "main" java.lang.Exception: spark-submit failed to complete properly given these arguments: 
	--class com.ibm.sparktc.sparkbench.cli.CLIKickoff --master yarn /home/ubuntu/spark-bench_2.1.1_0.2.2-RELEASE/lib/spark-bench-2.1.1_0.2.2-RELEASE.jar {"spark-bench":{"spark-submit-config":[{"workload-suites":[{"benchmark-output":"console","descr":"lr","parallel":true,"repeat":1,"workloads":[{"input":"/tmp/lr-small.parquet/part-00000-1c1b2d41-6590-4938-ad2e-07005310f75b.snappy.parquet","name":"lr-bml","output":"/tmp/lr-results-small.csv","testfile":"/tmp/lr-small-test.parquet/part-00000-62d3a63c-58fd-4de8-89a5-ce2cf1c1f298.snappy.parquet"}]}]}]}}
	at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch$.launch(SparkLaunch.scala:65)
	at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch$$anonfun$launchSparkSubmitScripts$2.apply(SparkLaunch.scala:57)
	at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch$$anonfun$launchSparkSubmitScripts$2.apply(SparkLaunch.scala:57)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch$.launchSparkSubmitScripts(SparkLaunch.scala:57)
	at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch$.main(SparkLaunch.scala:34)
	at com.ibm.sparktc.sparkbench.sparklaunch.SparkLaunch.main(SparkLaunch.scala)
ecurtin (Contributor) commented Jan 2, 2018

Hi @akasaki, I updated your comment to have backticks ``` around the plaintext and I'm looking into your issue now.

ecurtin (Contributor) commented Jan 10, 2018

Hi @akasaki, just an update to let you know that I'm still working on this.

At a very high level, there are basically two things going on. One is that there are some issues with your config file. The second, however, is a much bigger issue on the Spark-Bench side. Long story short, there's no linear regression workload available for you to actually run yet.

Based on your ticket I revised #83 and am close to a PR for it. You can see my progress here: https://github.com/ecurtin/spark-bench/tree/linear-regression-workload, although there is additional progress that hasn't been pushed up yet.
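
For anyone hitting the same NumberFormatException in the meantime: judging from the stack trace, the lr-bml (logistic regression) workload reads its input with textFile and parses the fields of each line with toDouble (LogisticRegressionWorkload.scala, lines 70-72), so it expects a delimited text file rather than Parquet. Pointing it at a .snappy.parquet part file means the raw Parquet bytes ("PAR1...") hit toDouble, which is exactly the error above. A rough sketch of that load pattern (an approximation inferred from the trace, not the actual spark-bench source):

// Approximation of the failing load path, inferred from the stack trace above.
// The real code lives in LogisticRegressionWorkload.scala (around lines 70-72); this is only a sketch.
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

def load(spark: SparkSession, path: String) = {
  spark.sparkContext.textFile(path).map { line =>
    val cols = line.split(",").map(_.toDouble) // NumberFormatException when a line is Parquet binary such as "PAR1"
    (cols.head, Vectors.dense(cols.tail))      // (label, features)
  }
}

So until the workload reads Parquet directly (or a matching generator lands), the input and testfile would likely need to be plain delimited text rather than the generator's Parquet output.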

akasaki (Author) commented Jan 11, 2018

@ecurtin I see. Thank you for taking the time. I was actually trying to run logistic regression and it failed (typo: I said 'linear regression' in the title). I thought the dataset generated by the linear regression data generator could also be used for logistic regression, but it seems the datasets for these two workloads are totally different. From my understanding, the logistic regression data generator hasn't been ported to this new version. Am I right?

ecurtin (Contributor) commented Jan 11, 2018

That's correct. It's a long, convoluted story, but the result is that at the moment we have the linear regression generator and the logistic regression workload, which is nuts, I know. The linear regression workload is first on the list, but the logistic regression generator will be coming at some point!
