
EPIC: Legacy Workload Ports #79

Open · ecurtin opened this issue Sep 20, 2017 · 5 comments
ecurtin (Contributor) commented Sep 20, 2017

Port all workloads available in the legacy version to the new version.

akasaki commented Jan 2, 2018

Hello @ecurtin, I am wondering whether the legacy version is compatible with Spark 2.2. I need more workloads for my thesis experiments.

BTW, thank you so much for taking the time to answer all my questions!

ecurtin (Contributor, Author) commented Jan 3, 2018

@akasaki It depends on what you mean by compatible. Both versions have data generators that output data to disk and workloads that pick up that data and do stuff with it, but they are entirely different code bases. You're totally welcome to try the legacy version if you think it might suit your needs better, just keep in mind that it is unsupported.
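
To illustrate the pattern (just a rough sketch in plain Spark, not the actual code from either version; the path and column names here are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object GenerateThenConsume {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("GenerateThenConsume").getOrCreate()

    val dataPath = "/tmp/generated-data.parquet" // hypothetical output location

    // "Data generator" stage: produce synthetic rows and persist them to disk.
    spark.range(0, 1000000L)
      .toDF("id")
      .withColumn("value", sin(col("id")))
      .write.mode("overwrite").parquet(dataPath)

    // "Workload" stage: pick the generated data back up and process it.
    val result = spark.read.parquet(dataPath)
      .groupBy((col("id") % 10).as("bucket"))
      .agg(avg("value").as("avg_value"))

    result.show()
    spark.stop()
  }
}
```

The generator stage and the workload stage are decoupled like that in both versions; they just don't share any code.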

Are there any workloads in particular that are high priority for you?

akasaki commented Jan 3, 2018

@ecurtin I am focusing on a tuning algorithm based on three types of workloads. The journal article (Li M, Tan J, Wang Y, Zhang L, Salapura V. SparkBench: a Spark benchmarking suite characterizing large-scale in-memory data analytics. Cluster Computing. 2017:1-5.) classifies all workloads into three types: memory-intensive, shuffle-intensive, and all-intensive. In the current version, the SQL workload is shuffle-intensive and linear regression is memory-intensive, although linear regression doesn't work in my environment (Issue #134).

I suppose K-means is also memory-intensive, isn't it?

I need one or more all-intensive workloads such as MF and SVD++. I am trying to set up the legacy version.
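
To make concrete what I mean by shuffle-intensive versus memory-intensive, here is a rough sketch of my own (not code from the paper or from spark-bench):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.storage.StorageLevel

object IntensityProfiles {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("IntensityProfiles").getOrCreate()

    val df = spark.range(0, 10000000L).toDF("id")
      .withColumn("key", col("id") % 1000)
      .withColumn("value", rand())

    // Shuffle-intensive: a wide aggregation forces data movement between executors.
    df.groupBy("key").agg(sum("value")).count()

    // Memory-intensive: cache the dataset and scan it repeatedly in memory,
    // the way iterative ML workloads like K-means do.
    val cached = df.persist(StorageLevel.MEMORY_ONLY)
    (1 to 5).foreach(_ => cached.agg(avg("value")).collect())

    spark.stop()
  }
}
```

An all-intensive workload stresses both profiles at once.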

ecurtin (Contributor, Author) commented Jan 3, 2018

SparkPi is included in the current version of Spark-Bench. It's extremely compute-intensive (when used with large parameters) while hardly making use of I/O at all. Basically it computes an approximate value of Pi in a deliberately inefficient manner: https://sparktc.github.io/spark-bench/workloads/sparkpi/
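
Roughly, the idea behind it is something like this (a simplified sketch of the usual Monte Carlo approach, not the exact spark-bench implementation):

```scala
import scala.util.Random
import org.apache.spark.sql.SparkSession

object PiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("PiSketch").getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 100
    val samples = 100000L * slices // larger values make the job more compute-heavy

    // Count random points in the unit square that land inside the unit circle.
    val inside = spark.sparkContext
      .parallelize(1L to samples, slices)
      .map { _ =>
        val x = Random.nextDouble() * 2 - 1
        val y = Random.nextDouble() * 2 - 1
        if (x * x + y * y <= 1) 1L else 0L
      }
      .reduce(_ + _)

    // The ratio of circle area to square area is Pi/4.
    println(s"Pi is roughly ${4.0 * inside / samples}")
    spark.stop()
  }
}
```

All of the work happens in the map tasks, so the job scales with the sample count without reading from disk or shuffling data.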

akasaki commented Jan 4, 2018

@ecurtin I see. I have tried it as a first example, but it doesn't involve any shuffle operations. I am looking for all-intensive (both shuffle-intensive and memory-intensive) workloads that stress both I/O and memory.
