
Adding delay to some jobs #140

Closed
Aalnafessah opened this issue Jan 9, 2018 · 7 comments
@Aalnafessah

Aalnafessah commented Jan 9, 2018

Spark-Bench Version

spark-bench-launch-2.1.1_0.2.2-RELEASE

Spark Version on Your Cluster

Spark 2.2.1

Scala Version on Your Spark Cluster

Scala version 2.11.8
Spark Cluster: Spark Standalone (1 master and 2 slaves).

Your Exact Configuration File (with system details anonymized)

spark-bench = {
  spark-submit-config = [{
    spark-args = {
      master = "spark://XX.XX.XX.XX:7077"
    }
    workload-suites = [
    {
      descr = "***  KMeans Workload *******"
      benchmark-output = "console"
      workloads = [
        {

         name = "kmeans"
         input = "/tmp/kmeans-data.csv"
         k = 10
        }
      ]
    }
    ]
  }]
}

Description of Problem, Any Other Info

I have run Spark-Bench using the KMeans workload. Using the Spark dashboard, I see the following jobs:
(screenshot: Spark dashboard showing the job list, 2018-01-09)

I am studying Spark performance. I would like to add a delay inside some jobs (e.g. a 5-second delay inside job 1). Is there any way to add this delay?

@ecurtin
Contributor

ecurtin commented Jan 10, 2018

Hmm, to my knowledge there is not. Spark-Bench sits at the level of a Spark Application, so it has no direct way to access and manipulate the internal jobs described in the picture.

One thing that may or may not help depending on your use case is a SparkListener or tracer. These can be used in conjunction with Spark-Bench but they target deeper internal stuff. One of my colleagues built a configurable Spark tracer: https://github.com/SparkTC/spark-tracing and there are many others out there. #113 is from a user who tied in a tracer with Spark-Bench.

Wish I could be of more help!

@ecurtin
Contributor

ecurtin commented Jan 10, 2018

Hi @Aalnafessah I saw you commented on #139 but I think your comment was addressing this issue.

If you want to add delay to the csv load and write functions, those are here in the Spark-Bench codebase.

If you're trying to add delay between tasks such as sum() and collectAsMap() that's much more complicated because those functions are internal to the Spark Kmeans class and often materialized only when an action is called because of lazy evaluation. From the level of Spark-Bench there's not a way to reach down from the KMeans training statement here into those internal functions. A tracer like Spark-Tracing may get you some profiling info on those functions but it won't help you to modify the code to add delays.

The only way I could see to do this is by checking out Spark, inserting your delay statements into the KMeans class, and then re-compiling and running Spark-Bench against your re-compiled version of Spark by changing the spark-home parameter.
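For what it's worth, the kind of delay statement you'd insert is usually just a `Thread.sleep` placed inside lazily-evaluated iterator code, so the pause only fires when the data is actually consumed. Here is a minimal sketch in plain Scala (no Spark dependency; `DelayDemo` and `delayed` are illustrative names, not anything from Spark or Spark-Bench) showing that pattern:

```scala
// Plain-Scala sketch of injecting an artificial delay into a lazy pipeline.
// Inside a recompiled Spark, the equivalent would be a Thread.sleep inserted
// into the KMeans training code; everything here is illustrative only.
object DelayDemo {
  // Wrap an iterator so that consuming it pauses once before yielding data,
  // similar to sleeping at the start of a partition's iterator.
  def delayed[A](iter: Iterator[A], millis: Long): Iterator[A] = {
    var slept = false
    new Iterator[A] {
      def hasNext: Boolean = iter.hasNext
      def next(): A = {
        if (!slept) { Thread.sleep(millis); slept = true }
        iter.next()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val t0 = System.nanoTime()
    // The sleep happens lazily, on first consumption, not when delayed() is called.
    val result = delayed(Iterator(1, 2, 3), 100).toList
    val elapsedMs = (System.nanoTime() - t0) / 1000000
    println(s"result=$result, paused at least 100 ms: ${elapsedMs >= 100}")
  }
}
```

Note the laziness: the sleep fires on first `next()`, which mirrors why delays inside Spark transformations only show up when an action triggers the stage.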

@Aalnafessah
Author

Thanks @ecurtin.
Regarding the profiling info about tasks, I am able to get it by parsing the Spark log file from the Spark History Server.

The only way I could see to do this is by checking out Spark, inserting your delay statements into the KMeans class, and then re-compiling and running Spark-Bench against your re-compiled version of Spark by changing the spark-home parameter.

Do you mean that I have to reinstall Apache Spark and then recompile Spark-Bench?

@ecurtin
Contributor

ecurtin commented Jan 19, 2018

I mean that you would need to edit the code inside of Spark itself and recompile Spark. You can use Spark-Bench as-is and just change the spark-home parameter to point to your new version of Spark with your custom changes.
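For reference, a sketch of that change against the config from the original post. The path is a placeholder, and placing spark-home at the spark-submit-config level is an assumption based on how the other spark-submit settings are laid out above:

```
spark-bench = {
  spark-submit-config = [{
    spark-home = "/path/to/recompiled/spark"  // placeholder: your custom Spark build
    spark-args = {
      master = "spark://XX.XX.XX.XX:7077"
    }
    // workload-suites unchanged from the original config
  }]
}
```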

@Aalnafessah
Author

Got it. I will try your solution and then let you know how it goes.
Many thanks @ecurtin

@ecurtin
Contributor

ecurtin commented Jan 19, 2018

Cool, I'm gonna close this issue, feel free to reopen if you have more questions!

@ecurtin ecurtin closed this as completed Jan 19, 2018
@xiandong79

Hi @Aalnafessah, did you succeed in adding a delay to some Spark jobs? If so, how?
