Executing Dynamometer workload #104

Open
novosibman opened this issue Oct 18, 2019 · 4 comments

@novosibman

Hi,

I collected all the prerequisites (fsimage, audit log), prepared a local environment (an accompanying HDFS, a separate YARN manager) according to the Dynamometer README, and tried to start the workload scripts. I tried Hadoop versions 2.7.4 and 2.8.4.
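
The commands below reference a few environment variables; for illustration, placeholder values might look like this (the paths are not my real ones):

export HADOOP_VERSION=2.8.4                 # also tried 2.7.4
export DYN_HOME=/path/to/dynamometer        # Dynamometer checkout with the bin/ scripts
export BASE_DIR=/path/to/local/work/dir     # holds hadoop-${HADOOP_VERSION}.tar.gz, conf.zip, audit logs
export HDFS_PATH=/dyno                      # base path on the accompanying HDFS
export RESULTS_DIR=${HDFS_PATH}/results     # output path for the audit replay results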

${DYN_HOME}/bin/upload-fsimage.sh 0894 ${HDFS_PATH}/fsimage \
    ${BASE_DIR}/fsimage-${HADOOP_VERSION} 

upload-fsimage - passed

${DYN_HOME}/bin/generate-block-lists.sh \
    -fsimage_input_path ${HDFS_PATH}/fsimage/fsimage_0000000000000000894.xml \
    -block_image_output_dir ${HDFS_PATH}/blocks \
    -num_reducers 10 -num_datanodes 3 

generate-block-lists - passed

${DYN_HOME}/bin/start-dynamometer-cluster.sh "" \
    -hadoop_binary_path file://${BASE_DIR}/hadoop-${HADOOP_VERSION}.tar.gz \
    -conf_path file://${BASE_DIR}/conf.zip \
    -fs_image_dir ${HDFS_PATH}/fsimage \
    -block_list_path ${HDFS_PATH}/blocks

start-dynamometer-cluster: appears to be working, judging by the output:
...
19/10/18 03:56:56 INFO dynamometer.Client: NameNode has started!
19/10/18 03:56:56 INFO dynamometer.Client: Waiting for 2 DataNodes to register with the NameNode...
19/10/18 03:57:02 INFO dynamometer.Client: Number of live DataNodes = 2.00; above threshold of 2.00; done waiting after 6017 ms.
19/10/18 03:57:02 INFO dynamometer.Client: Waiting for MissingBlocks to fall below 0.010199999...
19/10/18 03:57:02 INFO dynamometer.Client: Number of missing blocks: 102.00
19/10/18 04:00:03 INFO dynamometer.Client: Number of missing blocks = 0.00; below threshold of 0.01; done waiting after 180082 ms.
19/10/18 04:00:03 INFO dynamometer.Client: Waiting for UnderReplicatedBlocks to fall below 1.02...
19/10/18 04:00:03 INFO dynamometer.Client: Number of under replicated blocks: 102.00

${DYN_HOME}/bin/start-workload.sh \
    -Dauditreplay.log-start-time.ms=1000 \
    -Dauditreplay.input-path=file://${BASE_DIR}/audit_logs-${HADOOP_VERSION} \
    -Dauditreplay.output-path=${RESULTS_DIR} \
    -Dauditreplay.num-threads=1 \
    -nn_uri hdfs://$HOSTNAME:9000/ \
    -start_time_offset 1m \
    -mapper_class_name AuditReplayMapper

start-workload: it started but never finished; for a couple of hours it kept repeating 'map > map':
19/10/18 04:07:53 INFO workloadgenerator.WorkloadDriver: The workload will start at 1571396933516 ms (2019/10/18 04:08:53 PDT)
19/10/18 04:07:54 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/10/18 04:07:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/10/18 04:07:55 INFO input.FileInputFormat: Total input files to process : 1
19/10/18 04:07:55 INFO mapreduce.JobSubmitter: number of splits:1
19/10/18 04:07:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local579807884_0001
19/10/18 04:07:55 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/10/18 04:07:55 INFO mapreduce.Job: Running job: job_local579807884_0001
19/10/18 04:07:55 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/10/18 04:07:55 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/10/18 04:07:55 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/10/18 04:07:55 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/10/18 04:07:55 INFO mapred.LocalJobRunner: Waiting for map tasks
19/10/18 04:07:55 INFO mapred.LocalJobRunner: Starting task: attempt_local579807884_0001_m_000000_0
19/10/18 04:07:55 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/10/18 04:07:55 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
19/10/18 04:07:55 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/10/18 04:07:55 INFO mapred.MapTask: Processing split: file:/home/rscherba/ws/hadoop/dynamometer-test/audit_logs-2.8.4/hdfs-audit.log:0+251649
19/10/18 04:07:55 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/10/18 04:07:55 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/10/18 04:07:55 INFO mapred.MapTask: soft limit at 83886080
19/10/18 04:07:55 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/10/18 04:07:55 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/10/18 04:07:55 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/10/18 04:07:55 INFO audit.AuditReplayMapper: Starting 1 threads
19/10/18 04:07:55 INFO audit.AuditReplayThread: Start timestamp: 1571396933516
19/10/18 04:07:55 INFO audit.AuditReplayThread: Sleeping for 57526 ms
19/10/18 04:07:56 INFO mapreduce.Job: Job job_local579807884_0001 running in uber mode : false
19/10/18 04:07:56 INFO mapreduce.Job: map 0% reduce 0%
19/10/18 04:08:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:13:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:18:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:23:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:28:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:33:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:38:07 INFO mapred.LocalJobRunner: map > map
19/10/18 04:43:07 INFO mapred.LocalJobRunner: map > map
...

How long should the Dynamometer workload run? How do the run script arguments affect the test run? How can I check in the logs whether something is wrong in the configuration?

@xkrogen
Collaborator

xkrogen commented Oct 18, 2019

Hi @novosibman, thanks for the detailed information!

19/10/18 04:00:03 INFO dynamometer.Client: Waiting for UnderReplicatedBlocks to fall below 1.02...
19/10/18 04:00:03 INFO dynamometer.Client: Number of under replicated blocks: 102.00

Did the number of under-replicated blocks ever fall? start-dynamometer-cluster.sh will wait for this.

Can you also confirm that if you use an HDFS client (say, hdfs dfs) against hdfs://$HOSTNAME:9000/, it works as expected? For one thing, I don't believe 9000 is the correct port number. It will be chosen dynamically; you should look at the NameNode's generated properties file to know what the correct port is. If you run the workload using the start-dynamometer-cluster.sh script, it will handle this automatically.
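
As a sketch of such a check (NN_HOST and NN_PORT here are placeholders for the values reported in the NameNode's generated properties file or the client output):

hdfs dfs -ls hdfs://NN_HOST:NN_PORT/    # should list the simulated namespace if the URI is correct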

19/10/18 04:07:55 INFO mapred.LocalJobRunner: Waiting for map tasks
19/10/18 04:07:55 INFO mapred.LocalJobRunner: Starting task: attempt_local579807884_0001_m_000000_0

It looks like your MR job is running locally rather than on a YARN cluster; is that what you intended?

Can you please try visiting the logs for your MR job and seeing if there is any useful information contained within them?
19/10/18 04:07:55 INFO mapreduce.Job: The url to track the job: http://localhost:8080/

@novosibman
Author

novosibman commented Oct 22, 2019

Hi Erik, thanks for the quick answer!

You are right, 9000 is not the correct port for the Dyno-HDFS NameNode. I use 9000 for fs.defaultFS/fs.default.name in the configuration that starts the accompanying HDFS instance, and it was automatically carried over into the nn_uri param. I have corrected it.
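
For reference, the corrected flag now uses the dynamically assigned port reported by the Dyno client (see the log below):

-nn_uri hdfs://host2:9002/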

As for the accompanying YARN cluster, the general configuration looks as follows; all components are started out-of-the-box, using the same hadoop-x.y.z version as in the Dynamometer scripts:

  • host1: accompanying HDFS (namenode, datanode) for storing fsimage and blocks
  • host2: accompanying YARN (resourcemanager, nodemanager),
    $ upload-fsimage to hdfs:///host1/fsimage
    $ generate-block-lists
    $ start-dynamometer-cluster:
    19/10/22 11:15:07 INFO dynamometer.Client: Submitting application to RM
    19/10/22 11:15:07 INFO impl.YarnClientImpl: Submitted application application_1571768022694_0001
    19/10/22 11:15:08 INFO dynamometer.Client: Track the application at: http://host2:8088/proxy/application_1571768022694_0001/
    19/10/22 11:15:08 INFO dynamometer.Client: Kill the application using: yarn application -kill application_1571768022694_0001
    19/10/22 11:15:34 INFO dynamometer.Client: NameNode can be reached via HDFS at: hdfs://host2:9002/
    19/10/22 11:15:34 INFO dynamometer.Client: NameNode web UI available at: http://host2:50077/
    19/10/22 11:15:34 INFO dynamometer.Client: NameNode can be tracked at: http://host2:8042/node/containerlogs/container_1571768022694_0001_01_000002/rscherba/
    19/10/22 11:15:34 INFO dynamometer.Client: Waiting for NameNode to finish starting up...
    19/10/22 11:16:37 INFO dynamometer.Client: Startup progress = 1.00; above threshold of 1.00; done waiting after 63132 ms.
    19/10/22 11:16:37 INFO dynamometer.Client: NameNode has started!
    19/10/22 11:16:37 INFO dynamometer.Client: Waiting for 2 DataNodes to register with the NameNode...
    19/10/22 11:16:37 INFO dynamometer.Client: Number of live DataNodes = 2.00; above threshold of 2.00; done waiting after 33 ms.
    19/10/22 11:16:37 INFO dynamometer.Client: Waiting for MissingBlocks to fall below 0.010199999...
    19/10/22 11:16:37 INFO dynamometer.Client: Number of missing blocks: 102.00
    19/10/22 11:16:40 INFO dynamometer.Client: Number of missing blocks = 0.00; below threshold of 0.01; done waiting after 3004 ms.
    19/10/22 11:16:40 INFO dynamometer.Client: Waiting for UnderReplicatedBlocks to fall below 1.02...
    19/10/22 11:16:40 INFO dynamometer.Client: Number of under replicated blocks: 102.00
    19/10/22 11:16:46 INFO dynamometer.Client: Number of under replicated blocks: 98.00
    19/10/22 11:16:49 INFO dynamometer.Client: Number of under replicated blocks = 1.00; below threshold of 1.02; done waiting after 9012 ms.
    19/10/22 11:16:49 INFO dynamometer.Client: NameNode is ready for use!

    -- I suppose this log indicates that the Dyno-HDFS cluster is OK to use and that 9002 is the correct port for nn_uri; a DynamometerTest YARN application entity was also created inside the accompanying YARN
    -- hdfs dfs -ls hdfs://host2:9002/ shows simulated file structure
    $ start-workload:
    19/10/22 11:43:29 INFO mapreduce.JobSubmitter: number of splits:1
    19/10/22 11:43:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local415330183_0001
    19/10/22 11:43:29 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    19/10/22 11:43:29 INFO mapreduce.Job: Running job: job_local415330183_0001
    ...
    19/10/22 11:58:41 INFO mapred.LocalJobRunner: map > map
    ...

    -- it still uses a local job, and localhost:8080 is not accessible. Is that the root cause of the never-ending workload job? Would it help to start YARN on a separate host3, and how would Dynamometer find it (which args should be used)?

@xkrogen
Collaborator

xkrogen commented Oct 25, 2019

Great, so it looks like the Dyno-HDFS cluster is working as expected.

Your workload MapReduce job is not actually leveraging YARN at all. Is your mapreduce.framework.name configured to be yarn?
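
A minimal sketch of what I mean, assuming the workload client is launched with a HADOOP_CONF_DIR pointing at the accompanying YARN cluster's client configuration (the paths and host names are placeholders):

echo $HADOOP_CONF_DIR                                                   # the config dir the submitting process actually sees
grep -A1 'mapreduce.framework.name' $HADOOP_CONF_DIR/mapred-site.xml    # should show <value>yarn</value>
grep -A1 'yarn.resourcemanager' $HADOOP_CONF_DIR/yarn-site.xml          # should point at host2's ResourceManager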

@nshetty0202

I'm running into the same issue here; the logs indicate the job is running locally instead of on YARN.
mapreduce.framework.name is set to yarn in /opt/yarn/conf/mapred-site.xml:

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
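
One thing worth double-checking (a sketch; the path comes from the comment above): whether the process that launches start-workload.sh actually picks up that configuration directory, e.g.

echo $HADOOP_CONF_DIR               # should print /opt/yarn/conf (the dir holding the mapred-site.xml above)
ls /opt/yarn/conf/mapred-site.xml   # confirm the file exists where the client will look for it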
