
swineherd/capture -- turn job output into structured metrics

Logging:

  1. All output from the launched workflow should go to a workflow log file
  2. Hadoop output is special and should be pulled down from the jobtracker
    • jobconf.xml
    • job details page

The workflow should specify a logdir, defaulting to workdir + '/logs'.
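
For concreteness, a minimal sketch of that default (a hypothetical Workflow class and accessors, not swineherd's actual API), with all workflow output funneled to one log file in the logdir:

```ruby
require 'fileutils'
require 'logger'

class Workflow
  attr_accessor :workdir
  attr_writer   :logdir   # may be set explicitly by the workflow

  def initialize(workdir)
    @workdir = workdir
  end

  # Use the configured logdir if given; default to workdir/logs.
  def logdir
    @logdir || File.join(workdir, 'logs')
  end

  # One log file per workflow, living in the logdir.
  def log
    @log ||= begin
      FileUtils.mkdir_p(logdir)
      Logger.new(File.join(logdir, 'workflow.log'))
    end
  end
end

wf = Workflow.new('/data/ripd/my_flow')
wf.log.info 'workflow started'   # => /data/ripd/my_flow/logs/workflow.log
```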

Fetching Hadoop job stats:

  1. Get the job id
  2. Use curl to fetch the latest logs listing: "http://jobtracker:50030/logs/history/"
  3. Parse the logs listing and pull out the two URLs we want (something-jobid.xml, something-jobid....)
  4. Fetch the two URLs we care about and dump them into the workflow's log dir.
  5. Possibly parse the results into an ongoing workflow-statistics.tsv file (see the sketch after this list)
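
A rough sketch of steps 2-5, assuming the jobtracker listing is plain HTML whose hrefs contain the job id (actual history filenames vary by Hadoop version, so everything here beyond the listing URL is an assumption, not swineherd's actual code):

```ruby
require 'open-uri'
require 'fileutils'

JOBTRACKER = 'http://jobtracker:50030'

# Steps 2-4: fetch the listing, find the files naming our job id
# (should match both the jobconf xml and the job details file),
# and dump each into the workflow's log dir.
def capture_job_logs(job_id, logdir)
  FileUtils.mkdir_p(logdir)
  listing = URI.open("#{JOBTRACKER}/logs/history/").read
  files = listing.scan(/href="([^"]*#{Regexp.escape(job_id)}[^"]*)"/).flatten.uniq
  files.each do |file|
    File.write(File.join(logdir, File.basename(file)),
               URI.open("#{JOBTRACKER}/logs/history/#{File.basename(file)}").read)
  end
  files
end

# Step 5, if we do it: append one row per job to the running TSV.
# Field names are placeholders for whatever we parse out of the logs.
def append_stats(logdir, job_id, stats)
  File.open(File.join(logdir, 'workflow-statistics.tsv'), 'a') do |tsv|
    tsv.puts [job_id, stats[:maps], stats[:reduces], stats[:duration]].join("\t")
  end
end
```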

Other output:

Output that would otherwise go to the terminal (nohup.out or some such) should be collected and dumped into the logdir as well.
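
One way to do that capture, as a sketch: spawn the workflow command with stdout and stderr redirected into the logdir instead of letting them land in nohup.out (the command and filenames here are illustrative):

```ruby
require 'fileutils'

def run_captured(cmd, logdir)
  FileUtils.mkdir_p(logdir)
  # Redirect the child's stdout/stderr straight into files in the logdir.
  pid = Process.spawn(cmd,
                      out: File.join(logdir, 'stdout.log'),
                      err: File.join(logdir, 'stderr.log'))
  Process.wait(pid)
  $?.exitstatus
end

run_captured('hadoop jar my_flow.jar', '/data/ripd/my_flow/logs')
```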