
Dump output from `bhist -l <lsfjobid>` to runpath #7694

Closed
berland opened this issue Apr 18, 2024 · 7 comments · Fixed by #7794

@berland
Contributor

berland commented Apr 18, 2024

Is your feature request related to a problem? Please describe.
The output from `bhist -l <jobid>` on finished LSF jobs is too useful not to keep easily accessible, and it should be dumped to the runpath. Example output:

```
Job <289263>, User <havb>, Project <default>, Command <sleep 10>
Thu Apr 18 10:00:40: Submitted from host <st-grid03>, to Queue <normal>, CWD <$
                     HOME>;
Thu Apr 18 10:01:28: Dispatched to <st-rst14-03-05>, Effective RES_REQ <select[
                     (cs)&&(type == any )&&(mem>maxmem*1/12)] order[r15s:pg:bjo
                     bs] span[hosts=1] same[model] >;
Thu Apr 18 10:01:28: Starting (Pid 658);
Thu Apr 18 10:01:28: Running with execution home </private/havb>, Execution CWD
                      </private/havb>, Execution Pid <658>;
Thu Apr 18 10:01:38: Done successfully. The CPU time used is 0.1 seconds;
Thu Apr 18 10:02:01: Post job process done successfully;

MEMORY USAGE:
MAX MEM: 3.9 Gbytes;  AVG MEM: 3.9 Gbytes

Summary of time in seconds spent in various states by  Thu Apr 18 10:02:01
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  48       0        10       0        0        0        58
```
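The state summary at the end of that output is easy to post-process once it is on disk. A minimal sketch that turns it into a dictionary, assuming the column header and the value row always follow the `Summary of time in seconds` line directly; `parse_state_summary` is a hypothetical helper, not an LSF API.

```python
from typing import Dict


def parse_state_summary(bhist_text: str) -> Dict[str, int]:
    """Map each LSF job state (PEND, RUN, ...) to the seconds spent in it.

    Hypothetical helper: assumes the header row and the value row come
    directly after the "Summary of time in seconds" line, as in the
    sample output above. Returns {} if no summary is found.
    """
    lines = bhist_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Summary of time in seconds") and i + 2 < len(lines):
            states = lines[i + 1].split()
            # split() with no argument also copes with tabs between columns
            seconds = lines[i + 2].split()
            return {state: int(value) for state, value in zip(states, seconds)}
    return {}
```

On the sample above this yields PEND=48 and RUN=10 out of TOTAL=58, i.e. the job spent most of its lifetime pending in the queue.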

Describe the solution you'd like
Dump the output to a file in the runpath (filename to be decided).

Describe alternatives you've considered
Do nothing.

@berland
Contributor Author

berland commented Apr 18, 2024

The LSF stdout might be sufficient, though that must first be fixed in #7695. We should examine whether the two outputs differ when, for example, a job is killed for running out of memory (OOM).

jonathan-eq self-assigned this Apr 29, 2024
@jonathan-eq
Contributor

What should the filename be? @berland
I have some ideas:

  • bhist_job_summary.txt
  • lsf_job_summary.txt
  • job_summary.txt (would not be created for queue systems other than LSF anyway)

@jonathan-eq
Contributor

jonathan-eq commented Apr 29, 2024

In the screenshot below, the left side is the LSF stdout and the right side is the long-format `bhist -l` output.
[Screenshot: LSF stdout (left) vs. `bhist -l` output (right)]

@berland
Contributor Author

berland commented Apr 30, 2024

As for the filename: we already have `<JOBNAME>.LSF-out` for stdout, and we might get `<JOBNAME>.LSF-err` for stderr (that could be a separate issue). To stay consistent with that scheme, what about `<JOBNAME>.LSF-bhist-l`?
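A minimal sketch of the proposal using the `<JOBNAME>.LSF-bhist-l` naming scheme, assuming the caller already knows the LSF job ID, the job name and the runpath. This is not the code merged in #7794; the names `bhist_output_path` and `dump_bhist`, the 30-second timeout and the injectable `run` hook are hypothetical. The dump is treated as best-effort: if `bhist` is missing or exits non-zero, nothing is written.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional

# Suffix proposed in the discussion: <JOBNAME>.LSF-bhist-l
BHIST_SUFFIX = ".LSF-bhist-l"


def bhist_output_path(runpath: str, job_name: str) -> Path:
    """Return <runpath>/<JOBNAME>.LSF-bhist-l (hypothetical helper)."""
    return Path(runpath) / f"{job_name}{BHIST_SUFFIX}"


def dump_bhist(
    job_id: str,
    job_name: str,
    runpath: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = 30.0,
) -> Optional[Path]:
    """Best-effort: write the stdout of `bhist -l <job_id>` to the runpath.

    Returns the written path, or None if bhist could not be run or failed.
    `run` is a hypothetical injection point so this can be tested without LSF.
    """
    try:
        result = run(
            ["bhist", "-l", job_id],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # bhist not on PATH, or LSF did not answer within the timeout
        return None
    if result.returncode != 0 or not result.stdout:
        return None
    target = bhist_output_path(runpath, job_name)
    target.write_text(result.stdout)
    return target
```

Keeping the dump best-effort means a missing or slow `bhist` never turns a successful job into a failed one; the file is purely diagnostic.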

@jonathan-eq
Contributor

The LSF stdout already provides all the information found in `bhist -l`, so writing that output to a file would not give us anything extra.

@jonathan-eq
Contributor

One field that is not included in the LSF stdout is `Dispatched to <cluster_node>, Effective RES_REQ <select[(cs)&&(type==any)>`.
Maybe getting the resource requirement string would be reason enough to keep the output? @berland
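Because `bhist -l` wraps long events over indented continuation lines, pulling out the effective resource requirement string means joining those lines first. A minimal sketch, assuming continuation lines are exactly the lines starting with whitespace and that a wrap never falls on a space that matters (leading whitespace is simply dropped when joining); `unwrap_bhist` and `effective_res_req` are hypothetical helpers.

```python
import re
from typing import List, Optional

# Greedy match up to the last ">;" so '>' inside the expression is kept
_RES_REQ = re.compile(r"Effective RES_REQ <(.*)>;")


def unwrap_bhist(bhist_text: str) -> List[str]:
    """Join indented continuation lines onto the preceding logical line."""
    logical: List[str] = []
    for line in bhist_text.splitlines():
        if logical and line[:1].isspace():
            # bhist wraps at a fixed width, often mid-word ("bjo" + "bs")
            logical[-1] += line.strip()
        else:
            logical.append(line.rstrip())
    return logical


def effective_res_req(bhist_text: str) -> Optional[str]:
    """Return the Effective RES_REQ string, or None if it is absent."""
    for line in unwrap_bhist(bhist_text):
        match = _RES_REQ.search(line)
        if match:
            return match.group(1).strip()
    return None
```

This heuristic also folds the indented state-summary table into the line before it, which is harmless for this lookup. On the sample in the issue it returns `select[(cs)&&(type == any )&&(mem>maxmem*1/12)] order[r15s:pg:bjobs] span[hosts=1] same[model]`.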

@berland
Contributor Author

berland commented May 3, 2024

Yes, I think that is enough to warrant outputting this as well. There may be other corner cases where the two outputs differ further, and those are exactly the cases where it is most interesting.

Projects
Status: Done-Done

2 participants