
[wishlist] NeuroDebian use case and inspired features: "local" (to folder) qme, "local" qme as a job for "larger" qme, etc #42

Open
yarikoptic opened this issue Jul 17, 2020 · 3 comments

Comments

@yarikoptic
Contributor

yarikoptic commented Jul 17, 2020

Description

NB: I know I am way behind on testing out what is already working and providing feedback, but I wanted to write this down before I forget.

For building NeuroDebian packages we have a pretty simple setup: a hierarchy of <package>/<version>, and then a simple script https://github.com/neurodebian/neurodebian/blob/master/tools/nd_build4allnd that loops through releases, building each one.
Each build produces a .build file with the log, plus a summary.build like

datalad_0.13.0-1~nd80+1_amd64.build     FAILED  0:53.24 real, 39.56 user, 17.70 sys, 119536 out
datalad_0.13.0-1~nd90+1_amd64.build     FAILED  0:48.59 real, 36.15 user, 16.44 sys, 135048 out OLD
datalad_0.13.0-1~nd100+1_amd64.build    OK      46:57.42 real, 1741.84 user, 1268.67 sys, 3287816 out
datalad_0.13.0-1~nd110+1_amd64.build    OK      37:25.13 real, 1266.31 user, 1009.24 sys, 3474200 out
datalad_0.13.0-1~nd+1_amd64.build       OK      37:45.63 real, 1285.95 user, 1018.75 sys, 3570312 out
datalad_0.13.0-1~nd14.04+1_amd64.build  FAILED  1:16.53 real, 56.32 user, 26.00 sys, 122384 out
datalad_0.13.0-1~nd16.04+1_amd64.build  FAILED  1:13.09 real, 54.43 user, 24.59 sys, 134224 out
datalad_0.13.0-1~nd18.04+1_amd64.build  FAILED  48:14.25 real, 1797.37 user, 1294.49 sys, 3222536 out
datalad_0.13.0-1~nd20.04+1_amd64.build  FAILED  1:11.46 real, 53.10 user, 24.53 sys, 165680 out
datalad_0.13.0-1~nd90+1_amd64.build     FAILED  0:46.08 real, 34.83 user, 15.18 sys, 135112 out

showing which builds were OK or FAILED and how long they took.
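As an aside, those summary.build lines have a regular enough shape to be tallied mechanically. A minimal sketch (a hypothetical helper, not part of nd_build or qme) of parsing them into OK/FAILED counts:

```python
import re

# One summary.build line looks like:
# datalad_0.13.0-1~nd100+1_amd64.build  OK  46:57.42 real, 1741.84 user, ...
LINE_RE = re.compile(
    r"^(?P<name>\S+\.build)\s+(?P<status>OK|FAILED)\s+(?P<real>\S+) real"
)

def summarize(lines):
    """Count OK vs FAILED builds from summary.build content."""
    counts = {"OK": 0, "FAILED": 0}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[m.group("status")] += 1
    return counts
```

Something along these lines is roughly what a qme parser would need to do to expose per-build status in `qme ls`.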

I am typically using screen to "submit" builds so I can come back later to see what I have tried to build and what has succeeded/failed.

What I would have loved to be able to do

  • For a new package build, in a new <package>/<version> (probably just on top of that nd_build4allnd), I would run something like `qme start --local --register=above`, which would
    • start qme for that directory (not necessarily the web UI, but at least the queue manager, if that is separate)
    • "register" it with the qme it finds in some "above" directory (which would be the one running at the user level, or a dedicated one running in the directory hosting the <package> directories)
  • instead of `nd_build $dfamily $drelease $bpdsc "$@" || :`, do `qme run nd_build $dfamily $drelease $bpdsc "$@"`. Relevant: run (shell specific?): allow for background mode? #30 (shell-specific background), closed in favor of the more general Need to integrate async #2 (async).
    • Ideally it would be nice to have the local (or the global) one configured to use a specific executor (e.g. condor, HTCondor executor #23), so the script works the same regardless of the executor
  • switch to whatever else I need to do
  • come back to that server and be able to
    • "globally": `qme ls` across all packages I had built, to see the overall status: what is still running, what is done all OK, what is done partially OK (some builds succeeded, some failed), and what has failed entirely
      • if in the web UI, clicking on a "job" should lead to that "local" qme dashboard with the individual packages (assuming its web UI is running; or maybe this instance of the web UI could just switch "context" to navigate the locally stored qme)
      • `qme archive` (via CLI or some action in the web UI) some "jobs" (local qmes) which I consider "done" (maybe `qme clear --archive`?)
    • "locally" (per package),
      • I would `qme ls` (or look in the web UI) to see which particular backports failed, and be able to access the build logs in the web UI for review. So very similar to the aforementioned dump of a summary file, but with a `qme ls` status (ref Add "status" column to dashboard/ls #41) that could also say "RUNNING" or "PENDING" (if run via PBS for actual job execution). For the execution time, ref wishlist: two timestamps #29 (closed without fix).
      • Be able to perform additional "actions" on each of those package build jobs:
        • `qme rerun` ("Rerun" action in the web UI), to rerun the build (e.g. if I updated the base environment and expect it to succeed now)
        • `qme rerun [MORE OPTIONS]` or `qme rerun --schema debug`, where [MORE OPTIONS] would be `--hookdir=/home/neurodebian/neurodebian/tools/hooks`, or the "debug" schema would have that setting "configured" so I do not even need to memorize which options I need to debug that build
@vsoch
Owner

vsoch commented Jul 18, 2020

I'm not clear why you would need to do qme start before running any commands. You would just start with a build:

qme run nd_build $dfamily $drelease $bpdsc "$@"

And then there would be an nd_build parser that parses those variables and can provide actions to run. What we would want to add is a flag that runs the action for all runs in some parser subset, plus the archive command. I'm not totally clear on the other flags you exemplified, but they seem kind of complicated. I think what would be helpful to start is to show the complete build / query cycle you would do for one specific package, including the commands and output. I could create a simple parser for that, and then we can test/discuss adding the --archive and rerun as group options. I'm also thinking that what we need (instead of one generic central dashboard that is hard to customize per executor) is an executor-specific one that exposes the actions to run on an entire group, archive/clear, etc.

@yarikoptic
Contributor Author

> I'm not clear why you would need to do qme start before running any commands.

Might not be needed, but it was primarily to trigger a local qme instance for that folder, and possibly to prepare for the async submission of those jobs. But maybe even that is not needed if we introduce some grouping attribute (in my case, the folder path, like Debian/<package>/<version>) so those jobs could be grouped in the common dashboard. I kind of like that idea now; I will elaborate on it later.
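The grouping-attribute idea could be sketched roughly like this (a hypothetical data shape and helper, not qme's actual task model), bucketing jobs for a common dashboard by the leading components of their working directory:

```python
from collections import defaultdict
from pathlib import PurePosixPath

def group_jobs_by_folder(jobs, depth=3):
    """Group job records by the first `depth` path components of their
    working directory, e.g. Debian/<package>/<version>."""
    groups = defaultdict(list)
    for job in jobs:
        key = "/".join(PurePosixPath(job["pwd"]).parts[:depth])
        groups[key].append(job)
    return dict(groups)
```

With such a key, a single dashboard could collapse all builds of one <package>/<version> into one "job" row, without needing a separate qme instance per folder.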

@vsoch
Owner

vsoch commented Jul 18, 2020

That's an interesting idea! There isn't really any established concept of a "qme instance" yet, but it might be what is warranted for some executor to launch and then actively monitor some scoped thing. Looking forward to hearing your design / implementation for this!
