Allow --dependency on job name #5917
Yes, dependencies now are mostly implemented as jobtap plugins that register a callback of the form […]. I guess one thing to look out for is that job names are not guaranteed to be unique like jobids, so the implementation would have to decide how to handle that.

Edit: Oh, and the usage would be more like […] to tell the dependency system you're using the […] plugin.
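Since names are not guaranteed unique, any name-based plugin needs an explicit ambiguity policy. A minimal sketch (plain Python, not flux-core code; the function name and policies are made up for illustration, and a real jobtap plugin would consult the job manager's active jobs rather than a list):

```python
# Hypothetical name -> jobid resolution with an explicit policy for
# duplicate job names. Not flux code; "jobs" stands in for whatever
# listing a real plugin would query.

def resolve_name(jobs, name, policy="error"):
    """Map a job name to a single jobid.

    jobs   -- list of (jobid, name) tuples, oldest first
    policy -- "error":  refuse ambiguous names
              "latest": pick the most recently submitted match
    """
    matches = [jobid for jobid, n in jobs if n == name]
    if not matches:
        raise LookupError(f"no job named {name!r}")
    if len(matches) > 1 and policy == "error":
        raise LookupError(f"job name {name!r} is ambiguous")
    return matches[-1]  # newest matching submission
```

Either policy is defensible; the point is only that the choice has to be made somewhere, since a silent "first match wins" would be surprising.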
I can try writing one! And I understand this point: […]

At least how I'm setting things up, I'm controlling all the […]. And is the idea of "jobtap" like you are tapping a job on the shoulder? tap tap, are you my dependency?
😆 Yeah, like you're "tapping" in to the job flow control in the job manager. Also I should note that jobtap plugins are true plugins and can be developed without flux-core source, just add the […].
@grondo if we put together a PR to flux-core, could it be considered? I took a look at the design this weekend, and it could be that we add a […].

The reason I'm hoping it might be considered for core is that it would be an essential component of our JobSpec next generation library, which tightly controls the job names and (I think) should be able to set dependency relationships without needing to get back jobids to hand around to different tasks. I don't like any approach that requires that (because what if you need to write the logic before you have an ID?). And I don't like the idea that the user will need to configure a custom plugin, otherwise the library […].
Just FYI, in case you were unaware, there is a design by Tom and Steven for "openmp style" dependencies described in RFC 26 that IIRC tackles the need to express a workflow DAG in advance. What we have now was intended just to reach parity with Slurm. Hackathon project here: https://github.com/flux-framework/flux-depend

I don't have a problem with dependencies based on job name being built in. I think @grondo just meant that you could probably get something working without waiting on us, since we are totally slammed with El Cap problems right now. We can definitely consider a PR, but we are experiencing a large call volume right now and you may need to listen to the slack hold #music channel for a few weeks.

Changing jobspec in flux-core is a whole other kettle of fish that AFAIK is not on the plan at the moment.
Very happy to wait! Likely I'd do a hackathon with the rest of the thrust team to work on it because I'm terrible at C++.
Oh gosh, absolutely not - the "JobSpec the next generation" is a different jobspec, a more abstract one that maps into the existing flux jobspec. From the FAQ on the first page: […]
Alright, but if you call that "jobspec", we're going to have an operator in flux core with a cute gopher mascot. It does the same thing as the flux operator except it also helps you make collect phone calls and can read off the time in a robotic female voice.
flux operator - keeping your jobs on hold since 1967
Just to make sure I'm not giving @vsoch bad advice here, I think we can pull this together as a python-based jobtap plugin with cffi producing a regular plugin.so but with internals in python. Would either of you have issues with that approach? No additional dependencies are required, since we already require python and cffi of a sufficient version for our command line interfaces. The thought being that if it ever becomes a performance bottleneck or similar, we can optimize. Here's the cffi feature I'm talking about for embedding python as a shared library: https://cffi.readthedocs.io/en/latest/embedding.html
That would be a neat little demo |
Another thought here: I was reading through the […]
Well, why not just do that on your submit line?

```console
$ flux run --dependency=after:$(flux jobs -ac1 --name=job-a -no {id}) -vvv hostname
jobid: fD1XCrosV
0.000s: job.submit {"userid":6885,"urgency":16,"flags":0,"version":1}
0.013s: job.dependency-add {"description":"after-start=fCC1iVf35"}
0.013s: job.dependency-remove {"description":"after-start=fCC1iVf35"}
```

BTW, one gotcha here is that […]
Oof, @v mentioned in slack that part of her reason is so that it can be expressed declaratively in a file, say as part of jobspec. That's my main use-case for the OpenMP-style dependencies, which could probably be implemented in a similar way. In the end it's always possible to do that if you have a shell running the flux run command, but interactive submission is not the main use for either of these IMO (though some users will probably want it for that).
Yeah, I agree. This syntax was borrowed directly from Slurm for ease of transition, so it was a darned-if-you-do, darned-if-you-don't situation.
Oh, that makes sense. Doing it as a frobnicator plugin is compelling and an interesting little trick :-)
Yeah, I think it would work for the more complex one too, so we might get the dependencies Stephen and I were discussing all that time ago.
Might be good to review RFC 26 as part of designing this. That spec is a bit of a chimera: the openmp deps were defined earlier, but not implemented, and then the "simple dependencies" were added. Perhaps a proposal could start with a PR on that document, if we're pivoting to the declarative approach.
Oh man, yeah that stings. At least it's consistent with what people are used to; thanks for the explanation.
I was planning to go back and implement the OpenMP-style dependencies; basically, this issue would be about being able to use the job name instead of a fluid. Some thought on how that part fits makes sense now that you say it. It's not immediately clear whether it should actually be a different scheme (thus needing to replicate the ok/failed/... stuff) or something else to indicate that it isn't a fluid and needs to be translated, like […].
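To make the "different scheme vs. marker" question concrete, here is a small sketch of the scheme-prefix parsing involved (plain Python for illustration; the `name` scheme is hypothetical, while `after`/`afterok` are the existing fluid-based schemes):

```python
# Hypothetical parsing of a --dependency=SCHEME:VALUE argument, flagging
# whether VALUE is a job name that would need translation to a jobid
# before the existing after-style machinery can use it.

def parse_dependency(spec):
    """Split SCHEME:VALUE and report whether translation is needed."""
    scheme, sep, value = spec.partition(":")
    if not sep or not value:
        raise ValueError(f"expected SCHEME:VALUE, got {spec!r}")
    needs_translation = scheme == "name"  # assumed new scheme
    return scheme, value, needs_translation
```

Under the alternative design (a marker inside the existing schemes rather than a new scheme), the `needs_translation` test would instead inspect the value, and the ok/failed/... variants would not need to be replicated.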
Yeah, it would be nice to have a different scheme with a simpler interface. BTW, there's also #5872. I've been pondering in the back of my mind how to expand the dependencies to allow this kind of logic. Would the OpenMP-style work for that? If so, it sounds like that would be a big win.
Hmm, not directly. Did they have a use-case for that specific thing? There is something similar, which I thought was the same and actually had to rewrite this when I realized otherwise, called the […].

I don't think I've ever heard someone ask for a way to wait for at least two of the past three, or something like that, other than when what they really meant was "run no more than N at a time" as a kind of end-run to implement a semaphore. Is that what that request was, or was there something else there? Either way, I don't think we could express that in terms of what the existing after module provides. It probably wouldn't be too hard to do, but I'd want to know more about the use-case.
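The semaphore reading of that request can be sketched in a few lines (plain Python, purely illustrative; class and method names are made up): at most N jobs from a group run at once, and finishing one releases the next.

```python
# Hypothetical "run no more than N at a time" gate: the semaphore-style
# behavior users sometimes approximate with after-style dependencies.

class JobSemaphore:
    def __init__(self, n):
        self.n = n            # max concurrent jobs
        self.running = set()
        self.waiting = []     # FIFO queue of held jobs

    def submit(self, jobid):
        """Return True if the job may start now, else hold it."""
        if len(self.running) < self.n:
            self.running.add(jobid)
            return True
        self.waiting.append(jobid)
        return False

    def finished(self, jobid):
        """Release a slot; return the next job to start, if any."""
        self.running.discard(jobid)
        if self.waiting and len(self.running) < self.n:
            nxt = self.waiting.pop(0)
            self.running.add(nxt)
            return nxt
        return None
```

Expressing this with pairwise after dependencies forces an arbitrary fixed ordering, which is exactly the awkwardness described above.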
I think it was something like this: every Nth job in their ensemble is potentially converged, so they want to release a dependent job after any N jobs to combine the results of the N jobs that have finished so far, while still allowing the compute jobs to continue. The runtime of each job can vary by a factor of 4 or 5, which is why they don't know which N will finish first, nor is it feasible to submit N and wait for them all to finish, since resources would go idle.
Ok, that's interesting. It seems like that would still be kinda hard to express with the "or" syntax, since you'd have to enumerate all the permutations. If we had the symbolic dependencies, or some other way to group them, maybe we could enable users to say something like […].

Actually that reminds me, I've been looking over the code for some of these, trying to think through how to do each of them. The name dependency translation should be pretty trivial because it doesn't need state. The symbolic dependencies need that hash table to exist, and probably to persist across restarts or be re-generated. Is there a good way to do that in the frobnicator? It could always use the KVS I suppose, but that seems kinda wasteful at first glance.

Also, to your note about flux-depend, it never got to the point of actually being a module. It could in principle be resurrected, though I would probably simplify some aspects of the scope and make it a jobtap plugin rather than a module. Also we'd have to work through incorporating Rust into core, not that I would argue too hard about that. 😁
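As a way of seeing why grouping beats enumeration: an "any N of the group" gate only needs a set and a counter, whereas the "or" syntax would need every N-sized permutation spelled out. A minimal sketch (illustrative Python, not flux code; the class name is made up):

```python
# Hypothetical "release after any N members of a group complete" gate,
# the use-case described above: combine results of the first N finished
# jobs while the rest of the ensemble keeps running.

class AnyNOfGroup:
    def __init__(self, group, n):
        self.group = set(group)  # jobids (or names) in the group
        self.n = n               # how many completions open the gate
        self.done = set()

    def job_finished(self, jobid):
        """Record a completion; return True once the gate is open."""
        if jobid in self.group:
            self.done.add(jobid)
        return len(self.done) >= self.n
```

With symbolic group membership, the dependent job declares one dependency on the group rather than on any particular jobids, which sidesteps both the permutation blowup and the need to know ids in advance.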
I really just want to have differently named jobs (that I'm controlling everything for) and say "Mr. Job B, you depend on A." So just:

```console
$ flux run --dependency.name="A" --name="B" hostname
```

That's all I need!
Right now I think I'm required to put a flux job id for a job dependency. I'd ideally like to be able to do this: […]

It could be the case that this is only allowed in environments where the user owns all their jobs (one level in an instance). Having to keep a lookup of ids for different tasks is possible, but it adds hugely to the complexity. I'd like to be able to read in a jobspec (where a task to submit has a name) and not depend on the jobid that is going to be output. Could something like this be possible?
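A name-only spec like that can be validated and ordered before any job exists. A minimal sketch (the `name`/`depends` fields are invented for illustration and are not flux jobspec; Python's stdlib `graphlib` does the ordering):

```python
# Hypothetical declarative spec: tasks reference each other by name,
# never by jobid. Compute a valid submission order from names alone.

from graphlib import TopologicalSorter

def submit_order(tasks):
    """tasks: list of dicts with 'name' and optional 'depends' (names).

    Returns names in an order where every dependency precedes its
    dependents; raises CycleError on circular dependencies.
    """
    graph = {t["name"]: set(t.get("depends", [])) for t in tasks}
    return list(TopologicalSorter(graph).static_order())
```

Jobids then only appear at submit time: walk the order, submit each task, and record its id to fill in the concrete `after` dependencies of later tasks.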