New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Task/WP-80 Implement dynamic execution system selection #921
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #921 +/- ##
==========================================
+ Coverage 64.80% 65.43% +0.62%
==========================================
Files 434 437 +3
Lines 12506 12621 +115
Branches 2618 2623 +5
==========================================
+ Hits 8104 8258 +154
+ Misses 4167 4127 -40
- Partials 235 236 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
+ get available systems + handle client side scenarios
836f65f
to
95558cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really cool. A few questions
.sort() | ||
.map((exec_system_id) => ( | ||
<option key={exec_system_id} value={exec_system_id}> | ||
{exec_system_id} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on whether we should use getSystemName
here for display? Or a combo of getSystemName
and id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Should we use a semicolon instead of a hyphen to separate the name from id? Might reduce some confusion since colons are not allowed in system ids. i.e. Frontera: frontera1-cyemparala
instead of Frontera - frontera1-cyemparala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, did that I now switched to :
.
Also, while doing this found that I mistakenly skipped jest unit tests, fixed that. And added coverage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also a third axis of info that may be useful: the system's host. Should we present this to #cep-design for feedback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will sync up with cep-design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing, addressed review comments in this new commit.
.sort() | ||
.map((exec_system_id) => ( | ||
<option key={exec_system_id} value={exec_system_id}> | ||
{exec_system_id} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ get available systems + handle client side scenarios
data = {'definition': app_def} | ||
|
||
# GET EXECUTION SYSTEM INFO TO PROCESS SPECIFIC SYSTEM DATA E.G. QUEUE INFORMATION | ||
data['exec_sys'] = tapis.systems.getSystem(systemId=app_def.jobAttributes.execSystemId) | ||
has_dynamic_exec_system = getattr(app_def.notes, 'dynamicExecSystem', False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add one more field to notes
called dynamicExecSystems
, which we could specify if we wanted to further filter the systems we wanted the app to be available on?
Use case scenario: we know a modularized app works on frontera
and ls6
, but not stampede3
or even a VM, so don't want it to show there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, made those changes. Was fixing tests, this PR is updated.
{getSystemName( | ||
getExecSystemFromId(app, exec_system_id)?.host | ||
)}{' '} | ||
: {exec_system_id} :{' '} | ||
{getExecSystemFromId(app, exec_system_id)?.host} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we come up with an alternative for
Frontera : frontera : frontera.tacc.utexas.edu
- Could show only
Frontera
,LoneStar6
, etc if we know the name, otherwise show systemId and assume user understands host? Or host in a separate, read-only, disabled box?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the changes now from the discussion. Below are the changes:
- App definition notes will have this note:
"dynamicExecSystems": {
"frontera": "Frontera",
"ls6": "Lonestar6"
}
This will have a map of system id to the label.
To use this, the search in system list will now look for specific ids.
-
System list is shown even if the size is 1 for dynamic execution list.
-
The initial list of execution systems is based on the portal alloc (if available). This has an issue though, lets say a portal (like 3DEM) has portaAlloc but no execution system assigned to it, then user will never be able to execute anything. What is fallback here? if no execution system matches portalAlloc should we populate the execution system list with full list? This change will definitely break 3DEM unless 3DEM allocation model is changed.
-
Allocation is moved right next to System list. Help text is adjusted, will ping design channel for feedback on this.
-
Just like previously, allocation list is filtered based on what is selected in system list.
-
When dynamic execution list is setup but there is no allocation available, then the error message is adjusted for allocation.
32a87c9
to
ffaf67e
Compare
data = {'definition': app_def} | ||
|
||
# GET EXECUTION SYSTEM INFO TO PROCESS SPECIFIC SYSTEM DATA E.G. QUEUE INFORMATION | ||
data['exec_sys'] = tapis.systems.getSystem(systemId=app_def.jobAttributes.execSystemId) | ||
exec_systems = getattr(app_def.notes, 'dynamicExecSystems', TapisResult()).__dict__.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should dynamicExecSystems
be optional? or have an "ALL" param? Or do we prefer the explicit dynamic exec systems? We may have discussed before, apologies if so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we landed on keeping it optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather, I mean if dynamicExecSystem
is true
, and dynamicExecSystems
is undefined, should that mean that we do no filtering on the systems listed in the dropdown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. We wanted to see if one attribute could provide everything.
So, if we choose dynamicExecSystems, then:
- dynamicExecSystems: ["All"] - would do no filtering, would pick all systems available to the user.
- "dynamicExecSystems": map of systems to system label will limit to these systems only.
I can make that change, it would be useful for some apps and not require provisioning of app if a new system is added.
How would we want to track labels in this case. An example is ls6 - exec systems are defined as ls6. There is no end user friendly label is system definition. The only other way is to maintain a global list of system names to label.
This would fit names of well known system names and their labels. Should use notes field in system definition for label?
Overview
There are exec system agnostic apps, and the Portal Job submission form needs to allow the users to pick an execution system for such apps. This PR implements the feature to provide that.
Implementation decisions:
UI Visibility of execution systems:
The alternative is to show all execution systems available to user, which is not going to work well because user will blocked by allocation.
Node and Time limits when exec system is flipped:
When exec system selection is flipped in the form, should the node and time limits match what's in the app defaults or should it keep the user modifications to those defaults.
Bring the app defaults:
Steps:
After step 4, node count goes back to 1 and default queue for ls6.
Keep the current node count on changing exec system (the PR implemented this)
Steps:
After step 4, node count adjust to ls6 default queue and will not go back to 1.
Cannot use tapis job attribute dynamicExecSystem as an indicator of whether app supports dynamic exec system.
Currently dynamic selection of an execution system is not supported. For this reason the job related attribute dynamicExecSystem should be set to false (the default) and execSystemConstraints should not be set.
dynamicExecSystem
Related
Changes
Backend:
Dependent picklist
Add a notes option for each app, to allow a selection of dynamic execution systems. Existance of this attribute indicates that dynamic execution system feature is available. There are two ways to provide dynamic execution systems.
Example:
Specific list of execution systems. Only LoneStar6, Frontera are shown if the user has access to them.
execSystems
.Client:
Testing
App with no dynamicExecSystem in notes section:
App with dynamic execution system
Pre-req:
dynamicExecSystem: ["ls6", "frontera"]
Flip between frontera and ls6. Check the following:
Use an allocation with no available execution systems and see what happens. (I only saw 3DEM based allocation with no hosts).
Test automation was added/updated for the following scenarios:
all exec systems with allocation
some exec systems with no allocation
only one exec system with allocation
only one exec system with allocation
no exec system with allocation
2 exec systems with missing default queue
2 exec systems with different queue names
2 exec systems - queue selection and changing exec system
2 exec systems - modify queue selection and check the corresponding dependencies
2 exec systems - modify queue limits and check if they are set appropriately in the values
2 exec systems - modify queue on new exec system
Verification:
UI
Order of drop down is different. Allocation is first and then Systems.
Added drop down to show exec system with labels from system.
Error message when allocation has no execution system.
Job History display of System Name
Notes