Task/WP-80 Implement dynamic execution system selection #921

chandra-tacc · 2023-12-18T02:53:16Z

Overview

There are exec system agnostic apps, and the Portal Job submission form needs to allow the users to pick an execution system for such apps. This PR implements the feature to provide that.

Implementation decisions:

UI Visibility of execution systems:
- Visibility of execution system is dependent on allocation.
- By default if a portal has allocation it is selected as default.
- Next, the list of execution systems is dependent on the what's available in the allocation.
The alternative is to show all execution systems available to user, which is not going to work well because user will blocked by allocation.
Node and Time limits when exec system is flipped:
When exec system selection is flipped in the form, should the node and time limits match what's in the app defaults or should it keep the user modifications to those defaults.
- Bring the app defaults:
  Steps:
  1. user selects frontera, the app default of node count:1 is shown
  2. user updates the queue to non-default (development)
  3. node count updates to 40
  4. user select ls6
    After step 4, node count goes back to 1 and default queue for ls6.
- Keep the current node count on changing exec system (the PR implemented this)
  Steps:
  1. user selects frontera, the app default of node count:1 is shown
  2. user updates the queue to non-default (development)
  3. node count updates to 40
  4. user select ls6, queue updates to ls6 default
    After step 4, node count adjust to ls6 default queue and will not go back to 1.
Cannot use tapis job attribute dynamicExecSystem as an indicator of whether app supports dynamic exec system.
- Tapis doc note:
  Currently dynamic selection of an execution system is not supported. For this reason the job related attribute dynamicExecSystem should be set to false (the default) and execSystemConstraints should not be set.
- Alternative is to use notes section in the app and add a boolean attribute dynamicExecSystem

Changes

Backend:

Dependent picklist
- Execution System is dependent on allocation selection.
- Queue and node limits are dependent on Execution System (existing feature, just integrated into execution system selection).
- Make necessary app form state handling to keep tracking of picklist changes.
Add a notes option for each app, to allow a selection of dynamic execution systems. Existance of this attribute indicates that dynamic execution system feature is available. There are two ways to provide dynamic execution systems.
- A list of systems. This is the list of restricted systems that this app can execute.
- "All". List of all systems that the user has access to.
- Handle both "All" vs "specific list". Send the execution systems for the app user when the notes indicates dynamic execution system.
Example:
Specific list of execution systems. Only LoneStar6, Frontera are shown if the user has access to them.

notes: 
...
dynamicExecSystem: ["ls6", "frontera"]
...

All available execution systems

notes: 
...
dynamicExecSystem: ["All"]
...

Without dynamic execution Systems. This will use execution system in job attribute.

notes: 
label: "namd",
...

Adjust exec system attribute - change from exec_sys to array execSystems.

Client:

Move allocation drop down to the top of the configuration.
Make execution system drop down dependent on the allocation.
Setup a drop down for exec system.
Handle changes when the exec system selection is updated.
- Update current component to a generic name.
- Update queue and node values.
- Create a state for the form and update it.
Allocation changes:
- Track allocation per exec system.
- When exec system, trigger the allocation check.
Label for execution system
- System definition now has label. This label is used.
- Otherwise, system name is used for execution system.

Testing

App with no dynamicExecSystem in notes section:
- No regression
  - job submission works
  - changing queue will adjust other aspects in the form (node and minutes)
  - default queue is properly selected and form values are correct.
  - Job history page shows right exec system information.
- Existing automated tests work.
App with dynamic execution system
- Pre-req:
  - Hello world or similar app with notes added for dynamicExecSystem: ["ls6", "frontera"]
  - Create exec system in tapis v3 (you can use your own user to do it, so the list has some use cases for testing). This will make testing more interesting.
- Flip between frontera and ls6. Check the following:
  - job submission works
  - changing queue will adjust other aspects in the form (node and minutes)
  - default queue is properly selected and form values are correct.
  - Job history page shows right exec system information.
- Use an allocation with no available execution systems and see what happens. (I only saw 3DEM based allocation with no hosts).
Test automation was added/updated for the following scenarios:
- No dynamic exec system enabled - update existing tests for validation
- App with dynamic stuff
- all exec systems with allocation
- some exec systems with no allocation
- only one exec system with allocation
- only one exec system with allocation
- no exec system with allocation
- 2 exec systems with missing default queue
- 2 exec systems with different queue names
- 2 exec systems - queue selection and changing exec system
- 2 exec systems - modify queue selection and check the corresponding dependencies
- 2 exec systems - modify queue limits and check if they are set appropriately in the values
- 2 exec systems - modify queue on new exec system
Verification:
- if exec system drop down exists, the values are right
- the queue values are right
- the node and rest of the selection value are right

UI

Order of drop down is different. Allocation is first and then Systems.
Added drop down to show exec system with labels from system.
Error message when allocation has no execution system.
Job History display of System Name

Notes

codecov · 2023-12-18T03:03:15Z

Codecov Report

Attention: Patch coverage is 89.17197% with 17 lines in your changes are missing coverage. Please review.

Project coverage is 65.43%. Comparing base (a02b919) to head (cd616d5).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #921      +/-   ##
==========================================
+ Coverage   64.80%   65.43%   +0.62%     
==========================================
  Files         434      437       +3     
  Lines       12506    12621     +115     
  Branches     2618     2623       +5     
==========================================
+ Hits         8104     8258     +154     
+ Misses       4167     4127      -40     
- Partials      235      236       +1

Flag	Coverage Δ
javascript	`70.27% <88.96%> (+0.47%)`	⬆️
unittests	`60.44% <91.66%> (+0.70%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
...ns/AppForm/fixtures/AppForm.allocations.fixture.js	`100.00% <ø> (ø)`
...plications/AppForm/fixtures/AppForm.app.fixture.js	`100.00% <ø> (ø)`
...pForm/fixtures/AppForm.executionsystems.fixture.js	`100.00% <100.00%> (ø)`
client/src/redux/reducers/apps.reducers.js	`64.70% <ø> (ø)`
...ient/src/redux/sagas/fixtures/appdetail.fixture.js	`100.00% <ø> (ø)`
...src/redux/sagas/fixtures/appdetailSlurm.fixture.js	`100.00% <ø> (ø)`
...lient/src/redux/sagas/fixtures/compress.fixture.js	`100.00% <ø> (ø)`
...c/redux/sagas/fixtures/executionsystems.fixture.js	`100.00% <100.00%> (ø)`
client/src/redux/sagas/fixtures/extract.fixture.js	`100.00% <ø> (ø)`
client/src/utils/jobsUtil.js	`94.36% <100.00%> (+0.16%)`	⬆️
... and 5 more

... and 1 file with indirect coverage changes

+ get available systems + handle client side scenarios

rstijerina

This is really cool. A few questions

rstijerina · 2024-02-27T21:52:20Z

client/src/components/Applications/AppForm/AppForm.jsx

+                          .sort()
+                          .map((exec_system_id) => (
+                            <option key={exec_system_id} value={exec_system_id}>
+                              {exec_system_id}


Thoughts on whether we should use getSystemName here for display? Or a combo of getSystemName and id?

Yes, like the idea. I tried both (screenshots below). Based on that, I picked the combo option. I think this makes more clear for users who know how a tapis system to hpc system is N:1 relation.

Just SystemName using getSystemName:

combo of getSystemName and id

Nice. Should we use a semicolon instead of a hyphen to separate the name from id? Might reduce some confusion since colons are not allowed in system ids. i.e. Frontera: frontera1-cyemparala instead of Frontera - frontera1-cyemparala

Yes, did that I now switched to :.
Also, while doing this found that I mistakenly skipped jest unit tests, fixed that. And added coverage

There's also a third axis of info that may be useful: the system's host. Should we present this to #cep-design for feedback?

Yes, will sync up with cep-design

server/portal/apps/workspace/api/views.py

client/src/utils/apps.js

client/src/components/Applications/AppForm/AppFormUtils.js

client/src/components/Applications/AppForm/AppForm.jsx

chandra-tacc

Thanks for reviewing, addressed review comments in this new commit.

client/src/components/Applications/AppForm/AppForm.jsx

chandra-tacc · 2024-02-28T17:14:39Z

client/src/components/Applications/AppForm/AppForm.jsx

+                          .sort()
+                          .map((exec_system_id) => (
+                            <option key={exec_system_id} value={exec_system_id}>
+                              {exec_system_id}


Yes, like the idea. I tried both (screenshots below). Based on that, I picked the combo option. I think this makes more clear for users who know how a tapis system to hpc system is N:1 relation.

Just SystemName using getSystemName:

combo of getSystemName and id

client/src/components/Applications/AppForm/AppFormUtils.js

client/src/utils/apps.js

server/portal/apps/workspace/api/views.py

client/src/utils/jobsUtil.js

+ get available systems + handle client side scenarios

…rtal into task/WP-80-execution-system

rstijerina · 2024-04-17T18:28:41Z

server/portal/apps/workspace/api/views.py

    data = {'definition': app_def}

-    # GET EXECUTION SYSTEM INFO TO PROCESS SPECIFIC SYSTEM DATA E.G. QUEUE INFORMATION
-    data['exec_sys'] = tapis.systems.getSystem(systemId=app_def.jobAttributes.execSystemId)
+    has_dynamic_exec_system = getattr(app_def.notes, 'dynamicExecSystem', False)


Should we add one more field to notes called dynamicExecSystems, which we could specify if we wanted to further filter the systems we wanted the app to be available on?

Use case scenario: we know a modularized app works on frontera and ls6, but not stampede3 or even a VM, so don't want it to show there.

Yes, made those changes. Was fixing tests, this PR is updated.

rstijerina · 2024-04-17T18:35:12Z

client/src/components/Applications/AppForm/AppForm.jsx

+                              {getSystemName(
+                                getExecSystemFromId(app, exec_system_id)?.host
+                              )}{' '}
+                              : {exec_system_id} :{' '}
+                              {getExecSystemFromId(app, exec_system_id)?.host}


Did we come up with an alternative for

Frontera : frontera : frontera.tacc.utexas.edu

Could show only Frontera, LoneStar6, etc if we know the name, otherwise show systemId and assume user understands host? Or host in a separate, read-only, disabled box?

I made the changes now from the discussion. Below are the changes:

App definition notes will have this note:

"dynamicExecSystems": { "frontera": "Frontera", "ls6": "Lonestar6" }

This will have a map of system id to the label.
To use this, the search in system list will now look for specific ids.

System list is shown even if the size is 1 for dynamic execution list.

The initial list of execution systems is based on the portal alloc (if available). This has an issue though, lets say a portal (like 3DEM) has portaAlloc but no execution system assigned to it, then user will never be able to execute anything. What is fallback here? if no execution system matches portalAlloc should we populate the execution system list with full list? This change will definitely break 3DEM unless 3DEM allocation model is changed.

Allocation is moved right next to System list. Help text is adjusted, will ping design channel for feedback on this.

Just like previously, allocation list is filtered based on what is selected in system list.

When dynamic execution list is setup but there is no allocation available, then the error message is adjusted for allocation.

rstijerina · 2024-04-18T16:40:01Z

server/portal/apps/workspace/api/views.py

    data = {'definition': app_def}

-    # GET EXECUTION SYSTEM INFO TO PROCESS SPECIFIC SYSTEM DATA E.G. QUEUE INFORMATION
-    data['exec_sys'] = tapis.systems.getSystem(systemId=app_def.jobAttributes.execSystemId)
+    exec_systems = getattr(app_def.notes, 'dynamicExecSystems', TapisResult()).__dict__.keys()


Should dynamicExecSystems be optional? or have an "ALL" param? Or do we prefer the explicit dynamic exec systems? We may have discussed before, apologies if so

I think we landed on keeping it optional.

Rather, I mean if dynamicExecSystem is true, and dynamicExecSystems is undefined, should that mean that we do no filtering on the systems listed in the dropdown?

Oh, I see. We wanted to see if one attribute could provide everything.

So, if we choose dynamicExecSystems, then:

dynamicExecSystems: ["All"] - would do no filtering, would pick all systems available to the user.

"dynamicExecSystems": map of systems to system label will limit to these systems only.

I can make that change, it would be useful for some apps and not require provisioning of app if a new system is added.
How would we want to track labels in this case. An example is ls6 - exec systems are defined as ls6. There is no end user friendly label is system definition. The only other way is to maintain a global list of system names to label.
This would fit names of well known system names and their labels. Should use notes field in system definition for label?

chandra-tacc added 3 commits February 1, 2024 16:42

Working version of execution system changes

19c02c3

+ get available systems + handle client side scenarios

Add tests, fix issues found from unit tests

31ae843

Fix job history and also lint

95558cf

chandra-tacc force-pushed the task/WP-80-execution-system branch from 836f65f to 95558cf Compare February 1, 2024 22:45

chandra-tacc added 3 commits February 2, 2024 16:18

Use one attribute for exec systems instead of two

560aedf

Merge branch 'main' into task/WP-80-execution-system

b64317b

Fix formatting

9c3d577

chandra-tacc requested review from rstijerina and nathanfranklin February 13, 2024 16:34

Merge branch 'main' into task/WP-80-execution-system

e91c26b

chandra-tacc marked this pull request as ready for review February 13, 2024 16:36

chandra-tacc changed the title ~~Draft PR Task/WP-80 Implement dynamic execution system selection~~ Task/WP-80 Implement dynamic execution system selection Feb 13, 2024

chandra-tacc and others added 2 commits February 14, 2024 10:20

Sort system list in UI

4c4251b

Merge branch 'main' into task/WP-80-execution-system

460de43

rstijerina reviewed Feb 27, 2024

View reviewed changes

Address code review comments

e4242b4

chandra-tacc commented Feb 28, 2024

View reviewed changes

Adjusted exec system label text and fixed jest tests

3ef9a53

chandra-tacc requested review from shayanaijaz, edmondsgarrett and sophia-massie March 15, 2024 16:49

rstijerina reviewed Mar 19, 2024

View reviewed changes

client/src/utils/jobsUtil.js Outdated Show resolved Hide resolved

chandra-tacc added 7 commits March 21, 2024 10:07

Working version of execution system changes

9014370

+ get available systems + handle client side scenarios

Add tests, fix issues found from unit tests

ffa33a6

Fix job history and also lint

d1146f6

Use one attribute for exec systems instead of two

27e9476

Fix formatting

5bd7a39

Sort system list in UI

e8acb8e

Address code review comments

f5271ab

chandra-tacc added 5 commits March 21, 2024 10:07

Adjusted exec system label text and fixed jest tests

8be2d32

Fix bug related to job history execution system

b2fad67

Merge branch 'task/WP-80-execution-system' of github.com:TACC/Core-Po…

5de01b2

…rtal into task/WP-80-execution-system

Merge fix

f6dcb19

Redo the fix on job history

c31bc64

rstijerina reviewed Apr 17, 2024

View reviewed changes

Rework commit for exec and allocation

ffaf67e

chandra-tacc force-pushed the task/WP-80-execution-system branch from 32a87c9 to ffaf67e Compare April 17, 2024 20:44

chandra-tacc added 2 commits April 17, 2024 16:12

Prettier, lint and test fix

11b5ed9

Merge branch 'main' into task/WP-80-execution-system

ed22ec7

rstijerina reviewed Apr 18, 2024

View reviewed changes

chandra-tacc added 5 commits April 18, 2024 16:22

Merge branch 'main' into task/WP-80-execution-system

6d0bf31

Fix merge issues

e33ccbf

Make exec system dependent on allocation

7a85c9d

Merge branch 'main' into task/WP-80-execution-system

3a2920f

Fix job history display of system name

cd616d5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Task/WP-80 Implement dynamic execution system selection #921

Task/WP-80 Implement dynamic execution system selection #921

chandra-tacc commented Dec 18, 2023 •

edited

codecov bot commented Dec 18, 2023 •

edited

rstijerina left a comment

rstijerina Feb 27, 2024

chandra-tacc Feb 28, 2024

rstijerina Feb 28, 2024

chandra-tacc Mar 1, 2024

rstijerina Mar 19, 2024

chandra-tacc Mar 21, 2024

chandra-tacc left a comment

chandra-tacc Feb 28, 2024

rstijerina Apr 17, 2024

chandra-tacc Apr 17, 2024

rstijerina Apr 17, 2024

chandra-tacc Apr 17, 2024

rstijerina Apr 18, 2024

chandra-tacc Apr 18, 2024

rstijerina Apr 18, 2024

chandra-tacc Apr 18, 2024

Task/WP-80 Implement dynamic execution system selection #921

Are you sure you want to change the base?

Task/WP-80 Implement dynamic execution system selection #921

Conversation

chandra-tacc commented Dec 18, 2023 • edited

Overview

Implementation decisions:

Related

Changes

Backend:

Client:

Testing

UI

Notes

codecov bot commented Dec 18, 2023 • edited

Codecov Report

rstijerina left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chandra-tacc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chandra-tacc commented Dec 18, 2023 •

edited

codecov bot commented Dec 18, 2023 •

edited