Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task/WP-80 Implement dynamic execution system selection #921

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

chandra-tacc
Copy link
Collaborator

@chandra-tacc chandra-tacc commented Dec 18, 2023

Overview

There are exec system agnostic apps, and the Portal Job submission form needs to allow the users to pick an execution system for such apps. This PR implements the feature to provide that.

Implementation decisions:

  • UI Visibility of execution systems:

    • Visibility of execution system is dependent on allocation.
    • By default if a portal has allocation it is selected as default.
    • Next, the list of execution systems is dependent on the what's available in the allocation.

    The alternative is to show all execution systems available to user, which is not going to work well because user will blocked by allocation.

  • Node and Time limits when exec system is flipped:
    When exec system selection is flipped in the form, should the node and time limits match what's in the app defaults or should it keep the user modifications to those defaults.

    • Bring the app defaults:
      Steps:

      1. user selects frontera, the app default of node count:1 is shown
      2. user updates the queue to non-default (development)
      3. node count updates to 40
      4. user select ls6
        After step 4, node count goes back to 1 and default queue for ls6.
    • Keep the current node count on changing exec system (the PR implemented this)
      Steps:

      1. user selects frontera, the app default of node count:1 is shown
      2. user updates the queue to non-default (development)
      3. node count updates to 40
      4. user select ls6, queue updates to ls6 default
        After step 4, node count adjust to ls6 default queue and will not go back to 1.
  • Cannot use tapis job attribute dynamicExecSystem as an indicator of whether app supports dynamic exec system.

    • Tapis doc note:
      Currently dynamic selection of an execution system is not supported. For this reason the job related attribute dynamicExecSystem should be set to false (the default) and execSystemConstraints should not be set.
    • Alternative is to use notes section in the app and add a boolean attribute dynamicExecSystem

Related

Changes

Backend:

  • Dependent picklist

    • Execution System is dependent on allocation selection.
    • Queue and node limits are dependent on Execution System (existing feature, just integrated into execution system selection).
    • Make necessary app form state handling to keep tracking of picklist changes.
  • Add a notes option for each app, to allow a selection of dynamic execution systems. Existance of this attribute indicates that dynamic execution system feature is available. There are two ways to provide dynamic execution systems.

    • A list of systems. This is the list of restricted systems that this app can execute.
    • "All". List of all systems that the user has access to.
    • Handle both "All" vs "specific list". Send the execution systems for the app user when the notes indicates dynamic execution system.

    Example:

  • Specific list of execution systems. Only LoneStar6, Frontera are shown if the user has access to them.

notes: 
...
dynamicExecSystem: ["ls6", "frontera"]
...
  • All available execution systems
notes: 
...
dynamicExecSystem: ["All"]
...
  • Without dynamic execution Systems. This will use execution system in job attribute.
notes: 
label: "namd",
...
  • Adjust exec system attribute - change from exec_sys to array execSystems.

Client:

  • Move allocation drop down to the top of the configuration.
  • Make execution system drop down dependent on the allocation.
  • Setup a drop down for exec system.
  • Handle changes when the exec system selection is updated.
    • Update current component to a generic name.
    • Update queue and node values.
    • Create a state for the form and update it.
  • Allocation changes:
    • Track allocation per exec system.
    • When exec system, trigger the allocation check.
  • Label for execution system
    • System definition now has label. This label is used.
    • Otherwise, system name is used for execution system.

Testing

  1. App with no dynamicExecSystem in notes section:

    • No regression
      • job submission works
      • changing queue will adjust other aspects in the form (node and minutes)
      • default queue is properly selected and form values are correct.
      • Job history page shows right exec system information.
    • Existing automated tests work.
  2. App with dynamic execution system

    • Pre-req:

      • Hello world or similar app with notes added for dynamicExecSystem: ["ls6", "frontera"]
      • Create exec system in tapis v3 (you can use your own user to do it, so the list has some use cases for testing). This will make testing more interesting.
    • Flip between frontera and ls6. Check the following:

      • job submission works
      • changing queue will adjust other aspects in the form (node and minutes)
      • default queue is properly selected and form values are correct.
      • Job history page shows right exec system information.
    • Use an allocation with no available execution systems and see what happens. (I only saw 3DEM based allocation with no hosts).

  3. Test automation was added/updated for the following scenarios:

    • No dynamic exec system enabled - update existing tests for validation
    • App with dynamic stuff
    • all exec systems with allocation

    • some exec systems with no allocation

    • only one exec system with allocation

    • only one exec system with allocation

    • no exec system with allocation

    • 2 exec systems with missing default queue

    • 2 exec systems with different queue names

    • 2 exec systems - queue selection and changing exec system

    • 2 exec systems - modify queue selection and check the corresponding dependencies

    • 2 exec systems - modify queue limits and check if they are set appropriately in the values

    • 2 exec systems - modify queue on new exec system

    Verification:

    • if exec system drop down exists, the values are right
    • the queue values are right
    • the node and rest of the selection value are right

UI

  • Order of drop down is different. Allocation is first and then Systems.
    Screenshot 2024-05-03 at 10 13 08 AM

  • Added drop down to show exec system with labels from system.
    Screenshot 2024-05-03 at 10 13 17 AM

  • Error message when allocation has no execution system.
    Screenshot 2024-05-02 at 10 27 50 PM

  • Job History display of System Name

Screenshot 2024-05-03 at 10 21 20 AM

Notes

Copy link

codecov bot commented Dec 18, 2023

Codecov Report

Attention: Patch coverage is 89.17197% with 17 lines in your changes are missing coverage. Please review.

Project coverage is 65.43%. Comparing base (a02b919) to head (cd616d5).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #921      +/-   ##
==========================================
+ Coverage   64.80%   65.43%   +0.62%     
==========================================
  Files         434      437       +3     
  Lines       12506    12621     +115     
  Branches     2618     2623       +5     
==========================================
+ Hits         8104     8258     +154     
+ Misses       4167     4127      -40     
- Partials      235      236       +1     
Flag Coverage Δ
javascript 70.27% <88.96%> (+0.47%) ⬆️
unittests 60.44% <91.66%> (+0.70%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...ns/AppForm/fixtures/AppForm.allocations.fixture.js 100.00% <ø> (ø)
...plications/AppForm/fixtures/AppForm.app.fixture.js 100.00% <ø> (ø)
...pForm/fixtures/AppForm.executionsystems.fixture.js 100.00% <100.00%> (ø)
client/src/redux/reducers/apps.reducers.js 64.70% <ø> (ø)
...ient/src/redux/sagas/fixtures/appdetail.fixture.js 100.00% <ø> (ø)
...src/redux/sagas/fixtures/appdetailSlurm.fixture.js 100.00% <ø> (ø)
...lient/src/redux/sagas/fixtures/compress.fixture.js 100.00% <ø> (ø)
...c/redux/sagas/fixtures/executionsystems.fixture.js 100.00% <100.00%> (ø)
client/src/redux/sagas/fixtures/extract.fixture.js 100.00% <ø> (ø)
client/src/utils/jobsUtil.js 94.36% <100.00%> (+0.16%) ⬆️
... and 5 more

... and 1 file with indirect coverage changes

@chandra-tacc chandra-tacc marked this pull request as ready for review February 13, 2024 16:36
@chandra-tacc chandra-tacc changed the title Draft PR Task/WP-80 Implement dynamic execution system selection Task/WP-80 Implement dynamic execution system selection Feb 13, 2024
Copy link
Member

@rstijerina rstijerina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool. A few questions

.sort()
.map((exec_system_id) => (
<option key={exec_system_id} value={exec_system_id}>
{exec_system_id}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on whether we should use getSystemName here for display? Or a combo of getSystemName and id?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, like the idea. I tried both (screenshots below). Based on that, I picked the combo option. I think this makes more clear for users who know how a tapis system to hpc system is N:1 relation.

  1. Just SystemName using getSystemName:
    Screenshot 2024-02-28 at 10 21 04 AM

  2. combo of getSystemName and id
    Screenshot 2024-02-28 at 10 24 30 AM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Should we use a semicolon instead of a hyphen to separate the name from id? Might reduce some confusion since colons are not allowed in system ids. i.e. Frontera: frontera1-cyemparala instead of Frontera - frontera1-cyemparala

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, did that I now switched to :.
Also, while doing this found that I mistakenly skipped jest unit tests, fixed that. And added coverage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a third axis of info that may be useful: the system's host. Should we present this to #cep-design for feedback?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will sync up with cep-design

server/portal/apps/workspace/api/views.py Outdated Show resolved Hide resolved
client/src/utils/apps.js Outdated Show resolved Hide resolved
client/src/components/Applications/AppForm/AppFormUtils.js Outdated Show resolved Hide resolved
client/src/components/Applications/AppForm/AppForm.jsx Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@chandra-tacc chandra-tacc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing, addressed review comments in this new commit.

client/src/components/Applications/AppForm/AppForm.jsx Outdated Show resolved Hide resolved
.sort()
.map((exec_system_id) => (
<option key={exec_system_id} value={exec_system_id}>
{exec_system_id}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, like the idea. I tried both (screenshots below). Based on that, I picked the combo option. I think this makes more clear for users who know how a tapis system to hpc system is N:1 relation.

  1. Just SystemName using getSystemName:
    Screenshot 2024-02-28 at 10 21 04 AM

  2. combo of getSystemName and id
    Screenshot 2024-02-28 at 10 24 30 AM

client/src/components/Applications/AppForm/AppFormUtils.js Outdated Show resolved Hide resolved
client/src/utils/apps.js Outdated Show resolved Hide resolved
server/portal/apps/workspace/api/views.py Outdated Show resolved Hide resolved
data = {'definition': app_def}

# GET EXECUTION SYSTEM INFO TO PROCESS SPECIFIC SYSTEM DATA E.G. QUEUE INFORMATION
data['exec_sys'] = tapis.systems.getSystem(systemId=app_def.jobAttributes.execSystemId)
has_dynamic_exec_system = getattr(app_def.notes, 'dynamicExecSystem', False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add one more field to notes called dynamicExecSystems, which we could specify if we wanted to further filter the systems we wanted the app to be available on?

Use case scenario: we know a modularized app works on frontera and ls6, but not stampede3 or even a VM, so don't want it to show there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, made those changes. Was fixing tests, this PR is updated.

Comment on lines 752 to 756
{getSystemName(
getExecSystemFromId(app, exec_system_id)?.host
)}{' '}
: {exec_system_id} :{' '}
{getExecSystemFromId(app, exec_system_id)?.host}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we come up with an alternative for

Frontera : frontera : frontera.tacc.utexas.edu
  • Could show only Frontera, LoneStar6, etc if we know the name, otherwise show systemId and assume user understands host? Or host in a separate, read-only, disabled box?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the changes now from the discussion. Below are the changes:

  • App definition notes will have this note:
        "dynamicExecSystems": {
                "frontera": "Frontera",
                "ls6": "Lonestar6"
        }

This will have a map of system id to the label.
To use this, the search in system list will now look for specific ids.

  • System list is shown even if the size is 1 for dynamic execution list.

  • The initial list of execution systems is based on the portal alloc (if available). This has an issue though, lets say a portal (like 3DEM) has portaAlloc but no execution system assigned to it, then user will never be able to execute anything. What is fallback here? if no execution system matches portalAlloc should we populate the execution system list with full list? This change will definitely break 3DEM unless 3DEM allocation model is changed.

  • Allocation is moved right next to System list. Help text is adjusted, will ping design channel for feedback on this.

  • Just like previously, allocation list is filtered based on what is selected in system list.

  • When dynamic execution list is setup but there is no allocation available, then the error message is adjusted for allocation.

data = {'definition': app_def}

# GET EXECUTION SYSTEM INFO TO PROCESS SPECIFIC SYSTEM DATA E.G. QUEUE INFORMATION
data['exec_sys'] = tapis.systems.getSystem(systemId=app_def.jobAttributes.execSystemId)
exec_systems = getattr(app_def.notes, 'dynamicExecSystems', TapisResult()).__dict__.keys()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should dynamicExecSystems be optional? or have an "ALL" param? Or do we prefer the explicit dynamic exec systems? We may have discussed before, apologies if so

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we landed on keeping it optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather, I mean if dynamicExecSystem is true, and dynamicExecSystems is undefined, should that mean that we do no filtering on the systems listed in the dropdown?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. We wanted to see if one attribute could provide everything.

So, if we choose dynamicExecSystems, then:

  • dynamicExecSystems: ["All"] - would do no filtering, would pick all systems available to the user.
  • "dynamicExecSystems": map of systems to system label will limit to these systems only.

I can make that change, it would be useful for some apps and not require provisioning of app if a new system is added.
How would we want to track labels in this case. An example is ls6 - exec systems are defined as ls6. There is no end user friendly label is system definition. The only other way is to maintain a global list of system names to label.
This would fit names of well known system names and their labels. Should use notes field in system definition for label?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants