Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task/WP-80 Implement dynamic execution system selection #921

Open
wants to merge 34 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
19c02c3
Working version of execution system changes
chandra-tacc Nov 17, 2023
31ae843
Add tests, fix issues found from unit tests
chandra-tacc Dec 17, 2023
95558cf
Fix job history and also lint
chandra-tacc Dec 18, 2023
560aedf
Use one attribute for exec systems instead of two
chandra-tacc Feb 2, 2024
b64317b
Merge branch 'main' into task/WP-80-execution-system
chandra-tacc Feb 10, 2024
9c3d577
Fix formatting
chandra-tacc Feb 10, 2024
e91c26b
Merge branch 'main' into task/WP-80-execution-system
chandra-tacc Feb 13, 2024
4c4251b
Sort system list in UI
chandra-tacc Feb 14, 2024
460de43
Merge branch 'main' into task/WP-80-execution-system
rstijerina Feb 27, 2024
e4242b4
Address code review comments
chandra-tacc Feb 28, 2024
3ef9a53
Adjusted exec system label text and fixed jest tests
chandra-tacc Mar 1, 2024
9014370
Working version of execution system changes
chandra-tacc Nov 17, 2023
ffa33a6
Add tests, fix issues found from unit tests
chandra-tacc Dec 17, 2023
d1146f6
Fix job history and also lint
chandra-tacc Dec 18, 2023
27e9476
Use one attribute for exec systems instead of two
chandra-tacc Feb 2, 2024
5bd7a39
Fix formatting
chandra-tacc Feb 10, 2024
e8acb8e
Sort system list in UI
chandra-tacc Feb 14, 2024
f5271ab
Address code review comments
chandra-tacc Feb 28, 2024
8be2d32
Adjusted exec system label text and fixed jest tests
chandra-tacc Mar 1, 2024
b2fad67
Fix bug related to job history execution system
chandra-tacc Mar 21, 2024
5de01b2
Merge branch 'task/WP-80-execution-system' of github.com:TACC/Core-Po…
chandra-tacc Mar 21, 2024
f6dcb19
Merge fix
chandra-tacc Mar 21, 2024
c31bc64
Redo the fix on job history
chandra-tacc Mar 21, 2024
ffaf67e
Rework commit for exec and allocation
chandra-tacc Apr 4, 2024
11b5ed9
Prettier, lint and test fix
chandra-tacc Apr 17, 2024
ed22ec7
Merge branch 'main' into task/WP-80-execution-system
chandra-tacc Apr 17, 2024
6d0bf31
Merge branch 'main' into task/WP-80-execution-system
chandra-tacc Apr 18, 2024
e33ccbf
Fix merge issues
chandra-tacc Apr 18, 2024
7a85c9d
Make exec system dependent on allocation
chandra-tacc Apr 30, 2024
3a2920f
Merge branch 'main' into task/WP-80-execution-system
chandra-tacc May 3, 2024
cd616d5
Fix job history display of system name
chandra-tacc May 3, 2024
ce7a917
Merge branch 'main' into task/WP-80-execution-system
chandra-tacc May 16, 2024
bfd9869
Handle max memory on a system
chandra-tacc May 17, 2024
0e30fe4
Merge branch 'task/WP-80-execution-system' of github.com:TACC/Core-Po…
chandra-tacc May 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
196 changes: 122 additions & 74 deletions client/src/components/Applications/AppForm/AppForm.jsx
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,13 @@ import {
getMaxMinutesValidation,
getNodeCountValidation,
getCoresPerNodeValidation,
getTargetPathFieldName,
updateValuesForQueue,
getQueueValueForExecSystem,
getAppQueueValues,
matchExecSysWithAllocations,
} from './AppFormUtils';
import { getExecSystemFromId, getDefaultExecSystem } from 'utils/apps';

import DataFilesSelectModal from '../../DataFiles/DataFilesModals/DataFilesSelectModal';
import * as ROUTES from '../../../constants/routes';

Expand All @@ -47,11 +51,13 @@ const appShape = PropTypes.shape({
}),
systemNeedsKeys: PropTypes.bool,
pushKeysSystem: PropTypes.shape({}),
exec_sys: PropTypes.shape({
host: PropTypes.string,
scheduler: PropTypes.string,
batchLogicalQueues: PropTypes.arrayOf(PropTypes.shape({})),
}),
execSystems: PropTypes.arrayOf(
PropTypes.shape({
host: PropTypes.string,
scheduler: PropTypes.string,
batchLogicalQueues: PropTypes.arrayOf(PropTypes.shape({})),
})
),
license: PropTypes.shape({
type: PropTypes.string,
enabled: PropTypes.bool,
Expand Down Expand Up @@ -127,25 +133,46 @@ export const AppDetail = () => {
};

/**
* AdjustValuesWhenQueueChanges is a component that makes uses of
* HandleDependentFieldChanges is a component that makes uses of
* useFormikContext to ensure that when users switch queues, some
* variables are updated to match the queue specifications (i.e.
* correct node count, runtime etc)
*/
const AdjustValuesWhenQueueChanges = ({ app }) => {
const HandleDependentFieldChanges = ({ app, updateFormState }) => {
const [previousValues, setPreviousValues] = useState();

// Grab values and update if queue changes
const { values, setValues } = useFormikContext();
React.useEffect(() => {
if (
previousValues &&
previousValues.execSystemLogicalQueue !== values.execSystemLogicalQueue
) {
setValues(updateValuesForQueue(app, values));
if (previousValues) {
let valueUpdated = false;
let updatedValues = { ...values };
if (previousValues.execSystemId !== values.execSystemId) {
const exec_sys = getExecSystemFromId(app, values.execSystemId);
updatedValues.execSystemLogicalQueue = getQueueValueForExecSystem(
app,
exec_sys
).name;
updatedValues = updateValuesForQueue(app, updatedValues);
valueUpdated = true;

updateFormState.setExecSys(exec_sys);
updateFormState.setAppQueueValues(
getAppQueueValues(app, exec_sys.batchLogicalQueues)
);
updateFormState.setAllocationsForExecSys(exec_sys);
}

if (
previousValues.execSystemLogicalQueue !== values.execSystemLogicalQueue
) {
updatedValues = updateValuesForQueue(app, values);
valueUpdated = true;
}
if (valueUpdated) setValues(updatedValues);
}
setPreviousValues(values);
}, [app, values, setValues]);
}, [app, values, setValues, updateFormState]);
return null;
};

Expand Down Expand Up @@ -194,7 +221,7 @@ export const AppSchemaForm = ({ app }) => {
dispatch({ type: 'GET_SYSTEM_MONITOR' });
}, [dispatch]);
const {
allocations,
execSystemAllocationsMap,
portalAlloc,
jobSubmission,
hasDefaultAllocation,
Expand All @@ -204,11 +231,13 @@ export const AppSchemaForm = ({ app }) => {
execSystem,
defaultSystem,
keyService,
execSystemsWithAllocation,
} = useSelector((state) => {
const matchingExecutionHost = Object.keys(state.allocations.hosts).find(
(host) =>
app.exec_sys.host === host || app.exec_sys.host.endsWith(`.${host}`)
const matchingExecutionHostsMap = matchExecSysWithAllocations(
app,
state.allocations
);
const execSystemsWithAllocation = [...matchingExecutionHostsMap.keys()];
const { defaultHost, configuration, defaultSystem } = state.systems.storage;

const keyService = state.systems.storage.configuration.find(
Expand All @@ -221,9 +250,7 @@ export const AppSchemaForm = ({ app }) => {
defaultHost?.endsWith(s)
);
return {
allocations: matchingExecutionHost
? state.allocations.hosts[matchingExecutionHost]
: [],
execSystemAllocationsMap: matchingExecutionHostsMap,
portalAlloc: state.allocations.portal_alloc,
jobSubmission: state.jobs.submit,
hasDefaultAllocation:
Expand All @@ -238,9 +265,10 @@ export const AppSchemaForm = ({ app }) => {
.filter((currSystem) => !currSystem.is_operational)
.map((downSys) => downSys.hostname)
: [],
execSystem: state.app ? state.app.exec_sys.host : '',
execSystem: getDefaultExecSystem(app) ?? '',
defaultSystem,
keyService,
execSystemsWithAllocation,
};
}, shallowEqual);
const hideManageAccount = useSelector(
Expand All @@ -264,6 +292,29 @@ export const AppSchemaForm = ({ app }) => {
};

const appFields = FormSchema(app);
const [currentValues, setCurrentValues] = useState({
execSys: execSystem,
allocations: execSystemAllocationsMap.get(execSystem.id) ?? [],
appQueueValues: getAppQueueValues(app, execSystem.batchLogicalQueues),
});

const updateFormState = {
setExecSys: (newValue) => {
setCurrentValues((prevState) => ({ ...prevState, execSys: newValue }));
},
setAllocationsForExecSys: (execSys) => {
setCurrentValues((prevState) => ({
...prevState,
allocations: execSystemAllocationsMap.get(execSys?.id) ?? [],
}));
},
setAppQueueValues: (newValue) => {
setCurrentValues((prevState) => ({
...prevState,
appQueueValues: newValue,
}));
},
};

// initial form values
const initialValues = {
Expand All @@ -285,20 +336,17 @@ export const AppSchemaForm = ({ app }) => {

let missingAllocation = false;
if (app.definition.jobType === 'BATCH') {
initialValues.execSystemLogicalQueue = (
(app.definition.jobAttributes.execSystemLogicalQueue
? app.exec_sys.batchLogicalQueues.find(
(q) =>
q.name === app.definition.jobAttributes.execSystemLogicalQueue
)
: app.exec_sys.batchLogicalQueues.find(
(q) => q.name === app.exec_sys.batchDefaultLogicalQueue
)) || app.exec_sys.batchLogicalQueues[0]
initialValues.execSystemLogicalQueue = getQueueValueForExecSystem(
app,
getExecSystemFromId(app, app.definition.jobAttributes.execSystemId)
).name;
if (allocations.includes(portalAlloc)) {
if (currentValues.allocations.includes(portalAlloc)) {
initialValues.allocation = portalAlloc;
} else {
initialValues.allocation = allocations.length === 1 ? allocations[0] : '';
initialValues.allocation =
currentValues.allocations.length === 1
? currentValues.allocations[0]
: '';
}
if (!hasDefaultAllocation && hasStorageSystems) {
jobSubmission.error = true;
Expand All @@ -308,11 +356,11 @@ export const AppSchemaForm = ({ app }) => {
)} to run this application.`,
};
missingAllocation = true;
} else if (!allocations.length) {
} else if (!currentValues.allocations.length) {
jobSubmission.error = true;
jobSubmission.response = {
message: `You need an allocation on ${getSystemName(
app.exec_sys.host
currentValues.execSys.host
)} to run this application.`,
};
missingAllocation = true;
Expand Down Expand Up @@ -437,8 +485,11 @@ export const AppSchemaForm = ({ app }) => {
return Yup.mixed().notRequired();
}
return Yup.lazy((values) => {
const queue = app.exec_sys.batchLogicalQueues.find(
(q) => q.name === values.execSystemLogicalQueue
const exec_sys = getExecSystemFromId(app, values.execSystemId);
const queue = getQueueValueForExecSystem(
app,
exec_sys,
values.execSystemLogicalQueue
);
const schema = Yup.object({
parameterSet: Yup.object({
Expand All @@ -458,7 +509,7 @@ export const AppSchemaForm = ({ app }) => {
.required('Required'),
execSystemLogicalQueue: Yup.string()
.required('Required')
.oneOf(app.exec_sys.batchLogicalQueues.map((q) => q.name)),
.oneOf(exec_sys.batchLogicalQueues.map((q) => q.name)),
nodeCount: getNodeCountValidation(queue, app),
coresPerNode: getCoresPerNodeValidation(queue),
maxMinutes: getMaxMinutesValidation(queue).required('Required'),
Expand All @@ -467,7 +518,7 @@ export const AppSchemaForm = ({ app }) => {
allocation: Yup.string()
.required('Required')
.oneOf(
allocations,
currentValues.allocations,
'Please select an allocation from the dropdown.'
),
});
Expand Down Expand Up @@ -604,7 +655,10 @@ export const AppSchemaForm = ({ app }) => {
(app.definition.jobType === 'BATCH' && missingAllocation);
return (
<Form>
<AdjustValuesWhenQueueChanges app={app} />
<HandleDependentFieldChanges
app={app}
updateFormState={updateFormState}
/>
<FormGroup tag="fieldset" disabled={readOnly || systemNeedsKeys}>
{Object.keys(appFields.fileInputs).length > 0 && (
<div className="appSchema-section">
Expand Down Expand Up @@ -682,52 +736,46 @@ export const AppSchemaForm = ({ app }) => {
<div className="appSchema-header">
<span>Configuration</span>
</div>
{app.execSystems &&
Object.keys(execSystemsWithAllocation).length > 1 && (
<FormField
label="Execution System"
name="execSystemId"
description="Select the system this job will execute on."
type="select"
required
>
{execSystemsWithAllocation
.sort()
.map((exec_system_id) => (
<option key={exec_system_id} value={exec_system_id}>
{exec_system_id}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on whether we should use getSystemName here for display? Or a combo of getSystemName and id?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, like the idea. I tried both (screenshots below). Based on that, I picked the combo option. I think this makes more clear for users who know how a tapis system to hpc system is N:1 relation.

  1. Just SystemName using getSystemName:
    Screenshot 2024-02-28 at 10 21 04 AM

  2. combo of getSystemName and id
    Screenshot 2024-02-28 at 10 24 30 AM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Should we use a semicolon instead of a hyphen to separate the name from id? Might reduce some confusion since colons are not allowed in system ids. i.e. Frontera: frontera1-cyemparala instead of Frontera - frontera1-cyemparala

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, did that I now switched to :.
Also, while doing this found that I mistakenly skipped jest unit tests, fixed that. And added coverage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a third axis of info that may be useful: the system's host. Should we present this to #cep-design for feedback?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will sync up with cep-design

</option>
))}
</FormField>
)}
{app.definition.jobType === 'BATCH' && (
// TODO: Add option for exec system. form field.
//
rstijerina marked this conversation as resolved.
Show resolved Hide resolved

<FormField
label="Queue"
name="execSystemLogicalQueue"
description="Select the queue this job will execute on."
type="select"
required
>
{app.exec_sys.batchLogicalQueues
/*
Hide queues for which the app default nodeCount does not meet the minimum or maximum requirements
while hideNodeCountAndCoresPerNode is true
*/
.filter(
(q) =>
!app.definition.notes
.hideNodeCountAndCoresPerNode ||
(app.definition.jobAttributes.nodeCount >=
q.minNodeCount &&
app.definition.jobAttributes.nodeCount <=
q.maxNodeCount)
)
.map((q) => q.name)
.sort()
.map((queueName) =>
app.definition.notes.queueFilter ? (
app.definition.notes.queueFilter.includes(
queueName
) && (
<option key={queueName} value={queueName}>
{queueName}
</option>
)
) : (
<option key={queueName} value={queueName}>
{queueName}
</option>
)
)
.sort()}
{currentValues.appQueueValues.map((queueName) => (
<option key={queueName} value={queueName}>
{queueName}
</option>
))}
</FormField>
)}
<FormField
label="Maximum Job Runtime (minutes)"
description={`The maximum number of minutes you expect this job to run for. Maximum possible is ${getQueueMaxMinutes(
app,
getExecSystemFromId(app, values.execSystemId),
values.execSystemLogicalQueue
)} minutes. After this amount of time your job will end. Shorter run times result in shorter queue wait times.`}
name="maxMinutes"
Expand Down Expand Up @@ -761,7 +809,7 @@ export const AppSchemaForm = ({ app }) => {
<option hidden disabled>
{' '}
</option>
{allocations.sort().map((projectId) => (
{currentValues.allocations.sort().map((projectId) => (
<option key={projectId} value={projectId}>
{projectId}
</option>
Expand Down