Killing LSF jobs while submission is in progress can leave the job running on the cluster. #7907

Open
berland opened this issue May 15, 2024 · 0 comments · May be fixed by #8019

@berland
Contributor

berland commented May 15, 2024

If an ensemble experiment is started and 'Terminate experiment' is clicked while realizations are still being submitted, the "Error" logged by the following code in the LSF driver is displayed to the user:

if iens not in self._iens2jobid:
    logger.error(f"LSF kill failed due to missing jobid for realization {iens}")
    return


This means that killing the realization failed: no job id is known for it yet, so no kill request is sent to the cluster. When the submit() function eventually finishes, the realization is likely to be running even though it should have been killed.
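One possible way to close this race, as a minimal sketch only: remember a kill request that arrives while submission is still in progress, and act on it as soon as the job id is known. Everything here except `_iens2jobid` and the log message is hypothetical (`SketchDriver`, `_submitting`, `_kill_requested`, `_bsub`, `_bkill`, `demo`); it does not show the real driver code or what the linked pull request does.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class SketchDriver:
    """Hypothetical stand-in for a queue driver, not the real LSF driver."""

    def __init__(self) -> None:
        self._iens2jobid: dict[int, str] = {}
        self._submitting: set[int] = set()      # realizations with submit() in flight
        self._kill_requested: set[int] = set()  # kills that arrived too early
        self.killed: list[str] = []             # job ids we asked the cluster to kill

    async def _bsub(self, iens: int) -> str:
        # Assumption: simulates a slow submission that returns a job id.
        await asyncio.sleep(0.01)
        return f"job-{iens}"

    async def _bkill(self, job_id: str) -> None:
        # Assumption: simulates asking the cluster to kill the job.
        self.killed.append(job_id)

    async def submit(self, iens: int) -> None:
        self._submitting.add(iens)
        try:
            job_id = await self._bsub(iens)
        finally:
            self._submitting.discard(iens)
        self._iens2jobid[iens] = job_id
        # Honour a kill that was requested while we were still submitting.
        if iens in self._kill_requested:
            self._kill_requested.discard(iens)
            await self._bkill(job_id)

    async def kill(self, iens: int) -> None:
        if iens in self._submitting:
            # No job id yet: defer the kill instead of giving up.
            self._kill_requested.add(iens)
            return
        if iens not in self._iens2jobid:
            logger.error(f"LSF kill failed due to missing jobid for realization {iens}")
            return
        await self._bkill(self._iens2jobid[iens])


async def demo() -> list[str]:
    driver = SketchDriver()
    submit_task = asyncio.create_task(driver.submit(0))
    await asyncio.sleep(0)  # let submit() start and block in _bsub()
    await driver.kill(0)    # arrives while submission is in progress
    await submit_task
    return driver.killed
```

Running `asyncio.run(demo())` exercises the scenario from this issue: the kill arrives before the job id exists, and the job is killed once `submit()` has obtained it instead of being left running.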

@berland berland added the bug label May 15, 2024
@jonathan-eq jonathan-eq self-assigned this May 22, 2024
@berland berland self-assigned this May 28, 2024
@berland berland changed the title from "Error emitted from LSF driver for realization in waiting state" to "Killing LSF jobs while submission is in progress can leave the job running on the cluster." May 29, 2024
@berland berland linked a pull request May 29, 2024 that will close this issue
9 tasks
@jonathan-eq jonathan-eq removed their assignment Jun 3, 2024
Status: Ready for Review
2 participants