Terminate experiment does not work when jobs are pending #7871

Closed
berland opened this issue May 13, 2024 · 2 comments · Fixed by #7924

berland commented May 13, 2024

Describe the bug
Terminate experiment does not do anything when jobs are pending.

To reproduce
Steps to reproduce the behaviour:

  1. Log in to an onprem or Azure node.
  2. Run ERT in gui mode (I tried fmu-drogon/drogon_design.ert)
  3. Optionally choose a queue that is very slow to start jobs, like short, or for LSF request an outrageous amount of memory, e.g. QUEUE_OPTION LSF LSF_RESOURCE rusage[mem=200000], so that realizations stay pending
  4. Run an ensemble experiment
  5. While realizations are still in the "Pending" state, click "Terminate experiment" and then "Yes" in the dialog box
  6. Observe that nothing happens.
  7. Observe that clicking the close button in the upper right corner of the window brings up the same dialog box, and that clicking "Yes" there closes the run dialog window. The realizations are still not killed, and the main Ert window is left in a broken state: "Run Experiment" cannot be clicked, and the main window cannot be closed.

Expected behaviour
qdel/bkill commands should be issued and Ert should tear down the ensemble experiment.

Environment

  • OS: RHEL7
  • ERT/Komodo release: bleeding as of 2024-05-13
  • Python version: 3.8
  • Remote/HPC execution involved: yes
@berland berland added the bug label May 13, 2024
@berland berland self-assigned this May 15, 2024
@berland berland changed the title from "Terminate experiment does not do anything on Azure" to "Terminate experiment does not work when jobs are pending" May 15, 2024

berland commented May 15, 2024

This bug is a regression introduced by #6811.


berland commented May 15, 2024

The problem is that killing realizations depends on events being received, but no events are sent while all realizations are pending. The code is stuck waiting in the async for statement in:

https://github.com/equinor/ert/blob/main/src/ert/run_models/base_run_model.py#L502-L532

and never gets to the lines where it would kill realizations.
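The failure mode described above can be reproduced outside Ert with a small asyncio sketch. This is not Ert's code and not necessarily how #7924 fixed it; the names `events`, `stuck_monitor`, `responsive_monitor` and `demo` are hypothetical, and an `asyncio.Queue` that never receives anything stands in for the event stream while all realizations are pending. The first monitor only checks the cancel flag inside the loop body, so it never sees it; the second one waits on the next event and the cancel request at the same time.

```python
import asyncio
from typing import AsyncIterator, List


async def events(queue: asyncio.Queue) -> AsyncIterator[str]:
    """Hypothetical event stream: blocks for as long as the queue is empty."""
    while True:
        yield await queue.get()


async def stuck_monitor(queue: asyncio.Queue, cancel: asyncio.Event) -> str:
    # The cancel flag is only inspected when an event arrives. With all
    # realizations pending there are no events, so this line is never reached.
    async for _event in events(queue):
        if cancel.is_set():
            return "killed"
    return "finished"


async def responsive_monitor(queue: asyncio.Queue, cancel: asyncio.Event) -> str:
    # Wait for "next event" and "cancel requested" simultaneously, so a
    # cancel request is noticed even when the event stream is silent.
    cancel_task = asyncio.ensure_future(cancel.wait())
    while True:
        event_task = asyncio.ensure_future(queue.get())
        done, _pending = await asyncio.wait(
            {event_task, cancel_task}, return_when=asyncio.FIRST_COMPLETED
        )
        if cancel_task in done:
            event_task.cancel()
            return "killed"  # this is where qdel/bkill would be triggered
        # Otherwise event_task.result() holds an event to process.


async def demo() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()  # stays empty: everything is pending
    cancel = asyncio.Event()
    cancel.set()  # the user clicked "Terminate experiment"
    results = []
    try:
        results.append(
            await asyncio.wait_for(stuck_monitor(queue, cancel), timeout=0.2)
        )
    except asyncio.TimeoutError:
        results.append("stuck")
    results.append(
        await asyncio.wait_for(responsive_monitor(queue, cancel), timeout=0.2)
    )
    return results
```

Running `asyncio.run(demo())` shows the first monitor timing out while the second reacts to the cancel request immediately. A periodic timeout on the event wait would be another way to get the same effect.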
