Terminate experiment does not work when jobs are pending #7871

berland · 2024-05-13T07:43:23Z

Describe the bug
Terminate experiment does not do anything when jobs are pending.

To reproduce
Steps to reproduce the behaviour:

1. Log in to an onprem or Azure node.
2. Run ERT in GUI mode (I tried fmu-drogon/drogon_design.ert).
3. Optionally choose a queue that is very slow to respond, like short, or for LSF use an outrageous memory requirement like QUEUE_OPTION LSF LSF_RESOURCE rusage[mem=200000], so that jobs stay pending.
4. Run an ensemble experiment.
5. While realizations are still in the "Pending" state, click "Terminate experiment" and then "Yes" in the dialog box.
6. Observe that nothing happens.
7. Observe that if the close button in the upper right corner of the window is clicked, the same dialog box appears, and clicking "Yes" closes the run dialog window. Realizations are still not killed, and the main Ert window is left in a broken state: "Run Experiment" cannot be clicked, and the main window cannot be closed.

Expected behaviour
qdel/bkill commands should be issued and Ert should tear down the ensemble experiment.

Environment

OS: RHEL7
ERT/Komodo release: bleeding as of 2024-05-13
Python version: 3.8
Remote/HPC execution involved: yes

berland · 2024-05-15T08:25:45Z

This bug is a regression from #6811

berland · 2024-05-15T08:40:14Z

The problem is that killing of realizations depends on events being sent, but there are no events while all realizations are pending. The code is stuck in the async for statement in:

https://github.com/equinor/ert/blob/main/src/ert/run_models/base_run_model.py#L502-L532

and never gets to the lines where it would kill realizations.
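The failure mode can be reproduced outside ERT with a small asyncio sketch. This is not the code from base_run_model.py; the names pending_forever, run_naive, run_with_cancel and the "kill" marker are hypothetical and only illustrate the pattern. The first loop checks the cancel flag only when an event arrives, so it hangs while everything is pending. The second races the next event against the cancel signal, which is one possible way to avoid the hang (not necessarily the approach taken in the eventual fix).

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def pending_forever() -> AsyncIterator[str]:
    """Hypothetical event source: produces no events while all realizations are pending."""
    await asyncio.Event().wait()  # this event is never set, so nothing is ever yielded
    yield "unreachable"


async def run_naive(events: AsyncIterator[str], cancel: asyncio.Event, log: List[str]) -> None:
    """Cancel flag is only inspected when an event arrives, so no events means no kill."""
    async for _event in events:
        if cancel.is_set():
            log.append("kill")
            return


async def _next_event(iterator: AsyncIterator[str]) -> str:
    """Wrap __anext__ in a coroutine so it can be scheduled as a task."""
    return await iterator.__anext__()


async def run_with_cancel(events: AsyncIterator[str], cancel: asyncio.Event, log: List[str]) -> None:
    """Race the next event against the cancel signal so cancellation is seen without events."""
    iterator = events.__aiter__()
    cancel_task = asyncio.ensure_future(cancel.wait())
    try:
        while True:
            next_task = asyncio.ensure_future(_next_event(iterator))
            done, _pending = await asyncio.wait(
                {next_task, cancel_task}, return_when=asyncio.FIRST_COMPLETED
            )
            if cancel_task in done:
                # Stop waiting for events and proceed to the kill step.
                next_task.cancel()
                await asyncio.gather(next_task, return_exceptions=True)
                log.append("kill")
                return
            try:
                next_task.result()  # a real loop would handle the event here
            except StopAsyncIteration:
                return
    finally:
        cancel_task.cancel()


async def demo() -> Tuple[bool, List[str]]:
    cancel = asyncio.Event()
    cancel.set()  # the user has already clicked "Terminate experiment"

    naive_log: List[str] = []
    try:
        await asyncio.wait_for(run_naive(pending_forever(), cancel, naive_log), timeout=0.1)
        naive_hung = False
    except asyncio.TimeoutError:
        naive_hung = True  # the loop never reached its cancel check

    fixed_log: List[str] = []
    await asyncio.wait_for(run_with_cancel(pending_forever(), cancel, fixed_log), timeout=1.0)
    return naive_hung, fixed_log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the naive loop times out with nothing logged, while the racing loop records the kill even though the event source stays silent.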

berland added the bug label May 13, 2024

berland self-assigned this May 15, 2024

berland changed the title ~~Terminate experiment does not do anything on Azure~~ Terminate experiment does not work when jobs are pending May 15, 2024

berland mentioned this issue May 16, 2024

Let ERT be able to stop experiment when all realizations are pending #7924

Merged

9 tasks

berland closed this as completed in #7924 May 27, 2024
