Let ERT be able to stop experiment when all realizations are pending #7924

berland · 2024-05-16T14:02:26Z

This allows base_run_model to also check regularly whether there are other tasks at hand, such as terminating the experiment.

Issue
Resolves #7871

Approach
Add heartbeat to monitor.track()

Add a test to verify that monitor.track() can emit None events, i.e. "heartbeats". A better test would check that BaseRunmodel.run_monitor() can exit at any time, but that looks like a big task. If run_monitor() is changed to no longer request heartbeats, the original bug would reappear and this test would not catch it.
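For readers unfamiliar with the pattern, here is a minimal sketch of a tracker that emits `None` heartbeats, together with a consumer that uses them to check for cancellation. It is not the ERT implementation: the `track` and `run_monitor` functions below, the `CLOSE` sentinel, and the use of `queue.Queue` are all assumptions made for illustration.

```python
import queue
import threading
from typing import Iterator, List, Optional

# Hypothetical sentinel standing in for a "close tracker" event.
CLOSE = object()


def track(
    events: "queue.Queue[object]", heartbeat_interval: Optional[float] = None
) -> Iterator[Optional[object]]:
    """Yield events from the queue; yield None whenever no event arrives
    within heartbeat_interval seconds (only if an interval is set)."""
    while True:
        try:
            # timeout=None blocks indefinitely, so no heartbeats are emitted.
            event = events.get(timeout=heartbeat_interval)
        except queue.Empty:
            yield None  # heartbeat: hand control back to the consumer
            continue
        if event is CLOSE:
            return
        yield event


def run_monitor(events: "queue.Queue[object]", cancel: threading.Event) -> List[object]:
    """Consume events, using each heartbeat to check for cancellation."""
    seen: List[object] = []
    for event in track(events, heartbeat_interval=0.01):
        if event is None:
            if cancel.is_set():
                break  # no events are arriving, but we can still stop
            continue
        seen.append(event)
    return seen
```

Without the heartbeat, a consumer blocked on an empty queue (for example when all realizations are pending) would never get a chance to look at the cancel flag.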

PR title captures the intent of the changes, and is fitting for release notes.
Added appropriate release note label
Commit history is consistent and clean, in line with the contribution guidelines.
Make sure tests pass locally (after every commit!)

When applicable

When there are user facing changes: Updated documentation
New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
Large PR: Prepare changes in small commits for more convenient review
Bug fix: Add regression test for the bug
Bug fix: Create Backport PR to latest release

berland · 2024-05-16T14:20:49Z

~~todo: write test triggering the original bug.~~ kind-of-done

codecov-commenter · 2024-05-22T13:11:14Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.86%. Comparing base (b63ebb1) to head (7d46284).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7924      +/-   ##
==========================================
+ Coverage   85.82%   85.86%   +0.04%     
==========================================
  Files         378      378              
  Lines       23069    23074       +5     
  Branches      636      625      -11     
==========================================
+ Hits        19798    19813      +15     
+ Misses       3198     3180      -18     
- Partials       73       81       +8

xjules · 2024-05-23T08:22:45Z

src/ert/ensemble_evaluator/monitor.py

            if isinstance(event, CloseTrackerEvent):
-                timeout = self._receiver_timeout
+                closetracker_received = True
+                _heartbeat_interval = self._receiver_timeout


Just wondering if we need the receiver_timeout at all when having the heartbeat in the first place.

The heartbeat_interval is None by default, so there is not really a heartbeat in place unless you ask for it.

The receiver timeout is 60 seconds, while a heartbeat is very short. I thought that this could risk logging the error "Evaluator did not send the TERMINATED event" too often, and at times when it should not.

There is only one consumer of the track() function apart from tests, so the heartbeat could be there by default. It would require modifications in about 15 tests to ignore the None event, but that is doable.

I see. Then it makes sense to keep it - maybe. Nevertheless the None is there only for the sake of tests. It very much depends how trivial this modification of tests would be.

Let's keep it for now. I'll create an issue if it is doable to remove it.
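To make the distinction in this thread concrete, here is a hedged sketch of a tracker that keeps the two timeouts separate: a short, optional heartbeat interval before the close event, and a longer receiver timeout afterwards. It only mirrors the idea visible in the diff above. The `CLOSE` and `TERMINATED` string constants, the function signature, and the queue-based transport are assumptions, not the actual Monitor code.

```python
import logging
import queue
from typing import Iterator, Optional

logger = logging.getLogger(__name__)

CLOSE = "close"  # hypothetical stand-in for CloseTrackerEvent
TERMINATED = "terminated"  # hypothetical stand-in for the final event


def track(
    events: "queue.Queue[str]",
    heartbeat_interval: Optional[float] = None,
    receiver_timeout: float = 60.0,
) -> Iterator[Optional[str]]:
    interval = heartbeat_interval
    close_received = False
    while True:
        try:
            event = events.get(timeout=interval)
        except queue.Empty:
            if close_received:
                # Waited the full receiver timeout after the close request.
                logger.error("Evaluator did not send the TERMINATED event")
                return
            yield None  # plain heartbeat, not an error
            continue
        if event == CLOSE:
            # From here on, wait with the longer receiver timeout instead.
            close_received = True
            interval = receiver_timeout
            continue
        yield event
        if event == TERMINATED:
            return
```

In this sketch a heartbeat timeout never logs an error. Only a timeout after the close request does, which is the reason given above for not folding the receiver timeout into the heartbeat.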

xjules

Nice job @berland ! 🚀

xjules · 2024-05-27T14:12:15Z

Relates to #7993

berland added the release-notes:skip label May 16, 2024

Let monitor emit optional heartbeats

7d46284

This allows base_run_model to also check regularly whether there are other tasks at hand, such as terminating the experiment.

berland force-pushed the monitor_with_heartbeat branch from baf9127 to 7d46284 May 22, 2024 12:49

berland self-assigned this May 22, 2024

berland added the release-notes:bug-fix label and removed the release-notes:skip label May 22, 2024

berland changed the title ~~Let monitor emit optional heartbeats~~ Let ERT be able to stop experiment when all realizations are pending May 22, 2024

xjules reviewed May 23, 2024

xjules approved these changes May 27, 2024

xjules mentioned this pull request May 27, 2024

Consider to remove self._receiver_timeout from Monitor #7993

Open

berland merged commit 0a55f0a into equinor:main May 27, 2024
38 checks passed

berland deleted the monitor_with_heartbeat branch June 6, 2024 11:28
