Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let ERT be able to stop experiment when all realizations are pending #7924

Merged
merged 1 commit into from
May 27, 2024

Conversation

berland
Copy link
Contributor

@berland berland commented May 16, 2024

This allows the base_run_model to also being able to check regularly if there are other tasks at hand, like terminating the experiment.

Issue
Resolves #7871

Approach
Add heartbeat to monitor.track()

Add a test to certify that monitor.track() is able to emit None events, aka "heartbeats". A better test would be to test that BaseRunmodel.run_monitor() would be able to exit at any time, but that looks like a big task. If run_monitor() is changed not to request hearbeats, the original bug would reappear but this test will not catch it.

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure tests pass locally (after every commit!)

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@berland berland added the release-notes:skip If there should be no mention of this in release notes label May 16, 2024
@berland
Copy link
Contributor Author

berland commented May 16, 2024

todo: write test triggering the original bug. kind-of-done

This allows the base_run_model to also being able to
check regularly if there are other tasks at hand, like terminating
the experiment.
@berland berland self-assigned this May 22, 2024
@berland berland added release-notes:bug-fix Automatically categorise as bug fix in release notes and removed release-notes:skip If there should be no mention of this in release notes labels May 22, 2024
@berland berland changed the title Let monitor emit optional heartbeats Let ERT be able to stop experiment when all realizations are pending May 22, 2024
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.86%. Comparing base (b63ebb1) to head (7d46284).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7924      +/-   ##
==========================================
+ Coverage   85.82%   85.86%   +0.04%     
==========================================
  Files         378      378              
  Lines       23069    23074       +5     
  Branches      636      625      -11     
==========================================
+ Hits        19798    19813      +15     
+ Misses       3198     3180      -18     
- Partials       73       81       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

if isinstance(event, CloseTrackerEvent):
timeout = self._receiver_timeout
closetracker_received = True
_heartbeat_interval = self._receiver_timeout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if we need the receiver_timeout at all when having the heartbeat in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the heartbeat_interval is None by default, so there is not really a heartbeat in place unless you ask for it.

The receiver timeout is 60 second, while a heartbeat is very short. I thought that this could risk logging the error "Evalutor did not send the TERMINATED event" too often and when it should not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one consumer of the track() function except for tests, so the heartbeat could be there by default. It would require modifications in ca 15 tests though to ignore the None event, but that is doable.

Copy link
Contributor

@xjules xjules May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Then it makes sense to keep it - maybe. Nevertheless the None is there only for the sake of tests. It very much depends how trivial this modification of tests would be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it for now. I'll create an issue if it is doable to remove it.

Copy link
Contributor

@xjules xjules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job @berland ! 🚀

@xjules
Copy link
Contributor

xjules commented May 27, 2024

Relates to #7993

@berland berland merged commit 0a55f0a into equinor:main May 27, 2024
38 checks passed
@berland berland deleted the monitor_with_heartbeat branch June 6, 2024 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes:bug-fix Automatically categorise as bug fix in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Terminate experiment does not work when jobs are pending
3 participants