-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let ERT be able to stop experiment when all realizations are pending #7924
Conversation
|
This allows the base_run_model to also being able to check regularly if there are other tasks at hand, like terminating the experiment.
baf9127
to
7d46284
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #7924 +/- ##
==========================================
+ Coverage 85.82% 85.86% +0.04%
==========================================
Files 378 378
Lines 23069 23074 +5
Branches 636 625 -11
==========================================
+ Hits 19798 19813 +15
+ Misses 3198 3180 -18
- Partials 73 81 +8 ☔ View full report in Codecov by Sentry. |
if isinstance(event, CloseTrackerEvent): | ||
timeout = self._receiver_timeout | ||
closetracker_received = True | ||
_heartbeat_interval = self._receiver_timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering if we need the receiver_timeout
at all when having the heartbeat in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the heartbeat_interval is None
by default, so there is not really a heartbeat in place unless you ask for it.
The receiver timeout is 60 second, while a heartbeat is very short. I thought that this could risk logging the error "Evalutor did not send the TERMINATED event" too often and when it should not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is only one consumer of the track()
function except for tests, so the heartbeat could be there by default. It would require modifications in ca 15 tests though to ignore the None event, but that is doable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Then it makes sense to keep it - maybe. Nevertheless the None
is there only for the sake of tests. It very much depends how trivial this modification of tests would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep it for now. I'll create an issue if it is doable to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job @berland ! 🚀
Relates to #7993 |
This allows the base_run_model to also being able to check regularly if there are other tasks at hand, like terminating the experiment.
Issue
Resolves #7871
Approach
Add heartbeat to
monitor.track()
Add a test to certify that
monitor.track()
is able to emitNone
events, aka "heartbeats". A better test would be to test thatBaseRunmodel.run_monitor()
would be able to exit at any time, but that looks like a big task. Ifrun_monitor()
is changed not to request hearbeats, the original bug would reappear but this test will not catch it.When applicable