Make sure not to close the evaluator _ws_thread when done is not set yet #7901
Conversation
```python
while not self._done.done():
    time.sleep(0.1)
```
As this is repeated three times, should we put it in a separate method called `wait_until_done` or `sleep_until_done`?
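For illustration, a minimal sketch of such a helper (the class scaffolding and the `poll_interval` parameter are assumptions for the example, not code from this PR):

```python
import time
from concurrent.futures import Future


class Evaluator:
    def __init__(self) -> None:
        self._done: Future = Future()

    def wait_until_done(self, poll_interval: float = 0.1) -> None:
        # Same busy-wait as above, factored out so the three call
        # sites share a single implementation.
        while not self._done.done():
            time.sleep(poll_interval)
```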
Yes, we should.
I have replaced the `time.sleep` with a `threading.Event()`.
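A minimal sketch of the `threading.Event` pattern (the method names and class scaffolding are assumptions; the actual PR code may differ):

```python
import threading
from typing import Optional


class Evaluator:
    def __init__(self) -> None:
        self._done = threading.Event()

    def _signal_done(self) -> None:
        # Called from the websocket thread once evaluation has finished.
        self._done.set()

    def wait_until_done(self, timeout: Optional[float] = None) -> bool:
        # Blocks on the event instead of polling; wakes up as soon as
        # _signal_done() runs and returns True once the flag is set.
        return self._done.wait(timeout=timeout)
```

Compared with the polling loop, the waiting thread wakes immediately when the event is set rather than up to 0.1 s later, and burns no CPU in the meantime.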
With this PR, the test passed. Looks good! 👍🏻
- Use threading.Event to signal done to the main thread
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #7901      +/-   ##
==========================================
- Coverage   85.38%   85.32%   -0.06%
==========================================
  Files         381      381
  Lines       23593    23613      +20
  Branches      881      891      +10
==========================================
+ Hits        20145    20148       +3
- Misses       3336     3358      +22
+ Partials      112      107       -5
```

☔ View full report in Codecov by Sentry.
I played a bit with the repeat numbers and the failing behavior becomes consistent at about ~20 repeats. Each run of the test takes about 4 seconds, so it seems reasonable to add a repeat marker to the test so that we get consistent test results rather than flaky behavior. We run with pytest-xdist anyway, so this should take negligible additional time (at least in comparison to the …).
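For reference, the repeat idea would look roughly like this, assuming the `pytest-repeat` plugin (an assumption for illustration; the parametrized version that was preferred instead follows below):

```python
import pytest


# With pytest-repeat installed, this marker re-runs the test 20 times,
# roughly the point at which the failure was observed to become consistent.
@pytest.mark.repeat(20)
def test_run_and_cancel_legacy_ensemble():
    ...
```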
We talked about the repeat idea, and came up with the idea of just parametrizing the test:

```python
@pytest.mark.timeout(60)
@pytest.mark.usefixtures("using_scheduler")
@pytest.mark.parametrize("num_reals, num_jobs", itertools.product(range(2, 6), range(2, 6)))
def test_run_and_cancel_legacy_ensemble(
    tmpdir, make_ensemble_builder, monkeypatch, num_reals, num_jobs
):
    print(num_reals, num_jobs)
    custom_port_range = range(1024, 65535)
    with tmpdir.as_cwd():
        ensemble = make_ensemble_builder(
            monkeypatch, tmpdir, num_reals, num_jobs, job_sleep=40, max_runtime=1
        ).build()
        config = EvaluatorServerConfig(
            custom_port_range=custom_port_range,
            custom_host="127.0.0.1",
            use_token=False,
            generate_cert=False,
        )
        evaluator = EnsembleEvaluator(ensemble, config, 0)
        evaluator.start_running()

        async def _run_monitor():
            async with Monitor(config) as mon:
                cancel = True
                with contextlib.suppress(
                    ConnectionClosed
                ):  # monitor throws some variant of CC if dispatcher dies
                    async for _ in mon.track():
                        # Cancel the ensemble upon the arrival of the first event
                        if cancel:
                            await mon.signal_cancel()
                            cancel = False

        run_monitor_in_loop(_run_monitor)
        assert evaluator._ensemble.status == state.ENSEMBLE_STATE_CANCELLED
        # realisations should not finish, thus not creating a status-file
        for i in range(num_reals):
            assert not os.path.isfile(f"real_{i}/status.txt")
```
Closing this PR, as it seems it is no longer required. Relates to #8000.
Issue
(Maybe) Resolves #7642
Approach
Make sure to wait until done is set.