Non-deterministic test: float comparisons in some tests make them flaky #5209

pkoz · 2024-01-28T20:27:25Z

Description

I found the following tests failing intermittently in GitHub Actions:

TestLightGBMTuner.test_tune_best_score_reproducibility
TestLightGBMTunerCV.test_tune_best_score_reproducibility
test_optimize_parallel_timeout

Expected behavior

Tests should be deterministic.

Suggestion:

We can fix assertions like:

optuna/tests/integration_tests/lightgbm_tuner_tests/test_optimize.py

Line 766 in 073abfc

assert best_score_second_try == best_score_first_try

by using pytest.approx, which compares numbers within a tolerance (default relative tolerance: 1e-6):

        assert best_score_second_try == pytest.approx(best_score_first_try)
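As a minimal sketch of the suggested change, the snippet below contrasts exact equality with pytest.approx, using the two values from the failing assertion in the error log of this issue. The variable names and the tighter rel=1e-12 tolerance are illustrative assumptions, not code from the Optuna test suite.

```python
import pytest

# The two values reported in the failing assertion in this issue's error log.
first_try = 0.21086425862654534
second_try = 0.21086425862654531

# Exact comparison: fails because the floats differ in the last digits.
exact_match = first_try == second_try

# Tolerant comparison with pytest's defaults (rel=1e-6, abs=1e-12).
approx_match = first_try == pytest.approx(second_try)

# A tighter relative tolerance (hypothetical choice) still absorbs the
# rounding noise while catching any larger discrepancy.
tight_match = first_try == pytest.approx(second_try, rel=1e-12)

print(exact_match, approx_match, tight_match)
```

If the default tolerance is considered too loose for a reproducibility check, an explicit rel or abs argument keeps the assertion strict while still tolerating last-digit rounding differences.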

Environment

Optuna version: 3.6.0.dev
Python version: 3.10.11
OS: macOS-14.2.1-arm64-arm-64bit
(Optional) Other libraries and their versions: n/a

Error messages, stack traces, or logs

>           assert first_trial.value == second_trial.value
E           AssertionError: assert 0.21086425862654534 == 0.21086425862654531
E            +  where 0.21086425862654534 = FrozenTrial(number=27, state=1, values=[0.21086425862654534], datetime_start=datetime.datetime(2024, 1, 27, 23, 47, 9,...alse, low=0.4, step=None), 'bagging_freq': IntDistribution(high=7, log=False, low=1, step=1)}, trial_id=27, value=None).value
E            +  and   0.21086425862654531 = FrozenTrial(number=27, state=1, values=[0.21086425862654531], datetime_start=datetime.datetime(2024, 1, 27, 23, 47, 10...alse, low=0.4, step=None), 'bagging_freq': IntDistribution(high=7, log=False, low=1, step=1)}, trial_id=27, value=None).value
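To put the log above in perspective, the following sketch measures how far apart the two reported values are in units in the last place (ULP). It uses only the standard library (math.ulp needs Python 3.9 or later); the variable names are illustrative.

```python
import math

# The two values from the failing assertion above.
first = 0.21086425862654534
second = 0.21086425862654531

diff = abs(first - second)
ulp = math.ulp(first)      # spacing between adjacent floats near `first`
ulps_apart = diff / ulp    # distance expressed in ULPs

# Standard-library analogue of a relative-tolerance comparison.
within_rel_tol = math.isclose(first, second, rel_tol=1e-6)

print(f"difference: {diff:.3e}, ULP: {ulp:.3e}, ULPs apart: {ulps_apart:.1f}")
print(f"within rel_tol=1e-6: {within_rel_tol}")
```

A gap of only a few ULPs would be consistent with floating-point rounding rather than a genuinely different optimization result, which is the kind of difference a tolerance-based assertion is meant to ignore.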

Steps to reproduce

Because the failure is non-deterministic, there is no reliable way to reproduce it.

Please take a look at this job log for an example of a failed run.

Additional context (optional)

No response

pkoz added the bug label Jan 28, 2024

pkoz mentioned this issue Jan 30, 2024

Skip the reproducibility tests for lightgbm #5214

Merged
