Non-deterministic test: float comparisons in some tests make them flaky #5209

Open
pkoz opened this issue Jan 28, 2024 · 0 comments
Labels
bug Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself.

Description

The following tests fail intermittently in GitHub Actions:

  • TestLightGBMTuner.test_tune_best_score_reproducibility
  • TestLightGBMTunerCV.test_tune_best_score_reproducibility
  • test_optimize_parallel_timeout

Expected behavior

Tests should be deterministic.

Suggestion:

We can fix assertions like:

assert best_score_second_try == best_score_first_try

by using pytest.approx, which compares numbers within a tolerance (default relative tolerance: 1e-6):

assert best_score_second_try == pytest.approx(best_score_first_try)
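
As a minimal sketch of how this behaves, the snippet below applies pytest.approx to the two values reported in the error log of this issue. The helper name `scores_match` is hypothetical and only used for illustration; it is not part of Optuna or its test suite.

```python
import pytest

# Values copied from the failing assertion in this issue's error log.
first = 0.21086425862654534
second = 0.21086425862654531


def scores_match(a: float, b: float) -> bool:
    """Hypothetical helper: compare two scores using pytest.approx defaults."""
    return a == pytest.approx(b)


# Exact comparison fails because the values differ in the last digit.
assert first != second

# The tolerant comparison accepts the tiny difference.
assert scores_match(first, second)

# A difference well above the default relative tolerance is still rejected.
assert not scores_match(0.2108, 0.2109)
```

If the default tolerance turns out to be too loose or too strict for a given test, pytest.approx also accepts explicit `rel` and `abs` arguments.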

Environment

  • Optuna version: 3.6.0.dev
  • Python version: 3.10.11
  • OS: macOS-14.2.1-arm64-arm-64bit
  • (Optional) Other libraries and their versions: n/a

Error messages, stack traces, or logs

>           assert first_trial.value == second_trial.value
E           AssertionError: assert 0.21086425862654534 == 0.21086425862654531
E            +  where 0.21086425862654534 = FrozenTrial(number=27, state=1, values=[0.21086425862654534], datetime_start=datetime.datetime(2024, 1, 27, 23, 47, 9,...alse, low=0.4, step=None), 'bagging_freq': IntDistribution(high=7, log=False, low=1, step=1)}, trial_id=27, value=None).value
E            +  and   0.21086425862654531 = FrozenTrial(number=27, state=1, values=[0.21086425862654531], datetime_start=datetime.datetime(2024, 1, 27, 23, 47, 10...alse, low=0.4, step=None), 'bagging_freq': IntDistribution(high=7, log=False, low=1, step=1)}, trial_id=27, value=None).value
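
To put the size of this mismatch in perspective, the sketch below measures the gap between the two logged values in units of floating-point spacing (ULP), using only the standard library. The function name `ulps_apart` is hypothetical, and `math.ulp` requires Python 3.9 or newer. A gap this small is consistent with rounding differences between runs, although this issue does not establish the actual cause.

```python
import math

# Values copied from the failing assertion in the log above.
first = 0.21086425862654534
second = 0.21086425862654531


def ulps_apart(a: float, b: float) -> float:
    """Hypothetical helper: distance between a and b in units of ULP."""
    # math.ulp gives the spacing of floats at the given magnitude.
    return abs(a - b) / math.ulp(max(abs(a), abs(b)))


print("absolute difference:", abs(first - second))
print("difference in ULPs:", ulps_apart(first, second))

# The relative difference is far below 1e-6, so a tolerant comparison passes.
print("close at rel_tol=1e-6:", math.isclose(first, second, rel_tol=1e-6))
```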

Steps to reproduce

Because the failure is non-deterministic, there is no reliable way to reproduce it on demand.

Please take a look at this job log to see an example of a failed run.

Additional context (optional)

No response
