
Fixing potential unbound local variable in Study._optimize_sequential #5280

Open
lumip opened this issue Feb 26, 2024 · 6 comments
Labels
code-fix Change that does not change the behavior, such as code refactoring.

Comments

@lumip

lumip commented Feb 26, 2024

Motivation

Study._optimize_sequential runs into a problem when _run_trial raises and callbacks are present, as seen in the following relevant code block:

        try:
            frozen_trial = _run_trial(study, func, catch)
        finally:
            # The following line mitigates memory problems that can be occurred in some
            # environments (e.g., services that use computing containers such as GitHub Actions).
            # Please refer to the following PR for further details:
            # https://github.com/optuna/optuna/pull/325.
            if gc_after_trial:
                gc.collect()

        if callbacks is not None:
            for callback in callbacks:
                callback(study, frozen_trial)

If _run_trial raises, frozen_trial is never assigned, and the callback loop then fails with UnboundLocalError: local variable 'frozen_trial' referenced before assignment.

Suggestion

Consider moving the callback loop into the try block? I am not sure whether running the callbacks before the garbage collection has side effects, however.

        try:
            frozen_trial = _run_trial(study, func, catch)
    
            if callbacks is not None:
                for callback in callbacks:
                    callback(study, frozen_trial)
        finally:
            # The following line mitigates memory problems that can be occurred in some
            # environments (e.g., services that use computing containers such as GitHub Actions).
            # Please refer to the following PR for further details:
            # https://github.com/optuna/optuna/pull/325.
            if gc_after_trial:
                gc.collect()

Additional context (optional)

This is really a bug report: I saw this as a cascading error after the file-based journal storage encountered a json.decoder.JSONDecodeError. However, the bug report template asks for so many details that are irrelevant here that filling out this template was the more feasible alternative.

lumip added the code-fix label on Feb 26, 2024
@nzw0301
Member

nzw0301 commented Feb 28, 2024

Could you share a minimal reproducible example with us so that we can see the error?

@lumip
Author

lumip commented Mar 1, 2024

I don't have anything minimal right now, but I'll try to come up with something.

@lumip
Author

lumip commented Mar 25, 2024

It seems I was hasty with the code change suggestion above: when _run_trial does raise, the exception of course propagates out of the _optimize_sequential call as well, and the callback loop that reads frozen_trial is never reached, so the suggested fix wouldn't achieve anything. In fact, that is not where the error occurred, and I no longer know why I was looking at that spot in the code for a fix; I guess I was simply careless.

Here is the relevant snippet from the logs of one of my failing runs:

Traceback (most recent call last):
  File "pyenv/lib/python3.9/site-packages/optuna/study/_optimize.py", line 220, in _run_trial
    frozen_trial = study._storage.get_trial(trial._trial_id)
  File "pyenv/lib/python3.9/site-packages/optuna/storages/_journal/storage.py", line 369, in get_trial
    self._sync_with_backend()
  File "pyenv/lib/python3.9/site-packages/optuna/storages/_journal/storage.py", line 149, in _sync_with_backend
    logs = self._backend.read_logs(self._replay_result.log_number_read)
  File "pyenv/lib/python3.9/site-packages/optuna/storages/_journal/file.py", line 177, in read_logs
    raise last_decode_error
  File "pyenv/lib/python3.9/site-packages/optuna/storages/_journal/file.py", line 191, in read_logs
    logs.append(json.loads(line))
  File "pyenv/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "pyenv/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "pyenv/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "search_hyperparams.py", line 179, in <module>
    study.optimize(trial_inference, n_trials=args.num_trials, gc_after_trial=True)
  File "pyenv/lib/python3.9/site-packages/optuna/study/study.py", line 451, in optimize
    _optimize(
  File "pyenv/lib/python3.9/site-packages/optuna/study/_optimize.py", line 66, in _optimize
    _optimize_sequential(
  File "pyenv/lib/python3.9/site-packages/optuna/study/_optimize.py", line 163, in _optimize_sequential
    frozen_trial = _run_trial(study, func, catch)
  File "pyenv/lib/python3.9/site-packages/optuna/study/_optimize.py", line 223, in _run_trial
    if frozen_trial.state == TrialState.COMPLETE:
UnboundLocalError: local variable 'frozen_trial' referenced before assignment

The error originates within JournalStorage._sync_with_backend, probably due to file system issues on the cluster I was running on, and it can be reproduced with the following code:

from typing import Optional, Sequence
from json import JSONDecodeError

import optuna
from optuna.storages._journal.base import BaseJournalLogStorage
from optuna.trial import TrialState


def trial_fn(trial: optuna.Trial):
    foo = trial.suggest_float("foo", 0., 1.)
    return foo


class ErroneousJournalStorage(optuna.storages.JournalStorage):
    # JournalStorage whose backend sync fails once a trial state has been written,
    # simulating the JSONDecodeError seen on the cluster's file system.

    def __init__(self, log_storage: BaseJournalLogStorage) -> None:
        self.trial_state_values_set = False
        super().__init__(log_storage)

    def set_trial_state_values(
        self, trial_id: int, state: TrialState, values: Optional[Sequence[float]] = None
    ) -> bool:
        self.trial_state_values_set = True
        return super().set_trial_state_values(trial_id, state, values)

    def _sync_with_backend(self) -> None:
        if self.trial_state_values_set:
            raise JSONDecodeError("some json error", "foo", 0)
        return super()._sync_with_backend()


sampler = optuna.samplers.TPESampler()
storage = ErroneousJournalStorage(
    optuna.storages.JournalFileStorage("foo"),
)

study = optuna.create_study(storage=storage, sampler=sampler)
study.optimize(trial_fn, n_trials=100, gc_after_trial=True)
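
For what it's worth, the mechanism behind the UnboundLocalError can also be illustrated without Optuna: a value is assigned inside a try block, the except handler that re-fetches it raises again, and the finally block then reads the still-unbound name. A minimal sketch of that pattern (names are made up, not Optuna's actual code):

# Minimal sketch of the unbound-local mechanism (hypothetical names, not Optuna code).
def fetch_from_storage():
    raise ValueError("storage is broken")  # stands in for the JSONDecodeError

def run_trial():
    try:
        frozen_trial = fetch_from_storage()   # raises before frozen_trial is bound
    except Exception:
        frozen_trial = fetch_from_storage()   # the fallback fetch raises as well
        raise
    finally:
        # Runs while the second exception is propagating; frozen_trial was never
        # bound, so this line raises UnboundLocalError.
        if frozen_trial is not None:
            print("trial finished")

run_trial()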

Related problem

Another issue I was seeing occurs when set_trial_state_values itself fails (instead of the _sync_with_backend call after it). In that case, the completed trial is never written, and the block

    try:
        frozen_trial = _tell_with_warning(
            study=study,
            trial=trial,
            value_or_values=value_or_values,
            state=state,
            suppress_warning=True,
        )
    except Exception:
        frozen_trial = study._storage.get_trial(trial._trial_id)
        raise

in _run_trial reloads the trial from the journal with state RUNNING. The if statements in the subsequent finally block then fall into the "Should not reach" else-branch (lines 243-244), where the error is not handled, and the run crashes.
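
To make this concrete, here is a small self-contained sketch of what I believe happens; the finally block is paraphrased from the structure described above, and all names are made up rather than taken from Optuna:

# Sketch of the "Should not reach" path (paraphrased, hypothetical names).
class TrialState:
    RUNNING, COMPLETE, PRUNED, FAIL = range(4)

class ReloadedTrial:
    state = TrialState.RUNNING  # the COMPLETE state was never written to the journal

def tell():
    raise ValueError("set_trial_state_values failed")

def run_trial():
    try:
        frozen_trial = tell()
    except Exception:
        frozen_trial = ReloadedTrial()  # re-read from storage; still RUNNING
        raise
    finally:
        if frozen_trial.state == TrialState.COMPLETE:
            pass  # log completed trial
        elif frozen_trial.state == TrialState.PRUNED:
            pass  # log pruned trial
        elif frozen_trial.state == TrialState.FAIL:
            pass  # handle the failure according to `catch`
        else:
            # Reached because the reloaded trial is still RUNNING; the assertion
            # replaces the original storage error in the resulting traceback.
            assert False, "Should not reach."

run_trial()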

This can be reproduced by running the code above with the following modified ErroneousJournalStorage:

class ErroneousJournalStorage(optuna.storages.JournalStorage):

    def set_trial_state_values(self, trial_id: int, state: TrialState, values: Optional[Sequence[float]] = None) -> bool:
        raise JSONDecodeError("some json error", "foo", 0)

@nzw0301
Member

nzw0301 commented Mar 26, 2024

So can we close this issue?

@lumip
Author

lumip commented Apr 4, 2024

Well, no. There is an issue with unhandled errors here after all. You can rename or relabel the issue as you wish, but the problem will not go away by closing it, aye?

@rafaelsavi

I am also experiencing this error constantly; however, it is not easy to reproduce, since it only happens after some hours of training. It started when I migrated from a local SQLite DB to a PostgreSQL database accessed through the RDBStorage object. My traceback is very similar, but not identical:


        study.optimize(self.__call__, gc_after_trial=True, n_trials=n_trials, **kwargs)
      File "/opt/venv/lib/python3.10/site-packages/optuna/study/study.py", line 451, in optimize
        _optimize(
      File "/opt/venv/lib/python3.10/site-packages/optuna/study/_optimize.py", line 99, in _optimize
        f.result()
      File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
        return self.__get_result()
      File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
        raise self._exception
      File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/opt/venv/lib/python3.10/site-packages/optuna/study/_optimize.py", line 159, in _optimize_sequential
        frozen_trial = _run_trial(study, func, catch)
      File "/opt/venv/lib/python3.10/site-packages/optuna/study/_optimize.py", line 219, in _run_trial
        if frozen_trial.state == TrialState.COMPLETE:

Message:

    UnboundLocalError: local variable 'frozen_trial' referenced before assignment
