fidelity_index doesn't support nested param #1125

Open
FrancoisPgm opened this issue Dec 5, 2023 · 0 comments
Labels
bug Indicates an unexpected problem or unintended behavior

Describe the bug
I am running Orion with the Hydra plugin. When I use a nested config param as the fidelity dimension for BOHB, e.g. hydra.sweeper.params.model.trainer.max_epochs: "fidelity(low=1, high=2)", the fidelity_index is set to the flattened key "model.trainer.max_epochs", but the trial.params dict keeps the nested structure:

{'model': {'params': {'lr': 0.0001783,
                      'lr_scheduler_args': {'T_max': 72312},
                      'weight_decay': 0.01001},
           'trainer': {'max_epochs': 1.0}}}

The lookup with the flattened key then fails:

  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/algo/base.py", line 308, in has_suggested_all_possible_values
    fidelity_value = trial.params[fidelity_index]
KeyError: 'model.trainer.max_epochs'
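
The mismatch can be illustrated without Orion. The following is a minimal sketch, assuming trial.params behaves like the plain nested dict shown above. `get_nested` is a hypothetical helper written for this illustration, not part of Orion's API.

```python
from functools import reduce

# Nested params, copied from the trial.params dump above.
params = {
    "model": {
        "params": {
            "lr": 0.0001783,
            "lr_scheduler_args": {"T_max": 72312},
            "weight_decay": 0.01001,
        },
        "trainer": {"max_epochs": 1.0},
    }
}
fidelity_index = "model.trainer.max_epochs"

# A direct lookup with the dotted key fails, as in the traceback.
error = None
try:
    params[fidelity_index]
except KeyError as exc:
    error = exc


def get_nested(mapping, dotted_key, sep="."):
    """Hypothetical helper: walk a nested dict one key segment at a time."""
    return reduce(lambda node, part: node[part], dotted_key.split(sep), mapping)


# Resolving the dotted key segment by segment finds the fidelity value.
fidelity_value = get_nested(params, fidelity_index)
```

This corresponds to the first option below (keeping the nested structure and resolving the dotted fidelity_index against it).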

Expected behavior
I'd expect either the fidelity_index to keep the nested structure somehow, or the trial.params dict to get flattened keys, something like:

{
    'model.params.lr': 0.0001783,
    'model.params.lr_scheduler_args.T_max': 72312,
    'model.params.weight_decay': 0.01001,
    'model.trainer.max_epochs': 1.0
}
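
The flattening described above could look roughly like this. It is a sketch under the assumption that params are plain nested dicts with string keys. `flatten_params` is a hypothetical name used for illustration and is not claimed to be how Orion handles this internally.

```python
def flatten_params(nested, parent_key="", sep="."):
    """Hypothetical helper: turn nested dicts into a single-level dict
    whose keys are the dotted paths to each leaf value.

    Note: empty sub-dicts produce no entry in the result.
    """
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the dotted prefix along.
            flat.update(flatten_params(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested_params = {
    "model": {
        "params": {
            "lr": 0.0001783,
            "lr_scheduler_args": {"T_max": 72312},
            "weight_decay": 0.01001,
        },
        "trainer": {"max_epochs": 1.0},
    }
}

flat_params = flatten_params(nested_params)
# With flattened keys, the lookup from the traceback would succeed.
fidelity_value = flat_params["model.trainer.max_epochs"]
```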

For now I can easily avoid the issue by using a non-nested param in my config file:
hydra.sweeper.params.max_epochs: "fidelity(low=1, high=2)"

Steps to reproduce
Define a fidelity dimension on a nested param (e.g. model.trainer.max_epochs, as above) and run a sweep with the Hydra Orion sweeper using BOHB.

Environment (please complete the following information):

  • OS: MacOS Sonoma 14.1.1
  • Python version: 3.9
  • Oríon version: 0.2.4.post1+computecanada
  • Database: PickleDB

Additional context
The full error log:

[2023-12-05 08:13:00,956][HYDRA] Orion Optimizer {'type': 'bohb', 'config': {'seed': 1, 'min_points_in_model': 4, 'top_n_percent': 40, 'num_samples': 5}}
[2023-12-05 08:13:00,956][HYDRA] with parametrization {'model.params.lr': 'loguniform(1e-05, 0.01)', 'model.params.lr_scheduler_args.T_max': 'uniform(1000, 100000, discrete=True)', 'model.params.weight_decay': 'loguniform(0.01, 100)', 'model.trainer.max_epochs': 'fidelity(1, 2)'}
Traceback (most recent call last):
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 353, in clientctx
    yield client
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 510, in sweep
    raise e
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 507, in sweep
    self.optimize(self.client)
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 525, in optimize
    trials = self.sample_trials()
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 555, in sample_trials
    trials = self.suggest_trials(self.n_workers())
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 434, in suggest_trials
    trial = self.client.suggest(pool_size=count)
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/client/experiment.py", line 563, in suggest
    if self.is_done:
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/client/experiment.py", line 167, in is_done
    return self._experiment.is_done
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/core/worker/experiment.py", line 541, in is_done
    self.algorithms.is_done and num_pending_trials == 0
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/core/worker/primary_algo.py", line 277, in is_done
    return super().is_done or self.algorithm.is_done
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/algo/base.py", line 293, in is_done
    return self.has_completed_max_trials or self.has_suggested_all_possible_values()
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/algo/base.py", line 308, in has_suggested_all_possible_values
    fidelity_value = trial.params[fidelity_index]
KeyError: 'model.trainer.max_epochs'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra/_internal/utils.py", line 466, in <lambda>
    lambda: hydra.multirun(
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 162, in multirun
    ret = sweeper.sweep(arguments=task_overrides)
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/orion_sweeper.py", line 79, in sweep
    return self.sweeper.sweep(arguments)
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 510, in sweep
    raise e
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/python/3.9.6/lib/python3.9/contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/hydra_plugins/hydra_orion_sweeper/implementation.py", line 355, in clientctx
    client.close()
  File "/scratch/fpaugam/test_orion_env39/lib/python3.9/site-packages/orion/client/experiment.py", line 828, in close
    raise RuntimeError(
RuntimeError: There is still reserved trials: dict_keys(['7ba7eed37ff08c60dc9bad9341405be4'])
Release all trials before closing the client, using client.release(trial).
@FrancoisPgm FrancoisPgm added the bug Indicates an unexpected problem or unintended behavior label Dec 5, 2023