
Optimizing in a discrete configspace #1091

Open
mallanos opened this issue Jan 19, 2024 · 1 comment

mallanos commented Jan 19, 2024

Description

I want to optimize a function that takes three float parameters. However, not all combinations of the three parameters are valid.
Is there a way to define the configspace as a pool of possible solutions, so that SMAC samples configurations as three-dimensional points from that pool?

Steps/Code to Reproduce

What I'm doing now is defining the config space in the regular way:

    def configspace(self) -> ConfigurationSpace:
        # `seed` and `embedding` are defined elsewhere in my setup
        cs = ConfigurationSpace(name="myspace", seed=seed)

        # All three parameters share the range spanned by the embedding
        x0 = Float("x0", (np.min(embedding), np.max(embedding)), default=-3)
        x1 = Float("x1", (np.min(embedding), np.max(embedding)), default=-4)
        x2 = Float("x2", (np.min(embedding), np.max(embedding)), default=5)
        cs.add_hyperparameters([x0, x1, x2])

        return cs

Then, I use the Ask-and-Tell interface to:

  1. Ask for a config or point in the three-dimensional space
  2. Find the closest existing point to the suggested point
  3. Get the score or value associated with that point
  4. Tell smac3 the resulting TrialValue and TrialInfo

    for _ in range(search_iterations):
        info = smac.ask()
        assert info.seed is not None
        # Snap the suggested config to the closest existing point
        score, point = model.sample(info.config, ec=ec, seed=info.seed)
        value = TrialValue(cost=score, time=0.5)
        # Report the snapped point, not the config returned by ask()
        true_info = TrialInfo(
            config=Configuration(
                configuration_space=model.configspace,
                values={
                    "x0": float(point[0]),
                    "x1": float(point[1]),
                    "x2": float(point[2]),
                },
            ),
            seed=info.seed,
        )
        smac.tell(true_info, value)
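The nearest-point lookup in step 2 is, roughly, a Euclidean nearest-neighbour search over the pool of valid points (a minimal numpy sketch; `snap_to_pool` and the example pool are hypothetical names, not part of my actual code):

```python
import numpy as np

def snap_to_pool(x, pool):
    """Return the pool row closest to x in Euclidean distance."""
    pool = np.asarray(pool, dtype=float)
    dists = np.linalg.norm(pool - np.asarray(x, dtype=float), axis=1)
    return pool[np.argmin(dists)]

# The suggested point (0, 0, 0) snaps to the nearest existing point:
nearest = snap_to_pool([0.0, 0.0, 0.0], [[1.0, 1.0, 1.0], [0.2, 0.0, 0.1]])
```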

Expected Results

    all_scores = [smac.runhistory.average_cost(config) for config in smac.runhistory.get_configs()]

I would expect the length of all_scores to equal search_iterations, with no nan values.

Actual Results

When I inspect the results by running:

    all_scores = [smac.runhistory.average_cost(config) for config in smac.runhistory.get_configs()]

I get several nan scores, and the number of sampled configurations is greater than the maximum number of evaluations (search_iterations).

Versions

smac version 2.0.2

Thanks!

@alexandertornede (Contributor) commented:

Hi @mallanos,

thanks for posting this!

The approach you describe has a conceptual problem from my perspective: there is no guarantee that the closest point (depending on the distance metric you use) actually has a comparable acquisition function value.

Moreover, without having looked into this, I assume the nan values arise because you never provide a value for the configuration returned by the ask call, so SMAC automatically fills it with nan. I would need to look into this to confirm that assumption, though.
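To illustrate the assumed mechanism with a toy model (this is a sketch of the bookkeeping I have in mind, not SMAC's actual runhistory implementation): every config handed out by ask() gets an entry, and tell() only fills the entry whose config matches exactly.

```python
import math

def toy_runhistory(asked, told):
    """ask() registers each config with cost nan; tell() fills the cost
    only for a config that matches an existing entry exactly (any other
    config gets its own new entry)."""
    history = {cfg: math.nan for cfg in asked}
    for cfg, cost in told:
        history[cfg] = cost
    return history

# Asking for "a" but telling the snapped config "a_snapped" leaves "a"
# unanswered (nan) and adds an extra entry -- more entries than iterations:
h = toy_runhistory(asked=["a"], told=[("a_snapped", 0.3)])
```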

Depending on the concrete constraints you want to apply to your search space, you can try to work with conditions (https://automl.github.io/ConfigSpace/main/api/conditions.html) and forbidden clauses (https://automl.github.io/ConfigSpace/main/api/forbidden_clauses.html). Just be aware that these are resolved internally by rejection sampling, meaning that a large number of conditions or forbidden clauses can make sampling configurations slow.
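For intuition, rejection sampling amounts to the following loop (a simplified pure-Python illustration, not ConfigSpace's actual implementation; `sample_valid` and its arguments are hypothetical stand-ins):

```python
import random

def sample_valid(sample, is_forbidden, max_tries=10_000):
    """Draw candidates until one passes all constraints (rejection sampling)."""
    for _ in range(max_tries):
        cfg = sample()
        if not is_forbidden(cfg):
            return cfg
    raise RuntimeError("too many rejections; the constraints may be too tight")

random.seed(0)
# Forbid the lower-left quadrant of the unit square:
cfg = sample_valid(
    sample=lambda: (random.random(), random.random()),
    is_forbidden=lambda c: c[0] < 0.5 and c[1] < 0.5,
)
```

The tighter the forbidden region, the more draws are discarded per accepted configuration, which is why many constraints slow sampling down.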

Does that help?
