
Bias from independent within-trial restarts when instances are not uniform #2257

Open
nikohansen opened this issue Dec 20, 2023 · 1 comment


nikohansen commented Dec 20, 2023

Following up on #1117 (comment). In example_experiment2.py we conduct independent within-trial restarts on each instance, thereby exhausting the budget; the expected number of repetitions then depends on the instance success rate. In example_experiment3.py we (plan to) have uniform instance repetitions where, depending on the number of successes over all instances, each instance is repeated the same number of times to exhaust the budget.

When instances do not "behave" similarly (the original assumption is that they do), we currently run more repetitions on the difficult instances, which does not seem right. Assume for simplicity two instances, one with success rate one and one with success rate zero, a solver that terminates on these instances after evals1 and evals0 evaluations, respectively, and an experimentation budget > evals0. With within-trial restarts, the expected runtime over both instances (AKA the evaluations per single success) computes to budget + evals1, whereas with uniform instance repetitions we get evals0 + evals1. The discrepancy grows the earlier the solver terminates on unsuccessful instances (the smaller evals0) and the larger the experimentation budget. The observation under within-trial restarts contradicts the idea of a budget-independent setup: it is dominated by a (somewhat arbitrary) input parameter, the budget, which renders the observation somewhat meaningless. Consequently, results based on within-trial restarts crucially depend on the assumption that instances "behave similarly" (are uniform). With uniform instance repetitions as in example_experiment3.py, this assumption is considerably less crucial.
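The two-instance example above can be spelled out with concrete (made-up) numbers; the values of evals0, evals1, and the budget are arbitrary placeholders chosen only to illustrate the arithmetic:

```python
# Hypothetical numbers for the two-instance example: the "easy"
# instance succeeds after evals1 evaluations, the "hard" instance
# never succeeds and terminates after evals0 evaluations per run.
evals1 = 100      # evaluations until success on the easy instance
evals0 = 50       # evaluations per unsuccessful run on the hard instance
budget = 10_000   # experimentation budget, with budget > evals0

# Within-trial restarts: the hard instance is restarted until the
# budget is exhausted, so the full budget is spent on it.
ert_within_trial = (budget + evals1) / 1   # one success overall

# Uniform instance repetitions: both instances are run the same
# number of times (here once), so the hard instance contributes
# only evals0.
ert_uniform = (evals0 + evals1) / 1

print(ert_within_trial)  # 10100.0 -- grows with the budget
print(ert_uniform)       # 150.0   -- independent of the budget
```

The first value scales with the budget while the second does not, which is the budget dependence criticized above.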

A second advantage of uniform repetitions is that we can more easily recover observations like those from the old setup than vice versa. We also collect more runtimes for easier targets, which are bypassed under within-trial restarts when they were already reached in a previous run.

A third advantage of uniform repetitions is that we can infer the number of independent restarts from the number of instances displayed in the figure. EDIT: is this relevant when we do not have the success rate?

Disadvantages of making success-dependent uniform instance repetitions:

  • before each "restart", we need to read in all data to decide which instances to repeat.

  • we cannot choose a specific initial solution for the first run only (on each instance). This requires one within-trial restart, or breaking the experimental setup by running some (or one) instance(s) with a different initialization plus some "magic" to pick the right instance as the first for simulated restarts. This may be considered a feature, as it forces us to make this distinction more explicit.
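The driver loop for uniform repetitions could look roughly as follows. This is only a sketch under assumptions: `solve` and `run_uniform_repetitions` are hypothetical names, and in the real setup the already-collected data would be read back from disk before each round, as noted in the first bullet above:

```python
def run_uniform_repetitions(instances, solve, total_budget):
    """Sketch of uniform instance repetitions (names hypothetical).

    `solve(instance)` runs the solver once and returns
    (n_evals, success). Every round runs each instance exactly once,
    so after any number of rounds all instances have the same
    repetition count; rounds continue until the budget is exhausted.
    """
    spent = 0
    results = {i: [] for i in instances}
    while spent < total_budget:
        # one uniform round: in a real experiment we would first read
        # in all previously written data to decide whether to continue
        for i in instances:
            n_evals, success = solve(i)
            spent += n_evals
            results[i].append((n_evals, success))
    return results
```

Note that, by construction, no instance is repeated more often than another, regardless of its success rate; this is the property that removes the bias discussed above.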

@nikohansen (Contributor, Author) commented:
By introducing duplicates in DataSet._evals_appended_compute, we draw instance numbers uniformly at random rather than instance appearances to simulate runtimes (in pproc.DataSet.evals_with_simulated_restarts). This is, however, not a remedy when we already have within-trial restarts.
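The distinction between drawing instances versus drawing instance appearances can be illustrated with a small sketch (this is not the actual pproc implementation; the function names and data layout are made up for illustration):

```python
import random

def draw_uniform_over_instances(runs_by_instance, rng=random):
    """Pick an instance uniformly, then one of its recorded runs.

    Each instance is equally likely, regardless of how many runs
    (appearances) it has in the data.
    """
    instance = rng.choice(list(runs_by_instance))
    return rng.choice(runs_by_instance[instance])

def draw_uniform_over_appearances(runs_by_instance, rng=random):
    """Pick uniformly among all recorded runs.

    Instances with more repetitions (typically the difficult ones
    under within-trial restarts) are over-weighted.
    """
    all_runs = [r for runs in runs_by_instance.values() for r in runs]
    return rng.choice(all_runs)
```

With, say, one run on instance 1 and two runs on instance 2, the first scheme picks instance 1 with probability 1/2, the second only with probability 1/3, which is the bias the duplicates are meant to correct.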
