Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The proper way of combining HMC with batch acquisition strategies for Bayesian Optimization #361

Open
BrunoKM opened this issue Mar 22, 2021 · 5 comments

Comments

@BrunoKM
Copy link
Contributor

BrunoKM commented Mar 22, 2021

This issue is meant to ask two questions: (1) Does the combination of Local Penalization and IntegratedHyperparameterAcquisition behave in the expected way, and (1) What's the right way to approach Batch Bayesian Optimization using multiple Hamiltonian Monte-Carlo (HMC) samples for model hyperparameters in EmuKit?

(1) Combining Local Penalization and IntegratedHyperparameterAcquisition

Currently, when doing sequential Bayesian Optimization (BO) in EmuKit, one can wrap the acquisition function in a wrapper – IntegratedHyperParameterAcquisition – and that wrapper remains responsible for generating and marginalizing-out the HMC samples:

acquisition = IntegratedHyperparameterAcquisition(emukit_model, ExpectedImprovement, n_burnin=100, n_samples=20)

When using the IntegratedHyperparameterAcquisition abstraction, the samples are only being marginalised-out when the acquisition function is called:

# All the HMC samples are used to evaluate the one-point acquisition
acquisition.evaluate(x)

But not when any of the model's predict functions are called:

# Only 1 HMC sample is used (whichever one the model happened to be updated with last) 
emukit_model.predict(x)

This means that when an IntegratedHyperParameterAcquisition is passed into a batch point calculator — such as the LocalPenalizaitonPointCalculator — the HMC samples won't necessarily be used in all places during computation of the batch acquisition.


What happens when IntegratedHyperparameterAcquisitionFunction is combined with LocalPenlizationPointCalculator

In the specific case of the LocalPenlizationPointCalculator, this means that HMC samples will be used to evaluate the single-point acquisition function that's being penalized, but only a single arbitrary sample will be used to estimate the Lipschitz constant:

lipschitz_constant = _estimate_lipschitz_constant(self.parameter_space, self.model)

or to evaluate the penalizer:

local_penalization_acquisition = LocalPenalization(self.model)

This doesn't seem like the behaviour the user would intend in this case. Using an arbitrary HMC sample for estimation of the Lipschitz constant could lead to undesirable, arbitrary behaviour. Depending on which HMC sample happens to be last in the collection of generated samples, the effect of local penalization will be vastly different. This seems to have a significant effect on performance of Batch BO + HMC.

At the same time, there isn't an obvious way to properly consider all HMC samples in combination with the LocalPenlizationPointCalculator in Emukit, without rewriting that class yourself.


This brings me to another question: where in the hierarchy of abstractions in EmuKIt should generating samples for hyperparameters, and marginalising them out, fit in when doing Batch Bayesian Optimisation? This is an issue not just for the Local Penalization calculator, but also for hypothetical new batch point calculators the user might want to implement:

Combining IntegratedHyperparameterAcquisition with other batch acquisition methods

In another use-case, when someone might want to implement their own batch point calculator. As an example, they might want to use the MultipointExpectedImprovement, but select points for the batch by sequentially optimizing 1-point Expected Improvement (EI), then 2-point EI wrt. the second point, 3-point EI wrt. the third point, etc. (as described in Fast Computation of Expected Improvement).

In such cases, as well as in the case of a locally penalized acquisition function, the acquisition function changes at each iteration of the batch. One could wrap this new acquisition in an IntegratedHyperParameterAcquisition wrapper for every time, but this would require re-running HMC every time. One could also try to construct an acquisition_generator that returns the right acquisition function at each batch-step, but this seems feels like a complicated work-around. IntegratedHyperparameterAcquisition seems unsuitable for marginalising out hyperparameter samples for Batch Bayesian Optimisation, including for Local Penalization.

This to me hints at the possibility that storing the HMC samples in an abstraction related to the model, rather than the acquisition, might be a more generalisable approach? (I don't have a concrete suggestion for what that might look like at the moment) If there is a relatively simple way to overcome these difficulties with Emukit methods, I'd love to know.

@BrunoKM
Copy link
Contributor Author

BrunoKM commented Mar 23, 2021

One small possible change would be to cache hyperparameter samples from model.generate_hyperparameter_samples(). This way each new instantiation of IngregratedHyperparameterAcquisition wouldn't needlessly regenerate the samples unless the model's been updated.

@apaleyes
Copy link
Collaborator

I can't claim understanding all details of what this issue is talking about. But as far as I can see, you indeed need a custom batch point calculator implementation. It can be based on existing LP implementation, or completely new. That's in line with the general idea of the library - the idea was never to support every possible use case out of the box, but to give people flexibility to twist and bend the library to adapt it to their needs.

@BrunoKM
Copy link
Contributor Author

BrunoKM commented Apr 1, 2021

I guess the main point was that combining LocalPenlizationPointCalculator and IntegratedHyperParameterAcquisition can lead to unexpected (although logical) and undesirable behaviour, since LocalPenlizationPointCalculator won't marginalise out the samples when computing the Lipschitz constant or the penalizer.

I have failed to realise that myself when first combining the two. I think at the very least it would be great to have a note in the documentation for LocalPenlizationPointCalculator warning of this behaviour.

@BrunoKM
Copy link
Contributor Author

BrunoKM commented Apr 1, 2021

If the idea is that it's on the user to construct a batch point calculator that marginalises out all the HMC samples when computing the penalized acquisition, I was wondering if there is a proper way of going about this without rewriting much of EmuKit functionality.

@apaleyes
Copy link
Collaborator

Well, it is and it isn't. Emukit is designed in a way that it shall be relatively easy for users to define their own behavior, if something isn't built in the package already.

Hw much of functionality exacly we are talking about? Again, at the moment it seems to be as though you need new acquisition implementation (that's essentially what point calculator is).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants