
Epsilon Decay (Nonstationary Exploration)

Note: this reduction is experimental and is subject to change

Goal:

The goal of the Epsilon Decay reduction is to gradually taper off exploration in a well-understood setting, while remaining able to increase epsilon (the exploration rate) again if the distribution of the input data changes (nonstationarity). A fixed epsilon can be wasteful in long-term stationary settings, making exploratory predictions that are not necessary for improving the model. If such a setting becomes nonstationary, however, it is important that epsilon can be increased quickly to learn the new setting.

Epsilon as Function of Examples Processed

This reduction defines epsilon as a function of the number of examples that the model has processed, using the following formula: $$\epsilon=\frac{1}{\sqrt[3]{\text{numExamples}}}$$ A model thus starts with a very high rate of exploration, which quickly tapers off and approaches $0$ over time.
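A minimal sketch of this decay schedule (plain Python, illustrative only, not the VW implementation):

```python
def epsilon(num_examples: int) -> float:
    """Exploration rate: 1 / cbrt(num_examples); high early, approaching 0."""
    return 1.0 / (num_examples ** (1.0 / 3.0))

# The rate drops quickly at first, then flattens out.
for n in (1, 10, 100, 1000, 1_000_000):
    print(n, round(epsilon(n), 4))
# 1 1.0
# 10 0.4642
# 100 0.2154
# 1000 0.1
# 1000000 0.01
```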

Running Multiple Models in Parallel with Estimators

Similar to the AutoML algorithm, Epsilon Decay runs multiple models in parallel: one model (the champion) is used for predictions (the policy), and the other models (challengers) are periodically compared with the champion and swapped into the champion position if they prove superior. Models are compared against each other using confidence-sequence estimators, which return confidence intervals with a lower bound and an upper bound. If the lower bound of one of the challengers is greater than the upper bound of the champion, we can confidently say that this challenger is better, and it becomes the new champion.
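The comparison rule can be sketched as follows (the `Estimator` type and field names here are assumptions for illustration, not VW's actual API):

```python
from dataclasses import dataclass

@dataclass
class Estimator:
    """Hypothetical confidence-sequence interval for one model."""
    lower_bound: float
    upper_bound: float

def challenger_beats_champion(challenger: Estimator, champion: Estimator) -> bool:
    # A challenger wins only when its entire confidence interval
    # sits strictly above the champion's interval.
    return challenger.lower_bound > champion.upper_bound
```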

Models Varying by Scope (Number of Examples Processed)

Epsilon Decay runs multiple models in parallel which vary by the number of examples each has processed (its scope). The idea is that the champion will have processed the largest number of examples (and thus have the lowest epsilon based on the formula above), while the challengers will have processed fewer examples (and thus have larger epsilons). If there are $N$ models indexed $1...N$, we consider the champion to be model $N$, and the challengers are models $1...N-1$. The champion is free to process as many examples as it can until it is overtaken, but each challenger is constrained to a limited number of examples based on the current example count of the champion. Specifically, given that the champion has currently processed $E$ examples, each challenger model $i$ out of $N$ models has a maximum example count of: $$E^{i/N}$$ For example, if there are 5 models $(N=5)$ and the champion has processed $1000$ examples $(E=1000)$, then challenger $1$ has a maximum example count of $1000^{1/5}≈4$, challenger $2$ has a maximum of $1000^{2/5}≈16$, challenger $3$ has a maximum of $1000^{3/5}≈63$, and challenger $4$ has a maximum of $1000^{4/5}≈251$.
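A small sketch of these challenger caps (illustrative Python; rounding to the nearest integer is an assumption, and the real implementation may round differently):

```python
def max_example_counts(champ_examples: int, num_models: int) -> list[int]:
    """Cap for challenger i (1-indexed) is E^(i/N), for i = 1 .. N-1."""
    e, n = champ_examples, num_models
    return [round(e ** (i / n)) for i in range(1, n)]

print(max_example_counts(1000, 5))
# [4, 16, 63, 251]
```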

Maintaining Model Scope Constraints

Consider a scenario where $N=5$ and a new example is processed, leaving the example counts of the models at $(2,11,55,252,1000)$. As seen above, the maximum example count of model $4$ here is $251$, and since it has exceeded that (with a count of $252$) it is removed. This removal is done by promoting all of the models beneath it to a higher position and instantiating a new model at the start. The model counts after this process are $(0,2,11,55,1000)$. Note that the promoted models all fit the constraints of their new positions, as each is smaller than the model that previously occupied that position.
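The removal-and-shift step could be sketched like this (illustrative Python, assuming the rounding used above; not VW's code):

```python
def enforce_scope(counts: list[int]) -> list[int]:
    """Drop the highest challenger over its cap, shift the models below
    it up one slot, and add a fresh model (0 examples) at the front.
    The champion in the last slot is never touched."""
    n = len(counts)
    champ = counts[-1]
    caps = [round(champ ** ((i + 1) / n)) for i in range(n - 1)]
    for i in range(n - 2, -1, -1):  # check the highest challenger first
        if counts[i] > caps[i]:
            counts = [0] + counts[:i] + counts[i + 1:]
            break  # one example arrives at a time, so one violation at most
    return counts

print(enforce_scope([2, 11, 55, 252, 1000]))
# [0, 2, 11, 55, 1000]
```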

Overtaking the Champion

Each of the challengers is compared to the champion as each new example is processed. If a challenger has a greater lower bound than the champion's upper bound, that challenger overtakes the champion: all challengers below it are promoted, and all models above it (including the champion) are removed. Say, for example, we have 5 models $(M1,M2,M3,M4,M5)$ where $M5$ is the champion. If model $M3$ overtakes the champion, the new set of models will look like $(NEW,NEW,M1,M2,M3)$. Here $M4$ and $M5$ were cleared, $M1,M2,M3$ were each promoted 2 spots, and two new models were initialized in the first two spots.
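A sketch of this overtake bookkeeping on a plain list of model labels (illustrative only):

```python
def overtake(models: list[str], winner: int) -> list[str]:
    """Challenger at 0-based index `winner` becomes the new champion;
    everything above it is dropped and fresh models fill the front."""
    n = len(models)
    fresh = [f"NEW{j}" for j in range(n - 1 - winner)]
    return fresh + models[: winner + 1]

print(overtake(["M1", "M2", "M3", "M4", "M5"], winner=2))
# ['NEW0', 'NEW1', 'M1', 'M2', 'M3']
```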

Comparing Models at Equivalent Scopes

When comparing two models, it is important that the estimators being compared have processed the same scope of examples. For this reason, each model actually has multiple estimators of different scopes so that models can be compared against each other. More specifically, if there are $N$ models $1...N$, then each model $i$ has $i$ estimators, one for each scope $1...i$, covering the models at and below its position. In the example with $5$ models $M1...M5$, the estimators would look like:

$M1: E11$

$M2: E21\ E22$

$M3: E31\ E32\ E33$

$M4: E41\ E42\ E43\ E44$

$M5: E51\ E52\ E53\ E54\ E55$

Where $EXY$ is an estimator for model $X$ which tracks the scope of model $Y$. When comparing a challenger against the champion, the challenger's largest-scope estimator is compared with the champion's estimator at that same scope. For instance, we would compare $E11$ vs $E51$, $E22$ vs $E52$, $E33$ vs $E53$, and $E44$ vs $E54$. It is important, however, to maintain the other intermediate estimators in case a given model is promoted. If a challenger overtakes the champion, its estimators are promoted diagonally as the model is promoted upwards. For example, if $M3$ overtakes the champion in the example above, then the estimators will look like:

$MX: EXX$

$MX: EXX\ EXX$

$M1: EXX\ EXX\ E11$

$M2: EXX\ EXX\ E21\ E22$

$M3: EXX\ EXX\ E31\ E32\ E33$

Where $EXX$ represents a new estimator and $MX$ represents a new model. Here you can see that $M4$ and $M5$ are removed while $M1,M2,M3$ are promoted. Similarly, the relevant estimators of $M1,M2,M3$ are promoted diagonally, while new estimators are instantiated to track the new models. A similar operation is performed when a model exceeds its maximum scope and is removed. If, for example, $M3$ exceeds its maximum scope and must be removed, the new arrangement of models and estimators will look like:

$MX: EXX$

$M1: EXX\ E11$

$M2: EXX\ E21\ E22$

$M4: EXX\ E41\ E42\ E44$

$M5: EXX\ E51\ E52\ E54\ E55$

Here models $M1$ and $M2$ are promoted up and their estimators are promoted diagonally. $M3$ is removed, and a new model is instantiated at the start with a new estimator. Models $M4$ and $M5$ shift their estimators for models $M1$ and $M2$ accordingly, while instantiating new estimators to track the new model.
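The diagonal promotion after an overtake can be sketched on a lower-triangular table of estimator labels (illustrative Python mirroring the overtake example above, not the actual data structure):

```python
def promote_estimators(est: list[list[str]], winner: int) -> list[list[str]]:
    """est[x][y] tracks model y's scope for model x (both 0-based).
    After the challenger at index `winner` becomes champion, survivors
    move up `shift` slots and their estimator rows shift diagonally."""
    n = len(est)
    shift = n - 1 - winner
    new = []
    for row in range(n):
        if row < shift:
            new.append(["EXX"] * (row + 1))          # brand-new model
        else:
            # new estimators for the fresh models, then the old row
            new.append(["EXX"] * shift + est[row - shift])
    return new

est = [["E11"],
       ["E21", "E22"],
       ["E31", "E32", "E33"],
       ["E41", "E42", "E43", "E44"],
       ["E51", "E52", "E53", "E54", "E55"]]
for row in promote_estimators(est, winner=2):  # M3 overtakes
    print(row)
# ['EXX']
# ['EXX', 'EXX']
# ['EXX', 'EXX', 'E11']
# ['EXX', 'EXX', 'E21', 'E22']
# ['EXX', 'EXX', 'E31', 'E32', 'E33']
```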
