# Bandit Algorithms

Multi-armed bandit algorithms are fundamental to recommendation systems, online advertising, clinical trials, online resource allocation, and online search. This repository is a small collection of Python implementations: ETC, E-Greedy, Elimination, UCB, Exp3, LinearUCB, and Thompson Sampling.

Relative to the full reinforcement learning setting, bandit problems rest on three key simplifying assumptions (made concrete in the sketch after this list):

  • The observed reward corresponds only to the action taken (bandit feedback).
  • Actions do not alter the environment.
  • Taking an action does not restrict future actions.

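To illustrate these assumptions, here is a minimal sketch of the bandit interaction loop (the `BernoulliBandit` class and arm means are hypothetical examples written for this README, not code from this repository): each round the learner sees a reward only for the arm it pulled, and the environment never changes.

```python
import random

class BernoulliBandit:
    """Stationary k-armed Bernoulli bandit (hypothetical example environment)."""

    def __init__(self, means):
        self.means = means  # fixed means: actions never alter the environment

    def pull(self, arm):
        # Bandit feedback: only the pulled arm's reward is revealed.
        return 1 if random.random() < self.means[arm] else 0

bandit = BernoulliBandit([0.2, 0.5, 0.7])
for t in range(1000):
    arm = random.randrange(3)   # every arm remains available each round
    reward = bandit.pull(arm)   # rewards of the other arms stay hidden
```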
Furthermore, these algorithms differ in whether they must know the time horizon $n$ and/or the suboptimality gap $\Delta$ in advance, and in how their regret bounds scale.

| Algorithm | Environment | Requires | Regret scaling |
| --- | --- | --- | --- |
| Explore-then-Commit (ETC) | Unstructured / Stochastic | $\Delta$, $n$ | $R_n = \frac{4}{\Delta}\ln(n) + C$ |
| Epsilon-Greedy (E-Greedy) | Unstructured / Stochastic | $\Delta$ | $R_n = \frac{1}{\Delta}\ln(n) + C$ |
| Elimination | Unstructured / Stochastic | $n$ | $R_n = \frac{C_1}{\Delta}\ln(n) + C_2$ |
| Upper Confidence Bound (UCB), infinite horizon | Unstructured / Stochastic | none | $R_n = \frac{2}{\Delta}\ln(n)$ |
| Exponential-weight Explore-Exploit (Exp3), partial feedback | Adversarial / Non-stochastic | none | $R_n = \sqrt{2nk\ln(k)}$ |
| Linear UCB (LinUCB) | Contextual / Linear | none | $R_n = Cd\sqrt{n}\ln(nL)$ |
| Thompson Sampling | Structured | none | $R_n = \ln(n)$ |
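To show how a $\frac{1}{\Delta}\ln(n)$-type bound arises, here is a minimal UCB1 sketch (an illustrative example written for this README, assuming rewards in $[0, 1]$; it is not necessarily the repository's implementation). The index adds an exploration bonus $\sqrt{2\ln(t)/T_i}$ to each empirical mean, which limits pulls of a suboptimal arm to the order of $\ln(n)/\Delta^2$.

```python
import math
import random

def ucb1(pull, k, n):
    """Run UCB1 for n rounds on a k-armed bandit; pull(arm) -> reward in [0, 1]."""
    counts = [0] * k     # T_i: number of times arm i has been played
    sums = [0.0] * k     # cumulative reward of arm i
    for t in range(n):
        if t < k:
            arm = t      # play each arm once to initialize its estimate
        else:
            # Pick the arm maximizing: empirical mean + sqrt(2 ln t / T_i)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Hypothetical usage: pull counts should concentrate on the best arm (index 2).
means = [0.2, 0.5, 0.7]
print(ucb1(lambda a: 1 if random.random() < means[a] else 0, k=3, n=10_000))
```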

For a detailed treatment of each algorithm, refer to *Bandit Algorithms* (2020) by Lattimore & Szepesvári.
