Reinforcement-Learning

• Applied reinforcement learning to the multi-armed bandit problem: predicting which arm will be most profitable to bet on, using epsilon-greedy, UCB, and Thompson sampling.

  1. ε-greedy is the simplest method: it exploits the arm with the best current empirical reward most of the time, and with fixed probability ε explores an arm uniformly at random. A consequence of this deterministic exploration rate is that exploitation and information-gathering are sometimes poorly balanced (the method can concentrate on a local maximum), but a benefit is that if it happens upon the global maximum early on, it is willing to trust it and can produce early wins.
  2. UCB: This works better than ε-greedy because it replaces inefficient uniform-random exploration with an upper confidence bound: under-sampled arms receive an optimism bonus, so exploration is directed toward the arms whose reward estimates are most uncertain.
  3. Thompson Sampling: At each time t, we sample an expected reward Q̃(a) from the prior distribution Beta(α_a, β_a) for every action, and select the best action among the samples: a(t) = argmax_a Q̃(a). After the true reward is observed, we update that arm's Beta distribution accordingly; this is Bayesian inference, computing the posterior from the known prior and the likelihood of the observed data.
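The ε-greedy strategy above can be sketched as follows. This is a minimal illustration (not the repository's code), assuming Bernoulli-reward arms; the arm probabilities, ε, and step count are made up for the example:

```python
import random

def epsilon_greedy(true_probs, epsilon=0.1, steps=2000, seed=0):
    """epsilon-greedy bandit: with probability epsilon explore a random arm,
    otherwise exploit the arm with the best empirical mean reward."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore uniformly at random
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0  # Bernoulli draw
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return counts, total_reward
```

With these hypothetical arm probabilities, the best arm (0.8) quickly dominates the pull counts, but the fixed ε means a constant fraction of pulls is "wasted" on random exploration forever.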
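UCB's directed exploration can be sketched with the classic UCB1 rule (mean plus sqrt(2·ln t / n_a)); again a hedged illustration with made-up arm probabilities, not the repository's implementation:

```python
import math
import random

def ucb1(true_probs, steps=2000, seed=0):
    """UCB1 bandit: play the arm maximizing
    empirical mean + sqrt(2 * ln(t) / pulls), after one warm-up pull each."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    total_reward = 0
    for t in range(1, steps + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once so every bound is defined
        else:
            arm = max(
                range(n_arms),
                key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1 if rng.random() < true_probs[arm] else 0  # Bernoulli draw
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, total_reward
```

The bonus term shrinks as an arm is pulled more, so exploration automatically tapers off, in contrast to ε-greedy's fixed random-exploration rate.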
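The Thompson sampling step (sample Q̃(a) from Beta(α_a, β_a), play the argmax, update the posterior) can be sketched for Bernoulli rewards, where the Beta posterior update is simply a success/failure count; arm probabilities are again illustrative:

```python
import random

def thompson_sampling(true_probs, steps=2000, seed=0):
    """Beta-Bernoulli Thompson sampling: sample an expected reward per arm
    from its Beta posterior, play the argmax, then update that posterior."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1.0] * n_arms  # Beta alpha: 1 + observed successes (uniform prior)
    beta = [1.0] * n_arms   # Beta beta:  1 + observed failures
    total_reward = 0
    for _ in range(steps):
        # Sample Q~(a) ~ Beta(alpha_a, beta_a) for every action
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])  # a(t) = argmax
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate Bayesian update of the chosen arm's posterior
        if reward:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        total_reward += reward
    return alpha, beta, total_reward
```

Because the Beta prior is conjugate to the Bernoulli likelihood, the posterior update is exact and costs O(1) per step; arms with uncertain posteriors keep producing occasional high samples, which is what drives exploration here.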
