
Multi-Armed Bandit Algorithm

The bandit algorithm balances exploration and exploitation, and is part of reinforcement learning.

Reinforcement Learning

In reinforcement learning, the system makes a decision based on the current situation. Unlike supervised learning, there is no training data providing the correct decision to the system; we only receive a reward back from the environment, indicating the quality of the decision that was made.

Reinforcement learning problems involve the following artifacts; a minimal interaction loop is sketched after the list.

  • State
  • Action or Decision
  • Reward
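
In the multi-armed bandit setting, this loop collapses to a single implicit state: the agent repeatedly selects an action (an arm) and observes a reward. Below is a minimal sketch of the agent-environment loop; the Agent and Environment interfaces are illustrative only and are not part of this library.

package rl

// Environment yields a reward for an action. The multi-armed bandit is
// the special case with a single implicit state, so Step takes only the
// chosen action (the arm).
type Environment interface {
	Step(action int) (reward float64)
}

// Agent chooses actions and learns from the rewards it observes.
type Agent interface {
	SelectAction() int
	Update(action int, reward float64)
}

// Run drives the agent-environment interaction loop for n rounds.
func Run(agent Agent, env Environment, n int) {
	for i := 0; i < n; i++ {
		action := agent.SelectAction()
		reward := env.Step(action)
		agent.Update(action, reward)
	}
}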

Implementations

  • Random greedy (epsilon-greedy; used in the Usage example below)
  • Upper confidence bound one (UCB1; its selection rule is sketched after this list)
  • Upper confidence bound two
  • Softmax
  • Interval estimate
  • Thompson sampling
  • Reward comparison
  • Action pursuit
  • Exponential weight
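
For instance, UCB1 is deterministic: it plays each arm once, then always picks the arm with the highest optimistic estimate, i.e. its mean reward plus an exploration bonus that shrinks as the arm is pulled. The following is a standalone sketch of the selection rule, independent of this library's API.

package ucb

import "math"

// SelectArm implements the UCB1 rule: pull every arm once, then pick the
// arm maximizing meanReward + sqrt(2*ln(totalPulls)/pulls(arm)).
// counts[i] is how often arm i was pulled; rewards[i] is its mean reward.
func SelectArm(counts []int, rewards []float64) int {
	total := 0
	for i, c := range counts {
		if c == 0 {
			return i // every arm is tried at least once
		}
		total += c
	}
	bestArm, bestScore := 0, math.Inf(-1)
	for i, c := range counts {
		bonus := math.Sqrt(2 * math.Log(float64(total)) / float64(c))
		if score := rewards[i] + bonus; score > bestScore {
			bestArm, bestScore = i, score
		}
	}
	return bestArm
}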

TODO

  • Implement tests for the different algorithms
  • Plot the results of the different algorithms
  • Pros and cons for each algorithm
  • Use cases and examples


Usage

package main

import (
	"log"
	"math/rand"
	"sync"
	"time"

	bandit "github.com/alextanhongpin/go-bandit"
)

func init() {
	// Seed the global PRNG so each run selects different arms.
	rand.Seed(time.Now().UnixNano())
}

func main() {
	// With epsilon = 0.1, the bandit explores a random arm 10% of the
	// time and exploits the best-known arm otherwise.
	b, err := bandit.NewEpsilonGreedy(0.1, nil, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Initialize the bandit with five arms.
	b.Init(5)

	N := 1000

	var wg sync.WaitGroup
	wg.Add(N)
	for i := 0; i < N; i++ {
		go func() {
			defer wg.Done()
			chosenArm := b.SelectArm(rand.Float64())
			reward := float64(rand.Intn(2)) // simulated binary reward, 0 or 1
			b.Update(chosenArm, reward)
		}()
	}

	wg.Wait()
	log.Printf("bandit: %+v", b)
	log.Println("done")
}

Test for data races:

$ go run -race main.go
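
The -race flag enables Go's race detector, which reports unsynchronized concurrent access at runtime. The run below is clean because the bandit guards its state with the embedded sync.RWMutex visible in the printed struct.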

Output:

2018/06/04 23:43:27 bandit: &{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0} Epsilon:0.1 Counts:[233 220 512 19 16] Rewards:[0.4592274678111587 0.48181818181818176 0.5097656249999998 0.3684210526315789 0.25]}
2018/06/04 23:43:27 done
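
Because the simulated rewards are fair coin flips, every arm has the same true mean of 0.5; epsilon-greedy nevertheless concentrates pulls (512 of 1000) on arm 2, whose empirical mean happened to drift highest, while the large counts on arms 0 and 1 suggest they led earlier in the run.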

Stats

$ brew install tokei
$ tokei

Output:

-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Go                     12         1120          879           74          167
 Markdown                1          103          103            0            0
 Python                  1           18           11            2            5
-------------------------------------------------------------------------------
 Total                  15         1241          993           76          172
-------------------------------------------------------------------------------
