
Multi-Armed Bandit Algorithm

The bandit algorithm balances exploration and exploitation, and is part of reinforcement learning.

Reinforcement Learning

In reinforcement learning, the system makes a decision based on the current situation. Unlike supervised learning, there is no training data providing the correct decision to the system; we only receive a reward back from the environment, indicating the quality of the decision that was made.

Reinforcement learning problems involve the following artifacts; a minimal interaction loop is sketched after the list.

  • State
  • Action or Decision
  • Reward
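
In the multi-armed bandit setting, this loop collapses to a single implicit state: the agent repeatedly selects an action (an arm) and observes a reward. Below is a minimal sketch of the agent-environment loop; the Agent and Environment interfaces are illustrative only and are not part of this library.

package rl

// Environment yields a reward for an action. The multi-armed bandit is
// the special case with a single implicit state, so Step takes only the
// chosen action (the arm).
type Environment interface {
	Step(action int) (reward float64)
}

// Agent chooses actions and learns from the rewards it observes.
type Agent interface {
	SelectAction() int
	Update(action int, reward float64)
}

// Run drives the agent-environment interaction loop for n rounds.
func Run(agent Agent, env Environment, n int) {
	for i := 0; i < n; i++ {
		action := agent.SelectAction()
		reward := env.Step(action)
		agent.Update(action, reward)
	}
}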

Implementations

  • Random greedy (epsilon-greedy; used in the Usage example below)
  • Upper confidence bound one (UCB1; its selection rule is sketched after this list)
  • Upper confidence bound two
  • Softmax
  • Interval estimate
  • Thompson sampling
  • Reward comparison
  • Action pursuit
  • Exponential weight
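
For instance, UCB1 is deterministic: it plays each arm once, then always picks the arm with the highest optimistic estimate, i.e. its mean reward plus an exploration bonus that shrinks as the arm is pulled. The following is a standalone sketch of the selection rule, independent of this library's API.

package ucb

import "math"

// SelectArm implements the UCB1 rule: pull every arm once, then pick the
// arm maximizing meanReward + sqrt(2*ln(totalPulls)/pulls(arm)).
// counts[i] is how often arm i was pulled; rewards[i] is its mean reward.
func SelectArm(counts []int, rewards []float64) int {
	total := 0
	for i, c := range counts {
		if c == 0 {
			return i // every arm is tried at least once
		}
		total += c
	}
	bestArm, bestScore := 0, math.Inf(-1)
	for i, c := range counts {
		bonus := math.Sqrt(2 * math.Log(float64(total)) / float64(c))
		if score := rewards[i] + bonus; score > bestScore {
			bestArm, bestScore = i, score
		}
	}
	return bestArm
}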

TODO

  • Implement tests for the different algorithms
  • Plot the results of the different algorithms
  • Pros and cons for each algorithm
  • Use cases and examples


Usage

package main

import (
	"log"
	"math/rand"
	"sync"
	"time"

	bandit "github.com/alextanhongpin/go-bandit"
)

func init() {
	// Seed the global PRNG so each run selects different arms.
	rand.Seed(time.Now().UnixNano())
}

func main() {
	// With epsilon = 0.1, the bandit explores a random arm 10% of the
	// time and exploits the best-known arm otherwise.
	b, err := bandit.NewEpsilonGreedy(0.1, nil, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Initialize the bandit with five arms.
	b.Init(5)

	N := 1000

	var wg sync.WaitGroup
	wg.Add(N)
	for i := 0; i < N; i++ {
		go func() {
			defer wg.Done()
			chosenArm := b.SelectArm(rand.Float64())
			reward := float64(rand.Intn(2)) // simulated binary reward, 0 or 1
			b.Update(chosenArm, reward)
		}()
	}

	wg.Wait()
	log.Printf("bandit: %+v", b)
	log.Println("done")
}

Test for data races:

$ go run -race main.go
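
The -race flag enables Go's race detector, which reports unsynchronized concurrent access at runtime. The run below is clean because the bandit guards its state with the embedded sync.RWMutex visible in the printed struct.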

Output:

2018/06/04 23:43:27 bandit: &{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0} Epsilon:0.1 Counts:[233 220 512 19 16] Rewards:[0.4592274678111587 0.48181818181818176 0.5097656249999998 0.3684210526315789 0.25]}
2018/06/04 23:43:27 done
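
Because the simulated rewards are fair coin flips, every arm has the same true mean of 0.5; epsilon-greedy nevertheless concentrates pulls (512 of 1000) on arm 2, whose empirical mean happened to drift highest, while the large counts on arms 0 and 1 suggest they led earlier in the run.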

Stats

$ brew install tokei
$ tokei

Output:

-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Go                     12         1120          879           74          167
 Markdown                1          103          103            0            0
 Python                  1           18           11            2            5
-------------------------------------------------------------------------------
 Total                  15         1241          993           76          172
-------------------------------------------------------------------------------
