pat-alt/reinforcement_learning

NOTE: Exercises were either proposed or designed by the course instructors, Gergely Neu and Hrvoje Stojic.

Dynamic Programming

Problem Set 1 is about implementing policy evaluation and policy iteration methods for a simple Markov Decision Process. The implementation in R closely follows Sutton and Barto (2018). The chart below demonstrates how the optimal policy and value function are gradually learned through value iteration.

knitr::include_graphics("www/value_iteration.png")
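For reference, the backup at the heart of the method is the Bellman optimality update. Below is a minimal sketch of value iteration in R, assuming a generic MDP given as a transition array P and reward matrix R (placeholder inputs, not the exact problem-set objects):

# Minimal value iteration sketch. P is an |S| x |S| x |A| array of transition
# probabilities and R an |S| x |A| reward matrix; both are assumed inputs.
value_iteration <- function(P, R, gamma = 0.9, tol = 1e-8) {
  n_states <- dim(P)[1]
  n_actions <- dim(P)[3]
  V <- rep(0, n_states)
  repeat {
    # Q[s, a] = R[s, a] + gamma * sum_s' P[s, s', a] * V[s']
    Q <- sapply(seq_len(n_actions), function(a) R[, a] + gamma * P[, , a] %*% V)
    V_new <- apply(Q, 1, max)
    if (max(abs(V_new - V)) < tol) break
    V <- V_new
  }
  list(value = V, policy = apply(Q, 1, which.max))
}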

Multi-Armed Bandit Problems

Problem Set 2 replicates an empirical evaluation of Thompson Sampling (Chapelle and Li 2011). The code for this problem is implemented in R, with performance-critical parts accelerated through Rcpp (C++).
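As a rough illustration of the algorithm being evaluated, here is a hedged sketch of Thompson Sampling for a Bernoulli bandit in plain R. The arm probabilities and horizon are placeholders; the repository's run_simulation has its own interface and relies on Rcpp internally.

# Thompson Sampling for a K-armed Bernoulli bandit with Beta(1, 1) priors (illustrative sketch).
thompson_sampling <- function(true_probs, horizon = 1000) {
  K <- length(true_probs)
  alpha <- rep(1, K)  # posterior successes per arm
  beta  <- rep(1, K)  # posterior failures per arm
  regret <- numeric(horizon)
  for (t in seq_len(horizon)) {
    theta <- rbeta(K, alpha, beta)           # sample from each arm's posterior
    arm <- which.max(theta)                  # play the arm with the largest sample
    reward <- rbinom(1, 1, true_probs[arm])  # observe a Bernoulli reward
    alpha[arm] <- alpha[arm] + reward
    beta[arm]  <- beta[arm] + 1 - reward
    regret[t] <- max(true_probs) - true_probs[arm]
  }
  cumsum(regret)  # cumulative regret over the horizon
}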

To run the simulation from the command line, simply execute the commands below. They clone the git repository to your machine and run the simulation with the parameters passed to run_simulation.

git clone https://github.com/pat-alt/reinforcement_learning.git
cd reinforcement_learning
Rscript -e 'source("requirements.R"); run_simulation(n_sim=5,horizon=1e6,update_every = 100)'  
open 'www/user_sim.png'

With the given set of parameters the computations should only take a few minutes, and the resulting chart should look something like this:

knitr::include_graphics("www/user_sim.png")

The results from the full simulation are shown below. All details and documentation can be found in the HTML document.

knitr::include_graphics("www/ps1_sim.png")

Gaussian Processes

The first part of Problem Set 3 is about Gaussian Process regression. The plot below shows the output of a Gaussian Process regression fitted to 20 training points.

knitr::include_graphics("www/gp_reg.png")
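For context, the posterior mean and variance shown in the plot come from the standard GP regression equations. Below is a minimal sketch with a squared-exponential kernel; the kernel choice and hyperparameter values are illustrative assumptions rather than the problem-set settings.

# Squared-exponential kernel and GP posterior on a test grid (illustrative sketch).
rbf_kernel <- function(x1, x2, l = 1, sigma_f = 1) {
  sigma_f^2 * exp(-outer(x1, x2, function(a, b) (a - b)^2) / (2 * l^2))
}
gp_posterior <- function(x_train, y_train, x_test, sigma_n = 0.1, l = 1, sigma_f = 1) {
  K    <- rbf_kernel(x_train, x_train, l, sigma_f) + sigma_n^2 * diag(length(x_train))
  K_s  <- rbf_kernel(x_train, x_test, l, sigma_f)
  K_ss <- rbf_kernel(x_test, x_test, l, sigma_f)
  K_inv <- solve(K)
  mu  <- t(K_s) %*% K_inv %*% y_train      # posterior mean at the test points
  cov <- K_ss - t(K_s) %*% K_inv %*% K_s   # posterior covariance at the test points
  list(mean = as.vector(mu), var = diag(cov))
}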

Bayesian Optimization

The Bayesian optimizer gradually gets better at estimating the true function values. As it explores different points on the test grid, the uncertainty around these points shrinks. Occasionally the overall magnitude of the confidence intervals changes abruptly; this corresponds to occasions where the estimates of the optimal hyperparameters change significantly. Eventually the learned function values are very close to the true function values, and the proposed optimum coincides with the true optimum (among the test points).

knitr::include_graphics("www/bayes_opt.gif")
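At each step the optimizer fits the GP to the observations collected so far, evaluates an acquisition function on the test grid and queries the maximizer. A hedged sketch of a single step is shown below; it reuses the gp_posterior helper sketched above, and the UCB acquisition with parameter kappa is one illustrative choice among several.

# One Bayesian optimization step on a fixed test grid (maximization; illustrative sketch).
bayes_opt_step <- function(x_train, y_train, x_grid, f, kappa = 2) {
  post <- gp_posterior(x_train, y_train, x_grid)  # assumed helper from the GP sketch above
  acq <- post$mean + kappa * sqrt(post$var)       # UCB acquisition value at each grid point
  x_next <- x_grid[which.max(acq)]                # query the most promising point
  list(x = c(x_train, x_next), y = c(y_train, f(x_next)))
}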

The figure below shows the loss evolution for different acquisition functions and varying exploration parameters (by row). For UCB and EI the results are intuitive, with the former performing better overall: as the exploration rate increases, both are more likely to explore and hence commit occasional errors. For PI the results are poor across the board. The chosen exploration rates may simply be too high for PI. Another explanation may be that, due to computational constraints, I ran the multi-start hyperparameter optimization only every five iterations.

Comparison of loss evolution with different acquisition functions for varying exploration parameters (by row).
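For reference, here is a sketch of the three acquisition functions compared above, written for maximization with GP posterior mean mu and standard deviation sigma. The parameters kappa and xi stand in for the exploration parameter varied across rows; the exact parameterization used in the problem set may differ.

# Upper Confidence Bound, Expected Improvement and Probability of Improvement (illustrative sketch).
ucb <- function(mu, sigma, kappa = 2) mu + kappa * sigma
expected_improvement <- function(mu, sigma, y_best, xi = 0.01) {
  z <- (mu - y_best - xi) / sigma
  ifelse(sigma > 0, (mu - y_best - xi) * pnorm(z) + sigma * dnorm(z), 0)
}
prob_improvement <- function(mu, sigma, y_best, xi = 0.01) {
  # Falls back to a 0/1 indicator where the posterior variance has collapsed to zero.
  ifelse(sigma > 0, pnorm((mu - y_best - xi) / sigma), as.numeric(mu > y_best + xi))
}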

Approximate Dynamic Programming

There are two fundamental challenges of reinforcement learning (RL):

  1. Reward and transition functions are unknown.
  2. The state and action spaces are large.

Approximate Dynamic Programming deals with both challenges.

The chart below shows the approximate evaluations of the two deterministic policies for different numbers of sample transitions (across columns) and different feature maps (across rows). Broadly speaking, the estimates become less noisy as the number of sample transitions increases.
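The policy evaluation behind the chart rests on a linear value-function approximation estimated from sampled transitions. Below is a hedged LSTD-style sketch; the feature map phi and the transitions list are assumptions introduced for illustration, not the problem-set objects.

# LSTD-style evaluation of a fixed policy: estimates weights w such that V(s) ~ t(phi(s)) %*% w
# from sampled transitions (s, r, s_next). transitions is an assumed list with vectors s, r, s_next.
lstd_evaluate <- function(transitions, phi, gamma = 0.9) {
  d <- length(phi(transitions$s[1]))
  A <- matrix(0, d, d)
  b <- rep(0, d)
  for (i in seq_along(transitions$s)) {
    phi_s  <- phi(transitions$s[i])
    phi_sn <- phi(transitions$s_next[i])
    A <- A + phi_s %*% t(phi_s - gamma * phi_sn)  # accumulate the LSTD design matrix
    b <- b + phi_s * transitions$r[i]             # accumulate the reward-weighted features
  }
  solve(A, b)  # weight vector of the approximate value function
}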

Approximate policy iteration is noisy for high states. Nonetheless, the proposed policies are close to optimal.
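The improvement step behind those proposed policies is the usual greedy step over the estimated action values. A one-line sketch, assuming a hypothetical |S| x |A| matrix Q_hat of approximate action values:

# Greedy policy improvement: for every state, pick the action with the largest estimated value.
greedy_policy <- function(Q_hat) apply(Q_hat, 1, which.max)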

References

Chapelle, Olivier, and Lihong Li. 2011. “An Empirical Evaluation of Thompson Sampling.” Advances in Neural Information Processing Systems 24: 2249–57.

Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press.

Session Info

utils::sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.3.3     data.table_1.14.0
## 
## loaded via a namespace (and not attached):
##  [1] knitr_1.30        magrittr_2.0.1    tidyselect_1.1.0  munsell_0.5.0    
##  [5] colorspace_2.0-1  R6_2.5.0          rlang_0.4.11      fansi_0.5.0      
##  [9] highr_0.8         dplyr_1.0.2       stringr_1.4.0     tools_4.0.3      
## [13] grid_4.0.3        gtable_0.3.0      xfun_0.23         utf8_1.2.1       
## [17] withr_2.4.2       htmltools_0.5.1.1 ellipsis_0.3.2    yaml_2.2.1       
## [21] digest_0.6.27     tibble_3.1.2      lifecycle_1.0.0   crayon_1.4.1     
## [25] farver_2.1.0      purrr_0.3.4       vctrs_0.3.8       glue_1.4.2       
## [29] evaluate_0.14     rmarkdown_2.6     labeling_0.4.2    stringi_1.5.3    
## [33] compiler_4.0.3    pillar_1.6.1      generics_0.1.0    scales_1.1.1     
## [37] pkgconfig_2.0.3

About

Contains all code and coursework for a module on reinforcement learning.
