BAC-DAC

An OpenAI Gym toolkit for continuous control using Bayesian actor-critic (BAC) reinforcement learning.
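The README does not spell out the update rule. For orientation, here is a minimal NumPy sketch of the Bayesian actor-critic posterior mean of the policy gradient for a single episode, following the general form in Ghavamzadeh, Engel, and Valko (2016). It is not taken from BAC.py. Several choices are assumptions made for the sketch: a squared-exponential state kernel with a hypothetical `length_scale`, the noise covariance Σ = σ²HHᵀ, a Fisher information matrix estimated from the sampled score vectors, and no special handling of terminal states. `bac_gradient_mean` and its parameters are illustrative names.

```python
import numpy as np


def bac_gradient_mean(states, scores, rewards, gamma=0.99, sigma_noise=1.0, length_scale=1.0):
    """Posterior mean of the policy gradient from one episode (sketch, not BAC.py).

    states:  (T+1, d) array of visited states x_0 .. x_T
    scores:  (T+1, n) array of score vectors u(z_t) = grad_theta log pi(a_t | x_t)
    rewards: (T,)     array of rewards r_0 .. r_{T-1}
    """
    states = np.asarray(states, dtype=float)
    U = np.asarray(scores, dtype=float).T          # n x (T+1); column t is u(z_t)
    r = np.asarray(rewards, dtype=float)
    T = r.shape[0]
    n = U.shape[0]

    # Fisher information estimated from the sampled scores; jitter keeps it invertible.
    G = U @ U.T / (T + 1) + 1e-6 * np.eye(n)
    # Fisher kernel k_F(z, z') = u(z)^T G^{-1} u(z') over all pairs of visited z.
    K_F = U.T @ np.linalg.solve(G, U)
    # Squared-exponential state kernel k_x (an assumed choice of state kernel).
    sq_dist = np.sum((states[:, None, :] - states[None, :, :]) ** 2, axis=-1)
    K_x = np.exp(-sq_dist / (2.0 * length_scale**2))
    K = K_x + K_F                                  # (T+1) x (T+1) state-action kernel

    # TD matrix H (T x (T+1)): row t encodes Q(z_t) - gamma * Q(z_{t+1}).
    H = np.zeros((T, T + 1))
    rows = np.arange(T)
    H[rows, rows] = 1.0
    H[rows, rows + 1] = -gamma
    Sigma = sigma_noise**2 * (H @ H.T)             # assumed noise covariance

    # alpha = H^T (H K H^T + Sigma)^{-1} r, then E[grad eta | data] = U alpha.
    alpha = H.T @ np.linalg.solve(H @ K @ H.T + Sigma, r)
    return U @ alpha
```

One gradient-ascent step would then look like `theta += step_size * bac_gradient_mean(states, scores, rewards)`, where `step_size` is a hypothetical learning rate. The sketch leaves out the posterior covariance of the gradient, which the paper also derives, and the pooling of several episodes into one batch.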

https://youtu.be/nkaAULbHVV4

To try it, run sample/mountain_car_v0_no_jupyter.py.

Notice: I am working on a CUDA-accelerated branch and will update it here as soon as possible.

Pre-requisites

Packages

NumPy, SciPy
OpenAI Gym (MuJoCo environments not yet supported)
Pandas, matplotlib
CUDA Toolkit 11.3 (for the GPU-accelerated branch)
CuPy for CUDA 11.3 (for the GPU-accelerated branch)
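The GPU branch lists CuPy alongside NumPy. A common pattern is to bind one array module `xp` to CuPy when it imports and to NumPy otherwise, so the same linear algebra runs on either device. This is sketched here as an assumption about how such a branch might be organized, not a description of it; `to_host` and `solve_gram` are hypothetical helpers.

```python
import numpy as np

try:
    import cupy as cp  # needs a CuPy build matching the installed CUDA Toolkit
    xp = cp
except ImportError:  # CuPy not installed: fall back to NumPy on the CPU
    cp = None
    xp = np
# Note: a successful CuPy import does not guarantee a usable GPU;
# device errors would only surface when arrays are created.


def to_host(a):
    """Return a NumPy array whether `a` lives on the GPU or the CPU."""
    if cp is not None and isinstance(a, cp.ndarray):
        return cp.asnumpy(a)
    return np.asarray(a)


def solve_gram(K, y, jitter=1e-6):
    """Solve (K + jitter * I) x = y on whichever backend `xp` points to."""
    K = xp.asarray(K, dtype=float)
    y = xp.asarray(y, dtype=float)
    return xp.linalg.solve(K + jitter * xp.eye(K.shape[0]), y)
```

Because CuPy mirrors much of the NumPy API (`asarray`, `eye`, `linalg.solve`), kernel-matrix solves can move to the GPU without changing call sites; results only need `to_host` before they are handed to pandas or matplotlib.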

Hardware

Intel Core i3 (3rd generation) or better (~1 hour of simulation time for 500 BAC updates)
At least 4 GB of DDR3 RAM
(GPU branch only) Dedicated NVIDIA GPU with Compute Capability > 3.0 (https://developer.nvidia.com/cuda-gpus)
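As a rough per-update budget, taking the ~1 hour figure at face value and assuming (the hardware list does not say so) that it was measured with the 5-episodes-per-batch setting reported under Results:

```latex
\frac{3600\ \text{s}}{500\ \text{updates}} = 7.2\ \text{s/update},
\qquad
\frac{7.2\ \text{s/update}}{5\ \text{episodes/update}} = 1.44\ \text{s/episode}
```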

Results

5 episodes per batch

Thoughts

We see that the agent reaches the goal smoothly. Since this is continuous control, action_space = [-1.0, 1.0]. The agent above is more inclined to take actions near 1.0. Running the simulation for more BAC updates would probably let the agent learn to take actions near -1.0 once it is up-slope towards the goal. Currently, the simulation is CPU-bound and therefore slow; I am working on CUDA acceleration to speed up the NumPy and SciPy operations.
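One way to see how a policy can end up favoring actions near 1.0: if the actor is a Gaussian whose samples are clipped to action_space = [-1.0, 1.0] (an assumption; the README does not describe the policy class), any mean well above 1.0 saturates at the upper bound. The sketch below uses a hypothetical linear feature map `features` and returns both the raw and the clipped action. The score function it computes is the quantity a BAC critic's Fisher kernel is built from.

```python
import numpy as np

ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # continuous action_space bounds from the README


def features(state):
    """Hypothetical feature map: raw (position, velocity) plus a bias term."""
    return np.append(np.asarray(state, dtype=float), 1.0)


def sample_action(theta, state, sigma, rng):
    """Draw a ~ N(theta . phi(x), sigma^2); return the raw sample and the clipped action."""
    phi = features(state)
    raw = rng.normal(theta @ phi, sigma)
    return raw, float(np.clip(raw, ACTION_LOW, ACTION_HIGH))


def score(theta, state, raw_action, sigma):
    """Score u(x, a) = grad_theta log pi(a | x) = (a - theta . phi) * phi / sigma^2.

    Computed on the unclipped sample, i.e. clipping is treated as part of the
    environment rather than the policy (a common simplification, assumed here).
    """
    phi = features(state)
    return (raw_action - theta @ phi) * phi / sigma**2
```

With `theta = [0, 0, 5]` and `sigma = 1`, nearly every clipped action is 1.0 in every state, because the mean sits four standard deviations above the bound. Producing -1.0 on the upslope would require the state-dependent weights to change, which takes further gradient updates.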

References

Ghavamzadeh, Mohammad, Yaakov Engel, and Michal Valko. "Bayesian policy gradient and actor-critic algorithms." The Journal of Machine Learning Research 17.1 (2016): 2319-2371. (Main reference)
Ghavamzadeh, Mohammad, and Yaakov Engel. "Bayesian actor-critic algorithms." Proceedings of the 24th International Conference on Machine Learning. 2007.
Ciosek, Kamil, et al. "Better exploration with optimistic actor-critic." arXiv preprint arXiv:1910.12807 (2019).
Ghavamzadeh, Mohammad, et al. "Bayesian reinforcement learning: A survey." arXiv preprint arXiv:1609.04436 (2016).
Kurenkov, Andrey, et al. "AC-Teach: A Bayesian actor-critic method for policy learning with an ensemble of suboptimal teachers." arXiv preprint arXiv:1909.04121 (2019).
Bhatnagar, Shalabh, et al. "Natural actor–critic algorithms." Automatica 45.11 (2009): 2471-2482.

SSubhnil/BAC-DAC-gym
