Adaptive MCTS modifications to the AlphaZero algorithm for single-player, deterministic environments

The AlphaZero algorithm, modified to work with single-player, deterministic environments. It adds adaptive return normalization in Monte Carlo tree search (MCTS) and, optionally, an adaptive number of MCTS iterations based on the variance of returns at the root node.
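
As a rough illustration of the two ideas named above, here is a minimal sketch. It is not taken from this repository: the class and function names, the min-max scaling scheme, the PUCT-style score, and the stopping rule (stop once the root return variance falls below a threshold) are all assumptions chosen for illustration.

```python
import math
import statistics


class ReturnNormalizer:
    """Tracks the smallest and largest return seen so far in the search.

    Hypothetical helper: min-max scaling is one possible way to adapt
    the return scale; the repository may normalize differently.
    """

    def __init__(self):
        self.low = math.inf
        self.high = -math.inf

    def update(self, ret):
        self.low = min(self.low, ret)
        self.high = max(self.high, ret)

    def normalize(self, ret):
        # Until two distinct returns have been seen there is no scale to use.
        if self.high <= self.low:
            return 0.5
        return (ret - self.low) / (self.high - self.low)


def ucb_score(q, child_visits, parent_visits, prior, normalizer, c=1.5):
    """PUCT-style score with the mean return q mapped to [0, 1]."""
    exploration = c * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return normalizer.normalize(q) + exploration


def search_should_stop(root_returns, min_iters=16, max_iters=128,
                       var_threshold=0.01):
    """Assumed stopping rule: end the search early once the population
    variance of the returns backed up to the root is below var_threshold,
    always running at least min_iters and at most max_iters iterations."""
    n = len(root_returns)
    if n < min_iters:
        return False
    if n >= max_iters:
        return True
    return statistics.pvariance(root_returns) < var_threshold
```

Normalizing returns keeps the exploitation term on the same scale as the exploration term regardless of the environment's reward range, which is why a single exploration constant can then be reused across environments.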

Requirements

The code runs with TensorFlow 1.4. Dependencies are listed in requirements.txt.

Tutorial

Training configurations are expected in JSON format in the /hypes folder.
Training runs can be executed in parallel with the loop_hyper_iters.py script.

Example of how to run on the racecar problem:

python loop_hyper_iters.py --environment 'hypes' --base_json hypes/alphago_racecar.json --loop_json hypes/loop_config.json --train_file train_pybullet.py --outdir /out/path/ --thread_count 3

The loop_json file specifies the hyperparameters to vary in the grid search. The base_json file specifies all remaining hyperparameters.
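
The relationship between the two files can be pictured as a grid expansion over a shared base configuration. The sketch below is only an assumption about how such a merge could work: the `expand_grid` function and the keys `lr`, `n_mcts`, and `c` are hypothetical and are not read from the repository's JSON files or from loop_hyper_iters.py.

```python
import itertools
import json


def expand_grid(base_config, loop_config):
    """Yield one full config per combination of the values in loop_config."""
    names = sorted(loop_config)
    for values in itertools.product(*(loop_config[name] for name in names)):
        config = dict(base_config)          # shallow copy of shared settings
        config.update(zip(names, values))   # override with this grid point
        yield config


# Hypothetical file contents; the real keys may differ.
base = json.loads('{"lr": 0.001, "n_mcts": 32, "c": 1.5}')
loop = json.loads('{"n_mcts": [16, 32, 64, 128], "lr": [0.001, 0.0001]}')

# 4 values of n_mcts times 2 values of lr gives 8 training runs.
configs = list(expand_grid(base, loop))
```

Each resulting dictionary could then be passed to one training run, with several runs executing at once up to the limit given by --thread_count.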

Result visualizations

Result visualization tools can be found in the notebooks folder.

Example visualizations for pybullet's 'Racecar-v0' environment. The number of MCTS iterations is varied over N = {16, 32, 64, 128}; 'adaptive' denotes an adaptive number of MCTS iterations based on root return variance:

Policy and value network predictions: changes in policy network prediction entropy and value network prediction over a fixed training time on a fixed state space grid.
State space exploration: visited state space points in the last 100 episodes of training.

Episodic rewards.

References

The AlphaZero implementation is based on https://github.com/tmoer/alphazero_singleplayer by @tmoer.
