Distributed reinforcement learning based on Ray

A PyTorch implementation of reinforcement learning algorithms.

What is implemented

  • DQN
  • DDQN
  • Distributed
  • Prioritized replay buffer

What is being implemented

  • R2D2

What to implement

  • Multi-step learning
  • ...

What is special compared to other implementations

  • Observations are compressed before being saved into the replay buffer, so only 1~3 GB of RAM is needed to keep 1M historical observations (a sketch of the idea follows below).
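
The README does not show the repository's actual compression code, so the snippet below is only a minimal sketch of the idea, assuming each uint8 observation is zlib-compressed before it enters the buffer and decompressed again on sampling; the class name CompressedReplayBuffer and its methods are made up for illustration.

    import zlib
    import numpy as np

    class CompressedReplayBuffer:
        """Toy ring buffer that stores zlib-compressed uint8 observations."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.storage = []   # (compressed_bytes, shape, action, reward, done)
            self.pos = 0

        def add(self, obs, action, reward, done):
            # obs: np.uint8 array, e.g. a stack of Atari frames of shape (4, 84, 84).
            # zlib typically shrinks such frames several-fold, which is how 1M
            # observations can fit in a few GB of RAM.
            item = (zlib.compress(obs.tobytes()), obs.shape, action, reward, done)
            if len(self.storage) < self.capacity:
                self.storage.append(item)
            else:
                self.storage[self.pos] = item
            self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size):
            # Decompress only the sampled items (next-state bookkeeping is
            # omitted to keep the sketch short).
            idx = np.random.randint(len(self.storage), size=batch_size)
            batch = [self.storage[i] for i in idx]
            obs = np.stack([np.frombuffer(zlib.decompress(c), dtype=np.uint8).reshape(s)
                            for c, s, *_ in batch])
            actions = np.array([b[2] for b in batch])
            rewards = np.array([b[3] for b in batch], dtype=np.float32)
            dones = np.array([b[4] for b in batch], dtype=np.float32)
            return obs, actions, rewards, dones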

Some results

How to use

  • DQN with a uniform replay buffer

    nohup python -u main.py Asterix --alg=DQN --buffer=mmdb --num_agents=4 --num_loaders=6 --batch_size=256 --lr=0.625e-4 --suffix="DQN" --speed=8 >train.txt 2>&1 &

  • DDQN with a uniform replay buffer

    nohup python -u main.py Asterix --alg=DDQN --buffer=mmdb --num_agents=4 --num_loaders=6 --batch_size=256 --lr=0.625e-4 --suffix="DDQN" --speed=8 >train.txt 2>&1 &

  • DDQN with a prioritized replay buffer

    nohup python -u main.py Asterix --alg=DDQN --buffer=pmdb --num_agents=4 --num_loaders=6 --batch_size=256 --lr=0.150e-4 --suffix="DDQN" --speed=8 >train.txt 2>&1 &

  • Test with a trained model

    python main.py WizardOfWor --test --suffix="DDQN_gn_normal0" --resume ./model/DQN_BasicNet/WizardOfWorNoFrameskip-v4/DDQN_gn_normal0/iter_3600000K.pkl

Some notes

  1. Group normalization is added to the original DeepMind network for less hyperparameter tuning and a more consistent learning curve (a sketch follows this list). You can get insight into the consistency problem of the original network with the unit test "test_convengence" in ./test/test_opt.py, even when simply using the Pong env.
  2. The code supports multi-GPU training on a single machine, but does not support multi-machine training.
  3. On an AMD 2700 CPU and a 2070S GPU, it takes about 2 days to gather 200M environment steps with 4 agents and train the model.
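
The actual network definition lives in the repository's model code; the following is only a hedged sketch, assuming a GroupNorm layer is inserted after each convolution of the classic DeepMind DQN backbone. The class name DQNWithGroupNorm and the group count are illustrative, not taken from this repo.

    import torch
    import torch.nn as nn

    class DQNWithGroupNorm(nn.Module):
        """DeepMind-style DQN conv net with GroupNorm after each conv layer."""

        def __init__(self, n_actions, in_channels=4, groups=8):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),
                nn.GroupNorm(groups, 32),
                nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2),
                nn.GroupNorm(groups, 64),
                nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1),
                nn.GroupNorm(groups, 64),
                nn.ReLU(),
            )
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 7 * 7, 512),   # an 84x84 input gives a 7x7 feature map
                nn.ReLU(),
                nn.Linear(512, n_actions),
            )

        def forward(self, x):
            # x: float tensor of stacked frames, shape (N, 4, 84, 84), scaled to [0, 1].
            return self.head(self.features(x))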
