Reinforcement Learning Methods with PyTorch

Try different reinforcement learning methods with PyTorch on OpenAI Gym! All algorithms are validated on Pendulum-v0.

Requirement

To run the code, you need:

torch 0.4
gym 0.10

Method

Four algorithm variants are implemented:

DDQN with discretized action space
DDPG with continuous action space
PPO with discretized action space
PPO with continuous action space

Note that the PPO implementations here use the value function to estimate advantages, which differs from the original algorithm.
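
One way to read the note on PPO above is as a one-step temporal-difference advantage, A(s, a) = r + gamma * V(s') - V(s), computed from the critic alone rather than with, for example, generalized advantage estimation. The sketch below assumes that reading. The function name `td_advantages`, the discount value, and the sample numbers are illustrative and not taken from the repository.

```python
def td_advantages(rewards, values, next_values, gamma=0.9):
    """One-step TD advantage: A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    No terminal mask is applied, on the assumption that Pendulum episodes
    end only because of the time limit.
    """
    return [
        r + gamma * v_next - v
        for r, v, v_next in zip(rewards, values, next_values)
    ]


# Made-up critic outputs for three consecutive transitions.
rewards = [-1.0, -0.5, -0.25]
values = [-4.0, -3.0, -2.0]        # V(s_t)
next_values = [-3.0, -2.0, -1.0]   # V(s_{t+1})

advantages = td_advantages(rewards, values, next_values, gamma=0.5)
print(advantages)
```

A positive advantage means the transition turned out better than the critic expected, so the policy update would raise the probability of that action.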

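For the discretized variants, a continuous torque has to be recovered from a discrete action index. A minimal sketch follows, assuming an evenly spaced grid over the Pendulum-v0 torque range of [-2, 2]. The helper names and the number of bins are hypothetical, and the repository may discretize differently.

```python
def make_action_table(num_actions=11, low=-2.0, high=2.0):
    """Evenly spaced torques covering [low, high], endpoints included."""
    step = (high - low) / (num_actions - 1)
    return [low + i * step for i in range(num_actions)]


def index_to_torque(index, table):
    """Wrap the chosen torque in a list, since Pendulum takes a 1-D action."""
    return [table[index]]


table = make_action_table(num_actions=5)
print(table)                      # the five available torques
print(index_to_torque(4, table))  # largest positive torque
```

A coarse grid keeps the network's output layer small, at the cost of being unable to apply torques between grid points.
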
Result

The moving-average episode rewards are shown below:

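A moving average like the one plotted can be computed in several ways. The sketch below uses an exponential running average as one common choice. The function name and the decay factor are assumptions, not necessarily what the training scripts use.

```python
def running_average(episode_rewards, decay=0.9):
    """Exponentially smoothed reward curve, seeded with the first episode."""
    smoothed = []
    avg = None
    for reward in episode_rewards:
        avg = reward if avg is None else decay * avg + (1 - decay) * reward
        smoothed.append(avg)
    return smoothed


# Made-up episode returns, for illustration only.
curve = running_average([-1000.0, -800.0, -600.0], decay=0.5)
print(curve)
```

A larger decay gives a smoother curve that reacts more slowly to recent episodes.
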
The heatmaps of value and action are shown below:

From the results, we find that the value-based algorithms are more data-efficient because they are off-policy. A discretized action space is easier to train, but the resulting control is not smooth: the action trembles.
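
The data efficiency of off-policy methods comes largely from reusing past transitions, typically through a replay buffer. The sketch below is a generic, minimal buffer written for illustration. The class name, field names, and capacity are assumptions and do not describe the buffers in `dqn.py` or `ddpg.py`.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.memory.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    # Dummy transitions; the reward doubles as a step counter here.
    buffer.push(state=step, action=0, reward=float(step), next_state=step + 1)

batch = buffer.sample(2)
print(len(buffer), len(batch))
```

Each stored transition can be sampled for many gradient updates, whereas an on-policy method such as PPO discards its data once the policy has been updated.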
