PyTorch Implementation of Proximal Policy Optimization (PPO)

Result

[Figure: result plot]

OpenAI defines CartPole as solved "when the average reward is greater than or equal to 195.0 over 100 consecutive trials."
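
For reference, here is a minimal sketch of how that criterion can be checked during training; the names below are illustrative and not taken from this repository's code:

```python
from collections import deque

recent_returns = deque(maxlen=100)  # rolling window of the last 100 episode returns

def cartpole_solved(episode_return):
    """Record the latest episode return and report whether the 195.0 threshold is met."""
    recent_returns.append(episode_return)
    return len(recent_returns) == 100 and sum(recent_returns) / 100 >= 195.0
```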

Hyperparameters used

gamma = 0.99
lambda = 0.95
update_freq = 1
k_epoch = 3
initial_learning_rate = 0.02
eps_clip = 0.2
v_coef = 1
entropy_coef = 0.01
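
As a rough orientation (not necessarily the exact code in this repository), these hyperparameters typically enter a PPO update as follows: gamma and lambda drive Generalized Advantage Estimation, eps_clip bounds the probability ratio in the clipped surrogate objective, v_coef and entropy_coef weight the value loss and entropy bonus, and each collected batch is reused for k_epoch optimization passes starting from initial_learning_rate. A minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

# Hyperparameters from the list above.
gamma, lam = 0.99, 0.95
eps_clip, v_coef, entropy_coef = 0.2, 1.0, 0.01
k_epoch = 3

def gae_advantages(rewards, values, next_value, dones):
    """Generalized Advantage Estimation; rewards/values/dones are lists of floats for one rollout."""
    advantages, gae = [], 0.0
    for t in reversed(range(len(rewards))):
        next_v = next_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + gamma * next_v * (1 - dones[t]) - values[t]
        gae = delta + gamma * lam * (1 - dones[t]) * gae
        advantages.insert(0, gae)
    return torch.tensor(advantages)

def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns, entropy):
    """Clipped surrogate objective plus weighted value loss minus entropy bonus (all args are tensors)."""
    ratio = torch.exp(new_log_probs - old_log_probs)                       # pi_theta / pi_theta_old
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - eps_clip, 1.0 + eps_clip) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()                          # clipped surrogate
    value_loss = F.mse_loss(values, returns)                               # critic regression loss
    return policy_loss + v_coef * value_loss - entropy_coef * entropy.mean()
```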

References

Schulman et al., "Proximal Policy Optimization Algorithms," arXiv:1707.06347, 2017

seungeunrho/minimalRL

About

A concise PyTorch implementation of Proximal Policy Optimization (PPO) solving CartPole-v0.
