reinforcement-learning-algorithm

  • object oriented: all of the RL agents share the same framework (base class Agent), which makes the code easy to read and understand (a sketch follows this list)
  • perfect reproduction: training results are exactly the same under the same random seed
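
As a rough illustration of that framework, here is a minimal sketch of what a shared Agent base class with seed control could look like. The method names (set_random_seed, select_action, learn, run_episode), the PyTorch usage, and the pre-0.26 gym API are assumptions for illustration, not necessarily what this repository does.

```python
import random

import numpy as np
import torch


class Agent:
    """Shared skeleton: owns the env, the seeding, and the interaction loop."""

    def __init__(self, env, seed=0):
        self.env = env
        self.set_random_seed(seed)

    def set_random_seed(self, seed):
        # seeding every source of randomness is what makes a run
        # exactly reproducible under the same seed
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        self.env.seed(seed)               # pre-0.26 gym API, matching the env versions above
        self.env.action_space.seed(seed)

    def select_action(self, state):
        raise NotImplementedError         # each algorithm overrides this

    def learn(self, state, action, reward, next_state, done):
        raise NotImplementedError         # each algorithm overrides this

    def run_episode(self):
        state, episode_reward, done = self.env.reset(), 0.0, False
        while not done:
            action = self.select_action(state)
            next_state, reward, done, _ = self.env.step(action)
            self.learn(state, action, reward, next_state, done)
            state, episode_reward = next_state, episode_reward + reward
        return episode_reward
```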

algorithms implemented and the corresponding papers

online results

training results of the agents learning to solve each problem from scratch

CartPole-v1

(training curve for CartPole-v1)

MountainCar-v0

the original environment is hard to converge on,
so I modified the reward to solve this problem and got the result below (a sketch of this kind of reward shaping follows the plot)

(training curve for MountainCar-v0)
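
For illustration, a minimal sketch of this kind of reward shaping as a gym.Wrapper, assuming the pre-0.26 gym API used by MountainCar-v0; the shaping term below is hypothetical and the one actually used in this repository may differ.

```python
import gym


class ShapedMountainCar(gym.Wrapper):
    """Add a dense bonus on top of the sparse -1-per-step reward so the
    agent gets a learning signal before it ever reaches the flag."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        position, velocity = obs
        # hypothetical shaping: reward distance from the valley bottom
        # (position -0.5) and speed, both of which correlate with progress
        reward += abs(position + 0.5) + 10 * abs(velocity)
        return obs, reward, done, info


# training would use the wrapped env; evaluation uses the original one
train_env = ShapedMountainCar(gym.make('MountainCar-v0'))
```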

LunarLander-v2

(training curve for LunarLander-v2)

Acrobot-v1

(training curve for Acrobot-v1)

Pendulum-v0

Note that there is no reward threshold at which Pendulum-v0 is considered solved, but as you can see in the result, the agent did learn something

(training curve for Pendulum-v0)

HalfCheetah-v3

(training curve for HalfCheetah-v3)

offline results

online training is not always stable:
sometimes the agent reaches a high reward (or running reward)
and then its performance declines rapidly.
So I saved some of the policies produced during training and tested their performance separately (see the evaluation sketch below)
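
For illustration, a minimal sketch of such an offline evaluation loop, assuming the policies are saved as PyTorch checkpoints during training; the checkpoint file name, the QNetwork class, and the greedy act method are placeholders, not this repository's actual API.

```python
import gym
import torch


def evaluate(env_id, policy_net, episodes=10, render=False):
    """Run a saved policy greedily (no exploration) and report the mean return."""
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            if render:
                env.render()
            with torch.no_grad():
                action = policy_net.act(state)        # hypothetical greedy action method
            state, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    env.close()
    return sum(returns) / len(returns)


# hypothetical usage: pick a checkpoint saved during training and test it
# policy_net = QNetwork(state_dim, action_dim)            # placeholder network class
# policy_net.load_state_dict(torch.load('checkpoint.pt'))
# print(evaluate('CartPole-v1', policy_net, episodes=20))
```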

CartPole-v1

(test result plot: CartPole-v1; agent visualization: CartPole-v1-visualize)

MountainCar-v0

although the agent was trained on the modified environment,
I still use the original one to test the policy,
which illustrates what was actually learned (see the sketch after the plot)

(test result plot: MountainCar-v0; agent visualization: MountainCar-v0-visualize)
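
A minimal sketch of that protocol: the policy is evaluated on the unmodified MountainCar-v0, and the success criterion below (reaching the flag before the 200-step limit) is an assumption for illustration; policy stands in for whatever maps a state to a discrete action.

```python
import gym


def success_rate(policy, episodes=20):
    env = gym.make('MountainCar-v0')   # original reward, original termination
    successes = 0
    for _ in range(episodes):
        state, done, steps = env.reset(), False, 0
        while not done:
            state, reward, done, _ = env.step(policy(state))
            steps += 1
        # MountainCar-v0 truncates at 200 steps, so finishing earlier
        # means the car actually reached the flag
        successes += steps < 200
    env.close()
    return successes / episodes
```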

LunarLander-v2

(test result plot: LunarLander-v2; agent visualization: LunarLander-v2-visualize)

Acrobot-v1

(test result plot: Acrobot-v1; agent visualization: Acrobot-v1-visualize)

Pendulum-v0

(agent visualization: Pendulum-v0-visualize)

HalfCheetah-v3

(test result plot: HalfCheetah-v3; agent visualization: HalfCheetah-v3-visualize)

inspired by