On-Policy Actor-Critic methods

An implementation of the following on-policy actor-critic methods: Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO). The implementation is based on the following papers:

Examples of Trained Agents

InvertedDoublePendulum-v4 (PPO ~ 300k frames)

InvertedDoublePendulum-v4-300k.mp4

HalfCheetah-v4 (PPO ~ 200k frames)

HalfCheetah-v4-200k.mp4

Running the code

Installation

To install all dependencies, run the following command:

pip install -r requirements.txt

Training

To train the agent, run the following command:

python source/train_agent.py [Training Options] [PPO Options]
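
As a rough illustration of how the options listed below combine into one invocation, the following sketch assembles the command line in Python. The helper `build_train_command`, the run name, and all flag values are hypothetical and chosen only for illustration. The sketch also assumes that flags documented without a type (such as `--log_video`) are plain switches that take no value.

```python
import shlex


def build_train_command(algorithm, env_id, run_name, extra_flags=None):
    """Assemble an argument list for source/train_agent.py (hypothetical helper)."""
    if algorithm not in ("A2C", "PPO"):
        raise ValueError("algorithm must be 'A2C' or 'PPO'")
    cmd = [
        "python", "source/train_agent.py",
        "--run_name", run_name,
        "--algorithm", algorithm,
        "--env_id", env_id,
    ]
    for flag, value in (extra_flags or {}).items():
        if value is True:
            # Assumption: untyped flags are switches that take no value.
            cmd.append(f"--{flag}")
        else:
            cmd.extend([f"--{flag}", str(value)])
    return cmd


# Illustrative values only; none of these are documented defaults.
cmd = build_train_command(
    "PPO",
    "HalfCheetah-v4",
    "ppo_halfcheetah",
    {"num_envs": 8, "ppo_clip_ratio": 0.2, "log_video": True},
)
print(shlex.join(cmd))
```

The printed string can be pasted into a shell from the repository root, or the list can be passed to `subprocess.run(cmd, check=True)`.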

Training Options:

--run_name (str): Name of the run.
--algorithm ({A2C,PPO}): Type of algorithm to use for training.
--env_id (str): Id of the environment to train on.
--perform_testing: Whether to perform testing after training.
--log_video: Whether to log video of agent's performance.
--max_epochs (int) (default: 3): Maximum number of epochs to train for.
--steps_per_epoch (int): Number of training steps per epoch.
--num_envs (int): Number of environments to train on.
--num_rollout_steps (int): Number of steps to roll out the policy for.
--optimizer ({Adam,RMSprop,SGD}): Optimizer to use for training.
--learning_rate (float): Learning rate for training.
--lr_decay (float): Learning rate decay for training.
--weight_decay (float): Weight decay (L2 regularization) for training.
--gamma (float): Discount factor.
--gae_lambda (float): Lambda parameter for Generalized Advantage Estimation (GAE).
--value_coef (float): Coefficient for value loss.
--entropy_coef (float): Coefficient for entropy loss.
--max_grad_norm (float): Maximum gradient norm for clipping.
--init_std (float): Initial standard deviation for policy.
--hidden_size (int): Hidden size for policy.
--shared_extractor: Whether to use a shared feature extractor for policy.
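
To make the roles of `--gamma`, `--gae_lambda`, `--value_coef` and `--entropy_coef` concrete, here is a minimal pure-Python sketch of Generalized Advantage Estimation and of a typical combined actor-critic loss. It is not taken from this repository. The function names, the default values (0.99, 0.95, 0.5, 0.01), and the sign convention of the entropy term are assumptions.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, gae_lambda=0.95):
    """Return (advantages, returns) for one rollout of a single environment.

    rewards[t], values[t]: reward and critic estimate at step t.
    dones[t]: True if the episode ended after step t.
    last_value: critic estimate for the state following the final step.
    The default gamma and gae_lambda are illustrative assumptions.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * gae_lambda * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def combined_loss(policy_loss, value_loss, entropy, value_coef=0.5, entropy_coef=0.01):
    """Weighted sum commonly minimized in actor-critic training (assumed form)."""
    # Entropy is subtracted so that higher entropy lowers the loss.
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


advantages, returns = compute_gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    last_value=0.0,
    dones=[False, False, True],
    gamma=0.5,
    gae_lambda=1.0,
)
print(advantages)  # [1.75, 1.5, 1.0]
```

With `gae_lambda=1.0` and a zero critic, the advantages reduce to discounted reward-to-go, which is what the printed values show.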

PPO Options:

--ppo_batch_size (int): Batch size for Proximal Policy Optimization (PPO).
--ppo_epochs (int): Number of epochs to train PPO for.
--ppo_clip_ratio (float): Clip ratio for PPO.
--ppo_clip_anneal: Whether to anneal the clip ratio for PPO.
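
The effect of `--ppo_clip_ratio` can be sketched with the clipped surrogate objective from the PPO paper, shown here for a single sample in pure Python. The function names are hypothetical. The linear schedule in `annealed_clip_ratio` is only an assumption about what `--ppo_clip_anneal` might do, since the README does not describe the schedule.

```python
import math


def clipped_surrogate(new_log_prob, old_log_prob, advantage, clip_ratio):
    """PPO clipped surrogate objective for one sample (to be maximized)."""
    # Probability ratio between the updated policy and the rollout policy.
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
    # Taking the minimum gives a pessimistic bound on the improvement.
    return min(ratio * advantage, clipped_ratio * advantage)


def annealed_clip_ratio(initial_clip, progress):
    """Linearly decay the clip ratio to zero (assumed schedule).

    progress: fraction of training completed, in [0, 1].
    """
    progress = min(max(progress, 0.0), 1.0)
    return initial_clip * (1.0 - progress)


# The ratio is 1.5 here, so a positive advantage is capped at (1 + 0.2) * 2.0.
positive = clipped_surrogate(math.log(1.5), 0.0, 2.0, 0.2)
# A negative advantage is not capped on this side: 1.5 * -2.0.
negative = clipped_surrogate(math.log(1.5), 0.0, -2.0, 0.2)
print(round(positive, 6), round(negative, 6))
```

In training, the negated mean of this objective over a minibatch of `--ppo_batch_size` samples would serve as the policy loss, and the minibatches would be revisited for `--ppo_epochs` passes.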

Rmko4/RL-On-Policy-Actor-Critic
