
Releases: vwxyzjn/cleanrl

v1.0.0 CleanRL Release 🎉

14 Nov 04:06
c37a3ec

🎉 We are thrilled to announce the v1.0.0 CleanRL Release. Along with the recent publication of our CleanRL paper in the Journal of Machine Learning Research, the v1.0.0 release includes reworked documentation, new algorithm variants, support for Google's JAX framework, hyperparameter tuning utilities, and more. CleanRL has come a long way in making high-quality deep reinforcement learning implementations easy to understand and reproducible. This release is a major milestone for the project, and over 90 PRs were merged to make it happen. We would like to thank all the contributors who made this release possible.

Reworked documentation

One of the biggest changes of the v1.0.0 release is the new documentation at docs.cleanrl.dev. Great documentation is important for building a reliable and reproducible project, so we have reworked the docs to make them easier to understand and use. For each implemented algorithm, we document as much as we can to promote transparency.

Here is a list of the algorithm variants and their documentation:

Proximal Policy Gradient (PPO): ppo.py (docs), ppo_atari.py (docs), ppo_continuous_action.py (docs), ppo_atari_lstm.py (docs), ppo_atari_envpool.py (docs), ppo_atari_envpool_xla_jax.py (docs), ppo_procgen.py (docs), ppo_atari_multigpu.py (docs), ppo_pettingzoo_ma_atari.py (docs), ppo_continuous_action_isaacgym.py (docs)

Deep Q-Learning (DQN): dqn.py (docs), dqn_atari.py (docs), dqn_jax.py (docs), dqn_atari_jax.py (docs)

Categorical DQN (C51): c51.py (docs), c51_atari.py (docs)

Soft Actor-Critic (SAC): sac_continuous_action.py (docs)

Deep Deterministic Policy Gradient (DDPG): ddpg_continuous_action.py (docs), ddpg_continuous_action_jax.py (docs)

Twin Delayed Deep Deterministic Policy Gradient (TD3): td3_continuous_action.py (docs), td3_continuous_action_jax.py (docs)

Phasic Policy Gradient (PPG): ppg_procgen.py (docs)

Random Network Distillation (RND): ppo_rnd_envpool.py (docs)

We also improved the contribution guide to make it easier for new contributors to get started. We are still working on improving the documentation; if you have any suggestions, please let us know via GitHub Issues.

New algorithm variants, support for JAX

We now support JAX-based learning algorithm variants, which are usually faster than their PyTorch equivalents! The new JAX-based DQN, TD3, and DDPG implementations each come with their own docs.

For example, the DDPG + JAX benchmark results are available in the docs (see there for further detail).
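Much of this speedup comes from JIT-compiling the entire update step with jax.jit. As a rough illustration (a minimal sketch, not CleanRL's actual code), here is what a jitted Q-learning update looks like with flax and optax, which the JAX variants use; the network sizes, environment dimensions, and update signature below are illustrative assumptions:

import jax
import jax.numpy as jnp
import flax.linen as nn
import optax
from flax.training.train_state import TrainState

class QNetwork(nn.Module):
    action_dim: int

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(120)(x))
        x = nn.relu(nn.Dense(84)(x))
        return nn.Dense(self.action_dim)(x)

obs_dim, action_dim = 4, 2  # hypothetical CartPole-like sizes
q_network = QNetwork(action_dim=action_dim)
q_state = TrainState.create(
    apply_fn=q_network.apply,
    params=q_network.init(jax.random.PRNGKey(0), jnp.zeros((1, obs_dim))),
    tx=optax.adam(2.5e-4),
)

@jax.jit  # the whole TD update is compiled once and reused every training step
def update(q_state, obs, actions, td_targets):
    def loss_fn(params):
        q_values = q_state.apply_fn(params, obs)                    # (batch, action_dim)
        q_taken = q_values[jnp.arange(q_values.shape[0]), actions]  # Q(s, a) of the taken actions
        return ((q_taken - td_targets) ** 2).mean()

    loss, grads = jax.value_and_grad(loss_fn)(q_state.params)
    return q_state.apply_gradients(grads=grads), loss

Because update is a pure function of arrays and parameters, XLA can compile the forward pass, loss, and gradient computation together and reuse the compiled function on every training step, which is where most of the gap with an eager PyTorch training loop tends to come from.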

Other new algorithm variants include multi-GPU PPO, a PPO prototype that works with Isaac Gym, multi-agent Atari PPO, and refactored PPG and PPO-RND implementations.


v1.0.0b2 JAX Support and Hyperparameter Tuning

03 Oct 19:41
49168b8

🎉 I am thrilled to announce the v1.0.0b2 CleanRL Beta Release. This new release comes with exciting new features. First, we now support JAX-based learning algorithms, which are usually faster than their PyTorch equivalents! The new JAX-based DQN, TD3, and DDPG implementations each come with their own docs.


We also now have preliminary support for hyperparameter tuning via optuna (see docs), which is designed to help researchers find a single set of hyperparameters that works well across a family of environments. The current API looks like this:

import optuna
from cleanrl_utils.tuner import Tuner
tuner = Tuner(
    script="cleanrl/ppo.py",
    metric="charts/episodic_return",
    metric_last_n_average_window=50,
    direction="maximize",
    aggregation_type="average",
    target_scores={
        "CartPole-v1": [0, 500],
        "Acrobot-v1": [-500, 0],
    },
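    # search space: each suggested value is passed to the script as a command-line flag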
    params_fn=lambda trial: {
        "learning-rate": trial.suggest_loguniform("learning-rate", 0.0003, 0.003),
        "num-minibatches": trial.suggest_categorical("num-minibatches", [1, 2, 4]),
        "update-epochs": trial.suggest_categorical("update-epochs", [1, 2, 4, 8]),
        "num-steps": trial.suggest_categorical("num-steps", [5, 16, 32, 64, 128]),
        "vf-coef": trial.suggest_uniform("vf-coef", 0, 5),
        "max-grad-norm": trial.suggest_uniform("max-grad-norm", 0, 5),
        "total-timesteps": 100000,
        "num-envs": 16,
    },
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5),
    sampler=optuna.samplers.TPESampler(),
)
tuner.tune(
    num_trials=100,
    num_seeds=3,
)
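Roughly speaking, each trial samples one set of hyperparameters, runs the script once per seed on every environment listed in target_scores, normalizes the episodic returns with the given bounds, aggregates them (here by averaging), and reports the result to optuna so the pruner can stop unpromising trials early.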

We also added support for several new algorithms and environments.

I would like to cordially thank the core dev members @dosssman @yooceii @dipamc @kinalmehta for their efforts in helping maintain the CleanRL repository. I would also like to give a shout-out to our new contributors @cool-RR, @Howuhh, @jseppanen, @joaogui1, @kinalmehta, and @ALPH2H.

New CleanRL Supported Publications

Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu, & Shuicheng Yan (2022). EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=BubxnHpuMbG

New Features PR

Bug Fixes PR

Documentation PR

Misc PR

New Contributors

Full Changelog: v1.0.0b1...v1.0.0b2

v1.0.0b1 CleanRL Beta Release 🎉

07 Jun 00:18
ee262da

🎉 I am thrilled to announce the v1.0.0b1 CleanRL Beta Release. CleanRL has come a long way in making high-quality deep reinforcement learning implementations easy to understand. In this release, we have put a huge effort into revamping our documentation site, making our implementations friendlier for new users.

I would like to cordially thank the core dev members @dosssman @yooceii @Dipamc77 @bragajj for their efforts in helping maintain the CleanRL repository. I would also like to give a shout-out to our new contributors @ElliotMunro200 and @Dipamc77.

New CleanRL supported publications

New algorithm variants

Refactoring changes

Documentation changes

A significant amount of documentation changes (tracked by #121).

See the overview documentation page here: https://docs.cleanrl.dev/rl-algorithms/overview/

Miscellaneous changes

Utility changes

New Contributors

Full Changelog: v0.6.0...v1.0.0b1

v0.6.0 Major Refactoring

16 Mar 15:07
d5256e4

What's Changed

Full Changelog: v0.5.0...v0.6.0

v0.5.0

12 Nov 16:01
679e498

What's Changed

New Contributors

Full Changelog: v0.4.8...v0.5.0

v0.4.8

16 May 02:41
update docs

v0.4.7

12 May 14:26
Merge branch 'master' of https://github.com/vwxyzjn/cleanrl

v0.4.6

12 May 13:39
cloud integration

v0.4.5

19 Apr 04:32
update setup.py

v0.4.4

16 Apr 16:28
Pre-release
add reproduce utility script