
SB3-Contrib v2.3.0: New default hyperparameters for QR-DQN

Released by @araffin on 31 Mar 18:41 (commit 5102922)

Breaking Changes:

  • Upgraded to Stable-Baselines3 >= 2.3.0
  • The default learning_starts parameter of QRDQN has been changed to 100, to be consistent with the other off-policy algorithms
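```python
import gymnasium as gym
from sb3_contrib import QRDQN

env = gym.make("CartPole-v1")  # example env; QR-DQN needs a discrete action space
# SB3 < 2.3.0 default: 50_000, which corresponded to the Atari default hyperparameters
# model = QRDQN("MlpPolicy", env, learning_starts=50_000)
# SB3 >= 2.3.0:
model = QRDQN("MlpPolicy", env, learning_starts=100)
```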
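To see why the old default mattered for short training runs, here is a simplified model of the warm-up phase. It is a sketch, not SB3's code, and assumes train_freq=4 with one gradient update per rollout and uniform random actions during warm-up. Under those assumptions, a 10_000-step run with learning_starts=50_000 never performs a single update.

```python
# Simplified model (an assumption, not SB3's actual implementation) of how
# learning_starts gates an off-policy learner: actions are sampled at random
# while fewer than learning_starts steps have been taken, and updates only run
# once the step counter has passed learning_starts.


def warmup_stats(total_timesteps: int, learning_starts: int, train_freq: int = 4) -> tuple[int, int]:
    """Return (random_action_steps, training_calls) for a run of total_timesteps."""
    random_steps = 0
    training_calls = 0
    num_timesteps = 0
    while num_timesteps < total_timesteps:
        # Collect train_freq environment steps before each potential update.
        for _ in range(train_freq):
            if num_timesteps < learning_starts:
                random_steps += 1  # warm-up step: uniform random action
            num_timesteps += 1
        # Update only once more than learning_starts steps have been collected.
        if num_timesteps > learning_starts:
            training_calls += 1
    return random_steps, training_calls


print(warmup_stats(10_000, learning_starts=50_000))  # (10000, 0)
print(warmup_stats(10_000, learning_starts=100))     # (100, 2475)
```

To reproduce the previous behavior, for example on Atari, pass learning_starts=50_000 explicitly, as in the commented-out line of the snippet above.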

New Features:

  • Added rollout_buffer_class and rollout_buffer_kwargs arguments to MaskablePPO
  • Log the success rate (rollout/success_rate) for on-policy algorithms when it is available
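The success rate is only "available" when the environment reports it. In SB3 this comes from an is_success entry in the info dict of a finished episode. The sketch below uses a hypothetical helper (SuccessRateTracker, with an assumed window of the last 100 episodes) to show how such a rate can be computed; it is an illustration, not the SB3 class.

```python
from __future__ import annotations

from collections import deque
from typing import Any


class SuccessRateTracker:
    """Hypothetical helper: success rate over the most recent finished episodes."""

    def __init__(self, window: int = 100) -> None:
        # Keep only the most recent `window` episode outcomes.
        self.buffer: deque[bool] = deque(maxlen=window)

    def update(self, infos: list[dict[str, Any]], dones: list[bool]) -> None:
        for info, done in zip(infos, dones):
            # Only finished episodes that report "is_success" are counted.
            if done and "is_success" in info:
                self.buffer.append(bool(info["is_success"]))

    def success_rate(self) -> float | None:
        # None means nothing to log: no episode has reported success yet.
        if not self.buffer:
            return None
        return sum(self.buffer) / len(self.buffer)


tracker = SuccessRateTracker(window=100)
tracker.update([{"is_success": True}, {}], [True, True])  # 2nd env reports nothing
tracker.update([{"is_success": False}, {"is_success": True}], [True, False])
print(tracker.success_rate())  # 0.5
```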

Others:

  • Fixed train_freq type annotation for tqc and qrdqn (@Armandpl)
  • Fixed sb3_contrib/common/maskable/*.py type annotations
  • Fixed sb3_contrib/ppo_mask/ppo_mask.py type annotations
  • Fixed sb3_contrib/common/vec_env/async_eval.py type annotations
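The train_freq argument accepts two shapes: an integer number of steps, or a (frequency, unit) tuple such as (5, "step") or (2, "episode"), so its annotation has to cover both. The sketch below uses hypothetical names (FrequencyUnit, TrainFrequency, normalize_train_freq) loosely modelled on the kind of conversion SB3 performs internally; it is not the library's code.

```python
from enum import Enum
from typing import NamedTuple, Union


class FrequencyUnit(Enum):
    STEP = "step"
    EPISODE = "episode"


class TrainFrequency(NamedTuple):
    frequency: int
    unit: FrequencyUnit


# The two accepted shapes: an int, or a (frequency, unit) tuple.
TrainFreqArg = Union[int, tuple[int, str]]


def normalize_train_freq(train_freq: TrainFreqArg) -> TrainFrequency:
    """Hypothetical normalizer: a bare int means "every N environment steps"."""
    if isinstance(train_freq, int):
        train_freq = (train_freq, "step")
    frequency, unit = train_freq
    # FrequencyUnit(unit) raises ValueError for anything but "step"/"episode".
    return TrainFrequency(frequency, FrequencyUnit(unit))


print(normalize_train_freq(4).unit.value)             # step
print(normalize_train_freq((2, "episode")).frequency)  # 2
```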

Documentation:

  • Added notes about MaskablePPO evaluation and multi-process usage (@icheered)
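These notes matter because a masking policy can only respect the masks it is given: if predictions are made without the current action masks, nothing stops an invalid action from being chosen during evaluation. The pure-Python sketch below (hypothetical helper names; SB3-Contrib's own masked distribution operates on PyTorch tensors) shows the core idea: invalid logits are pushed to -inf before the softmax, so the greedy action is always a valid one.

```python
import math


def masked_distribution(logits: list[float], mask: list[bool]) -> list[float]:
    """Softmax over logits where invalid actions (mask False) get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Replace invalid logits with -inf so exp() maps them to exactly 0.
    masked = [logit if ok else -math.inf for logit, ok in zip(logits, mask)]
    top = max(masked)  # finite, because at least one action is valid
    exps = [math.exp(value - top) for value in masked]
    total = sum(exps)
    return [e / total for e in exps]


def deterministic_action(logits: list[float], mask: list[bool]) -> int:
    """Greedy action among the valid ones, as used for deterministic evaluation."""
    probs = masked_distribution(logits, mask)
    return max(range(len(probs)), key=probs.__getitem__)


logits = [2.0, 0.5, 1.0]
print(deterministic_action(logits, [True, True, True]))   # 0
print(deterministic_action(logits, [False, True, True]))  # 2
```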

Full Changelog: v2.2.1...v2.3.0