week09_policy_II

Materials

This section covers some steroids for policy gradient methods, along with a cool general trick called

Practice

  • TRPO: Open In Colab

  • PPO: Open In Colab
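The PPO notebook is built around a clipped surrogate objective that keeps each update close to the policy that collected the data. The NumPy sketch below shows that objective for one batch of samples. The function name `ppo_clip_objective` and its arguments are hypothetical and do not come from the notebook. The default `clip_eps=0.2` is a commonly used value, not a requirement.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective for a batch (to be maximized).

    Hypothetical helper: logp_new / logp_old are log pi(a|s) under the
    current and data-collecting policies, advantages are estimates A(s, a).
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping removes the incentive to push the ratio outside [1-eps, 1+eps]
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Element-wise minimum gives a pessimistic bound, averaged over the batch
    return np.mean(np.minimum(unclipped, clipped))
```

With a ratio of 1.5 and advantages of +1 and -1, the positive sample contributes only 1.2 because it is clipped. The negative sample keeps its full -1.5, so the batch value is -0.15.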

More: Reinforcement learning in large/continuous action spaces

While you already know algorithms that work with continuous action spaces, it can't hurt to learn something more specialized.

  • Lecture by J. Schulman - video
  • Q-learning with normalized advantage functions - article, code1, code2
  • Deterministic policy gradient - article, post+code
  • Stochastic value gradient - article
  • Embedding large discrete action spaces for RL - article
  • Lecture by A. Seleznev, 5vision (in Russian) - video
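Normalized advantage functions make Q-learning usable with continuous actions by restricting the advantage to a quadratic in the action: Q(s, a) = V(s) - 1/2 (a - mu(s))^T P(s) (a - mu(s)), with P = L L^T and L lower-triangular. The NumPy sketch below evaluates that Q-value for a single state. It assumes a network has already produced V(s), mu(s) and an unconstrained square matrix `L_raw`. The function and argument names are hypothetical and are not taken from the linked code.

```python
import numpy as np

def naf_q_value(action, value, mu, L_raw):
    """Q(s, a) = V(s) - 0.5 * (a - mu)^T P (a - mu), with P = L L^T.

    Hypothetical helper: value, mu and L_raw stand in for network outputs
    for one state; L_raw is an unconstrained square matrix.
    """
    L = np.tril(L_raw)  # keep only the lower triangle (np.tril returns a copy)
    # Exponentiate the diagonal so it is strictly positive and P is positive definite
    L[np.diag_indices_from(L)] = np.exp(np.diag(L_raw))
    P = L @ L.T
    diff = np.asarray(action) - np.asarray(mu)
    advantage = -0.5 * diff @ P @ diff  # always <= 0, and 0 only at action == mu
    return value + advantage
```

Because P is positive definite, Q is maximized exactly at `action = mu`. Acting greedily, and computing max_a Q(s', a) for the TD target, therefore needs no inner optimization over actions.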