DDPG

Continuous control with deep reinforcement learning

The goal of these algorithms is to perform policy iteration by alternatively performing policy evaluation on the current policy with Q-learning, and then improving upon the current policy by following the policy gradient.