Resources used are listed in each ipynb file.
The ES + A2C combination converges earlier and is more stable across episodes.
The ES algorithm used is from "Evolution-Guided Policy Gradient in Reinforcement Learning" (https://arxiv.org/abs/1805.07917).
Implementing preliminary RL algorithms:
- DQN (https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)
- Sample Efficient Actor-Critic with Experience Replay (https://arxiv.org/abs/1611.01224)
- Evolution Strategies (https://arxiv.org/abs/1703.03864)
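
For orientation, here is a minimal sketch of the kind of update used by the evolution strategies entry in the list above. It is not the code from the notebooks: the function name `evolution_strategies`, the toy objective `toy_fitness`, and all hyperparameter values are assumptions chosen for illustration. In an RL setting the fitness would be the return of an episode played with the perturbed policy parameters.

```python
import numpy as np


def evolution_strategies(fitness, theta, sigma=0.1, alpha=0.02,
                         population=50, iterations=200, seed=0):
    """Maximize `fitness` with a basic Gaussian-perturbation ES update.

    All names and default values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(iterations):
        # Sample one Gaussian perturbation per population member.
        noise = rng.standard_normal((population, theta.size))
        # Score every perturbed parameter vector.
        rewards = np.array([fitness(theta + sigma * eps) for eps in noise])
        std = rewards.std()
        if std < 1e-12:
            # All candidates scored the same, so there is no signal to follow.
            continue
        # Standardize rewards so the step size does not depend on reward scale.
        advantages = (rewards - rewards.mean()) / std
        # Reward-weighted sum of perturbations estimates the search gradient.
        gradient = noise.T @ advantages / (population * sigma)
        theta += alpha * gradient
    return theta


# Toy stand-in for an episode return: higher is better near `target`.
target = np.array([0.5, -0.3, 1.2])


def toy_fitness(params):
    return -np.sum((params - target) ** 2)


solution = evolution_strategies(toy_fitness, np.zeros(3))
print("fitness before:", toy_fitness(np.zeros(3)))
print("fitness after: ", toy_fitness(solution))
```

Standardizing the rewards is one common design choice; rank-based fitness shaping is another and could be swapped in at the `advantages` line.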