Baxter-Research

Multi-agent deep reinforcement learning research project

TD3 Algorithm

The TD3 algorithm uses two critic networks and selects the smaller of their two estimates when computing the target value. To prevent overestimation errors from propagating into the policy, the policy network is updated only after a set number of time steps, while the value networks are updated at every time step. This delayed update lowers the variance of the policy updates, leading to more stable and efficient training and ultimately a better-quality policy. In this implementation, the actor network is updated every 2 time steps. The policy is also smoothed by adding random noise to the target action and averaging over mini-batches, which reduces the variance caused by overfitting.

TD3 Algorithm from https://towardsdatascience.com/td3-learning-to-run-with-ai-40dfc512f93
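
The sketch below shows how these pieces fit together in a single TD3 update step. It is a minimal illustration under stated assumptions, not the project's actual training loop: the networks (`actor`, `critic1`, `critic2`, their `*_target` copies), their optimizers, and all hyperparameter names are assumed to exist and are illustrative only.

```python
import torch
import torch.nn.functional as F

# Minimal TD3 update sketch. Assumes PyTorch modules `actor`, `critic1`,
# `critic2`, their `*_target` copies, and `actor_optimizer` /
# `critic_optimizer` already exist (hypothetical names), and that actions
# lie in [-1, 1].
def td3_update(batch, step, gamma=0.99, tau=0.005,
               policy_noise=0.2, noise_clip=0.5, policy_delay=2):
    state, action, reward, next_state, done = batch

    with torch.no_grad():
        # Target policy smoothing: add clipped random noise to the target action.
        noise = (torch.randn_like(action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (actor_target(next_state) + noise).clamp(-1.0, 1.0)

        # Clipped double Q-learning: take the smaller of the two target critics.
        target_q = torch.min(critic1_target(next_state, next_action),
                             critic2_target(next_state, next_action))
        target_q = reward + gamma * (1 - done) * target_q

    # Both critics are updated at every time step.
    critic_loss = F.mse_loss(critic1(state, action), target_q) + \
                  F.mse_loss(critic2(state, action), target_q)
    critic_optimizer.zero_grad()
    critic_loss.backward()
    critic_optimizer.step()

    # Delayed policy update: the actor and the target networks update only
    # every `policy_delay` steps (2 in this project).
    if step % policy_delay == 0:
        actor_loss = -critic1(state, actor(state)).mean()
        actor_optimizer.zero_grad()
        actor_loss.backward()
        actor_optimizer.step()

        # Soft-update the target networks toward the current networks.
        for net, target in [(actor, actor_target), (critic1, critic1_target),
                            (critic2, critic2_target)]:
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.data.mul_(1 - tau).add_(tau * p.data)
```

The `policy_delay=2` default mirrors the two-timestep actor update described above.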

  1. Van Hasselt, H., Guez, A., and Silver, D. Deep reinforcement learning with Double Q-Learning. In Thirtieth AAAI Conference on Artificial Intelligence (2016).
    To reduce bias, this method estimates the current Q value using a separate target value function.
  2. Hasselt, H. V. Double Q-Learning. In Advances in Neural Information Processing Systems (2010), 2613–2621.
    In actor-critic methods the policy is updated slowly, which makes accumulated bias a concern. Building on this earlier Double Q-Learning work, TD3 uses clipped double Q-learning, which takes the smaller value of the two critic networks (the safer choice). Even though this promotes underestimation, it is not a concern because low values will not propagate through the whole algorithm.
  3. Fujimoto, S., van Hoof, H., and Meger, D. Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477 (2018).
    Original citation for the PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3); source code.
  • Add to Overleaf summaries
  • Upload to shared articles
  4. Schaul, T., Quan, J., Antonoglou, I., and Silver, D. Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015).
    Prioritized experience replay; see the Overleaf article summary and the replay-buffer sketch after this list.
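
As referenced in item 4 above, the following is a compact sketch of proportional prioritized experience replay. It is an illustration only, not the repository's actual replay buffer: the class and method names are hypothetical, and transitions are treated as opaque tuples.

```python
import numpy as np

# Proportional prioritized replay sketch: transitions are sampled with
# probability proportional to priority**alpha, and importance-sampling
# weights correct the resulting bias. Names are illustrative, not the
# project's own implementation.
class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights, normalized by the maximum weight.
        weights = (len(self.buffer) * probs[indices]) ** (-beta)
        weights /= weights.max()
        return [self.buffer[i] for i in indices], indices, weights

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # Priorities are refreshed with the absolute TD errors after training.
        self.priorities[indices] = np.abs(td_errors) + eps
```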

Code References

  1. TD3 Algorithm Code from Towards Data Science implementation of Addressing function approximation error in actor-critic methods.
  2. OpenAI Gym, Replay Buffer and Priority Replay Buffer
  3. ROS Robotics by Example, Baxter reference for ROS including: joint angles, ... Download the book: https://drive.google.com/open?id=11UpOH1fZd1qhXr9i8tEyVa1g4NVmL-me
  4. TD3 Implementation Used for TD3 algorithm implementation.
