The TD3 algorithm uses two critic networks and takes the smaller of their two value estimates when computing the target. To prevent overestimation errors from propagating through the policy, the policy network is updated only after a set number of timesteps, while the value networks are updated at every timestep. This lowers the variance of the policy updates, leading to more stable and efficient training and ultimately a higher-quality policy. For this implementation, the actor network is updated every 2 timesteps. The target policy is also smoothed by adding clipped random noise to the target action and averaging over mini-batches, which reduces the variance caused by overfitting.
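To make the update concrete, here is a minimal sketch of one TD3 training step, assuming PyTorch. The network objects (actor, critic1, critic2, their target copies) and the optimizers are illustrative placeholders, not names from the implementation used here.

```python
# Sketch of a single TD3 update step (assumes PyTorch; all network and
# optimizer names are placeholders). batch holds tensors sampled from replay.
import torch
import torch.nn.functional as F

def td3_update(step, actor, actor_target, critic1, critic2,
               critic1_target, critic2_target, actor_opt, critic_opt,
               batch, gamma=0.99, tau=0.005, policy_noise=0.2,
               noise_clip=0.5, policy_delay=2, max_action=1.0):
    state, action, reward, next_state, done = batch

    with torch.no_grad():
        # Target policy smoothing: add clipped noise to the target action.
        noise = (torch.randn_like(action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (actor_target(next_state) + noise).clamp(-max_action, max_action)

        # Clipped double Q-learning: take the smaller of the two target critics.
        target_q = torch.min(critic1_target(next_state, next_action),
                             critic2_target(next_state, next_action))
        target_q = reward + (1.0 - done) * gamma * target_q

    # Both critics regress toward the same clipped target at every timestep.
    # (critic_opt is assumed to optimize the parameters of both critics.)
    critic_loss = F.mse_loss(critic1(state, action), target_q) + \
                  F.mse_loss(critic2(state, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed policy update: actor and targets move every `policy_delay` steps
    # (2 in this implementation).
    if step % policy_delay == 0:
        actor_loss = -critic1(state, actor(state)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Soft (Polyak) update of all target networks.
        for net, target in [(actor, actor_target), (critic1, critic1_target),
                            (critic2, critic2_target)]:
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.data.mul_(1 - tau).add_(tau * p.data)
```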
- Van Hasselt, H., Guez, A., and Silver, D. Deep reinforcement learning with Double Q-Learning. In Thirtieth AAAI Conference on Artificial Intelligence (2016).
To reduce bias, this method estimates the current Q value using a separate target value function. - Hasselt, H. V. Double Q-Learning. In Advances in Neural Information Processing Systems (2010), 2613–2621.
In actor-critic methods the policy is updated slowly, so the bias of standard Double Q-Learning remains a concern; TD3 therefore uses clipped double Q-learning instead. This takes the smaller value of the two critic networks (the safer choice). Even though this promotes underestimation, that is not a concern because, unlike overestimates, the small values do not propagate through the whole algorithm. - Fujimoto, S., van Hoof, H., and Meger, D. Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477 (2018).
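In symbols, the clipped target from the TD3 paper (with target critics Q_{θ'_1}, Q_{θ'_2}, target policy π_{φ'}, and clipped smoothing noise ε) can be written as:

$$y = r + \gamma \min_{i=1,2} Q_{\theta'_i}\big(s',\ \pi_{\phi'}(s') + \epsilon\big), \qquad \epsilon \sim \mathrm{clip}\big(\mathcal{N}(0, \sigma),\ -c,\ c\big)$$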
Citation given in the source code of the PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). - Schaul, T., Quan, J., Antonoglou, I., and Silver, D. Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015).
Prioritized experience replay - see Overleaf article summary.
- TD3 algorithm code from the Towards Data Science implementation of Addressing function approximation error in actor-critic methods.
- OpenAI Gym, Replay Buffer and Priority Replay Buffer
- TD3 Implementation: used for the TD3 algorithm implementation.
- DQN code from Richard Lenz, UNF
- State: what the agent observes at a given timestep
- Action: the input the agent provides to the environment, calculated by applying a policy to the state
- Reward: the feedback for the action (see the loop sketch below for how these fit together)
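These three pieces fit together in the standard agent-environment loop. A minimal sketch, assuming the classic OpenAI Gym API (env.reset() returns the state; env.step() returns a (state, reward, done, info) tuple); the environment name is just an example:

```python
# State/action/reward loop, assuming the classic OpenAI Gym API.
import gym

env = gym.make('Pendulum-v0')
state = env.reset()                       # State: the agent's observation
for t in range(200):
    action = env.action_space.sample()    # Action: random here; a policy maps state -> action
    state, reward, done, info = env.step(action)  # Reward: feedback for the action
    if done:
        state = env.reset()
env.close()
```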
- Getting started with Vrep
- CoppeliaSim User Manual
- Vrep/Python instructions
- ROS Robotics by Example: Baxter reference for ROS including joint angles, ... [download the book](https://drive.google.com/open?id=11UpOH1fZd1qhXr9i8tEyVa1g4NVmL-me)
Export a list of packages
$ pip freeze > requirements.txt
Install packages
$ virtualenv <env_name>
$ source <env_name>/bin/activate
(<env_name>)$ pip install -r path/to/requirements.txt
Launch V-REP with the following command; -h runs headless, -q quits V-REP when the simulation ends, and the -g argument starts a continuous remote API server on port 19999. (You'll need to update the paths to vrep.sh and the scene file for your machine.)
V-REP/vrep.sh -h -q /home/cislocal/Jupyter/V-REP_Scenes/baxter.ttt -gREMOTEAPISERVERSERVICE_19999_FALSE_FALSE
Then in the Vrep_SIM class I had the following line to start the simulation:
errorCode = vrep.simxStartSimulation(self.clientID, vrep.simx_opmode_oneshot_wait)
I placed this line after the print('Connected to remote API server') in the Vrep_SIM class.
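Put together, the connection and start-up sequence looks roughly like this. This is a sketch assuming the standard vrep.py remote API bindings that ship with V-REP; the surrounding Vrep_SIM class is simplified away here.

```python
# Connect to the remote API server started by the -g launch argument above,
# then start the simulation (assumes the standard vrep.py bindings).
import vrep

vrep.simxFinish(-1)  # close any stale connections
clientID = vrep.simxStart('127.0.0.1', 19999, True, True, 5000, 5)
if clientID != -1:
    print('Connected to remote API server')
    # Start the simulation right after connecting, as described above.
    errorCode = vrep.simxStartSimulation(clientID, vrep.simx_opmode_oneshot_wait)
else:
    print('Failed to connect to remote API server')
```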