GitHub - tedmoskovitz/TOP: Implementation of Tactical Optimistic and Pessimistic value estimation

Tactical Optimistic and Pessimistic estimation (TOP)

Implementation of TOP, an off-policy deep actor-critic framework for continuous control, from our paper "Tactical Optimism and Pessimism for Deep Reinforcement Learning".

Running Mujoco:

python train_top_agent.py

We've also included the saved runs across 10 seeds for each environment from the paper in the runs folder. Each file contains the reward curves used for Figure 3, and is structured as a 10 x 1000 matrix, with each row representing a different seed.
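As a rough illustration of how a 10 x 1000 seeds-by-points matrix like the one described above might be summarized, the sketch below averages across seeds to get one mean curve and one spread curve. The on-disk file format is not stated here, so the example uses a tiny made-up matrix rather than loading anything; `summarize_runs` is a hypothetical helper name, not part of this repository.

```python
import statistics


def summarize_runs(curves):
    """Return (mean, std) per column for a seeds-by-points matrix.

    `curves` is a list of rows, one row per seed, all of equal length
    (for the saved runs this would be 10 rows of 1000 points each).
    """
    if not curves or len({len(row) for row in curves}) != 1:
        raise ValueError("expected a non-empty matrix with equal-length rows")
    # zip(*curves) walks the matrix column by column, i.e. across seeds.
    columns = list(zip(*curves))
    mean = [statistics.mean(col) for col in columns]
    std = [statistics.pstdev(col) for col in columns]  # population std
    return mean, std


# Made-up stand-in data (NOT results from the paper): 3 seeds, 4 points.
example_curves = [
    [0.0, 10.0, 20.0, 30.0],
    [2.0, 12.0, 22.0, 32.0],
    [4.0, 14.0, 24.0, 34.0],
]
example_mean, example_std = summarize_runs(example_curves)
print(example_mean)
```

If the files load as NumPy arrays, the same summary is `curves.mean(axis=0)` and `curves.std(axis=0)`.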

TOP-TD3 is built on top of the fantastic TD3 implementation by Philip Ball.

Running DM Control Suite:

python top_train.py

TOP-RAD is built on top of the original RAD implementation by Misha Laskin; the majority of the files are unchanged from the original repository.

We plan to add the saved training data from the DM Control experiments (as we have for the Mujoco experiments) soon!

Requirements:

PyTorch >= 1.6.0
Tensorboard
Mujoco_py >= 2.0.2.13 (Mujoco only)
OpenAI Gym >= 0.15.7
DM Control suite (DM Control only)
dmc2gym (DM Control only)
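One possible way to check an environment against the minimum versions listed above is sketched here, using only the standard library. The distribution names (`torch`, `mujoco-py`, `gym`, and so on) are assumptions about how the packages are registered in your environment and may need adjusting; `parse_version` and `check` are hypothetical helpers, and the version parsing is deliberately simplistic (it compares leading numeric components only).

```python
from importlib import metadata
from itertools import takewhile

# Minimum versions from the requirements list; None means "any version".
# The distribution names are assumptions and may differ on your system.
REQUIREMENTS = {
    "torch": "1.6.0",
    "tensorboard": None,
    "mujoco-py": "2.0.2.13",  # Mujoco experiments only
    "gym": "0.15.7",
    "dm-control": None,       # DM Control experiments only
    "dmc2gym": None,          # DM Control experiments only
}


def parse_version(text):
    """Turn '1.6.0+cu101' into (1, 6, 0), stopping at a non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(name, minimum=None):
    """Return a one-line status string for an installed distribution."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    if minimum and parse_version(installed) < parse_version(minimum):
        return f"{name}: {installed} is older than {minimum}"
    return f"{name}: {installed} ok"


for dist_name, min_version in REQUIREMENTS.items():
    print(check(dist_name, min_version))
```

Packages marked "only" for one benchmark can be reported as not installed without affecting the other benchmark.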

Repository contents:

dmc
extras
mujoco
runs
.gitignore
README.md
