Planning #1

bstee615 · 2020-11-24T07:09:32Z

Components

- Implement the main agent with Trust Region Policy Optimization (TRPO, see Link)
  - Use the InvertedPendulum environment in OpenAI Gym
  - Use the Stable Baselines implementation of TRPO
  - Use hyperparameters below
- Extend InvertedPendulum's Gym environment with adversarial actions
  - See new environment docs, mujoco-py docs, and source
- Add adversarial training to the main agent's training loop (see Section 3.3, Algorithm 1)
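
As a rough illustration of the last component, the sketch below assumes that the adversarial training loop alternates between updating the main agent (protagonist) with the adversary frozen and updating the adversary with the protagonist frozen. `StubPolicy`, `train_alternating`, and the parameters `n_iter`, `n_protagonist`, and `n_adversary` are hypothetical names chosen for this example; they are not taken from Stable Baselines or rllab, and the stub only records when it would have been updated instead of running TRPO.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubPolicy:
    """Hypothetical stand-in for a TRPO learner; it only records its updates."""
    name: str
    updates: List[int] = field(default_factory=list)

    def update(self, opponent: "StubPolicy", iteration: int) -> None:
        # A real learner would collect rollouts against the frozen opponent
        # and take one policy-optimization step here.
        self.updates.append(iteration)


def train_alternating(protagonist: StubPolicy, adversary: StubPolicy,
                      n_iter: int, n_protagonist: int, n_adversary: int) -> None:
    """Alternate protagonist and adversary updates for n_iter outer iterations."""
    for i in range(n_iter):
        # Phase 1: improve the protagonist while the adversary stays fixed.
        for _ in range(n_protagonist):
            protagonist.update(opponent=adversary, iteration=i)
        # Phase 2: improve the adversary while the protagonist stays fixed.
        for _ in range(n_adversary):
            adversary.update(opponent=protagonist, iteration=i)


protagonist = StubPolicy("protagonist")
adversary = StubPolicy("adversary")
train_alternating(protagonist, adversary, n_iter=3, n_protagonist=2, n_adversary=1)
print(protagonist.updates)  # [0, 0, 1, 1, 2, 2]
print(adversary.updates)    # [0, 1, 2]
```

Swapping `StubPolicy.update` for a real TRPO step would keep the same outer structure; how many steps each side gets per iteration is a tuning choice that this sketch leaves open.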

- OpenAI Gym & MuJoCo
  - Tasks are InvertedPendulum, HalfCheetah, Swimmer, Hopper, Walker2d, and Ant
- Custom Gym environment for adversarial actions
- Custom training code built with rllab
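
One possible shape for a custom environment with adversarial actions is a thin wrapper around an existing environment. The sketch below is an assumption-heavy toy: `ToyEnv` is a hypothetical 1-D stand-in (not InvertedPendulum), `AdversarialEnv`, `adversary_policy`, and `max_force` are made-up names, and the disturbance is simply added to the agent's action, whereas a MuJoCo environment would more likely apply it as an external force on the body. It follows the older Gym convention where `step` returns `(obs, reward, done, info)`.

```python
from typing import Callable, Dict, Tuple


class ToyEnv:
    """Hypothetical 1-D stand-in for a Gym environment (not InvertedPendulum)."""

    def __init__(self) -> None:
        self.state = 0.0

    def reset(self) -> float:
        self.state = 0.0
        return self.state

    def step(self, action: float) -> Tuple[float, float, bool, Dict]:
        self.state += action
        reward = -abs(self.state)      # staying near zero is rewarded
        done = abs(self.state) > 1.0   # episode ends once the state drifts too far
        return self.state, reward, done, {}


class AdversarialEnv:
    """Wraps an environment and perturbs every step with a bounded adversary force."""

    def __init__(self, env: ToyEnv, adversary_policy: Callable[[float], float],
                 max_force: float) -> None:
        self.env = env
        self.adversary_policy = adversary_policy
        self.max_force = max_force
        self.last_obs = 0.0

    def reset(self) -> float:
        self.last_obs = self.env.reset()
        return self.last_obs

    def step(self, action: float) -> Tuple[float, float, bool, Dict]:
        # The adversary sees the same observation as the agent; clip its output.
        raw = self.adversary_policy(self.last_obs)
        force = max(-self.max_force, min(self.max_force, raw))
        obs, reward, done, info = self.env.step(action + force)
        # Zero-sum assumption: the adversary's reward is the negated agent reward.
        info = dict(info, adversary_force=force, adversary_reward=-reward)
        self.last_obs = obs
        return obs, reward, done, info


env = AdversarialEnv(ToyEnv(), adversary_policy=lambda obs: 5.0, max_force=0.25)
env.reset()
obs, reward, done, info = env.step(0.5)
print(obs, reward, done, info["adversary_force"])  # 0.75 -0.75 False 0.25
```

Keeping the adversary inside a wrapper leaves the base environment untouched, so the same agent code can be evaluated with and without disturbances.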

bstee615 mentioned this issue on Nov 24, 2020: Main agent #2 (open, 3 tasks)