OPOLO: Off-Policy Learning from Observations

Research code to accompany the paper: Off-Policy Imitation Learning from Observations.

Supported Algorithms:

  • OPOLO
  • DAC and DACfO
  • BCO
  • GAIL and GAIfO

Installation:

All code is built on the stable-baselines framework.

Prerequisites
  • Python (>=3.5), CMake, and OpenMPI.
    • Please install these prerequisites by following this guideline.
  • MuJoCo:
    • Please follow the official MuJoCo installation instructions.
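
Once the prerequisites are in place, a quick sanity check is to load a MuJoCo environment from Python. This is only a sketch; it assumes gym and mujoco_py are importable (both are pulled in by the pip install step below):

import gym
import mujoco_py  # noqa: F401 -- fails here if the MuJoCo binaries or license key are missing

# Creating a MuJoCo task verifies that gym can locate the mujoco_py bindings.
env = gym.make("HalfCheetah-v2")
obs = env.reset()
print("observation shape:", obs.shape)
env.close()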

Install using pip:

cd opolo
pip install -e .
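
If the editable install succeeded, the bundled framework should be importable. A minimal check, assuming the fork keeps the upstream stable_baselines module name and layout:

import stable_baselines
from stable_baselines import TD3, TRPO  # the off-policy and on-policy learners used by the commands below

print("stable-baselines version:", stable_baselines.__version__)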

Training OPOLO:

  • Example: run OPOLO on the HalfCheetah-v2 task, using 4 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env HalfCheetah-v2 --seed 1 --algo opolo --task td3-opolo-idm-decay-reg --n-episodes 4 --log-dir your/absolute/log/path --n-timesteps -1
  • The task tag must contain the strings idm, decay, and reg (see the sketch after this list):
    • idm: use the inverse action model.
    • reg: use forward KL-divergence as regularization.
    • decay: reduce the effect of the regularization over time.
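
The snippet below restates this substring rule in code; parse_task and the flag names are hypothetical and only for illustration:

# Hypothetical illustration of how a task tag such as "td3-opolo-idm-decay-reg"
# maps to the three OPOLO components; the names here are illustrative only.
def parse_task(task: str) -> dict:
    return {
        "use_inverse_action_model": "idm" in task,
        "use_kl_regularization": "reg" in task,
        "decay_regularization": "decay" in task,
    }

print(parse_task("td3-opolo-idm-decay-reg"))
# {'use_inverse_action_model': True, 'use_kl_regularization': True, 'decay_regularization': True}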

Training Other Baselines:

  • Run DAC on the Hopper-v2 task, using 4 expert trajectories, seed = 3:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Hopper-v2 --seed 3 --algo td3dac --log-dir your/absolute/log/path --task td3-dac --n-timesteps -1 --n-episodes 4
  • Run DACfO on the Walker2d-v2 task, using 10 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Walker2d-v2 --seed 1 --algo td3dacfo --log-dir your/absolute/log/path --task td3-dacfo --n-timesteps -1 --n-episodes 10
  • Run BCO on the Swimmer-v2 task, using 4 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Swimmer-v2 --seed 1 --algo td3bco --log-dir your/absolute/log/path --task td3-bco --n-timesteps -1 --n-episodes 4
  • Run GAIL on the Swimmer-v2 task, using 4 expert trajectories, seed = 1:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Swimmer-v2 --seed 1 --algo trpogail --log-dir your/absolute/log/path --task trpo-gail --n-timesteps -1 --n-episodes 4
  • Run GAIfO on the Swimmer-v2 task, using 4 expert trajectories, seed = 3:
cd opolo-code/opolo-baselines/run
python train_agent.py --env Swimmer-v2 --seed 3 --algo trpogaifo --log-dir your/absolute/log/path --task trpo-gaifo --n-timesteps -1 --n-episodes 4
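
Since all baselines share the same entry point and differ only in --env, --algo, --task, and --seed, sweeps are easy to script. The driver below is a hypothetical convenience wrapper, not part of the repository; it re-issues the commands above for several seeds and assumes it is run from opolo-code/opolo-baselines/run:

import subprocess

# Hypothetical sweep script; the env/algo/task triples are taken from the commands above.
runs = [
    ("Hopper-v2", "td3dac", "td3-dac"),
    ("Walker2d-v2", "td3dacfo", "td3-dacfo"),
    ("Swimmer-v2", "td3bco", "td3-bco"),
]

for env, algo, task in runs:
    for seed in (1, 2, 3):
        subprocess.run(
            ["python", "train_agent.py",
             "--env", env, "--seed", str(seed), "--algo", algo, "--task", task,
             "--log-dir", "your/absolute/log/path",
             "--n-timesteps", "-1", "--n-episodes", "4"],
            check=True,
        )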

Evaluating Models

  • Assuming that you have completed training OPOLO on HalfCheetah-v2 with the commands above, using task = td3-opolo-idm-decay-reg.
  • You can then run the following command to evaluate the model:
cd opolo-code/opolo-baselines/run
python train_agent.py --env HalfCheetah-v2 --seed 1 --algo opolo --log-dir your/absolute/log/path --task eval-td3-opolo-idm-decay-reg --n-timesteps -1 --n-episodes 4
  • The command is the same as for training, except that the task flag becomes eval- + {task-used-for-training}.
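
Because evaluation only changes the task tag, the evaluation command can be derived mechanically from the training settings. The helper below is hypothetical (not part of the repository) and simply prepends the eval- prefix:

# Hypothetical helper: build the evaluation command from the training settings.
def eval_command(env, seed, algo, train_task, log_dir, n_episodes):
    return [
        "python", "train_agent.py",
        "--env", env, "--seed", str(seed), "--algo", algo,
        "--log-dir", log_dir,
        "--task", "eval-" + train_task,   # eval- + {task-used-for-training}
        "--n-timesteps", "-1", "--n-episodes", str(n_episodes),
    ]

print(" ".join(eval_command("HalfCheetah-v2", 1, "opolo",
                            "td3-opolo-idm-decay-reg",
                            "your/absolute/log/path", 4)))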

Reminders:

  • Expert trajectories can be found at:
opolo-code/opolo-baselines/expert_logs
  • Hyper-parameter settings can be found at:
opolo-code/opolo-baselines/hyperparams/
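
The exact layout of the expert files depends on how they were exported; as a rough illustration, stable-baselines-style expert datasets are .npz archives that can be inspected with numpy. The filename and keys below are assumptions, not taken from the repository:

import numpy as np

# Hypothetical path -- adjust to the actual files under expert_logs.
data = np.load("opolo-code/opolo-baselines/expert_logs/HalfCheetah-v2.npz")
print("arrays in the archive:", data.files)
for name in data.files:
    print(name, data[name].shape)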
