Imitating Unknown Policies via Exploration (IUPE)

Official PyTorch implementation of Imitating Unknown Policies via Exploration.

Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations. Recent approaches use self-supervision on fully observable, unlabeled snapshots of the states to decode state pairs into actions. However, the iterative learning scheme of these techniques is prone to getting stuck in bad local minima. We address this limitation by incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration. This substantially improves on traditional behavioral cloning by exploiting (i) a sampling mechanism to prevent bad local minima, (ii) a sampling mechanism to improve exploration, and (iii) self-attention modules to capture global features.

Imitating Unknown Policies via Exploration (IUPE) combines an Inverse Dynamics Model (IDM), which infers actions in a self-supervised fashion, with a Policy Model (PM), a function that tells the agent what to do in each possible state of the environment. IUPE further augments the Behavioral Cloning from Observation (BCO) framework with two strategies for avoiding local minima, sampling and exploration, and with self-attention modules that improve the learning of global features and, hence, generalization.
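To make the IDM/PM split concrete, here is a minimal tabular sketch of the general idea: an inverse dynamics model is fit on the agent's own labeled transitions, then used to label an action-free expert trajectory, from which a policy is derived. This is not the code in this repository. The corridor environment, the lookup-table models, and every name below (`step`, `fit_idm`, `fit_policy`) are hypothetical, and the sketch omits IUPE's sampling, exploration, and self-attention components.

```python
from collections import Counter, defaultdict

N_STATES = 5      # hypothetical toy corridor with states 0..4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic dynamics, clamped to the corridor ends."""
    delta = 1 if action == 1 else -1
    return min(max(state + delta, 0), N_STATES - 1)


def fit_idm(transitions):
    """Inverse dynamics model: map (state, next_state) to the most frequent action."""
    votes = defaultdict(Counter)
    for state, action, next_state in transitions:
        votes[(state, next_state)][action] += 1
    return {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}


def fit_policy(expert_states, idm):
    """Policy model: label expert state pairs with the IDM, then map state to action."""
    votes = defaultdict(Counter)
    for state, next_state in zip(expert_states, expert_states[1:]):
        votes[state][idm[(state, next_state)]] += 1
    return {state: counts.most_common(1)[0][0] for state, counts in votes.items()}


# Self-supervised data: the agent knows its own actions, so these are labeled.
agent_transitions = [(s, a, step(s, a)) for s in range(N_STATES) for a in ACTIONS]
idm = fit_idm(agent_transitions)

# Expert observations contain states only (here: an expert that always walks right).
expert_states = list(range(N_STATES))
policy = fit_policy(expert_states, idm)
print(policy)
```

In this toy setting the agent's transitions cover every state pair exhaustively. In practice the agent's experience covers only part of the state space, which is where sampling and exploration strategies matter.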

Downloading the data

You can download the data we used to train our models here.

Training IUPE

After downloading the expert demonstrations, you can train IUPE. There are several training scripts in the scripts directory:

./scripts/iupe_3  # Maze 3x3
./scripts/iupe_5  # Maze 5x5
./scripts/iupe_10  # Maze 10x10
./scripts/iupe_acrobot  # Acrobot
./scripts/iupe_cartpole  # Cartpole
./scripts/iupe_mountaincar  # Mountaincar

We ran IUPE on a server. If you are running locally, you might want to remove xvfb-run -a -s "-screen 0 1400x900x24" from the scripts.

Results

Performance (P) and Average Episode Reward (AER) for our approach and related work:

Models	Metric	CartPole	Acrobot	MountainCar	Maze 3x3	Maze 5x5	Maze 10x10
Expert	P	1.000	1.000	1.000	1.000	1.000	1.000
Expert	AER	442.628	-110.109	-147.265	0.963	0.970	0.981
Random	P	0.000	0.000	0.000	0.000	0.000	0.000
Random	AER	18.700	-482.600	-200.000	0.557	0.166	-0.415
BC	P	1.135	1.071	1.560	-1.207	-0.921	-0.470
BC	AER	500.000	-83.590	-117.720	0.180	-0.507	-1.000
BCO	P	1.135	0.980	0.948	0.883	-0.112	-0.416
BCO	AER	500.000	-117.600	-150.000	0.927	0.104	-0.941
ILPO	P	1.135	1.067	0.626	-1.711	-0.398	0.257
ILPO	AER	500.000	-85.300	-167.000	-0.026	-0.059	-0.020
IUPE	P	1.135	1.086	1.314	1.361	1.000	1.000
IUPE	AER	500.000	-78.100	-130.700	0.927	0.971	0.981
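The README does not define how P is computed. A reading consistent with the Expert (1.000) and Random (0.000) rows is that P rescales the average episode reward so that the random policy scores 0 and the expert scores 1. The sketch below assumes that definition. The function name `performance` is hypothetical and not taken from this repository.

```python
def performance(agent_aer, random_aer, expert_aer):
    """Assumed normalization: 0 for the random policy, 1 for the expert."""
    return (agent_aer - random_aer) / (expert_aer - random_aer)


# CartPole, IUPE: AER 500.000 against random 18.700 and expert 442.628.
cartpole_p = performance(500.0, 18.7, 442.628)

# Acrobot, BC: AER -83.590 against random -482.600 and expert -110.109.
acrobot_p = performance(-83.59, -482.6, -110.109)

print(round(cartpole_p, 3), round(acrobot_p, 3))
```

Under this assumption the CartPole and Acrobot entries in the table are reproduced. The maze columns are not reproduced from the listed values, so they are presumably aggregated differently.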

Citation

@inproceedings{GavenskiEtAl2020bmvc,
  author    = {Gavenski, Nathan and
               Monteiro, Juarez and 
               Granada, Roger and 
               Meneguzzi, Felipe and 
               Barros, Rodrigo C.},
  title     = {Imitating Unknown Policies via Exploration},
  booktitle = {Proceedings of the 31st British Machine Vision Conference},
  series    = {BMVC 2020},
  location  = {Manchester, UK},
  pages     = {1--8},
  url       = {},
  month     = {September},
  year      = {2020},
  publisher = {BMVA Press}
}
