A new version of world models using Echo-state networks and random weight-fixed CNNs

Shahdsaf/Semi-Supervised-World-Models

World Models with PyTorch

A variant of World Models built on echo-state networks and random fixed-weight CNNs, implemented in PyTorch. The controller is trained with a reinforcement learning algorithm, Proximal Policy Optimization (PPO).

Requirement

To run the code, you need

Semi-Supervised-World-Models Framework

Environment:

CarRacing-v0 from gym.

Models:

  • VM Model: The original World Models approach by David Ha and Jürgen Schmidhuber. A VAE is trained to encode the static environment (single images as inputs), and a recurrent neural network variant is then trained to learn the temporal dynamics of the environment. Finally, a "Controller" is optimized to plan and learn a policy; the original paper uses CMA-ES for this step. In this project the controller is instead trained with a reinforcement learning method, Proximal Policy Optimization (PPO).
  • RCRC Model: A modified version of World Models by Hanten Chang and Katsuya Futagami. A random fixed-weight CNN encodes the static environment (single images as inputs). A reservoir computing model, the Echo-State Network (ESN), then captures the temporal dynamics of the environment, since it provides recurrent features without requiring training. Finally, a "Controller" is optimized to plan and learn a policy; the original paper uses CMA-ES for this step. In this project the controller is instead trained with Proximal Policy Optimization (PPO).
  • VRC Model: A novel variant of RCRC that we implemented and tested in this project. It combines VM and RCRC in a single approach: a pretrained VAE encodes the static environment (single images as inputs), and an Echo-State Network captures the temporal dynamics of the environment. PPO is again used for the controller.
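
To make the Echo-State Network idea shared by RCRC and VRC more concrete, here is a minimal, dependency-free sketch of a reservoir with fixed random weights. It is an illustration only and not the code in this repository, which uses PyTorch. The function names (make_reservoir, esn_step, run_reservoir), the leak rate, and the sizes are all hypothetical. One simplification to note: the recurrent matrix is rescaled by its largest absolute row sum rather than by its spectral radius; this only guarantees that the spectral radius stays below the chosen scale.

```python
import math
import random


def make_reservoir(n_inputs, n_units, scale=0.9, seed=0):
    """Draw fixed random input and recurrent weights; they are never trained."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_units)]
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_units)]
         for _ in range(n_units)]
    # Assumption: rescale so the largest absolute row sum equals `scale`.
    # This bounds the spectral radius by `scale` without an eigen-solver.
    max_row_sum = max(sum(abs(v) for v in row) for row in w)
    w = [[v * scale / max_row_sum for v in row] for row in w]
    return w_in, w


def esn_step(state, u, w_in, w, leak=0.5):
    """Leaky update: x' = (1 - leak) * x + leak * tanh(W_in u + W x)."""
    new_state = []
    for i in range(len(state)):
        pre = sum(a * b for a, b in zip(w_in[i], u))
        pre += sum(a * b for a, b in zip(w[i], state))
        new_state.append((1.0 - leak) * state[i] + leak * math.tanh(pre))
    return new_state


def run_reservoir(inputs, w_in, w, leak=0.5, initial_state=None):
    """Feed a sequence of encoded frames through the reservoir."""
    state = list(initial_state) if initial_state is not None else [0.0] * len(w)
    states = []
    for u in inputs:
        state = esn_step(state, u, w_in, w, leak)
        states.append(state)
    return states


if __name__ == "__main__":
    # Stand-in for encoder outputs (VAE latents or CNN features), 4 values each.
    frames = [[0.1 * t, -0.2, 0.3, 0.05 * t] for t in range(10)]
    w_in, w = make_reservoir(n_inputs=4, n_units=16)
    states = run_reservoir(frames, w_in, w)
    # One possible controller input: encoder features plus reservoir state.
    controller_input = frames[-1] + states[-1]
    print(len(controller_input))
```

In this sketch only a downstream controller would have trainable parameters; the reservoir acts as a fixed temporal feature extractor, which is the property the RCRC and VRC descriptions above rely on.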

For further details, please refer to our paper. For up-to-date results, please refer to our presentation.

Training

To train an agent, run python train.py --render; each model has its own train.py script.

To test a trained agent, run python test.py --render in the corresponding model subfolder.

Performance

Acknowledgement

Our project relied highly on the following repos:

Authors

Shahd Safarani and Yiyao Wei