TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning via Transition Occupancy Matching (L4DC 2023)

Jason Yecheng Ma*, Kausik Sivakumar*, Jason Yan, Osbert Bastani, Dinesh Jayaraman

University of Pennsylvania

This is the official repository of the L4DC 2023 paper TOM, a policy-aware model learning method for model-based reinforcement learning. It also includes examples of running TOM, as well as the baselines from the paper, on standard MuJoCo environments.

Setup instructions

We recommend installing the required packages in a virtual environment. This repository requires Python 3.7.

1. Install a PyTorch version that supports Python 3.7 and your hardware (CPU or CUDA).
2. Download the mjpro150 binaries from https://www.roboti.us/download.html
3. Extract them to ~/.mujoco/mjpro150
4. Install mujoco_py 1.50 from source: https://github.com/openai/mujoco-py/releases/tag/1.50.1.0
5. Install gym: conda install -c conda-forge gym
6. Install tqdm: conda install -c conda-forge tqdm
7. Install matplotlib 3.5.1: conda install -c conda-forge matplotlib=3.5.1
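Before launching experiments, it can help to confirm that the environment matches the steps above. The script below is a hypothetical convenience check and is not part of the repository. The name check_setup and the module list (torch, mujoco_py, gym, tqdm, matplotlib) are assumptions drawn from these instructions. It only looks for the packages and does not verify their versions.

```python
import importlib.util
import os
import sys

# Hypothetical helper, NOT shipped with the TOM repository: it only checks
# the prerequisites listed in the setup instructions.
REQUIRED_PYTHON = (3, 7)
MUJOCO_DIR = os.path.expanduser("~/.mujoco/mjpro150")
REQUIRED_MODULES = ("torch", "mujoco_py", "gym", "tqdm", "matplotlib")


def check_setup(version_info=None, mujoco_dir=MUJOCO_DIR, modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    major_minor = tuple((version_info or sys.version_info)[:2])
    if major_minor != REQUIRED_PYTHON:
        problems.append(
            f"Python {major_minor[0]}.{major_minor[1]} found, expected "
            f"{REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}"
        )
    if not os.path.isdir(mujoco_dir):
        problems.append(f"MuJoCo binaries not found at {mujoco_dir}")
    for name in modules:
        # find_spec locates a top-level package without importing it, so
        # mujoco_py is not built or loaded just to run this check.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package '{name}' is not installed")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    for issue in issues:
        print("[setup]", issue)
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```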

MuJoCo experiments

Run the following command, substituting values for <env_name> and <method_name>:

python mbpo.py --env <env_name> --method <method_name> --seed $SEED --use_disc --f "chi"

<env_name> can be any of "Hopper-v2", "Walker2d-v2", "HalfCheetah-v2", "Ant-v2", or "Humanoid-v2".
<method_name> can be any of "tom", "mbpo", "litm", or "vaml", which run TOM (ours), MBPO, PMAC, and VaGram, respectively.
$SEED is the random seed for the run.
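To run every combination of the environments and methods listed above across several seeds, a small driver script can generate the commands. This is a hypothetical sketch, not a script from the repository. The names build_commands and run_sweep are illustrative, and the flags are copied verbatim from the command above. By default it only prints the commands (dry run) and executes them sequentially when dry_run=False.

```python
import shlex
import subprocess
from itertools import product

# Environment and method names copied from the README; the sweep helper
# itself is hypothetical and not shipped with the TOM repository.
ENVS = ["Hopper-v2", "Walker2d-v2", "HalfCheetah-v2", "Ant-v2", "Humanoid-v2"]
METHODS = ["tom", "mbpo", "litm", "vaml"]


def build_commands(envs=ENVS, methods=METHODS, seeds=(0,)):
    """Return one argv list per (env, method, seed), mirroring the README command."""
    commands = []
    for env, method, seed in product(envs, methods, seeds):
        commands.append([
            "python", "mbpo.py",
            "--env", env,
            "--method", method,
            "--seed", str(seed),
            "--use_disc",
            "--f", "chi",
        ])
    return commands


def run_sweep(commands, dry_run=True):
    """Print each command; run them one after another only if dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            # check=True stops the sweep as soon as one run exits with an error.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_sweep(build_commands(seeds=(0, 1, 2)), dry_run=True)
```

Passing argv lists to subprocess.run (rather than a shell string) avoids quoting issues with the "chi" argument. Running jobs in parallel would need a separate scheduler.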

We also support running TOM without a discriminator. In this mode, no discriminator is trained. Instead, transitions from the current policy's rollouts are assigned a fixed reward of 1, and historical rollouts in the buffer are assigned a reward of 0. To run TOM without a discriminator, use:

python mbpo.py --env <env_name> --method "tom" --seed $SEED --no_disc --f "chi"
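The fixed labeling used by the no-discriminator variant can be sketched in a few lines. This sketch assumes each transition is stored as one row of a NumPy array, for example a concatenated (state, action, next-state) vector. The function name fixed_transition_rewards and this data layout are illustrative assumptions, and the repository's own buffers may be organized differently.

```python
import numpy as np

# Minimal sketch of the no-discriminator labeling described in the README.
# The function name and the row-per-transition layout are assumptions,
# not the repository's actual code.


def fixed_transition_rewards(current_transitions, historical_transitions):
    """Stack transitions and label them: 1.0 for the current policy, 0.0 for the buffer.

    Each argument has shape (N, D): one transition per row.
    """
    current = np.asarray(current_transitions, dtype=np.float32)
    historical = np.asarray(historical_transitions, dtype=np.float32)
    transitions = np.concatenate([current, historical], axis=0)
    rewards = np.concatenate([
        np.ones(len(current), dtype=np.float32),      # current policy's rollouts
        np.zeros(len(historical), dtype=np.float32),  # historical rollouts in the buffer
    ])
    return transitions, rewards


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, r = fixed_transition_rewards(rng.normal(size=(3, 5)), rng.normal(size=(4, 5)))
    print(x.shape, r)  # (7, 5) [1. 1. 1. 0. 0. 0. 0.]
```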

MBPO, PMAC, and VaGram are the baselines against which the paper evaluates TOM.

Contact

If you have any questions about the code or the paper, please email kausik@seas.upenn.edu or jasonyma@seas.upenn.edu.

Acknowledgment

This code is partially adapted from MBPO; we thank its authors and contributors for open-sourcing their code.
