facebookresearch/go-fresh

Go-Fresh: Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

This is the original implementation of the paper

Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping [Project Page] [Paper]

by Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari

Prerequisites

1. Install MuJoCo

  • Download the MuJoCo v2.2.0 binaries
  • Unzip the downloaded archive into ~/.mujoco/
  • Append the path of MuJoCo's bin subdirectory to the LD_LIBRARY_PATH environment variable
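
The last step above can be sketched as follows. The directory name `mujoco-2.2.0` is an assumption about how the archive unpacks, not something this README states; adjust it to match what you find under ~/.mujoco/.

```shell
# Assumption: the archive unpacked to ~/.mujoco/mujoco-2.2.0 (hypothetical name; check your system).
MUJOCO_BIN="$HOME/.mujoco/mujoco-2.2.0/bin"

# Append to LD_LIBRARY_PATH without leaving a leading ':' when the variable is empty.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${MUJOCO_BIN}"
```

Adding these two lines to your shell startup file (for example ~/.bashrc) keeps the setting across sessions.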

2. Create conda environment

conda env create -f conda_env.yml
conda activate go-fresh

3. Install mujoco-maze from source

git clone https://github.com/kngwyu/mujoco-maze.git
cd mujoco-maze
pip install --no-deps -e .

Generate Data

The data used to train our model and baselines can be generated as follows:

Maze

python -m go_fresh.generate_data --env maze --ep-len 1000

Pusher

python -m go_fresh.generate_data --env pusher --ep-len 200

Walker

Run the following steps (from the exorl repository) to download the dataset of exploration trajectories collected on the walker environment with the proto algorithm.

git clone https://github.com/denisyarats/exorl.git
cd exorl/
./download.sh walker proto
cd ..
mv ./datasets/walker/proto/buffer data/walker

Run the code

Baselines

To run the baselines from the paper (HER, HER + random uniform action, and Actionable Models), use the following command:

python -m go_fresh.main +exp=<ENV>_baseline replay_buffer.algo=<ALGO>

where <ENV> is one of maze, walker, pusher, and <ALGO> is one of HER, HERu, AM.
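
As an illustration, the loop below enumerates every environment and baseline combination listed above. It is a minimal sketch: the helper name `list_baseline_runs` is hypothetical, and it only prints the commands so they can be reviewed before launching.

```shell
# Hypothetical helper: print one baseline command per (environment, algorithm) pair.
list_baseline_runs() {
  for env in maze walker pusher; do
    for algo in HER HERu AM; do
      echo "python -m go_fresh.main +exp=${env}_baseline replay_buffer.algo=${algo}"
    done
  done
}

list_baseline_runs
```

To execute the runs instead of printing them, pipe the output to a shell (`list_baseline_runs | sh`) or submit each line to your job scheduler.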

Our Method

To reproduce our method's results, run

python -m go_fresh.main +exp=<ENV>_ours

where <ENV> is one of maze, walker, pusher.

Details

  • To visualize training information with Weights & Biases, set the parameter wandb.enable=True.

  • The seed can be chosen by setting the main.seed parameter. All experiments presented in the paper were run with the following random seeds: [123, 234, 345].
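
The two options above can be combined on one command line. The sketch below prints the three seeded runs of our method for a given environment; the helper name `list_seed_runs` is hypothetical, and enabling Weights & Biases logging here is only an example.

```shell
# Hypothetical helper: print one command per seed used in the paper.
# $1 is the environment (maze, walker, or pusher); defaults to maze.
list_seed_runs() {
  env="${1:-maze}"
  for seed in 123 234 345; do
    echo "python -m go_fresh.main +exp=${env}_ours main.seed=${seed} wandb.enable=True"
  done
}

list_seed_runs maze
```

As before, pipe the output to `sh` to run the commands one after another.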

Citation

If you use this repo in your research, please consider citing the paper as follows:

@inproceedings{mezghani2022learning,
  title={Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping},
  author={Mezghani, Lina and Sukhbaatar, Sainbayar and Bojanowski, Piotr and Lazaric, Alessandro and Alahari, Karteek},
  booktitle={CoRL-Conference on Robot Learning},
  year={2022}
}

License

go-fresh is CC-BY-NC 4.0 licensed, as found in the LICENSE file.
