This is the original implementation of the paper
Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping [Project Page] [Paper]
by Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari
1. Install MuJoCo
- Download MuJoCo binaries v2.2.0 here
- Unzip the downloaded archive into ~/.mujoco/
- Append the path of the MuJoCo bin subdirectory to the LD_LIBRARY_PATH environment variable
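As a minimal sketch of the last step, the snippet below appends the MuJoCo bin directory to LD_LIBRARY_PATH only if it is not already there. The directory name mujoco-2.2.0 and the MUJOCO_BIN variable are assumptions made for illustration; adjust the path to match wherever the archive was actually unpacked.

```shell
# Assumption: the archive unpacked to ~/.mujoco/mujoco-2.2.0 (hypothetical name).
# Override MUJOCO_BIN beforehand if your layout differs.
MUJOCO_BIN="${MUJOCO_BIN:-$HOME/.mujoco/mujoco-2.2.0/bin}"

# Append the directory only when it is not already listed.
case ":${LD_LIBRARY_PATH:-}:" in
  *":${MUJOCO_BIN}:"*)
    ;;  # already present, nothing to do
  *)
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${MUJOCO_BIN}"
    ;;
esac

echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```

Adding these lines to a shell startup file such as ~/.bashrc would make the setting persist across sessions.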
2. Create and activate the conda environment
conda env create -f conda_env.yml
conda activate go-fresh
3. Install mujoco-maze from source
git clone https://github.com/kngwyu/mujoco-maze.git
cd mujoco-maze
pip install --no-deps -e .
The data used to train our model and baselines can be generated as follows:
python -m go_fresh.generate_data --env maze --ep-len 1000
python -m go_fresh.generate_data --env pusher --ep-len 200
Execute the following steps (from here) to download the dataset of exploration trajectories collected on the walker environment with the proto algorithm.
git clone https://github.com/denisyarats/exorl.git
cd exorl/
./download.sh walker proto
cd ..
mv ./datasets/walker/proto/buffer data/walker
To run the baselines mentioned in the paper (HER, HER + random uniform action, and Actionable Models), run the following command:
python -m go_fresh.main +exp=<ENV>_baseline replay_buffer.algo=<ALGO>
where <ENV> is one of maze, walker, pusher, and <ALGO> is one of HER, HERu, AM.
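To cover every baseline on every environment, the command above can be wrapped in a loop. This is only a sketch: the run helper and the DRY_RUN variable are hypothetical conveniences that are not part of the repository. By default the script just prints the commands; setting DRY_RUN=0 launches them one after another.

```shell
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

count=0
for env in maze walker pusher; do
  for algo in HER HERu AM; do
    # Same command as documented above, with <ENV> and <ALGO> filled in.
    run python -m go_fresh.main "+exp=${env}_baseline" "replay_buffer.algo=${algo}"
    count=$((count + 1))
  done
done

echo "Total baseline runs: ${count}"
```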
To reproduce our method's results, run
python -m go_fresh.main +exp=<ENV>_ours
where <ENV> is one of maze, walker, pusher.
- To visualize training information with Weights & Biases, set the parameter wandb.enable=True.
- The seed can be chosen by setting the main.seed parameter. All experiments presented in the paper were run with the following random seeds: [123, 234, 345].
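The two options above can be combined to repeat a run over the three seeds used in the paper. The sketch below assumes that main.seed and wandb.enable can be passed as additional overrides on the same command line, in the same key=value form as the other parameters. The run helper, DRY_RUN, and ENV_NAME are hypothetical names introduced only for this example.

```shell
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Environment to train on: maze, walker, or pusher.
ENV_NAME="${ENV_NAME:-maze}"

count=0
for seed in 123 234 345; do
  # Assumption: both parameters are accepted as command-line overrides.
  run python -m go_fresh.main "+exp=${ENV_NAME}_ours" "main.seed=${seed}" wandb.enable=True
  count=$((count + 1))
done

echo "Total runs: ${count}"
```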
If you use this repo in your research, please consider citing the paper as follows:
@inproceedings{mezghani2022learning,
title={Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping},
author={Mezghani, Lina and Sukhbaatar, Sainbayar and Bojanowski, Piotr and Lazaric, Alessandro and Alahari, Karteek},
booktitle={CoRL-Conference on Robot Learning},
year={2022}
}
go-fresh is CC-BY-NC 4.0 licensed, as found in the LICENSE file.