Go-Fresh: Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

This is the original implementation of the paper

Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping [Project Page] [Paper]

by Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari

Prerequisites

1. Install MuJoCo

Download MuJoCo binaries v2.2.0 here
Unzip the downloaded archive into ~/.mujoco/
Append the path of the MuJoCo bin subdirectory to the LD_LIBRARY_PATH environment variable
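
The last step can be sketched as a small shell helper. The directory name `mujoco-2.2.0` is an assumption about how the archive unpacks, and the function name `append_ld_path` is hypothetical; adjust the path to match your installation.

```shell
# Assumed install location; check what the archive actually unpacked to.
MUJOCO_BIN="$HOME/.mujoco/mujoco-2.2.0/bin"

# Hypothetical helper: append a directory to LD_LIBRARY_PATH unless it is already listed.
append_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
    esac
    export LD_LIBRARY_PATH
}

append_ld_path "$MUJOCO_BIN"
```

To make the setting persistent, the same lines can be placed in a shell startup file such as ~/.bashrc.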

2. Create conda environment

conda env create -f conda_env.yml
conda activate go-fresh

3. Install mujoco-maze from source

git clone https://github.com/kngwyu/mujoco-maze.git
cd mujoco-maze
pip install --no-deps -e .

Generate Data

The data used to train our model and baselines can be generated as follows:

Maze

python -m go_fresh.generate_data --env maze --ep-len 1000

Pusher

python -m go_fresh.generate_data --env pusher --ep-len 200

Walker

Execute the following steps (from here) to download the dataset of exploration trajectories collected on the walker environment with the proto algorithm.

git clone https://github.com/denisyarats/exorl.git
cd exorl/
./download.sh walker proto
cd ..
mv ./datasets/walker/proto/buffer data/walker

Run the code

Baselines

To run the baselines from the paper (HER, HER + random uniform action, and Actionable Models), use the following command:

python -m go_fresh.main +exp=<ENV>_baseline replay_buffer.algo=<ALGO>

where ENV is one of maze, walker, pusher, and ALGO is one of HER, HERu, AM.
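
As a convenience, the full grid of baseline runs can be enumerated with a short loop. This is only a sketch: the function name `print_baseline_cmds` is hypothetical, and the loop prints the commands rather than running them.

```shell
# Print one baseline command per (ENV, ALGO) pair listed above.
# Nothing is executed here; pipe the output to `sh` to actually launch the runs.
print_baseline_cmds() {
    for env_name in maze walker pusher; do
        for algo in HER HERu AM; do
            echo "python -m go_fresh.main +exp=${env_name}_baseline replay_buffer.algo=${algo}"
        done
    done
}

print_baseline_cmds
```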

Our Method

To reproduce our method's results, run

python -m go_fresh.main +exp=<ENV>_ours

where ENV is one of maze, walker, pusher.

Details

To visualize training info with Weights & Biases, set the parameter wandb.enable=True.
The seed can be chosen by setting the main.seed parameter. All experiments presented in the paper were run with the following random seeds: [123, 234, 345].
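
A minimal sketch of a seed sweep, assuming that main.seed and wandb.enable can be passed together on the command line in the same way as the replay_buffer.algo override in the baseline command. The function name `print_seed_cmds` is hypothetical, and the commands are printed rather than executed.

```shell
# Print the commands for one environment across the three seeds used in the paper.
# $1 is the environment name (maze, walker, or pusher); W&B logging is switched on.
print_seed_cmds() {
    for seed in 123 234 345; do
        echo "python -m go_fresh.main +exp=${1}_ours main.seed=${seed} wandb.enable=True"
    done
}

print_seed_cmds maze
```

Piping the output to `sh` would launch the three runs one after another.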

Citation

If you use this repo in your research, please consider citing the paper as follows:

@inproceedings{mezghani2022learning,
  title={Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping},
  author={Mezghani, Lina and Sukhbaatar, Sainbayar and Bojanowski, Piotr and Lazaric, Alessandro and Alahari, Karteek},
  booktitle={CoRL-Conference on Robot Learning},
  year={2022}
}

License

go-fresh is CC-BY-NC 4.0 licensed, as found in the LICENSE file.
