DrS

Official implementation of

DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks

by Tongzhou Mu, Minghua Liu, Hao Su (UC San Diego)

[Webpage] [Paper] [Video] [Slides]


Overview

DrS (Dense reward learning from Stages) is a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structure of a task, DrS learns high-quality dense rewards from sparse rewards and, if available, demonstrations. The learned rewards can be reused in unseen tasks, reducing the human effort of reward engineering.


Extensive experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that our learned rewards can be reused in unseen tasks, resulting in improved performance and sample efficiency of RL algorithms. The learned rewards even achieve comparable performance to human-engineered rewards on some tasks.


Installation

  1. Install all dependencies via mamba or conda by running the following commands:
mamba env create -f environment.yml
mamba activate drs

Note: mamba is a drop-in replacement for conda. Feel free to use conda if you prefer it.

  2. Download and link the necessary assets for ManiSkill2:
python -m mani_skill2.utils.download_asset partnet_mobility_faucet
python -m mani_skill2.utils.download_asset ycb
python -m mani_skill2.utils.download_asset egad
python -m mani_skill2.utils.download_asset partnet_mobility_cabinet

These commands download the assets to ./data. You may move the assets to any location; then add the following line to your ~/.bashrc or ~/.zshrc, pointing to the asset directory:

export MS2_ASSET_DIR=<path>/<to>/<data>

and restart your terminal.
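
As an optional sanity check (not part of the repository's own instructions), you can confirm that the variable is visible to Python:

python -c "import os; print(os.environ.get('MS2_ASSET_DIR'))"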


Run Experiments

DrS has two phases:

  • Reward Learning Phase: learn the dense reward function using training tasks.
  • Reward Reuse Phase: reuse the learned dense reward to train new RL agents in test tasks.

Below, we provide examples of how to reuse our pre-trained reward checkpoints, as well as how to learn your own dense reward functions.
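
To make the two phases concrete, here is a minimal schematic sketch (not the repository's actual code) of how learned per-stage discriminators can turn a sparse stage index into a dense reward. The class name, network architecture, and exact reward formula below are illustrative assumptions:

import torch
import torch.nn as nn

class StageDiscriminators(nn.Module):
    """Hypothetical per-stage discriminators (a sketch, not the repo's classes)."""

    def __init__(self, obs_dim, n_stages, hidden=256):
        super().__init__()
        # One small MLP discriminator per stage.
        self.discs = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_stages)
        )

    def dense_reward(self, obs, stage_idx):
        # Coarse part of the reward comes from the (sparse) stage index; the
        # fine-grained part comes from the current stage's discriminator,
        # squashed into (-1, 1) so stages stay strictly ordered.
        score = torch.tanh(self.discs[stage_idx](obs)).squeeze(-1)
        return stage_idx + score

With this structure, any transition in a later stage receives a strictly higher reward than transitions in earlier stages, while the discriminator provides dense guidance within each stage.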

Reward Reuse

You can skip the reward learning phase and directly use the pre-trained reward checkpoints we provide. The examples below show how to reuse these checkpoints to train an RL agent from scratch.

The following commands should be run from the repository root directory.

python drs/drs_reuse_reward_maniskill2.py --env-id TurnFaucet_DrS_reuse-v0 --n-stages 2 --control-mode pd_ee_delta_pose --disc-ckpt reward_checkpoints/TurnFaucet.pt

python drs/drs_reuse_reward_maniskill2.py --env-id PickAndPlace_DrS_reuse-v0 --n-stages 3 --control-mode pd_ee_delta_pos --disc-ckpt reward_checkpoints/PickAndPlace.pt

python drs/drs_reuse_reward_maniskill2.py --env-id OpenCabinetDoor_DrS_reuse-v0 --n-stages 3 --control-mode base_pd_joint_vel_arm_pd_joint_vel --disc-ckpt reward_checkpoints/OpenCabinetDoor.pt

Note:

  • If you want to use Weights & Biases (wandb) to track learning progress, add --track to your commands.
  • To run experiments on the task PickAndPlace_DrS_reuse-v0, you will probably need around 96 GB of memory, since the environment loads a large number of objects.
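
For intuition only: reusing a checkpoint boils down to swapping the environment's sparse reward for the learned dense one before training an RL agent. The wrapper below is a schematic sketch using the Gymnasium API; the wrapper class, the "stage" info key, and the reward_model interface are assumptions, and drs/drs_reuse_reward_maniskill2.py already handles this step for you:

import gymnasium as gym
import torch

class LearnedRewardWrapper(gym.Wrapper):
    """Replace the env's sparse reward with a learned dense reward (sketch)."""

    def __init__(self, env, reward_model):
        super().__init__(env)
        self.reward_model = reward_model  # e.g. a StageDiscriminators-like module

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # "stage" is a made-up info key; how the current stage is obtained
        # depends on the task definition.
        stage = info.get("stage", 0)
        with torch.no_grad():
            obs_t = torch.as_tensor(obs, dtype=torch.float32)
            reward = float(self.reward_model.dense_reward(obs_t, stage))
        return obs, reward, terminated, truncated, info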

Reward Learning

Instead of using our pre-trained reward checkpoints, you can also train the reward functions yourself.

The following commands should be run from the repository root directory.

python drs/drs_learn_reward_maniskill2.py --env-id TurnFaucet_DrS_learn-v0 --n-stages 2 --control-mode pd_ee_delta_pose --demo-path demo_data/TurnFaucet_100.pkl

python drs/drs_learn_reward_maniskill2.py --env-id PickAndPlace_DrS_learn-v0 --n-stages 3 --control-mode pd_ee_delta_pos --demo-path demo_data/PickAndPlace_100.pkl

python drs/drs_learn_reward_maniskill2.py --env-id OpenCabinetDoor_DrS_learn-v0 --n-stages 3 --control-mode base_pd_joint_vel_arm_pd_joint_vel --demo-path demo_data/OpenCabinetDoor_200.pkl
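
Conceptually, the reward learning phase trains one discriminator per stage boundary to separate transitions that get past that stage (together with demonstrations, if provided) from those that do not, in a GAN-like fashion. The function below is only a schematic of such an update with made-up argument names; see drs/drs_learn_reward_maniskill2.py for the actual training loop:

import torch
import torch.nn.functional as F

def discriminator_update(disc, optimizer, pos_obs, neg_obs):
    # pos_obs: observations from trajectories that got past this stage
    #          (plus demonstrations, if provided)
    # neg_obs: observations from trajectories that did not
    pos_logits = disc(pos_obs)
    neg_logits = disc(neg_obs)
    loss = (
        F.binary_cross_entropy_with_logits(pos_logits, torch.ones_like(pos_logits))
        + F.binary_cross_entropy_with_logits(neg_logits, torch.zeros_like(neg_logits))
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()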

Citation

If you find our work useful, please consider citing our paper as follows:

@inproceedings{mu2024drs,
  title={DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks},
  author={Mu, Tongzhou and Liu, Minghua and Su, Hao},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}

Acknowledgments

This codebase is built upon the CleanRL repository.

License

This project is licensed under the MIT License - see the LICENSE file for details. Note that the repository relies on third-party code, which is subject to its own licenses.
