
ARP-DT

Official Jax/Flax implementation of ARP-DT.

Guide Your Agent with Adaptive Multimodal Rewards (Accepted to NeurIPS 2023)
Changyeon Kim (KAIST), Younggyo Seo (Dyson Robot Learning Lab), Hao Liu (UC Berkeley), Lisa Lee (Google DeepMind),
Jinwoo Shin (KAIST), Honglak Lee (University of Michigan, LG AI Research), Kimin Lee (KAIST)

[Figure: model architecture]

This implementation has been tested on Nvidia GPUs.

The code supports the following methods and baselines:

  • InstructRL: Training a transformer policy with pretrained multimodal MAE encodings, with and without instructions.
  • ARP-DT: Training a transformer policy conditioned on multimodal rewards from pre-trained CLIP, together with pretrained multimodal MAE encodings.
  • ARP-DT+: Training a transformer policy conditioned on multimodal rewards from fine-tuned CLIP, together with pretrained multimodal MAE encodings.

Installation

Install the dependencies with pip.

conda create -n arp python=3.8
conda activate arp
cat requirements.txt | xargs -n 1 -L 1 pip install
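
After installing, you may want a quick sanity check that JAX detects your GPU. The snippet below is only a suggestion and not part of the repository:

# sanity_check.py -- optional: verify that JAX sees the GPU backend.
import jax

print(jax.devices())           # e.g. [cuda(id=0)] when a CUDA-enabled jaxlib is installed
print(jax.default_backend())   # should print "gpu" on a working NVIDIA setup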

Supported Environments

We support the following environments and environment types.

Env name | Env type
CoinRun  | none, aisc, aisc_gem
Maze     | none, aisc, yellowline, redline_yellowgem, reddiag_redstraight_yellowgem

For experiments in our paper, we use the following environments.

Task            | Train (env_name / env_type) | Test (env_name / env_type)
CoinRun         | coinrun / none              | coinrun / aisc
CoinRun-bluegem | coinrun / none              | coinrun / aisc_gem
Maze I          | maze / aisc                 | maze / none
Maze II         | maze / yellowline           | maze / redline
Maze III        | maze / redline_yellowgem    | maze / reddiag_redstraight_yellowgem

Usage

1. Download Expert Demonstrations

We provide expert datasets used for training in our Procgen experiments.

Task                     | Link (Google Drive)
CoinRun, CoinRun-bluegem | Link
Maze I                   | Link
Maze II, Maze III        | Link
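
If you want to peek into a downloaded dataset before training, the sketch below lists every dataset inside the file. It assumes only that the file is HDF5 (as the data_train.hdf5 paths used later suggest); the concrete keys and shapes depend on the release, and the path is a placeholder.

# inspect_demos.py -- minimal sketch for inspecting a downloaded demonstration file (not part of the repo).
import h5py

path = "./data/coinrun/data_train.hdf5"  # placeholder: adjust to your download location
with h5py.File(path, "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    f.visititems(show)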

1-1. Collecting Demonstrations at High Resolution (256 x 256)

If you are interested in constructing your own dataset, we provide trained PPG checkpoints in ./data/checkpoints. You need to collect the following demonstrations:

  1. Training demonstrations (data_type: train)
  2. Validation demonstrations (data_type: val)
  3. (Optional) Test demonstrations (data_type: test; only needed for comparing with goal-conditioned baselines)

Collect demonstrations with the following commands:

cd ./data/PPG
python -m collect_procgen_data --model_dir {path of saved model file} --num_demonstrations {number of demonstrations} --env_name {procgen env name} --env_type {env_type} --data_type {train/eval/test} --num_levels {number of levels} --start_level {level to start} --output_dir {path of saved demonstrations}

# Example in CoinRun environments.
CUDA_VISIBLE_DEVICES=0 python -m collect_procgen_data --model_dir ./checkpoints/coinrun_hard_level500/model1000_IC100007936.jd --num_demonstrations 100 --env_name coinrun --env_type none --data_type train --num_levels 500 --start_level 0 --distribution_mode hard --output_dir ./data/coinrun
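
If you want all splits at once, a small wrapper like the one below can loop over the data types. It simply re-issues the command shown above, so run it from ./data/PPG; the demonstration counts are illustrative, and since the usage string says eval while the list above says val, check collect_procgen_data --help for the exact split name.

# collect_all_splits.py -- convenience sketch (not part of the repo); run from ./data/PPG.
import subprocess

MODEL = "./checkpoints/coinrun_hard_level500/model1000_IC100007936.jd"
for data_type, num_demos in [("train", 100), ("val", 20), ("test", 20)]:  # counts are illustrative
    subprocess.run(
        ["python", "-m", "collect_procgen_data",
         "--model_dir", MODEL,
         "--num_demonstrations", str(num_demos),
         "--env_name", "coinrun",
         "--env_type", "none",
         "--data_type", data_type,
         "--num_levels", "500",
         "--start_level", "0",
         "--distribution_mode", "hard",
         "--output_dir", "./data/coinrun"],
        check=True,
    )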

2. Labeling Multimodal Rewards

Next, label the demonstrations with multimodal rewards using the following commands:

python -m arp_dt.label_reward --env_name {env_name} --env_type {env_type} --data_dir {data hdf5 file path} --model_type {clip/clip_ft} --model_ckpt_dir {checkpoint path of fine-tuned CLIP.}

# Example in CoinRun experiments for training demonstrations.
CUDA_VISIBLE_DEVICES=0 python -m arp_dt.label_reward --env_name coinrun --env_type none --data_dir ./data/coinrun/coinrun_hard_level0to500_num500_frame4/data_train.hdf5 --model_type clip_ft --model_ckpt_dir ./data/coinrun/coinrun_hard_level0to500_num500_frame4/clip_ft_checkpoints/best_checkpoint.pt
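
Conceptually, the multimodal reward for a frame is its CLIP similarity to the task instruction. The sketch below illustrates that idea with the off-the-shelf Hugging Face CLIP API rather than the repository's label_reward code; the model name, instruction text, and image path are placeholders.

# reward_sketch.py -- illustration only: cosine similarity between a frame and an instruction.
# The repository computes rewards with its own (optionally fine-tuned) CLIP checkpoints instead.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("frame.png")        # placeholder: one frame from a demonstration
instruction = "collect the coin"       # placeholder instruction text

inputs = processor(text=[instruction], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
reward = torch.nn.functional.cosine_similarity(image_emb, text_emb).item()
print(f"multimodal reward ~ {reward:.3f}")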

3. Fine-tuning CLIP

You can fine-tune CLIP on expert demonstrations from the training environments with the following commands:

python3 -m finetune_module.finetune --data.path {training data path} --output_dir {directory for saving checkpoints} --env_name {env_name}  --data.train_env_type {env_type} --data.num_demonstrations {number of training demonstrations} --lambda_id {scaling hyperparameter for inverse dynamics loss}

# Example in CoinRun experiments.
CUDA_VISIBLE_DEVICES=0 python3 -m finetune_module.finetune --data.path ./data/maze --default_root_dir ./debug --epochs 20 --model_type clip_multiscale_ensemble --game_name maze --data.train_env_type aisc --data.image_key "ob" --data.num_demonstrations 500 --lambda_id 1.5
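
The role of --lambda_id is to weight the auxiliary inverse dynamics loss against the CLIP fine-tuning loss. The snippet below is only a conceptual sketch of that combination; the actual loss terms are defined inside finetune_module, and the function name here is hypothetical.

# loss_sketch.py -- conceptual sketch (hypothetical names, not the repository's code).
def finetune_objective(clip_loss, inverse_dynamics_loss, lambda_id=1.5):
    # Total fine-tuning loss: CLIP loss plus the inverse dynamics loss scaled by lambda_id.
    return clip_loss + lambda_id * inverse_dynamics_loss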

4. Training Agents

Experiments can be launched with the following commands.
To train InstructRL or GC-InstructRL agents, set {use VL model} to False.
To train or evaluate agents with goal images, you first need to collect test demonstrations (see item 3 in Section 1-1).

sh ./jobs/train_procgen.sh {data_dir} {env_name} {env_type of training environments} {env_type of evaluation environments} {augmentation} {use VL model: True | False} {VL model_type: BC | clip | clip_ft | clip_goal_conditioned | GCBC} {VL model checkpoint path (only for ARP-DT+)} {seed} {comment on experiment} {lambda for return prediction} {evaluation with goal images}

# Example in CoinRun experiments.
CUDA_VISIBLE_DEVICES=0,1 sh ./jobs/train_procgen.sh ./data/coinrun coinrun none aisc "color_jitter, rotate" True clip "" 0 "ARP-DT-coinrun" 0.01 False

5. Evaluating Trained Agents

You can evaluate trained agents in new environments with the following commands:

sh ./jobs/eval_procgen.sh  {model checkpoint path} {data_dir} {env_name} {env_type of training environments} {env_type of evaluation environments} {use levels used for collecting expert demonstrations: True | False} {use VL model: True | False} {VL model_type: BC | clip | clip_ft | clip_goal_conditioned | GCBC} {VL model checkpoint path (only for ARP-DT+)} {comment on experiment}

# Example for evaluating in CoinRun test environments.
CUDA_VISIBLE_DEVICES=0 sh ./jobs/eval_procgen.sh  /checkpoints/model_epoch49.pkl ./data/coinrun/ coinrun none aisc_gem True True clip "" "ARP-DT-coinrun_test"

Acknowledgement

  • The transformer agent and training code are largely based on InstructRL.
  • The multimodal MAE implementation is largely based on m3ae.
  • The CLIP implementation is largely based on scenic.

Citation

@inproceedings{
 kim2023guide,
 title={Guide Your Agent with Adaptive Multimodal Rewards},
 author={Kim, Changyeon and Seo, Younggyo and Liu, Hao and Lee, Lisa and Shin, Jinwoo and Lee, Honglak and Lee, Kimin},
 booktitle={Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS)},
 year={2023}
}