zhaoyi11/adaptive_bc

Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning

This is a Pytorch implementation of the REDQ+AdaptiveBC method proposed in the paper "Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning" by Yi Zhao, Rinu Boney, Alexander Ilin, Juho Kannala, and Joni Pajarinen.

Setup

conda env create -f environment.yaml
conda activate adaptive

Note: d4rl still requires a MuJoCo license. You can get one for free from http://roboti.us/ and copy it into your MuJoCo installation folder. The mujoco_py installation instructions might be useful.

Run the code

Training consists of two stages: pretraining on a d4rl dataset and finetuning on the corresponding online task. To run an experiment:

python3 main.py --env=<TASK_NAME> --seed=<SEED>

We use wandb for logging. Please check the wandb documentation for details.
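For example, offline-to-online runs are usually repeated over several seeds. A minimal sketch of such a sweep is below; the task name hopper-medium-v0 is an illustrative d4rl environment, not one taken from this repository, and the loop only prints the commands (drop the echo to actually launch them):

```shell
# Print the launch command for each seed on one illustrative d4rl task.
# "hopper-medium-v0" is an assumed example task name; substitute your own.
for SEED in 0 1 2; do
    echo python3 main.py --env=hopper-medium-v0 --seed=$SEED
done
```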
