H2O

[NeurIPS'22 Spotlight] When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning

H2O (https://arxiv.org/abs/2206.13464) is the first Hybrid Offline-and-Online Reinforcement Learning framework: it learns a policy simultaneously from offline real-world datasets and online simulation rollouts, while accounting for the sim-to-real dynamics gap of an imperfect simulator. H2O introduces a dynamics-aware policy evaluation scheme that adaptively penalizes Q-values, and corrects Bellman errors, on simulated samples with large dynamics gaps. Through extensive simulated and real-world tasks, as well as theoretical analysis, we demonstrate the superior performance of H2O over other cross-domain online and offline RL algorithms. This repository provides the codebase on which we benchmark H2O and the baselines in MuJoCo environments.

[Figure: H2O pipeline overview]
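
To make the scheme above concrete, here is a minimal, self-contained sketch of dynamics-aware policy evaluation: a standard Bellman error on real offline data plus a CQL-style penalty on simulated Q-values, weighted by each sample's estimated dynamics gap. This is an illustration under assumed names only (critic_loss, gap_s, and the toy tensors are hypothetical), not the implementation in SimpleSAC.

    # Schematic sketch only; see SimpleSAC/ for the actual H2O implementation.
    import torch
    import torch.nn as nn

    def critic_loss(q_net, obs_r, act_r, rew_r, next_v_r, done_r,
                    obs_s, act_s, gap_s, alpha=1.0, gamma=0.99):
        """TD error on real transitions plus a gap-weighted Q penalty on
        simulated transitions: the larger a sample's estimated sim-to-real
        dynamics gap, the stronger its penalty weight."""
        # Standard Bellman error on real (offline) data.
        q_r = q_net(torch.cat([obs_r, act_r], dim=-1)).squeeze(-1)
        target = rew_r + gamma * (1.0 - done_r) * next_v_r
        bellman = ((q_r - target.detach()) ** 2).mean()

        # Normalize per-sample dynamics-gap estimates into penalty weights
        # (H2O derives the gap measure from learned discriminators; here
        # gap_s is simply a given tensor).
        w = torch.softmax(gap_s, dim=0)

        # CQL-style adaptive penalty: push down Q-values on high-gap
        # simulated samples, push up Q-values on real data.
        q_s = q_net(torch.cat([obs_s, act_s], dim=-1)).squeeze(-1)
        penalty = (w * q_s).sum() - q_r.mean()

        return bellman + alpha * penalty

    # Toy usage with random tensors (batch = 32, obs dim = 3, act dim = 2).
    q_net = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 1))
    B, od, ad = 32, 3, 2
    loss = critic_loss(q_net,
                       torch.randn(B, od), torch.randn(B, ad),
                       torch.randn(B), torch.randn(B), torch.zeros(B),
                       torch.randn(B, od), torch.randn(B, ad),
                       torch.rand(B))
    loss.backward()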

Installation and Setup

To install the dependencies, run the command:

    pip install -r requirements.txt

Add this repository's directory to your PYTHONPATH environment variable:

    export PYTHONPATH="$PYTHONPATH:$(pwd)"

Run Benchmark Experiments

We benchmark H2O and its baselines in MuJoCo simulation environments with D4RL offline datasets. To begin, enter the SimpleSAC folder:

    cd SimpleSAC

Then you can run H2O experiments using the following example commands.

Simulation: HalfCheetah-v2 with 2x gravity, offline data: Medium-Replay dataset

    python sim2real_sac_main.py \
        --env_list HalfCheetah-v2 \
        --data_source medium_replay \
        --unreal_dynamics gravity \
        --variety_list 2.0 

Simulation: Walker2d-v2 with 0.3x friction, offline data: Medium-Replay dataset

    python sim2real_sac_main.py \
    --env_list Walker2d-v2 \
        --data_source medium_replay \
        --unreal_dynamics friction \
        --variety_list 0.3 

Simulation: HalfCheetah-v2 with joint noise N(0, 1), offline data: Medium dataset

    python sim2real_sac_main.py \
        --env_list HalfCheetah-v2 \
        --data_source medium \
        --variety_list 1.0 \
        --joint_noise_std 1.0 
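
The example commands above share a common pattern. As a generic template (flag names are taken from the commands above; see sim2real_sac_main.py for the full and authoritative set of options):

    python sim2real_sac_main.py \
        --env_list <gym_environment_id> \
        --data_source <d4rl_dataset, e.g. medium or medium_replay> \
        --unreal_dynamics <perturbed_dynamics, e.g. gravity or friction> \
        --variety_list <perturbation_scale, e.g. 2.0>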

Visualization of Learning Curves

Learning curves are logged with wandb. Log in to your personal account by setting your wandb API key:

    export WANDB_API_KEY=YOUR_WANDB_API_KEY

and run wandb online to turn on online synchronization.
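
That is, after exporting your key:

    wandb online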

Citation

If you use the H2O framework or code in your project, please cite the following paper:

@inproceedings{niu2022when,
    title={When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning},
    author={Haoyi Niu and Shubham Sharma and Yiwen Qiu and Ming Li and Guyue Zhou and Jianming Hu and Xianyuan Zhan},
    booktitle={Advances in Neural Information Processing Systems},
    year={2022},
    url={https://openreview.net/forum?id=zXE8iFOZKw}
}
