Generative Reinforcement Learning (GRL)

GenerativeRL, short for Generative Reinforcement Learning, is a Python library for solving reinforcement learning (RL) problems using generative models, such as diffusion models and flow models. It provides a framework that combines generative models with the decision-making capabilities of RL algorithms.

Features

Integration of diffusion models and flow models for state representation, action representation or policy learning in RL
Implementation of popular RL algorithms tailored for generative models, such as Q-guided policy optimization (QGPO)
Support for various RL environments and benchmarks
Easy-to-use API for training and evaluation

Installation

pip install grl

Or, if you want to install from source:

git clone https://github.com/zjowowen/GenerativeRL_Preview.git
cd GenerativeRL_Preview
pip install -e .

Or you can use the docker image:

docker pull zjowowen/grl:torch2.3.0-cuda12.1-cudnn8-runtime
docker run -it --rm --gpus all zjowowen/grl:torch2.3.0-cuda12.1-cudnn8-runtime /bin/bash
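After any of the install routes above, one quick way to confirm that the package is visible to the current interpreter is to look it up without importing it. The snippet below is a minimal sketch using only the standard library. The helper name is_installed is made up for illustration, and checking for torch is an assumption based on the Docker image tag rather than a documented requirement.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # `grl` is the import name used in the Quick Start script.
    # `torch` is an assumed dependency (inferred from the Docker image tag).
    for name in ("grl", "torch"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If grl is reported as missing, the usual cause is that pip installed into a different Python environment than the one running the check.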

Quick Start

Here is an example of how to train a diffusion model for Q-guided policy optimization (QGPO) in the LunarLanderContinuous-v2 environment using GenerativeRL.

Install the required dependencies:

pip install "gym[box2d]==0.23.1"

Download the dataset (the download link is on the project's GitHub page) and save it as data.npz in the current directory.
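Before training, it can help to check that the downloaded file is a readable NumPy archive. The sketch below lists each stored array with its shape and dtype. The helper summarize_npz and the demo key names (obs, action) are hypothetical; the actual keys inside data.npz are not documented here, so the demo runs on a throwaway file instead.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Map each array name in an .npz archive to its (shape, dtype string)."""
    summary = {}
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


if __name__ == "__main__":
    # Demo on a temporary archive. The key names are invented for the demo;
    # the sizes 8 and 2 match LunarLanderContinuous observations and actions.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.npz")
        np.savez(
            demo_path,
            obs=np.zeros((5, 8), dtype=np.float32),
            action=np.zeros((5, 2), dtype=np.float32),
        )
        for name, (shape, dtype) in summarize_npz(demo_path).items():
            print(name, shape, dtype)
```

Calling summarize_npz("./data.npz") on the real file shows which arrays it holds and whether their leading dimensions agree.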

GenerativeRL uses WandB for logging and will prompt you to log in to your account on first use. To skip the login and keep logs local, switch WandB to offline mode:

wandb offline
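The wandb offline command changes a local setting. For a single script, the same effect can usually be had by setting the WANDB_MODE environment variable before WandB is initialized. The sketch below assumes that GenerativeRL does not override this variable itself; the helper name use_offline_wandb is made up for illustration.

```python
import os


def use_offline_wandb(env=os.environ):
    """Default WANDB_MODE to "offline" unless a mode is already set."""
    # setdefault keeps any mode the user has already chosen.
    env.setdefault("WANDB_MODE", "offline")
    return env["WANDB_MODE"]


if __name__ == "__main__":
    # Call this before any code that initializes WandB.
    print("WANDB_MODE =", use_offline_wandb())
```

Using setdefault rather than a plain assignment means an explicit WANDB_MODE set in the shell still takes precedence.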

With the dataset in place, the training and deployment script looks like this:

import gym

from grl.algorithms.qgpo import QGPOAlgorithm
from grl.datasets import QGPOCustomizedDataset
from grl.utils.log import log
from grl_pipelines.diffusion_model.configurations.lunarlander_continuous_qgpo import config


def qgpo_pipeline(config):
    # Load the offline dataset onto the training device and train QGPO on it.
    dataset = QGPOCustomizedDataset(
        numpy_data_path="./data.npz", device=config.train.device
    )
    qgpo = QGPOAlgorithm(config, dataset=dataset)
    qgpo.train()

    # Roll out the trained agent in the target environment.
    agent = qgpo.deploy()
    env = gym.make(config.deploy.env.env_id)
    observation = env.reset()
    for _ in range(config.deploy.num_deploy_steps):
        env.render()
        observation, reward, done, _ = env.step(agent.act(observation))
        # Start a new episode once the current one ends.
        if done:
            observation = env.reset()
    env.close()


if __name__ == "__main__":
    log.info("config: \n{}".format(config))
    qgpo_pipeline(config)
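The deployment loop above steps the environment for a fixed number of steps. To judge how well the agent performs, it is often useful to track the return of each completed episode as well. The following is a minimal sketch, not part of GenerativeRL: rollout, CountingEnv and ConstantAgent are hypothetical names, and the code assumes the classic gym interface used by gym 0.23.1, where reset() returns an observation and step() returns a 4-tuple.

```python
def rollout(env, agent, num_steps):
    """Step `agent` in `env` for `num_steps`; return the returns of finished episodes."""
    episode_returns = []
    current_return = 0.0
    observation = env.reset()
    for _ in range(num_steps):
        observation, reward, done, _ = env.step(agent.act(observation))
        current_return += float(reward)
        if done:
            # Record the finished episode and start a fresh one.
            episode_returns.append(current_return)
            current_return = 0.0
            observation = env.reset()
    return episode_returns


class CountingEnv:
    """Toy stand-in for a gym environment: reward 1.0 per step, 3-step episodes."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}


class ConstantAgent:
    """Toy stand-in for a deployed agent: always returns the same action."""

    def act(self, observation):
        return 0


if __name__ == "__main__":
    print(rollout(CountingEnv(), ConstantAgent(), num_steps=7))
```

Any environment and agent exposing reset(), step() and act() in this form can be passed in place of the toy classes. An episode still in progress when the step budget runs out is not counted.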

For more detailed examples and documentation, please refer to the GenerativeRL documentation.

Contributing

We welcome contributions to GenerativeRL! If you are interested in contributing, please refer to the Contributing Guide.

License

GenerativeRL is licensed under the Apache License 2.0. See LICENSE for more details.
