Causal RL: Reverse-Environment Network Integrated Actor-Critic Algorithm

Causal Reinforcement Learning Framework by CCNets

🎈 Overview

Introduction

Causal RL is an innovative Reinforcement Learning framework that uses three networks, an Actor, a Critic, and a Reverse Environment, to learn the causal relationships between states, actions, and values while maximizing cumulative rewards. This introduction describes the framework's key features to help users leverage the full potential of Causal RL.

Key Points

  1. Introduction of Causal RL: integrates a reverse-environment network into the Actor-Critic framework, learning the causal relationships between states, actions, and values while maximizing cumulative rewards.

  2. Language Model Training with a Reverse Causal Mask: Causal RL applies a reverse mask during training to strengthen its understanding of the causal relationships between states and actions, leading to improved learning efficiency and strategic effectiveness (see the mask sketch after this list).

  3. Efficient Parameter Tuning: CausalRL offers a preset parameter-tuning pipeline for benchmarking, reducing the effort required for initial setup.
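
For intuition, here is a minimal, hypothetical sketch (not the repository's API) of a forward causal mask versus a reverse causal mask in PyTorch; the function names are illustrative only.

# Illustrative sketch only -- not code from this repository.
import torch

def forward_causal_mask(seq_len):
    # Standard GPT-style mask: position t may attend to positions <= t.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def reverse_causal_mask(seq_len):
    # Mirror of the forward mask: position t may attend to positions >= t,
    # the masking direction described for the reverse-environment network.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool))

print(forward_causal_mask(4))
print(reverse_causal_mask(4))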

❗️ Dependencies

conda create -n crl python=3.9.18
conda activate crl
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install mlagents==0.30
pip install protobuf==3.20
pip install gymnasium==0.29.1
pip install mujoco==3.1.1
pip install jupyter
pip install transformers==4.34.1
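
To quickly verify the environment, here is a minimal sanity check you can run after installation (illustrative only, not part of the repository; assumes the packages above installed cleanly):

# sanity_check.py (illustrative only)
import torch, gymnasium, mujoco, transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("gymnasium:", gymnasium.__version__)
print("mujoco:", mujoco.__version__)
print("transformers:", transformers.__version__)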

📥 Installation

  • Steps to install the framework.
  • Note: Ensure you have the required dependencies installed as listed in the "Dependencies" section above.

Installation Steps:

  1. Clone the repository:

    git clone https://github.com/ccnets-team/causal-rl.git
  2. Navigate to the directory and install the required packages:

    cd causal-rl
    pip install -r requirements.txt

🏃 Quick Start

  • A basic example to help users get up and running immediately.

1. Import Library

# main.ipynb
from utils.setting.env_settings import analyze_env
import torch

ngpu = 1  # number of GPUs to use; set to 0 to force CPU
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

2. Set Environment

from utils.setting.env_settings import analyze_env

env_config, rl_params = analyze_env(env_name = "HumanoidStandup-v4")

3. Initializing and Running Causal RL Training Process

from causal_rl import CausalRL

with CausalRL(env_config, rl_params, device, use_print = False, use_wandb = False) as causal_rl:
    causal_rl.train(resume_training = False, use_graphics = False) 
    causal_rl.test(max_episodes = 100, use_graphics = False)
    # use_graphics = True is temporarily unsupported due to recent `Mujoco` module updates.
    

📖 Features

1. Manageable RL Parameters

CausalRL provides structured management of RL parameters, allowing users to easily organize, store, and compare them, which leads to more coherent configurations across diverse RL problems.

# main.ipynb
from utils.setting.env_settings import analyze_env

env_config, rl_params = analyze_env(env_name = "HumanoidStandup-v4")

rl_params.algorithm.gpt_seq_length = 16
rl_params.normalization.state_normalizer = "running_mean_std"

2. Enhancing CausalRL with GPT

from nn.gpt import GPT

class NetworkParameters:
    def __init__(self, num_layers=5, d_model=256, dropout=0.01,
                 network_type=GPT):
        # All three CausalRL networks (critic, actor, reverse-environment)
        # share the same GPT-based architecture.
        self.critic_network = network_type
        self.actor_network = network_type
        self.rev_env_network = network_type
  • Advanced Sequence Learning: GPT excels in processing sequence data, aiding agents in predicting future states and actions based on past events. This is particularly useful in strategy games like chess or Go.

  • Complex Pattern Recognition: GPT's deep neural network structure is adept at learning and recognizing complex patterns, enabling agents to make informed decisions in intricate environments.

  • Long-term Strategy Development: GPT's proficiency in learning long-term dependencies is crucial for strategic planning in fields like robotics or drones, focusing on goals like energy efficiency and safe navigation.

  • Reverse Causal Masking Benefits: While the actor and critic networks use forward (causal) masking, the reverse-environment network employs reverse masking. This enhances the understanding and prediction of current states from past data, which is beneficial in complex scenarios such as robotics where optimal decision-making is critical.

4. CausalRL Variants: Adaptable Methods for Diverse Learning Environments

  • These methods in CausalRL offer flexible options to fit various learning environments, allowing users to choose the approach that best matches their problem (see the variant-selection sketch after this list).

# rl_params.py
class TrainingParameters:
    def __init__(self, trainer_name='causal_rl', trainer_variant='hybrid', ...):

  • CausalRL - Classic

    • Training Flow: state -> action -> state
    • More advantageous when the state size is significantly larger than the action size.

  • CausalRL - Inverse

    • Training Flow: action -> state -> action
    • More advantageous when the state and action sizes do not differ significantly.
    • Faster than the classic and hybrid variants.

  • CausalRL - Hybrid

    • A hybrid approach combining aspects of both the classic and inverse variants:
      • Training Flow: state -> action -> state
      • Training Flow: action -> state -> action
    • The training process is more complex, resulting in a slightly slower learning speed than classic and inverse.
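
As referenced above, here is a minimal sketch of selecting a variant. The attribute path rl_params.training.trainer_variant is an assumption inferred from the TrainingParameters class above and the rl_params groups shown earlier; it is not verified against the repository.

# Hypothetical example -- the attribute path below is an assumption.
from utils.setting.env_settings import analyze_env

env_config, rl_params = analyze_env(env_name="HumanoidStandup-v4")

# 'classic': state -> action -> state   (state much larger than action)
# 'inverse': action -> state -> action  (similar state/action sizes; fastest)
# 'hybrid' : both flows combined        (most thorough, slightly slower)
rl_params.training.trainer_variant = 'inverse'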



✔️ Algorithm Feature Checklist

📗 Algorithms Implementation

Algorithm | Implementation | Docs
--- | --- | ---
CausalRL | causal_rl.py | Patent
Advantage Actor-Critic (A2C) | a2c.py | Docs
Deep Deterministic Policy Gradient (DDPG) | ddpg.py | Docs
Deep Q-Network (DQN) | dqn.py | Docs
Soft Actor-Critic (SAC) | sac.py | Docs
Twin Delayed Deep Deterministic Policy Gradient (TD3) | td3.py | Docs



📈 CausalRL Benchmarks

Discover the capabilities of CausalRL algorithms in various OpenAI Gym environments. Our benchmarks run for 100K steps with a batch size of 64, using settings aligned with practical industrial requirements, and provide in-depth performance insights. Explore the detailed metrics:

Download and Use a Model (W&B)

  • Install the WandB package:

pip install wandb  # if not installed

  • Initialize and run WandB:

import wandb
run = wandb.init()  # you may need to enter your API key

  • Download an artifact from WandB:

Artifact_Name = ...  # fill in the specific model and version you want
artifact = run.use_artifact(f'rl_tune/causal-rl-gym/{Artifact_Name}', type='model')
artifact_dir = artifact.download()
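
After downloading, the files land in artifact_dir; how you load them depends on the specific artifact. A generic way to inspect what was downloaded:

import os

# List the files pulled from W&B; contents vary by artifact.
print(os.listdir(artifact_dir))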



🔎 API Documentation

  • We're currently in the process of building our official documentation webpage to better assist you. In the meantime, if you have any specific questions or need clarifications, feel free to reach out through our other support channels. We appreciate your patience and understanding!

🌟 Contribution Guidelines

  • We warmly welcome contributions from everyone! Here's how you can contribute:

Contribution Process

When you submit a Pull Request (PR) to our project, here's the process it goes through:

  1. Initial Check: We first check if the PR is valid.
    • If not, it's rejected.
    • If valid, it proceeds to review.
  2. Review Process:
    • If changes are needed, you'll receive feedback. Please make the necessary adjustments to your PR and resubmit. This review-feedback cycle may repeat until the PR is satisfactory.
    • If no changes are needed, the PR is approved.
  3. Testing:
    • Approved PRs undergo testing.
    • If tests pass, your PR gets merged! 🎉
    • If tests fail, you'll receive feedback. Adjust your PR accordingly and it will go through the review process again.

Your contributions are invaluable to us. Please ensure you address feedback promptly to streamline the merge process.

🐞 Issue Reporting Policy

Thank you for taking the time to report issues and provide feedback. This helps improve our project for everyone! To ensure that your issue is handled efficiently, please follow the guidelines below:

1. Choose the Right Template:

We provide three issue templates to streamline the reporting process:

  1. Bug Report: Use this template if you've found a bug or something isn't working as expected. Please provide as much detail as possible to help us reproduce and fix the bug.
  2. Feature Request: If you have an idea for a new feature or think something could be improved, this is the template to use. Describe the feature, its benefits, and how you envision it.
  3. Custom Issue Template: For all other issues or general feedback, use this template. Make sure to provide sufficient context and detail.

2. Search First:

Before submitting a new issue, please search the existing issues to avoid duplicates. If you find a similar issue, you can add your information or 👍 the issue to show your support.

3. Be Clear and Concise:

  • Title: Use a descriptive title that summarizes the issue.
  • Description: Provide as much detail as necessary, but try to be concise. If reporting a bug, include steps to reproduce, expected behavior, and actual behavior.
  • Screenshots: If applicable, add screenshots to help explain the issue.

4. Use Labels:

If possible, categorize your issue using the appropriate GitHub labels. This helps us prioritize and address issues faster.

5. Stay Engaged:

After submitting an issue, please check back periodically. Maintainers or other contributors may ask for further information or provide updates.

Thank you for helping improve our project! Your feedback and contributions are invaluable.

✉️ Support & Contact

Facing issues or have questions about our framework? We're here to help!

  1. Issue Tracker:
    • If you've encountered a bug or have a feature request, please open an issue on our GitHub Issues page. Be sure to check existing issues to avoid duplicates.
  2. Social Media:
    • Stay updated with announcements and news by following us on LinkedIn.
  3. Emergency Contact:
    • If there are security concerns or critical issues, contact our emergency team at support@ccnets.org.

Please be respectful and constructive in all interactions.

LICENSE

CAUSALRL is dual-licensed under the GNU General Public License version 3 (GPLv3) and a separate Commercial License.

Please consult the LICENSE files in the repository for more detailed information on the licensing of CAUSALRL.