PKU-Alignment

Loves Sharing and Open-Source, Making AI Safer.
Large language models (LLMs) have immense potential in the field of general intelligence, but they also come with significant risks. As a research team at Peking University, we actively focus on alignment techniques for large language models, such as safety alignment to enhance model safety and reduce toxicity.

You are welcome to follow our AI safety projects:

Pinned

  1. omnisafe Public

    OmniSafe is an infrastructural framework for accelerating SafeRL research; a minimal usage sketch follows this list.

    Python · 855 stars · 124 forks

  2. safety-gymnasium Public

    NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

    Python · 320 stars · 47 forks

  3. safe-rlhf Public

    Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

    Python · 1.2k stars · 100 forks

  4. Safe-Policy-Optimization Public

    NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms

    Python · 291 stars · 38 forks
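
Below is a minimal SafeRL training sketch in the spirit of the OmniSafe quickstart, pairing a Lagrangian-constrained PPO agent with a Safety-Gymnasium task. The algorithm name 'PPOLag', the environment id 'SafetyPointGoal1-v0', and the config keys are illustrative assumptions based on typical usage, not a definitive recipe; consult the omnisafe and safety-gymnasium READMEs for the authoritative API.

    # Sketch of a SafeRL training run (assumed OmniSafe quickstart-style API).
    import omnisafe

    env_id = 'SafetyPointGoal1-v0'   # Safety-Gymnasium task emitting both reward and cost
    custom_cfgs = {
        'train_cfgs': {
            'total_steps': 1_000_000,  # assumed config key; reduce for a quick smoke test
        },
    }

    # 'PPOLag' = PPO with a Lagrangian multiplier on the expected episodic cost.
    agent = omnisafe.Agent('PPOLag', env_id, custom_cfgs=custom_cfgs)
    agent.learn()                     # optimize return while keeping cost under the budget
    agent.evaluate(num_episodes=1)    # roll out the trained policy once

Safety-Gymnasium environments return a separate cost signal alongside the reward at every step, and that cost signal is what the constrained algorithms in OmniSafe and Safe-Policy-Optimization consume.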

Repositories

Showing 10 of 10 repositories
  • omnisafe Public

    OmniSafe is an infrastructural framework for accelerating SafeRL research.

    Python 855 Apache-2.0 124 10 6 Updated May 9, 2024
  • safety-gymnasium Public

    NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

    Python 320 Apache-2.0 47 8 3 Updated Apr 30, 2024
  • safe-rlhf Public

    Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

    Python 1,164 Apache-2.0 100 14 0 Updated Apr 20, 2024
  • ProAgent Public

    ProAgent: Building Proactive Cooperative Agents with Large Language Models

    JavaScript 36 MIT 1 1 0 Updated Apr 8, 2024
  • SafeDreamer Public

    ICLR 2024: SafeDreamer: Safe Reinforcement Learning with World Models

    Python 27 Apache-2.0 2 0 0 Updated Apr 8, 2024
  • Safe-Policy-Optimization Public

    NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms

    Python 291 Apache-2.0 38 0 0 Updated Mar 20, 2024
  • AlignmentSurvey Public

    AI Alignment: A Comprehensive Survey

    114 0 0 0 Updated Nov 2, 2023
  • beavertails Public

    BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

    Makefile 78 Apache-2.0 3 0 1 Updated Oct 27, 2023
  • .github Public
    0 0 0 0 Updated May 31, 2023
  • ReDMan Public

    ReDMan is an open-source simulation platform that provides a standardized implementation of safe RL algorithms for Reliable Dexterous Manipulation.

    Python 15 Apache-2.0 2 0 0 Updated May 2, 2023