
Align on the Fly: Adapting Chatbot Behavior to Established Norms

This is the official repository for Align on the Fly: Adapting Chatbot Behavior to Established Norms.

Introduction

We propose On-the-fly Preference Optimization (OPO), a real-time alignment method that works in a streaming way. It uses an external memory to store established rules for alignment, which constrains LLMs' behavior without requiring retraining and allows human values to be updated and customized conveniently. We also introduce a scalable evaluation approach to assess the proposed method more effectively.
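
At a high level, the idea can be sketched as: embed the incoming question, retrieve the most relevant rules from the external memory, and prepend them to the prompt as constraints before generation. The snippet below is a minimal illustrative sketch of that flow, not the repository's actual interface; the cosine-similarity scoring, the prompt wording, and the embed / llm_generate helpers are all placeholders.

import numpy as np

def retrieve_rules(query_vec, rule_vecs, rules, top_k=3):
    """Return the top_k rules whose embeddings are most similar to the query."""
    # Cosine similarity between the query and every stored rule embedding.
    sims = rule_vecs @ query_vec / (
        np.linalg.norm(rule_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    top = np.argsort(-sims)[:top_k]
    return [rules[i] for i in top]

def answer_on_the_fly(question, embed, llm_generate, rules, rule_vecs):
    """Constrain the model with retrieved rules instead of retraining it."""
    retrieved = retrieve_rules(embed(question), rule_vecs, rules)
    prompt = (
        "Follow these established rules when answering:\n"
        + "\n".join(f"- {r}" for r in retrieved)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return llm_generate(prompt)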

Quick start

Setup

Python >= 3.9 is required.

git clone https://github.com/GAIR-NLP/OPO
cd OPO
pip install -r requirements.txt

Set OpenAI API

export OPENAI_API_KEY=xxxx

Rule Data

Download the legal rules and moral rules from the link and unzip the file to the ./data/retrieval_raw_data folder.

Extracting the embeddings of the rule data

Run the following commands to extract the embeddings of the rules with the OpenAI embedding model. The extracted embeddings will be saved to the ./data/retrieval_processed_embed_text folder.

cd construct_embeddings
python construct_embedding.py
python combine_ndlr_ndgr_rules.py
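
For reference, rule embeddings can be obtained from the OpenAI API roughly as follows. This is a simplified sketch assuming the legacy openai (<1.0) Python SDK and the text-embedding-ada-002 model; the exact model, batching, and output format used by construct_embedding.py may differ.

import os
import json
import openai  # assumes the legacy openai<1.0 interface

openai.api_key = os.environ["OPENAI_API_KEY"]

def embed_rules(rule_texts, model="text-embedding-ada-002"):
    """Embed a batch of rule strings and return (text, vector) records."""
    resp = openai.Embedding.create(model=model, input=rule_texts)
    return [
        {"text": t, "embedding": d["embedding"]}
        for t, d in zip(rule_texts, resp["data"])
    ]

if __name__ == "__main__":
    # In practice the rules are loaded from ./data/retrieval_raw_data.
    rules = ["Rule text 1 ...", "Rule text 2 ..."]
    with open("rule_embeddings.jsonl", "w") as f:
        for record in embed_rules(rules):
            f.write(json.dumps(record, ensure_ascii=False) + "\n")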

We also provide the extracted embeddings for download, available here. Unzip the file to the ./data/retrieval_processed_embed_text folder. In addition, download the file recording the order and scope of the legal rules from the link, and place it in the ./data/retrieval_source_info folder.

Evaluation datasets

Five evaluation datasets (i.e., H-Law, A-Law, H-Basic-Morality, H-Social-Morality, and A-Professional-Morality) are provided in the ./data/exam_questions folder.

Automatically generate evaluation questions yourself (optional)

Step 1: select the principles for question generation by running the following script

cd exam_generation
python select_principles_for_question_generation.py \
    --exam_mode law \
    --sample_principle_num 100

Arguments

  • exam_mode: the mode of the exam. "law" means the legal principles will be loaded, while "morality" means the moral principles will be loaded
  • sample_principle_num: the number of randomly selected principles. In this script, the 100 selected principles are then used to generate 100 legal questions.

The sampled principles are saved in ./data/sampled_data_for_question_generation/extracted_{exam_mode}_data_for_question_generation.jsonl.
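
The sampling itself amounts to reading the rule file, drawing a random subset, and writing it back out as JSONL. A minimal sketch is shown below; the file paths and record fields are placeholders, not the script's actual ones.

import json
import random

def sample_principles(in_path, out_path, num=100, seed=0):
    """Randomly sample principles from a JSONL file and write them out as JSONL."""
    with open(in_path, encoding="utf-8") as f:
        principles = [json.loads(line) for line in f]
    random.seed(seed)
    sampled = random.sample(principles, min(num, len(principles)))
    with open(out_path, "w", encoding="utf-8") as f:
        for item in sampled:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")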

Step 2: generate the questions based on the principles

cd exam_generation
python question_genration_run.py \
    --generation_model gpt-4-0613 \
    --temperature 0.7 \
    --exam_mode law

Arguments

  • generation_model: the model used to generate the questions
  • temperature: temperature for decoding
  • exam_mode: the mode of the exam. "law" means the legal questions will be generated while "morality" means the moral questions will be generated

The generated questions are saved in ./data/exam_questions/machine_generated_original_{exam_mode}_questions.csv.
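
Under the hood, each sampled principle is turned into a question by prompting the generation model. The sketch below assumes the legacy openai (<1.0) Python SDK and an illustrative prompt; the actual prompt and output parsing in the generation script differ.

import os
import openai  # assumes the legacy openai<1.0 interface

openai.api_key = os.environ["OPENAI_API_KEY"]

def generate_question(principle, model="gpt-4-0613", temperature=0.7):
    """Ask the generation model to turn one principle into an exam question."""
    prompt = (
        "Write one multiple-choice question (with options A-D and the correct answer) "
        f"that tests whether a model follows this principle:\n{principle}"
    )
    resp = openai.ChatCompletion.create(
        model=model,
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]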

Step 3: evaluate the quality of the generated questions

cd exam_generation
python eval_generated_question_quality.py \
    --evaluation_model gpt-4-0613 \
    --temperature 0.0 \
    --exam_mode law

Arguments

  • evaluation_model: the model used to evaluate the quality of the generated questions
  • temperature: temperature for decoding
  • exam_mode: the mode of the exam. "law" means the legal questions will be generated while "morality" means the moral questions will be generated

The quality evaluation results are saved in ./data/generated_questions_quality_evaluation/machine_generated_{exam_mode}_questions_quality_eval.csv. The questions judged to be perfect are saved in ./data/exam_questions/machine_generated_{exam_mode}_questions.csv.
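
Once the quality-evaluation CSV exists, keeping only the perfect questions is a simple filtering step. The sketch below uses pandas with a hypothetical quality_label column; the actual column names in the quality-evaluation CSV may differ.

import pandas as pd

# Hypothetical column name; the actual CSV schema may differ.
eval_df = pd.read_csv(
    "./data/generated_questions_quality_evaluation/machine_generated_law_questions_quality_eval.csv"
)
perfect = eval_df[eval_df["quality_label"] == "perfect"]
perfect.to_csv("./data/exam_questions/machine_generated_law_questions.csv", index=False)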

Evaluate different LLMs

For open-source models

For example, run the following script to evaluate the safety of THUDM/chatglm2-6b. (Note that most of the open-source LLM checkpoints used in the experiments were released before October 5th.)

python eval.py \
    --evaluation_model THUDM/chatglm2-6b \
    --model_path chatglm2-6b_path \
    --exam_mode law \
    --question_mode human_annotated \
    --temperature 0 \
    --use_retrieval

Arguments

  • evaluation_model: the model to be evaluated
  • model_path: path to the model to be evaluated
  • exam_mode: the mode of the exam. "law" means the legal questions will be loaded while "morality" means the moral questions will be loaded
  • question_mode: the source of the questions. "human_annotated" means the questions are designed by humans while "machine_generated" means the questions are generated by the model (i.e., GPT-4)
  • temperature: temperature for decoding
  • use_retrieval: whether to use the retrieval for answering the question
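
For intuition, evaluating an open-source checkpoint with retrieved rules roughly amounts to loading the model, prepending the rules to the question, and decoding an answer. The sketch below uses a generic Hugging Face causal LM; chat-style checkpoints such as chatglm2-6b expose their own chat interface via trust_remote_code, and the loading and prompting logic in eval.py may differ.

from transformers import AutoModelForCausalLM, AutoTokenizer

def load_open_source_model(model_path):
    """Load a local checkpoint; some checkpoints require trust_remote_code=True."""
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, trust_remote_code=True, device_map="auto"
    )
    return tokenizer, model

def answer_with_rules(tokenizer, model, question, rules):
    """Prepend the retrieved rules to the question and greedily decode an answer."""
    prompt = "Rules:\n" + "\n".join(rules) + f"\n\nQuestion: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Strip the echoed prompt tokens and decode only the newly generated answer.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )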

For closed-source models

For example, run the following script to evaluate the safety of GPT-4.

python eval.py \
    --evaluation_model gpt-4-0613 \
    --exam_mode law \
    --question_mode human_annotated \
    --temperature 0 \
    --use_retrieval

The evaluation results will be saved to the ./data/experimental_results folder.
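
As an illustration of how such results might be scored, the sketch below assumes the exam questions are multiple-choice with options A-D and that the result files expose model responses together with gold answers; the repository's actual result format and scoring code may differ.

import re

def extract_choice(model_output):
    """Pull the first option letter (A-D) out of a free-form model response."""
    match = re.search(r"\b([A-D])\b", model_output)
    return match.group(1) if match else None

def accuracy(predictions, gold_answers):
    """Fraction of questions whose extracted choice matches the gold answer."""
    correct = sum(extract_choice(p) == g for p, g in zip(predictions, gold_answers))
    return correct / max(len(gold_answers), 1)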
