A Security Testing Suite for Large Language Models
LLM-Security is my personal security benchmark for testing large language models. It contains a collection of exploits that attempt to make models behave maliciously.
Warning
This collection is intended for educational purposes only. Do NOT use it for illegal activities.
To get started with LLM-Security, follow these steps:
- Clone the repository
git clone https://github.com/nnxmms/LLM-Security.git
- Change directory to the downloaded repository
cd LLM-Security
- Create a virtual environment
virtualenv -p python3.11 env
- Activate the environment
source ./env/bin/activate
- Install requirements
pip install -r requirements.txt
- Register at OpenAI to obtain an OPENAI_API_KEY
- Create a .env file and update the values
cp .env.example .env
Now you can run the benchmark with the following command
python3 benchmark.py
The results will be stored in a benchmark.json file.
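Once the run has finished, benchmark.json can be inspected like any other JSON document. The snippet below is a minimal sketch for loading and pretty-printing it; it is not part of the repository, and the exact schema of the results is defined by the benchmark itself and not assumed here.

```python
# Minimal sketch: load and pretty-print the benchmark results.
# Assumes benchmark.json contains valid JSON; its exact schema is
# defined by benchmark.py and is not assumed here.
import json

with open("benchmark.json", "r", encoding="utf-8") as f:
    results = json.load(f)

print(json.dumps(results, indent=2, ensure_ascii=False))
```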
Example .env configuration for OpenAI:
# General
VENDOR=openai
MODELNAME=gpt-3.5-turbo
# OpenAI
OPENAI_API_KEY=sk-...
Example .env configuration for Ollama:
# General
VENDOR=ollama
MODELNAME=llama3:instruct
# OpenAI
OPENAI_API_KEY=
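When using the Ollama configuration, the model must be available locally before running the benchmark; with a standard Ollama installation this is typically done with `ollama pull llama3:instruct` (assuming Ollama is installed and its server is running). Whichever vendor you choose, the values in `.env` are read at runtime. The sketch below shows a common way such a file is consumed with python-dotenv; it illustrates the pattern only, is not the repository's actual loading code, and the helper name `load_settings` is hypothetical.

```python
# Illustrative sketch (hypothetical): read VENDOR, MODELNAME and
# OPENAI_API_KEY from a .env file using python-dotenv.
# The actual benchmark may load its configuration differently.
import os

from dotenv import load_dotenv  # pip install python-dotenv


def load_settings() -> dict:
    # Hypothetical helper: loads .env from the current directory and
    # returns the three values shown in the examples above.
    load_dotenv()
    return {
        "vendor": os.getenv("VENDOR", "openai"),
        "modelname": os.getenv("MODELNAME", "gpt-3.5-turbo"),
        "openai_api_key": os.getenv("OPENAI_API_KEY", ""),
    }


if __name__ == "__main__":
    settings = load_settings()
    # Avoid printing the API key itself.
    print(f"vendor={settings['vendor']} model={settings['modelname']}")
```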
This table provides an overview of all exploits that are used within this benchmark.
| Paper | Link |
|---|---|
| Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation | Hacking and Security - Persona Modulation |
| ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs | Hacking and Security - ArtPrompt |