ai-alignment

Here are 26 public repositories matching this topic...

MinghuiChen43 / awesome-trustworthy-deep-learning

A curated list of trustworthy deep learning papers. Updated daily.

Updated May 17, 2024

agencyenterprise / PromptInject

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Award @ NeurIPS ML Safety Workshop 2022

machine-learning agi language-models ai-safety adversarial-attacks ai-alignment ml-safety gpt-3 large-language-models prompt-engineering chain-of-thought agi-alignment

Updated Feb 26, 2024
Python

tomekkorbak / pretraining-with-human-feedback

Code accompanying the paper Pretraining Language Models with Human Preferences

reinforcement-learning gpt language-models ai-safety ai-alignment pretraining decision-transformers rlhf

Updated Feb 13, 2024
Python

Giskard-AI / awesome-ai-safety

📚 A curated list of papers & technical articles on AI Quality & Safety

Updated Oct 13, 2023

wesg52 / sparse-probing-paper

Full code for the sparse probing paper.

ai-safety interpretability ai-alignment mechanistic-interpretability

Updated Dec 17, 2023
Jupyter Notebook

dit7ya / awesome-ai-alignment

A curated list of awesome resources for getting started with, and keeping up with, Artificial Intelligence Alignment research.

awesome awesome-list ai-safety ai-alignment

Updated Jul 14, 2023

IQTLabs / daisybell

Scan your AI/ML models for problems before you put them into production.

cybersecurity ai-safety bias-correction bias-detection ai-alignment model-poison ai-assurance

Updated May 23, 2024
Python

lets-make-safe-ai / make-safe-ai

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

ai agi artificial-intelligence artificial-general-intelligence ai-safety ai-alignment

Updated Mar 29, 2023

riceissa / aiwatch

Website to track people, organizations, and products (tools, websites, etc.) in AI safety

mysql php database dataset ai-safety data-portal aisafety ai-alignment

Updated May 21, 2024
HTML

EzgiKorkmaz / adversarial-reinforcement-learning

Reading list for adversarial perspective and robustness in deep reinforcement learning.

Updated Sep 18, 2023

UCSC-VLAA / Sight-Beyond-Text

This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

alignment vlm ai-alignment vision-language vicuna llm mllm llava llama2

Updated Sep 15, 2023
Python

rmoehn / amplification

An implementation of iterated distillation and amplification

transformer ida supervised-learning ai-safety ai-alignment

Updated Jun 22, 2022
Python

phelps-sg / llm-cooperation

Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023

economics ai-safety gametheory experimental-economics behavioral-economics prisoners-dilemma ai-alignment experimental-psychology social-dilemmas gpt-3 gpt-4 llm principal-agent-problem

Updated Mar 1, 2024
Python

RLHFlow / Directional-Preference-Alignment

Directional Preference Alignment

ai-alignment large-language-models rlhf

Updated May 23, 2024

ai-fail-safe / honeypot

A project to detect environment tampering on the part of an agent

failsafe ai-safety anomaly-detection ai-alignment fail-safe

Updated Oct 31, 2022

riceissa / miri-top-contributors

sql ai-safety ai-alignment donations-list-website

Updated Sep 2, 2018
HTML

rmoehn / jursey

Q&A system with reflection and automation, similar to Patchwork, Affable, and Mosaic

reflection ida datomic hch ai-alignment factored-cognition

Updated Mar 10, 2019
Clojure

rmoehn / farlamp

IDA with RL and overseer failures

ida research-project ai-alignment

Updated Jul 31, 2021
TeX

ai-fail-safe / safe-reward

A prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle, in order to prevent the worst-case outcomes of perverse instantiation

failsafe ai-safety anomaly-detection ai-alignment fail-safe

Updated Nov 8, 2022
Python

EveryOneIsGross / bbBOT

bbBOT is a flexible, persona-based, branching binary-sentiment chatbot.

openai tree-structure chatbot-framework ai-alignment python-ai openai-chatgpt

Updated Jul 13, 2023
Python
