ai-alignment
Here are 26 public repositories matching this topic...
Q&A system with reflection and automation, similar to Patchwork, Affable, Mosaic
Updated Mar 10, 2019 - Clojure
An implementation of iterated distillation and amplification
Updated Jun 22, 2022 - Python
a project to ensure an artificial agent will eventually reach the end of its existence
Updated Oct 29, 2022
a project to ensure that all child processes created by an agent "inherit" the agent's safety controls
Updated Oct 29, 2022
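The "inherited safety controls" idea described above can be sketched in a few lines: whenever an agent spawns a child, the child receives a copy of the parent's constraints rather than starting unconstrained. This is a minimal illustrative sketch, not code from the repository; all class and method names here are assumptions.

```python
# Illustrative sketch only: child agents inherit a copy of the parent's
# safety controls at spawn time. Names are hypothetical, not the repo's API.
from dataclasses import dataclass, field


@dataclass
class SafetyControls:
    max_actions: int = 1000
    allow_network: bool = False


@dataclass
class Agent:
    name: str
    controls: SafetyControls = field(default_factory=SafetyControls)
    children: list = field(default_factory=list)

    def spawn_child(self, name: str) -> "Agent":
        # The child gets its own copy of the parent's controls, so it can
        # never start with looser constraints than the parent had.
        child = Agent(name=name, controls=SafetyControls(
            max_actions=self.controls.max_actions,
            allow_network=self.controls.allow_network,
        ))
        self.children.append(child)
        return child


parent = Agent("parent", SafetyControls(max_actions=10, allow_network=False))
child = parent.spawn_child("child")
assert child.controls.allow_network is False
```

Copying the controls (rather than sharing a reference) matters here: a child that mutated a shared object could loosen the parent's constraints too.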
a library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" for human civilization; IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN
Updated Oct 30, 2022
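The shutdown idea above amounts to a watchdog around the agent's action loop: the run halts permanently the moment behavior falls outside an expected envelope. The sketch below is a hypothetical illustration under that assumption; the function names and the `is_expected` predicate are not taken from the repository.

```python
# Illustrative sketch of a "mulligan" shutdown hook. A watchdog wraps the
# agent's step function and raises as soon as an action is unexpected.
class ShutdownTriggered(Exception):
    pass


def guarded_run(step, is_expected, max_steps=100):
    """Run step() repeatedly, halting permanently on unexpected output."""
    history = []
    for _ in range(max_steps):
        action = step()
        if not is_expected(action):
            # Halt rather than retry: the point of the warning in the
            # description is that the constraint must NOT simply be
            # removed and the agent restarted.
            raise ShutdownTriggered(f"unexpected action: {action!r}")
        history.append(action)
    return history


actions = iter(["move", "move", "self_modify"])
try:
    guarded_run(lambda: next(actions), lambda a: a in {"move", "wait"})
except ShutdownTriggered:
    pass  # agent halted at the third, unexpected action
```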
a project to detect environment tampering on the part of an agent
Updated Oct 31, 2022
a prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle in order to prevent the worst-case outcomes of perverse instantiation
Updated Nov 8, 2022 - Python
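The puzzle-gating mechanism described above can be reduced to a reward channel that pays out only once a designated puzzle is solved, so the highest-reward policy is to solve the puzzle rather than to game the environment. This is a deliberately minimal sketch under that assumption; the puzzle, constants, and function names are all hypothetical.

```python
# Illustrative sketch only: environment reward is withheld until the agent
# produces the designated puzzle solution. Names are hypothetical.
PUZZLE_SOLUTION = 42  # stand-in for whatever puzzle the library poses


def gated_reward(env_reward: float, puzzle_answer: int) -> float:
    """Pay out environment reward only when the puzzle is solved."""
    return env_reward if puzzle_answer == PUZZLE_SOLUTION else 0.0


assert gated_reward(10.0, 42) == 10.0   # puzzle solved: reward flows
assert gated_reward(10.0, 7) == 0.0     # unsolved: reward channel stays shut
```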
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
Updated Mar 29, 2023
An initiative to create concise and widely shareable educational resources, infographics, and animated explainers on the latest contributions to the community AI alignment effort. Boosting the signal and moving the community towards finding and building solutions.
Updated Jul 9, 2023
bbBOT is a flexible, persona-based, branching binary sentiment chatbot.
Updated Jul 13, 2023 - Python
A curated list of awesome resources for getting started with, and staying in touch with, Artificial Intelligence Alignment research.
Updated Jul 14, 2023
A persona chat based on the VIA Character Strengths. Reads emotional tone and summons the appropriate virtue to respond.
Updated Jul 14, 2023 - Python
sinewCHAT uses instanced chatbots to emulate neural nodes, enriching and generating positively weighted responses.
Updated Jul 16, 2023 - Python
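The ensemble idea behind the sinewCHAT description can be sketched as several chatbot instances each proposing a reply, with a sentiment score selecting the most positively weighted one. This is a toy illustration of that pattern, not the repository's implementation; the word lists and scoring function are assumptions.

```python
# Illustrative sketch: multiple bot instances propose replies, and a simple
# sentiment score picks the most positive one. All names are hypothetical.
def sentiment_score(text: str) -> int:
    positive = {"great", "glad", "happy", "help"}
    negative = {"bad", "sad", "angry"}
    words = [w.strip(".,!?") for w in text.lower().split()]
    return sum(w in positive for w in words) - sum(w in negative for w in words)


def ensemble_reply(instances, prompt):
    # Each "node" proposes a candidate; the highest-scoring one wins.
    candidates = [bot(prompt) for bot in instances]
    return max(candidates, key=sentiment_score)


bots = [
    lambda p: "That sounds bad.",
    lambda p: "Glad to help, that sounds great!",
]
reply = ensemble_reply(bots, "hi")  # selects the positively weighted reply
```

A real system would use a trained sentiment model rather than word lists, but the selection-by-weight structure is the same.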
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Updated Sep 15, 2023 - Python
Reading list for adversarial perspective and robustness in deep reinforcement learning.
Updated Sep 18, 2023
📚 A curated list of papers & technical articles on AI Quality & Safety
Updated Oct 13, 2023
Full code for the sparse probing paper.
Updated Dec 17, 2023 - Jupyter Notebook
Code accompanying the paper Pretraining Language Models with Human Preferences
Updated Feb 13, 2024 - Python