ai-safety
Here are 88 public repositories matching this topic...
Awesome PrivEx: Privacy-Preserving Explainable AI (PPXAI) (Updated Apr 23, 2024)
DPLL(T)-based verification tool for DNNs (Updated May 13, 2024, Python)
Aira is a series of chatbots developed as an experimentation playground for value alignment. (Updated Apr 17, 2024, Jupyter Notebook)
NeurIPS workshop: We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations. (Updated Dec 11, 2023, Python)
The Model Library is a project that maps the risks associated with modern machine learning systems. (Updated Apr 4, 2024, Python)
[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks (Updated Feb 26, 2023, Python)
A repository for the event on AI safety hosted by the Effective Altruism Society at the University of Cape Town. (Updated Sep 16, 2021)
A library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" to human civilization. IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN. (Updated Oct 30, 2022)
A project to ensure that all child processes created by an agent "inherit" the agent's safety controls. (Updated Oct 29, 2022)
📊 Benchmarking the safety of AI systems (Updated Jul 1, 2023, Jupyter Notebook)
A compilation of AI safety ideas, problems, and solutions. (Updated Mar 12, 2023)
📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments. (Updated Apr 21, 2024, Dockerfile)
A proof of concept showing how contemporary AI models could be misused to influence public perception, highlighting the need for robust defenses against such threats to ensure the safety of our political systems. Entry for the OpenAI Preparedness Challenge. (Updated Jan 14, 2024)
Improved version of the technical workshops for the 10-day ML4G camp on the safety of AI systems. (Updated Apr 10, 2024, Jupyter Notebook)