ai-safety
Here are 90 public repositories matching this topic...
- Hardened AI Assurance reference platform. (Updated Jan 23, 2023 · Python)
- Short story about artificial general intelligence, originally written as an English homework assignment. (Updated Dec 13, 2018)
- A repository for the event on AI safety hosted by the Effective Altruism Society at the University of Cape Town. (Updated Sep 16, 2021)
- 📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments. (Updated Apr 21, 2024 · Dockerfile)
- A proof of concept showing how contemporary AI models could be misused to influence public perception, highlighting the need for robust defenses against such threats to keep our political systems safe. Entry for the OpenAI Preparedness Challenge. (Updated Jan 14, 2024)
- R code for "Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety". (Updated Feb 9, 2024 · R)
- Analysis of the survey "Towards best practices in AGI safety and governance: A survey of expert opinion". (Updated May 11, 2023 · Jupyter Notebook)
- In-depth evaluation of the ETHICS utilitarianism task dataset, and an assessment of approaches to improved interpretability (SHAP, Bayesian transformers). (Updated Jun 3, 2021 · Jupyter Notebook)
- Materials on reading lists, events, and the general format of MIRIxPrague. (Updated Apr 2, 2018)
- Implementation of adaptive constrained RL algorithms. Child repository of @lasgroup/safe-adaptation-gym. (Updated Oct 5, 2022 · Python)
- The Model Library is a project that maps the risks associated with modern machine learning systems. (Updated Apr 4, 2024 · Python)
- A library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" to human civilization. IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN. (Updated Oct 30, 2022)
- A project to ensure that all child processes created by an agent "inherit" the agent's safety controls. (Updated Oct 29, 2022)