ai-safety
Here are 90 public repositories matching this topic...
- Hardened AI Assurance reference platform. (Updated Jan 23, 2023 · Python)
- Short story about artificial general intelligence, originally written as an English homework assignment. (Updated Dec 13, 2018)
- A repository for the event on AI safety hosted by the Effective Altruism Society at the University of Cape Town. (Updated Sep 16, 2021)
- 📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments. (Updated Apr 21, 2024 · Dockerfile)
- A proof of concept showing how contemporary AI models could be misused to influence public perception, highlighting the need for robust defenses against such threats to keep our political systems safe. Entry for the OpenAI Preparedness Challenge. (Updated Jan 14, 2024)
- R code for "Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety". (Updated Feb 9, 2024 · R)
- Analysis of the survey "Towards best practices in AGI safety and governance: A survey of expert opinion". (Updated May 11, 2023 · Jupyter Notebook)
- In-depth evaluation of the ETHICS utilitarianism task dataset, and an assessment of approaches to improved interpretability (SHAP, Bayesian transformers). (Updated Jun 3, 2021 · Jupyter Notebook)
- Materials on reading lists, events, and the general format of MIRIxPrague. (Updated Apr 2, 2018)
- Implementation of adaptive constrained RL algorithms. Child repository of @lasgroup/safe-adaptation-gym. (Updated Oct 5, 2022 · Python)
- The Model Library is a project that maps the risks associated with modern machine learning systems. (Updated Apr 4, 2024 · Python)
- A library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" to human civilization. IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN. (Updated Oct 30, 2022)
- A project to ensure that all child processes created by an agent "inherit" the agent's safety controls. (Updated Oct 29, 2022)