ai-safety
Here are 88 public repositories matching this topic...
Awesome PrivEx: Privacy-Preserving Explainable AI (PPXAI) (Updated Apr 23, 2024)
DPLL(T)-based verification tool for DNNs (Updated May 13, 2024, Python)
Aira is a series of chatbots developed as an experimentation playground for value alignment. (Updated Apr 17, 2024, Jupyter Notebook)
NeurIPS workshop: We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations. (Updated Dec 11, 2023, Python)
The Model Library is a project that maps the risks associated with modern machine learning systems. (Updated Apr 4, 2024, Python)
[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks (Updated Feb 26, 2023, Python)
A repository for the event on AI safety hosted by the Effective Altruism Society at the University of Cape Town. (Updated Sep 16, 2021)
A library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" to human civilization. IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN. (Updated Oct 30, 2022)
A project to ensure that all child processes created by an agent "inherit" the agent's safety controls. (Updated Oct 29, 2022)
📊 Benchmarking the safety of AI systems (Updated Jul 1, 2023, Jupyter Notebook)
A compilation of AI safety ideas, problems, and solutions. (Updated Mar 12, 2023)
📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments. (Updated Apr 21, 2024, Dockerfile)
A proof of concept showing how contemporary AI models could be misused to influence public perception, highlighting the need for robust defenses against such threats to ensure the safety of our political systems. Entry for the OpenAI Preparedness Challenge. (Updated Jan 14, 2024)
Improved version of the technical workshops for the 10-day ML4G camp on the safety of AI systems. (Updated Apr 10, 2024, Jupyter Notebook)