AI Safety Camp

Last edit: Dec 30, 2024, 10:23 AM by Dakara

AI Safety Camp is a non-profit initiative that runs programs for diversely skilled researchers who want to try collaborating on an open problem for reducing AI existential risk.

AISC 2024 - Project Summaries

NickyP · Nov 27, 2023, 10:32 PM
48 points
3 comments · 18 min read · LW link

AI Safety Camp 2024

Linda Linsefors · Nov 18, 2023, 10:37 AM
15 points
1 comment · 4 min read · LW link
(aisafety.camp)

Research Agenda: Modelling Trajectories of Language Models

NickyP · Nov 13, 2023, 2:33 PM
27 points
0 comments · 12 min read · LW link

How teams went about their research at AI Safety Camp edition 5

Remmelt · Jun 28, 2021, 3:15 PM
24 points
0 comments · 6 min read · LW link

The first AI Safety Camp & onwards

Remmelt · Jun 7, 2018, 8:13 PM
46 points
0 comments · 8 min read · LW link

Thoughts on AI Safety Camp

Charlie Steiner · May 13, 2022, 7:16 AM
33 points
8 comments · 7 min read · LW link

Applications for AI Safety Camp 2022 Now Open!

adamShimi · Nov 17, 2021, 9:42 PM
47 points
3 comments · 1 min read · LW link

AI Safety Camp 10

Oct 26, 2024, 11:08 AM
38 points
9 comments · 18 min read · LW link

This might be the last AI Safety Camp

Jan 24, 2024, 9:33 AM
196 points
34 comments · 1 min read · LW link

Trust-maximizing AGI

Feb 25, 2022, 3:13 PM
7 points
26 comments · 9 min read · LW link
(universalprior.substack.com)

A Study of AI Science Models

May 13, 2023, 11:25 PM
20 points
0 comments · 24 min read · LW link

AISC Project: Benchmarks for Stable Reflectivity

jacquesthibs · Nov 13, 2023, 2:51 PM
17 points
0 comments · 8 min read · LW link

We don’t want to post again “This might be the last AI Safety Camp”

Jan 21, 2025, 12:03 PM
36 points
17 comments · 1 min read · LW link
(manifund.org)

Theory of Change for AI Safety Camp

Linda Linsefors · Jan 22, 2025, 10:07 PM
35 points
3 comments · 7 min read · LW link

Machines vs. Memes 2: Memetically-Motivated Model Extensions

naterush · May 31, 2022, 10:03 PM
6 points
0 comments · 4 min read · LW link

AISC 2023, Progress Report for March: Team Interpretable Architectures

Apr 2, 2023, 4:19 PM
14 points
0 comments · 14 min read · LW link

Machines vs Memes Part 3: Imitation and Memes

ceru23 · Jun 1, 2022, 1:36 PM
7 points
0 comments · 7 min read · LW link

Steganography and the CycleGAN - alignment failure case study

Jan Czechowski · Jun 11, 2022, 9:41 AM
34 points
0 comments · 4 min read · LW link

Reflection Mechanisms as an Alignment target: A survey

Jun 22, 2022, 3:05 PM
32 points
1 comment · 14 min read · LW link

AISC9 has ended and there will be an AISC10

Linda Linsefors · Apr 29, 2024, 10:53 AM
75 points
4 comments · 2 min read · LW link

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)

Aug 23, 2024, 2:18 PM
17 points
2 comments · 4 min read · LW link

Towards a formalization of the agent structure problem

Alex_Altair · Apr 29, 2024, 8:28 PM
55 points
6 comments · 14 min read · LW link

Apply to lead a project during the next virtual AI Safety Camp

Sep 13, 2023, 1:29 PM
19 points
0 comments · 5 min read · LW link
(aisafety.camp)

Projects I would like to see (possibly at AI Safety Camp)

Linda Linsefors · Sep 27, 2023, 9:27 PM
22 points
12 comments · 4 min read · LW link

Extraction of human preferences 👨→🤖

arunraja-hub · Aug 24, 2021, 4:34 PM
18 points
2 comments · 5 min read · LW link

[Aspiration-based designs] 1. Informal introduction

Apr 28, 2024, 1:00 PM
44 points
4 comments · 8 min read · LW link

A brief review of the reasons multi-objective RL could be important in AI Safety Research

Sep 29, 2021, 5:09 PM
30 points
7 comments · 10 min read · LW link

Announcing the second AI Safety Camp

Lachouette · Jun 11, 2018, 6:59 PM
34 points
0 comments · 1 min read · LW link

AI Safety Research Camp - Project Proposal

David_Kristoffersson · Feb 2, 2018, 4:25 AM
29 points
11 comments · 8 min read · LW link

Theories of Modularity in the Biological Literature

Apr 4, 2022, 12:48 PM
51 points
13 comments · 7 min read · LW link

Project Intro: Selection Theorems for Modularity

Apr 4, 2022, 12:59 PM
73 points
20 comments · 16 min read · LW link

Open Problems in Negative Side Effect Minimization

May 6, 2022, 9:37 AM
12 points
6 comments · 17 min read · LW link

AISC team report: Soft-optimization, Bayes and Goodhart

Jun 27, 2023, 6:05 AM
38 points
2 comments · 15 min read · LW link

A Friendly Face (Another Failure Story)

Jun 20, 2023, 10:31 AM
65 points
21 comments · 16 min read · LW link

Introduction

Jun 30, 2023, 8:45 PM
8 points
0 comments · 2 min read · LW link

Positive Attractors

Jun 30, 2023, 8:43 PM
6 points
0 comments · 13 min read · LW link

Inherently Interpretable Architectures

Jun 30, 2023, 8:43 PM
4 points
0 comments · 7 min read · LW link

The Control Problem: Unsolved or Unsolvable?

Remmelt · Jun 2, 2023, 3:42 PM
55 points
46 comments · 14 min read · LW link

“Wanting” and “liking”

Mateusz Bagiński · Aug 30, 2023, 2:52 PM
23 points
3 comments · 29 min read · LW link

How teams went about their research at AI Safety Camp edition 8

Sep 9, 2023, 4:34 PM
28 points
0 comments · 13 min read · LW link

AISC5 Retrospective: Mechanisms for Avoiding Tragedy of the Commons in Common Pool Resource Problems

Sep 27, 2021, 4:46 PM
8 points
3 comments · 7 min read · LW link

Survey on AI existential risk scenarios

Jun 8, 2021, 5:12 PM
65 points
11 comments · 7 min read · LW link

Acknowledging Human Preference Types to Support Value Learning

Nandi · Nov 13, 2018, 6:57 PM
34 points
4 comments · 9 min read · LW link

Empirical Observations of Objective Robustness Failures

Jun 23, 2021, 11:23 PM
63 points
5 comments · 9 min read · LW link

Discussion: Objective Robustness and Inner Alignment Terminology

Jun 23, 2021, 11:25 PM
73 points
7 comments · 9 min read · LW link

A survey of tool use and workflows in alignment research

Mar 23, 2022, 11:44 PM
45 points
4 comments · 1 min read · LW link

Machines vs Memes Part 1: AI Alignment and Memetics

Harriet Farlow · May 31, 2022, 10:03 PM
19 points
1 comment · 6 min read · LW link

AI takeover tabletop RPG: “The Treacherous Turn”

Daniel Kokotajlo · Nov 30, 2022, 7:16 AM
53 points
5 comments · 1 min read · LW link

Results from a survey on tool use and workflows in alignment research

Dec 19, 2022, 3:19 PM
79 points
2 comments · 19 min read · LW link

A descriptive, not prescriptive, overview of current AI Alignment Research

Jun 6, 2022, 9:59 PM
139 points
21 comments · 7 min read · LW link

AI Safety Camp, Virtual Edition 2023

Linda Linsefors · Jan 6, 2023, 11:09 AM
40 points
10 comments · 3 min read · LW link
(aisafety.camp)

AI Safety Camp: Machine Learning for Scientific Discovery

Eleni Angelou · Jan 6, 2023, 3:21 AM
3 points
0 comments · 1 min read · LW link

Funding case: AI Safety Camp 10

Dec 12, 2023, 9:08 AM
66 points
5 comments · 6 min read · LW link
(manifund.org)

Paper review: “The Unreasonable Effectiveness of Easy Training Data for Hard Tasks”

Vassil Tashev · Feb 29, 2024, 6:44 PM
11 points
0 comments · 4 min read · LW link

Interview: Applications w/ Alice Rigg

jacobhaimes · Dec 19, 2023, 7:03 PM
12 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Thinking About Propensity Evaluations

Aug 19, 2024, 9:23 AM
9 points
0 comments · 27 min read · LW link

A Taxonomy Of AI System Evaluations

Aug 19, 2024, 9:07 AM
13 points
0 comments · 14 min read · LW link

INTERVIEW: StakeOut.AI w/ Dr. Peter Park

jacobhaimes · Mar 4, 2024, 4:35 PM
6 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Training-time domain authorization could be helpful for safety

May 25, 2024, 3:10 PM
15 points
4 comments · 7 min read · LW link

A Review of Weak to Strong Generalization [AI Safety Camp]

sevdeawesome · Mar 7, 2024, 5:16 PM
13 points
0 comments · 9 min read · LW link

Some reasons to start a project to stop harmful AI

Remmelt · Aug 22, 2024, 4:23 PM
5 points
0 comments · 2 min read · LW link

Inducing human-like biases in moral reasoning LMs

Feb 20, 2024, 4:28 PM
23 points
3 comments · 14 min read · LW link

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes · Mar 18, 2024, 9:21 PM
5 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Podcast interview series featuring Dr. Peter Park

jacobhaimes · Mar 26, 2024, 12:25 AM
3 points
0 comments · 2 min read · LW link
(into-ai-safety.github.io)

Immunization against harmful fine-tuning attacks

Jun 6, 2024, 3:17 PM
4 points
0 comments · 12 min read · LW link

“Open Source AI” is a lie, but it doesn’t have to be

jacobhaimes · Apr 30, 2024, 11:10 PM
18 points
5 comments · 6 min read · LW link
(jacob-haimes.github.io)

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.

sevdeawesome · Jul 1, 2024, 5:50 AM
25 points
0 comments · 17 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

Jul 30, 2024, 4:22 PM
215 points
49 comments · 12 min read · LW link

Funding Case: AI Safety Camp 11

Dec 23, 2024, 8:51 AM
60 points
4 comments · 6 min read · LW link
(manifund.org)

Embracing complexity when developing and evaluating AI responsibly

Aliya Amirova · Oct 11, 2024, 5:46 PM
2 points
9 comments · 9 min read · LW link

Representation Engineering has Its Problems, but None Seem Unsolvable

Lukasz G Bartoszcze · Feb 26, 2025, 7:53 PM
15 points
1 comment · 3 min read · LW link

Agency overhang as a proxy for Sharp left turn

Nov 7, 2024, 12:14 PM
6 points
0 comments · 5 min read · LW link

Formalize the Hashiness Model of AGI Uncontainability

Remmelt · Nov 9, 2024, 4:10 PM
3 points
0 comments · 1 min read · LW link
(docs.google.com)

Nobody Asks the Monkey: Why Human Agency Matters in the AI Age

Miloš Borenović · Dec 3, 2024, 2:16 PM
1 point
0 comments · 2 min read · LW link
(open.substack.com)

Building AI safety benchmark environments on themes of universal human values

Roland Pihlakas · Jan 3, 2025, 4:24 AM
18 points
3 comments · 8 min read · LW link
(docs.google.com)

Intro to Ontogenetic Curriculum

Eris · Apr 13, 2023, 5:15 PM
20 points
1 comment · 2 min read · LW link

Paths to failure

Apr 25, 2023, 8:03 AM
29 points
1 comment · 8 min read · LW link

Control Symmetry: why we might want to start investigating asymmetric alignment interventions

domenicrosati · Nov 11, 2023, 5:27 PM
25 points
1 comment · 2 min read · LW link

The Science Algorithm AISC Project

Johannes C. Mayer · Nov 13, 2023, 12:52 PM
12 points
0 comments · 1 min read · LW link
(docs.google.com)

AISC project: SatisfIA – AI that satisfies without overdoing it

Jobst Heitzig · Nov 11, 2023, 6:22 PM
12 points
0 comments · 1 min read · LW link
(docs.google.com)

AISC project: TinyEvals

Jett Janiak · Nov 22, 2023, 8:47 PM
22 points
0 comments · 4 min read · LW link

AISC project: How promising is automating alignment research? (literature review)

Bogdan Ionut Cirstea · Nov 28, 2023, 2:47 PM
4 points
1 comment · 1 min read · LW link
(docs.google.com)

Agentic Mess (A Failure Story)

Jun 6, 2023, 1:09 PM
46 points
5 comments · 13 min read · LW link