AI Safety Camp

Last edit: 10 May 2022 6:23 UTC by Remmelt

AI Safety Camp (AISC) is a non-profit initiative that runs programs for diversely skilled researchers who want to collaborate on an open problem in reducing AI existential risk.

Official Website

AISC 2024 - Project Summaries

NickyP · 27 Nov 2023 22:32 UTC
48 points
3 comments · 18 min read · LW link

AISC Project: Modelling Trajectories of Language Models

NickyP · 13 Nov 2023 14:33 UTC
27 points
0 comments · 12 min read · LW link

AI Safety Camp 2024

Linda Linsefors · 18 Nov 2023 10:37 UTC
15 points
1 comment · 4 min read · LW link
(aisafety.camp)

How teams went about their research at AI Safety Camp edition 5

Remmelt · 28 Jun 2021 15:15 UTC
24 points
0 comments · 6 min read · LW link

The first AI Safety Camp & onwards

Remmelt · 7 Jun 2018 20:13 UTC
46 points
0 comments · 8 min read · LW link

Thoughts on AI Safety Camp

Charlie Steiner · 13 May 2022 7:16 UTC
33 points
8 comments · 7 min read · LW link

Applications for AI Safety Camp 2022 Now Open!

adamShimi · 17 Nov 2021 21:42 UTC
47 points
3 comments · 1 min read · LW link

A Study of AI Science Models

13 May 2023 23:25 UTC
20 points
0 comments · 24 min read · LW link

This might be the last AI Safety Camp

24 Jan 2024 9:33 UTC
195 points
34 comments · 1 min read · LW link

Trust-maximizing AGI

25 Feb 2022 15:13 UTC
7 points
26 comments · 9 min read · LW link
(universalprior.substack.com)

AI Safety Camp 10

26 Oct 2024 11:08 UTC
38 points
9 comments · 18 min read · LW link

[Aspiration-based designs] 1. Informal introduction

28 Apr 2024 13:00 UTC
41 points
4 comments · 8 min read · LW link

Towards a formalization of the agent structure problem

Alex_Altair · 29 Apr 2024 20:28 UTC
55 points
5 comments · 14 min read · LW link

Machines vs. Memes 2: Memetically-Motivated Model Extensions

naterush · 31 May 2022 22:03 UTC
6 points
0 comments · 4 min read · LW link

Machines vs Memes Part 3: Imitation and Memes

ceru23 · 1 Jun 2022 13:36 UTC
7 points
0 comments · 7 min read · LW link

Steganography and the CycleGAN—alignment failure case study

Jan Czechowski · 11 Jun 2022 9:41 UTC
34 points
0 comments · 4 min read · LW link

Reflection Mechanisms as an Alignment target: A survey

22 Jun 2022 15:05 UTC
32 points
1 comment · 14 min read · LW link

AISC9 has ended and there will be an AISC10

Linda Linsefors · 29 Apr 2024 10:53 UTC
75 points
4 comments · 2 min read · LW link

AISC Project: Benchmarks for Stable Reflectivity

jacquesthibs · 13 Nov 2023 14:51 UTC
17 points
0 comments · 8 min read · LW link

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)

23 Aug 2024 14:18 UTC
17 points
2 comments · 4 min read · LW link

Projects I would like to see (possibly at AI Safety Camp)

Linda Linsefors · 27 Sep 2023 21:27 UTC
22 points
12 comments · 4 min read · LW link

Extraction of human preferences 👨→🤖

arunraja-hub · 24 Aug 2021 16:34 UTC
18 points
2 comments · 5 min read · LW link

A brief review of the reasons multi-objective RL could be important in AI Safety Research

Ben Smith · 29 Sep 2021 17:09 UTC
30 points
7 comments · 10 min read · LW link

Announcing the second AI Safety Camp

Lachouette · 11 Jun 2018 18:59 UTC
34 points
0 comments · 1 min read · LW link

AI Safety Research Camp—Project Proposal

David_Kristoffersson · 2 Feb 2018 4:25 UTC
29 points
11 comments · 8 min read · LW link

Theories of Modularity in the Biological Literature

4 Apr 2022 12:48 UTC
51 points
13 comments · 7 min read · LW link

Project Intro: Selection Theorems for Modularity

4 Apr 2022 12:59 UTC
73 points
20 comments · 16 min read · LW link

Open Problems in Negative Side Effect Minimization

6 May 2022 9:37 UTC
12 points
6 comments · 17 min read · LW link

AISC 2023, Progress Report for March: Team Interpretable Architectures

2 Apr 2023 16:19 UTC
14 points
0 comments · 14 min read · LW link

Apply to lead a project during the next virtual AI Safety Camp

13 Sep 2023 13:29 UTC
19 points
0 comments · 5 min read · LW link
(aisafety.camp)

A Friendly Face (Another Failure Story)

20 Jun 2023 10:31 UTC
65 points
21 comments · 16 min read · LW link

Introduction

30 Jun 2023 20:45 UTC
7 points
0 comments · 2 min read · LW link

Positive Attractors

30 Jun 2023 20:43 UTC
6 points
0 comments · 13 min read · LW link

Inherently Interpretable Architectures

30 Jun 2023 20:43 UTC
4 points
0 comments · 7 min read · LW link

The Control Problem: Unsolved or Unsolvable?

Remmelt · 2 Jun 2023 15:42 UTC
54 points
46 comments · 14 min read · LW link

“Wanting” and “liking”

Mateusz Bagiński · 30 Aug 2023 14:52 UTC
23 points
3 comments · 29 min read · LW link

AISC5 Retrospective: Mechanisms for Avoiding Tragedy of the Commons in Common Pool Resource Problems

27 Sep 2021 16:46 UTC
8 points
3 comments · 7 min read · LW link

Survey on AI existential risk scenarios

8 Jun 2021 17:12 UTC
65 points
11 comments · 7 min read · LW link

Acknowledging Human Preference Types to Support Value Learning

Nandi · 13 Nov 2018 18:57 UTC
34 points
4 comments · 9 min read · LW link

Empirical Observations of Objective Robustness Failures

23 Jun 2021 23:23 UTC
63 points
5 comments · 9 min read · LW link

Discussion: Objective Robustness and Inner Alignment Terminology

23 Jun 2021 23:25 UTC
73 points
7 comments · 9 min read · LW link

A survey of tool use and workflows in alignment research

23 Mar 2022 23:44 UTC
45 points
4 comments · 1 min read · LW link

Machines vs Memes Part 1: AI Alignment and Memetics

Harriet Farlow · 31 May 2022 22:03 UTC
19 points
1 comment · 6 min read · LW link

AI takeover tabletop RPG: “The Treacherous Turn”

Daniel Kokotajlo · 30 Nov 2022 7:16 UTC
53 points
5 comments · 1 min read · LW link

Results from a survey on tool use and workflows in alignment research

19 Dec 2022 15:19 UTC
79 points
2 comments · 19 min read · LW link

A descriptive, not prescriptive, overview of current AI Alignment Research

6 Jun 2022 21:59 UTC
139 points
21 comments · 7 min read · LW link

AI Safety Camp, Virtual Edition 2023

Linda Linsefors · 6 Jan 2023 11:09 UTC
40 points
10 comments · 3 min read · LW link
(aisafety.camp)

AI Safety Camp: Machine Learning for Scientific Discovery

Eleni Angelou · 6 Jan 2023 3:21 UTC
3 points
0 comments · 1 min read · LW link

How teams went about their research at AI Safety Camp edition 8

9 Sep 2023 16:34 UTC
28 points
0 comments · 13 min read · LW link

Funding case: AI Safety Camp

12 Dec 2023 9:08 UTC
66 points
5 comments · 6 min read · LW link
(manifund.org)

Paper review: “The Unreasonable Effectiveness of Easy Training Data for Hard Tasks”

Vassil Tashev · 29 Feb 2024 18:44 UTC
11 points
0 comments · 4 min read · LW link

Interview: Applications w/ Alice Rigg

jacobhaimes · 19 Dec 2023 19:03 UTC
12 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Thinking About Propensity Evaluations

19 Aug 2024 9:23 UTC
8 points
0 comments · 27 min read · LW link

A Taxonomy Of AI System Evaluations

19 Aug 2024 9:07 UTC
12 points
0 comments · 14 min read · LW link

INTERVIEW: StakeOut.AI w/ Dr. Peter Park

jacobhaimes · 4 Mar 2024 16:35 UTC
6 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Training-time domain authorization could be helpful for safety

25 May 2024 15:10 UTC
15 points
4 comments · 7 min read · LW link

A Review of Weak to Strong Generalization [AI Safety Camp]

sevdeawesome · 7 Mar 2024 17:16 UTC
13 points
0 comments · 9 min read · LW link

Some reasons to start a project to stop harmful AI

Remmelt · 22 Aug 2024 16:23 UTC
5 points
0 comments · 2 min read · LW link

Inducing human-like biases in moral reasoning LMs

20 Feb 2024 16:28 UTC
23 points
3 comments · 14 min read · LW link

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes · 18 Mar 2024 21:21 UTC
5 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Podcast interview series featuring Dr. Peter Park

jacobhaimes · 26 Mar 2024 0:25 UTC
3 points
0 comments · 2 min read · LW link
(into-ai-safety.github.io)

Immunization against harmful fine-tuning attacks

6 Jun 2024 15:17 UTC
4 points
0 comments · 12 min read · LW link

“Open Source AI” is a lie, but it doesn’t have to be

jacobhaimes · 30 Apr 2024 23:10 UTC
18 points
5 comments · 6 min read · LW link
(jacob-haimes.github.io)

Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research.

sevdeawesome · 1 Jul 2024 5:50 UTC
25 points
0 comments · 17 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

30 Jul 2024 16:22 UTC
193 points
43 comments · 12 min read · LW link

Embracing complexity when developing and evaluating AI responsibly

Aliya Amirova · 11 Oct 2024 17:46 UTC
2 points
9 comments · 9 min read · LW link

Agency overhang as a proxy for Sharp left turn

7 Nov 2024 12:14 UTC
5 points
0 comments · 5 min read · LW link

Formalize the Hashiness Model of AGI Uncontainability

Remmelt · 9 Nov 2024 16:10 UTC
6 points
0 comments · 1 min read · LW link
(docs.google.com)

Nobody Asks the Monkey: Why Human Agency Matters in the AI Age

Miloš Borenović · 3 Dec 2024 14:16 UTC
1 point
0 comments · 2 min read · LW link
(open.substack.com)

Intro to Ontogenetic Curriculum

Eris · 13 Apr 2023 17:15 UTC
19 points
1 comment · 2 min read · LW link

Paths to failure

25 Apr 2023 8:03 UTC
29 points
1 comment · 8 min read · LW link

Control Symmetry: why we might want to start investigating asymmetric alignment interventions

domenicrosati · 11 Nov 2023 17:27 UTC
25 points
1 comment · 2 min read · LW link

The Science Algorithm AISC Project

Johannes C. Mayer · 13 Nov 2023 12:52 UTC
12 points
0 comments · 1 min read · LW link
(docs.google.com)

AISC project: SatisfIA – AI that satisfies without overdoing it

Jobst Heitzig · 11 Nov 2023 18:22 UTC
12 points
0 comments · 1 min read · LW link
(docs.google.com)

AISC project: TinyEvals

Jett Janiak · 22 Nov 2023 20:47 UTC
22 points
0 comments · 4 min read · LW link

AISC project: How promising is automating alignment research? (literature review)

Bogdan Ionut Cirstea · 28 Nov 2023 14:47 UTC
4 points
1 comment · 1 min read · LW link
(docs.google.com)

Agentic Mess (A Failure Story)

6 Jun 2023 13:09 UTC
46 points
5 comments · 13 min read · LW link

AISC team report: Soft-optimization, Bayes and Goodhart

27 Jun 2023 6:05 UTC
37 points
2 comments · 15 min read · LW link