AI Control

This is a collection of posts about AI Control, an approach to AI safety that relies on safety measures designed to prevent powerful AIs from causing unacceptably bad outcomes, even if those AIs are misaligned and intentionally try to subvert the measures.

These posts are useful for understanding the AI Control approach and its upsides and downsides, though they cover only a small fraction of the AI safety work relevant to AI Control.

The case for ensuring that powerful AIs are controlled

AI Control: Improving Safety Despite Intentional Subversion

Untrusted smart models and trusted dumb models

Catching AIs red-handed

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

Auditing failures vs concentrated failures

Protocol evaluations: good analogies vs control

How useful is “AI Control” as a framing on AI X-Risk?

New report: Safety Cases for AI

Notes on control evaluations for safety cases

Toy models of AI control for concentrated catastrophe prevention