ryan_greenblatt

Karma: 14,437

I’m the chief scientist at Redwood Research.

To be legible, evidence of misalignment probably has to be behavioral

ryan_greenblatt · Apr 15, 2025, 6:14 PM
54 points
14 comments · 3 min read · LW link

Why do misalignment risks increase as AIs get more capable?

ryan_greenblatt · Apr 11, 2025, 3:06 AM
33 points
6 comments · 3 min read · LW link

An overview of areas of control work

ryan_greenblatt · Mar 25, 2025, 10:02 PM
31 points
0 comments · 28 min read · LW link

An overview of control measures

ryan_greenblatt · Mar 24, 2025, 11:16 PM
40 points
0 comments · 26 min read · LW link

Notes on countermeasures for exploration hacking (aka sandbagging)

ryan_greenblatt · Mar 24, 2025, 6:39 PM
52 points
6 comments · 8 min read · LW link

Notes on handling non-concentrated failures with AI control: high level methods and different regimes

ryan_greenblatt · Mar 24, 2025, 1:00 AM
22 points
3 comments · 16 min read · LW link

Prioritizing threats for AI control

ryan_greenblatt · Mar 19, 2025, 5:09 PM
48 points
2 comments · 10 min read · LW link

Will alignment-faking Claude accept a deal to reveal its misalignment?

Jan 31, 2025, 4:49 PM
197 points
28 comments · 12 min read · LW link

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt · Jan 23, 2025, 6:41 PM
145 points
5 comments · 13 min read · LW link

How will we update about scheming?

ryan_greenblatt · Jan 6, 2025, 8:21 PM
169 points
20 comments · 36 min read · LW link

A breakdown of AI capability levels focused on AI R&D labor acceleration

ryan_greenblatt · Dec 22, 2024, 8:56 PM
104 points
5 comments · 6 min read · LW link

Alignment Faking in Large Language Models

Dec 18, 2024, 5:19 PM
483 points
75 comments · 10 min read · LW link

Getting 50% (SoTA) on ARC-AGI with GPT-4o

ryan_greenblatt · Jun 17, 2024, 6:44 PM
263 points
50 comments · 13 min read · LW link

Memorizing weak examples can elicit strong behavior out of password-locked models

Jun 6, 2024, 11:54 PM
58 points
5 comments · 7 min read · LW link

[Paper] Stress-testing capability elicitation with password-locked models

Jun 4, 2024, 2:52 PM
85 points
10 comments · 12 min read · LW link
(arxiv.org)

Thoughts on SB-1047

ryan_greenblatt · May 29, 2024, 11:26 PM
60 points
1 comment · 11 min read · LW link

How useful is “AI Control” as a framing on AI X-Risk?

Mar 14, 2024, 6:06 PM
70 points
4 comments · 34 min read · LW link

Notes on control evaluations for safety cases

Feb 28, 2024, 4:15 PM
49 points
0 comments · 32 min read · LW link

Preventing model exfiltration with upload limits

ryan_greenblatt · Feb 6, 2024, 4:29 PM
69 points
22 comments · 14 min read · LW link

The case for ensuring that powerful AIs are controlled

Jan 24, 2024, 4:11 PM
275 points
73 comments · 28 min read · LW link