
Joe Carlsmith

Karma: 5,033

Senior research analyst at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.

Video and transcript of talk on automating alignment research

Joe Carlsmith · Apr 30, 2025, 5:43 PM
21 points
0 comments · 24 min read · LW link
(joecarlsmith.com)

Can we safely automate alignment research?

Joe Carlsmith · Apr 30, 2025, 5:37 PM
45 points
25 comments · 48 min read · LW link
(joecarlsmith.com)

AI for AI safety

Joe Carlsmith · Mar 14, 2025, 3:00 PM
78 points
13 comments · 17 min read · LW link
(joecarlsmith.substack.com)

Paths and waystations in AI safety

Joe Carlsmith · Mar 11, 2025, 6:52 PM
41 points
1 comment · 11 min read · LW link
(joecarlsmith.substack.com)

When should we worry about AI power-seeking?

Joe Carlsmith · Feb 19, 2025, 7:44 PM
20 points
0 comments · 18 min read · LW link
(joecarlsmith.substack.com)

What is it to solve the alignment problem?

Joe Carlsmith · Feb 13, 2025, 6:42 PM
31 points
6 comments · 19 min read · LW link
(joecarlsmith.substack.com)

How do we solve the alignment problem?

Joe Carlsmith · Feb 13, 2025, 6:27 PM
63 points
8 comments · 6 min read · LW link
(joecarlsmith.substack.com)

Fake thinking and real thinking

Joe Carlsmith · Jan 28, 2025, 8:05 PM
106 points
11 comments · 38 min read · LW link

Takes on “Alignment Faking in Large Language Models”

Joe Carlsmith · Dec 18, 2024, 6:22 PM
105 points
7 comments · 62 min read · LW link

Incentive design and capability elicitation

Joe Carlsmith · Nov 12, 2024, 8:56 PM
31 points
0 comments · 12 min read · LW link

Option control

Joe Carlsmith · Nov 4, 2024, 5:54 PM
28 points
0 comments · 54 min read · LW link

Motivation control

Joe Carlsmith · Oct 30, 2024, 5:15 PM
45 points
7 comments · 52 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · Oct 28, 2024, 9:57 PM
54 points
5 comments · 32 min read · LW link

Video and transcript of presentation on Otherness and control in the age of AGI

Joe Carlsmith · Oct 8, 2024, 10:30 PM
35 points
1 comment · 27 min read · LW link

What is it to solve the alignment problem? (Notes)

Joe Carlsmith · Aug 24, 2024, 9:19 PM
69 points
18 comments · 53 min read · LW link

Value fragility and AI takeover

Joe Carlsmith · Aug 5, 2024, 9:28 PM
76 points
5 comments · 30 min read · LW link

A framework for thinking about AI power-seeking

Joe Carlsmith · 24 Jul 2024 22:41 UTC
62 points
15 comments · 16 min read · LW link

Loving a world you don’t trust

Joe Carlsmith · 18 Jun 2024 19:31 UTC
135 points
13 comments · 33 min read · LW link

On “first critical tries” in AI alignment

Joe Carlsmith · 5 Jun 2024 0:19 UTC
54 points
8 comments · 14 min read · LW link

On attunement

Joe Carlsmith · 25 Mar 2024 12:47 UTC
100 points
12 comments · 22 min read · LW link