Joe Carlsmith

Karma: 4,420

Senior research analyst at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.

Incentive design and capability elicitation

Joe Carlsmith · 12 Nov 2024 20:56 UTC
31 points
0 comments · 12 min read · LW link

Option control

Joe Carlsmith · 4 Nov 2024 17:54 UTC
26 points
0 comments · 54 min read · LW link

Motivation control

Joe Carlsmith · 30 Oct 2024 17:15 UTC
43 points
7 comments · 52 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · 28 Oct 2024 21:57 UTC
52 points
5 comments · 32 min read · LW link

Video and transcript of presentation on Otherness and control in the age of AGI

Joe Carlsmith · 8 Oct 2024 22:30 UTC
35 points
1 comment · 27 min read · LW link

What is it to solve the alignment problem?

Joe Carlsmith · 24 Aug 2024 21:19 UTC
68 points
17 comments · 53 min read · LW link

Value fragility and AI takeover

Joe Carlsmith · 5 Aug 2024 21:28 UTC
76 points
5 comments · 30 min read · LW link

A framework for thinking about AI power-seeking

Joe Carlsmith · 24 Jul 2024 22:41 UTC
62 points
15 comments · 16 min read · LW link

Loving a world you don’t trust

Joe Carlsmith · 18 Jun 2024 19:31 UTC
134 points
13 comments · 33 min read · LW link

On “first critical tries” in AI alignment

Joe Carlsmith · 5 Jun 2024 0:19 UTC
54 points
8 comments · 14 min read · LW link

On attunement

Joe Carlsmith · 25 Mar 2024 12:47 UTC
92 points
8 comments · 22 min read · LW link

Video and transcript of presentation on Scheming AIs

Joe Carlsmith · 22 Mar 2024 15:52 UTC
32 points
1 comment · 32 min read · LW link

On green

Joe Carlsmith · 21 Mar 2024 17:38 UTC
263 points
35 comments · 31 min read · LW link