Joe Carlsmith

Karma: 4,420

Senior research analyst at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.

Incentive design and capability elicitation

Joe Carlsmith · 12 Nov 2024 20:56 UTC
31 points
0 comments · 12 min read · LW link

Option control

Joe Carlsmith · 4 Nov 2024 17:54 UTC
26 points
0 comments · 54 min read · LW link

Motivation control

Joe Carlsmith · 30 Oct 2024 17:15 UTC
43 points
7 comments · 52 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · 28 Oct 2024 21:57 UTC
52 points
5 comments · 32 min read · LW link

Video and transcript of presentation on Otherness and control in the age of AGI

Joe Carlsmith · 8 Oct 2024 22:30 UTC
35 points
1 comment · 27 min read · LW link

What is it to solve the alignment problem?

Joe Carlsmith · 24 Aug 2024 21:19 UTC
68 points
17 comments · 53 min read · LW link

Value fragility and AI takeover

Joe Carlsmith · 5 Aug 2024 21:28 UTC
76 points
5 comments · 30 min read · LW link

A framework for thinking about AI power-seeking

Joe Carlsmith · 24 Jul 2024 22:41 UTC
62 points
15 comments · 16 min read · LW link

Loving a world you don’t trust

Joe Carlsmith · 18 Jun 2024 19:31 UTC
134 points
13 comments · 33 min read · LW link

On “first critical tries” in AI alignment

Joe Carlsmith · 5 Jun 2024 0:19 UTC
54 points
8 comments · 14 min read · LW link

On attunement

Joe Carlsmith · 25 Mar 2024 12:47 UTC
92 points
8 comments · 22 min read · LW link

Video and transcript of presentation on Scheming AIs

Joe Carlsmith · 22 Mar 2024 15:52 UTC
32 points
1 comment · 32 min read · LW link

On green

Joe Carlsmith · 21 Mar 2024 17:38 UTC
263 points
35 comments · 31 min read · LW link

On the abolition of man

Joe Carlsmith · 18 Jan 2024 18:17 UTC
88 points
18 comments · 41 min read · LW link

Being nicer than Clippy

Joe Carlsmith · 16 Jan 2024 19:44 UTC
109 points
32 comments · 27 min read · LW link

An even deeper atheism

Joe Carlsmith · 11 Jan 2024 17:28 UTC
125 points
47 comments · 15 min read · LW link

Does AI risk “other” the AIs?

Joe Carlsmith · 9 Jan 2024 17:51 UTC
59 points
3 comments · 8 min read · LW link

When “yang” goes wrong

Joe Carlsmith · 8 Jan 2024 16:35 UTC
72 points
6 comments · 13 min read · LW link

Deep atheism and AI risk

Joe Carlsmith · 4 Jan 2024 18:58 UTC
149 points
22 comments · 27 min read · LW link

Gentleness and the artificial Other

Joe Carlsmith · 2 Jan 2024 18:21 UTC
291 points
33 comments · 11 min read · LW link