
Joe Carlsmith

Karma: 5,033

Senior research analyst at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.

Video and transcript of talk on automating alignment research

Joe Carlsmith · Apr 30, 2025, 5:43 PM
21 points
0 comments · 24 min read · LW link
(joecarlsmith.com)

Can we safely automate alignment research?

Joe Carlsmith · Apr 30, 2025, 5:37 PM
45 points
25 comments · 48 min read · LW link
(joecarlsmith.com)

AI for AI safety

Joe Carlsmith · Mar 14, 2025, 3:00 PM
78 points
13 comments · 17 min read · LW link
(joecarlsmith.substack.com)

Paths and waystations in AI safety

Joe Carlsmith · Mar 11, 2025, 6:52 PM
41 points
1 comment · 11 min read · LW link
(joecarlsmith.substack.com)

When should we worry about AI power-seeking?

Joe Carlsmith · Feb 19, 2025, 7:44 PM
20 points
0 comments · 18 min read · LW link
(joecarlsmith.substack.com)

What is it to solve the alignment problem?

Joe Carlsmith · Feb 13, 2025, 6:42 PM
31 points
6 comments · 19 min read · LW link
(joecarlsmith.substack.com)

How do we solve the alignment problem?

Joe Carlsmith · Feb 13, 2025, 6:27 PM
63 points
8 comments · 6 min read · LW link
(joecarlsmith.substack.com)

Fake thinking and real thinking

Joe Carlsmith · Jan 28, 2025, 8:05 PM
106 points
11 comments · 38 min read · LW link

Takes on “Alignment Faking in Large Language Models”

Joe Carlsmith · Dec 18, 2024, 6:22 PM
105 points
7 comments · 62 min read · LW link

Incentive design and capability elicitation

Joe Carlsmith · Nov 12, 2024, 8:56 PM
31 points
0 comments · 12 min read · LW link

Option control

Joe Carlsmith · Nov 4, 2024, 5:54 PM
28 points
0 comments · 54 min read · LW link

Motivation control

Joe Carlsmith · Oct 30, 2024, 5:15 PM
45 points
7 comments · 52 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · Oct 28, 2024, 9:57 PM
54 points
5 comments · 32 min read · LW link

Video and transcript of presentation on Otherness and control in the age of AGI

Joe Carlsmith · Oct 8, 2024, 10:30 PM
35 points
1 comment · 27 min read · LW link

What is it to solve the alignment problem? (Notes)

Joe Carlsmith · Aug 24, 2024, 9:19 PM
69 points
18 comments · 53 min read · LW link

Value fragility and AI takeover

Joe Carlsmith · Aug 5, 2024, 9:28 PM
76 points
5 comments · 30 min read · LW link

A framework for thinking about AI power-seeking

Joe Carlsmith · 24 Jul 2024 22:41 UTC
62 points
15 comments · 16 min read · LW link

Loving a world you don’t trust

Joe Carlsmith · 18 Jun 2024 19:31 UTC
135 points
13 comments · 33 min read · LW link

On “first critical tries” in AI alignment

Joe Carlsmith · 5 Jun 2024 0:19 UTC
54 points
8 comments · 14 min read · LW link

On attunement

Joe Carlsmith · 25 Mar 2024 12:47 UTC
100 points
12 comments · 22 min read · LW link