peterbarnett

Karma: 2,554

Researcher at MIRI

EA and AI safety

https://peterbarnett.org/

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

26 Jan 2024 7:22 UTC
160 points
60 comments · 57 min read · LW link

Trying to align humans with inclusive genetic fitness

peterbarnett · 11 Jan 2024 0:13 UTC
23 points
5 comments · 10 min read · LW link

Labs should be explicit about why they are building AGI

peterbarnett · 17 Oct 2023 21:09 UTC
195 points
17 comments · 1 min read · LW link

Thomas Kwa’s MIRI research experience

2 Oct 2023 16:42 UTC
172 points
53 comments · 1 min read · LW link

Doing oversight from the very start of training seems hard

peterbarnett · 20 Sep 2022 17:21 UTC
14 points
3 comments · 3 min read · LW link

Confusions in My Model of AI Risk

peterbarnett · 7 Jul 2022 1:05 UTC
22 points
9 comments · 5 min read · LW link

Scott Aaronson is joining OpenAI to work on AI safety

peterbarnett · 18 Jun 2022 4:06 UTC
117 points
31 comments · 1 min read · LW link
(scottaaronson.blog)

A Story of AI Risk: InstructGPT-N

peterbarnett · 26 May 2022 23:22 UTC
24 points
0 comments · 8 min read · LW link

Why I’m Worried About AI

peterbarnett · 23 May 2022 21:13 UTC
22 points
2 comments · 12 min read · LW link

Framings of Deceptive Alignment

peterbarnett · 26 Apr 2022 4:25 UTC
32 points
7 comments · 5 min read · LW link

How to become an AI safety researcher

peterbarnett · 15 Apr 2022 11:41 UTC
23 points
0 comments · 14 min read · LW link

Thoughts on Dangerous Learned Optimization

peterbarnett · 19 Feb 2022 10:46 UTC
4 points
2 comments · 4 min read · LW link

peterbarnett’s Shortform

peterbarnett · 16 Feb 2022 17:24 UTC
3 points
27 comments · 1 min read · LW link

Alignment Problems All the Way Down

peterbarnett · 22 Jan 2022 0:19 UTC
29 points
7 comments · 11 min read · LW link

[Question] What questions do you have about doing work on AI safety?

peterbarnett · 21 Dec 2021 16:36 UTC
13 points
8 comments · 1 min read · LW link

Some motivations to gradient hack

peterbarnett · 17 Dec 2021 3:06 UTC
8 points
0 comments · 6 min read · LW link

Understanding Gradient Hacking

peterbarnett · 10 Dec 2021 15:58 UTC
41 points
5 comments · 30 min read · LW link

When Should the Fire Alarm Go Off: A model for optimal thresholds

peterbarnett · 28 Apr 2021 12:27 UTC
40 points
4 comments · 5 min read · LW link
(peterbarnett.org)

Does making unsteady incremental progress work?

peterbarnett · 5 Mar 2021 7:23 UTC
8 points
4 comments · 1 min read · LW link
(peterbarnett.org)

Summary of AI Research Considerations for Human Existential Safety (ARCHES)

peterbarnett · 9 Dec 2020 23:28 UTC
11 points
0 comments · 13 min read · LW link