peterbarnett

Karma: 2,554

Researcher at MIRI

EA and AI safety

https://peterbarnett.org/

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

26 Jan 2024 7:22 UTC
160 points
60 comments · 57 min read · LW link

Trying to align humans with inclusive genetic fitness

peterbarnett · 11 Jan 2024 0:13 UTC
23 points
5 comments · 10 min read · LW link

Labs should be explicit about why they are building AGI

peterbarnett · 17 Oct 2023 21:09 UTC
195 points
17 comments · 1 min read · LW link

Thomas Kwa’s MIRI research experience

2 Oct 2023 16:42 UTC
172 points
53 comments · 1 min read · LW link

Doing oversight from the very start of training seems hard

peterbarnett · 20 Sep 2022 17:21 UTC
14 points
3 comments · 3 min read · LW link

Confusions in My Model of AI Risk

peterbarnett · 7 Jul 2022 1:05 UTC
22 points
9 comments · 5 min read · LW link

Scott Aaronson is joining OpenAI to work on AI safety

peterbarnett · 18 Jun 2022 4:06 UTC
117 points
31 comments · 1 min read · LW link
(scottaaronson.blog)

A Story of AI Risk: InstructGPT-N

peterbarnett · 26 May 2022 23:22 UTC
24 points
0 comments · 8 min read · LW link

Why I’m Worried About AI

peterbarnett · 23 May 2022 21:13 UTC
22 points
2 comments · 12 min read · LW link

Framings of Deceptive Alignment

peterbarnett · 26 Apr 2022 4:25 UTC
32 points
7 comments · 5 min read · LW link

How to become an AI safety researcher

peterbarnett · 15 Apr 2022 11:41 UTC
23 points
0 comments · 14 min read · LW link

Thoughts on Dangerous Learned Optimization

peterbarnett · 19 Feb 2022 10:46 UTC
4 points
2 comments · 4 min read · LW link

peterbarnett’s Shortform

peterbarnett · 16 Feb 2022 17:24 UTC
3 points
27 comments · 1 min read · LW link

Alignment Problems All the Way Down

peterbarnett · 22 Jan 2022 0:19 UTC
29 points
7 comments · 11 min read · LW link

[Question] What questions do you have about doing work on AI safety?

peterbarnett · 21 Dec 2021 16:36 UTC
13 points
8 comments · 1 min read · LW link

Some motivations to gradient hack

peterbarnett · 17 Dec 2021 3:06 UTC
8 points
0 comments · 6 min read · LW link

Understanding Gradient Hacking

peterbarnett · 10 Dec 2021 15:58 UTC
41 points
5 comments · 30 min read · LW link

When Should the Fire Alarm Go Off: A model for optimal thresholds

peterbarnett · 28 Apr 2021 12:27 UTC
40 points
4 comments · 5 min read · LW link
(peterbarnett.org)

Does making unsteady incremental progress work?

peterbarnett · 5 Mar 2021 7:23 UTC
8 points
4 comments · 1 min read · LW link
(peterbarnett.org)

Summary of AI Research Considerations for Human Existential Safety (ARCHES)

peterbarnett · 9 Dec 2020 23:28 UTC
11 points
0 comments · 13 min read · LW link