OpenAI: Facts from a Weekend

Zvi
Nov 20, 2023, 3:30 PM
271 points
165 comments
9 min read
LW link
(thezvi.wordpress.com)

Dear Self; we need to talk about ambition

Elizabeth
Aug 27, 2023, 11:10 PM
270 points
28 comments
8 min read
LW link
2 reviews
(acesounderglass.com)

The Base Rate Times, news through prediction markets

vandemonian
Jun 6, 2023, 5:42 PM
268 points
41 comments
4 min read
LW link
1 review

Pausing AI Developments Isn’t Enough. We Need to Shut it All Down

Eliezer Yudkowsky
Apr 8, 2023, 12:36 AM
268 points
44 comments
12 min read
LW link
1 review

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI

Andrew_Critch
May 24, 2023, 12:02 AM
268 points
39 comments
8 min read
LW link

Discussion with Nate Soares on a key alignment difficulty

HoldenKarnofsky
Mar 13, 2023, 9:20 PM
265 points
43 comments
22 min read
LW link
1 review

Constellations are Younger than Continents

Jeffrey Heninger
Dec 19, 2023, 6:12 AM
263 points
21 comments
2 min read
LW link

[SEE NEW EDITS] No, *You* Need to Write Clearer

Nicholas / Heather Kross
Apr 29, 2023, 5:04 AM
262 points
65 comments
5 min read
LW link
(www.thinkingmuchbetter.com)

“Carefully Bootstrapped Alignment” is organizationally hard

Raemon
Mar 17, 2023, 6:00 PM
262 points
23 comments
11 min read
LW link
1 review

UFO Betting: Put Up or Shut Up

RatsWrongAboutUAP
Jun 13, 2023, 4:05 AM
259 points
216 comments
2 min read
LW link
1 review

Alignment Implications of LLM Successes: a Debate in One Act

Zack_M_Davis
Oct 21, 2023, 3:22 PM
258 points
55 comments
13 min read
LW link
2 reviews

My Model Of EA Burnout

LoganStrohl
Jan 25, 2023, 5:52 PM
258 points
50 comments
5 min read
LW link
1 review

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

May 10, 2023, 7:04 PM
256 points
54 comments
21 min read
LW link

Thoughts on the impact of RLHF research

paulfchristiano
Jan 25, 2023, 5:23 PM
253 points
102 comments
9 min read
LW link

You Don’t Exist, Duncan

Duncan Sabien (Deactivated)
Feb 2, 2023, 8:37 AM
252 points
107 comments
9 min read
LW link

Deep Deceptiveness

So8res
Mar 21, 2023, 2:51 AM
251 points
60 comments
14 min read
LW link
1 review

My Assessment of the Chinese AI Safety Community

Lao Mein
Apr 25, 2023, 4:21 AM
250 points
94 comments
3 min read
LW link

My views on “doom”

paulfchristiano
Apr 27, 2023, 5:50 PM
250 points
37 comments
2 min read
LW link
1 review
(ai-alignment.com)

Yes, It’s Subjective, But Why All The Crabs?

johnswentworth
Jul 28, 2023, 7:35 PM
250 points
15 comments
6 min read
LW link

I hired 5 people to sit behind me and make me productive for a month

Simon Berens
Feb 5, 2023, 1:19 AM
249 points
83 comments
10 min read
LW link
(www.simonberens.com)

On AutoGPT

Zvi
Apr 13, 2023, 12:30 PM
248 points
47 comments
20 min read
LW link
(thezvi.wordpress.com)

Lessons On How To Get Things Right On The First Try

Jun 19, 2023, 11:58 PM
245 points
57 comments
10 min read
LW link
1 review

Munk AI debate: confusions and possible cruxes

Steven Byrnes
Jun 27, 2023, 2:18 PM
244 points
21 comments
8 min read
LW link

Book Review: Going Infinite

Zvi
Oct 24, 2023, 3:00 PM
242 points
113 comments
97 min read
LW link
1 review
(thezvi.wordpress.com)

Natural Abstractions: Key claims, Theorems, and Critiques

Mar 16, 2023, 4:37 PM
241 points
23 comments
45 min read
LW link
3 reviews

Sum-threshold attacks

TsviBT
Sep 8, 2023, 5:13 PM
238 points
55 comments
10 min read
LW link
(tsvibt.blogspot.com)

Self-driving car bets

paulfchristiano
Jul 29, 2023, 6:10 PM
236 points
44 comments
5 min read
LW link
(sideways-view.com)

AI Control: Improving Safety Despite Intentional Subversion

Dec 13, 2023, 3:51 PM
236 points
24 comments
10 min read
LW link
4 reviews

Cultivating a state of mind where new ideas are born

Henrik Karlsson
Jul 27, 2023, 9:16 AM
235 points
21 comments
14 min read
LW link
2 reviews
(www.henrikkarlsson.xyz)

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

Beth Barnes
Mar 19, 2023, 12:25 AM
233 points
54 comments
8 min read
LW link
(evals.alignment.org)

Policy discussions follow strong contextualizing norms

Richard_Ngo
Apr 1, 2023, 11:51 PM
230 points
61 comments
3 min read
LW link

What are the results of more parental supervision and less outdoor play?

juliawise
Nov 25, 2023, 12:52 PM
228 points
31 comments
5 min read
LW link

AGI in sight: our look at the game board

Feb 18, 2023, 10:17 PM
227 points
135 comments
6 min read
LW link
(andreamiotti.substack.com)

Ways I Expect AI Regulation To Increase Extinction Risk

1a3orn
Jul 4, 2023, 5:32 PM
225 points
32 comments
7 min read
LW link

Elements of Rationalist Discourse

Rob Bensinger
Feb 12, 2023, 7:58 AM
224 points
49 comments
3 min read
LW link
1 review

Recursive Middle Manager Hell

Raemon
Jan 1, 2023, 4:33 AM
224 points
46 comments
11 min read
LW link
1 review

Announcing MIRI’s new CEO and leadership team

Gretta Duleba
Oct 10, 2023, 7:22 PM
222 points
52 comments
3 min read
LW link

Thoughts on responsible scaling policies and regulation

paulfchristiano
Oct 24, 2023, 10:21 PM
221 points
33 comments
6 min read
LW link

Catching the Eye of Sauron

Casey B.
Apr 7, 2023, 12:40 AM
221 points
68 comments
4 min read
LW link

What I would do if I wasn’t at ARC Evals

LawrenceC
Sep 5, 2023, 7:19 PM
220 points
10 comments
13 min read
LW link
1 review

UDT shows that decision theory is more puzzling than ever

Wei Dai
Sep 13, 2023, 12:26 PM
218 points
56 comments
1 min read
LW link

AI presidents discuss AI alignment agendas

Sep 9, 2023, 6:55 PM
217 points
23 comments
1 min read
LW link
(www.youtube.com)

Orthogonal: A new agent foundations alignment organization

Tamsin Leake
Apr 19, 2023, 8:17 PM
217 points
4 comments
1 min read
LW link
(orxl.org)

Enemies vs Malefactors

So8res
Feb 28, 2023, 11:38 PM
217 points
69 comments
LW link
4 reviews

Announcing Apollo Research

May 30, 2023, 4:17 PM
217 points
11 comments
8 min read
LW link

Eliezer Yudkowsky’s Letter in Time Magazine

Zvi
Apr 5, 2023, 6:00 PM
214 points
86 comments
14 min read
LW link
(thezvi.wordpress.com)

Updates and Reflections on Optimal Exercise after Nearly a Decade

romeostevensit
Jun 8, 2023, 11:02 PM
213 points
57 comments
2 min read
LW link
1 review

An AI risk argument that resonates with NYTimes readers

Julian Bradshaw
Mar 12, 2023, 11:09 PM
212 points
14 comments
1 min read
LW link

Consciousness as a conflationary alliance term for intrinsically valued internal experiences

Andrew_Critch
Jul 10, 2023, 8:09 AM
212 points
54 comments
11 min read
LW link
2 reviews

Actually, Othello-GPT Has A Linear Emergent World Representation

Neel Nanda
Mar 29, 2023, 10:13 PM
211 points
26 comments
19 min read
LW link
(neelnanda.io)