Shallow review of live agendas in alignment & safety

27 Nov 2023 11:10 UTC
332 points
69 comments29 min readLW link

Social Dark Matter

Duncan Sabien (Deactivated)16 Nov 2023 20:00 UTC
326 points
116 comments34 min readLW link1 review

OpenAI: The Battle of the Board

Zvi22 Nov 2023 17:30 UTC
281 points
83 comments11 min readLW link
(thezvi.wordpress.com)

AI Timelines

10 Nov 2023 5:28 UTC
279 points
97 comments51 min readLW link1 review

The 6D effect: When companies take risks, one email can be very powerful.

scasper4 Nov 2023 20:08 UTC
275 points
42 comments3 min readLW link

OpenAI: Facts from a Weekend

Zvi20 Nov 2023 15:30 UTC
271 points
165 comments9 min readLW link
(thezvi.wordpress.com)

The 101 Space You Will Always Have With You

Screwtape29 Nov 2023 4:56 UTC
253 points
21 comments6 min readLW link1 review

What are the results of more parental supervision and less outdoor play?

juliawise25 Nov 2023 12:52 UTC
226 points
31 comments5 min readLW link

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense

So8res24 Nov 2023 17:37 UTC
195 points
84 comments5 min readLW link1 review

Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk

1a3orn2 Nov 2023 18:20 UTC
193 points
79 comments23 min readLW link

Sam Altman fired from OpenAI

LawrenceC17 Nov 2023 20:42 UTC
192 points
75 comments1 min readLW link
(openai.com)

The other side of the tidal wave

KatjaGrace3 Nov 2023 5:40 UTC
186 points
86 comments1 min readLW link
(worldspiritsockpuppet.com)

Thinking By The Clock

Screwtape8 Nov 2023 7:40 UTC
185 points
27 comments8 min readLW link

You can just spontaneously call people you haven’t met in years

lc13 Nov 2023 5:21 UTC
165 points
21 comments1 min readLW link

Vote on Interesting Disagreements

Ben Pace7 Nov 2023 21:35 UTC
159 points
129 comments1 min readLW link

My thoughts on the social response to AI risk

Matthew Barnett1 Nov 2023 21:17 UTC
156 points
37 comments10 min readLW link

Moral Reality Check (a short story)

jessicata26 Nov 2023 5:03 UTC
148 points
45 comments21 min readLW link1 review
(unstableontology.com)

Does davidad’s uploading moonshot work?

3 Nov 2023 2:21 UTC
146 points
35 comments25 min readLW link

Loudly Give Up, Don’t Quietly Fade

Screwtape13 Nov 2023 23:30 UTC
144 points
11 comments6 min readLW link

How to (hopefully ethically) make money off of AGI

6 Nov 2023 23:35 UTC
142 points
88 comments32 min readLW link1 review

EA orgs’ legal structure inhibits risk taking and information sharing on the margin

Elizabeth5 Nov 2023 19:13 UTC
136 points
17 comments4 min readLW link

Integrity in AI Governance and Advocacy

3 Nov 2023 19:52 UTC
134 points
57 comments23 min readLW link

Apocalypse insurance, and the hardline libertarian take on AI risk

So8res28 Nov 2023 2:09 UTC
133 points
40 comments7 min readLW link1 review

8 examples informing my pessimism on uploading without reverse engineering

Steven Byrnes3 Nov 2023 20:03 UTC
117 points
12 comments12 min readLW link

How much to update on recent AI governance moves?

16 Nov 2023 23:46 UTC
112 points
5 comments29 min readLW link

Experiences and learnings from both sides of the AI safety job market

Marius Hobbhahn15 Nov 2023 15:40 UTC
110 points
4 comments18 min readLW link

Stuxnet, not Skynet: Humanity’s disempowerment by AI

Roko4 Nov 2023 22:23 UTC
107 points
24 comments6 min readLW link

My techno-optimism [By Vitalik Buterin]

habryka27 Nov 2023 23:53 UTC
107 points
17 comments2 min readLW link
(www.lesswrong.com)

Picking Mentors For Research Programmes

Raymond D10 Nov 2023 13:01 UTC
106 points
8 comments4 min readLW link

New LessWrong feature: Dialogue Matching

jacobjacob16 Nov 2023 21:27 UTC
106 points
22 comments3 min readLW link

One Day Sooner

Screwtape2 Nov 2023 19:00 UTC
106 points
7 comments8 min readLW link

Deception Chess: Game #1

3 Nov 2023 21:13 UTC
104 points
21 comments8 min readLW link1 review

On the Executive Order

Zvi1 Nov 2023 14:20 UTC
100 points
4 comments30 min readLW link
(thezvi.wordpress.com)

Learning-theoretic agenda reading list

Vanessa Kosoy9 Nov 2023 17:25 UTC
98 points
0 comments2 min readLW link

The Soul Key

Richard_Ngo4 Nov 2023 17:51 UTC
97 points
9 comments8 min readLW link
(www.narrativeark.xyz)

Kids or No kids

Kids or no kids14 Nov 2023 18:37 UTC
95 points
10 comments13 min readLW link

Public Call for Interest in Mathematical Alignment

Davidmanheim22 Nov 2023 13:22 UTC
89 points
9 comments1 min readLW link

Growth and Form in a Toy Model of Superposition

8 Nov 2023 11:08 UTC
89 points
7 comments14 min readLW link

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM15 Nov 2023 16:36 UTC
89 points
9 comments2 min readLW link1 review
(arxiv.org)

Untrusted smart models and trusted dumb models

Buck4 Nov 2023 3:06 UTC
87 points
17 comments6 min readLW link1 review

Coup probes: Catching catastrophes with probes trained off-policy

Fabien Roger17 Nov 2023 17:58 UTC
85 points
9 comments11 min readLW link1 review

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds1 Nov 2023 18:10 UTC
85 points
1 comment4 min readLW link
(www.anthropic.com)

Saying the quiet part out loud: trading off x-risk for personal immortality

disturbance2 Nov 2023 17:43 UTC
83 points
89 comments5 min readLW link

My Criticism of Singular Learning Theory

Joar Skalse19 Nov 2023 15:19 UTC
83 points
56 comments12 min readLW link

Agent Boundaries Aren’t Markov Blankets. [Unless they’re non-causal; see comments.]

abramdemski20 Nov 2023 18:23 UTC
82 points
11 comments2 min readLW link

Bostrom Goes Unheard

Zvi13 Nov 2023 14:11 UTC
81 points
9 comments18 min readLW link

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe Carlsmith15 Nov 2023 17:16 UTC
80 points
26 comments30 min readLW link

Announcing Athena—Women in AI Alignment Research

Claire Short7 Nov 2023 21:46 UTC
80 points
2 comments3 min readLW link

Self-Referential Probabilistic Logic Admits the Payor’s Lemma

Yudhister Kumar28 Nov 2023 10:27 UTC
80 points
14 comments6 min readLW link

Thomas Kwa’s research journal

23 Nov 2023 5:11 UTC
79 points
1 comment6 min readLW link