The hostile telepaths problem

Valentine · 27 Oct 2024 15:26 UTC
339 points
74 comments · 15 min read · LW link

I got dysentery so you don’t have to

eukaryote · 22 Oct 2024 4:55 UTC
305 points
4 comments · 17 min read · LW link
(eukaryotewritesblog.com)

Overview of strong human intelligence amplification methods

TsviBT · 8 Oct 2024 8:37 UTC
264 points
141 comments · 10 min read · LW link

The Hopium Wars: the AGI Entente Delusion

Max Tegmark · 13 Oct 2024 17:00 UTC
199 points
55 comments · 9 min read · LW link

Why I’m not a Bayesian

Richard_Ngo · 6 Oct 2024 15:22 UTC
184 points
92 comments · 10 min read · LW link
(www.mindthefuture.info)

Three Subtle Examples of Data Leakage

abstractapplic · 1 Oct 2024 20:45 UTC
171 points
16 comments · 4 min read · LW link

The Summoned Heroine’s Prediction Markets Keep Providing Financial Services To The Demon King!

abstractapplic · 26 Oct 2024 12:34 UTC
157 points
16 comments · 7 min read · LW link

A Rocket–Interpretability Analogy

plex · 21 Oct 2024 13:55 UTC
149 points
31 comments · 1 min read · LW link

Arithmetic is an underrated world-modeling technology

dynomight · 17 Oct 2024 14:00 UTC
147 points
32 comments · 6 min read · LW link
(dynomight.net)

Overcoming Bias Anthology

Arjun Panickssery · 20 Oct 2024 2:01 UTC
146 points
13 comments · 2 min read · LW link
(overcoming-bias-anthology.com)

My motivation and theory of change for working in AI healthtech

Andrew_Critch · 12 Oct 2024 0:36 UTC
145 points
36 comments · 13 min read · LW link

Momentum of Light in Glass

Ben · 9 Oct 2024 20:19 UTC
141 points
44 comments · 11 min read · LW link

Circuits in Superposition: Compressing many small neural networks into one

14 Oct 2024 13:06 UTC
126 points
8 comments · 13 min read · LW link

BIG-Bench Canary Contamination in GPT-4

Jozdien · 22 Oct 2024 15:40 UTC
123 points
13 comments · 4 min read · LW link

Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded

garrison · 23 Oct 2024 23:40 UTC
118 points
1 comment · 7 min read · LW link
(garrisonlovely.substack.com)

A bird’s eye view of ARC’s research

Jacob_Hilton · 23 Oct 2024 15:50 UTC
116 points
12 comments · 7 min read · LW link
(www.alignment.org)

I turned decision theory problems into memes about trolleys

Tapatakt · 30 Oct 2024 20:13 UTC
103 points
20 comments · 1 min read · LW link

LLMs can learn about themselves by introspection

18 Oct 2024 16:12 UTC
102 points
38 comments · 9 min read · LW link

Advice for journalists

Nathan Young · 7 Oct 2024 16:46 UTC
100 points
53 comments · 9 min read · LW link
(nathanpmyoung.substack.com)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren’t scheming

Buck · 10 Oct 2024 13:36 UTC
100 points
4 comments · 13 min read · LW link

The case for unlearning that removes information from LLM weights

Fabien Roger · 14 Oct 2024 14:08 UTC
96 points
14 comments · 6 min read · LW link

Sabotage Evaluations for Frontier Models

18 Oct 2024 22:33 UTC
93 points
55 comments · 6 min read · LW link
(assets.anthropic.com)

Finishing The SB-1047 Documentary In 6 Weeks

Michaël Trazzi · 28 Oct 2024 20:17 UTC
93 points
5 comments · 4 min read · LW link
(manifund.org)

Information vs Assurance

johnswentworth · 20 Oct 2024 23:16 UTC
93 points
7 comments · 2 min read · LW link

Catastrophic sabotage as a major threat model for human-level AI systems

evhub · 22 Oct 2024 20:57 UTC
90 points
8 comments · 15 min read · LW link

Three Notions of “Power”

johnswentworth · 30 Oct 2024 6:10 UTC
88 points
43 comments · 4 min read · LW link

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators

Eric Neyman · 7 Oct 2024 19:29 UTC
87 points
2 comments · 22 min read · LW link

Self-Help Corner: Loop Detection

adamShimi · 2 Oct 2024 8:33 UTC
87 points
6 comments · 2 min read · LW link
(formethods.substack.com)

There is a globe in your LLM

jacob_drori · 8 Oct 2024 0:43 UTC
86 points
4 comments · 1 min read · LW link

5 homegrown EA projects, seeking small donors

Austin Chen · 28 Oct 2024 23:24 UTC
85 points
4 comments · 1 min read · LW link

Self-prediction acts as an emergent regularizer

23 Oct 2024 22:27 UTC
84 points
4 comments · 4 min read · LW link

Newsom Vetoes SB 1047

Zvi · 1 Oct 2024 12:20 UTC
84 points
6 comments · 32 min read · LW link
(thezvi.wordpress.com)

Values Are Real Like Harry Potter

9 Oct 2024 23:42 UTC
81 points
17 comments · 5 min read · LW link

Scaffolding for “Noticing Metacognition”

Raemon · 9 Oct 2024 17:54 UTC
78 points
4 comments · 17 min read · LW link

Bitter lessons about lucid dreaming

avturchin · 16 Oct 2024 21:27 UTC
76 points
62 comments · 2 min read · LW link

What is malevolence? On the nature, measurement, and distribution of dark traits

23 Oct 2024 8:41 UTC
76 points
15 comments · 1 min read · LW link

Gwern: Why So Few Matt Levines?

kave · 29 Oct 2024 1:07 UTC
76 points
10 comments · 1 min read · LW link
(gwern.net)

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

Elizabeth · 22 Oct 2024 18:20 UTC
75 points
78 comments · 1 min read · LW link
(acesounderglass.com)

Rationality Quotes—Fall 2024

Screwtape · 10 Oct 2024 18:37 UTC
75 points
22 comments · 1 min read · LW link

Could randomly choosing people to serve as representatives lead to better government?

John Huang · 21 Oct 2024 17:10 UTC
75 points
13 comments · 10 min read · LW link

Video lectures on the learning-theoretic agenda

Vanessa Kosoy · 27 Oct 2024 12:01 UTC
75 points
0 comments · 1 min read · LW link
(www.youtube.com)

Introducing Transluce — A Letter from the Founders

jsteinhardt · 23 Oct 2024 18:10 UTC
74 points
2 comments · 3 min read · LW link
(bounded-regret.ghost.io)

[Question] Interest in Leetcode, but for Rationality?

Gregory · 16 Oct 2024 17:54 UTC
73 points
20 comments · 2 min read · LW link

A Narrow Path: a plan to deal with AI extinction risk

7 Oct 2024 13:02 UTC
73 points
11 comments · 2 min read · LW link
(www.narrowpath.co)

Joshua Achiam Public Statement Analysis

Zvi · 10 Oct 2024 12:50 UTC
73 points
14 comments · 21 min read · LW link
(thezvi.wordpress.com)

The Mask Comes Off: At What Price?

Zvi · 21 Oct 2024 23:50 UTC
71 points
16 comments · 8 min read · LW link
(thezvi.wordpress.com)

Automation collapse

21 Oct 2024 14:50 UTC
70 points
10 comments · 7 min read · LW link

If far-UV is so great, why isn’t it everywhere?

Austin Chen · 19 Oct 2024 18:56 UTC
70 points
23 comments · 1 min read · LW link
(strainhardening.substack.com)

EIS XIV: Is mechanistic interpretability about to be practically useful?

scasper · 11 Oct 2024 22:13 UTC
67 points
4 comments · 7 min read · LW link

On Shifgrethor

JustisMills · 27 Oct 2024 15:30 UTC
66 points
18 comments · 2 min read · LW link
(justismills.substack.com)