The hostile telepaths problem

Valentine · Oct 27, 2024, 3:26 PM
382 points
89 comments · 15 min read · LW link

I got dysentery so you don’t have to

eukaryote · Oct 22, 2024, 4:55 AM
320 points
6 comments · 17 min read · LW link
(eukaryotewritesblog.com)

Overview of strong human intelligence amplification methods

TsviBT · Oct 8, 2024, 8:37 AM
279 points
144 comments · 10 min read · LW link

The Hopium Wars: the AGI Entente Delusion

Max Tegmark · Oct 13, 2024, 5:00 PM
222 points
60 comments · 9 min read · LW link

Why I’m not a Bayesian

Richard_Ngo · Oct 6, 2024, 3:22 PM
209 points
101 comments · 10 min read · LW link
(www.mindthefuture.info)

Information vs Assurance

johnswentworth · Oct 20, 2024, 11:16 PM
187 points
17 comments · 2 min read · LW link

My motivation and theory of change for working in AI healthtech

Andrew_Critch · Oct 12, 2024, 12:36 AM
178 points
37 comments · 14 min read · LW link

Three Subtle Examples of Data Leakage

abstractapplic · Oct 1, 2024, 8:45 PM
172 points
16 comments · 4 min read · LW link

Overcoming Bias Anthology

Arjun Panickssery · Oct 20, 2024, 2:01 AM
169 points
14 comments · 2 min read · LW link
(overcoming-bias-anthology.com)

The Summoned Heroine’s Prediction Markets Keep Providing Financial Services To The Demon King!

abstractapplic · Oct 26, 2024, 12:34 PM
164 points
16 comments · 7 min read · LW link

A Rocket–Interpretability Analogy

plex · Oct 21, 2024, 1:55 PM
155 points
31 comments · 1 min read · LW link

Arithmetic is an underrated world-modeling technology

dynomight · Oct 17, 2024, 2:00 PM
152 points
33 comments · 6 min read · LW link
(dynomight.net)

Momentum of Light in Glass

Ben · Oct 9, 2024, 8:19 PM
143 points
44 comments · 11 min read · LW link

Circuits in Superposition: Compressing many small neural networks into one

Oct 14, 2024, 1:06 PM
130 points
9 comments · 13 min read · LW link

BIG-Bench Canary Contamination in GPT-4

Jozdien · Oct 22, 2024, 3:40 PM
123 points
14 comments · 4 min read · LW link

A bird’s eye view of ARC’s research

Jacob_Hilton · Oct 23, 2024, 3:50 PM
119 points
12 comments · 7 min read · LW link
(www.alignment.org)

Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded

garrison · Oct 23, 2024, 11:40 PM
118 points
1 comment · 7 min read · LW link
(garrisonlovely.substack.com)

I turned decision theory problems into memes about trolleys

Tapatakt · Oct 30, 2024, 8:13 PM
104 points
23 comments · 1 min read · LW link

LLMs can learn about themselves by introspection

Oct 18, 2024, 4:12 PM
102 points
38 comments · 9 min read · LW link

Advice for journalists

Nathan Young · Oct 7, 2024, 4:46 PM
100 points
53 comments · 9 min read · LW link
(nathanpmyoung.substack.com)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren’t scheming

Buck · Oct 10, 2024, 1:36 PM
100 points
4 comments · 13 min read · LW link

The case for unlearning that removes information from LLM weights

Fabien Roger · Oct 14, 2024, 2:08 PM
96 points
18 comments · 6 min read · LW link

Sabotage Evaluations for Frontier Models

Oct 18, 2024, 10:33 PM
95 points
56 comments · 6 min read · LW link
(assets.anthropic.com)

Finishing The SB-1047 Documentary In 6 Weeks

Michaël Trazzi · Oct 28, 2024, 8:17 PM
94 points
7 comments · 4 min read · LW link
(manifund.org)

What is malevolence? On the nature, measurement, and distribution of dark traits

Oct 23, 2024, 8:41 AM
93 points
23 comments · LW link

Catastrophic sabotage as a major threat model for human-level AI systems

evhub · Oct 22, 2024, 8:57 PM
92 points
13 comments · 15 min read · LW link

Self-prediction acts as an emergent regularizer

Oct 23, 2024, 10:27 PM
91 points
9 comments · 4 min read · LW link

Three Notions of “Power”

johnswentworth · Oct 30, 2024, 6:10 AM
89 points
44 comments · 4 min read · LW link

There is a globe in your LLM

jacob_drori · Oct 8, 2024, 12:43 AM
89 points
4 comments · 1 min read · LW link

Scaffolding for “Noticing Metacognition”

Raemon · Oct 9, 2024, 5:54 PM
88 points
4 comments · 17 min read · LW link

Self-Help Corner: Loop Detection

adamShimi · Oct 2, 2024, 8:33 AM
88 points
6 comments · 2 min read · LW link
(formethods.substack.com)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators

Eric Neyman · Oct 7, 2024, 7:29 PM
87 points
2 comments · 22 min read · LW link

5 homegrown EA projects, seeking small donors

Austin Chen · Oct 28, 2024, 11:24 PM
85 points
4 comments · LW link

Values Are Real Like Harry Potter

Oct 9, 2024, 11:42 PM
85 points
21 comments · 5 min read · LW link

Newsom Vetoes SB 1047

Zvi · Oct 1, 2024, 12:20 PM
84 points
6 comments · 32 min read · LW link
(thezvi.wordpress.com)

[Intuitive self-models] 4. Trance

Steven Byrnes · Oct 8, 2024, 1:30 PM
82 points
7 comments · 24 min read · LW link

Rationality Quotes—Fall 2024

Screwtape · Oct 10, 2024, 6:37 PM
79 points
27 comments · 1 min read · LW link

Gwern: Why So Few Matt Levines?

kave · Oct 29, 2024, 1:07 AM
78 points
10 comments · 1 min read · LW link
(gwern.net)

[Intuitive self-models] 3. The Homunculus

Steven Byrnes · Oct 2, 2024, 3:20 PM
78 points
38 comments · 25 min read · LW link

Bitter lessons about lucid dreaming

avturchin · Oct 16, 2024, 9:27 PM
77 points
62 comments · 2 min read · LW link

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

Elizabeth · Oct 22, 2024, 6:20 PM
76 points
82 comments · 1 min read · LW link
(acesounderglass.com)

Brief analysis of OP Technical AI Safety Funding

22tom · Oct 25, 2024, 7:37 PM
76 points
5 comments · 1 min read · LW link

Video lectures on the learning-theoretic agenda

Vanessa Kosoy · Oct 27, 2024, 12:01 PM
75 points
0 comments · 1 min read · LW link
(www.youtube.com)

Could randomly choosing people to serve as representatives lead to better government?

John Huang · Oct 21, 2024, 5:10 PM
75 points
13 comments · 10 min read · LW link

Introducing Transluce — A Letter from the Founders

jsteinhardt · Oct 23, 2024, 6:10 PM
74 points
3 comments · 3 min read · LW link
(bounded-regret.ghost.io)

[Question] Interest in Leetcode, but for Rationality?

Gregory · Oct 16, 2024, 5:54 PM
74 points
20 comments · 2 min read · LW link

A Narrow Path: a plan to deal with AI extinction risk

Oct 7, 2024, 1:02 PM
73 points
12 comments · 2 min read · LW link
(www.narrowpath.co)

Joshua Achiam Public Statement Analysis

Zvi · Oct 10, 2024, 12:50 PM
73 points
14 comments · 21 min read · LW link
(thezvi.wordpress.com)

The Mask Comes Off: At What Price?

Zvi · Oct 21, 2024, 11:50 PM
72 points
16 comments · 8 min read · LW link
(thezvi.wordpress.com)

Automation collapse

Oct 21, 2024, 2:50 PM
72 points
9 comments · 7 min read · LW link