The Talk: a brief explanation of sexual dimorphism

Malmesbury Sep 18, 2023, 4:23 PM
520 points
75 comments · 16 min read · LW link · 3 reviews

Inside Views, Impostor Syndrome, and the Great LARP

johnswentworth Sep 25, 2023, 4:08 PM
335 points
53 comments · 5 min read · LW link

Sharing Information About Nonlinear

Ben Pace Sep 7, 2023, 6:51 AM
323 points
323 comments · 34 min read · LW link

EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem

Elizabeth Sep 28, 2023, 11:30 PM
317 points
250 comments · 22 min read · LW link · 2 reviews
(acesounderglass.com)

Sum-threshold attacks

TsviBT Sep 8, 2023, 5:13 PM
238 points
55 comments · 10 min read · LW link
(tsvibt.blogspot.com)

What I would do if I wasn’t at ARC Evals

LawrenceC Sep 5, 2023, 7:19 PM
220 points
10 comments · 13 min read · LW link · 1 review

UDT shows that decision theory is more puzzling than ever

Wei Dai Sep 13, 2023, 12:26 PM
218 points
56 comments · 1 min read · LW link

AI presidents discuss AI alignment agendas

Sep 9, 2023, 6:55 PM
217 points
23 comments · 1 min read · LW link
(www.youtube.com)

The King and the Golem

Richard_Ngo Sep 25, 2023, 7:51 PM
190 points
19 comments · 5 min read · LW link · 1 review
(narrativeark.substack.com)

A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX

Bird Concept Sep 1, 2023, 4:03 AM
188 points
26 comments · 24 min read · LW link · 1 review

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Sep 28, 2023, 6:53 PM
187 points
39 comments · 3 min read · LW link · 1 review

There should be more AI safety orgs

Marius Hobbhahn Sep 21, 2023, 2:53 PM
181 points
25 comments · 17 min read · LW link

Defunding My Mistake

ymeskhout Sep 4, 2023, 2:43 PM
175 points
41 comments · 6 min read · LW link

Meta Questions about Metaphilosophy

Wei Dai Sep 1, 2023, 1:17 AM
161 points
80 comments · 3 min read · LW link

“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation

titotal Sep 29, 2023, 2:01 PM
160 points
79 comments · LW link
(titotal.substack.com)

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

Sep 21, 2023, 3:30 PM
159 points
8 comments · 5 min read · LW link

Cohabitive Games so Far

mako yass Sep 28, 2023, 3:41 PM
131 points
146 comments · 19 min read · LW link · 2 reviews
(makopool.com)

One Minute Every Moment

abramdemski Sep 1, 2023, 8:23 PM
125 points
23 comments · 3 min read · LW link

The smallest possible button (or: moth traps!)

Neil Sep 2, 2023, 3:24 PM
122 points
18 comments · 3 min read · LW link
(neilwarren.substack.com)

Paper: LLMs trained on “A is B” fail to learn “B is A”

Sep 23, 2023, 7:55 PM
121 points
74 comments · 4 min read · LW link
(arxiv.org)

Making AIs less likely to be spiteful

Sep 26, 2023, 2:12 PM
118 points
7 comments · 10 min read · LW link

Interpreting OpenAI’s Whisper

EllenaR Sep 24, 2023, 5:53 PM
116 points
13 comments · 7 min read · LW link

“X distracts from Y” as a thinly-disguised fight over group status / politics

Steven Byrnes Sep 25, 2023, 3:18 PM
112 points
14 comments · 8 min read · LW link

Paper: On measuring situational awareness in LLMs

Sep 4, 2023, 12:54 PM
109 points
16 comments · 5 min read · LW link
(arxiv.org)

ActAdd: Steering Language Models without Optimization

Sep 6, 2023, 5:21 PM
105 points
3 comments · 2 min read · LW link
(arxiv.org)

PSA: The community is in Berkeley/Oakland, not “the Bay Area”

maia Sep 11, 2023, 3:59 PM
104 points
7 comments · 1 min read · LW link

Reproducing ARC Evals’ recent report on language model agents

Thomas Broadley Sep 1, 2023, 4:52 PM
104 points
17 comments · 3 min read · LW link
(thomasbroadley.com)

Explaining grokking through circuit efficiency

Sep 8, 2023, 2:39 PM
101 points
11 comments · 3 min read · LW link
(arxiv.org)

Would You Work Harder In The Least Convenient Possible World?

Firinn Sep 22, 2023, 5:17 AM
99 points
100 comments · 9 min read · LW link · 2 reviews

Closing Notes on Nonlinear Investigation

Ben Pace Sep 15, 2023, 10:44 PM
97 points
47 comments · 11 min read · LW link

Atoms to Agents Proto-Lectures

johnswentworth Sep 22, 2023, 6:22 AM
96 points
14 comments · 2 min read · LW link
(www.youtube.com)

Announcing FAR Labs, an AI safety coworking space

Ben Goldhaber Sep 29, 2023, 4:52 PM
95 points
0 comments · 1 min read · LW link

Logical Share Splitting

DaemonicSigil Sep 11, 2023, 4:08 AM
93 points
16 comments · 9 min read · LW link
(pbement.com)

I compiled a ebook of `Project Lawful` for eBook readers

OrwellGoesShopping Sep 15, 2023, 6:09 PM
90 points
4 comments · 1 min read · LW link
(www.mikescher.com)

AI #31: It Can Do What Now?

Zvi Sep 28, 2023, 4:00 PM
90 points
6 comments · 40 min read · LW link
(thezvi.wordpress.com)

Benchmarks for Detecting Measurement Tampering [Redwood Research]

Sep 5, 2023, 4:44 PM
87 points
22 comments · 20 min read · LW link · 1 review
(arxiv.org)

Highlights: Wentworth, Shah, and Murphy on “Retargeting the Search”

RobertM Sep 14, 2023, 2:18 AM
87 points
4 comments · 8 min read · LW link

Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

Zac Hatfield-Dodds Sep 19, 2023, 3:09 PM
83 points
26 comments · 3 min read · LW link · 1 review
(www.anthropic.com)

[Question] How have you become more hard-working?

Chi Nguyen Sep 25, 2023, 12:37 PM
82 points
42 comments · LW link

Memory bandwidth constraints imply economies of scale in AI inference

Ege Erdil Sep 17, 2023, 2:01 PM
79 points
34 comments · 4 min read · LW link

Navigating an ecosystem that might or might not be bad for the world

Sep 15, 2023, 11:58 PM
79 points
20 comments · 1 min read · LW link

Find Hot French Food Near Me: A Follow-up

aphyer Sep 6, 2023, 12:32 PM
75 points
19 comments · 2 min read · LW link

Luck based medicine: angry eldritch sugar gods edition

Elizabeth Sep 19, 2023, 4:40 AM
75 points
14 comments · 9 min read · LW link
(acesounderglass.com)

Text Posts from the Kids Group: 2023 I

jefftk Sep 5, 2023, 2:00 AM
75 points
3 comments · 7 min read · LW link
(www.jefftk.com)

AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo

Zvi Sep 21, 2023, 12:00 PM
75 points
8 comments · 47 min read · LW link
(thezvi.wordpress.com)

[Question] How to talk about reasons why AGI might not be near?

Kaj_Sotala Sep 17, 2023, 8:18 AM
73 points
19 comments · 2 min read · LW link

High-level interpretability: detecting an AI’s objectives

Sep 28, 2023, 7:30 PM
72 points
4 comments · 21 min read · LW link

A quick update from Nonlinear

KatWoods Sep 7, 2023, 9:28 PM
72 points
23 comments · 2 min read · LW link

Influence functions—why, what and how

Nina Panickssery Sep 15, 2023, 8:42 PM
71 points
6 comments · 8 min read · LW link

Have Attention Spans Been Declining?

niplav Sep 8, 2023, 2:11 PM
71 points
22 comments · 17 min read · LW link · 1 review