My AI Model Delta Compared To Christiano

johnswentworth · Jun 12, 2024, 6:19 PM
191 points
73 comments · 4 min read · LW link

What's Going on With OpenAI's Messaging?

ozziegooen · May 21, 2024, 2:22 AM
191 points
13 comments · LW link

Two easy things that maybe Just Work to improve AI discourse

Bird Concept · Jun 8, 2024, 3:51 PM
190 points
35 comments · 2 min read · LW link

OMMC Announces RIP

Apr 1, 2024, 11:20 PM
189 points
5 comments · 2 min read · LW link

A basic systems architecture for AI agents that do autonomous research

Buck · Sep 23, 2024, 1:58 PM
189 points
16 comments · 8 min read · LW link

My Interview With Cade Metz on His Reporting About Slate Star Codex

Zack_M_Davis · Mar 26, 2024, 5:18 PM
189 points
187 comments · 6 min read · LW link

Shallow review of technical AI safety, 2024

Dec 29, 2024, 12:01 PM
189 points
34 comments · 41 min read · LW link

On Not Pulling The Ladder Up Behind You

Screwtape · Apr 26, 2024, 9:58 PM
188 points
21 comments · 9 min read · LW link

Skills from a year of Purposeful Rationality Practice

Raemon · Sep 18, 2024, 2:05 AM
187 points
18 comments · 7 min read · LW link

Information vs Assurance

johnswentworth · Oct 20, 2024, 11:16 PM
187 points
17 comments · 2 min read · LW link

Daniel Kahneman has died

DanielFilan · Mar 27, 2024, 3:59 PM
186 points
11 comments · 1 min read · LW link
(www.washingtonpost.com)

Contra Ngo et al. "Every 'Every Bay Area House Party' Bay Area House Party"

Ricki Heicklen · Feb 22, 2024, 11:56 PM
186 points
5 comments · 4 min read · LW link
(bayesshammai.substack.com)

This is already your second chance

Malmesbury · Jul 28, 2024, 5:13 PM
185 points
13 comments · 8 min read · LW link

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

Apr 18, 2024, 12:27 AM
185 points
21 comments · 7 min read · LW link

Humming is not a free $100 bill

Elizabeth · Jun 6, 2024, 8:10 PM
185 points
6 comments · 3 min read · LW link
(acesounderglass.com)

Struggling like a Shadowmoth

Raemon · Sep 24, 2024, 12:47 AM
183 points
38 comments · 7 min read · LW link

Introducing Alignment Stress-Testing at Anthropic

evhub · Jan 12, 2024, 11:51 PM
182 points
23 comments · 2 min read · LW link

Contra papers claiming superhuman AI forecasting

Sep 12, 2024, 6:10 PM
182 points
16 comments · 7 min read · LW link

Every “Every Bay Area House Party” Bay Area House Party

Richard_Ngo · Feb 16, 2024, 6:53 PM
181 points
6 comments · 4 min read · LW link

Safety consultations for AI lab employees

Zach Stein-Perlman · Jul 27, 2024, 3:00 PM
181 points
4 comments · 1 min read · LW link

[Question] Why is o1 so deceptive?

abramdemski · Sep 27, 2024, 5:27 PM
180 points
24 comments · 3 min read · LW link

My motivation and theory of change for working in AI healthtech

Andrew_Critch · Oct 12, 2024, 12:36 AM
178 points
37 comments · 14 min read · LW link

Toward a Broader Conception of Adverse Selection

Ricki Heicklen · Mar 14, 2024, 10:40 PM
177 points
61 comments · 13 min read · LW link
(bayesshammai.substack.com)

FHI (Future of Humanity Institute) has shut down (2005–2024)

gwern · Apr 17, 2024, 1:54 PM
176 points
22 comments · 1 min read · LW link
(www.futureofhumanityinstitute.org)

WTH is Cerebrolysin, actually?

Aug 6, 2024, 8:40 PM
175 points
23 comments · 17 min read · LW link

When Is Insurance Worth It?

kqr · Dec 19, 2024, 7:07 PM
173 points
71 comments · 4 min read · LW link
(entropicthoughts.com)

Timaeus's First Four Months

Feb 28, 2024, 5:01 PM
173 points
6 comments · 6 min read · LW link

Three Subtle Examples of Data Leakage

abstractapplic · Oct 1, 2024, 8:45 PM
172 points
16 comments · 4 min read · LW link

Did Christopher Hitchens change his mind about waterboarding?

Isaac King · Sep 15, 2024, 8:28 AM
171 points
22 comments · 7 min read · LW link

'Empiricism!' as Anti-Epistemology

Eliezer Yudkowsky · Mar 14, 2024, 2:02 AM
171 points
92 comments · 25 min read · LW link

Reconsider the anti-cavity bacteria if you are Asian

Lao Mein · Apr 15, 2024, 7:02 AM
170 points
43 comments · 4 min read · LW link

o1: A Technical Primer

Jesse Hoogland · Dec 9, 2024, 7:09 PM
170 points
19 comments · 9 min read · LW link
(www.youtube.com)

And All the Shoggoths Merely Players

Zack_M_Davis · Feb 10, 2024, 7:56 PM
170 points
57 comments · 12 min read · LW link

Overcoming Bias Anthology

Arjun Panickssery · Oct 20, 2024, 2:01 AM
169 points
14 comments · 2 min read · LW link
(overcoming-bias-anthology.com)

Recommendation: reports on the search for missing hiker Bill Ewasko

eukaryote · Jul 31, 2024, 10:15 PM
169 points
28 comments · 14 min read · LW link
(eukaryotewritesblog.com)

Masterpiece

Richard_Ngo · Feb 13, 2024, 11:10 PM
166 points
21 comments · 4 min read · LW link
(www.narrativeark.xyz)

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

Dec 6, 2024, 10:19 PM UTC
165 points
12 comments · 11 min read · LW link
(arxiv.org)

You can remove GPT2's LayerNorm by fine-tuning for an hour

StefanHex · Aug 8, 2024, 6:33 PM UTC
165 points
11 comments · 8 min read · LW link

Boycott OpenAI

PeterMcCluskey · Jun 18, 2024, 7:52 PM UTC
164 points
26 comments · 1 min read · LW link
(bayesianinvestor.com)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!

abstractapplic · Oct 26, 2024, 12:34 PM UTC
164 points
16 comments · 7 min read · LW link

Tips for Empirical Alignment Research

Ethan Perez · Feb 29, 2024, 6:04 AM UTC
163 points
4 comments · 23 min read · LW link

Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data

Jun 21, 2024, 3:54 PM UTC
163 points
13 comments · 8 min read · LW link
(arxiv.org)

Announcing ILIAD — Theoretical AI Alignment Conference

Jun 5, 2024, 9:37 AM UTC
163 points
18 comments · 2 min read · LW link

Many arguments for AI x-risk are wrong

TurnTrout · Mar 5, 2024, 2:31 AM UTC
162 points
87 comments · 12 min read · LW link

The Median Researcher Problem

johnswentworth · Nov 2, 2024, 8:16 PM UTC
161 points
70 comments · 1 min read · LW link

o1 is a bad idea

abramdemski · Nov 11, 2024, 9:20 PM UTC
161 points
39 comments · 2 min read · LW link

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI

Jan 26, 2024, 7:22 AM UTC
161 points
60 comments · 57 min read · LW link

Sycophancy to subterfuge: Investigating reward tampering in large language models

Jun 17, 2024, 6:41 PM UTC
161 points
22 comments · 8 min read · LW link
(arxiv.org)

Making every researcher seek grants is a broken model

jasoncrawford · Jan 26, 2024, 4:06 PM UTC
159 points
41 comments · 4 min read · LW link
(rootsofprogress.org)

DeepMind's "Frontier Safety Framework" is weak and unambitious

Zach Stein-Perlman · May 18, 2024, 3:00 AM UTC
159 points
14 comments · 4 min read · LW link