Truthseeking is the ground in which other principles grow

Elizabeth · May 27, 2024, 1:09 AM
248 points
16 comments · 16 min read · LW link

Principles for the AGI Race

William_S · Aug 30, 2024, 2:29 PM
248 points
17 comments · 18 min read · LW link

My Clients, The Liars

ymeskhout · Mar 5, 2024, 9:06 PM
247 points
86 comments · 7 min read · LW link

Ilya Sutskever and Jan Leike resign from OpenAI [updated]

Zach Stein-Perlman · May 15, 2024, 12:45 AM
246 points
95 comments · 2 min read · LW link

Refusal in LLMs is mediated by a single direction

Apr 27, 2024, 11:13 AM
246 points
95 comments · 10 min read · LW link

AI companies aren’t really using external evaluators

Zach Stein-Perlman · May 24, 2024, 4:01 PM
242 points
15 comments · 4 min read · LW link

Believing In

AnnaSalamon · Feb 8, 2024, 7:06 AM
239 points
51 comments · 13 min read · LW link

Explore More: A Bag of Tricks to Keep Your Life on the Rails

Shoshannah Tekofsky · Sep 28, 2024, 9:38 PM
235 points
19 comments · 11 min read · LW link
(shoshanigans.substack.com)

“How could I have thought that faster?”

mesaoptimizer · Mar 11, 2024, 10:56 AM
234 points
32 comments · 2 min read · LW link
(twitter.com)

You are not too “irrational” to know your preferences.

DaystarEld · Nov 26, 2024, 3:01 PM
231 points
50 comments · 13 min read · LW link

The ‘strong’ feature hypothesis could be wrong

lewis smith · Aug 2, 2024, 2:33 PM
231 points
19 comments · 17 min read · LW link

SAE feature geometry is outside the superposition hypothesis

jake_mendel · Jun 24, 2024, 4:07 PM
228 points
17 comments · 11 min read · LW link

Introducing AI Lab Watch

Zach Stein-Perlman · Apr 30, 2024, 5:00 PM
224 points
30 comments · 1 min read · LW link
(ailabwatch.org)

MIRI 2024 Mission and Strategy Update

Malo · Jan 5, 2024, 12:20 AM
223 points
44 comments · 8 min read · LW link

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work

Aug 20, 2024, 4:22 PM
222 points
33 comments · 9 min read · LW link

The Hopium Wars: the AGI Entente Delusion

Max Tegmark · Oct 13, 2024, 5:00 PM
221 points
60 comments · 9 min read · LW link

Modern Transformers are AGI, and Human-Level

abramdemski · Mar 26, 2024, 5:46 PM
219 points
87 comments · 5 min read · LW link

Ayn Rand’s model of “living money”; and an upside of burnout

AnnaSalamon · Nov 16, 2024, 2:59 AM
218 points
58 comments · 5 min read · LW link

LLM Generality is a Timeline Crux

eggsyntax · Jun 24, 2024, 12:52 PM
217 points
119 comments · 7 min read · LW link

CFAR Takeaways: Andrew Critch

Raemon · Feb 14, 2024, 1:37 AM
217 points
64 comments · 5 min read · LW link

“Slow” takeoff is a terrible term for “maybe even faster takeoff, actually”

Raemon · Sep 28, 2024, 11:38 PM
217 points
69 comments · 1 min read · LW link

A Three-Layer Model of LLM Psychology

Jan_Kulveit · Dec 26, 2024, 4:49 PM
216 points
13 comments · 8 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

Jul 30, 2024, 4:22 PM
215 points
49 comments · 12 min read · LW link

Superbabies: Putting The Pieces Together

sarahconstantin · Jul 11, 2024, 8:40 PM
215 points
37 comments · 10 min read · LW link
(sarahconstantin.substack.com)

Understanding Shapley Values with Venn Diagrams

Carson L · Dec 6, 2024, 9:56 PM
214 points
34 comments · LW link
(medium.com)

Optimistic Assumptions, Longterm Planning, and “Cope”

Raemon · Jul 17, 2024, 10:14 PM
214 points
46 comments · 7 min read · LW link

ChatGPT can learn indirect control

Raymond D · Mar 21, 2024, 9:11 PM
213 points
27 comments · 1 min read · LW link

Pay Risk Evaluators in Cash, Not Equity

Adam Scholl · Sep 7, 2024, 2:37 AM
212 points
19 comments · 1 min read · LW link

Towards more cooperative AI safety strategies

Richard_Ngo · Jul 16, 2024, 4:36 AM
210 points
133 comments · 4 min read · LW link

Making a conservative case for alignment

Nov 15, 2024, 6:55 PM
208 points
67 comments · 7 min read · LW link

Mechanistically Eliciting Latent Behaviors in Language Models

Apr 30, 2024, 6:51 PM
208 points
43 comments · 45 min read · LW link

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko · Feb 3, 2024, 8:36 PM
207 points
156 comments · 9 min read · LW link

Why I’m not a Bayesian

Richard_Ngo · Oct 6, 2024, 3:22 PM
207 points
101 comments · 10 min read · LW link
(www.mindthefuture.info)

What TMS is like

Sable · Oct 31, 2024, 12:44 AM
206 points
23 comments · 6 min read · LW link
(affablyevil.substack.com)

Funny Anecdote of Eliezer From His Sister

Noah Birnbaum · Apr 22, 2024, 10:05 PM
206 points
6 comments · 2 min read · LW link

The Sun is big, but superintelligences will not spare Earth a little sunlight

Eliezer Yudkowsky · Sep 23, 2024, 3:39 AM
205 points
142 comments · 13 min read · LW link

Toward A Mathematical Framework for Computation in Superposition

Jan 18, 2024, 9:06 PM
204 points
18 comments · 63 min read · LW link

OpenAI: Fallout

Zvi · May 28, 2024, 1:20 PM
204 points
25 comments · 36 min read · LW link
(thezvi.wordpress.com)

Frontier Models are Capable of In-context Scheming

Dec 5, 2024, 10:11 PM
203 points
24 comments · 7 min read · LW link

Jaan Tallinn’s 2023 Philanthropy Overview

jaan · May 20, 2024, 12:11 PM
203 points
5 comments · 1 min read · LW link
(jaan.info)

Communications in Hard Mode (My new job at MIRI)

tanagrabeast · Dec 13, 2024, 8:13 PM
202 points
25 comments · 5 min read · LW link

Maybe Anthropic’s Long-Term Benefit Trust is powerless

Zach Stein-Perlman · May 27, 2024, 1:00 PM
201 points
21 comments · 2 min read · LW link

Cryonics is free

Mati_Roy · Sep 29, 2024, 5:58 PM
198 points
43 comments · 2 min read · LW link

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage

orthonormal · Aug 6, 2024, 2:32 AM
198 points
30 comments · 3 min read · LW link

Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy

garrison · Feb 10, 2024, 7:52 PM
198 points
52 comments · LW link
(garrisonlovely.substack.com)

The impossible problem of due process

mingyuan · Jan 16, 2024, 5:18 AM
197 points
64 comments · 14 min read · LW link

This might be the last AI Safety Camp

Jan 24, 2024, 9:33 AM
196 points
34 comments · 1 min read · LW link

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth · Apr 23, 2024, 10:19 PM
194 points
102 comments · 1 min read · LW link

The Compendium, A full argument about extinction risk from AGI

Oct 31, 2024, 12:01 PM
194 points
52 comments · 2 min read · LW link
(www.thecompendium.ai)

Response to Aschenbrenner’s “Situational Awareness”

Rob Bensinger · Jun 6, 2024, 10:57 PM
194 points
27 comments · 3 min read · LW link