Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals

Jan 24, 2025, 8:20 PM
180 points
61 comments · 5 min read · LW link

Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall

Vladimir_Nesov · May 1, 2025, 1:54 PM
172 points
22 comments · 5 min read · LW link

So how well is Claude playing Pokémon?

Julian Bradshaw · Mar 7, 2025, 5:54 AM
171 points
74 comments · 5 min read · LW link

How will we update about scheming?

ryan_greenblatt · Jan 6, 2025, 8:21 PM
171 points
20 comments · 37 min read · LW link

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

Kaj_Sotala · Apr 15, 2025, 3:56 PM
168 points
50 comments · 18 min read · LW link

On the Rationality of Deterring ASI

Dan H · Mar 5, 2025, 4:11 PM
166 points
34 comments · 4 min read · LW link
(nationalsecurity.ai)

Short Timelines Don’t Devalue Long Horizon Research

Vladimir_Nesov · Apr 9, 2025, 12:42 AM
166 points
24 comments · 1 min read · LW link

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Jan 30, 2025, 5:03 PM
162 points
58 comments · 2 min read · LW link
(gradual-disempowerment.ai)

Maximizing Communication, not Traffic

jefftk · Jan 5, 2025, 1:00 PM
161 points
10 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Have LLMs Generated Novel Insights?

Feb 23, 2025, 6:22 PM
158 points
38 comments · 2 min read · LW link

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

shrimpy · Mar 16, 2025, 4:52 PM
157 points
25 comments · 1 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Mar 13, 2025, 7:09 PM
155 points
40 comments · 6 min read · LW link

Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

Adam Karvonen · Apr 14, 2025, 5:38 PM
154 points
42 comments · 7 min read · LW link
(adamkarvonen.github.io)

Self-fulfilling misalignment data might be poisoning our AI models

TurnTrout · Mar 2, 2025, 7:51 PM
154 points
27 comments · 1 min read · LW link
(turntrout.com)

Statistical Challenges with Making Super IQ babies

Jan Christian Refsgaard · Mar 2, 2025, 8:26 PM
154 points
26 comments · 9 min read · LW link

It’s been ten years. I propose HPMOR Anniversary Parties.

Screwtape · Feb 16, 2025, 1:43 AM
153 points
3 comments · 1 min read · LW link

Don’t ignore bad vibes you get from people

Kaj_Sotala · Jan 18, 2025, 9:20 AM
152 points
50 comments · 2 min read · LW link
(kajsotala.fi)

OpenAI #10: Reflections

Zvi · Jan 7, 2025, 5:00 PM
149 points
7 comments · 11 min read · LW link
(thezvi.wordpress.com)

Conceptual Rounding Errors

Jan_Kulveit · Mar 26, 2025, 7:00 PM
149 points
15 comments · 3 min read · LW link
(boundedlyrational.substack.com)

Capital Ownership Will Not Prevent Human Disempowerment

beren · Jan 5, 2025, 6:00 AM
149 points
18 comments · 14 min read · LW link

Quotes from the Stargate press conference

Nikola Jurkovic · Jan 22, 2025, 12:50 AM
149 points
7 comments · 1 min read · LW link
(www.c-span.org)

Methods for strong human germline engineering

TsviBT · Mar 3, 2025, 8:13 AM
149 points
28 comments · 108 min read · LW link

A computational no-coincidence principle

Eric Neyman · Feb 14, 2025, 9:39 PM
148 points
38 comments · 6 min read · LW link
(www.alignment.org)

Levels of Friction

Zvi · Feb 10, 2025, 1:10 PM
148 points
8 comments · 12 min read · LW link
(thezvi.wordpress.com)

Winning the power to lose

KatjaGrace · May 20, 2025, 6:40 AM
148 points
37 comments · 2 min read · LW link
(worldspiritsockpuppet.com)

Activation space interpretability may be doomed

Jan 8, 2025, 12:49 PM
148 points
33 comments · 8 min read · LW link

The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better

Thane Ruthenis · Feb 21, 2025, 8:15 PM
148 points
51 comments · 6 min read · LW link

Alignment Faking Revisited: Improved Classifiers and Open Source Extensions

Apr 8, 2025, 5:32 PM
146 points
20 comments · 12 min read · LW link

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt · Jan 23, 2025, 6:41 PM
145 points
5 comments · 13 min read · LW link

Applying traditional economic thinking to AGI: a trilemma

Steven Byrnes · Jan 13, 2025, 1:23 AM
144 points
32 comments · 3 min read · LW link

The Most Forbidden Technique

Zvi · Mar 12, 2025, 1:20 PM
143 points
9 comments · 17 min read · LW link
(thezvi.wordpress.com)

The Hidden Cost of Our Lies to AI

Nicholas Andresen · Mar 6, 2025, 5:03 AM
142 points
18 comments · 7 min read · LW link
(substack.com)

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
141 points
15 comments · 13 min read · LW link

Human takeover might be worse than AI takeover

Tom Davidson · Jan 10, 2025, 4:53 PM
141 points
55 comments · 8 min read · LW link

OpenAI #12: Battle of the Board Redux

Zvi · Mar 31, 2025, 3:50 PM
141 points
1 comment · 9 min read · LW link
(thezvi.wordpress.com)

Ten people on the inside

Buck · Jan 28, 2025, 4:41 PM
139 points
28 comments · 4 min read · LW link

Training AGI in Secret would be Unsafe and Unethical

Daniel Kokotajlo · Apr 18, 2025, 12:27 PM
139 points
15 comments · 6 min read · LW link

What Indicators Should We Watch to Disambiguate AGI Timelines?

snewman · Jan 6, 2025, 7:57 PM
139 points
57 comments · 13 min read · LW link

Planning for Extreme AI Risks

joshc · Jan 29, 2025, 6:33 PM
139 points
5 comments · 16 min read · LW link

[Question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?

Thane Ruthenis · Mar 4, 2025, 4:23 PM
137 points
51 comments · 3 min read · LW link

[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty

tandem · Jan 7, 2025, 7:11 PM
137 points
5 comments · 1 min read · LW link

The Failed Strategy of Artificial Intelligence Doomers

Ben Pace · Jan 31, 2025, 6:56 PM
136 points
78 comments · 5 min read · LW link
(www.palladiummag.com)

Anomalous Tokens in DeepSeek-V3 and r1

henry · Jan 25, 2025, 10:55 PM
136 points
3 comments · 7 min read · LW link

The Milton Friedman Model of Policy Change

JohnofCharleston · Mar 4, 2025, 12:38 AM
136 points
17 comments · 4 min read · LW link

Training on Documents About Reward Hacking Induces Reward Hacking

Jan 21, 2025, 9:32 PM
131 points
15 comments · 2 min read · LW link
(alignment.anthropic.com)

AI Doomerism in 1879

David Gross · May 13, 2025, 2:48 AM
131 points
36 comments · 8 min read · LW link

It’s Okay to Feel Bad for a Bit

moridinamael · May 10, 2025, 11:24 PM
131 points
26 comments · 3 min read · LW link

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 22, 2025, 12:47 AM
130 points
5 comments · 6 min read · LW link

Building AI Research Fleets

Jan 12, 2025, 6:23 PM
130 points
11 comments · 5 min read · LW link

Consider not donating under $100 to political candidates

DanielFilan · May 11, 2025, 3:20 AM
130 points
31 comments · 1 min read · LW link
(danielfilan.com)