Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Jan 30, 2025, 5:03 PM
162 points
57 comments · 2 min read · LW link
(gradual-disempowerment.ai)

Maximizing Communication, not Traffic

jefftk · Jan 5, 2025, 1:00 PM
161 points
10 comments · 1 min read · LW link
(www.jefftk.com)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

shrimpy · Mar 16, 2025, 4:52 PM
157 points
25 comments · 1 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Mar 13, 2025, 7:09 PM
155 points
40 comments · 6 min read · LW link

[Question] Have LLMs Generated Novel Insights?

Feb 23, 2025, 6:22 PM
155 points
36 comments · 2 min read · LW link

Self-fulfilling misalignment data might be poisoning our AI models

TurnTrout · Mar 2, 2025, 7:51 PM
154 points
27 comments · 1 min read · LW link
(turntrout.com)

It’s been ten years. I propose HPMOR Anniversary Parties.

Screwtape · Feb 16, 2025, 1:43 AM
153 points
3 comments · 1 min read · LW link

Statistical Challenges with Making Super IQ babies

Jan Christian Refsgaard · Mar 2, 2025, 8:26 PM
153 points
26 comments · 9 min read · LW link

Don’t ignore bad vibes you get from people

Kaj_Sotala · Jan 18, 2025, 9:20 AM
150 points
50 comments · 2 min read · LW link
(kajsotala.fi)

Conceptual Rounding Errors

Jan_Kulveit · Mar 26, 2025, 7:00 PM
149 points
15 comments · 3 min read · LW link
(boundedlyrational.substack.com)

Methods for strong human germline engineering

TsviBT · Mar 3, 2025, 8:13 AM
149 points
28 comments · 108 min read · LW link

Quotes from the Stargate press conference

Nikola Jurkovic · Jan 22, 2025, 12:50 AM
149 points
7 comments · 1 min read · LW link
(www.c-span.org)

OpenAI #10: Reflections

Zvi · Jan 7, 2025, 5:00 PM
149 points
7 comments · 11 min read · LW link
(thezvi.wordpress.com)

A computational no-coincidence principle

Eric Neyman · Feb 14, 2025, 9:39 PM
148 points
38 comments · 6 min read · LW link
(www.alignment.org)

Levels of Friction

Zvi · Feb 10, 2025, 1:10 PM
148 points
8 comments · 12 min read · LW link
(thezvi.wordpress.com)

Capital Ownership Will Not Prevent Human Disempowerment

beren · Jan 5, 2025, 6:00 AM
148 points
18 comments · 14 min read · LW link

The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better

Thane Ruthenis · Feb 21, 2025, 8:15 PM
148 points
51 comments · 6 min read · LW link

Activation space interpretability may be doomed

Jan 8, 2025, 12:49 PM
147 points
32 comments · 8 min read · LW link

Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

Adam Karvonen · Apr 14, 2025, 5:38 PM
147 points
42 comments · 7 min read · LW link
(adamkarvonen.github.io)

Alignment Faking Revisited: Improved Classifiers and Open Source Extensions

Apr 8, 2025, 5:32 PM
145 points
20 comments · 12 min read · LW link

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt · Jan 23, 2025, 6:41 PM
145 points
5 comments · 13 min read · LW link

Applying traditional economic thinking to AGI: a trilemma

Steven Byrnes · Jan 13, 2025, 1:23 AM
144 points
32 comments · 3 min read · LW link

The Most Forbidden Technique

Zvi · Mar 12, 2025, 1:20 PM
143 points
9 comments · 17 min read · LW link
(thezvi.wordpress.com)

OpenAI #12: Battle of the Board Redux

Zvi · Mar 31, 2025, 3:50 PM
141 points
1 comment · 9 min read · LW link
(thezvi.wordpress.com)

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
141 points
15 comments · 13 min read · LW link

The Hidden Cost of Our Lies to AI

Nicholas Andresen · Mar 6, 2025, 5:03 AM
140 points
18 comments · 7 min read · LW link
(substack.com)

Human takeover might be worse than AI takeover

Tom Davidson · Jan 10, 2025, 4:53 PM
140 points
55 comments · 8 min read · LW link

Planning for Extreme AI Risks

joshc · Jan 29, 2025, 6:33 PM
139 points
5 comments · 16 min read · LW link

What Indicators Should We Watch to Disambiguate AGI Timelines?

snewman · Jan 6, 2025, 7:57 PM
139 points
57 comments · 13 min read · LW link

Ten people on the inside

Buck · Jan 28, 2025, 4:41 PM
139 points
28 comments · 4 min read · LW link

[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty

tandem · Jan 7, 2025, 7:11 PM
137 points
5 comments · 1 min read · LW link

Training AGI in Secret would be Unsafe and Unethical

Daniel Kokotajlo · Apr 18, 2025, 12:27 PM
137 points
15 comments · 6 min read · LW link

The Milton Friedman Model of Policy Change

JohnofCharleston · Mar 4, 2025, 12:38 AM
136 points
17 comments · 4 min read · LW link

The Failed Strategy of Artificial Intelligence Doomers

Ben Pace · Jan 31, 2025, 6:56 PM
136 points
78 comments · 5 min read · LW link
(www.palladiummag.com)

Anomalous Tokens in DeepSeek-V3 and r1

henry · Jan 25, 2025, 10:55 PM
136 points
2 comments · 7 min read · LW link

[Question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?

Thane Ruthenis · Mar 4, 2025, 4:23 PM
136 points
51 comments · 3 min read · LW link

Training on Documents About Reward Hacking Induces Reward Hacking

Jan 21, 2025, 9:32 PM
131 points
15 comments · 2 min read · LW link
(alignment.anthropic.com)

Building AI Research Fleets

Jan 12, 2025, 6:23 PM
129 points
11 comments · 5 min read · LW link

The Paris AI Anti-Safety Summit

Zvi · Feb 12, 2025, 2:00 PM
129 points
21 comments · 21 min read · LW link
(thezvi.wordpress.com)

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 22, 2025, 12:47 AM
129 points
5 comments · 6 min read · LW link

Some articles in “International Security” that I enjoyed

Buck · Jan 31, 2025, 4:23 PM
129 points
10 comments · 4 min read · LW link

Gradual Disempowerment, Shell Games and Flinches

Jan_Kulveit · Feb 2, 2025, 2:47 PM
129 points
36 comments · 6 min read · LW link

AI-enabled coups: a small group could use AI to seize power

Apr 16, 2025, 4:51 PM
128 points
18 comments · 7 min read · LW link

Orienting Toward Wizard Power

johnswentworth · May 8, 2025, 5:23 AM
128 points
12 comments · 5 min read · LW link

Parkinson’s Law and the Ideology of Statistics

Benquo · Jan 4, 2025, 3:49 PM
127 points
7 comments · 8 min read · LW link
(benjaminrosshoffman.com)

The Pando Problem: Rethinking AI Individuality

Jan_Kulveit · Mar 28, 2025, 9:03 PM
127 points
14 comments · 13 min read · LW link

Anthropic, and taking “technical philosophy” more seriously

Raemon · Mar 13, 2025, 1:48 AM
125 points
29 comments · 11 min read · LW link

The Intelligence Curse

lukedrago · Jan 3, 2025, 7:07 PM
124 points
26 comments · 18 min read · LW link
(lukedrago.substack.com)

[Question] when will LLMs become human-level bloggers?

nostalgebraist · Mar 9, 2025, 9:10 PM
124 points
34 comments · 6 min read · LW link

How I’ve run major projects

benkuhn · Mar 16, 2025, 6:40 PM
123 points
10 comments · 8 min read · LW link
(www.benkuhn.net)